Forth: Stack-Manipulation Operators

jnwatson · on May 14, 2021

This reminds me of the excellent shareware game for the Mac called RoboWar [1] by David Harris. In it, you programmed a robot to destroy other programmed robots in an arena. You could spend points on different weapons, shields, and processor speed.

The programming language was a custom stack-based language like Forth. It worked quite well for its purpose, as you could easily count instructions.

1. https://en.wikipedia.org/wiki/RoboWar

huachimingo · on May 14, 2021

Minecraft had a mod with Forth support: https://technicpack.fandom.com/wiki/Forth_language

astrobe_ · on May 14, 2021

And so does Minetest: https://github.com/Ekdohibs/forth_computer (it seems to be an unsafe mod though, because I see it loads shared libraries).

kragen · on May 14, 2021

There's less demand for this in Minetest, though, because modding Minetest is already super easy in Lua—you don't have to learn Java like you do for Minecraft. But you can't load a Lua mod even in Minetest if you don't run the server.

Mobile-code environments for mutually distrusting code turn out to be more difficult than people used to think, but now we have multiple off-the-shelf battle-tested implementations of JS for that. And you can coerce Lua to do it, kind of.

susam · on May 15, 2021

Incidentally, a couple of weeks ago I wrote a tiny Star Wars-themed Forth program to celebrate Star Wars Day: https://github.com/susam/may4

Forth brings back the fun in computing for me that I once experienced when I began learning to code with Logo. Simple, distraction-free, and fun!

"I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don't think we are. I think we're responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun." -- Alan J. Perlis

_vntb · on May 15, 2021

Programming in Lisp or Forth is such fun since you can stretch the programming language itself. For me, the reflective approach keeps the fun in programming.

kragen · on May 14, 2021

As I wrote here a month ago, I worry that the stack manipulation operators can be a cognitive danger to people who are newly learning Forth: https://news.ycombinator.com/item?id=26884231

It's cool to be able to write : CAB ( c a b -- n ) - SWAP / ; but that very coolness can easily lead you into giving up on Forth. It's natural to work to do everything on the stacks, since you can, but that's not a practical way to program; this function is fine, but in sufficiently complicated cases, you end up with unreadable code where you're struggling to play mental chess with the stack. And then you sigh, decide that Forth is for people smarter than you (which is reinforced by some of the self-aggrandizing nonsense around Forth), and you go back to Python.

Maybe a better way to teach Forth would be to teach people about VARIABLEs or VALUEs first, so they can write 0 VALUE C : CAB TO C - C / ; or possibly VARIABLE C : CAB C ! - C @ / ; and then only later introduce SWAP, DUP, and the rest as shortcuts. VALUE or VARIABLE is simpler than stack manipulation operators — with either VALUE and TO, or with @, !, and VARIABLE, you can write every possible stack manipulation operator. So it's less to learn than the whole zoo of DUP, DROP, SWAP, 2DUP, 2DROP, OVER, >R, R>, TUCK, NIP, 2SWAP, ROT, 2OVER, -ROT, R@, and RDROP.
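For comparison, here is a minimal sketch of the variants mentioned above, written with standard Forth words only. The -STACK, -VALUE, and -VAR suffixes, the MY- prefixes, and the names CV, X, and Y are made up for this example. One thing to watch: TO C and C ! consume whatever is on top of the stack, so the variable-based versions as written take c as the last argument rather than the first.

```forth
\ Stack-only version: computes (a-b)/c, with c passed first.
: CAB-STACK ( c a b -- n )  - SWAP / ;

\ VALUE version: TO C pops the top of the stack, so c is passed last here.
0 VALUE C
: CAB-VALUE ( a b c -- n )  TO C  - C / ;

\ VARIABLE version (CV is a hypothetical name, chosen to avoid clashing with C).
VARIABLE CV
: CAB-VAR ( a b c -- n )  CV !  - CV @ / ;

\ A few stack operators rebuilt from two VALUEs, to show it can be done.
0 VALUE X  0 VALUE Y
: MY-DUP  ( a -- a a )      TO X  X X ;
: MY-SWAP ( a b -- b a )    TO Y  TO X  Y X ;
: MY-OVER ( a b -- a b a )  TO Y  TO X  X Y X ;
```

With these definitions, 2 10 4 CAB-STACK and 10 4 2 CAB-VALUE should both leave 3 on the stack, since each computes (10-4)/2. Note that X and Y are shared globals, which is exactly the sharing hazard discussed a couple of paragraphs further down.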

But then, later on — look! With this new SWAP shortcut, you can write this CAB function without a variable! And you probably should. But if you ever find yourself writing code with -ROT you probably should seriously consider rewriting it.

Of course mutable variables are themselves bug-prone — you can forget to set them on some control path, for example, or forget to redeclare them and so accidentally share a variable with some subroutine you're calling — but in my experience these are much less serious pitfalls than ending up with a bunch of @ ROT + SWAP and OVER + SWAP 0 >R.

On the other hand, I don't have a great track record with either writing difficult programs in Forth or teaching people things. So I might be off base here.

retrac · on May 14, 2021

It's Forth tradition to make your own slightly incompatible Forth. Honouring that tradition, I suggest an extension to the VM. Either with a few registers, or a second stack. For pointers. That way, memory access to a single or handful of variables becomes a single word. Add things like dereference-and-increment load/store primitives and it becomes very C-like, and a rather nice target for compilers.

(I can't take credit for this idea: https://www.complang.tuwien.ac.at/anton/euroforth/ef08/paper... "Updating the Forth Virtual Machine" by Stephen Pelc for EuroForth 2008, which is probably also not the first time someone suggested this.)

lebuffon · on May 14, 2021

Indeed. Chuck Moore, the inventor, added the "A" register to his machine Forth for holding addresses and temporary values to reduce "stackrobatics".

kragen · on May 14, 2021

He explained that this was in part due to hardware considerations. I don't really understand why, although unlike stack machines, I can speculate. But then he got to like it.

He also eliminated all the stack operations from machineForth except DUP, DROP, and OVER. Not even SWAP!

protomyth · on May 15, 2021

PostScript has the ability to assign values to variables.

kragen · on May 15, 2021

Can you elaborate on how this relates?

protomyth · on May 15, 2021

A more general solution than Chuck Moore's A register but solving the same problem.

kragen · on May 15, 2021

Hmm, I don't think assigning values to variables in PostScript with "def" is solving the same problem or even a related problem as the A register. Instead, it solves the same problem that assigning values to variables in Forth solves.

As I understand it, the A register, introduced in the MuP21, solved some problems having to do with on-die fanout, fanin, and path length that limited the performance of the Novix NC4016 and the Harris RTX 2000. It also allows you to do operations like memset and memcpy without having to fetch and store the pointers to RAM (at fixed addresses!) for every byte/word read or written, which improves performance dramatically, makes the code a lot more readable (especially with its incrementing addressing modes), and makes the code more compact.

None of these are problems that exist in the PostScript world.

You can't use the A register in the MuP21 or F18A as a local variable, because it isn't local; it's shared with whatever subroutines you call and whatever subroutine called you. Moreover, the A register is the only way to read and write memory, so they can't avoid overwriting it if they need to read or write memory.

[I've programmed in PostScript, and I've written a little Forth, but I've never programmed any of the chips mentioned above.]
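To make the memset-style use of an address register a bit more concrete, here is a rough emulation in standard Forth. It only illustrates the programming model: the register here is an ordinary VARIABLE in RAM, so it has none of the performance benefit described above. The names A-REG, A!, @A+, !A+, FILL-CELLS, and SUM-CELLS are assumptions for this sketch, loosely modeled on the auto-incrementing fetch and store operations, not the actual instruction names of any chip.

```forth
\ Emulated address register. On the real chips this is hardware, not RAM.
VARIABLE A-REG

: A!  ( addr -- )  A-REG ! ;                       \ load the address register
: @A+ ( -- x )     A-REG @ @   1 CELLS A-REG +! ;  \ fetch via A, then advance A one cell
: !A+ ( x -- )     A-REG @ !   1 CELLS A-REG +! ;  \ store via A, then advance A one cell

\ memset-like: fill n cells starting at addr with x.
: FILL-CELLS ( addr n x -- )  ROT A!  SWAP 0 ?DO  DUP !A+  LOOP  DROP ;

\ Sum n cells starting at addr.
: SUM-CELLS ( addr n -- sum )  SWAP A!  0 SWAP 0 ?DO  @A+ +  LOOP ;
```

Inside the loops the pointer never appears on the data stack, which is the readability gain. The catch is the one described above: any word called inside the loop that uses A-REG itself will clobber it.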

kragen · on May 14, 2021

Interesting! Some way to distinguish pointers from integers is kind of a sine qua non for memory safety; I've been kind of thinking along the lines of Smalltalk object descriptors, where each memory allocation is in a "segment" of its own, into which you can index with integers. That would save you from debugging segfaults. (But then, so does Hindley-Milner typing, and that doesn't cost a bounds-check on every memory access.)

The part that so far puzzles me is how to do this without losing too much of Forth's simplicity.

Pelc's proposal is to add scratch/index registers A and B (notably, not segregating integers from pointers) and base pointers X or LP (the frame pointer) and Y or UP (thread-local data pointer). His motivation seems to be primarily efficiency and clarity rather than memory-safety.

Probably you want at least LP and UP to be callee-saved. If you make LP into an additional stack rather than just a register, then you save yourself a Y> >R on entry to subroutines that allocate stack frames, and half of a R> >Y on exit. (The CPU still has to do the same amount of work, but plausibly you spend less on op dispatch overhead and code space.) Doing the same with the A and B scratch/index registers might make sense, or you could just treat them as caller-saved. At some point restoring individual shallow-bound variables one by one stops being a good tradeoff, and it starts making more sense to put them in a stack frame or instance-variable vector and access them by indexing off a base pointer.

Adding a small number of local-variable registers like this is kind of a pain for hand-programming, though, as opposed to as a compiler target. Instead of having to remember whether the quantity two levels down in the stack is the voltage or the current, you have to remember whether the A register is currently the voltage or the current. Much easier to say 0 VALUE VOLTAGE 0 VALUE CURRENT, or maybe VARIABLE V VARIABLE I, and then your code is always explicit about which one it's expecting. So I doubt Pelc's proposal can be justified on the grounds of clarity, rendering it merely a speed hack.

astrobe_ · on May 14, 2021

A strange game. The only winning move is not to play.

If you want memory safety guarantees from the compiler, don't use Forth.

The paradigm in Forth is "correct by design", "fail-fast" and to fight complexity unguibus et rostro.

The paradigm in Forth is to increase the skills of the programmer rather than to increase the skills of the compiler. That's why it is a wonderful language for advanced training, IMO.

kragen · on May 15, 2021

This is precisely the kind of self-aggrandizing macho nonsense I was criticizing in https://news.ycombinator.com/item?id=27159383.

1. What should you use if you want memory safety guarantees—whether because you're a novice, because you have better ways to spend your Saturday nights than disassembling core dumps, because you're building a platform for safely running untrusted code written by players who are competing against one another https://news.ycombinator.com/item?id=27160615, or because, given your own fallibility, you think it's the only way to be safe enough from ransomware gangs? Of course you're right that it's not ANS Forth or polyForth or machineForth, but what should it be? What's the lowest price in complexity we can get away with paying for that pearl? Surely we can do better than JavaScript. Or even PostScript.

2. You say the paradigm in Forth is "correct by design", "fail-fast" and to fight complexity uñas y rostro. Well, precisely what I want is to be fail-fast; standard Forth pointer arithmetic isn't, because indexing past the end of an array will just access some other object. That means that instead of failing fast it will usually produce incorrect results and possibly corrupt the data in that other object, producing errors downstream. To fail fast we need bounds-checking, and for bounds checking we need to distinguish pointers from integers.

3. You say, "The paradigm in Forth is to increase the skills of the programmer," which is to say, to be more difficult than necessary. You're free to make your Forth more difficult than necessary if you want, which is an enjoyable activity that will surely make you a better programmer, but that isn't what Forth was designed for. In fact it's close to the opposite of Forth's design principles, which are to fight complexity because complexity makes programming more difficult. That's why Forth programmers in the 01980s had IDEs with virtual memory, incremental recompilation, compile-time metaprogramming, reflection, and multithreading that fit on a floppy disk and could run in 64K of RAM. It's not because F-83 was written by superhumans; it's because Forth's design removed a lot of accidental complexity from the problems, allowing its implementors to focus on the essence.

I mean, you're probably right that most people using Forth today are doing it, like Apollo, not because it is easy but because it is difficult. But that's not what Chuck Moore was looking for, it's not what Elizabeth Rather is looking for, and it's not what I'm looking for either.

remexre · on May 15, 2021

> because you're a novice, because you have better ways to spend your Saturday nights than disassembling core dumps

My experience is that with a development methodology that emphasizes early-binding, incremental testing, and aggressive factoring, I just don't have the same kinds of memory issues I do in C. Strings having an explicit length helps a lot, and I suspect stack manipulation having a linear-types-like feeling helps somewhat too, since you think a lot more about "should this be removing this from the stack" than "should I be using this variable again some time, you know, later."

> because you're building a platform for safely running untrusted code written by players who are competing against one another, or because, given your own fallibility, you think it's the only way to be safe enough from ransomware gangs

My aarch64 Forth runs under QEMU, and this is easy to set up, and wiring a PL011 to stdio works great.

If you're just trying to avoid buffer overruns, though (which I suppose meets the latter case but not really the former), I'd consider using a 64-bit Forth with 32-bit pointers and 32-bit offsets, storing the size of the allocation in the first word before the pointer, so you can check the offset in a special TAGGED!/TAGGED@. I suspect you'd still want the normal !/@ for speed, and implement TAGGED!/TAGGED@ in terms of them, though?
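A minimal sketch of what such checked accessors might look like in standard Forth, under simplifying assumptions: the index and base address are passed as two separate stack items rather than packed into one 64-bit cell, the size is counted in cells, and storage comes from the dictionary via ALLOT. TAGGED-ALLOT and CHECK-INDEX are hypothetical helper names.

```forth
\ Layout assumed here: [ size in cells ][ cell 0 ][ cell 1 ] ...
\ The returned address points at cell 0, one cell past the size.
: TAGGED-ALLOT ( n -- addr )  ALIGN  DUP ,  HERE  SWAP CELLS ALLOT ;

\ Abort unless 0 <= i < size. U< treats a negative i as a huge unsigned number,
\ so one comparison covers both ends of the range.
: CHECK-INDEX ( i addr -- i addr )
  2DUP 1 CELLS - @  U< 0=  ABORT" index out of bounds" ;

\ Checked accessors, implemented in terms of the ordinary @ and ! .
: TAGGED@ ( i addr -- x )  CHECK-INDEX  SWAP CELLS +  @ ;
: TAGGED! ( x i addr -- )  CHECK-INDEX  SWAP CELLS +  ! ;

\ Usage: a three-cell allocation.
3 TAGGED-ALLOT CONSTANT ARR
42 1 ARR TAGGED!
```

After this, 1 ARR TAGGED@ fetches the 42 back, while 3 ARR TAGGED@ or -1 ARR TAGGED@ aborts instead of touching a neighbouring object. Nothing stops code from using plain @ on the same address, so this catches mistakes rather than containing hostile code.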

Perhaps the MTE extensions would work, but I haven't investigated them (I don't have a laptop whose CPU supports them yet).

kragen · on May 15, 2021

> I just don't have the same kinds of memory issues I do in C.

Interesting! I definitely haven't reached that level—in Forth I have the same kinds of memory problems I have in C, and also parameter-passing bugs and type errors. Is there a codebase you think of as exemplary that I could learn from? Or, maybe better, a screencast of somebody writing some software in the development style? (Interactive testing of new words doesn't leave development artifacts, after all.)

"Much better than C" might still be pretty bad compared to JS, Rust, or OCaml; have you tried them?

> My aarch64 Forth runs under QEMU, and this is easy to set up, and wiring a [UART] to stdio works great.

This is definitely a viable approach that can work in many circumstances! However, each trust domain in QEMU uses 16 megabytes of RAM and roughly a million microseconds of CPU time to start up and shut down, and if you're doing cross-architecture emulation so you can't use kvm, the CPU overhead is typically on the order of 300%. By contrast, each trust domain in Lucet takes about 16 kilobytes of RAM and 50 microseconds of CPU time to start up and shut down https://www.fastly.com/blog/lucet-performance-and-lifecycle, and a new trust domain in a language that implements object-capability security could conceivably take 16 bytes of RAM and 0.1 microseconds. In both cases you can usually come within a few percent of native-code performance—which probably isn't in the cards if every memory access is preceded by a bounds check, as in Valgrind (typically 1000% CPU overhead) or this pseudo-Forth approach. Communications round-trip time might be 1 μs per ping-pong in QEMU and 0.1 μs in the other cases.

So, there are cases where a secure-compiler approach can give you 3–6 orders of magnitude better performance than just using QEMU. If you have a multiplayer game where each player uploads code, you might be able to support a few hundred players on a PC with the QEMU approach, or a few hundred thousand players with the Lucet approach. (Not, probably, all connected over TCP at once.)

> consider using a 64-bit Forth with 32-bit pointers and 32-bit offsets

An interesting idea here is to use a 64-bit system with 64-bit pointers and 32-bit offsets, mapping each allocation 4 gibibytes and change apart. Basically using the high 32 bits of the pointer as an MTE address tag or segment ID. Bloated and probably slow, though, and of course the bounds checking isn't very precise—enough to ensure that you can't index into another allocation, but not enough to catch every off-by-one array indexing bug.

There used to be a fat-pointer bounds-checking version of GCC that replaced pointers with essentially Golang slices, and you could of course implement that kind of scheme on top of a standard Forth, much as you suggest with TAGGED@.

remexre · on May 15, 2021

> Is there a codebase you think of as exemplary that I could learn from?

My "one codebase I keep rewriting" is [0]; I don't consider the artifact as useful as the process, though. Particularly,

> in Forth I have the same kinds of memory problems I have in C, and also parameter-passing bugs and type errors.

I find the memory problems are helped a lot by incremental testing in the REPL, as well as designing in such a way that it's obvious what a word does -- if a word has a surprising side-effect or requirement, that's a good sign some design rethinking is in order. Thinking Forth [1] has some examples of the kinds of redesigns that can be helpful, though as a warning it has strong "grumpy old man" vibes. I find these hilarious, but some don't :) (Some of its design suggestions are also somewhat "off" in my view; not a huge fan of DOER/MAKE, personally.)

I definitely don't often experience parameter-passing bugs (assuming you mean "this takes 3 args, thought it took 2") nor type errors, and I'm actually a bit surprised to hear you do. I suppose I rather religiously stack comment, and write just about every word to have a "static" stack effect. I've also got a shell alias:

    ffd() { rg -FiIN ": $1 (" "${2:-$HOME/Projects/stahl/kernel/src}"; }

so I can just quickly do e.g.

    $ ffd nip
    : NIP ( a b -- b ) SWAP DROP ;
    $ ffd allocate-pages
      : ALLOCATE-PAGES ( num-pages -- 0 0 | addr -1 )

I ought to set keywordprg in Vim to that too, then I could get it in-editor. Of course, this doesn't work for builtins and control constructs -- perhaps someday I'll have the compiler put the pointer in SOURCE in the word header, and write a language server that looks through them.

> [...] JS, Rust, or OCaml; have you tried them?

Yeah, more the latter two than JS. Maintaining memory safety is easier in a memory-safe language, but I'm not certain overall correctness and design simplicity is. Certainly, when I'm implementing things in OCaml, I feel like I have to make design compromises much more than in Forth (where Forth gives me more room to redesign instead of getting forced into an unideal compromise). Rust's macros give me a bunch more leeway, but it's still occasionally awkward to express some things where something really is just easier to express at runtime, but it's impossible or UB/unsound to do so.

Common Lisp is perhaps a better competitor in this regard, since like Forth it supports both powerful compile-time and run-time metaprogramming. Maybe MetaOCaml too, though I've yet to try it.

> However, each trust domain in QEMU [...]

Yeah, of course QEMU isn't a silver bullet in this regard; if your application needs to be able to spawn large numbers of sandboxes quickly, especially. If you wanted to push virtualization as far as it could go, I'd consider the rust-vmm/Firecracker work instead -- they have stock Linux booting in 250ms/core, and a (perhaps custom) ringbuffer shared between the two processes would of course be vastly lower overhead than proper emulation of hardware.

> [...] Lucet [...]

I actually considered dropping WASM in initially, but I removed it after remembering my experience with trying to do JITtish things in it -- it's not currently practical (as of like a year ago) to do a DTC Forth, or a Forth that inlines, in WASM, since words are just too small. WASM being an "extreme" Harvard architecture is also fairly annoying. I suppose an ITC Forth ought to work, but without an inliner, performance seems like it'd be fairly bad.

I believe there's an API in spec-development currently to add better primitives for JIT that's sized closer to a basic block, so maybe that would change the situation.

> [...] object-capability security [...]

If you're going that... high-level? abstract? why not use Factor? I think, as you describe above, a Smalltalk-inspired approach might be a better foundation than most Forths' "assembly-inspired" ones. Though, as I understand it (too young to have lived through it!), Smalltalk had awful performance until Self came along and invented modern JIT, by which point you've completely lost the virtue of simplicity of implementation (at which point I'd be biased to say, you should use a Lisp or Haskell variant instead!)

> [...] 64-bit pointers [...]

Yeah, I considered a variation of that (most memory bugs are off-by-n for a small n, so guard pages are almost as good), but having a minimum allocation size of 4k seemed too wasteful.

> [...] replaced pointers [...]

Oh, I think TCC [2] still implements that! My understanding was that doing that relied on the sorts of C UB rules that some people hate (and that I quite like that Forth lacks!) though. I suppose if sizeof(uintptr_t) == 2*sizeof(ptrdiff_t), that wouldn't need to use the rules, but I think that'd suck pretty bad in Forth.

---

I think there are (at least!) two axes being compared here:

1. How easy is it to write correct Forth (I include security in correctness -- arbitrary code execution is typically pretty incorrect!)

2. How easy is it to execute attacker-provided Forth safely

I'd say Forth is strongly competitive on the first axis, and isn't at all on the second. I'd certainly believe that Factor or a Smalltalk-like Forth could be competitive on the latter, but if you include bugs in the compiler and runtime as correctness violations, I suspect it'd suffer a lot on the first axis.

I've thought a bit about getting stronger assurances of correctness for my Forth programs without complexifying the implementation of the system (which would make incorrectness much more likely). Currently I'm thinking about doing proofs in separation logic, with symbolic execution acting to hopefully provide a high degree of proof automation. An important characteristic is doing these proofs on the program starting at some physical machine state at a fixed time (e.g. just before some main loop is entered), rather than on the abstract machine that the runtime and compiler hopefully implement correctly. I think the system is simple enough that no mismatch between the two would occur, but I suppose I'll see whenever I've finished learning enough to start implementing this.

---

[0]: https://git.sr.ht/~remexre/stahl/tree/main/item/kernel/src

[1]: https://managedway.dl.sourceforge.net/project/thinking-forth...

[2]: https://bellard.org/tcc/

kragen · on May 15, 2021

Wow, this is wonderful stuff! I'll definitely look more at stahl :)

By "parameter-passing bugs" I mean things like "I forgot to put a DROP after this loop that was keeping a variable on the stack" or "this takes 3 args, and I knew that, but I accidentally passed it 2, and then it took me an annoyingly long time to figure it out".

I don't really think of Factor (or PostScript) as being terribly similar to Forth despite their concatenative syntaxes, although maybe that just means I'm emphasizing a different aspect of Forth than you are. I feel like Factor and PostScript are more like Lisp or (especially) Smalltalk than they are like Forth, just with less syntax.

The thing that appeals to me most strongly about Forth is the prospect of a steam-catapult-like leap from machine code, through a few hundred lines of code, to a fairly usable low-level programming language environment: concise code with nested expressions, recursive functions, named variables, ad-hoc compile-time metaprogramming, function pointers (DEFER or DOER/MAKE), a form of closures (CREATE DOES>), and an interactive REPL that doesn't need to be restarted to see code changes (though maybe not a single-stepping debugger). That is, it's the simplicity of the implementation that appeals to me, with the transparency and malleability that implies.
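Two of the features in that list are easy to show in a few lines. This is only a sketch; ADDER, ADD5, ADD100, STEP, and TWICE are names invented for the example, and DEFER/IS assume a system that provides the Forth-2012 core extension words.

```forth
\ CREATE ... DOES> as a closure: each word made by ADDER captures its own n.
: ADDER ( n "name" -- )  CREATE ,  DOES> ( x -- x+n )  @ + ;
5   ADDER ADD5
100 ADDER ADD100

\ DEFER as a function pointer that can be re-pointed later.
DEFER STEP
' ADD5 IS STEP
: TWICE ( x -- y )  STEP STEP ;
```

With STEP pointing at ADD5, 10 TWICE leaves 20. After ' ADD100 IS STEP the same compiled TWICE leaves 210 for the same input, without recompiling anything.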

But I don't really prefer to type b negate discrim sqrt + 2 a * / instead of (-b + sqrt(discrim())) / (2*a) or (/ (+ (- b) (sqrt (discrim))) (* 2 a)). I don't really like having to type "." or "?" to see the result of an expression I'm evaluating interactively. I don't think the RPN syntax is actually better, mostly, though it has some real advantages as a user interface. I just think the RPN syntax is an acceptable tradeoff for the simpler implementation, more powerful metaprogramming and refactorability, and better performance that the Forth design provides, and it's easy enough to use it to implement something more reasonable.

I'm still more intimidated than I should be by formal methods. I've found DRMacIver's QuickCheck-like Hypothesis, which randomly searches for counterexamples to your assertions of properties you would like your program to have and then reduces them to minimal test cases, to be astoundingly effective at improving my testing, and I've been thinking something like his "Minithesis" could be a significant boost to getting a new programming environment up and running. But that's about as far as I've gotten.

remexre · on May 17, 2021

> [...] parameter-passing bugs [...]

Ah, yeah, incremental development is the panacea here.

> [...] Factor (or PostScript) [...]

Yeah, agreed; if you're needing to be able to safely execute untrusted code though, I'd rather have a language with trustworthy properties (i.e. one restricting one from making arbitrary syscalls, arbitrary pointer reads+writes, etc.) rather than applying sandboxing to a language that allows the program to do all these unsafe operations.

> [...] leap from machine code [...]

Agreed :)

> [...] prefer to type [...]

Yeah, I haven't found a syntax for reading math that's better than infix; I don't actually think the postfix is much better than prefix, but maybe that's just me.

> [...] RPN syntax [...]

Oh, I actually think it does; it emphasizes function composition directly, and unifies it with sequencing statements in a single syntax!

> [...] formal methods [...]

Yeah, especially given the diversity of techniques under the formal methods banner, it's hard to get a good look at "the whole landscape." I think symbolic execution is getting to the point where it's workable for testing with a similar UX to property testing, just slower but with stronger guarantees (since all possible executions are taken, not just the ones that the RNG finds). Haven't actually built it yet though, so maybe Forth's flexibility will make it too hard, who knows.

If you wanna chat more about these things, I and at least one other person looking into formal methods + Forth hang out on #forth on Freenode.

remexre · on May 19, 2021

Well, #forth on Libera.chat now :)

astrobe_ · on May 15, 2021

> What should you use if you want memory safety guarantees [...] but what should it be?

That's a bit off-topic but I would say any interpreter for a dynamically typed language that uses a GC. That is to say almost anything but Forth (at least in the non-AoT category).

> Well, precisely what I want is to be fail-fast; standard Forth pointer arithmetic isn't

That particular point is often addressed by "correct by design". If it is really important not to off-by-one, you can make your array a power of two in size, and then access it through "gate keeping" words that bitmask the index. Or perform a quick check on the passed index using the same technique.

This is of course not a one-size-fits-all solution, but one useful thing Forth taught me is to think about things in their context. Specialization leads to simplification, generalization often leads to complications.
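A rough sketch of both gate-keeping variants in standard Forth, assuming an eight-cell array; RING-SIZE, RING, and the accessor names are hypothetical. The first variant silently wraps an out-of-range index, and the second uses the same mask to reject it.

```forth
8 CONSTANT RING-SIZE               \ must be a power of two for the masking to work
CREATE RING  RING-SIZE CELLS ALLOT

\ Variant 1: mask the index so it always lands inside the array (wraps around).
: RING-ADDR ( i -- addr )  RING-SIZE 1- AND  CELLS RING + ;
: RING@ ( i -- x )    RING-ADDR @ ;
: RING! ( x i -- )    RING-ADDR ! ;

\ Variant 2: a quick check with the inverted mask. Any bit set outside the
\ low bits means the index is out of range, and negative indices fail too.
: IN-RING? ( i -- flag )  RING-SIZE 1- INVERT AND  0= ;
: CHECKED-RING-ADDR ( i -- addr )
  DUP IN-RING? 0=  ABORT" ring index out of range"  CELLS RING + ;
```

With variant 1, 42 9 RING! stores into slot 1, because 9 AND 7 is 1. With variant 2, 9 CHECKED-RING-ADDR aborts instead.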

> You say, "The paradigm in Forth is to increase the skills of the programmer," which is to say, to be more difficult than necessary.

If you want to jump higher, you train yourself to jump higher by raising the bar a little every day, you don't lower the bar a little every day and pretend you've improved.

When you train yourself you build the muscles involved in the task you are training for. Eventually what was next to impossible last month is easy today. It is not difficult anymore.

"Difficult" is not an absolute value.

> I mean, you're probably right that most people using Forth today are doing it, like Apollo, not because it is easy but because it is difficult.

I did not really say that. I have no idea of how many people are using Forth in production, as a hobby, as a personal tool or as a puzzle game.

> But that's not what Chuck Moore was looking for, it's not what Elizabeth Rather is looking for, and it's not what I'm looking for either.

Chuck Moore has been developing Forth for 50 years. When you say he is not trying to make Forth more difficult, you are right but you are IMHO mistaken on the means. One of his attempts was "source-less programming" [1]. It was a dead-end, but IIRC the experiment led to ColorForth.

E. Rather is retired now I believe, but what she was looking for as a co-founder of Forth, Inc. is to sell Forth. That's a different objective, and that led co-founder C. Moore to leave the company in order to pursue his personal vision of Forth.

For one thing, Forth, Inc. has pushed ISO-Forth because a standard gives credibility and makes things a bit easier for their customers, but Moore was not really in favor of the principle of an "executable" standard and criticized it for being too complicated.

[1] http://www.ultratechnology.com/okad.htm

kragen · on May 16, 2021

> That's a bit off-topic but I would say [to get memory safety you should use] any interpreter for a dynamically typed language that uses a GC. That is to say almost anything but Forth (at least in the non-AoT category)

Okay, but if you use, say, Lua instead of Forth, you're going from an interpreter that weighs 4 kilobytes to one that weighs 100 kilobytes. And instead of running 5 times slower than C (assuming a conventional threaded-code Forth), it runs 20 times slower (assuming not LuaJIT). And you've lost inline assembly. And you've lost compile-time metaprogramming (and if you want eval, which lets you get some of it back, the price goes up to 150 kilobytes). And you can no longer write real-time code because of the GC. And, even without real-time requirements, your programs randomly fail in limited-memory environments because they're allocating all the time, and sometimes they suffer atypical allocation fragmentation. And the memory requirements aren't just unpredictable, they're also large, because most of your memory is taken up by pointers, so even when your programs don't crash randomly in a low-memory environment they can't do as much as Forth.

There are other compact interpreted languages with other tradeoffs: BASIC, Smalltalk, XLISP, ZIL, Red, Elisp, and PostScript, for example. But they all have many or all of the above drawbacks.

So I don't think it's unreasonable to try to find a spot in the design space that's closer to Forth than it is to Lua and PostScript. GC and dynamic typing clearly aren't necessary for memory safety. The resulting language clearly wouldn't be the same thing as Forth, but that doesn't mean it has to have the kinds of orders-of-magnitude disadvantages listed above.

> If it is really important not to off-by-one, you can make your array a power of two in size, and then access it through "gate keeping" words that bitmask the index. Or perform a quick check on the passed index using the same technique.

This is going to the opposite extreme from "fail fast": unless the desired semantics is really an infinitely replicated buffer, like if you're using it as a ring buffer, this is deliberately producing incorrect results to mask the programming error of an out-of-bounds index. There are situations where that's a good idea (any possible way of handling such an error in an NPC script in a first-person shooter would be better than crashing the game, and ring-buffer FIFOs are a perfectly reasonable way to communicate with interrupt handlers and real-time threads) but there are also a lot of situations where failing fast is better.
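
To make the contrast concrete, here is a minimal Python sketch of the two accessors being discussed. The names `masked_get` and `checked_get` are hypothetical, and the buffer size of 8 is an arbitrary assumption; the only point is that the same power-of-two mask can either wrap a bad index silently or detect it.

```python
SIZE = 8            # assumed buffer size; must be a power of two
MASK = SIZE - 1     # all-ones bit mask for valid indices 0..7

def masked_get(buf, i):
    """Gate-keeping accessor: wraps any index into range and never fails."""
    return buf[i & MASK]

def checked_get(buf, i):
    """Fail-fast accessor: any bit outside the mask means a bad index."""
    if i & ~MASK:
        raise IndexError(f"index {i} out of range 0..{MASK}")
    return buf[i]

buf = list(range(100, 100 + SIZE))
print(masked_get(buf, 8))    # wraps around to slot 0
print(checked_get(buf, 3))   # in range, returned as-is
```

With `masked_get`, an off-by-one index of 8 quietly reads slot 0, which is exactly right for a ring buffer and exactly wrong for anything else. `checked_get` costs one extra comparison and turns the same mistake into an immediate error.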

> I have no idea of how many people are using Forth in production, as a hobby, as a personal tool or as a puzzle game.

My objection to your comment was basically that you seemed to be saying that anyone who's using Forth as something other than a puzzle game is doing it wrong.

Yes, doing things in a way that's more difficult than necessary is a worthwhile activity and makes you stronger. Football players get better at playing football by lifting weights. But they don't carry the weights with them onto the football field. That would make them worse at playing football.

> Specialization leads to simplification, generalization often leads to complications.

An important insight. You're right about the history of Forth, too.

liberalbias998 · on May 16, 2021

Messing around with FORTH rather than writing apps using it is also a mental trap, at least at the hobbyist level.

kragen · on May 16, 2021

Yeah, it's a common observation that more people have written their own Forth than have written useful programs in any Forth, whether their own or others'.

Part of the reason is that Forth is a low-level language like C, which makes it easier to implement, and harder to write programs in, than, say, Python or Rust. But, although there are considerably more C compilers out there than Python implementations, C doesn't suffer from this to nearly the same extent.

I'm no expert in Forth (though of course I've written a few quasi-Forths of my own) but I suspect that what's happening here is that, nowadays, the advantages of Forth over C or assembly just aren't great enough to counterbalance the disadvantages of using a minority language (like worse documentation, FFI, libraries, answers on Stack Overflow). Also, Forth compilers typically produce worse code than C compilers or assembly programmers, which rules it out for a lot of the applications where using a low-level language would be most tempting. That is, itself, a category that is smaller every year, in relative terms anyway.

The difficulty gets made worse by the conceited bullshit discourse about how Forth programmers are an elite distinguished by their superior intelligence, a story I sometimes hear even from people who don't write Forth.

zozbot234 · on May 15, 2021

I'm not sure that I agree. I think stack manipulation is the most elegant idiom available given the constraints that FORTH is working with. The values and variables features seem to introduce unwanted coupling, and complicate the language's underlying model.

Of course it goes without saying that most FORTH code should not contain low-level stack manipulation. But most useful stack operations can be abstracted quite readily, and this kind of abstraction is a meaningful part of FORTH idiomatic style.

kragen · on May 15, 2021

It's an interesting point of view.

When you say "The values and variables features...introduce unwanted coupling, and complicate the...model," do you mean implementing them in a Forth, or using them in your application code? I don't think I've ever seen either a Forth or a Forth program that didn't use variables, so I think the model is unavoidably going to get that complicated for any program more complicated than (-b±√(b²-4ac))/2a. (You could replace VARIABLEs (and CONSTANTs) with VALUEs, and having both is clearly unnecessary.)

As for the coupling, do you mean that if one subroutine writes a variable and another reads it, then that couples them? That's true, but I was talking about using a variable inside a single subroutine, as an alternative to stack manipulations. That's not the only good way to use variables in Forth, and it's not always better than stack manipulations, but I'm saying it's a reliable way to get over the "I'm too dumb for Forth" hump that most of us hit at some point.

> But most useful stack operations can be abstracted quite readily, and this kind of abstraction is a meaningful part of FORTH idiomatic style.

Yes, that is surely true.

Is there some Forth code you think of as exemplary that I could learn from?

kragen · on May 15, 2021

> 0 VALUE C : CAB TO C - C / ; or possibly VARIABLE C : CAB C ! - C @ / ;

This is wrong. It should be

    0 VALUE C  0 VALUE D
    : CAB - TO D TO C  D C / ;

or

    VARIABLE C  VARIABLE D
    : CAB - D ! C !  D @ C @ / ;

Incidentally, that's exactly how I implemented DUP, DROP, and SWAP in StoneKnifeForth:

    var X # 0  var Y # 0  ( temp vars )
    : dup  X !  X @  X @ ;  : pop X ! ;  : xchg X !  Y !  X @  Y @ ; ( SWAP/exch )

(The weird names are because only the first byte of the name is significant in SKF.)
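
For readers who don't read Forth, the same idea can be modeled in Python. This is only a sketch: the class name `TempVarStack` and its methods are hypothetical, the fields `x`, `y`, `c`, `d` stand in for the Forth variables, and `//` approximates Forth's integer `/` (the two agree for positive operands).

```python
class TempVarStack:
    """A data stack plus a few named temporaries, as in the Forth snippets."""

    def __init__(self, items=()):
        self.stack = list(items)
        self.x = self.y = 0      # temp vars, like X and Y
        self.c = self.d = 0      # like C and D in CAB

    def dup(self):               # X !  X @  X @
        self.x = self.stack.pop()
        self.stack += [self.x, self.x]

    def drop(self):              # X !
        self.x = self.stack.pop()

    def swap(self):              # X !  Y !  X @  Y @
        self.x = self.stack.pop()
        self.y = self.stack.pop()
        self.stack += [self.x, self.y]

    def cab(self):               # ( c a b -- q )   - D ! C !  D @ C @ /
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append(a - b)         # -
        self.d = self.stack.pop()        # D !
        self.c = self.stack.pop()        # C !
        self.stack.append(self.d // self.c)   # D @ C @ /

m = TempVarStack([2, 10, 4])
m.cab()
print(m.stack)   # (10 - 4) / 2
```

Each word stores its operands into variables and then fetches them back in the order it wants, so no stack juggling is needed inside the definition.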

threatofrain · on May 14, 2021

It seems that as interesting as Forth is, the community of such languages is very niche and small, even smaller than Lisp. Can anyone speak to recent changes in velocity or energy?

colllectorof · on May 14, 2021

https://factorcode.org/

One of the most impressive languages you've never heard about.

https://www.youtube.com/watch?v=f_0QlhYlS8g

haolez · on May 14, 2021

Forth has almost no syntax, which leads to every project becoming some kind of domain-specific language. This is not necessarily a problem, but it leaves Forth without a strong sense of identity as a programming language. This is my opinion, of course.

Someone · on May 14, 2021

Not only your opinion. It is fairly common among Forth enthusiasts to talk about “a Forth” rather than “Forth”, as two Forths can be hugely different yet still be recognized as Forths (Lisp is similar in that regard).

The ideal system is optimized for its problem domain. That means not only adding needed functionality, but also dropping unneeded functionality, and tweaking functionality to better fit the problem at hand. That’s not surprising for a system that shines when running with very little memory.

That’s different from, say, C++. Nobody says clang, gcc and Visual C++ are each “a C++”.

If they were philosophically similar to Forths and Lisps, those writing programs in them would be happy to change int promotion rules, operator precedence, vtable layout, etc, and different compilers would make different choices there.

liberalbias998 · on May 16, 2021

There are really only two complete, commercially supported FORTH implementations out there, and they implement the same standard. So there are far fewer FORTHs out there than people think.

macintux · on May 14, 2021

Presumably that’s compounded by the fact that the inventor, Chuck Moore, feels strongly that Forth is more of a concept to be reimplemented for each project, customized along with the hardware, than a programming language to be standardized.

throwaway17_17 · on May 14, 2021

Moore’s talks and quotes of his from various lecture notes on this point really inspired the underlying theory of my personal programming language. I wanted something that I could adapt to different use cases when I saw fit, but with a solid underlying theoretical base. I have been using it for my personal work for about 8 months now (compiling to JS, C or C++ depending on project) and I’m enjoying it greatly as well as being productive.

macintux · on May 14, 2021

Do you have a blog post about your language? If not, you should write one.

throwaway17_17 · on May 14, 2021

I don’t at this point. I have been playing around with the idea of a release or something similar. But people tend to expect a lot when a new language is shown. I don’t know if anyone would be interested in just reading about the language and then not even being able to use some light playground or anything.

I have been considering doing an Ask HN to see what people felt should be included on an initial ‘release’ or showing of a language. Maybe I should.

macintux · on May 15, 2021

Just a description of your philosophy and how it and the language have evolved would be interesting, along with some sample code (and sample output), regardless of whether you eventually release it.

zzo38computer · on May 15, 2021

Do you have any documentation? I might want to see, maybe.

bear8642 · on May 14, 2021

> Forth has almost no syntax

I feel it's more that Forth has no syntax: just space-separated words, each defined either in assembly or in terms of other Forth words.

thewakalix · on May 15, 2021

Isn't "space-separated words" a syntax, albeit a minimal one?

zzo38computer · on May 15, 2021

I think so, and some words read additional text from the input themselves instead of having it treated as words (comments, text strings, etc.). Still, it doesn't have a syntax as much as other programming languages do.
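
A rough Python sketch of that split between "space-separated words" and "words that read their own text". The function `interpret` and its tiny word set are hypothetical, and it only loosely imitates how Forth's `(` and `."` consume input up to a delimiter.

```python
def interpret(source):
    """Run a tiny Forth-like program; return (stack, printed_text)."""
    stack, out = [], []
    pos = 0

    def next_word():
        # The whole "syntax": skip blanks, then take characters up to a blank.
        nonlocal pos
        while pos < len(source) and source[pos].isspace():
            pos += 1
        start = pos
        while pos < len(source) and not source[pos].isspace():
            pos += 1
        word = source[start:pos]
        pos += 1                      # step over the blank that ended the word
        return word

    def parse(delim):
        # What parsing words do: take raw text up to a delimiter of their choice.
        nonlocal pos
        end = source.find(delim, pos)
        if end < 0:
            end = len(source)
        text = source[pos:end]
        pos = end + 1
        return text

    while True:
        word = next_word()
        if not word:
            break
        if word == "(":               # comment: discard text up to ")"
            parse(")")
        elif word == '."':            # string: print text up to the next quote
            out.append(parse('"'))
        elif word == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif word == ".":             # print the top of stack and a space
            out.append(f"{stack.pop()} ")
        else:
            stack.append(int(word))   # anything else is taken as a number
    return stack, "".join(out)

print(interpret('2 3 + ( add them ) ." sum: " .'))
```

Note that the interpreter itself knows nothing about comments or strings as grammar; the branches for `(` and `."` just call `parse` to pull more characters from the input.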

Syzygies · on May 14, 2021

The Wikipedia entry https://en.wikipedia.org/wiki/Forth_(programming_language) is terse but matches what I've read elsewhere over the years:

Forth is both easy to implement and yields very compact binaries, so it has been favored as the first environment implemented on new hardware, and as the controller for satellites. This has become less important as resource constraints ease.

Forth's progeny PostScript continues unabated. Humans tend not to code directly in PDF format, as that is grave-digging. The "specification" has many geological layers (the earliest are easiest to code in) and no verification or even machine-readable definition, so one is at the mercy of testing PDF files with multiple applications. All the problems associated with Markdown, only on steroids. But one can easily write code that generates valid PostScript, and have the results translated to PDF.

PaulHoule · on May 14, 2021

Who cares? If you need some minimal programming language with some special characteristics, you can write a FORTH in 2k lines of assembler or so.

The first time I used 32 bit ints on an 8 bit computer was in a FORTH that implemented 32 bit math, for instance.

protomyth · on May 14, 2021

Because of the original 128K Mac's limited memory, FORTH was one of the first actually useful languages you could host on it.

PaulHoule · on May 15, 2021

The 128K Mac had a lot more memory than most other computers in 1984, but it was not a lot of room to fit a GUI in.

Oddly, within a few years people had shoehorned GUIs into much smaller machines, such as

https://en.wikipedia.org/wiki/GEOS_(8-bit_operating_system)

protomyth · on May 22, 2021

> The 128K Mac had a lot more memory than most other computers in 1984

The very issue of Byte that shows the Macintosh had a lot of computer ads with PCs starting at 128K and expandable to 512K or 640K. The Macintosh was underspecified in memory compared to non-GUI PCs of its era.

ww520 · on May 14, 2021

There's something very practical about postfix, stack-based languages like Forth: compactness and simplicity. Just a couple of weeks ago I needed to build a small language with a terse, compact syntax. I settled on a Forth-like stack-based language and was able to build the parser, compiler, and code gen to GPU in about 80 lines of JS code in a short time.
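
To give a feel for why such a compiler can be so small, here is a hedged Python sketch (not the JS code described above, which isn't shown in the thread). The function `compile_postfix` is hypothetical, and it targets a parenthesized infix string rather than GPU code. The trick is to keep a stack of expression strings at compile time instead of a stack of values at run time.

```python
def compile_postfix(source):
    """Translate a postfix expression into a parenthesized infix string."""
    binary_ops = {"+", "-", "*", "/"}
    stack = []                       # holds expression strings, not values
    for token in source.split():     # "parsing" is just splitting on blanks
        if token in binary_ops:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {token} {right})")
        else:
            stack.append(token)      # literal or variable name, emitted as-is
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value")
    return stack[0]

print(compile_postfix("x 2 * 1 +"))
```

The emitted string could then be pasted into a shader or any other infix target language; that last step is an assumption about how such a tool might be used, not something stated in the comment.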

openfuture · on May 14, 2021

I don't think Forth has a smaller community than Lisp, just a less web-friendly one. Lots of embedded people.

Lisp is top down, Forth is bottom up.

catketch · on May 15, 2021

What would you call HP’s RPL then :)

openfuture · on May 15, 2021

A lattice? (But touché :)

floatingatoll · on May 15, 2021

Apparently I read the book that these images are taken from, because I recognize having read these stack manipulation comics as a kid! Is this site from a book?

wiml · on May 15, 2021

I think they're from Leo Brodie's _Starting Forth_, which I also read as a kid.

cyberdelica · on May 16, 2021

You're right. The linked page is just Chapter 2 of Starting Forth.

anthk · on May 14, 2021

Sadly for CollapseOS' users, this Forth implementation doesn't have ".S".

sigjuice · on May 14, 2021

I came across this implementation of .S a couple of days ago (Starting-FORTH.pdf, page 50).

  : .S CR 'S S0 @ 2- DO I @ . -2 +LOOP ;

I'm not sure if this could be adapted to work on CollapseOS. For starters, I can't quite figure out what 'S is. S0 is the (non-standard) stack pointer according to https://www.forth.com/starting-forth/9-forth-execution/ . The rest of it appears to be standard Forth, but I'm not sure how much of it is actually implemented.
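
For what it's worth, here is how that one-liner reads under my understanding of Starting Forth, which should be treated as an assumption: `'S` pushes the address of the current top-of-stack cell, the variable `S0` holds the address of the stack's base, cells are 2 bytes, and the stack grows downward. The Python class `TinyStackMemory` and the base address 0x100 are hypothetical and exist only to model that memory layout.

```python
CELL = 2       # 16-bit cells, matching the 2- and -2 in the definition
S0 = 0x100     # assumed base address of the data stack

class TinyStackMemory:
    def __init__(self):
        self.mem = bytearray(S0)
        self.sp = S0                  # empty stack: pointer sits at the base

    def push(self, n):
        self.sp -= CELL               # stack grows toward lower addresses
        self.mem[self.sp:self.sp + CELL] = (n & 0xFFFF).to_bytes(CELL, "little")

    def fetch(self, addr):            # @
        return int.from_bytes(self.mem[addr:addr + CELL], "little", signed=True)

    def dot_s(self):
        limit = self.sp               # 'S      : address of the top item
        i = S0 - CELL                 # S0 @ 2- : address of the bottom item
        shown = []
        while i >= limit:             # DO ... -2 +LOOP, bottom toward top
            shown.append(self.fetch(i))   # I @ .
            i -= CELL
        return shown

m = TinyStackMemory()
for n in (1, 2, -5):
    m.push(n)
print(m.dot_s())    # bottom of stack first, nothing popped
```

The `while` guard is a simplification: it prints nothing for an empty stack, whereas a real `DO` loop in an older Forth may run its body once regardless. Porting the word to another system comes down to finding its equivalents of "address of the top item" and "address of the stack base".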

anthk · on May 14, 2021

Heh, thanks, I was just on page 14 reviewing it, I forgot about that :D.

Also, 4th implements that with "anscore.4th" I think.

EDIT: 'S isn't implemented in CollapseOS either.

nils-m-holm · on May 15, 2021

The dragon cartoon (search for SWAP on the page) is one of the cutest in the history of CS books! :)