Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Is there a codebase you think of as exemplary that I could learn from?

My "one codebase I keep rewriting" is [0]; I don't consider the artifact as useful as the process, though. Particularly,

> in Forth I have the same kinds of memory problems I have in C, and also parameter-passing bugs and type errors.

I find the memory problems are helped a lot by incremental testing in the REPL, as well as designing in such a way that it's obvious what a word does -- if a word has a surprising side-effect or requirement, that's a good sign some design rethinking is in order. Thinking Forth [1] has some examples of the kinds of redesigns that can be helpful, though as a warning it has strong "grumpy old man" vibes. I find these hilarious, but some don't :) (Some of its design suggestions are also somewhat "off" in my view; not a huge fan of DOER/MAKE, personally.)

I definitely don't often experience parameter-passing bugs (assuming you mean "this takes 3 args, thought it took 2") nor type errors, and I'm actually a bit surprised to hear you do. I suppose I rather religiously stack comment, and write just about every word to have a "static" stack effect. I've also got a shell alias:

    ffd() { rg -FiIN ": $1 (" "${2:-$HOME/Projects/stahl/kernel/src}" };
so I can just quickly do e.g.

    $ ffd nip
    : NIP ( a b -- b ) SWAP DROP ;
    $ ffd allocate-pages
      : ALLOCATE-PAGES ( num-pages -- 0 0 | addr -1 )
I ought to set keywordprg in Vim to that too, then I could get it in-editor. Of course, this doesn't work for builtins and control constructs -- perhaps someday I'll have the compiler put the pointer in SOURCE in the word header, and write a language server that looks through them.

> [...] JS, Rust, or OCaml; have you tried them?

Yeah, more the latter two than JS. Maintaining memory safety is easier in a memory-safe language, but I'm not certain overall correctness and design simplicity is. Certainly, when I'm implementing things in OCaml, I feel like I have to make design compromises much more than in Forth (where Forth gives me more room to redesign instead of getting forced into an unideal compromise). Rust's macros give me a bunch more leeway, but it's still occasionally awkward to express some things where something really is just easier to express at runtime, but it's impossible or UB/unsound to do so.

Common Lisp is perhaps a better competitor in this regard, since like Forth it supports both powerful compile-time and run-time metaprogramming. Maybe MetaOCaml too, though I've yet to try it.

> However, each trust domain in QEMU [...]

Yeah, of course QEMU isn't a silver bullet in this regard; if your application needs to be able to spawn large numbers of sandboxes quickly, especially. If you wanted to push virtualization as far as it could go, I'd consider the rust-vmm/Firecracker work instead -- they have stock Linux booting in 250ms/core, and a (perhaps custom) ringbuffer shared between the two processes would of course be vastly lower overhead than proper emulation of hardware.

> [...] Lucet [...]

I actually considered dropping WASM in initially, but I removed it after remembering my experience with trying to do JITtish things in it -- it's not currently practical (as of like a year ago) to do a DTC Forth, or a Forth that inlines, in WASM, since words are just too small. WASM being an "extreme" Harvard architecture is also fairly annoying. I suppose an ITC Forth ought to work, but without an inliner, performance seems like it'd be fairly bad.

I believe there's an API in spec-development currently to add better primitives for JIT that's sized closer to a basic block, so maybe that would change the situation.

> [...] object-capability security [...]

If you're going that... high-level? abstract? why not use Factor? I think, as you describe above, a Smalltalk-inspired approach might be a better foundation than most Forths' "assembly-inspired" ones. Though, as I understand it (too young to have lived through it!), Smalltalk had awful performance until Self came along and invented modern JIT, by which point you've completely lost the virtue of simplicity of implementation (at which point I'd be biased to say, you should use a Lisp or Haskell variant instead!)

> [...] 64-bit pointers [...]

Yeah, I considered a variation of that (most memory bugs are off-by-n for a small n, so guard pages are almost as good), but having a minimum allocation size of 4k seemed too wasteful.

> [...] replaced pointers [...]

Oh, I think TCC [2] still implements that! My understanding was that doing that relied on the sorts of C UB rules that some people hate (and that I quite like that Forth lacks!) though. I suppose if sizeof(uintptr_t) == 2*sizeof(ptrdiff_t), that wouldn't need to use the rules, but I think that'd suck pretty bad in Forth.

---

I think there are (at least!) two axes being compared here:

1. How easy is it to write correct Forth (I include security in correctness -- arbitrary code execution is typically pretty incorrect!)

2. How easy is it to execute attacker-provided Forth safely

I'd say Forth is strongly competitive on the first axis, and isn't at all on the second. I'd certainly believe that Factor or a Smalltalk-like Forth could be competitive on the latter, but if you include bugs in the compiler and runtime as correctness violations, I suspect it'd suffer a lot on the first axis.

I've thought a bit about getting stronger assurances of correctness for my Forth programs without complexifying the implementation of the system (which would make incorrectness much more likely). Currently I'm thinking about doing proofs in separation logic, with symbolic execution acting to hopefully provide a high degree of proof automation. An important characteristic is doing these proofs on the program starting at some physical machine state at a fixed time (e.g. just before some main loop is entered), rather than on the abstract machine that the runtime and compiler hopefully implement correctly. I think the system is simple enough that no mismatch between the two would occur, but I suppose I'll see whenever I've finished learning enough to start implementing this.

---

[0]: https://git.sr.ht/~remexre/stahl/tree/main/item/kernel/src [1]: https://managedway.dl.sourceforge.net/project/thinking-forth... [2]: https://bellard.org/tcc/



Wow, this is wonderful stuff! I'll definitely look more at stahl :)

By "parameter-passing bugs" I mean things like "I forgot to put a DROP after this loop that was keeping a variable on the stack" or "this takes 3 args, and I knew that, but I accidentally passed it 2, and then it took me an annoyingly long time to figure it out".

I don't really think of Factor (or PostScript) as being terribly similar to Forth despite their concatenative syntaxes, although maybe that just means I'm emphasizing a different aspect of Forth than you are. I feel like Factor and PostScript are more like Lisp or (especially) Smalltalk than they are like Forth, just with less syntax.

The thing that appeals to me most strongly about Forth is the prospect of a steam-catapult-like leap from machine code, through a few hundred lines of code, to a fairly usable low-level programming language environment: concise code with nested expressions, recursive functions, named variables, ad-hoc compile-time metaprogramming, function pointers (DEFER or DOER/MAKE), a form of closures (CREATE DOES>), and an interactive REPL that doesn't need to be restarted to see code changes (though maybe not a single-stepping debugger). That is, it's the simplicity of the implementation that appeals to me, with the transparency and malleability that implies.

But I don't really prefer to type b negate discrim sqrt + 2 a * / instead of -b + sqrt(discrim()) / (2*a) or (/ (+ (- b) (sqrt (discrim))) (* 2 a)). I don't really like having to type "." or "?" to see the result of an expression I'm evaluating interactively. I don't think the RPN syntax is actually better, mostly, though it has some real advantages as a user interface. I just think the RPN syntax is an acceptable tradeoff for the simpler implementation, more powerful metaprogramming and refactorability, and better performance that the Forth design provides, and it's easy enough to use it to implement something more reasonable.

I'm still more intimidated than I should be by formal methods. I've found DRMacIver's QuickCheck-like Hypothesis, which randomly searches for counterexamples to your assertions of properties you would like your program to have and then reduces them to minimal test cases, to be astoundingly effective at improving my testing, and I've been thinking something like his "Minithesis" could be a significant boost to getting a new programming environment up and running. But that's about as far as I've gotten.


> [...] parameter-passing bugs [...]

Ah, yeah, incremental development is the panacea here.

> [...] Factor (or PostScript) [...]

Yeah, agreed; if you're needing to be able to safely execute untrusted code though, I'd rather have a language with trustworthy properties (i.e. one restricting one from making arbitrary syscalls, arbitrary pointer reads+writes, etc.) rather than applying sandboxing to a language that allows the program to do all these unsafe operations.

> [...] leap from machine code [...]

Agreed :)

> [...] prefer to type [...]

Yeah, I haven't found a syntax for reading math that's better than infix; I don't actually think the postfix is much better than prefix, but maybe that's just me.

> [...] RPN syntax [...]

Oh, I actually think it does; it emphasizes function composition directly, and unifies it with sequencing statements in a single syntax!

> [...] formal methods [...]

Yeah, especially given the diversity of techniques under the formal methods banner, it's hard to get a good look at "the whole landscape." I think symbolic execution is getting to the point where it's workable for testing with a similar UX to property testing, just slower but with stronger guarantees (since all possible executions are taken, not just the ones that the RNG finds). Haven't actually built it yet though, so maybe Forth's flexibility will make it too hard, who knows.

If you wanna chat more about these things, I and at least one other person looking into formal methods + Forth hang out on #forth on Freenode.


Well, #forth on Libera.chat now :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: