Hacker News
Bootstrapping a Forth in 40 lines of Lua code (twu.net)
122 points by anothername12 11 months ago | 43 comments



I 'learned' Forth decades ago (on a Sinclair ZX81, so essentially a quite horrible experience), and tripped over it again recently... ending up ploughing through the books "Thinking Forth" and (currently) "Starting Forth", both of which are extremely interesting and so very, very different to what we use these days. It very much fits the saying "A language that doesn't change how you think about programming isn't worth knowing". I realise now just how shallow an understanding of Forth I had as a teen, back in the day ;)

But I think the win here - these days - for Forth is in things like "running on small embedded devices"... I'm looking with interest at some of the activity around, say, Arduinos, and am thinking that Forth is a rather nice fit for that sort of system. Have bought a kit, and am looking forward to some non-work quality time to play further :)

(The books I mention can be found online very easily, but you'll forgive them for being written back in a time when systems were _simpler_ to grok ;) )


> I think the win here - these days - for Forth is in things like "running on small embedded devices"

In principle that's not a bad idea, it's just that you're at least 40 years late to the party. FORTH was, and probably still is, used in embedded systems, and has been for most of its existence. Often simply with a minimal text console over UART, that being your "way in". Once you've got that going, you can start building up your system from the inside.


Being late to the party is the story of my life, but fwiw, I was aware that FORTH was used a lot in this kind of use-case ;)


Artic Forth? I had a lot of fun with that. https://worldofspectrum.net/item/0031896/


I think it was, actually, yes. Thanks for the reminder!


Maybe some stuff is hard to perceive as a teen, though. I used Turbo Pascal, had a book, but I couldn't foresee the value of some types or ideas. Forth can be even more alien in a way.


Is there any Forth that properly handles Unicode strings? By which I mean that it has sensible string length, substring and similar functions, or some standard library that has them - basically all the usual stuff one needs. I tried to use GForth a while ago, but how to use strings properly was a mystery. I don't want to have to code all string functionality myself. Well, I neither know how, nor want to. I guess I would have to study other languages' implementations and some standards documents for that, when actually I am working on another project. A gigantic rabbit hole that would be. Afaik there is more planned for the 1.0 release of GForth.

Reading line after line from a file was also difficult. I could not figure out how to clear the storage used for the previous line properly. When I read a shorter line after a longer line, the storage would still contain the tail of the longer line after the content of the shorter line. Perhaps there is a Forth that makes this easy?

But it also needs to support that library, FFL, so that I have useful data types, and don't need to roll my own arrays or something ...

Maybe my design philosophy is fundamentally different from what is needed for Forth?


It'd be interesting to see how compact the binary form is, and what external dependencies it has.

Forth's selling points are mostly how compact a minimal environment can be, and how easy it is to extend that environment to do more useful things. The 'shell' is the compiler.
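To make that concrete, here's a minimal sketch in Lua (in the spirit of the linked article, though not its actual code) of how little machinery a Forth-style environment needs: a data stack plus a dictionary of named words, where "extending the environment" is just adding another entry to the table.

```lua
-- Data stack and the usual push/pop helpers.
local stack = {}
local function push(v) stack[#stack + 1] = v end
local function pop()
  local v = stack[#stack]
  stack[#stack] = nil
  return v
end

-- The dictionary: each word is just a Lua function.
local dict = {
  ["+"]   = function() push(pop() + pop()) end,
  ["*"]   = function() push(pop() * pop()) end,
  ["dup"] = function() local v = pop(); push(v); push(v) end,
  ["."]   = function() print(pop()) end,
}

-- The outer interpreter: numbers go on the stack,
-- anything else is looked up and executed.
local function run(line)
  for tok in line:gmatch("%S+") do
    local n = tonumber(tok)
    if n then push(n) else dict[tok]() end
  end
end

run("3 dup * .")  -- squares 3 and prints 9
```

Real Forths add a compile mode on top of this so that `: square dup * ;` defines new words from within the language itself, but the core loop really is about this small.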


Trying to get an overview of which languages allow for tiny implementations of their core features. Like, max. ~10KB binary size (but preferably a few KB). This is assuming basic IO is trivial, like reading/writing some memory location to get/print a character on a text console.

Forth is an obvious one. Basic is another. There are some tiny Lisps out there. And Tcl.

Any other languages that go that small?


Binary Lambda Calculus is absolutely tiny [1], with a 29 byte self-interpreter (43 byte for the byte oriented version), and absolutely extensible, since it can transform into any other language.

[1] https://www.ioccc.org/2012/tromp/hint.html


There are a lot of languages that are actually pretty small if you just boil them down to their primitives. Prolog and most other rule engines, especially CLIPS, jump to mind. ANSI C and SML. SML in particular is cute if you use your tiny Prolog to implement the type system ;-)


> SML in particular is cute if you use your tiny Prolog to implement the type system.

That sounds pretty cool; do you know of an implementation of SML that fits into 10 KB of memory because it uses tiny Prolog?


I do not. As far as I'm aware, SML is much like C in that despite being a relatively simple language, there aren't many toy implementations floating around.


One thing you quickly run into with this exercise is needing to establish the boundaries around "programming language" which can be kind of fraught.

For example, miniKanren is tiny and is used as a DSL, but is capable of expressing any computation - so does that count? Some type systems are Turing-complete, and a naive implementation could be tiny. Is Conway's Game of Life a programming language? You could probably strip a bunch of useful stuff out of SQLite's SQL parser and get it below 10K. Etc.


> One thing you quickly run into with this exercise is needing to establish the boundaries around "programming language" which can be kind of fraught.

Well I was referring to what one could call a souped-up macro assembler: direct access to a machine's hardware, but with some (but important!) conveniences of higher-level languages.

E.g. many Forth systems provide a read-eval-print loop (REPL), allowing users to redefine Forth words on the fly, poke some memory-mapped IO, etc. But still do so in a relatively comfortable manner, using Forth words rather than (only) editing assembly code.

The mapping (the "language", if you will) between Forth words, and assembly code that implements them, is fairly straightforward 1:1. And thus, allows for small implementations. Like, fit comfortably in a 16-bit address space.

Many esoteric languages do not fit this description, since they often discard the convenience factor.

C is on the fence here. It's got many of these aspects, but small-ish? Hmm.. And stripped down to a subset, is it still C?


You have to draw a line somewhere, and it's necessarily somewhat arbitrary.

Mine is the size of the runtime, usually. So the Forth is the size of Lua + the 40-line script, miniKanren is the size of the .scm file + the Scheme implementation, and so on. This ignores the host environment, which I think is fair: generally these will be enormously larger than they would actually have to be to support the language.

I do award subjective bonus points to languages which actually have a freestanding runtime, that is, I could actually compile them and run them on a bare chip. Even then, I don't count the microcode which implements the instruction set of that chip, or the Verilog/VHDL which is needed to tape it out.

Usually! ColorForth on GreenArrays gets even more bonus points, because not only is the language exceedingly compact, and freestanding, but the VLSI tool used to create the chip is also tiny to an almost unbelievable degree. So it wins some kind of prize in the special "total system complexity" category.


> I do award subjective bonus points to languages which actually have a freestanding runtime, that is, I could actually compile them and run them on a bare chip.

That's what I was getting at with "trivial IO": when you ignore what's needed to get data in/out, and optional parts of a standard library (if included), how big is the remainder, the implementation of the language constructs?

(Potentially) standalone, self-hosting & small is indeed the bonus points I'm looking for. A tiny Basic or Forth running bare metal on a uC could be it. A 'tiny' language implemented using a multi-MB VM is not.


It seems we're basically in agreement then. I do want to point out that the weight of the Lua 5.4 standalone binary on my machine is 70KiB.

Yes, you read that correctly. Lua is truly amazing in terms of the amount of power you get for how little there really is. One of the reasons it's faster than most other VMs with no JIT functionality is because the entire VM fits in a typical L1 cache. My laptop's L1 is 192KiB, meaning it could fit 122KiB of program state in with the VM without breaking a sweat.


Apparently, the B programming language from 1969 fit into 8 KB of memory: https://www.bell-labs.com/usr/dmr/www/chist.html

After a rapidly scuttled attempt at Fortran, he created instead a language of his own, which he called B. B can be thought of as C without types; more accurately, it is BCPL squeezed into 8K bytes of memory and filtered through Thompson's brain.


I need to do something with either of these languages. I've learned Forth before but never did anything with it. I should learn Lua as well.


The two Forths that I've had my eye on:

https://www.gnu.org/software/gforth/

And the inventor of Forth has a very power-efficient multi-computer chip that can be programmed with ColorForth or arrayForth (he claims it's the most power-efficient):

https://www.greenarraychips.com/home/products/index.php

https://www.greenarraychips.com/home/documents/greg/cf-intro...

Here is a video about the chip, and some cool Forth demos with it:

https://youtu.be/0PclgBd6_Zs?si=nThy9TnGl9rJX5_u


If you're looking for something to do in Lua, I suggest making a game with the Love2D framework [love2d.org]!


I've had my eye on it for sure!


I've had a very enjoyable experience with Lua, and kaguya for binding it to C++. Match made in heaven, for the most part (:

PUC Lua also has a very solid coherent C implementation; it's basically a masterclass in making an effective, portable, but also necessarily complex C library.


Lua is definitely a pedagogic language and project all the way to its roots. I often feel like I’m taking a class on compilers when using it. :)


Tradition has it that the thing to do with Forth is implement your own toy and then call it a day. Usually not having learned how to actually program in Forth. Yes I'm bitter.


Forth is super interesting, and worth learning, but it seems like an evolutionary dead end. Even Chuck Moore doesn't do much Forth these days, because he doesn't like having to keep track of a whole bunch of details that keep getting changed between processor families.


Chuck Moore designs processors, seems to me he's got those details handled. Last thing I heard he worked on was etherForth, but the guy's 85 years old, so he might just have quietly retired by now.

Forth was never going to be the next Java or Rust, but in its own little niche, it's fantastic. ANS Forth isn't getting any new features anytime soon, but meanwhile Forth spawned a whole family of interesting languages like Joy, Factor, and ColorForth. Besides, evolution is overrated: tardigrades are still doing fine :)


> evolution is overrated: tardigrades are still doing fine :)

The evolution of tardigrades is super fascinating: https://eartharchives.org/articles/tardigrade-genome-reveals...


Chuck gave up doing Forth in assembly for modern processors a number of years ago, but developed the GreenArrays chips, which are apparently like 144 computers on a chip. He programs those in some version of ColorForth and has done several talks on it. I have no clue if there is any value, because I can't understand its use case. It is really cool, though, that when he builds an application, he typically understands everything from the machine level all the way through the software, and he designs both towards some kind of optimal overall system. He's got to be one of the only people alive who can still claim that, I'd guess.


> 144 computers on a chip

Ugh, "computers". I'm not even a stickler for terminology, but the whole "144 computers" thing somehow rubs me the wrong way, like it's some Best Buy salesdroid telling me that a quad-core means having four computers in one box.


It's not 144 CPU cores. It's more like 144 Forth computers, each with its own memory and four ports to adjacent units, making up a network mesh where each unit can also make use of both stacks and the RAM of its neighbours.

So it's not like a computer with 144 1-core CPUs sharing RAM and ports, or a CPU with 144 cores; it's something weirder and more reminiscent of a network of small computers.


Given the small amounts of compute and memory you get on each node (it’s certainly no Adapteva Epiphany or even Cell), it seems to me like it’s more of a replacement for a (low-speed) FPGA than a CPU or even a micro.

And that’s not a bad pitch! Modern silicon certainly doesn’t like to be bent into a (single-layer) FPGA shape—the result is comparatively slow, expensive, and power-hungry—but accommodates specialized functional blocks very well, so a matrix of baby processors with accompanying memory sounds genuinely like a good idea for an FPGA-like application.

Unfortunately, the GA144 is the only game in town here, and at $500 per dev kit they sure aren’t interested in hobbyists. So I’m left to admire it from a distance. (And $20 per chip is not a bad price considering the competition—as I said, FPGAs are expensive—but it still falls into the expensive-chip bucket. Not a lot of places are going to build a design around a unique expensive chip from an obscure source.)


Chuck Moore has been pretty clear that he didn't come up with this as a replacement for x86 or ARM CPUs. The way he talks about it is closer to what you hear among permacomputing folks: it's a platform for relatively fast general computation in low-energy settings.

I've never implemented a Gaussian blur on bitmaps with an FPGA; maybe it is simpler somehow than this example: https://www.youtube.com/watch?v=iwM0qfQqmdE


Yep. Thanks for explaining it far better than I could. I recall Chuck referring to applications like "intelligent dust" or something like that. He's clearly thinking very long-term (almost scifi) kinds of applications.


An understandable reaction, and one I can claim to feel myself in similar salesdroid situations, however they have an explanatory sheet which reiterates the claim: https://www.greenarraychips.com/home/documents/greg/PB002-10...

They honestly do sound quite interesting, and if I could fork myself off a few times to look at fun & interesting things, this GreenArray stuff would be in the list ;)


As eval() was removed from Lua, you must reimplement it:

  function eval(s)
    return assert(load(s))()
  end


For those who don't know Lua very well (like me), here is a link to the reference manual.

https://www.lua.org/manual/5.4/manual.html#pdf-load

A simple example that I tested:

   local chunk = "return 2 + 2"
   local func = load(chunk)
   print(func())  -- Output: 4


loadstring was replaced with load, and dostring was removed, but it's basically just assert(loadstring(code))(...)

load is the replacement.

i assume the grandparent meant dostring and not eval?
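For completeness, a version-agnostic shim along those lines (a sketch; the name dostring here just mirrors the old Lua word, nothing standard in 5.x):

```lua
-- loadstring exists in Lua 5.1; in 5.2+ it's gone, but load
-- accepts a string chunk directly, so fall back to that.
local loadstring = loadstring or load

-- Compile and immediately run a string of Lua code, passing any
-- extra arguments through to the chunk; raises on a syntax error.
local function dostring(code, ...)
  return assert(loadstring(code))(...)
end

print(dostring("return 2 + 2"))  -- 4
```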


While a great exercise and kudos to the author, I think languages like Forth and Lisp are much cooler to bootstrap from Assembly.

Anyway, thanks for sharing.


Assembly? Hah!

It only counts if you use a magnetic needle and a very steady hand.


From that point, first we need to mold some gears.


Lua and Forth is like pb&j.



