djwatson24 · 2025-11-14T16:52:14

Absolutely. Things that took hours or days to debug before take mere minutes once I have an rr recording.

djwatson24 · 2025-09-30T13:40:29

Tail calling (with musttail) + preserve_none is definitely the future of interpreters. It gets you ~95% of the performance of writing the VM in assembly, while keeping it high level. Unfortunately only clang and LLVM support it so far, but hopefully some other LLVM-backend languages get it soon!
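To make the shape concrete, here is a minimal sketch of a tail-calling stack VM. The opcode set, the `Insn` layout, and the `MUSTTAIL` macro are all made up for illustration. It only uses `[[clang::musttail]]` when the compiler reports support and falls back to plain calls otherwise; `preserve_none` is left out to keep the sketch portable, but it would go on every handler and on the `Handler` pointer type.

```cpp
#include <cassert>
#include <cstdint>

// Use the attribute only where the compiler reports support. Without it the
// code still compiles, but the tail call is no longer guaranteed.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct Insn;
// Every handler shares one signature, which musttail requires.
using Handler = std::int64_t (*)(const Insn* pc, std::int64_t* sp);

struct Insn {
    Handler handler;       // direct threading: each instruction carries its handler
    std::int64_t operand;  // immediate value, used by op_push only
};

static std::int64_t op_push(const Insn* pc, std::int64_t* sp) {
    *sp++ = pc->operand;
    ++pc;
    MUSTTAIL return pc->handler(pc, sp);  // jump to the next handler
}

static std::int64_t op_add(const Insn* pc, std::int64_t* sp) {
    sp[-2] += sp[-1];
    --sp;
    ++pc;
    MUSTTAIL return pc->handler(pc, sp);
}

static std::int64_t op_mul(const Insn* pc, std::int64_t* sp) {
    sp[-2] *= sp[-1];
    --sp;
    ++pc;
    MUSTTAIL return pc->handler(pc, sp);
}

static std::int64_t op_halt(const Insn*, std::int64_t* sp) {
    return sp[-1];  // result is the top of the stack
}

// Ordinary (non-tail) entry point that owns the operand stack.
std::int64_t run(const Insn* program) {
    assert(program != nullptr);
    std::int64_t stack[64];
    return program->handler(program, stack);
}

// Hypothetical sample program computing (2 + 3) * 4.
std::int64_t demo() {
    const Insn program[] = {
        {op_push, 2}, {op_push, 3}, {op_add, 0},
        {op_push, 4}, {op_mul, 0}, {op_halt, 0},
    };
    return run(program);
}
```

Each handler ends by jumping straight to the next one, so `pc` and `sp` can stay in registers and there is no central dispatch loop. A real VM would also decode operands from bytecode rather than storing handler pointers inline.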

djwatson24 · 2025-04-07T16:35:33

Marc Feeley wrote “Using Closures for Code Generation” way back in 1987. Everything old is new again!

It’s quite nice and only a small bit more code than an AST interpreter - but not as fast as a LuaJIT-style tail calling bytecode VM, nor a baseline JIT like copy and patch. It works wonders in languages that support closures though.

https://www-labs.iro.umontreal.ca/~feeley/papers/FeeleyLapal...
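For anyone who hasn't seen the technique, here is a rough sketch of the idea, not the paper's actual code: walk the AST once and turn each node into a closure, so running the program is just calling closures and the tree is never inspected again. The `Expr` type, the helper constructors, and the use of `std::function` are my own assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

using Env = std::vector<long>;                 // variable slots, indexed by position
using Code = std::function<long(const Env&)>;  // a compiled expression

struct Expr {
    enum Kind { Const, Var, Add, Mul } kind;
    long value;                            // Const: the literal; Var: the slot index
    std::shared_ptr<const Expr> lhs, rhs;  // operands of Add and Mul
};
using ExprPtr = std::shared_ptr<const Expr>;

// Hypothetical helpers for building a small AST.
ExprPtr constant(long v) { return std::make_shared<Expr>(Expr{Expr::Const, v, nullptr, nullptr}); }
ExprPtr var(long slot) { return std::make_shared<Expr>(Expr{Expr::Var, slot, nullptr, nullptr}); }
ExprPtr add(ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{Expr::Add, 0, l, r}); }
ExprPtr mul(ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{Expr::Mul, 0, l, r}); }

// The switch on the node kind happens once, here. The returned closures only
// do the work for their own node and call their children's closures.
Code compile(const ExprPtr& e) {
    switch (e->kind) {
    case Expr::Const: {
        long v = e->value;
        return [v](const Env&) { return v; };
    }
    case Expr::Var: {
        std::size_t slot = static_cast<std::size_t>(e->value);
        return [slot](const Env& env) { return env.at(slot); };
    }
    case Expr::Add: {
        Code l = compile(e->lhs), r = compile(e->rhs);
        return [l, r](const Env& env) { return l(env) + r(env); };
    }
    case Expr::Mul: {
        Code l = compile(e->lhs), r = compile(e->rhs);
        return [l, r](const Env& env) { return l(env) * r(env); };
    }
    }
    assert(false && "unknown expression kind");
    return Code();
}
```

Usage would look like `Code f = compile(add(mul(var(0), var(0)), constant(3)));` followed by `f(Env{5})`, which evaluates x*x + 3 with x in slot 0. In C++ the `std::function` indirection costs more than closures do in a language with native support for them, which is part of why the approach shines there.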

djwatson24 · on Nov 29, 2024

Can you go into more detail on the 'wacky register allocation tricks' or instruction selection needed to support nun-tagging? Or are there pointers to code somewhere? It would be nice to compare some of them to the paper.

djwatson24 · on May 12, 2023

Deegen is my research meta-compiler to make high-performance VMs easier to write. Deegen takes in a semantic description of the VM bytecodes in C++, and uses it as the single source of truth to automatically generate a high-performance VM at build time.

djwatson24 · on Oct 25, 2022

This is great, thanks! FYI, I think you have the numbers for the years reversed.

djwatson24 · on Oct 18, 2022

> and that relies on having sufficient test and fuzz coverage

At the FAANG I worked at, a small portion of servers ran the sanitizers in prod, so you’re not nearly as reliant on test coverage for catching rare issues.

djwatson24 · on Jan 8, 2022

Binary trees is a classic GC benchmark - ideally a GC'd language should be able to do better than a malloc/free implementation in C (tree.c), since it can release the whole tree in bulk (like what ptree.c does).

Also note that tree.c speeds up substantially if you link with a better malloc like jemalloc or tcmalloc.
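To illustrate what "release the whole tree in bulk" means, here is a small sketch of a pool-backed tree. It is not a reproduction of ptree.c; the `NodePool` class and its fixed capacity are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    Node* left;
    Node* right;
};

// Fixed-capacity pool: nodes are handed out from one contiguous buffer and
// are all released together, instead of one free() per node.
class NodePool {
public:
    explicit NodePool(std::size_t capacity) { nodes_.reserve(capacity); }

    Node* make(Node* l, Node* r) {
        // Growing the vector would move existing nodes and invalidate
        // pointers, so this sketch refuses to exceed the reserved capacity.
        assert(nodes_.size() < nodes_.capacity());
        nodes_.push_back(Node{l, r});
        return &nodes_.back();
    }

    // Bulk release: no per-node work, and the buffer stays available for reuse.
    void release_all() { nodes_.clear(); }

    std::size_t live() const { return nodes_.size(); }

private:
    std::vector<Node> nodes_;
};

// Builds a complete binary tree with 2^(depth + 1) - 1 nodes.
Node* build(NodePool& pool, int depth) {
    if (depth == 0) return pool.make(nullptr, nullptr);
    Node* l = build(pool, depth - 1);
    Node* r = build(pool, depth - 1);
    return pool.make(l, r);
}

std::size_t count(const Node* n) {
    return n ? 1 + count(n->left) + count(n->right) : 0;
}
```

With a pool like this, tearing down a tree is a single `release_all()` call rather than a walk that frees every node, which is roughly the advantage a moving or region-based GC gets for free on this benchmark.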

djwatson24 · on Nov 26, 2021

I agree, the optimization looks backwards to me - in most of the microcontrollers I work on, the flash program space can easily be multiple megabytes, but RAM is usually the limiting factor, at around 128-512 KB.

feeley · on Nov 26, 2021

The introduction in the paper explains the motivation for Ribbit:

The use case which has motivated our work is code mobility where an executable program can be embedded in a document, email, or website. In that use case the size of the program must be small to minimize the transmission or loading time or to satisfy space constraints, such as the size of a disk boot sector, the URL length limit and the UDP packet size. On the other hand, the space used while the program is executing is of secondary importance.

So the main target application is not microcontrollers, even if it can obviously be used for some microcontrollers (for example a $1-2 ESP32-C3 has 400 KB of RAM).

One of the possible applications of Ribbit is to implement the RVM (Ribbit VM) in the Excel spreadsheet formula language (which will soon be Turing-complete with the addition of the LAMBDA form) to be able to program spreadsheets in Scheme. The execution environment has lots and lots of RAM, but you want the .xls file itself (which contains the RVM) to be as small as possible so it can easily be sent in emails, etc.

pumanoir · on Nov 26, 2021

Can Ribbit w/ the max lib run everything from SICP? It'd be really cool to have a self-contained (HTML or other text) file of the SICP book.

feeley · on Nov 26, 2021

Ribbit's max library contains most of the R4RS predefined procedures (even call/cc). The main things missing are variadic procedures (unfortunately this includes the "list" procedure so you need to replace (list 1 2 3) by (cons 1 (cons 2 (cons 3 '())))) and also all the file opening operations. There is read-char, write-char, etc but they only do console I/O. These things could be added at the price of a larger footprint. One simple way to do this is to have the RVM run the code of a meta-circular Scheme evaluator written in Scheme.

If you are interested in building a self-contained HTML document that embeds a full-featured Scheme implementation, you should look into Gambit. Check out https://try.gambitscheme.org

pumanoir · on Nov 28, 2021

Thanks. I will check it out.

djwatson24 · on March 24, 2021

D has three fully functional compilers on Linux, and is probably in better shape than Python, Rust, even C++ in that regard.

qznc · on March 24, 2021

There is only one frontend though.

The advantage of multiple compilers is that people find subtle differences which humans miss by only reading the spec. Since D has only one frontend, it does not get that benefit although it can technically claim to have three compilers.