> And actually we’re now entering a world where at minimum supporting LLVM as a backend option is mandatory.
I would have said this too a few years ago, but Go is a huge counterexample. It's become very popular without using LLVM, and the compiler that does use a mature, optimizing backend (gccgo) doesn't get nearly as much use as the Plan 9-based one.
I should also mention that increasingly it's become popular to have a mid-level IR that you lower the AST to before lowering to LLVM IR. GHC does it, Swift is doing it, and Rust is working on doing it. The reasons are (a) you can do language-specific optimizations like devirtualization, which are often pretty hard to do in LLVM; (b) going from AST to LLVM IR is often a big, messy leap for any language that isn't C.
Go is an outlier, a social phenomenon brought about by just the right combination of people, companies and the niche it occupies. Something like Go won't happen again in the near future, not at this scale for sure.
Which can be a bad or good thing, depending on how you decide to look at the issue.
> I would have said this too a few years ago, but Go is a huge counterexample.
I've been wondering about this ever since Go came out. If I recall correctly, LLVM was already a viable option back then.
The latest release of Go earlier this week featured a pretty big revamp of the compiler, with a new SSA-based middle-end/backend. I wonder how much work went into that, and whether they could have achieved it faster had they been on LLVM since day 1.
There's an "official answer" in the FAQ [1] ("We considered using LLVM for gc but we felt it was too large and slow to meet our performance goals." Note: lowercase gc refers to Go compiler, whereas uppercase GC would refer to Garbage Collector). That said, I seem to recall other additional/more detailed reasons were given on various occasions too, including:
- ease of iteration and freedom to make huge breaking changes (and they did make use of it, see e.g. various generations of the GC, or the SSA work);
- thorough knowledge of the internals of the compiler they used (given they were its authors; also this is linked to the reason I listed above);
- one of the explicit goals for the Go toolchain was that it must compile very fast (which I believe is also somewhat at odds with advanced optimizations, like those used in LLVM) - they even quipped something like "we designed Go while waiting for C++ to compile";
- additionally, though I'm not sure if it was ever spelled out explicitly, I suppose they wouldn't want to get tied to LLVM as a huge dependency, including having to adjust to each and every change in the LLVM API, sensible or not (I seem to recall from some articles that LLVM does introduce breaking API changes from time to time?);
Yes, GC is a big issue... or was back when Go was just getting started. GC support in LLVM should be MUCH better now. I.e. it has everything you need to hook up a GC, but there isn't a built-in GC in it.
And the fact that the first people working on Go had a good knowledge of the Plan 9 toolchain probably gave them a head start. But that benefit probably eroded long ago.
And I completely disagree with the Go team's emphasis on compile times. Fast result programs are far more important than fast compile times. Fast compile times are nice to have, but not much more than that. But this is just my opinion, it's as good as anyone else's.
Russ Cox gave a really good answer to this question. I have to paraphrase because I can't find the original comment.
It was done for implementation speed. They had a compiler backend which they, the early Go team, were all familiar with. It was small so they could quickly change it to suit Go, rather than the other way around.
Russ Cox expressed the sentiment that if they had tried to use LLVM there would have been a lot of up-front work that would have taken a long time, and the Go project probably would have fizzled out. Using the Plan 9 toolchain they were up and running very quickly, allowing them to focus on the most important work: designing the language itself.
(Probably shouldn't attribute this to Russ Cox; I've probably read too much into his comments, which were rather briefer than what I wrote above.)
In my opinion this was the right choice. If LLVM turns out to be the best backend for Go it's easy to use it later. However, the language itself is largely fixed after 1.0. Better to focus on the language than the compiler backend which can be swapped out later.
LLVM does take a long time to compile, but four hours is an exaggeration. A mid-range computer can compile it in 30 minutes with -j4, and a good computer can compile it in 10 with -j8.
Heh, I was in this same boat two years ago. Servo used to build its own Rust back then, and that involved building LLVM. I hadn't yet discovered ccache [1], and I had an old 4GB RAM laptop with a Core-something processor. Took 4-6 hours total, IIRC. It was a really old laptop, so that was understandable. But very, very annoying, since sometimes I'd need to rebase over a rustup and then wait hours for the build to finish. This was a large part of my impetus to move Servo to a state where we download Rust snapshots.
[1]: Btw, if LLVM compile times are getting to you, Use. Ccache. It is amazeballs.
Those compilers don't do nearly as much optimization as LLVM does. It's hard to overstate how wide the gap between the classical compilers you like and LLVM/GCC truly is. LLVM has no fewer than three IRs, each with a full suite of compiler optimizations (four if you count ScheduleDAG). LICM, just to name one example, runs at each stage (see MachineLICM.cpp). Turbo Pascal did nowhere near that level of optimization, and it produced unacceptably slow programs by modern standards.
Maybe the biggest issue is optimizing register allocation while providing the information about all roots for garbage collection... For non-GC languages you don't need to care about that and can just aggressively optimize register allocation. For GC languages you have to be careful, because the GC must know 1) which registers are holding references and which are not, and 2) which of those registers are "alive" and which are not in use anymore (so that the objects they referenced can be garbage-collected)...
Is LLVM that well supported on Windows? Go aims at running on as many platforms as possible, and it seems to me a lot of languages built with LLVM don't run on Windows (Swift, Crystal, ...).
Yes, there are no immediate plans for anything other than MIR at the imperative-semantic level (the compiler also uses HIR, which is mostly a desugared AST and not related to LLVM IR).
Currently, initial support for lowering MIR to LLVM IR is close to complete and it might be enabled in the next beta (or the one after that).
None of the analysis passes have started their transition to MIR, though (there is a well-typedness check to catch bugs in MIR generation, but that's about it).
There's also talk of using MIR for compile-time (pure) function evaluation [1] and some out-of-tree experimentation with interpreting it [2].
MIR looks a bit more low-level than Haskell's Core. Like Core, MIR is typed, but unlike Core, match statements disappear. This is interesting, because matching is really the main computational engine of Core. I wonder if there's a good explanation why it was deemed advantageous to desugar match.
I also wonder if there are plans to integrate the MIR interpreter into the compiler. IIRC, interpretation of Core between optimisation phases is used to good effect in Haskell.
Depends what you mean by "match" statements: if you're referring to complex patterns, yes, they get desugared.
If you're instead referring to matching over ADT (enum) variants, that is a MIR primitive, the "Switch" terminator (there is also a "SwitchInt" for integer-like values, as in C or LLVM IR).
As for the interpreter, it's quite likely that we'll use (something like) it for evaluating constants, but not entirely certain atm.
I personally hope we can get the CTFE semantics I described (see [1] above), which despite being pure, go far beyond C++17 constexpr capabilities (e.g. you could potentially run an entire compiler for a different language at compile-time).
We don't really have our own optimizations atm, but we do want MIR optimization passes.
The benefits would be two-fold:
We could do transformations LLVM can't figure out itself, like NRVO: `fn bar() -> T {let mut x = ...; foo(&mut x); x} ... box bar()` can (theoretically) end up calling `foo` with a pointer to the heap instead of having to copy `x` from the stack later.
And we can lift the burden of LLVM trudging through monomorphized instances from MIR: since MIR still has type parameters, you can transform it and simplify the resulting LLVM IR for all instances, e.g.:
`mem::replace(mut_ref, x)` is `{mem::swap(mut_ref, &mut x); x}`, which can be reduced once to `{Return = *arg0; *arg0 = arg1;}` in MIR and translated to a load, a store and a ret (for immediates) or two memcpys (for indirect argument & return), depending on the type.
As most of the time rustc spends is in LLVM optimizations, this will result in compilation time speedups linear in the number of instances, sometimes exponential in the size of the original code.
All true, but there is more. SPJ and S Marlow [1] list the following additional advantages of using a typed intermediate representation, which would/could apply to MIR too.

1. Running a type-checker on the IR "is a very powerful consistency check on the compiler itself. Imagine that you write an 'optimisation' that accidentally generates code that treats an integer value as a function, and tries to call it. The chances are that the program will segmentation fault, or fail at runtime in a bizarre way. Tracing a seg-fault back to the particular optimisation pass that broke the program is a long road. Now imagine instead that we run [the type-checker on the IR] after every optimisation pass: it will report a precisely located error immediately after the offending optimisation. What a blessing [...] in practice we have found that it is surprisingly hard to accidentally write optimisations that are type-correct but not semantically correct."

2. Running a type-checker after translation to IR serves as a "100% independent check on the type inference engine".

3. A high-level typed IR is a "[s]anity check on the design of the source language."
You may be better off building up the LLVM IR as text, vs. using the FFI builder. (If you want/need type safety as opposed to just waiting for the LLVM verifier to tell you what went wrong, you could write the builder in Lua.)
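As a rough illustration (the builder API here is made up on the spot; the output is just ordinary textual LLVM IR that you'd hand to the parser), a text-building approach in Lua might look like:

    -- toy, hypothetical text builder: accumulate lines, join at the end
    local function fn_builder(name)
      local lines = {}
      return {
        emit = function(instr) lines[#lines + 1] = "  " .. instr end,
        done = function()
          return "define i32 @" .. name .. "(i32 %a, i32 %b) {\n"
            .. table.concat(lines, "\n") .. "\n}\n"
        end,
      }
    end

    local f = fn_builder("add")
    f.emit("%sum = add i32 %a, %b")
    f.emit("ret i32 %sum")
    print(f.done())  -- valid textual IR; llvm-as / the verifier will catch mistakes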
So I've done basically the same thing with a few differences:
(1) I chose Common Lisp as the language I would implement
(2) I used C++ template programming to automatically generate wrappers for C++ library functions (like boost::python and luabind do) so that I could expose C++ functions directly to Common Lisp. That allowed me to use the LLVM C++ API directly.
The project is called clasp -> github.com/drmeister/clasp
Send me a message - we should talk and compare notes.
Instead of generating LLVM code, you could generate 'typed Lua', using Lua as a metaprogramming language. Then the 'typed Lua' can get compiled down by LLVM into object code.
I never understood in simple terms what terralang is actually about. Even in your sentence, if typed Lua is being "generated" (i.e., it's the target), how come Lua is being used as a metaprogramming language (i.e., a language that does the generation)?
I understand generative metaprogramming as consisting of a language that a human deals with and a language that gets generated. In the case of Terra/Lua it's not clear which generates and which is generated, or whether it's a completely different approach (to which I say, what's the point?).
On the Terra webpage, it says 'Terra is low-level' but then 'Terra is code embedded in a Lua meta-program', not to mention 'Terra-Lua programs'.
Terra is lower-level, and on evaluation compiles to object code. Lua is higher-level, and on evaluation compiles to bytecode before being executed. A source file written entirely in Terra would produce a binary, and a source file written in Lua would produce Lua bytecode. In either case, you can run the result immediately, or save the resulting object code/bytecode to run later.
The strength of the two is that Terra can be manipulated using Lua as a metalanguage: for example, you could write a Lua function which produces Terra code, which is then executed. In some sense, this is like C, which has a lower-level language (C) and a corresponding high-level language which can manipulate that lower-level language (the C preprocessor). In this case, though, Lua is much more powerful and flexible than CPP, and Terra is also a smaller, simpler language than C.
It gets more powerful, though, because the Lua parts don't just have to exist at compile-time. You could use Lua at run-time to produce Terra code, which in turn can get executed immediately, producing object code specialized to a given situation or machine: for example, you might write a language, use Lua to parse it, and then convert it to Terra to produce object code which is then invoked immediately: this effectively just-in-time compiles your code. Or, you could write an algorithm in Terra with specific constants that aren't known until run-time, and then use Lua to stitch those constants in before running the algorithm: for example, to take advantage of specific machine features or parameters.
These examples also assume that you have only "compile" and "run" stages; you could also write programs with more stages: for example "compile source", "specialize for machine", and "run" could be three distinct stages. Terra/Lua would afford you the flexibility to stage your computations in this way.
Use `pairs(a_table)` to get `key, val` variables bound respectively. Use `ipairs` to get a loop over indices and values. This is very similar to Python, only with "do/end" instead of indent/dedent. What you can't do in Python, but can in Lua, is provide fewer variable names than the iterator function returns. For example, to get a list of keys when iterating over `(key, val)` pairs in Python you'd have to write something like `keys = [k for k, v in d.items()]`, naming both variables even though one goes unused.
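In Lua, by contrast, you can just drop the extra variable. A small sketch (assuming a table `t` with non-numeric keys):

    local t = {a = 1, b = 2, c = 3}

    -- the second loop variable can simply be omitted
    for k in pairs(t) do
      print(k)          -- keys only
    end

    -- ipairs walks the array part in order, yielding index and value
    for i, v in ipairs({"x", "y", "z"}) do
      print(i, v)
    end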
The terminology seems to be tripping you up (and I agree these terms aren't super clear). I think it's easiest to understand this through examples. Say you wanted to create a specialized function in pure Lua:
    function make_adder_func(n)
      return function(x) return n + x end
    end

    local add3_func = make_adder_func(3)
    local result = add3_func(6)
    print(result)  -- prints "9"
This totally works: because Lua supports both closures and first-class functions, the parameter "3" is captured in the environment of the inner function, so we have a specialized function for adding 3 to any other number.
Now the way Lua will execute this isn't super optimized. It treats "n" as a local variable which could theoretically be modified (it's not here, but proving that takes more of an optimizer than most dynamic languages possess). Also Lua doesn't even statically know the type of "n", so the "+" operation could be string concatenation, or even a user-defined function for the __add metamethod if "n" was a custom type. So even though it looks like the add3_func() is pretty specialized, it's actually not specialized at all in practice.
(For the record, LuaJIT is smarter about this and does have a trace compiler + optimizer that makes this extremely efficient).
But what Terra gives us is the ability to generate fully specialized and optimized functions at runtime. In Terra we could write:
    function make_adder_func(n)
      return terra(x : int) return n + x end
    end

    local add3_func = make_adder_func(3)
    local result = add3_func(6)
    print(result)  -- prints "9"
Unlike the Lua function, the Terra function is strongly typed and does not contain a closure over its environment. The inner Terra function isn't referring to the Lua variable "n", it is taking the value of n and making it a constant inside the Terra function. This allows the Terra function to be more thoroughly optimized (more like what LuaJIT would do).
So getting back to your question, the phrase "typed Lua is being generated" is a little confusing. What it means is that these Terra functions are strongly typed, and we're using Lua code to programmatically generate Terra functions, like in make_adder_func() above.
I know less about this than the other people replying, but will chip in because I have a different perspective that you might find useful.
When I saw what this guy was doing, my first reaction was also "that's like Terra!"
Here's how developers work at the moment: you write some code. Then - you /fork a process/ that reads your code and puts it into a pipeline to produce machine code or bytecode. It's a ceremony. There are all kinds of obscure little technologies to know about - variables defined in your shell language, makefiles, the peculiar arguments to specify when you invoke your compiler.
This article and Terra show that there might be a simpler path than this available. In Terra, you describe a system of code using lua structures. (They could as easily have been s-expressions). Terra then abstracts those into their structural components, and passes that tree to LLVM. And an artifact comes out. This is cool, because now you can easily integrate automatic code generation /into your application layer/.
What is happening here is not revolutionary in its mechanism. But you get access to the whole thing in a way that I haven't seen before. I think this will significantly change the way we think about programming.
You can see this writer struggling with multiple issues. And he's used Lua, which is supposed to be an easy platform to integrate with. And he's having problems. Still, once these issues are worked through, I'd expect to see a hacker language emerge that is a thin wrapper around the LLVM API and lets you dance close to it.
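To make that concrete, here's a purely hypothetical sketch of what such a thin wrapper might feel like from Lua - every name below is invented, just the shape I'd expect:

    -- hypothetical thin LLVM wrapper; none of these names exist today
    local llvm = require("llvm")
    local mod = llvm.module("demo")
    local fn  = mod:fn("add", llvm.i32, { llvm.i32, llvm.i32 })
    local bb  = fn:block("entry")
    bb:ret(bb:add(fn:param(0), fn:param(1)))
    print(mod:ir())  -- dump textual IR, or hand the module to the JIT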
So here's yet another attempt at describing terra :)
Firstly, 'typed Lua' really meant the Terra sublanguage, because it looks like Lua with types.
Going back to Terra: consider that Terra is a language that can be compiled by LLVM into machine code. So you can choose either to compile and emit object code, or to compile and immediately run the generated object code. The other choice you have is either to write Terra code by hand, or to generate Terra code from Lua code.
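If I remember the Terra API right (treat the details as approximate), those two choices look roughly like this:

    -- define a Terra function by hand...
    terra add(a : int, b : int) : int
      return a + b
    end

    -- ...then either run the JIT-compiled code immediately from Lua:
    print(add(1, 2))                        -- 3

    -- ...or emit object code for later linking (terralib.saveobj,
    -- if I recall the name correctly):
    terralib.saveobj("add.o", { add = add })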
You say 'if typed Lua is being "generated" (i.e., it's the target) how come Lua is being used as a metaprogramming language'. Why is that not possible? Many languages have built-in metaprogramming using the same language (notably, Lisp code can generate Lisp code).
Another way to look at this: Lua is the metaprogramming and control language for generating, compiling and running Terra code.
I also had trouble understanding that from their description, but then at some point I think I got it. As I understand, it's like if you used Lua (actually, a Lua-like language dubbed "Terra") in places where you'd normally use the C preprocessor, or C++ templates, or Lisp macros, or Rust macros, or inline assembly fragments in C/C++, while your "base language", which you'd use to write most of the code, would be your regular old Lua. Does it make more sense when described this way?
edit: so, from reading the other comments around, I think the language roles are actually reversed vs. what I thought - i.e. Lua is the "macro language", and Terra is the "main language".
Essentially it's an implementation of the concept that metaprogramming should use the same semantics and syntax as normal programming. This is the case in Lisp, for example, but Lisp is not (or not typically) compiled, so Lisp metaprogramming is essentially limited to defining syntactic sugar. In Terra, metaprogramming (or multi-stage programming, as they call it), can actually result in improved performance compared to a typical C-like implementation of the same algorithm.
The website has a couple papers that provide benchmarks for some interesting test cases.
> This is the case in Lisp, for example, but Lisp is not (or not typically) compiled …
Say what? Lisp is often compiled; the standard extensively discusses compilation[1] and — to address your point re. performance-improving metaprogramming — compiler macros[2] specifically exist in order to advise the compiler, e.g. for performance.
So, no, Lisp macros are not limited to defining syntactic sugar. It's like I keep on saying: the Common Lisp standard from 1994 contains within its covers functionality that people still don't know about.
Most are compiled. Many also typically have types you can add in to get the performance benefits of static typing. Add in compilation of individual functions and live updates of a running app to round out the key benefits of Lisp interpreters/compilers.
I have the “Code Model” (that is what I call the AST, because there is no syntax) up for a very small, C-like language, and I have a frontend that can manipulate parts of it. However, these are not yet combined, and I stopped working on it because I am in the middle of transforming my efforts into a web app. This will allow you to edit and run a code model in your browser, and download a compiled binary when you are finished.
Btw, you will be able to link code elements such as classes to online resources such as a task tracker, a diagram, a mindmap, etc., to have the whole ALM process embedded into your code base.
Interesting article. Looking at the code, it's pretty clear that this is not a particularly practical approach, but I think it points in the direction of where language design should head. Abstractions shouldn't be forced, unless absolutely necessary; forced abstractions lead to inefficiency.
I've been working on a related concept, but more along the lines of developing a low-level IR (LLIR) and corresponding syntax that allows direct coding in the LLIR. It would essentially be a portable assembly language. The LLIR could then be compiled down to machine code for different architectures, or even run directly in a VM.
This isn't too different from the LLVM concept, except the IR would be a practical programming language, much like assembly. Furthermore, the IR would provide access control primitives (reference capabilities) that most languages do not support, but that are extremely useful for low-overhead secure computing and concurrent computing. These primitives could be especially useful in the design of exokernel OS's, and related applications.
> I've been working on a related concept, but more along the lines of developing a low-level IR (LLIR) and corresponding syntax that allows direct coding in the LLIR. It would essentially be a portable assembly language. The LLIR could then be compiled down to machine code for different architectures, or even run directly in a VM.
Isn't this WebAssembly? Bonus points: they have an s-expression-based format and a binary format, are working on an LLVM backend for it plus a bunch of JIT integrations, are making sure it's portable, and have a lot of people working on it from multiple companies.
I've run into so many of these issues myself, trying to work through the forest of LLVM. The abysmal state of the generated docs and those cryptic segfaults are the bane of my existence.
So I started to slap together a little library that gives some order to the madness. I call it petard [0], for reasons that should be obvious if you've heard that word. At the moment I'm focused on JavaScript bindings, but the idea is to build a sensible object-based API that's simple enough to export pure C bindings to make FFI easy.
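For a taste of the "pure C bindings make FFI easy" idea, here's roughly what consuming such a flat C surface from LuaJIT's FFI looks like in general - the petard function names below are invented for illustration, not the real API:

    -- hypothetical flat C surface; real petard exports will differ
    local ffi = require("ffi")
    ffi.cdef[[
      typedef struct pt_builder pt_builder;
      pt_builder *pt_create(const char *module_name);
      void        pt_destroy(pt_builder *b);
    ]]
    local petard = ffi.load("petard")
    local b = petard.pt_create("demo")
    petard.pt_destroy(b)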
There's currently support for lots of the basics, but some significant issues too (lack of features as well as poor C++ style). Stern memory-management focused pull requests would be very welcome, and feature additions would be helpful, too.
This seems like a fun way to poke around with LLVM but I don't see how this is any easier than writing down a grammar and feeding it into a parser generator...
"Writing a parser" sounds really annoying and complicated, but in my admittedly limited experience with ocamlyacc, it was easy peasy and dare I say it, fun!
The hard part I guess was defining the language and maybe that's what he's trying to avoid.
But even still, scripting around the AST building functions in LLVM and coding directly to them seems way more annoying to me in the long run. (especially the writing programs part)
Here's the potential I see. Imagine if it was routine to write an application stack entirely in a repl language. Not just the application code - everything. You'd write your application code. Then other functions would dispatch the tree of application code to the LLVM API where it would be compiled. Then other functions could take the resulting artifacts and push them through to deployment.
Doing this at the moment would involve writing fiddly wrappers around lots of layers of technology - dynamically constructing makefiles, and shelling out to call things.
> But even still, scripting around the AST building functions in LLVM and coding directly to them seems way more annoying to me in the long run. (especially the writing programs part)
Using the IRBuilder to build IR is quite easy in my experience.
There's an enormous lump of complexity that this chap seems to be unaware of: the type system. This is the semantic analysis bit that he seems to think is done by LLVM, but it isn't.
It's like he mostly thinks of compilers as turning statements and expressions into executable code, but IMO that's one of the easier bits. Defining how different values interact, what operations are legal and aren't, and the runtime support - this is a lot of work, and in particular, it's different for every language, because it's what makes languages distinct.
I think she's a woman not a "chap". But anyways, you'd be relying on the IR semantics and checking to do the legwork for you. LLVM IR is a language after all.
Yeah. LLVM IR is the most suitable representation for doing this kind of thing, but it's the one that's not compatible across LLVM versions. Maybe it would be worth defining an alternate format for the bytecode that can be read and written by external tools?
Excellent post highlighting the problem solving skills and tenacity of a talented engineer. Of course, this is also the "duct tape and paper clips" approach to writing software that I certainly would not advocate for production systems.
For everyone here who mentioned building wrappers around llvm-c or the LLVM C++ API: there is a nice tool for automating this job (and many other similar things), and it's called "Clang". Or, more specifically, cindex. It's easy to use Python + cindex to parse LLVM headers and spit out a nice wrapper library for your language of choice.
I really like the idea. Why not hack a scripting engine to add bindings to itself, e.g. in the Duktape JavaScript engine? Then a JavaScript function could use the C API of Duktape to generate ASTs at runtime for itself. But I think the VM would be the limit then... support for coroutines etc.