9cc: A Small C Compiler (github.com/rui314)
132 points by matt_d on Dec 14, 2018 | 68 comments


Regarding your Makefile: you should still pass CFLAGS through to the compiler when linking, not only LDFLAGS. Suppose CFLAGS contains -m32 (supported by an x86-64-targeted GCC to do 32-bit builds). You compile the .o files with that, but then link without it, which fails trying to make a 64-bit executable out of 32-bit .o's. Some crazy distros pass a --sysroot in CFLAGS; if you don't have that, your build finds the wrong library and header files.

Even when linking, don't call "cc", but "$(CC)". You're relying on the implicit rule to compile your .c to .o which will use $(CC). $(CC) could be some ARM cross-compiler supplied by a distro. When you link using "cc", you end up using the build-machine's native compiler.

Write the build system like this is an awesome program that major distros will be eager to pick up, and make it easy for the package maintainer. :)

Speaking of CFLAGS, only set that conditionally; don't clobber it:

  CFLAGS ?= -O2   # Only if CFLAGS is not specified externally, use this default
For things that your program needs in order to build right, put those in other variables of your own:

  DIALECT_CFLAGS := -std=c11

  CFLAGS += $(DIALECT_CFLAGS)  # integrate external CFLAGS with our own.
Same deal with LDFLAGS. Both CFLAGS and LDFLAGS can include important things that cause a bad build if you mess with them.
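Putting the pieces together, a minimal Makefile sketch following this advice might look like the following (the object-file list and target name are illustrative). Note the link rule uses $(CC) and passes CFLAGS through, so -m32, --sysroot, or a distro-supplied cross-compiler all keep working:

```make
CFLAGS ?= -O2                     # default only when not set externally

DIALECT_CFLAGS := -std=c11        # flags this program actually needs
CFLAGS += $(DIALECT_CFLAGS)       # integrate with any external CFLAGS

OBJS := main.o token.o parse.o    # illustrative object list

# Link with $(CC), not a hard-coded "cc", and pass CFLAGS too.
9cc: $(OBJS)
	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS)
```

The .c-to-.o compilation is left to make's implicit rule, which already uses $(CC) and $(CFLAGS), so both steps see the same toolchain and flags.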


Where would one go to find more of this conventional Makefile wisdom? I've had so many issues trying to use make the "right" way (flexible, clean, terse, etc.).

I feel like one of the best ways to acquire this wisdom is to post a project with lots of mistakes and let people tear it apart.


I bought what seemed to be overstock, or maybe it was just being remaindered, in a 50%-off uni bookshop sale about 20 years ago. It's a tiny (80-page) little book called "Managing Projects with make" by Steve Talbott, published by O'Reilly and Associates.

Most of what I know about 'make' I learned from this book.


I got it from working on lots of C for almost thirty years, by putting together a from-scratch embedded GNU/Linux distro some decade ago, and more recently by doing distro work as well.

As far as knowing GNU Make, I recommend just reading the manual from beginning to end, perhaps twice.



That is not, contrary to what the name might imply, the official GNU Make manual. That one can be read here, for free: https://www.gnu.org/software/make/manual/


I don't take the name as implying that.

Anyway, it is excellent and has some good advice and tips.

Unlike the official manual, which is more of a reference.


https://github.com/rui314/9cc/blob/882e4b2dd8/main.c#L7

    int main(int argc, char **argv) {
      ...

      Vector *tokens = tokenize(path, true);
      Program *prog = parse(tokens);
      sema(prog);
      gen_ir(prog);

      if (dump_ir1)
        dump_ir(prog->funcs);

      optimize(prog);
      liveness(prog);
      alloc_regs(prog);

      if (dump_ir2)
        dump_ir(prog->funcs);

      gen_x86(prog);
      return 0;
    }
This is wonderful.


FWIW I started this thread several months ago to advocate that people write their compilers like this :)

https://www.reddit.com/r/ProgrammingLanguages/comments/89n3w...

In many compilers, including some linked on that thread, this clean structure gets lost.


And also wonderfully devoid of error-handling, too. It's the most common way for beautiful-looking C code to look beautiful.


A compiler is in the happy position where there is little point in continuing to run after encountering an error, so it can bail right out with exit(2) after reporting the error to the user. This means that the contract on parse(), for example, can be that if it returns, it has succeeded.
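That contract can be sketched like this (the die() and parse_number() helpers here are hypothetical illustrations, not 9cc's actual code): report the error, exit(2), and let every caller assume that a returned value means success.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error helper: report to the user and bail right out.
   Never returns. */
static void die(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    exit(2);
}

/* Contract: if this function returns at all, parsing succeeded,
   so callers never need to check for failure. */
static long parse_number(const char *s) {
    char *end;
    long n = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        die("not a number: '%s'", s);
    return n;
}
```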


Except of course LLVM has proven the value in not assuming this pattern & building your compiler as a library of which the executable entrypoint is but one frontend.


Exceptions are the exit() of libraries. Or if you're using plain C, setjmp/longjmp might be a good idea, depending on what you're doing.
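A sketch of the setjmp/longjmp approach for a library-style entry point (all names here are illustrative): the public function establishes a recovery point, and deep error paths longjmp back to it instead of calling exit(), so the library never kills its host process.

```c
#include <setjmp.h>

/* Recovery point set by the library's entry function. */
static jmp_buf error_jmp;

/* Deep inside the library, something went wrong: unwind to the
   entry point instead of exiting the whole process. */
static void fail(void) {
    longjmp(error_jmp, 1);
}

/* Library-style entry point: returns 0 on success, -1 on error. */
static int compile(int should_fail) {
    if (setjmp(error_jmp))
        return -1;          /* we land here after a longjmp */
    if (should_fail)
        fail();             /* stand-in for real work that can fail */
    return 0;
}
```

One caveat worth remembering: any resources acquired between setjmp and longjmp (memory, file handles) are not released automatically, which is part of why arena-style allocation pairs well with this pattern.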


Unless the programmer uses the null object pattern in which case everything is hunky dory. For example, an empty "Program" would probably do the job.


Small nitpick: functions in C that take no arguments are written name(void) {...}, not name() {...}. The latter form is an old-style definition which doesn't introduce a prototype into the scope.

For functions that have prototypes, it is okay, but when they do not, you're losing type checking (yes, even on the number of arguments).

The following program compiles with no diagnostics for me with -W -Wall -ansi -pedantic, with GCC 7.3 on Ubuntu 18:

  int func()
  {
    return 0;
  }

  int main()
  {
    func(3);
    return 0;
  }
-std=c11 (as you're using) makes no difference.

The same is not true of C++: func() in C++ is a prototype definition. C++ supports (void) for compatibility with C, but even in nonsensical contexts: class::class(void);


Quote from 6.7.6.3.14 of the C++11 spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf):

> An identifier list declares only the identifiers of the parameters of the function. An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters. The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of the parameters is supplied.

So, my interpretation is that the following two function definitions define the same function of the exact same type

  void func() {}
  void func(void) {}
although the following two function declarations declare two functions of different types

  void func();
  void func(void);


That is of course the C11 (draft) spec, not C++.

Interesting find there. The wording is also in the C99 draft; it is not new.

It is in fact saying that the empty list in a definition is a special case and does declare that the function takes no parameters. To "specify" here can be understood to mean inserting information about the type into the declaration scope, which is what a declaration does.

So that's a bit of a bug in GCC there; it should be treating this the same as (void) and therefore diagnosing that way. If not by default, then at least when -pedantic is applied. But nope:

   $ gcc -Wall -W -std=c11 -pedantic proto.c
   $


My understanding is that, unless you have a function prototype, you will lose type info about the parameters:

"The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of the parameters is supplied."

i.e. the most important part is the end of that sentence, so GCC's behavior should be right.


Note the: "that is not part of a definition".

This is about declarations only.


Clang (tested with v7) however does issue a warning even with no flags set.


Any compiler should, because as a definition, int foo() { } does in fact inform us that the function can only be correctly called with no arguments.

Formally, this is not introduced into the scope as type information according to the language spec, but that's no reason not to warn about it informally.


I'm rewriting 9cc in Rust, starting from the first commit. https://github.com/utam0k/r9cc


Good to see more C/subset-C compilers being written. Besides the immense pedagogical value, they are also useful for preventing the "trusting trust" attack: https://dwheeler.com/trusting-trust/

I will also make a suggestion to use precedence climbing instead of plain recursive descent for the parser; it makes the parser even simpler and table-driven, which is important with a language like C that has many precedence levels.
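To make that concrete, precedence climbing boils down to one recursive function driven by a table of binding powers, instead of one hand-written function per precedence level. A minimal sketch in C (it evaluates single-digit expressions directly rather than building an AST, and all names are illustrative):

```c
#include <assert.h>
#include <ctype.h>

static const char *p;               /* cursor into the expression */

/* The precedence table: one function instead of ~15 grammar rules. */
static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not a binary operator */
    }
}

static int primary(void) {
    assert(isdigit((unsigned char)*p));
    return *p++ - '0';
}

/* Parse operators whose precedence is at least min_prec. */
static int climb(int min_prec) {
    int lhs = primary();
    while (prec(*p) >= min_prec) {
        char op = *p++;
        /* For left-associative operators, the right operand may only
           contain strictly tighter-binding operators. */
        int rhs = climb(prec(op) + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
    return lhs;
}

static int eval(const char *expr) { p = expr; return climb(1); }
```

Adding a new operator is then a one-line change to prec() plus a case in the switch, which is exactly what makes this approach attractive for C's many precedence levels.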


Interesting, do you have any useful links related to this concept?


Here is the discussion of the original Trusting Trust attack: https://news.ycombinator.com/item?id=13569275


Thanks, but I was referring to the precedence-climbing parser part of the GP reply.


https://eli.thegreenplace.net/2012/08/02/parsing-expressions...

Precedence climbing is also (intimately) related to Pratt parsing, and there's a useful series of articles about the two here: https://www.oilshell.org/blog/2017/03/31.html



Thompson publicized an attack described by Paul Karger during the MULTICS security evaluation. One of many. Defeating their totality took what's called high-assurance security. For compilers, you have to verify the source despite potentially-malicious developers, verify its binary translation, and verify its distribution. Maybe the tools used to do all that as well. I wrote more about what that takes here:

https://lobste.rs/s/nognrl/creating_language_using_only_asse...


It lists a goal as "compiling real-world programs such as the linux kernel"

The last time I investigated, the Linux kernel had so many gcc-isms that it was probably true that if you could compile the Linux kernel, you could compile just about any program targeted at gcc.


Other "alternative" compilers like tcc have pulled it off, supposedly without patching the kernel, so it is doable.


Does tcc compile a vanilla kernel? I know tccboot added some minor patches:

https://bellard.org/tcc/tccboot_readme.html


xv6 might be a nicer start:

https://pdos.csail.mit.edu/6.828/2012/xv6.html

For bootstrapping, the overall consensus is that you implement enough of C to compile, without optimizations, an early version of GCC, which then compiles itself from there until you reach the current version. Several folks are trying to compile TCC instead, to leverage it and dwheeler's work.


BSD kernel is maybe a cleaner target?


Interesting to mention: the author of this compiler, Rui Ueyama, is also a developer at the LLVM project. He's one of the most active developers of LLD, LLVM's linker.


How does this compare to Nils Holm’s subc compiler that has somewhat similar goals? There is a book describing the compiler as well (though the current version of the compiler supports a larger subset of the C language).

https://www.t3x.org/subc/index.html


I really wish people would choose better names; add just a hyphen and it becomes a C dialect from the early 80's.

https://en.wikipedia.org/wiki/Small-C


"a small c compiler" is just the description, "9cc" is the name itself, so there's not much similarity there...


I wonder if this is a reference to the band 10cc - https://en.wikipedia.org/wiki/10cc


G is the 7th letter of the alphabet


It took me a distressingly long time to figure out what you meant by this.


From the 2015 discussion about 8cc, the author states that "I chose a random character to append before 'cc'."

https://news.ycombinator.com/item?id=9125912


His last C compiler was called 8cc.


How does this compare with tcc?


kparc.com/b


>no memory management is the memory management policy in 9cc. We allocate memory using malloc() but never call free().

>I know that people find the policy odd, but this is actually a reasonable design choice for short-lived programs such as compilers.

I strongly disagree on this point. Memory management is important even for short-lived programs. It would burden the OS if you invoked this kind of "short-lived, memory-management-free" program multiple times.

>This policy greatly simplifies code and also eliminates use-after-free bugs entirely

Not facing a problem is definitely not a good way to solve it. Honestly, it's a bad attitude for a programmer to have.


I remember from the excellent book "Expert C Programming: Deep C Secrets", that the Sun C compiler also took a similar approach to using malloc, so it isn't unheard of for compilers specifically.

I disagree with your characterisation of this as bad practice and "not facing it"; it is an informed decision rather than ignorance, and brings considerable benefits. With a C compiler, some reasonable estimates can be made of the input data length/complexity and hence allocation sizes, and having poor performance if someone tries to feed in a million-line file is perfectly acceptable, especially in a compiler designed to be simple like the OP.

I think recognising the special set of circumstances that justify making an unusual tradeoff that wouldn't normally be acceptable is actually a mark of maturity as a programmer, not a bad attitude. The phrase "Don't let perfect be the enemy of good" comes to mind.

Finally, why would it burden the OS if you invoke this multiple times sequentially? malloc/free is implemented in the C library, not in the kernel; it's a mechanism for sharing bigger chunks fetched from (and given back to) the kernel with sbrk(). The kernel would just reclaim the pages on program exit, as it has to anyway, since it cannot rely on programs being well behaved enough to call free() (which rarely gives the memory back to the kernel immediately anyway).

Since the kernel uses virtual memory, it's not actually moving the contents about, just manipulating page tables, so it should be quite fast, and zeroing memory is not as time-consuming as you'd think thanks to Zero Fill On Demand (ZFOD). Hence the kernel does about the same amount of work regardless of whether the program called free() or not.

Sure it's using more peak memory than it could be, but I reckon compared to say, a web browser, it's still a very small fraction of the total.


> Memory management is important even for short-lived programs. It would bring burden to the OS if you invoke this kind of "short-lived, memory-management-free" programs multiple times.

All the memory is freed when the process exits. Why does it matter if you run the program multiple times?


It will put more pressure on the allocator when running, doing that a lot will likely have some kind of cumulative consequence down the line. I know from experience [0] that even reusing allocated memory rather than bouncing it back to malloc can have dramatic effects.

[0] https://gitlab.com/sifoo/snigl/blob/master/src/snigl/pool.h


Malloc doesn't interact with the kernel at all. The kernel sees pages, not the data structures that malloc manages. The kernel doesn't even know whether you free()'d the memory by the time the process exits.

There is exactly zero difference from the operating system's perspective between freeing and not freeing the memory before program termination (except that one might have a higher peak memory usage).

The classic implementations of several standard Unix utilities deliberately never deallocated memory for performance reasons. If there was a problem with this practice, I would expect Bell Labs--of all places--to know not to do it.


>There is exactly zero difference from the operating system's perspective (...)

Close, but not quite: for larger allocations, malloc() tends to use mmap() with MAP_ANONYMOUS rather than brk() to request memory from the OS. For example, glibc's malloc() uses mmap() when the requested size exceeds MMAP_THRESHOLD, which is 128 kB by default.

The mmap() approach gives large, contiguous memory blocks that can also be easily returned via munmap() with little to no bookkeeping needed[1], and it is not subject to the same fragmentation woes as memory allocated via brk(), as long as your address space is significantly larger than the allocated memory.

That aside, I fully agree with the author of 9cc.

[1] in fact a simplistic libc memory allocator could directly wrap malloc() around mmap(), free() around munmap() and realloc() around mremap(), leveraging the in-kernel allocator at the cost of one syscall, with at least two context switches, per call - i.e., slowww


Using mmap/munmap isn't quite that trivial: You need to remember how large each allocation is in order to know how many pages to unmap.


An implementation certainly could return memory to the OS on calls to free(), but to my knowledge none of the widely used implementations do so. (I would be interested to learn of counterexamples!)


jemalloc can use madvise(MADV_FREE) to return pages to the OS.


Did I mention kernels anywhere? I simply shared my experience.

If that doesn't fit neatly into your delusions it's really not my problem.


Wait, doesn't it do the opposite? The big thing that "pressures" an allocator is fragmentation, which you don't get at all if you never free.


With no freeing, memory used is the sum of all allocations ever made. With freeing, it is the maximum of live allocations at any given time. The former can ask for much more memory from the OS for programs that generate a lot of temporary garbage. Though this probably doesn't matter for a C compiler - unless it is required to compile megabytes of generated code!


An allocation you don't have to free is, at the margin, a simple pointer increment. It's the need to free memory that makes allocators expensive, not the amount of memory they allocate. This is why performance-sensitive programs use pools, arenas, and other preallocation strategies.


What I was getting at is that non-freeing programs would use much more memory. This doesn't increase the "pressure" on the allocator for such a program but does increase the pressure on the OS (that has to handle memory requests for many concurrently running programs).


Sure, but if you're not freeing memory you don't need a real malloc. Your allocation function can be just "top += allocsize".


Yes, that's commonly known as "arena allocation" and has been used for decades in programs that have well-defined phases that create lots of connected objects with the same lifetime, like compiler passes, RPC serialization layers, or query planners.
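A sketch of what that looks like (sizes and names are illustrative): allocation is a rounded-up pointer increment into a fixed buffer, and the whole arena is released in one step, or never, for a short-lived process.

```c
#include <stddef.h>

enum { ARENA_SIZE = 1 << 20 };        /* 1 MiB arena, illustrative */

static unsigned char arena[ARENA_SIZE];
static size_t top;

/* Bump allocation: just advance the top-of-arena pointer. */
static void *arena_alloc(size_t n) {
    n = (n + 15) & ~(size_t)15;       /* keep 16-byte alignment */
    if (top + n > ARENA_SIZE)
        return NULL;                  /* out of arena space */
    void *p = arena + top;
    top += n;
    return p;
}

/* "Free" everything with the same lifetime at once. */
static void arena_reset(void) { top = 0; }
```

There is no per-object free() at all, which is exactly what makes allocation a couple of instructions and eliminates use-after-free of individually freed objects by construction.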


Not everybody wants to wait around all day for their utilities to free up their memory before exiting.

http://lists.gnu.org/archive/html/coreutils/2014-08/msg00012...


Generally speaking, a chunk of memory you free is simply marked "free" for reuse but not given back to the OS. So in this regard, it is not a burden to the OS.


It looks like by default, the GNU libc won't return pages to the OS until 128 KiB are free at the top of the heap segment. The compiler has to make many small allocations, so even if it freed memory at every opportunity I doubt the heap trimming would happen very often.

Source: https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#...


This is also how the Apache httpd server works, last I checked. Not the entire server, of course - it's a long-lived program with lots of memory to manage - however when servicing a request, it uses the arena pattern to allocate memory, and the entire arena is freed at the end of the request. In fact, its entire memory management model is based on a hierarchy of memory arenas that can only grow and are freed once whatever required their liveness ends. I've had the pleasure of writing a plugin for it to handle a certain custom upload protocol.


IMO it depends on how much memory is expected to be consumed. If it's at most tens of MBs, then I think skipping memory management is not just smart, but more performant.


Sometimes this is true, sometimes not. It depends on how many blocks of memory you allocate and for what purposes, I think. For short-lived programs it can make sense that some things are never freed (they will be reclaimed automatically when the program terminates), while other things should still be managed explicitly. It also depends on the program.


I think you're reacting more to the words that the author chose than the actual design decision, which is simply an instance of arena allocation.



