Hacker News

It's really not much work, and it removes another source of dependency/build bloat. Also, if you care about speed, hand-rolling your lexer will always be the fastest (and even naive implementations can rival/outcompete generators).
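To give a rough sense of how little work this is, here's a minimal hand-written lexer in Python. The token kinds and the single-character operator set are made up for the example. A real language would add string literals, comments, multi-character operators and so on, but each of those is just another branch in the same loop.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str  # "ident", "number", "op", or "eof"
    text: str
    pos: int   # offset of the token's first character in the source


# Hypothetical operator set, just for illustration.
OPERATORS = set("+-*/=()")


def lex(src: str) -> list[Token]:
    tokens = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1  # skip whitespace between tokens
        elif c.isalpha() or c == "_":
            start = i
            # identifiers: a letter or underscore, then letters, digits, underscores
            while i < n and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(Token("ident", src[start:i], start))
        elif c.isdigit():
            start = i
            while i < n and src[i].isdigit():
                i += 1
            tokens.append(Token("number", src[start:i], start))
        elif c in OPERATORS:
            tokens.append(Token("op", c, i))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
    tokens.append(Token("eof", "", n))
    return tokens
```

For example, `lex("x1 = 42 + y")` gives ident `x1`, op `=`, number `42`, op `+`, ident `y`, then eof.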


Not only that, but a hand-written lexer is much more flexible. It lets you easily implement things that lexer generators struggle with: think of significant indentation, nested comments, or interpolation inside string literals. Sometimes you can't do this at all with a generator; sometimes you can, but the code becomes a mess. That's why many popular programming language implementations use a hand-written lexer and parser.
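Nested comments are the easiest of these to show. Matching `/* /* */ */` needs a depth counter, and a finite automaton (so a plain regular expression) has no way to keep an unbounded count. A sketch in Python, assuming Rust-style nestable `/* ... */` comments; the function name is hypothetical:

```python
def skip_nested_comment(src: str, i: int) -> int:
    """Return the offset just past the nested /* ... */ comment opening at src[i]."""
    if not src.startswith("/*", i):
        raise ValueError(f"no comment opener at offset {i}")
    depth = 0  # number of /* currently open
    n = len(src)
    while i < n:
        if src.startswith("/*", i):
            depth += 1
            i += 2
        elif src.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i  # the outermost comment has been closed
        else:
            i += 1  # ordinary comment text
    raise SyntaxError("unterminated comment")
```

`skip_nested_comment("/* a /* b */ c */x", 0)` returns 17, the offset of `x`. In a hand-written lexer this is just one more branch the main loop calls when it sees `/*`.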


re2c makes it easy to drop into hand-written code where necessary. For example, in C, backslash-newline (line splicing) makes a huge mess of everything unless you can handle it underneath the lexer, which works nicely in re2c. In C++, raw string literals are not even context-free, let alone regular, so they have to be scanned with hand-written code. This is straightforward with re2c.
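A rough Python sketch of the two hand-written pieces described above. This is not re2c code, just the sort of routine a lexer action might call; the function names are hypothetical, only `\n` line endings are handled, and encoding prefixes like `u8R` are ignored. `logical_chars` drops backslash-newline pairs beneath the lexer while keeping physical offsets for error messages, and `scan_raw_string` reads the delimiter after `R"` and then searches for the matching `)delim"`.

```python
from typing import Iterator


def logical_chars(src: str) -> Iterator[tuple[int, str]]:
    """Yield (physical_offset, char) pairs with every backslash-newline removed."""
    i, n = 0, len(src)
    while i < n:
        if src[i] == "\\" and i + 1 < n and src[i + 1] == "\n":
            i += 2  # line splice: drop both characters, keep scanning
            continue
        yield i, src[i]
        i += 1


def scan_raw_string(src: str, i: int) -> tuple[int, str]:
    """Scan a C++ raw string literal R"delim(...)delim" starting at src[i].

    Returns (offset just past the literal, body text). Works on the original,
    unspliced text.
    """
    if not src.startswith('R"', i):
        raise ValueError(f'expected R" at offset {i}')
    open_paren = src.find("(", i + 2)
    if open_paren == -1:
        raise SyntaxError("raw string literal is missing '('")
    delim = src[i + 2:open_paren]
    # d-char rules: at most 16 chars; no space, parens, backslash, tab, VT, FF, newline
    if len(delim) > 16 or any(c in " ()\\\t\v\f\n" for c in delim):
        raise SyntaxError(f"invalid raw string delimiter {delim!r}")
    closing = ")" + delim + '"'
    end = src.find(closing, open_paren + 1)
    if end == -1:
        raise SyntaxError("unterminated raw string literal")
    return end + len(closing), src[open_paren + 1:end]
```

One wrinkle: C++ reverts line splicing inside raw string literals, which is why `scan_raw_string` works on the original text rather than on the output of `logical_chars`. With `src = 'R"xy(a)"b)xy" rest'`, `scan_raw_string(src, 0)` returns the body `a)"b`, skipping the `)"` that doesn't carry the `xy` delimiter.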

