Hacker News new | past | comments | ask | show | jobs | submit login

I am working on a compiler for my new language. I went from learning about Yacc at the university (reaction: disgust) to discovering PEG (reaction: I should build my own parser generator!) to wisdom (reaction: just write the damn thing by hand and be done with it!)

I have a few basic reasons why I now think parser generators are a mistake for my purpose: 1) It is hard to learn all the details of a particular parser generator that you need for a complex parsing project. 2) The mix of grammar and semantic actions one has to come up with is ugly and complex. 3) The output is both ugly and hard to debug.

The most complicated part of a compiler is not recognizing good input, put doing something with it. A hand-written recursive descent parser can be a thing of beauty and simplicity that increases the joy of working on the meat of the compiler, that is semantics!




Parser combinators for lazy streams can be written as recursive descent and perform as, more or less, generalized LR parsing (exploring all tree branches "simultaneously").

http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf

What is more, these combinators can express context-sensitive grammars: https://www.cs.york.ac.uk/plasma/publications/pdf/partialpar... (page 5).

And, by tweaking combinators, you can arrive to the wonders of stream processing parsing without (mostly) space leaks: http://www.cse.chalmers.se/edu/year/2015/course/afp/Papers/p...

It is easy to add syntax error position reporting, also it is easy to add syntax error corrections. All this can be done in under a day.

For me it offers good things from both worlds - declarative specification of parser generators and easy modifications from "parse damn thing by hand".

I actually encountered one case where that approach shines incommensurably: supporting old languages like VHDL. VHDL fits every parser formalism almost without seams but still has one place where you cannot do things either efficiently or elegantly: character literals and attributes. 'a' is a character literal, a'a is an invocation of attribute a on the object a. The ambiguity is in the lexing level, needing feedback from parsing level. Usual parsing tools cannot do that good enough, with parsing combinators it is trivial.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: