> Pratt parsing

And it's a natural match for Packrat; the two work together really well.
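For anyone who hasn't seen the combination, here is a toy sketch (Python; the grammar and names are mine, not from any particular library) of a Pratt-style precedence climber whose leaf rule is packrat-memoized, i.e. cached by input position:

    import re
    from functools import lru_cache

    SRC = "1+2*3-4"
    BINDING = {"+": 10, "-": 10, "*": 20, "/": 20}  # binding powers

    @lru_cache(maxsize=None)            # packrat = memoize by position
    def parse_atom(pos):
        m = re.match(r"\d+", SRC[pos:])
        return (int(m.group()), pos + m.end()) if m else None

    def parse_expr(pos, min_bp=0):      # the Pratt loop
        lhs, pos = parse_atom(pos)
        while pos < len(SRC) and BINDING.get(SRC[pos], -1) >= min_bp:
            op = SRC[pos]
            rhs, pos = parse_expr(pos + 1, BINDING[op] + 1)  # left-assoc
            lhs = {"+": lhs + rhs, "-": lhs - rhs,
                   "*": lhs * rhs, "/": lhs // rhs}[op]
        return lhs, pos

    print(parse_expr(0))  # (3, 7): value 3, whole input consumed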
> writing a hand-written parser is straightforward (although laborious)
Not that much more laborious than writing a BNF grammar. You can still use a parser generator; just add some annotations for error recovery, error reporting, pretty-printing, and indentation.
> It's not straightforward to combine two parsers for two sublanguages
With PEG, or any other lexerless parsing approach, it's totally trivial: you can mix any languages you like.
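To make that concrete, a toy sketch (Python; both sublanguages are made up for illustration): since there is no shared token stream to reconcile, composing two parsers is literally one ordered choice between their start rules:

    import re

    def regex(pat):                    # terminal rule: end position or None
        rx = re.compile(pat)
        def rule(src, pos):
            m = rx.match(src, pos)
            return m.end() if m else None
        return rule

    def choice(*rules):                # PEG ordered choice: first match wins
        def rule(src, pos):
            for r in rules:
                end = r(src, pos)
                if end is not None:
                    return end
            return None
        return rule

    json_like  = regex(r'\{\s*"[a-z]+"\s*:\s*\d+\s*\}')  # sublanguage A
    arith_like = regex(r'\d+(\s*[+*]\s*\d+)*')           # sublanguage B

    value = choice(json_like, arith_like)  # the whole "combination" step

    print(value('{"x": 1}', 0))    # 8 -> matched by sublanguage A
    print(value('1 + 2 * 3', 0))   # 9 -> matched by sublanguage B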
> Writing reusable parsers.
Again, trivial with PEG, where parsers can be extensible and generic, and can be inherited, in full or in part, by subsequent parsers.
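A minimal sketch of what that inheritance looks like (Python, rules as methods; the classes are illustrative, not from a real library):

    import re

    class BaseParser:
        # Each rule: (src, pos) -> end position, or None on failure.
        def number(self, src, pos):
            m = re.match(r"\d+", src[pos:])
            return pos + m.end() if m else None

        def atom(self, src, pos):          # generic rule, open to extension
            return self.number(src, pos)

    class HexParser(BaseParser):
        # Inherits every rule; overrides exactly one of them.
        def atom(self, src, pos):
            m = re.match(r"0x[0-9a-fA-F]+", src[pos:])
            if m:
                return pos + m.end()
            return super().atom(src, pos)  # fall back to inherited rule

    print(BaseParser().atom("0xff", 0))  # 1 -- only the leading "0"
    print(HexParser().atom("0xff", 0))   # 4 -- the whole hex literal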
> Try writing a parser that allows you to reformat your source code, with whitespace and comments preserved.
And again, trivial with lexerless PEG-based parsers: nothing is thrown away during parsing, so whitespace and comments are still in the tree.
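The mechanism, reduced to a sketch (toy rules, for illustration only): every matched slice of the input, whitespace and comments included, lands in the tree, so printing the leaves reproduces the source byte for byte:

    import re

    # One flat rule set standing in for a real grammar; a real parser
    # would nest the slices but would still keep every one of them.
    PIECE = re.compile(r"#[^\n]*|\s+|\w+|=|;")

    def parse(src):
        leaves, pos = [], 0
        while pos < len(src):
            m = PIECE.match(src, pos)
            if not m:
                raise SyntaxError(f"unexpected character at {pos}")
            leaves.append(m.group())
            pos = m.end()
        return leaves

    src = "x = 1;  # keep me\ny = 2;"
    tree = parse(src)
    assert "".join(tree) == src   # lossless round-trip, comments and all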
> Try writing a parser with the hooks necessary to support code completion and auto-correct.
You're getting this for free when using PEG.
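The usual mechanism behind that is farthest-failure tracking: remember the farthest position at which a rule failed and which rules were expected there; on a truncated input, that expected-set is your completion list. A toy sketch (illustrative, not any particular library's API):

    import re

    farthest = {"pos": -1, "expected": set()}   # global for brevity

    def term(name, pat):
        rx = re.compile(pat)
        def rule(src, pos):
            m = rx.match(src, pos)
            if m:
                return m.end()
            if pos > farthest["pos"]:            # new farthest failure
                farthest["pos"], farthest["expected"] = pos, set()
            if pos == farthest["pos"]:
                farthest["expected"].add(name)
            return None
        return rule

    def seq(*rules):
        def rule(src, pos):
            for r in rules:
                pos = r(src, pos)
                if pos is None:
                    return None
            return pos
        return rule

    stmt = seq(term("'let'", r"let "), term("identifier", r"\w+ "),
               term("'='", r"= "), term("number", r"\d+"))

    stmt("let x = ", 0)            # user stopped typing mid-statement
    print(farthest["expected"])    # {'number'} -> offer numeric completions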
> The line between the lexer and parser is not clear
Just forget about lexers, once and for all. It's the 21st century already. Burn your Dragon Book.
> With PEG, or any other lexerless parsing approach, it's totally trivial: you can mix any languages you like.
Not strictly true; combining PEG parsers will always work, but it might not give you the answers you want. If you have some outer layer that has 'value=' and you want to follow it by an expression in various sub-languages, you have to try each language in turn - if language A has some weird corner-case where it happens to accept something that's an important and common case in language B, language A will always win unless you swap the two languages around, in which case B might recognise and mask some important syntax from A.
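A concrete toy version of that failure mode (Python; the two "languages" here are single regex rules, just to show the shape of the bug):

    import re

    def regex(pat):
        rx = re.compile(pat)
        def rule(src, pos):
            m = rx.match(src, pos)
            return m.end() if m else None
        return rule

    def choice(*rules):            # PEG ordered choice
        def rule(src, pos):
            for r in rules:
                end = r(src, pos)
                if end is not None:
                    return end
            return None
        return rule

    int_expr   = regex(r"\d+")         # language A: integers
    float_expr = regex(r"\d+\.\d+")    # language B: floats

    ab = choice(int_expr, float_expr)
    print(ab("1.5", 0))   # 1 -- A consumed "1" and silently masked B

    ba = choice(float_expr, int_expr)
    print(ba("1.5", 0))   # 3 -- reordering fixes this case, may break others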
Worse, your combined language parser cannot tell you that the combination of A and B cause a problem, because PEG parsers don't really support that kind of consistency-checking. It's just a bug that'll crop up at run-time.
You can get around this by manipulating the outer syntax to have special "language A" and "language B" prefixes to mark what syntax to expect, or by manually merging them to create "language AB" which has the syntax priorities you want. But in both cases, that's (potentially delicate and thoughtful) human intervention, not "straightforward combining of two parsers".
> because PEG parsers don't really support that kind of consistency-checking
Not true at all. You can easily check whether a new parser breaks anything in the old one.
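In its simplest dynamic form, that check is just differential testing (a sketch; the static verification mentioned in the EDIT below is a separate, stronger technique):

    # Every input the old parser accepted must come out the same way
    # from the extended parser. The two "parsers" here are stand-ins.
    def regressions(old_parse, new_parse, corpus):
        return [src for src in corpus
                if old_parse(src) is not None          # old grammar accepted
                and new_parse(src) != old_parse(src)]  # new one changed it

    old = lambda s: ("num", s) if s.isdigit() else None
    new = lambda s: ("hex", s) if s.startswith("0x") else old(s)

    corpus = ["1", "42", "0x1f", "abc"]
    print(regressions(old, new, corpus))   # [] -- the extension broke nothing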
And in practice you never mix languages at the same level. A more typical example of such mixing would be, say, regexp syntax embedded in JavaScript.
EDIT: if you want more details on a theory of this static PEG verification, they will be available some time later when I polish my next bunch of commits.
I like PEGs a lot and even wrote my own PEG-like parsing language. The main problem I found was that, in practice, mixing lexing and parsing is a bad idea, so I have separate lexing in my system. In principle it depends on the language, but I would say it's true for all programming languages.
It's just obvious that programming languages have separate lexical and grammatical structure. If you want to disprove that, show me some languages like C, Java, Python, etc. expressed as PEGs.
PEGs have been around for 12 years now; I don't see them being deployed widely. There are probably multiple reasons for that, but I believe usability in terms of expressing real languages is a big one. (People always harp on ordered choice; I don't think it's that big a deal because when you write a recursive descent parser, you're mostly using ordered choice too.)
You want to do as much work as possible in the less powerful computational paradigm -- that is, lex with regular languages -- and then use the more powerful paradigm (PEGs or CFG algorithms) on top of that token stream.
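In outline (Python, toy grammar): the lexer is a couple of regular expressions, and the parser never touches raw characters again:

    import re

    TOKEN = re.compile(r"\s*(?:(\d+)|(\+|\*))")   # the regular layer

    def lex(src):
        toks, pos = [], 0
        while pos < len(src):
            m = TOKEN.match(src, pos)
            if not m:
                raise SyntaxError(f"bad character at {pos}")
            toks.append(("NUM", m.group(1)) if m.group(1)
                        else ("OP", m.group(2)))
            pos = m.end()
        return toks

    def accepts(toks):
        # The more powerful layer; grammar: expr <- NUM (OP NUM)*
        if not toks or toks[0][0] != "NUM" or len(toks) % 2 == 0:
            return False
        return all(a[0] == "OP" and b[0] == "NUM"
                   for a, b in zip(toks[1::2], toks[2::2]))

    toks = lex("1 + 2 * 3")
    print(toks)           # [('NUM','1'), ('OP','+'), ('NUM','2'), ...]
    print(accepts(toks))  # True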
I believe that lexing and parsing were combined in PEGs for reasons of academic presentation and bootstrapping, not for usability or practicality for recognizing real languages.
Several of your other points are wrong, but I'll leave it at that.
In Packrat it's linear.
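A sketch of why (Python; the grammar is a classic backtracking worst case): memoize every (rule, position) pair so each is computed at most once, which bounds the work at O(rules x positions), i.e. linear in the input length:

    from functools import lru_cache

    # Grammar:
    #   expr <- term '+' expr / term
    #   term <- '(' expr ')' / digit
    SRC = "(" * 15 + "1" + ")" * 15
    calls = 0

    @lru_cache(maxsize=None)   # delete the caches and calls explodes to ~2**17
    def expr(pos):
        end = term(pos)
        if end is not None and SRC[end:end+1] == "+":
            rest = expr(end + 1)
            if rest is not None:
                return rest
        return term(pos)       # ordered-choice fallback re-parses term

    @lru_cache(maxsize=None)
    def term(pos):
        global calls
        calls += 1
        if SRC[pos:pos+1] == "(":
            end = expr(pos + 1)
            if end is not None and SRC[end:end+1] == ")":
                return end + 1
        return pos + 1 if SRC[pos:pos+1].isdigit() else None

    print(expr(0), len(SRC), calls)   # 31 31 16: one evaluation per position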