> Pratt parsing

And it's a natural match for Packrat; the two work together really well.
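For anyone who hasn't seen the combination, here is a toy sketch (Python; the grammar and names are mine, not from any particular library) of a Pratt-style precedence climber whose leaf rule is packrat-memoized, i.e. cached by input position:

    import re
    from functools import lru_cache

    SRC = "1+2*3-4"
    BINDING = {"+": 10, "-": 10, "*": 20, "/": 20}  # binding powers

    @lru_cache(maxsize=None)            # packrat = memoize by position
    def parse_atom(pos):
        m = re.match(r"\d+", SRC[pos:])
        return (int(m.group()), pos + m.end()) if m else None

    def parse_expr(pos, min_bp=0):      # the Pratt loop
        lhs, pos = parse_atom(pos)
        while pos < len(SRC) and BINDING.get(SRC[pos], -1) >= min_bp:
            op = SRC[pos]
            rhs, pos = parse_expr(pos + 1, BINDING[op] + 1)  # left-assoc
            lhs = {"+": lhs + rhs, "-": lhs - rhs,
                   "*": lhs * rhs, "/": lhs // rhs}[op]
        return lhs, pos

    print(parse_expr(0))  # (3, 7): value 3, whole input consumed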
> writing a hand-written parser is straightforward (although laborious)
Not that much more laborious than writing a BNF grammar. You can still use a parser generator; just add some annotations for error recovery, error reporting, pretty-printing, and indentation.
> It's not straightforward to combine two parsers for two sublanguages
With PEG, or any other lexerless parsing approach, it's totally trivial: you can mix any languages you like.
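To make that concrete, a toy sketch (Python; both sublanguages are made up for illustration): since there is no shared token stream to reconcile, composing two parsers is literally one ordered choice between their start rules:

    import re

    def regex(pat):                    # terminal rule: end position or None
        rx = re.compile(pat)
        def rule(src, pos):
            m = rx.match(src, pos)
            return m.end() if m else None
        return rule

    def choice(*rules):                # PEG ordered choice: first match wins
        def rule(src, pos):
            for r in rules:
                end = r(src, pos)
                if end is not None:
                    return end
            return None
        return rule

    json_like  = regex(r'\{\s*"[a-z]+"\s*:\s*\d+\s*\}')  # sublanguage A
    arith_like = regex(r'\d+(\s*[+*]\s*\d+)*')           # sublanguage B

    value = choice(json_like, arith_like)  # the whole "combination" step

    print(value('{"x": 1}', 0))    # 8 -> matched by sublanguage A
    print(value('1 + 2 * 3', 0))   # 9 -> matched by sublanguage B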
> Writing reusable parsers.
Again, trivial with PEG, where parsers can be extensible and generic, and can be inherited, in full or in part, by subsequent parsers.
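A minimal sketch of what that inheritance looks like (Python, rules as methods; the classes are illustrative, not from a real library):

    import re

    class BaseParser:
        # Each rule: (src, pos) -> end position, or None on failure.
        def number(self, src, pos):
            m = re.match(r"\d+", src[pos:])
            return pos + m.end() if m else None

        def atom(self, src, pos):          # generic rule, open to extension
            return self.number(src, pos)

    class HexParser(BaseParser):
        # Inherits every rule; overrides exactly one of them.
        def atom(self, src, pos):
            m = re.match(r"0x[0-9a-fA-F]+", src[pos:])
            if m:
                return pos + m.end()
            return super().atom(src, pos)  # fall back to inherited rule

    print(BaseParser().atom("0xff", 0))  # 1 -- only the leading "0"
    print(HexParser().atom("0xff", 0))   # 4 -- the whole hex literal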
> Try writing a parser that allows you to reformat your source code, with whitespace and comments preserved.
And again, trivial with lexerless PEG-based parsers: nothing is thrown away during parsing, so whitespace and comments are still in the tree.
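The mechanism, reduced to a sketch (toy rules, for illustration only): every matched slice of the input, whitespace and comments included, lands in the tree, so printing the leaves reproduces the source byte for byte:

    import re

    # One flat rule set standing in for a real grammar; a real parser
    # would nest the slices but would still keep every one of them.
    PIECE = re.compile(r"#[^\n]*|\s+|\w+|=|;")

    def parse(src):
        leaves, pos = [], 0
        while pos < len(src):
            m = PIECE.match(src, pos)
            if not m:
                raise SyntaxError(f"unexpected character at {pos}")
            leaves.append(m.group())
            pos = m.end()
        return leaves

    src = "x = 1;  # keep me\ny = 2;"
    tree = parse(src)
    assert "".join(tree) == src   # lossless round-trip, comments and all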
> Try writing a parser with the hooks necessary to support code completion and auto-correct.
You're getting this for free when using PEG.
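The usual mechanism behind that is farthest-failure tracking: remember the farthest position at which a rule failed and which rules were expected there; on a truncated input, that expected-set is your completion list. A toy sketch (illustrative, not any particular library's API):

    import re

    farthest = {"pos": -1, "expected": set()}   # global for brevity

    def term(name, pat):
        rx = re.compile(pat)
        def rule(src, pos):
            m = rx.match(src, pos)
            if m:
                return m.end()
            if pos > farthest["pos"]:            # new farthest failure
                farthest["pos"], farthest["expected"] = pos, set()
            if pos == farthest["pos"]:
                farthest["expected"].add(name)
            return None
        return rule

    def seq(*rules):
        def rule(src, pos):
            for r in rules:
                pos = r(src, pos)
                if pos is None:
                    return None
            return pos
        return rule

    stmt = seq(term("'let'", r"let "), term("identifier", r"\w+ "),
               term("'='", r"= "), term("number", r"\d+"))

    stmt("let x = ", 0)            # user stopped typing mid-statement
    print(farthest["expected"])    # {'number'} -> offer numeric completions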
> The line between the lexer and parser is not clear
Just forget about lexers, once and for all. It's the 21st century already. Burn your Dragon Book.
> With PEG, or any other lexerless parsing approach, it's totally trivial: you can mix any languages you like.
Not strictly true; combining PEG parsers will always work, but it might not give you the answers you want. If you have some outer layer that has 'value=' and you want to follow it by an expression in various sub-languages, you have to try each language in turn - if language A has some weird corner-case where it happens to accept something that's an important and common case in language B, language A will always win unless you swap the two languages around, in which case B might recognise and mask some important syntax from A.
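A concrete toy version of that failure mode (Python; the two "languages" here are single regex rules, just to show the shape of the bug):

    import re

    def regex(pat):
        rx = re.compile(pat)
        def rule(src, pos):
            m = rx.match(src, pos)
            return m.end() if m else None
        return rule

    def choice(*rules):            # PEG ordered choice
        def rule(src, pos):
            for r in rules:
                end = r(src, pos)
                if end is not None:
                    return end
            return None
        return rule

    int_expr   = regex(r"\d+")         # language A: integers
    float_expr = regex(r"\d+\.\d+")    # language B: floats

    ab = choice(int_expr, float_expr)
    print(ab("1.5", 0))   # 1 -- A consumed "1" and silently masked B

    ba = choice(float_expr, int_expr)
    print(ba("1.5", 0))   # 3 -- reordering fixes this case, may break others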
Worse, your combined language parser cannot tell you that the combination of A and B cause a problem, because PEG parsers don't really support that kind of consistency-checking. It's just a bug that'll crop up at run-time.
You can get around this by manipulating the outer syntax to have special "language A" and "language B" prefixes to mark what syntax to expect, or by manually merging them to create "language AB" which has the syntax priorities you want. But in both cases, that's (potentially delicate and thoughtful) human intervention, not "straightforward combining of two parsers".
> because PEG parsers don't really support that kind of consistency-checking
Not true at all. You can easily check whether a new parser breaks anything in the old one.
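In its simplest dynamic form, that check is just differential testing (a sketch; the static verification mentioned in the EDIT below is a separate, stronger technique):

    # Every input the old parser accepted must come out the same way
    # from the extended parser. The two "parsers" here are stand-ins.
    def regressions(old_parse, new_parse, corpus):
        return [src for src in corpus
                if old_parse(src) is not None          # old grammar accepted
                and new_parse(src) != old_parse(src)]  # new one changed it

    old = lambda s: ("num", s) if s.isdigit() else None
    new = lambda s: ("hex", s) if s.startswith("0x") else old(s)

    corpus = ["1", "42", "0x1f", "abc"]
    print(regressions(old, new, corpus))   # [] -- the extension broke nothing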
And in practice you never mix languages at the same level. A more typical example of such mixing would be, say, regexp syntax embedded in JavaScript.
EDIT: if you want more details on a theory of this static PEG verification, they will be available some time later when I polish my next bunch of commits.
I like PEGs a lot and even wrote my own PEG-like parsing language. The main problem I found was that, in practice, mixing lexing and parsing is a bad idea, so I have separate lexing in my system. In principle it depends on the language, but I would say it's true for all programming languages.
It's just obvious that programming languages have separate lexical and grammatical structure. If you want to disprove that, show me some languages like C, Java, Python, etc. expressed as PEGs.
PEGs have been around for 12 years now; I don't see them being deployed widely. There are probably multiple reasons for that, but I believe usability in terms of expressing real languages is a big one. (People always harp on ordered choice; I don't think it's that big a deal because when you write a recursive descent parser, you're mostly using ordered choice too.)
You want to do as much work as possible in the less powerful computational paradigm -- that is, lex with regular languages -- and then use the more powerful paradigm (PEGs or CFG algorithms) on top of that token stream.
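In outline (Python, toy grammar): the lexer is a couple of regular expressions, and the parser never touches raw characters again:

    import re

    TOKEN = re.compile(r"\s*(?:(\d+)|(\+|\*))")   # the regular layer

    def lex(src):
        toks, pos = [], 0
        while pos < len(src):
            m = TOKEN.match(src, pos)
            if not m:
                raise SyntaxError(f"bad character at {pos}")
            toks.append(("NUM", m.group(1)) if m.group(1)
                        else ("OP", m.group(2)))
            pos = m.end()
        return toks

    def accepts(toks):
        # The more powerful layer; grammar: expr <- NUM (OP NUM)*
        if not toks or toks[0][0] != "NUM" or len(toks) % 2 == 0:
            return False
        return all(a[0] == "OP" and b[0] == "NUM"
                   for a, b in zip(toks[1::2], toks[2::2]))

    toks = lex("1 + 2 * 3")
    print(toks)           # [('NUM','1'), ('OP','+'), ('NUM','2'), ...]
    print(accepts(toks))  # True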
I believe that lexing and parsing were combined in PEGs for reasons of academic presentation and bootstrapping, not for usability or practicality for recognizing real languages.
Several of your other points are wrong, but I'll leave it at that.
In Packrat it's linear.
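A sketch of why (Python; the grammar is a classic backtracking worst case): memoize every (rule, position) pair so each is computed at most once, which bounds the work at O(rules x positions), i.e. linear in the input length:

    from functools import lru_cache

    # Grammar:
    #   expr <- term '+' expr / term
    #   term <- '(' expr ')' / digit
    SRC = "(" * 15 + "1" + ")" * 15
    calls = 0

    @lru_cache(maxsize=None)   # delete the caches and calls explodes to ~2**17
    def expr(pos):
        end = term(pos)
        if end is not None and SRC[end:end+1] == "+":
            rest = expr(end + 1)
            if rest is not None:
                return rest
        return term(pos)       # ordered-choice fallback re-parses term

    @lru_cache(maxsize=None)
    def term(pos):
        global calls
        calls += 1
        if SRC[pos:pos+1] == "(":
            end = expr(pos + 1)
            if end is not None and SRC[end:end+1] == ")":
                return end + 1
        return pos + 1 if SRC[pos:pos+1].isdigit() else None

    print(expr(0), len(SRC), calls)   # 31 31 16: one evaluation per position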