
sbrk is emulated via mmap today on macOS and other systems; it's an abstraction (for bump allocation), not just a function.

You can easily implement a custom sbrk on top of mmap.
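
For example, here is a rough sketch in C (names like my_sbrk and HEAP_CAP are made up for illustration, not from any real allocator): reserve a large anonymous region up front with mmap and bump an offset inside it, the way sbrk bumps the program break.

    /* Rough sketch only: emulate sbrk() on top of mmap() by reserving a
       region up front and bumping an offset inside it. */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define HEAP_CAP (64u * 1024 * 1024)   /* 64 MB reserved up front */

    static uint8_t *heap_base = NULL;      /* start of the mmap'ed region    */
    static size_t   heap_brk  = 0;         /* current "program break" offset */

    void *my_sbrk(ptrdiff_t increment) {
        if (heap_base == NULL) {
            heap_base = mmap(NULL, HEAP_CAP, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (heap_base == MAP_FAILED)
                return (void *)-1;
        }
        if (increment < 0 && (size_t)-increment > heap_brk)
            return (void *)-1;                 /* can't shrink below the base */
        if (increment > 0 && heap_brk + (size_t)increment > HEAP_CAP)
            return (void *)-1;                 /* out of reserved space       */

        void *old_brk = heap_base + heap_brk;  /* sbrk returns the old break  */
        heap_brk += increment;
        return old_brk;
    }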


Thanks for the feedback, and glad to see more engineers interested in deeper topics!


That's a great point, and there is an open issue (https://github.com/DmitrySoshnikov/syntax/issues/99) to add support for IELR in Syntax.


Yes, in fact when building a language the parsing stage can be skipped altogether at first (start with an interpreter or bytecode). We do this in the interpreters class. And once you have a fully working VM, _now_ is a good time to shift to parsing and to designing a good syntax.

For recursive descent we have a separate class, "Building a Recursive descent parser from scratch", which is a purely hands-on coding class for those interested mainly in practice.


Yes, backtracking might still be an option, although it has known limitations when it comes to exploring parallel paths. We describe backtracking in this class too.

LL, in the form of a hand-written recursive descent parser, is the most used in practice, along with parser combinators and LALR(1).


Professor Aiken is a great teacher and I love his compilers course. However, as for the parsing stage, that course goes at most up to SLR(1), which is a pretty "toy" parsing mode. That's the problem with a combined "compilers class" -- one simply can't fit everything in, and everything becomes slightly superficial. That's why I have the Parsers and the Garbage Collectors classes as separate, fully specialized courses.


Thank you for the feedback, glad you liked it, and glad to see more people interested in deeper CS topics!


Yes, I recommend the "Parsing Techniques" book.


See this small summary doc on different techniques for error recovery: https://gist.github.com/DmitrySoshnikov/feee52cbfb03b7b69110...


Great details, thanks!

> A benefit of a syntax with indentation-defined block structure is that you don't need to rely on balanced grouping tokens like { ... }

In fact, from the lexer's perspective there is no big difference: the matching INDENT/DEDENT pair is just a token type, the same as { and } would be.
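
A rough sketch of that idea in C (emit() and the token kinds are made-up names, not from any particular code base): the lexer keeps a stack of indentation widths and, at the start of each logical line, either opens a block with a synthetic INDENT or closes blocks with DEDENTs -- tokens the parser consumes exactly as it would { and }.

    /* Sketch of the indentation-stack logic in a hand-written lexer. */
    #include <stdio.h>

    enum { TOK_INDENT, TOK_DEDENT };

    static int indent_stack[64] = {0};   /* widths of the currently open blocks */
    static int indent_top = 0;           /* index of the current block's width  */

    static void emit(int kind) { printf("emit token %d\n", kind); }

    /* Called at the start of every logical line with its indentation width. */
    void handle_indentation(int width) {
        if (width > indent_stack[indent_top]) {      /* deeper: open a block */
            indent_stack[++indent_top] = width;
            emit(TOK_INDENT);                        /* plays the role of '{' */
        } else {
            while (indent_top > 0 && width < indent_stack[indent_top]) {
                indent_top--;                        /* shallower: close blocks */
                emit(TOK_DEDENT);                    /* plays the role of '}' */
            }
            /* If width doesn't land exactly on a level still on the stack,
               report an indentation error here. */
        }
    }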


Indeed, the difference is that the lexer offers guarantees about the synthetic INDENT/DEDENT tokens. From an error-sync perspective, the benefit is that the programmer (redundantly) re-asserts the block level on every line via the amount of indentation. As a small addendum on Python's suspension of indentation tracking when nesting > 0: when I designed the syntax for indentation-based block structure in another language, I required such nested code to always be indented beyond the current block's level, even though no INDENT/DEDENT/NEWLINE tokens are emitted in this state. So this was legal:

    x = (1 +
        2)

    x = (1 +
            2)

    x = (1 +
      2)

    x = (1 +
     2)
But this was illegal:

    x = (1 +
    2)
The legal variants are all identical to

    x = (1 + 2)
from the parser's perspective. Adding this restriction (which is already the idiomatic way to indent nested multi-line expressions) means that you can reliably sync to block levels even when recovering from an error in a nested state. If your lexer already strips leading indentation from multi-line string literals, you could add a similar constraint for them.
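
A tiny sketch of that extra check, with made-up names: while bracket nesting is greater than zero the lexer emits no INDENT/DEDENT/NEWLINE tokens, but it can still insist that continuation lines are indented strictly deeper than the enclosing block, which is what keeps block levels usable as sync points.

    /* Sketch only: nesting_depth counts unbalanced (, [, {, and
       block_width is the indentation of the current block. */
    int continuation_indent_ok(int nesting_depth, int line_width, int block_width) {
        if (nesting_depth > 0 && line_width <= block_width) {
            /* e.g. "x = (1 +\n2)": continuation not indented past the block */
            return 0;   /* indentation error; a natural sync point for recovery */
        }
        return 1;       /* top level, or indented deep enough */
    }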

The moral of a lot of these tricks is that by turning idioms and conventions into language-enforced constraints, you can detect programmer errors more reliably and do a better job of error recovery. That said, even in a curly-brace language like C# you could still use the indentation structure as a heuristic guide for error recovery -- it's just going to be less reliable.


Yeah, this makes sense, thanks.

