Indeed, the difference is that the lexer offers guarantees about the synthetic INDENT/DEDENT tokens. From an error sync perspective, the benefit is that the programmer (redundantly) re-asserts the block level on every line through the amount of indentation. As a small addendum on Python's suspension of indentation tracking while bracket nesting is > 0: when I designed the syntax for indentation-based block structure in another language, I required such nested code to always be indented beyond the current block's level, even though no INDENT/DEDENT/NEWLINE tokens are emitted in this state. So this was legal:
x = (1 +
 2)
x = (1 +
  2)
x = (1 +
    2)
x = (1 +
     2)
But this was illegal:
x = (1 +
2)
The legal variants are all identical to
x = (1 + 2)
from the parser's perspective. Adding this restriction (which is already the idiomatic way to indent nested multi-line expressions) means that you can reliably sync to block levels even when recovering from an error in a nested state. If your lexer already strips leading indentation from multi-line string literals, you could add a similar constraint for them.
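To make the rule concrete, here is a minimal sketch of the check in Python. It is not the lexer from the language I mentioned; the function name `check_continuation_indent` is made up for illustration, and it assumes space-only indentation and ignores brackets inside string literals and comments, which a real lexer would have to handle.

```python
def check_continuation_indent(source: str) -> list[int]:
    """Return 1-based numbers of continuation lines that are not
    indented beyond the level of the line that opened the nesting."""
    openers, closers = "([{", ")]}"
    depth = 0         # bracket nesting depth carried across lines
    block_indent = 0  # indentation of the line where nesting started
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines carry no indentation information
        indent = len(line) - len(stripped)
        if depth > 0:
            # Nested state: no INDENT/DEDENT/NEWLINE would be emitted,
            # but the line must still sit strictly inside the block.
            if indent <= block_indent:
                violations.append(lineno)
        else:
            # Not nested: this line defines the current block level.
            block_indent = indent
        # Assumption: every bracket character is a real bracket
        # (no string literals or comments are recognised here).
        for ch in stripped:
            if ch in openers:
                depth += 1
            elif ch in closers and depth > 0:
                depth -= 1
    return violations
```

With this sketch, `"x = (1 +\n  2)\n"` yields no violations, while `"x = (1 +\n2)\n"` reports line 2. Because a violation is detected purely from the line's leading whitespace, the same check tells an error-recovery routine where the enclosing block resumes.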
The moral of a lot of these tricks is that by turning idioms and conventions into language-enforced constraints, you can detect programmer errors more reliably and you can do a better job of error recovery. That said, even in a curly-brace language like C# you could still use the indentation structure as a heuristic guide for error recovery--it's just going to be less reliable.
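As a rough illustration of the heuristic version for a curly-brace language, the sketch below (Python; `find_sync_line` and its parameters are hypothetical names, and space-only indentation is assumed) skips forward from the line where the parser got stuck to the first non-blank line that is indented no deeper than the block containing the error. Since nothing forces the programmer to indent consistently in such a language, the result is only a guess at where to resume.

```python
def find_sync_line(lines: list[str], error_line: int, block_indent: int) -> int:
    """Return the index of the first non-blank line after `error_line`
    whose indentation is at most `block_indent`, or len(lines) if none."""
    for i in range(error_line + 1, len(lines)):
        stripped = lines[i].lstrip(" ")
        if not stripped:
            continue  # blank lines say nothing about block structure
        indent = len(lines[i]) - len(stripped)
        if indent <= block_indent:
            return i  # likely start of the next statement or block end
    return len(lines)


# Hypothetical C#-like input with an unclosed parenthesis on line index 1.
example = [
    "void F() {",
    "    int x = (1 +",
    "        2;",
    "    int y = 3;",
    "}",
]
resume_at = find_sync_line(example, error_line=1, block_indent=4)
```

Here the parser would resume at the `int y = 3;` line, discarding only the damaged statement instead of everything up to the closing brace.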