I liked the first half of the article, but I'm not sure I got anything from the ...

Y_Y · 2024-12-02T09:56:14 1733133374

I also struggled with the "bicameral" definition. The best I could come up with is that because e.g. Scheme represents code and data in the same way (isn't there a word for this?) it's possible to represent and manipulate (semantically) invalid code. This is because the semantics are done in the other "chamber". The example given was `(lambda 1)` which is a perfectly good sexp, but will error if you eval it.
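To make the two chambers concrete, here is a minimal sketch in Python rather than Scheme. The names `read` and `parse` and the tuple shapes they return are my own choices for illustration, not anything from the article. The reader only knows about parentheses and atoms; the parser is the part that knows what a lambda has to look like.

```python
def read(text):
    """Reader: turn text into nested lists and atoms, with no semantic checks."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_form():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read_form())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the closing ')'
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        try:
            return int(tok)   # numbers become ints
        except ValueError:
            return tok        # everything else stays a symbol (a str)

    form = read_form()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return form


def parse(form):
    """Parser: check that a form the reader accepted is a well-formed term."""
    if isinstance(form, list) and form and form[0] == "lambda":
        if len(form) != 3 or not isinstance(form[1], list):
            raise SyntaxError("lambda needs a parameter list and a body")
        if not all(isinstance(p, str) for p in form[1]):
            raise SyntaxError("lambda parameters must be symbols")
        return ("lambda", tuple(form[1]), parse(form[2]))
    if isinstance(form, list):
        return ("call", [parse(f) for f in form])
    return ("atom", form)


tree = read("(lambda 1)")  # the reader is satisfied with this
try:
    parse(tree)            # the parser is not
    parse_error = None
except SyntaxError as exc:
    parse_error = str(exc)
```

So `(lambda 1)` exists as a perfectly ordinary value (`['lambda', 1]`) that you can hold and manipulate, and it only becomes an error once the second chamber looks at it.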

This could be contrasted with C where code (maybe more precisely program logic) is opaque (modulo preprocessor) and can only be represented by function pointers (unless you're doing shellcode). Here the chamber that does the parsing from text (if we don't look inside GCC) also does semantic "checking" and so while valid functions can be represented within C (via the memory contents at the function pointer), the unchecked AST or some partial program is not represented.

I've tried not to give too many parentheticals above, but I'm not sure the concept holds water if you play tricks. Any Turing machine can represent any program, presumably in a way that admits cutting it up into atoms and rearranging to an arbitrary (potentially invalid) form. I'd be surprised if this hasn't been discussed in more detail somewhere in the literature.


chubot · 2024-12-02T16:23:39 1733156619

It excludes languages that build a single AST directly from tokens. I am pretty sure Clang is like this, and probably V8. (They don't have structured macros, so it's not observable by users.)

As opposed to building first an untyped CST (concrete syntax tree), and then transforming that into a typed AST.

CPython does exactly this, but it has no macro stage either, so it's not exposed to users. (Python/ast.c is the CST -> AST transformation. It transforms an untyped tree to a typed tree.)

So the key reason it matters is that it's a place to insert the macro stage.
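A rough sketch of what "a place to insert the macro stage" means, in Python. I'm assuming the reader has already produced nested lists; `expand`, `parse`, and the `unless` macro are made-up names for illustration only.

```python
def expand(form):
    """Macro stage: rewrite (unless c body) into (if (not c) body), recursively.

    It works on the untyped tree, so it needs no knowledge of the typed AST.
    """
    if not isinstance(form, list):
        return form
    form = [expand(f) for f in form]
    if len(form) == 3 and form[0] == "unless":
        _, cond, body = form
        return ["if", ["not", cond], body]
    return form


def parse(form):
    """Parser: turns the untyped tree into tagged tuples; knows 'if' only."""
    if isinstance(form, list) and form and form[0] == "if":
        if len(form) not in (3, 4):
            raise SyntaxError("if needs a condition and one or two branches")
        return ("if", *[parse(f) for f in form[1:]])
    if isinstance(form, list):
        return ("call", *[parse(f) for f in form])
    return ("atom", form)


untyped = ["unless", "done", ["step"]]  # what a reader would hand over
typed = parse(expand(untyped))
```

The parser never learns about `unless`. If tokens went straight to a typed AST there would be no untyped tree for `expand` to operate on.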

---

I agree that the word "bicameral" is confusing people, but it basically means "reader --> parser" as opposed to just "parser".

The analogies in the article are very clear to me -- in this world, JSON and XML parsers are "readers", but they are NOT "parsers"! (And yes, that probably confuses many people; some new words may be necessary.)

The JSON Schema or XML Schema would be closer to the parser -- it determines whether you have a "for loop" or "if statement", or an "employee" and "job title", etc.
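Here's the same split with JSON, as a small Python sketch. `json.loads` is the real standard-library reader; the `Employee` class, its field names, and `parse_employee` are hypothetical stand-ins for whatever a schema would describe.

```python
import json
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    job_title: str


def parse_employee(value):
    """'Parser' layer: decide whether generic JSON data is an employee."""
    if not isinstance(value, dict):
        raise ValueError("employee must be a JSON object")
    name = value.get("name")
    title = value.get("job_title")
    if not isinstance(name, str) or not isinstance(title, str):
        raise ValueError("employee needs string 'name' and 'job_title'")
    return Employee(name, title)


# Reader layer: both inputs are well-formed JSON, so both are accepted.
good = json.loads('{"name": "Ada", "job_title": "Engineer"}')
bad = json.loads('{"name": 1}')

employee = parse_employee(good)
try:
    parse_employee(bad)
    rejected = False
except ValueError:
    rejected = True
```

`bad` is the JSON equivalent of `(lambda 1)`: fine for the reader, rejected by the layer that knows what an employee is.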

Another clarifying comment - https://lobste.rs/s/ici6ek/bicameral_not_homoiconic#c_bmx0vf

chubot · 2024-12-02T17:07:07 1733159227

I'll also argue that the ideas in this post absolutely matter in practice.

For example, GitHub Actions uses YAML as its Reader / S-expression / CST layer.

And then it has a separate "parser", for say "if" nodes, and then another parser for the string value of those "if" nodes.

https://docs.github.com/en/actions/writing-workflows/workflo...

    if: ${{ ! startsWith(github.ref, 'refs/tags/') }}

    if: github.repository == 'octo-org/octo-repo-prod'

This fact is poorly exposed to users:

> You must always use the ${{ }} expression syntax or escape with '', "", or () when the expression starts with !, since ! is reserved notation in YAML format.
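A toy model of the layering, in Python. This is not GitHub's implementation: I'm assuming a YAML reader has already produced plain dicts, and `unwrap_if` is a made-up helper that only strips the optional `${{ }}` wrapper, leaving the expression text for yet another parser.

```python
def unwrap_if(raw):
    """Second layer: pull the expression text out of an 'if' string value."""
    text = raw.strip()
    if text.startswith("${{") and text.endswith("}}"):
        return text[3:-2].strip()
    return text


# Pretend output of the YAML reader for the two 'if' lines quoted above.
steps = [
    {"if": "${{ ! startsWith(github.ref, 'refs/tags/') }}"},
    {"if": "github.repository == 'octo-org/octo-repo-prod'"},
]

# What a third layer (the expression parser) would receive.
exprs = [unwrap_if(step["if"]) for step in steps]
```

The `!` rule in the quoted docs comes from the first layer: the YAML reader gets to interpret a leading `!` before the `if` layer ever sees the string, so a rule from one chamber leaks into the syntax of another.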

So I feel that they could have done a better job with language design by taking some lessons from the past.

GitLab has the same kind of hacky language on top of YAML, as far as I remember.