Another way to think about this is to imagine you are working in a lazy-by-default language like Haskell. The code might seem to build up an AST and then evaluate it, but (disregarding parse errors) the AST is never materialized in memory at once: the tree might only have one level, with its children represented by thunks that continue to parse more of the source code. (If you are unfamiliar with Haskell laziness, imagine that the so-called tree has children that are functions to produce the next level of the tree.) Of course you will need a carefully designed bytecode in order to generate bytecode from AST where the children is not yet known.