
> What interpreters such as GNU APL does is to first evaluate b+c, resulting in a new array, and then add that result to a.

Is this really what most implementations would do? I'd assume it would parse the expression `a+b+c` into a tree like

     +
    / \
   a   +
      / \
     b   c
But then run a pass on the tree to identify expressions like this that could be fused together without creating intermediate results. I know that most APL implementations special-case certain idioms (commonly used sequences of operators/functions) with more efficient code, so I figured they must be doing something like this to be able to identify those idioms.

This is just conjecture though, I'd love to know more about how this actually works in practice.
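
For what it's worth, here's a minimal sketch (plain Python, not taken from any real APL implementation) of the kind of fusion pass described above: flatten a chain of elementwise + nodes in the parse tree and evaluate them in a single loop, so the intermediate array for b+c is never allocated.

    # Hypothetical sketch: fuse a chain of elementwise + nodes into one loop,
    # avoiding the intermediate array that a naive tree walk would allocate.

    def collect_add_leaves(node):
        """Flatten a tree of ('+', left, right) nodes into a list of leaf arrays."""
        if isinstance(node, tuple) and node[0] == '+':
            return collect_add_leaves(node[1]) + collect_add_leaves(node[2])
        return [node]  # a leaf: an actual array

    def eval_fused(node):
        leaves = collect_add_leaves(node)
        n = len(leaves[0])
        out = [0] * n
        for i in range(n):          # one pass, no intermediate arrays
            s = 0
            for leaf in leaves:
                s += leaf[i]
            out[i] = s
        return out

    a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
    tree = ('+', a, ('+', b, c))    # the parse tree for a+b+c
    print(eval_fused(tree))         # [111, 222, 333]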



Some implementations use reference counting. So in your example, for +, the output requires as much memory as either input, and if at least one of b or c has a ref count of 1, then you can reuse that input for the output. Otherwise you have to allocate memory to hold the sum of b and c. But that intermediate sum has a ref count of 1 and can be reused for the final output.

The key is to code your primitives in such a way that you can reuse one of the inputs as the output.
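
A minimal sketch of that idea in Python rather than a real interpreter's internals (the Arr wrapper and the manually set refcounts are stand-ins for what the runtime would track):

    # Hypothetical sketch of refcount-based reuse for an elementwise +.
    # If an operand's buffer has only one owner, write the result into it
    # instead of allocating a fresh array.

    class Arr:
        def __init__(self, data):
            self.data = data
            self.refcount = 1   # a real runtime maintains this automatically

    def add(x, y):
        # Pick a destination: reuse x or y if nothing else references it.
        if x.refcount == 1:
            dst = x
        elif y.refcount == 1:
            dst = y
        else:
            dst = Arr([0] * len(x.data))   # no reusable input: allocate
        for i in range(len(x.data)):
            dst.data[i] = x.data[i] + y.data[i]
        return dst

    # a+b+c evaluated right to left: the temporary (b+c) has refcount 1,
    # so the outer + can reuse its buffer for the final result.
    a, b, c = Arr([1, 2, 3]), Arr([10, 20, 30]), Arr([100, 200, 300])
    a.refcount = b.refcount = c.refcount = 2   # pretend each is also bound to a name
    print(add(a, add(b, c)).data)              # [111, 222, 333]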

You can also do clever tricks that describe the array transformations rather than perform them. It’s similar to what happens during stream fusion in Haskell. See https://news.ycombinator.com/item?id=17506789 for more details.
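
A toy illustration of the "describe rather than perform" approach (the Lazy class below is hypothetical, just to show the shape of the trick): each + builds a description of how to compute one element, and nothing is materialized until the result is actually needed.

    # Hypothetical sketch: an expression object computes elements on demand,
    # so a+b+c never materializes the intermediate b+c.

    class Lazy:
        def __init__(self, get, length):
            self.get = get          # function: index -> element
            self.length = length

        def __add__(self, other):
            return Lazy(lambda i: self.get(i) + other.get(i), self.length)

        def force(self):            # materialize only when a concrete array is needed
            return [self.get(i) for i in range(self.length)]

    def wrap(xs):
        return Lazy(lambda i: xs[i], len(xs))

    a, b, c = wrap([1, 2, 3]), wrap([10, 20, 30]), wrap([100, 200, 300])
    print((a + b + c).force())      # [111, 222, 333], one pass, no intermediate array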


IIRC numexpr got about 2x-5x faster when it switched from numpy's approach (which naively works mostly as you describe: one op at a time, refcounted, though with no good reuse) to a VM that reuses temporary result memory and works in 8KB pages (thus getting way, way better cache use).

But I'm not familiar with an APL/J/K implementation that does.
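
For anyone curious, a rough sketch of the chunked-evaluation idea (the block size and structure are illustrative only, not numexpr's actual internals): compute b+c and then add a, block by block, reusing one small temporary so the working set stays in cache.

    # Rough sketch of chunked evaluation in the spirit of numexpr: process
    # the operands in small fixed-size blocks, reusing the same temporary
    # buffer for every block.

    CHUNK = 1024   # elements per block (numexpr works in small pages, e.g. ~8KB)

    def add3_chunked(a, b, c):
        n = len(a)
        out = [0] * n
        tmp = [0] * CHUNK                  # one small, reused temporary
        for start in range(0, n, CHUNK):
            stop = min(start + CHUNK, n)
            m = stop - start
            for i in range(m):             # tmp = b + c for this block
                tmp[i] = b[start + i] + c[start + i]
            for i in range(m):             # out = a + tmp for this block
                out[start + i] = a[start + i] + tmp[i]
        return out

    a = list(range(5000))
    b = list(range(5000))
    c = list(range(5000))
    assert add3_chunked(a, b, c) == [3 * i for i in range(5000)]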


I didn’t know about numexpr. That looks really interesting. Thanks.


Typical practice seems to be to have a large collection of ad hoc local transformations that look for specific combinations of operations. I've only really seen more general static analysis aimed at not materializing intermediate arrays in academic research.

Expression tree IR is maybe the wrong setting to do this sort of fusion for APL. What you want for a+b+c is a 3-argument scalar function applied to a, b, and c, but you need something other than actual APL expressions for that.
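
Something like this, roughly (scalar_apply and the fused lambda are hypothetical names, just to show the shape): rewrite a+b+c as one 3-argument scalar function mapped over the arguments, instead of two 2-argument applications of +.

    # Sketch of the idea above: a single n-argument scalar function applied
    # elementwise, with no intermediate arrays.

    def scalar_apply(f, *arrays):
        return [f(*elems) for elems in zip(*arrays)]

    fused = lambda x, y, z: x + y + z     # the fused 3-argument scalar function

    a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
    print(scalar_apply(fused, a, b, c))   # [111, 222, 333]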


The full APL grammar is not context-free:

https://www.youtube.com/watch?v=7Z0wzY9Oip4

The strict top-down, right-to-left evaluation makes parsing unnecessary for interpreters; you just need a tokenizer. Obviously this became a problem when people started to think about compilers.
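
A toy illustration of why a tokenizer suffices (scalars instead of arrays, no parentheses, only dyadic functions, purely to keep it short): with uniform precedence and right-to-left evaluation, you can consume the token stream from the right, keeping only the value computed so far.

    # Minimal sketch (not any real APL implementation) of evaluating a flat
    # token stream right to left without building a parse tree.

    OPS = {'+': lambda x, y: x + y,
           '-': lambda x, y: x - y,
           '*': lambda x, y: x * y}

    def eval_rtl(tokens, env):
        """tokens: flat list like ['a', '+', 'b', '+', 'c']."""
        toks = list(tokens)
        value = env[toks.pop()]            # rightmost operand
        while toks:
            op = toks.pop()                # next token to the left is a function
            left = env[toks.pop()]         # then its left argument
            value = OPS[op](left, value)   # apply right-to-left
        return value

    env = {'a': 2, 'b': 3, 'c': 4}
    print(eval_rtl(['a', '-', 'b', '-', 'c'], env))   # 2 - (3 - 4) = 3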



