
> What interpreters such as GNU APL does is to first evaluate b+c, resulting in a new array, and then add that result to a.

Is this really what most implementations would do? I'd assume it would parse the expression `a+b+c` into a tree like

     +
    / \
   a   +
      / \
     b   c
But then run a pass on the tree to identify expressions like this that could be fused together without creating intermediate results. I know that most APL implementations special-case certain idioms (commonly used sequences of operators/functions) with more efficient code, so I figured they must be doing something like this to be able to identify those idioms.

This is just conjecture though, I'd love to know more about how this actually works in practice.
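
For what it's worth, here's a minimal sketch (plain Python, not taken from any real APL implementation) of the kind of fusion pass described above: flatten a chain of elementwise + nodes in the parse tree and evaluate them in a single loop, so the intermediate array for b+c is never allocated.

    # Hypothetical sketch: fuse a chain of elementwise + nodes into one loop,
    # avoiding the intermediate array that a naive tree walk would allocate.

    def collect_add_leaves(node):
        """Flatten a tree of ('+', left, right) nodes into a list of leaf arrays."""
        if isinstance(node, tuple) and node[0] == '+':
            return collect_add_leaves(node[1]) + collect_add_leaves(node[2])
        return [node]  # a leaf: an actual array

    def eval_fused(node):
        leaves = collect_add_leaves(node)
        n = len(leaves[0])
        out = [0] * n
        for i in range(n):          # one pass, no intermediate arrays
            s = 0
            for leaf in leaves:
                s += leaf[i]
            out[i] = s
        return out

    a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
    tree = ('+', a, ('+', b, c))    # the parse tree for a+b+c
    print(eval_fused(tree))         # [111, 222, 333]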



Some implementations use reference counting. So in your example, for +, the output requires as much memory as either input, and if at least one of b or c has a ref count of 1, then you can reuse that input for the output. Otherwise you have to allocate memory to hold the sum of b and c. But that intermediate sum has a ref count of 1 and can be reused for the final output.

The key is to code your primitives in such a way that you can reuse one of the inputs as the output.
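
A minimal sketch of that idea in Python rather than a real interpreter's internals (the Arr wrapper and the manually set refcounts are stand-ins for what the runtime would track):

    # Hypothetical sketch of refcount-based reuse for an elementwise +.
    # If an operand's buffer has only one owner, write the result into it
    # instead of allocating a fresh array.

    class Arr:
        def __init__(self, data):
            self.data = data
            self.refcount = 1   # a real runtime maintains this automatically

    def add(x, y):
        # Pick a destination: reuse x or y if nothing else references it.
        if x.refcount == 1:
            dst = x
        elif y.refcount == 1:
            dst = y
        else:
            dst = Arr([0] * len(x.data))   # no reusable input: allocate
        for i in range(len(x.data)):
            dst.data[i] = x.data[i] + y.data[i]
        return dst

    # a+b+c evaluated right to left: the temporary (b+c) has refcount 1,
    # so the outer + can reuse its buffer for the final result.
    a, b, c = Arr([1, 2, 3]), Arr([10, 20, 30]), Arr([100, 200, 300])
    a.refcount = b.refcount = c.refcount = 2   # pretend each is also bound to a name
    print(add(a, add(b, c)).data)              # [111, 222, 333]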

You can also do clever tricks that describe the array transformations rather than perform them. It’s similar to what happens during stream fusion in Haskell. See https://news.ycombinator.com/item?id=17506789 for more details.
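
A toy illustration of the "describe rather than perform" approach (the Lazy class below is hypothetical, just to show the shape of the trick): each + builds a description of how to compute one element, and nothing is materialized until the result is actually needed.

    # Hypothetical sketch: an expression object computes elements on demand,
    # so a+b+c never materializes the intermediate b+c.

    class Lazy:
        def __init__(self, get, length):
            self.get = get          # function: index -> element
            self.length = length

        def __add__(self, other):
            return Lazy(lambda i: self.get(i) + other.get(i), self.length)

        def force(self):            # materialize only when a concrete array is needed
            return [self.get(i) for i in range(self.length)]

    def wrap(xs):
        return Lazy(lambda i: xs[i], len(xs))

    a, b, c = wrap([1, 2, 3]), wrap([10, 20, 30]), wrap([100, 200, 300])
    print((a + b + c).force())      # [111, 222, 333], one pass, no intermediate array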


IIRC numexpr got about 2x-5x faster when it switched from numpy's approach (which naively works mostly as you describe: one op at a time, refcounted, though with no good reuse) to a VM that reuses temporary result memory and works in 8KB pages (thus getting way, way better cache use).

But I'm not familiar with an APL/J/K implementation that does.
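
For anyone curious, a rough sketch of the chunked-evaluation idea (the block size and structure are illustrative only, not numexpr's actual internals): compute b+c and then add a, block by block, reusing one small temporary so the working set stays in cache.

    # Rough sketch of chunked evaluation in the spirit of numexpr: process
    # the operands in small fixed-size blocks, reusing the same temporary
    # buffer for every block.

    CHUNK = 1024   # elements per block (numexpr works in small pages, e.g. ~8KB)

    def add3_chunked(a, b, c):
        n = len(a)
        out = [0] * n
        tmp = [0] * CHUNK                  # one small, reused temporary
        for start in range(0, n, CHUNK):
            stop = min(start + CHUNK, n)
            m = stop - start
            for i in range(m):             # tmp = b + c for this block
                tmp[i] = b[start + i] + c[start + i]
            for i in range(m):             # out = a + tmp for this block
                out[start + i] = a[start + i] + tmp[i]
        return out

    a = list(range(5000))
    b = list(range(5000))
    c = list(range(5000))
    assert add3_chunked(a, b, c) == [3 * i for i in range(5000)]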


I didn’t know about numexpr. That looks really interesting. Thanks.


Typical practice seems to be to have a large collection of ad hoc local transformations that look for specific combinations of operations. I've only really seen more general static analysis aimed at not materializing intermediate arrays in academic research.

Expression tree IR is maybe the wrong setting to do this sort of fusion for APL. What you want for a+b+c is a 3-argument scalar function applied to a, b, and c, but you need something other than actual APL expressions for that.
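
Something like this, roughly (scalar_apply and the fused lambda are hypothetical names, just to show the shape): rewrite a+b+c as one 3-argument scalar function mapped over the arguments, instead of two 2-argument applications of +.

    # Sketch of the idea above: a single n-argument scalar function applied
    # elementwise, with no intermediate arrays.

    def scalar_apply(f, *arrays):
        return [f(*elems) for elems in zip(*arrays)]

    fused = lambda x, y, z: x + y + z     # the fused 3-argument scalar function

    a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
    print(scalar_apply(fused, a, b, c))   # [111, 222, 333]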


The full APL grammar is not context-free:

https://www.youtube.com/watch?v=7Z0wzY9Oip4

The strict top-down, right-to-left evaluation makes parsing unnecessary for interpreters; you just need a tokenizer. Obviously this became a problem when people started to think about compilers.
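
A toy illustration of why a tokenizer suffices (scalars instead of arrays, no parentheses, only dyadic functions, purely to keep it short): with uniform precedence and right-to-left evaluation, you can consume the token stream from the right, keeping only the value computed so far.

    # Minimal sketch (not any real APL implementation) of evaluating a flat
    # token stream right to left without building a parse tree.

    OPS = {'+': lambda x, y: x + y,
           '-': lambda x, y: x - y,
           '*': lambda x, y: x * y}

    def eval_rtl(tokens, env):
        """tokens: flat list like ['a', '+', 'b', '+', 'c']."""
        toks = list(tokens)
        value = env[toks.pop()]            # rightmost operand
        while toks:
            op = toks.pop()                # next token to the left is a function
            left = env[toks.pop()]         # then its left argument
            value = OPS[op](left, value)   # apply right-to-left
        return value

    env = {'a': 2, 'b': 3, 'c': 4}
    print(eval_rtl(['a', '-', 'b', '-', 'c'], env))   # 2 - (3 - 4) = 3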



