>not once have I seen an explanation of what makes it fast and why those techniques aren't looted for open-source languages.
It's fast (for an interpreter) because, unlike a typical bytecode interpreter where each instruction operates on a single value, APL/J/K interpreters execute each operator over a whole array at once, using optimized C code.
The set of operators has also been designed to compose with one another and to cover the basic concrete needs (i.e. what the CPU actually needs to do, rather than how you would describe it in words).
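To make the dispatch difference concrete, here is a toy sketch in Python. Both "interpreters" are hypothetical and only count how often the interpreter loop has to pick the next operation. In a real APL/J/K implementation the per-primitive loops would be compiled C rather than Python. The array version corresponds to the K expression `+/x*x` (sum of squares).

```python
# Toy sketch only: a hypothetical stack VM vs. a hypothetical array "interpreter".
# Neither models a real APL/J/K implementation; they just count dispatches.

def scalar_vm_sum_of_squares(xs):
    """Bytecode style: the program runs once per element, one value per instruction."""
    program = ["load_x", "dup", "mul", "add_acc"]
    acc = 0
    dispatches = 0
    for x in xs:
        stack = []
        for op in program:
            dispatches += 1  # one trip through the interpreter loop per instruction per element
            if op == "load_x":
                stack.append(x)
            elif op == "dup":
                stack.append(stack[-1])
            elif op == "mul":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == "add_acc":
                acc += stack.pop()
    return acc, dispatches


# Array primitives: each one is a single call that loops over the whole array.
# (In a real array language these inner loops would be tight C, possibly SIMD.)
def times(a, b):
    return [x * y for x, y in zip(a, b)]


def plus_over(a):
    total = 0
    for x in a:
        total += x
    return total


def array_vm_sum_of_squares(xs):
    """Array style, like K's +/x*x: two primitive dispatches regardless of len(xs)."""
    # K evaluates right to left, so x*x runs first, then +/
    program = [("times", lambda v: times(v, v)), ("plus_over", plus_over)]
    value = xs
    dispatches = 0
    for _name, prim in program:
        dispatches += 1  # one dispatch per primitive, not per element
        value = prim(value)
    return value, dispatches


print(scalar_vm_sum_of_squares([1, 2, 3]))  # (14, 12)
print(array_vm_sum_of_squares([1, 2, 3]))   # (14, 2)
```

The scalar version pays the interpreter overhead four times per element here, so it grows with the input; the array version pays it a fixed number of times and spends the rest of its time inside the primitives.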
,// is a complete implementation of “flatten”.
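`,//` breaks down as `(,/)/`: `,/` is join-over (q calls it `raze`), which concatenates the items of a list and so flattens one level, and `/` applied to a one-argument function means "converge", i.e. keep applying it until the result stops changing. A minimal Python sketch of those semantics, assuming nested Python lists stand in for K lists; the function names are illustrative, and the convergence check is simplified to "same as the previous result":

```python
def raze(x):
    """One level of ,/ (join over): splice nested lists into their parent; atoms stay as-is."""
    out = []
    for item in x:
        if isinstance(item, list):
            out.extend(item)
        else:
            out.append(item)
    return out


def converge(f, x):
    """Simplified monadic f/ in K: apply f repeatedly until the result stops changing."""
    while True:
        y = f(x)
        if y == x:
            return y
        x = y


def flatten(x):
    # ,// == (,/)/ : keep razing until nothing nested is left
    return converge(raze, x)


print(flatten([1, [2, [3, [4]]], 5]))  # [1, 2, 3, 4, 5]
```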
|/0(0|+)\ is a complete, efficient (linear-time) implementation of maximum subarray sum.
In both cases, each character does one well-defined thing implemented in tight C (sometimes SIMD) code, and the K interpreter just orchestrates how they interact.
(The parentheses here group as a pair, but every other character is an independent operation.)
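Taking `|/0(0|+)\` apart the same way (K evaluates right to left): `(0|+)` is the two-argument function max(0, a+b), `0 (0|+)\` scans it across the input starting from 0, and `|/` takes the max over the running values. That is Kadane's algorithm in the variant that allows an empty subarray, so the result is never below 0. A Python sketch with one function per piece, assuming a non-empty input list and Python 3.8+ (for the `initial=` argument of `itertools.accumulate`); the function names are illustrative:

```python
from itertools import accumulate


def zero_max_plus(acc, x):
    """(0|+): add, then take the max with 0."""
    return max(0, acc + x)


def scan_from(seed, f, xs):
    # 0 f\ xs: running results of f starting from seed. The seed itself is dropped;
    # here it could not change the final max anyway, since every running value is >= 0.
    return list(accumulate(xs, f, initial=seed))[1:]


def max_over(xs):
    """|/ : max reduction. Assumes xs is non-empty."""
    return max(xs)


def max_subarray_sum(xs):
    # |/ 0 (0|+)\ xs, evaluated right to left
    return max_over(scan_from(0, zero_max_plus, xs))


print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
```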