
Author here: thanks so much for the feedback. I agree, 'more memory for accumulators' would be a better title for this section.

I also see the misconception you are pointing out in the 'best case' section. Re-reading it, I realize that if you accumulate C using outer products between columns of A and rows of B, you can achieve O(N) intensity while keeping only C, plus one column of A and one row of B, in fast memory. If you use inner products instead, you need all of A, B, and C in fast memory to achieve O(N) intensity.
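To make the counting concrete, here is a rough sketch in plain Python, assuming a simplified two-level memory model that I am making up for illustration: every word copied from slow to fast memory (or written back) costs one unit, and "intensity" is FLOPs divided by words moved. The function names and the no-reuse inner-product variant are hypothetical, not anything from the original post.

```python
def matmul_outer(A, B):
    """C = A @ B as a sum of outer products; returns (C, intensity).

    Assumed model: C stays in fast memory the whole time, and each step k
    loads one column of A and one row of B from slow memory.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    words_moved = 0
    flops = 0
    for k in range(n):
        a_col = [A[i][k] for i in range(n)]  # load column k of A (n words)
        b_row = B[k]                         # load row k of B (n words)
        words_moved += 2 * n
        for i in range(n):
            for j in range(n):
                C[i][j] += a_col[i] * b_row[j]  # one multiply, one add
                flops += 2
    words_moved += n * n  # write C back to slow memory once
    return C, flops / words_moved


def matmul_inner_no_reuse(A, B):
    """C = A @ B as independent inner products; returns (C, intensity).

    Assumed model: fast memory holds only one row of A and one column of B,
    so both are reloaded for every output element.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    words_moved = 0
    flops = 0
    for i in range(n):
        for j in range(n):
            a_row = A[i]                         # load row i of A (n words)
            b_col = [B[k][j] for k in range(n)]  # load column j of B (n words)
            words_moved += 2 * n
            acc = 0.0
            for k in range(n):
                acc += a_row[k] * b_col[k]
                flops += 2
            C[i][j] = acc
            words_moved += 1  # write the single result back
    return C, flops / words_moved
```

Under this model the outer-product version does 2N^3 FLOPs for 3N^2 words moved, so intensity grows like 2N/3, while the no-reuse inner-product version moves 2N+1 words per output element and stays just below 1 regardless of N.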

I guess when I wrote this I was only thinking about inner products, which is too narrow. Thanks, I might tweak this section :)


