Matrix multiplication (and similar operations) is also one of the few workloads where clever algorithms and special-purpose instructions really pay off for floating point at massive scale.
That is: adding two arrays together or computing a dot product becomes memory-bound once the data grows, because each element loaded is used only once or twice. Matrix multiplication performs O(n^3) arithmetic on O(n^2) data, so it does enough operations per element that it can be limited by arithmetic throughput too.
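To make the "operations per element" argument concrete, here is a rough sketch of arithmetic intensity (FLOPs per byte moved) for the three cases. It assumes an idealized model where every float64 operand is read from memory exactly once and every result written once. Real matmul kernels only get close to this with cache blocking/tiling. The `bound` helper is a roofline-style rule of thumb, and the machine numbers (1 TFLOP/s, 100 GB/s) are purely hypothetical.

```python
# Sketch: arithmetic intensity (FLOPs per byte of memory traffic) under an
# idealized model where every operand is read from memory exactly once and
# every result is written exactly once. Real kernels move more data than this.

WORD = 8  # bytes per float64 element


def vector_add_intensity(n: int) -> float:
    """c = a + b: n additions; read 2n elements, write n elements."""
    flops = n
    bytes_moved = 3 * n * WORD
    return flops / bytes_moved


def dot_intensity(n: int) -> float:
    """s = a . b: n multiply-adds counted as 2n FLOPs; read 2n elements, write 1."""
    flops = 2 * n
    bytes_moved = (2 * n + 1) * WORD
    return flops / bytes_moved


def matmul_intensity(n: int) -> float:
    """C = A @ B for n x n matrices: 2n^3 FLOPs; read 2n^2, write n^2 elements."""
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * WORD
    return flops / bytes_moved


def bound(intensity: float, peak_flops: float, bandwidth: float) -> str:
    """Roofline rule of thumb: below the machine balance point, memory limits speed."""
    machine_balance = peak_flops / bandwidth  # FLOPs the machine can do per byte loaded
    return "memory-bound" if intensity < machine_balance else "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 1 TFLOP/s peak, 100 GB/s bandwidth (balance = 10 FLOP/byte).
    peak, bw = 1e12, 100e9
    for n in (100, 1000, 10000):
        print(f"n={n:>5}  add={vector_add_intensity(n):.3f}  "
              f"dot={dot_intensity(n):.3f}  matmul={matmul_intensity(n):.1f}  "
              f"-> matmul is {bound(matmul_intensity(n), peak, bw)}")
```

Under these assumptions the vector add and dot product stay near 0.04 and 0.125 FLOP/byte no matter how large n gets, while matmul intensity grows as n/12, so on the hypothetical machine it crosses from memory-bound to compute-bound somewhere above n = 120.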