> If you have 1e4 machines and run a global video distribution network it might be worth an engineer's salary to look into it.
Sure, and that would be the 0.1% of the time where it makes sense.
> if you're implementing a feature in the standard library of a widely used language
I would contend that unless your library is very opinionated on exactly how data will be accessed, you can't possibly optimize cache access in a way that makes sense for every user.
You can't possibly optimize sorting in a way that makes sense for every user. Let us use insertion sort in our standard library and call it a day. Why are you bothering to discuss general-case sorting algorithms?