No, but some model-serving tools like llama.cpp do their best. It's mostly a matter of choosing the right serving tool. And I'm not convinced LLMs couldn't optimize their own memory layout. Why not? Just let them play with it and learn. You can do pretty amazing things with evolutionary methods where the LLM is the mutation operator: you evolve a population of candidate solutions and keep the ones that score best. (https://arxiv.org/abs/2206.08896)
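To make the "LLM as mutation operator" idea concrete, here is a rough toy sketch, not the method from the linked paper. It evolves the field order of a C-like struct to reduce padding. Everything in it is an assumption for illustration: the names `struct_size`, `random_swap`, and `evolve`, the rule that a field's alignment equals its size, and the example field list. `random_swap` is only a stand-in for where an LLM call would propose a modified candidate.

```python
import random
from typing import Callable, List, Tuple

Field = Tuple[str, int]  # (name, size in bytes); alignment assumed equal to size


def struct_size(fields: List[Field]) -> int:
    """Fitness: total size of a C-like struct with natural alignment (lower is better)."""
    offset, max_align = 0, 1
    for _, size in fields:
        offset = (offset + size - 1) // size * size  # pad up to this field's alignment
        offset += size
        max_align = max(max_align, size)
    # Tail padding so the struct size is a multiple of its largest alignment.
    return (offset + max_align - 1) // max_align * max_align


def random_swap(fields: List[Field], rng: random.Random) -> List[Field]:
    """Hypothetical stand-in for the mutation operator; an LLM call would go here."""
    child = list(fields)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve(
    seed_layout: List[Field],
    mutate: Callable[[List[Field], random.Random], List[Field]],
    generations: int = 50,
    pop_size: int = 8,
    seed: int = 0,
) -> List[Field]:
    """Keep a small population, mutate random members, retain the best pop_size."""
    rng = random.Random(seed)
    population = [list(seed_layout)]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Parents stay in the pool, so the best candidate is never lost.
        population = sorted(population + children, key=struct_size)[:pop_size]
    return population[0]


layout = [("a", 1), ("b", 8), ("c", 1), ("d", 4), ("e", 2), ("f", 8)]
best = evolve(layout, random_swap)
print(struct_size(layout), "->", struct_size(best))
```

Swapping `random_swap` for a function that prompts a model with the current candidate and parses its proposed variant would keep the loop unchanged; only the source of mutations differs.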