When one says "attention is all you need", the implication is that some believed you needed something more than just attention. What is that something that has been shown to be unneeded? Is it a theory of how language works?
Recurrence. Before transformers, attention was used on top of recurrent neural networks. "Attention is all you need" showed that you can drop the recurrence and just use attention, and the outcome is a very nicely parallelizable architecture, allowing more efficient training.
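For concreteness, here is a minimal NumPy sketch (my own illustration, not code from the paper) of scaled dot-product attention. The point is that every output position is produced by a few matrix multiplications over the whole sequence at once, so there is no step-by-step dependence between time steps the way there is in an RNN:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention over a whole sequence in one shot: no recurrence.
    Q, K have shape (seq_len, d_k); V has shape (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values

# Toy self-attention: 4 positions, dimension 8, with Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8) -- all positions computed as one batch of matmuls
```

Because the whole computation is dense matrix algebra with no loop over time steps, it maps cleanly onto GPUs, which is where the training-efficiency gain comes from.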