Because neural networks rely on dot products, which are simply un-normalized cosine similarities, as the primary way to compare and transform embeddings in their hidden layers. It therefore makes sense that the most important signals in the data end up arranged in latent space such that they are amenable to manipulations based on dot products.
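The relationship between the two measures can be sketched in a few lines of NumPy (the vectors here are made-up examples): cosine similarity is just the dot product after both vectors are rescaled to unit length, so the two agree whenever the vectors are normalized.

```python
import numpy as np

# Two hypothetical embedding vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 1.0])

# Dot product: the raw, un-normalized similarity.
dot = np.dot(a, b)

# Cosine similarity: the same dot product, computed after
# dividing each vector by its Euclidean norm.
cos = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

# The two differ only by the product of the vector magnitudes.
assert np.isclose(cos, dot / (np.linalg.norm(a) * np.linalg.norm(b)))
print(dot, cos)
```

One consequence of the un-normalized form: the dot product rewards both directional agreement and magnitude, which is why attention scores and linear-layer activations can grow with the norm of an embedding, not just its orientation.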