To anyone knowledgeable: where does geometric deep learning (GDL) fit in here? Is this paper just another geometric view, or is it an attempt to formalize the mathematics of transformers? I don't see prominent GDL authors in this paper's references (Bronstein, Cohen, Bruna, Veličković, ...).
Maybe the start of a new field tackling 'travelling wordsman' (smith?) problems.