The multimodal capabilities, especially on next-action prediction, are quite impressive; I'm watching the GitHub repo to see if and when they'll open-source this: https://github.com/microsoft/Magma
Good catch! A minor correction: Magma - M(ultimodal) Ag(entic) M(odel) at M(icrosoft) (Rese)A(rch). The last part is similar to how the name Llama came about :)
A bit sad that they reused the name of https://icl.utk.edu/magma/ (Matrix Algebra on GPU and Multi-core Architectures). That library is already heavily used in machine learning; for example, it ships with PyTorch's GPU builds, so it ends up in practically every PyTorch-based project.
Also, I wonder why they named it Magma?