* Llama 3.2 multimodal actually still ranks below Molmo from AI2, released this morning.
* AI2D: 92.3 (Llama 3.2 90B) vs 96.3 (Molmo 72B)
* Llama 3.2 1B and 3B are pruned from 3.1 8B, so no leapfrogging, unlike the 3 -> 3.1 jump (see the distillation sketch after this list).
* Notably, no code benchmarks. Deliberate exclusion of code data during distillation to maximize mobile on-device use cases?
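For context on the pruning-plus-distillation point above, here's a minimal sketch of what logit distillation from a larger teacher (e.g. a 3.1-8B-class model) into a pruned student typically looks like. This is an illustrative assumption, not Meta's actual recipe; the function name and parameters are hypothetical.

```python
# Minimal logit-distillation sketch (hypothetical, PyTorch-style).
# A pruned student is trained to match the teacher's token distribution;
# if code data were excluded from the distillation corpus, the teacher's
# code ability would simply never be transferred.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL loss between teacher and student next-token distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t ** 2)
```

Under that assumption, the speculation above is consistent: a student only learns what the distillation corpus makes the teacher emit, so dropping code data would explain the missing code benchmarks.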
Was hoping there would be some interesting models I could add to https://double.bot, but there don't seem to be any improvements to frontier coding performance.