yes, both core web datasets are publicly available as well as the rest

Imustaskforhelp · 2025-07-08T19:37:44 1752003464

Thanks!

To be honest, if I might argue then that this is one of the best truly open source models that we have got.

There is AllenAI and (Elmo?) and there is also this one which does distributed training but I think this looks a lot like SOTA for 3B parameters to me.

Thanks for telling me, I am not going to lie, I am going to try to test it now! (Ima try some GGUF since ollama convenience)

peatmoss · 2025-07-08T21:52:44 1752011564

OLMo: https://allenai.org/olmo

AFAIK, they were the first open everything model.

diggan · 2025-07-09T09:08:55 1752052135

> AFAIK, they were the first open everything model.

GPT2 (released ~5 years ago?) was "open" in the sense that weights were available for download (sans license), exact datasets that were used where outlined, the architecture explained and so on, so I guess it was also "open" in the sense that Llama is "open", but neither would be "open source" which I'd feel pretty confident to label OLMo with.

So OLMo seems to be the first actually "open source" model, but maybe not "open" as in "downloadable" (which Facebook tries to call "open source").

vixalien · 2025-07-11T07:08:58 1752217738

there’s also IBM’s Granite