> The model will be fully open: source code and weights will be publicly availab...

TobTobXX · 2025-07-11T21:11:07 1752268267

Well, when the actual content is 100s of terabytes big, providing URLs may be more practical for them and for others.

layer8 · 2025-07-11T22:19:25 1752272365

The difference between content they are allowed to train on vs. being allowed to distribute copies of is likely at least as relevant.

sschueller · 2025-07-12T18:38:47 1752345527

No problem, we have 25 Gbit/s home internet here. [1]

[1] https://www.init7.net/en/internet/fiber7/

glhaynes · 2025-07-11T20:44:48 1752266688

That wouldn't seem reproducible if the content at those URLs changes. (Er, unless it was all web.archive.org URLs or something.)

dietr1ch · 2025-07-11T21:36:34 1752269794

This is a problem with the Web. It should be easier to download content like it was updating a git Repo.

WeirderScience · 2025-07-11T20:21:50 1752265310

Yeah, I suspect you're right. Still, even a list of URLs for a frontier model (assuming it does turn out to be of that level) would be welcome over the current situation.