> Open source just means you have access to the source code. Which you do. No, t...

nuancebydefault · 2025-01-29T18:28:28 1738175308

The weights, which are part of the source, are open. Now you are arguing that it is not open source because they don't provide the source for that part of the source. If you follow that reasoning, you can claim the absence of sources ad infinitum, since every source originates from something.

Kerbonut · 2025-01-30T05:53:59 1738216439

The source is the training data and the code used to turn the training data _into_ the weights. Thus GP is correct, the weights are more akin to a binary from a traditional compiler.

nuancebydefault · 2025-01-30T19:11:11 1738264271

To me this 'source' requirement does not make sense. It is not as if you bring training data and the application together and press a train button; there are many more actions involved.

Also, the amount of training data is massive.

Additionally, what about human-in-the-loop training? Do you deliver humans as part of the source?

JumpCrisscross · 2025-01-29T15:56:26 1738166186

> they also fail even that test. Neither Meta nor DeepSeek have released the source code of the

This debate is over and makes the open source community look silly. Open model and weights is, practically speaking, open source for LLMs.

I have tremendous respect for FOSS and those who build and maintain it. But arguing for open training data means only toy models can practically exist. As a result, the practical definition will prevail. And if the only people putting forward a practical definition are Meta et al, this is what you get: source available.

sho_hn · 2025-01-29T16:02:43 1738166563

I'm not arguing for open training data BTW, and the problem is exactly this sort of myopic focus on the concerns of the AI community and the benefits of open-washing marketing.

Completely, fully breaking the meaning of the term "open source" is causing collateral damage outside the AI topic, that's where it really hurts. The open source principle is still useful and necessary, and we need words to communicate about it and raise correct expectations and apply correct standards. As a dev you very likely don't want to live in a tech environment where we regress on this.

It's not "source available" either. There's no source. It's freeware.

"I can download it and run it" isn't open source.

I'm actually not too worried that people won't eventually re-discover the same needs that open source originally discovered, but it's pretty lame if we lose a whole bunch of time and effort to re-learn some lessons yet again.

JumpCrisscross · 2025-01-29T16:12:59 1738167179

> it's pretty lame if we lose a whole bunch of time and effort to re-learn some lessons yet again

We need to relearn because we need a different definition for LLMs. One that works in practice, not just at the peripheries.

Maybe we can have FOSS LLMs vs open-source ones, like we do with software licenses. The former refers to the hardcore definition. The latter refers to the practical (and widely used) one.

sho_hn · 2025-01-29T16:14:25 1738167265

Sure, I don't disagree. I fully understand the open-weights folks looking for a word to communicate their approach and its benefits, and I support them in doing so. It's just a shame they picked this one in - and that's giving folks a lot of benefit of the doubt - a snap judgement.

> Maybe we can have FOSS LLMs vs open-source ones, like we do with software licenses.

Why not just call them freeware LLMs, which would be much more accurate?

There's nothing "hardcore" or "zealot" about not calling these open source LLMs because there's just ... absolutely nothing there that you could call open source in any way. We don't call any other freeware "open source" for being a free download with a limited use license.

This is just "we chose a word to communicate we are different from the other guys". In games, they chose to call it "free to play (f2p)" when addressing a similar issue (but it's also not a great fit since f2p games usually have a server dependency).

JumpCrisscross · 2025-01-29T16:32:00 1738168320

> Why not just call them freeware LLMs, which would be much more accurate?

Most of the public is unfamiliar with the term. And with some of the FOSS community arguing for open training data, it was easy to overrule them and take the term.

sho_hn · 2025-01-29T16:50:50 1738169450

Most of the public is also unfamiliar with the term open source, and I'm not sure they did themselves any favors by picking one that invites far more questions and needs for explanation. In that sense, it may have accomplished little beyond its harmful effects.

I get your overall take is "this is just how things go in language", but you can escalate that non-caring perspective all the way to entropy and the heat death of the universe, and I guess I prefer being an element that creates some structure in things, however fleeting.

JumpCrisscross · 2025-01-29T18:01:45 1738173705

> Most of the public is also unfamiliar with the term open source

I’d argue otherwise. (Familiar with, not know.) Particularly in policy circles.

> picking one that invites far more questions and needs for explanation

There wasn't ever a debate. And now, not even the OSI demands training data. (It couldn’t. It, too, would be ignored.)

Flimm · 2025-01-29T16:54:47 1738169687

The only practical and widely used definition of open source is the one known as the Open Source Definition published by the OSI.

The set of free/libre licenses (as defined by the FSF) is almost identical to the set of open source licenses (as defined by the OSI).

The debate within FOSS communities has been between copyleft licenses like the GPL, and permissive licenses like the MIT license. Both copyleft and permissive licenses are considered free/libre by the FSF, and both of them are considered open source by the OSI.