
How can they ban something that's open source that you can just run on your own hardware?



There are illegal numbers in the USA land of the "free".

https://en.wikipedia.org/wiki/Illegal_number

> An AACS encryption key (09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0) that came to prominence in May 2007 is an example of a number claimed to be a secret, and whose publication or inappropriate possession is claimed to be illegal in the United States.


> illegal numbers in the USA land of the "free"

This is a silly take for anyone in tech. Any binary sequence is a number. Any information can be, for practical purposes, rendered in binary [1].

Getting worked up about restrictions on numbers works as a meme, for the masses, because it sounds silly, but is tantamount to technically arguing against privacy, confidentiality, the concept of national secrets, IP as a whole, et cetera.

[1] https://en.m.wikipedia.org/wiki/Shannon%27s_source_coding_th...
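A minimal Python sketch of the "any binary sequence is a number" point, assuming big-endian byte order and UTF-8 text (both arbitrary choices for illustration; the helper names are hypothetical). Any byte string maps to exactly one non-negative integer and back, provided you also keep the length, because leading zero bytes do not change the integer's value:

```python
def bytes_to_number(data: bytes) -> int:
    # Read the whole byte sequence as one big-endian unsigned integer.
    return int.from_bytes(data, byteorder="big")


def number_to_bytes(n: int, length: int) -> bytes:
    # Inverse mapping: write the integer back out as exactly `length` bytes.
    # The length is needed because leading zero bytes do not affect the value.
    return n.to_bytes(length, byteorder="big")


message = "Any information can be a number".encode("utf-8")
n = bytes_to_number(message)
print(n)  # a single (large) integer that encodes the whole message

restored = number_to_bytes(n, len(message)).decode("utf-8")
assert restored == "Any information can be a number"

# A hex string such as a key fragment is the same idea with a different notation.
print(bytes_to_number(bytes.fromhex("01 FF")))  # → 511
```

The round trip is lossless, which is why a restriction on publishing some piece of information is, technically, also a restriction on publishing a particular number.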


Good thing that is part of the wikipedia entry:

> Any piece of digital information is representable as a number; consequently, if communicating a specific set of information is illegal in some way, then the number may be illegal as well.


All those things are not self-evident and thus debatable.


> not self-evident and thus debatable

Totally agree. But prompting debate or even further thought isn’t the point of the meme.


I'd argue that, as satire, it's the main point ;)


> as satire, it's the main point

There is thought-stopping satire and thought-provoking satire. Much of it depends on the context. I’m not getting the latter from a “USA land of the ‘free’” comment.


> is collecting rain water illegal?

> It depends on where you live. In many places, collecting rainwater is completely legal and even encouraged, but some regions have regulations or restrictions.

United States: Most states allow rainwater collection, but some have restrictions on how much you can collect or how it can be used. For example, Colorado has limits on the amount of rainwater homeowners can store.

Australia: Generally legal and encouraged, with many homes using rainwater tanks.

UK & Canada: Legal with few restrictions.

India & Many Other Countries: Often encouraged due to water scarcity.


That takes me back! Fark.com would delete any comment that contained random hexadecimal.


It was the beginning of the end for Digg, too, IIRC. Started a lot of people leaving for Reddit, right?


I think so; I joined Reddit when it was in tech news as people left Digg after the big redesign. I'm not sure when the exodus started. I left Fark over the HD DVD mess.


> whose publication or inappropriate possession is claimed to be illegal in the United States.

That's not the same thing as a number being illegal at all. Here, watch this:

> I claim breathing is illegal in the United States

There, now breathing is claimed to be illegal in the United States.


In both cases, legality depends entirely on repercussions, i.e. if there's someone to enforce the ban. I suspect that in the "illegal numbers" case there might be.


Man, that's very concerning for Wikipedia, which is publishing it right there on the page linked above.


Only concerning if they are a US-based company hosting their data in US data centers. Oops.


It's not open source. They provide the model and the weights, but not the source code and, crucially, not the training data. As long as LLM makers don't provide the training data (and they never will, because then they would be admitting to stealing), LLMs are never going to be open source.


Thanks for reminding people of this.

Open source means two things in spirit:

(a) You have everything you need to be able to re-create something, and at any step of the process change it.

(b) You have broad permissions how to put the result to use.

The "open source" models from both Meta and DeepSeek so far fail one or both of these checks (Meta's fails both). We should resist the dilution of the term open source to the point where it means nothing useful.


I think people are looking for the term "freeware" although the connotations don't match.


Agreed, but the "connotations don't match" is mostly because the folks who chose to call it open source wanted the marketing benefits of doing so. Otherwise it'd match pretty well.


At the risk of being called rms, no, that's not what open source means. Open source just means you have access to the source code. Which you do. Code that is open source but restrictively licensed is still open source.

That's why terms like "libre" were born to describe certain kinds of software. And that's what you're describing.

This is a debate that started, like, twenty years ago or something when we started getting big code projects that were open source but encumbered by patents so that they couldn't be redistributed, but could still be read and modified for internal use.


> Open source just means you have access to the source code.

That's https://en.wikipedia.org/wiki/Source-available_software , not 'open source'. The latter was specifically coined [1] as a way to talk about "free software" (with its freedom connotations) without the price connotations:

The argument was as follows: those new to the term "free software" assume it is referring to the price. Oldtimers must then launch into an explanation, usually given as follows: "We mean free as in freedom, not free as in beer." At this point, a discussion on software has turned into one about the price of an alcoholic beverage. The problem was not that explaining the meaning is impossible—the problem was that the name for an important idea should not be so confusing to newcomers. A clearer term was needed. No political issues were raised regarding the free software term; the issue was its lack of clarity to those new to the concept.

[1] https://opensource.com/article/18/2/coining-term-open-source...


You don't get to redefine what "open" means.


It's common for terms to have a more specific meaning when combined with other terms. "Open source" has had a specific meaning for decades now, which goes beyond "you can see the source" to, among other things, "you're allowed to use it without restriction".


So Swedish meatballs are any ball of meat made in Sweden?

And French fries are anything that was fried in France?


Tell that to Sam Altman


He did not succeed, did he?


I don't know why you've been downvoted. This is a 100% correct history. "Open source" was specifically coined as a synonym to "free software", and has always been used that way.


> Open source just means you have access to the source code. Which you do.

No, they also fail even that test. Neither Meta nor DeepSeek have released the source code of their training pipeline or anything like that. There's very little literal "source code" in any of these releases at all.

What you can get from them is the model weights, which, for the purpose of this discussion, are very similar to a compiler's binary executable output: something you cannot easily reverse, which is exactly the situation open source seeks to address. In the case of Meta, this comes with additional limitations on how you may put them to use.

As a sibling comment said, this is basically "freeware" (with asterisks) but has nothing to do with open source, either according to RMS or OSI.

> This is a debate that started, like, twenty years ago

For the record, I do appreciate the distinction. This isn't meant as an argument from authority at all, but I've been an active open source (and free software) developer for close to those 20 years, am on the board of one of the larger FOSS orgs, and most households have a few copies of FOSS code I've written running. It's also why I care! :-)


The weights, which are part of the source, are open. Now you are arguing it's not open source because they don't provide the source for that part of the source. If you follow that reasoning, you can claim ad infinitum that the sources are missing, since every source originates from something.


The source is the training data and the code used to turn the training data _into_ the weights. Thus GP is correct, the weights are more akin to a binary from a traditional compiler.


To me this 'source' requirement does not make sense. It is not as if you bring the training data and the application together and press a train button; there are many more steps involved.

Also, the amount of training data is massive.

Additionally, what about human in the loop training, do you deliver humans as part of the source?


> they also fail even that test. Neither Meta nor DeepSeek have released the source code of the

This debate is over and makes the open source community look silly. Open model and weights is, practically speaking, open source for LLMs.

I have tremendous respect for FOSS and those who build and maintain it. But arguing for open training data means only toy models can practically exist. As a result, the practical definition will prevail. And if the only people putting forward a practical definition are Meta et al, this is what you get: source available.


I'm not arguing for open training data BTW, and the problem is exactly this sort of myopic focus on the concerns of the AI community and the benefits of open-washing marketing.

Completely, fully breaking the meaning of the term "open source" is causing collateral damage outside the AI topic, that's where it really hurts. The open source principle is still useful and necessary, and we need words to communicate about it and raise correct expectations and apply correct standards. As a dev you very likely don't want to live in a tech environment where we regress on this.

It's not "source available" either. There's no source. It's freeware.

"I can download it and run it" isn't open source.

I'm actually not too worried that people won't eventually re-discover the same needs that open source originally discovered, but it's pretty lame if we lose a whole bunch of time and effort to re-learn some lessons yet again.


> it's pretty lame if we lose a whole bunch of time and effort to re-learn some lessons yet again

We need to relearn because we need a different definition for LLMs. One that works in practice, not just at the peripheries.

Maybe we can have FOSS LLMs vs open-source ones, like we do with software licenses. The former refers to the hardcore definition. The latter the practical (and widely used) one.


Sure, I don't disagree. I fully understand the open-weights folks looking for a word to communicate their approach and its benefits, and I support them in doing so. It's just a shame they picked this one in a snap judgement (and that's giving folks a lot of benefit of the doubt).

> Maybe we can have FOSS LLMs vs open-source ones, like we do with software licenses.

Why not just call them freeware LLMs, which would be much more accurate?

There's nothing "hardcore" or "zealot" about not calling these open source LLMs because there's just ... absolutely nothing there that you call open source in any way. We don't call any other freeware "open source" for being a free download with a limited use license.

This is just "we chose a word to communicate we are different from the other guys". In games, they chose to call it "free to play (f2p)" when addressing a similar issue (but it's also not a great fit since f2p games usually have a server dependency).


> Why not just call them freeware LLMs, which would be much more accurate?

Most of the public is unfamiliar with the term. And with some of the FOSS community arguing for open training data, it was easy to overrule them and take the term.


Most of the public is also unfamiliar with the term open source, and I'm not sure they did themselves any favors by picking one that invites far more questions and needs far more explanation. In that sense, it may have accomplished little beyond its harmful effects.

I get your overall take is "this is just how things go in language", but you can escalate that non-caring perspective all the way to entropy and the heat death of the universe, and I guess I prefer being an element that creates some structure in things, however fleeting.


> Most of the public is also unfamiliar with the term open source

I’d argue otherwise. (Familiar with, not know.) Particularly in policy circles.

> picking one that invites far more questions and needs for explanation

There wasn't ever a debate. And now, not even the OSI demands training data. (It couldn’t. It, too, would be ignored.)


The only practical and widely used definition of open source is the one known as the Open Source Definition published by the OSI.

The set of free/libre licenses (as defined by the FSF) is almost identical to the set of open source licenses (as defined by the OSI).

The debate within FOSS communities has been between copyleft licenses like the GPL, and permissive licenses like the MIT licence. Both copyleft and permissive licenses are considered free/libre by the FSF, and both of them are considered open source by the OSI.


Open source means the source code is freely available. It’s in the name.


The source being available means the code is "source available." Open implies more rights.


People say this, but when it comes to AI models, the training data is not owned by these companies/groups, so it cannot be "open sourced" in any sense. And the training code is basically accessing that training data that cannot be open sourced, therefore it also cannot be shared. So the full open source model you wish to have can only provide subpar results.


They could easily list the data used, though. These datasets are mostly known and floating around. When they are constructed, instructions for replication could be provided too.


They could, but even if they give this list the detractors will still say it is not open source.


Yes, and as a bonus they may get sued, which in the long term would make free/offline models unviable.

It would be so much better if all models were trained with LibGen.


Isn't this the same situation that any codebase faces when one thinks about open sourcing it? I can't legally open source the code I don't own.


Thanks, I was not aware of this distinction.

But I think my argument still stands, though? Users can run DeepSeek locally, so unless the US Gov't wants to reach book-burning levels of idiocy, there is not really a feasible way to ban the American public from running DeepSeek, no?


Yes, your argument still stands. But I think it's important to stand firm that the term "open source" is not a good label for what these "freeware" LLMs are.


Fair point, agreed.


There was an executive order issued by the previous administration that makes using anything with more than 10 billion parameters illegal and punishable by government force if done without authorization. Of course, like most government regulations (even though this is not a regulation, it is an executive action), the point is not to stop the behavior but instead to create a system where everyone breaks the regulation constantly, so that if anyone rocks the boat they can be indicted/charged and dealt with.

https://www.federalregister.gov/documents/2023/11/01/2023-24...

>(k) The term “dual-use foundation model” means an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by: ...


That order does not "make using anything with more than 10 billion parameters illegal and punishable by government force if done without authorization".

It orders the Secretary of Commerce to "solicit input from the private sector, academia, civil society, and other stakeholders through a public consultation process on potential risks, benefits, other implications, and appropriate policy and regulatory approaches related to dual-use foundation models for which the model weights are widely available".


Many regulations are created by executive action, without input from Congress. The Council on Environmental Quality, created by the National Environmental Policy Act, has the power to issue its own regulations. Executive Orders can function similarly, and the executive can order rulemaking bodies to create and remove regulations, though there is a judicial effort to restrict this kind of policymaking and return regulatory power to Congress.


There’s an effort to restrict certain regulatory rule-making where it’s ideologically convenient, but it isn’t “returning” regulatory power. That rulemaking authority isn’t derived by some bullshit executive order, but by Federal law, as implemented by congress.

Congress has never ceded power to anyone. They wield legislative authority and power of the purse, and wield it as they see fit. The special interests campaigning about this are extreme reactionaries whose stated purpose is to make government ineffective.


If I'm not wrong, wasn't PGP encryption once illegal to export? Not quite the same, but the government has a nice habit of feeling like it can ban the export of research.

https://en.wikipedia.org/wiki/Export_of_cryptography_from_th...


Add PS1 too. The US government banned sale of PlayStation to China because the PLA would apparently have access to cutting edge chips for their missiles


You are right, but I cannot find a single example of such a ban actually being effective though. Information wants to be free and all that.


Because you haven't heard of the proprietary software that wasn't ever sold internationally because of these bans.

Of course Joe Sixpack can throw their code up anywhere, but Joe Corporation gets wrecked if they try to sell it.

https://developer.apple.com/documentation/security/complying...

For example, this is enforced by the App Store.


But that's not the goal; the goal is to protect the "intellectual property" of American companies only. Countries not on the "friends list" cannot sell products in that area without suffering repercussions. That's how the US has maintained technological dominance in some areas: by restricting what other countries can do.


If I remember correctly, if you changed the dropdown on the webpage to USA, you could download the full version of PGP anyway.


Make commercial hosting illegal, and make the hardware to run it locally cost $6000+


They banned certain branches of math during the cold war, it can be done.


Such as?


All non-trivial encryption algorithms.

https://en.wikipedia.org/wiki/Crypto_Wars



