One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or has to compensate all copyright holders – other countries (e.g. China) will not follow suit. This will mean that US LLM companies will either fall behind or be too expensive. Which means China and other countries will probably surge ahead in AI, at least in terms of how useful the AI is.
That is not to say that we shouldn't do the right thing regardless, but I do think there is a feeling of "who is going to rule the world in the future?" that underlies governmental decision-making on how much to regulate AI.
Well hell, by that logic average citizens should be able to launder corporate intellectual property because China will never follow suit in adhering to intellectual property law. I'm game if you are.
Yes, I was being a bit facetious. It was snark intended to point out that corporations don't get to have their cake and eat it too. Either everything is free and there are no boundaries or we live by our own principles.
>It was snark intended to point out that corporations don't get to have their cake and eat it too.
"have their cake and eat it too" allegations only work if you're talking about the same entity. The copyright maximalist corporations (ie. publishers) aren't the same as the permissive ones (ie. AI companies). Making such characterizations make as much sense as saying "citizens don't get to eat their cake and eat it too", when referring to the fact that citizens are anti-AI, but freely pirate movies.
Yes they are. Look at what happened when deepseek came out. Altman started crying and alleging that deepseek was trained on OpenAI model outputs without an inkling of irony
>Altman started crying and alleging that deepseek was trained on OpenAI model outputs without an inkling of irony
Can you link to the exact comments he made? My impression was that he was upset that they broke OpenAI's T&C, and that DeepSeek's claim of being much cheaper to train than OpenAI didn't factor in the fact that it required OpenAI's model to bootstrap the training process. Neither of those directly contradicts the claim that training is copyright infringement.
It’s barely facetious though. What is stopping me from “starting an AI company” (LLC, sure), torrenting all ebooks (which Facebook did), and as long as I don’t seed, I’m golden?
>What is stopping me from “starting an AI company” (LLC, sure), torrenting all ebooks (which Facebook did), and as long as I don’t seed, I’m golden?
Nothing. You don't even need the LLC. I don't think anyone got prosecuted for only downloading. All prosecutions were for distribution. Note that if you're torrenting, even if you stop the moment it's finished (and thus it never goes to "seeding"), you're still uploading, and that would count as distribution for the purposes of copyright law.
You can make a patched torrent client that never uploads any pieces to peers. It'd definitely be within Meta's capability to do so. The real problem is that, unlike in typical torrenting lawsuits, they weren't caught red-handed in the act, so it would be hard to go after them. This might seem unfair, but it's no different from you openly posting on Reddit that you torrent: it'd be tough for rights holders to go after you even with such an admission.
> Previously, a Meta executive in charge of project management, Michael Clark, had testified that Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur," which seems to support authors' claims that some seeding occurred. And an internal message from Meta researcher Frank Zhang appeared to show that Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers. Once this information came to light, authors asked the court for a chance to depose Meta executives again, alleging that new facts "contradict prior deposition testimony."
>Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur,"
>Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers
Sounds like they used a VPN, set the upload speed to 1kb/s and stopped after the download is done. If the average Joe copied that setup there's 0% chance he'd get sued, so I don't really see a double standard here. If anything, Meta might get additional scrutiny because they're big enough of a target that rights holders will go through the effort of suing them.
> If the average Joe copied that setup there's 0% chance he'd get sued
Citation needed. RIAA used to just watch torrents and sent cease and desists to everyone who connected, whether for a minute or for months. It was very much a dragnet, and I highly doubt there was any nuance of "but Your Honor, I only seeded 1MB back so it's all good".
Well I always felt rebellious about the contemporary face of "rules for thee but not for me", specifically regarding copyright.
Musicians remain subject to abuse by the recording industry; they're making pennies on each dollar you spend on buying CDs^W^W streaming services. I used to say, don't buy that; go to a concert, buy beer, buy merch, support directly. Nowadays live shows are being swallowed whole through exclusivity deals (both for artists and venues). I used to say, support your favourite artist on Bandcamp, Patreon, etc. But most of these new middlemen are ready for their turn to squeeze.
And now on top of all that, these artists' work is being swallowed whole by yet another machine, disregarding what was left of their rights.
We regulate it like we did centuries ago, which led to copyright. If we already have rules, we enforce them. If no one in power wants to, we put in people who will.
In the end this all comes down to needing the people to care enough.
I don't like it either, but it still comes down to the same issues. We vote in people who can be bought and don't make a scandal out of it when it happens. The first step to fixing that corruption is to make congress afraid of being ousted if discovered. With today's communication structure, that's easier than ever.
But if the people don't care, we see the obvious victor.
In the long run private IP will eventually become very public despite laws you have, it’s been like that since the Stone Age. The American Industrial Revolution was built partially on stolen IP from Britain. The internet has just sped up diffusion. You can stop it if you are willing to cut the line, but legal action is only some friction and even then only in the short term
I broadly agree in that sure, unfettered access to copyrighted material will make AI more capable, but more capable of what exactly?
For national security reasons I'm perfectly fine with giving LLMs unfettered access to various academic publications, scientific and technical information, that sort of thing. I'm a little more on the fence about proprietary code, but I have a hard time believing there isn't enough code out there already for LLMs to ingest.
Otherwise though, what is an LLM with unfettered access to copyrighted material better at vs one that merely has unfettered access to scientific / technical information + licensed copyrighted material? I would suppose that besides maybe being a more creative writer, the former LLM is far more capable of reproducing copyrighted works.
In effect, the former is a more capable plagiarism machine than the latter, not necessarily more intelligent, and doesn't otherwise add any more value. What do we have to gain from condoning it?
I think the argument I'm making is a little easier to see in the case of image and video models. The model that has unfettered access to copyrighted material is more capable, sure, but more capable of what? Capable of making images? Capable of reproducing Mario and Luigi in an infinite number of funny scenarios? What do we have to gain from that? What reason do we have for not banning such models outright? Not like we're really missing out on any critical security or economic advantages here.
If common culture is an effective substrate to communicate ideas as in we can use shared pop culture references to make metaphors to explain complex ideas then the common culture that large companies have ensnared in excessively long copyrights and trademarks to generate massive profits is a useful thing for an LLM that is designed to convey ideas to have embedded in it.
If I'm learning about kinematics maybe it would be more effective to have comparisons to Superman flying faster than a speeding bullet and no amount of dry textbooks and academic papers will make up for the lack of such a comparison.
This is especially relevant when we're talking about science-fiction which has served as the inspiration for many of the leading edge technologies that we use including stuff like LLMs and AI.
Fair point, we use metaphor to explain and understand a variety of topics, and a lot of those metaphors are best understood through pop culture analogies.
A reasonable compromise then is that you can train an AI on Wikipedia, more-or-less. An AI trained this way will have a robust understanding of Superman, enough that it can communicate through metaphor, but it won't have the training data necessary to create a ton of infringing content about Superman (well, it won't be able to create good infringing content anyway. It'll probably have access to a lot of plot summaries but nothing that would help it make a particularly interesting Superman comic or video).
To me it seems like encyclopedias use copyrighted pop culture in a way that constitutes fair use, and so training on them seems fine as long as they consent to it.
This is precisely why we need proportional fees for courts. We can't just let companies treat the law as a cost-benefit analysis. They should live in fear of a court result against their favor.
> One aspect that I feel is ignored by the comments here is the geo-political forces at work. If the US takes the position that LLMs can't use copyrighted work or has to compensate all copyright holders – other countries (e.g. China) will not follow suit.
Oh really? They didn't have any problem coming after people who installed copyrighted Windows. BSA. But now Microsoft turns a blind eye because it suits them.
Big "Mr. President, we cannot allow a mineshaft gap" energy going on, even if it's difficult for me personally to believe that LLMs contribute in any sense to ruling the world.
The government doesn't make tanks, it just shells out gigantic amounts to companies to make them.
That said, there are plenty of successful government actions across the world, where Europe or Japan probably have a good advantage with solid public services. Think streets, healthcare, energy infrastructure, water infrastructure, rail, ...
There's nothing special about the US government that makes it uniquely shit.
The difference here is that we have people like yourself: those who have zero faith in our government and as such act as double agents or saboteurs. When people such as yourself gain power in the legislature they "starve the beast", meaning they purposefully deconstruct sections of our government so that they have justification for their ideological belief that our government doesn't work.
You guys work backwards. The foregone conclusion is that government programs never work, and then you develop convoluted strategies to prove that.
Medicaid, Medicare, and Social Security are all three programs that have massive approval from US citizens.
Even saying the military is a dumpster fire isn't accurate. The military has led trillions of dollars worth of extraction for the wealthy and elite across the globe.
In no sane world can you call the ability to protect GLOBAL shipping lanes a failure. That one service alone has probably paid for itself thousands of times.
We aren't even talking about things like public education (high school education used to be privatized and something only the elites enjoyed 100 years ago; yes, public high school education isn't even 100 years old) or libraries or public parks.
---
I really don't understand this "gobermint iz bad" meme you see in tech circles.
I get so much more out of my taxes than out of equivalent corporate bills that it's laughable.
Government is comprised of people and the last 50 years has been the government mostly giving money and establishing programs to the small cohorts that have been hoarding all the wealth. Somehow this is never an issue with the government however.
I also never understand the arguments from these types, because if you think the government is bad then you should want it to be better. Better mostly meaning having more money to redistribute and more personnel to run programs, but it's never about these things. It's always attacking the government to make it worse at the expense of the people.
The US federal government doesn't run most museums, but it does run the massive parks system with 20k employees (pre-Musk) and that system enjoys extremely high ratings from guests.
I mean, name 2 things anyone owns that aren't dumpster fires?
Long time ago industrial engineers used to say, "Even Toyota has recalls."
Something being a dumpster fire is so common nowadays that you really need a better reason to argue in support of a given entity's ownership. (Or even non-ownership for that matter.)
The same president that is putting 145% tariffs on China could put 1000% tariffs on Internet chat bots located in China. Or order the Internet cables to be cut as a last resort (citing a national emergency as is the new practice).
I'm not sure at all what China will do. I find it likely that they'll forbid AI at least for minors so that they do not become less intelligent.
Military applications are another matter and are not really related to these copyright issues.
I get what you're saying, but this is just a race to the bottom, no?
It's annoying to see the current pushback against China focusing so much on inconsequential matters with so much nonsense mixed in, because I do think we do need to push back against China on some things.
The design, manufacture and supply of electronics is far more important than one particular usage, e.g, "LLMs". It has never been a requirement to violate copyrights to produce electronics, or computer software. In fact, arguably there would be no "MicroSoft" were it not for Gates' lobbying for the existence and enforcement of "software copyright". The "Windows" franchise, among others, relies on it. The irony of Microsoft's support for OpenAI is amusing. Copyright enforcement for me but not for thee.