The fact that they captioned this as "The bottom photo is an AI-generated image created in six prompts using OpenAI’s ChatGPT" without actually releasing the 6 prompts is quite telling, because the prompts would show they were prompting things to match the original image.
Iconic images (mona lisa, tank man, widely reported news stories, styles like ghibli) would of course be incorporated in as styles. It doesn't refute fair use.
So, you can't say "draw this person in the mona lisa pose, in simpsons style" and then act surprised and shocked when the model does exactly that. That's not theft.
Well, for one, no “theft” occurred at all. There is a reason it is called “infringement”. This is like saying that our brains are “stealing” the art we look at.
Second, absolutely. Even if someone wanted to call it theft, the only theft occurred by the user, not the LLM. This is like saying photoshop is stealing because you can use it to recreate copyrighted images. Or, again, like saying when a user looks at a painting then paints something similar, they are “infringing”. What a silly, silly thing.
Beyond that, this cites Reuters v Ross Intelligence as a precious “win” against AI, but that case had literally nothing to do with AI. It was a straight direct infringement case. It’s the difference between “we had an LLM read the law and create notes”, and “we had the LLM rephrase the copyrighted notes directly so we could try to avoid licensing them”. That activity would be infringing with or without the LLM participation.
Even funnier is that none of the circled items are actually infringing. Right off the top, none of the positions line up with the original image, and none of the items match the original image. In fact, if anyone looks closer, the entire image is different and just shares similarities (likely prompted) with the original. The fire is different, the shadows are different, the person’s pants, shirt, hair, mask are different. Literally nothing is the same other than the overall composition…which isn’t protectable.
Are they seriously trying to argue that any image that contains a street sign, tail light, and green blob, and looks similar to another image, is therefore “infringement”? How many people have taken exactly the same photo of the Eiffel Tower from the same location? Did they all infringe on each other because they have similar features and compositions?
FYI, humans using their brains and corporate for-profit products are two different things. How humans human doesn't matter and is in no way relevant. How a corporate product is made useful (in this case, by using other people's work) is what is up for discussion. If AI doesn't have a product without using other people's work I don't see how you can claim that AI's output is not derivative. AI wouldn't be a useful product without other people's works, therefore AI is derivative. Doesn't matter how the human brain works, it matters if a product's value is derived from protected works.
If my product wouldn't exist without yours, it is derivative of it.
I don't understand the example image. The description says:
> The yellow circles highlight areas of similarity between the original photo and the AI-generated photo
What about the person in the middle, doing the throwing? It's exactly the same. Why don't they have a circle around them? Why highlight some lens flares & distorted faces when the actual subject is the same? Or was this some kind of image-to-image generation?
Following this logic, anything that is digital can not be stolen? After all, you only steal a digital copy - not the original, and not something that is physical.
In which case, the following is not theft:
- Pirating anything that is digital video
- Pirating anything that is digital audio
- Pirating anything that is digital text
- Pirating any software
etc.
EDIT: For the proponents of "it is not theft", what's your stance on personal data being stolen ("accessed")? If someone can access your medical records, are you fine with that? They can do whatever they want with that?
Yes. Theft explicitly requires "Intent permanently to deprive". Copying things is not theft because the supposed "victim" was never in fact deprived of their thing, not even temporarily.
England has a whole bunch of legislation to prohibit activities that are not theft because it turns out that sometimes we care about other things. TWOC is an example, "Taking Without Owners Consent", because it turns out that it's also very annoying to have people take your car and then drive it somewhere and abandon it for a laugh ("Joy riding"), compared to them say, stealing it to ship abroad, break into parts or just to set it on fire.
Insisting that it's all theft gets us in the same muddle as when we decide that holding up a protest sign is terrorism, or that a billion dollar bribe is speech.
Theft of IP permanently deprives the author of ownership claim. It’s quite a problem if you want people to innovate or create new stuff, because they usually do this with an expectation to be able to claim ownership and have a healthy degree of control over how the work is used and/or monetized.
Actually, IP theft is more of a problem than theft of physical property. The latter you can just buy again while the former robs you of intangible values that among other things may grant you the ability to just buy physical property. The latter doesn’t scale and is generally more difficult because it’s more visible, while the former can be (and is being, in fact) done at population scale without people realizing it until it’s too late.
The pretence that three almost entirely unrelated "intellectual property rights" somehow make ideas property, at least akin to personal property and sometimes even akin to real property (aka land to lay people) is very silly. Each is silly in its own distinct way, with the most defensible probably being Trademark, an extension of the rather older legal concept of "Passing off". But really abolishing all three of them might be a reasonable step, up there with getting rid of smoking.
Do you remember when there was smoking on aeroplanes? The little glowing sigil telling you not to smoke would go out at cruise altitude and then you could fill the already inadequate and stale air with smoke. Who thought that was a good idea? Well, a lot of the same people who thought Copyright was great. Maybe all the lead damaged their brains?
Does any of this somehow mean what the LLM companies are doing is good? No. But we can't understand what's happening if we insist on giving it the wrong name. Russia is not in Ukraine to undo nazification, the babies Israel is starving aren't terrorists. "But I want to use this word" doesn't make it the correct word.
Again, I agree it is not the same as theft of physical property. In fact, I point out that it is worse. Yes, unfortunately the English language lacks a better word.
Correct. Piracy is potentially copyright infringement, not "theft". People insist on using that word, despite it being factually wrong, due to its emotional valence and persuasive value.
The author is a photojournalist, not a lawyer, and not qualified to comment on copyright law beyond simply giving his personal opinion.
Right. It's the debunked, bad-faith argument that "piracy is theft" because the pirates would have bought a movie were it not for The Pirate Bay, to justify calculating monetary damages that are completely unmoored from reality. We shouldn’t let people revive this nonsense, even in the context of AI.
In case of AI it is closer to actual theft because income (of an artist) is taken away, and someone else (AI capitalists) makes money from it. I don't think the comparison is unreasonable here.
How come it's just an "artist" and not an "artist capitalist"? And isn't this "rent seeking" behavior on the part of the artist?
It would be nice if the world used clinical terms to describe what's actually going on instead of using emotional terms to equate "copying" to "stealing" or "pirating", but people would rather confuse the issue than argue it logically in good faith.
OK in clinical terms, A wants to use X which is something that B produces, and by taking X, that will reduce B's opportunities to make an income, while at the same time increasing A's opportunities to make an income. Since A does not even ask for permission, in normal language we call that theft.
You’re suggesting that every time I build a set of shelves, I’m stealing from the shelving manufacturer because I deprived them of the opportunity to sell me THEIR shelves. Every time I borrow a friend’s car I’ve stolen from the rental company or Uber? What about if I make a song that kinda sounds like Daft Punk after listening to one of their songs, I “stole” their music? If I watch a movie at a friend’s house, according to you, didn’t I steal from the movie company and the actors? What if I see a game at the store and don’t buy it but plan to make one similar for myself, should I be arrested on the way out of the store for “stealing it”?
Because that’s what you’re saying and it’s certainly…a point of view.
> You’re suggesting that every time I build a set of shelves, I’m stealing from the shelving manufacturer because I deprived them of the opportunity to sell me THEIR shelves.
No, because you did not use any of their intellectual property in doing so.
You are conveniently ignoring the fact that A is taking something for which B did actual work.
So you pick the one and only example out of 5 that wouldn’t include intellectual property as a counter-point? Certainly makes it seem like your position is tenuous.
A song is IP. A car is not technically IP but merchandise is involved so I’d say that counts. A movie is IP. A game is most certainly IP.
By your opinion, observing that game before making my own “infringes” their IP. And same for the song. And by that test, watching a movie at a friend’s is piracy, and borrowing a friend’s car is robbing the car manufacturer of profits.
It really seems to fall apart if you simply place a black box over the “who” of creating something completely different, seemingly based solely on our biological memory not being digital.
IP theft involves deprivation of the holder of ownership claim. Observing a game before making your own game, listening to music before releasing your own track, viewing photos before making a go at the same composition, do not do that. Claiming them as your own, however, does do that. Using them to train a (commercial) system that allows anybody to recreate them and claim as your own does do that.
It is, actually, more of a problem than theft of physical items. The latter you can just buy again while the former robs you of intangible values that among other things may grant you the ability to just buy physical property. The latter doesn’t scale and is generally more difficult because it’s more visible, while the former can be (and is being, in fact) done at population scale without people realizing it until it’s too late. Also, expectation of ownership claim is a meta level of why people want to make more original things, including new and cool physical items.
You can hold that stance, but it completely undermines the source of income for artists or inventors. If you want a world where only rich and privileged people can afford the time to create art, this is the way to go.
Otherwise, we maybe can agree that people who make something should be eligible for some kind of compensation to encourage them to continue making, for our shared benefit.
You can argue that Beyoncé and George Clooney and Stephen King are so rich already they don’t need the money anyway, but that omits how even these people had to make a career from the bottom up on the sole premise that their focus on their art (regardless of your opinion on it) will pay the bills.
So just saying piracy isn’t theft and thus isn’t a problem is a wholly undercooked answer to a difficult problem.
The "therefore isn't a problem" was not, I believe, present in the original post. It's entirely possible to believe that copyright infringement is not theft and yet also still a problem. It's mainly an appeal to not misrepresent the situation, because theft in a literal sense refers to a situation which is generally worse than copyright infringement.
I fail to see how it’s not just nitpicking. Yes, pirates only copy something and the original owner still gets to keep it.
The end result however is that someone gets something for free that they otherwise would have to pay for, similar to sneaking on a train or into a concert. In these cases, I don’t think many people would argue that what you’re doing isn’t wrong, or meaningfully different from theft; save for lawyers of course, but I fail to see why the quibbling is relevant for normal people.
It's, if anything, more important for normal people: I think there's a very important moral difference between something that actively harms another person and not paying for something that benefits you. The latter, for example, should be much more acceptable for someone who lacks the means to pay to do. It's also much more important, (i.e. we should spend more resources), to prevent theft. Preventing piracy and your other examples is only important to the extent that it keeps the endeavor viable, or at least the effort in the prevention is worth the increase in payments. Coalescing this all into one thing is behind a lot of what I consider to be very harmful rhetoric and politics.
If you follow this logic, it gets really close to arguing that open source is price dumping. Nobody has an obligation to an income in a particular field of work.
Open Source is the conscious renunciation of any profit and intellectual property rights, whereas artists generally expect compensation for their work.
Plus, open source code gets mostly written by people with a stable and well-paid day job. Take that away, and I doubt there’s much open source code left.
So it's not just price dumping, it's also gatekeeping against unemployed developers. What a terrible practice.
Open source only looks better here because it has a different cultural context. But to a person trying to make money in the field, I'm not sure why that should matter.
People have been using the word for thousands of years so they may not stop now because you say so. (Romans were already using “furtum” - theft - and “plagium” - kidnapping - for ideas and not just for things and people.)
Absolutely, yes. Even as a professional software engineer I think software should be free, and tell everybody I know to pirate software, if they can.
Imagine we discovered a way to generate almost unlimited energy for very cheap, then we told poor people if they want any they have to pay it at the same price per kilowatt hour as current energy or it's stealing. It would be morally wrong. Digital content is the same, current copyright laws are unethical.
> tell everybody I know to pirate software, if they can.
I can see where you’re coming from on a philosophical level.
On a practical level that’s just asking for malware, especially for software that’s been cracked.
I would tell people the opposite.
What happens is you just copy and/or modify something. Kinda like duplicating money, it's not theft since you don't steal from anybody, no ownership or similar changes.
Makes sense. There is very little in common between physical theft and unauthorised information copying. It may be an act worse than theft in ethical or other way, but it's a different thing. Crude theft analogy is good to colour it a particular way and evoke emotions, but harmful for a reasonable discussion.
Exactly. Copyright infringement (a.k.a. pirating) is not theft. The owner is only deprived of a theoretical revenue/leverage/whatever (s)he could have exercised, but (s)he can still enjoy their original work.
I don't feel this is quite the gotcha argument you believe it is. Your last line is indeed correct, he would still have his card, so no theft would have occurred. Just having a copy of said card is not theft. Likewise should a person use said card for a nefarious purpose, that is still not theft, that is fraud.
There was a good essay I saw a bit ago, talking about how this shift from "fraud" to "identity theft" neatly started shifting the victim from the bank to the individual. E.g., 30 years ago, if someone claiming to be me went to my bank, asked for my money, and the bank gave it to them, then the bank was the victim of fraud. But now, if someone goes to my bank, asks for my money, and the bank gives it to them, then I am the victim of identity theft.
The difference is subtle, but potentially important. If the bank unfortunately gives money to someone else, that's their problem: I can say to the bank, I'm sorry you were the victim of fraud, but you still owe me my money. If I unfortunately "have my identity stolen", then that makes it seem like it's my problem -- the bank may say, we're sorry you "had your identity stolen" and thus lost your money, but that's not really our problem.
Well, not necessarily, on your last sentence. It might also be theft, depending on the precise nefarious purpose and on the jurisdiction. If you take somebody else's property without their consent, that's typically theft, even if the "property" is money in a bank account and no tangible physical object changed hands, and even if the method of taking involved deception. Fraud and theft overlap.
If I steal your ID card, then nothing really happens at all. I can spend years watching it, draw something on it, cut it in pieces, whatever.
The moment I start trying to impersonate you, then it becomes a problem for you.
If Bobby the metalhead downloads a copy of Enter Sandman, then Metallica and their brand don’t lose anything at all. If you were to make it industrial, then maybe we can talk about it.
People use "theft" because it's the terminology the companies responsible for the failure insist on using. The word makes it sound like the person failed to protect their identity, and hides that it was the company which failed to validate the identity.
This should be called "identity validation failure".
When scammers impersonate a company to steal your money it's no longer called identity theft.
Digital music is infinitely reproducible, but that doesn't mean that I am allowed to illegally download it. It is similar with articles. The author decides who can consume their art.
> doesn't mean that I am allowed to illegally download it
That word "illegally" is carrying some weight there. You only get to have those rules in law because the rest of us agreed to them in some sense. And in this particular case, the laws were created by powerful and wealthy media industries bribing politicians.
There's not some universal truth about fairness here. It's just a set of conventions where people with guns show up and lock you in prison if you bypass someone else seeking rent.
What?! How would that even happen? Unless you limit your definition of art to performing at salon events, that doesn't make much sense. Typically art is released into the world and at best the authors can get a bit of rent from the people who consume it (typically via a publisher), but they don't have any control over who the consumers would be.
My point is that just like I decide if I am going to distribute to streaming services, I also should be able to decide if I am going to allow models to train on my work. Isn't this something obvious?
It is not just not obvious to me, but goes entirely against my mental model.
Staying with your example of music or book authors - a musician or author might choose which distribution platforms they work with, but they don't have the ability to tell a particular shop to not resell their record/book, let alone to tell a particular "consumer" not to listen to it / read it.
Yes, I agree with you and when you give distribution rights to a third party they have the distribution control. How I imagine this mess being resolved is by a middleman, like the distribution services in music, there need to be companies licensing training data, from which artists could take a small cut.
I think the issue here is the theft of livelihoods.
If you spend months crafting something unique with your livelihood being based on then selling access to that creation ( whether it be music, software or prose ) and somebody copies it in a way that deprives you of a livelihood ( and replaces it with a revenue to the entity that copied it ) - isn't that theft of your income stream?
Do you feel the same about spreadsheet software reducing the need for accountants? What about textile workers creating fabric for clothing by hand? Or do we only romanticize artists such that they're entitled to an income?
Of course not - you are confusing competition with theft.
It's the difference between you writing a piece of competing software and taking my market share, versus you stealing my software and selling it as your own.
Or you attempting to write a competing best seller versus copying mine wholesale and selling under your own name on Amazon.
One activity is ultimately creative destruction that pushes society forward - the other is simply destructive free riding.
ie theft involves no creation of value.
Now the situation with tech companies is that there is theft, but also value add ( like somebody stealing your book and putting a better cover on it online ).
Just because there is some value add doesn't forgive the theft.
Did you edit your post above? Even slightly? Because I was replying specifically to the idea that artists have a right to a livelihood, and I think you changed a few words to add the copying bit. I should be more consistent about pasting in the text I'm replying to.
Regardless, you really should learn the difference between "theft", "copyright violation", and "training", because they are different things.
If I read a bunch of books, learn from the authors' use of word phrasing, then pick one of Vonnegut's 8 Story Shapes, and make a book in some author's style, it's not illegal. I don't see why I can't have a computer do that for me.
None of this really matters. If you feel strongly about it, you should go bribe congress to make a law. Because the existing laws about theft and copyright don't cover "learning from billions of examples and interpolating or extrapolating from them".
I edited it to make it more readable - not changed its meaning.
In terms of the substance - bottom line they have taken something without permission and sold it on. Sure - they have added value in the process - but if I steal your car and mash it up with another one before selling it on - it's still theft.
The original post was implying there was no harm as a result because it's just copying - the original owner was not deprived of anything.
My point was they are potentially being deprived of a living - and that's through stuff being taken without permission - not through fair competition.
What matters here is not the semantics of theft or copyright or whatever - what matters is fairness - and I accept that's a judgement.
I don't see a problem with these companies having to either pay to incorporate material into their models or/and the authors having the right to refuse to license.
Note - that's not to stand in the way of the development of these tools, but to ensure that the effort that went in to creating them ( which includes the generation of the source material ) is properly rewarded.
If OpenAI et al think creation is a trivial part and it doesn't need rewarding - they are free to bootstrap their models by creating all the inputs from scratch.
Perhaps you think it's fine if I took a copy of ChatGPT model without permission and started a competing service - which was cheaper because I didn't have to pay for the training costs?
They haven't lost anything - just took a copy.....
Are they going to stand in the way of me making the output of chatgpt more widely available through my cheaper pricing?????
And note I'm selling access to the output - which is different every time ( I use a different random number seed from them ) - so I'm not selling the copy of model per se...... perfectly fair use....
> Perhaps you think it's fine if I took a copy of ChatGPT model without permission [...]
There are laws about copyright and trade secret.
> They haven't lost anything - just took a copy.....
Correct. This is why it's a copyright violation and not a theft.
> [...] fair use....
Fair use is also a legal term, and it has some (reasonably) specific meanings. It's noteworthy that the large copyright protection industries don't respect those terms and have automated DMCA takedowns to abuse people for things which are legal:
"the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright."
> This is why it's a copyright violation and not a theft.
But is it? Copying the OpenAI model is only potentially copyright infringement - as you have to prove it's not exempt via fair use etc. Note I'm not selling it on - I'm just selling the output - which isn't solely determined by the model - it's determined by the model plus random numbers plus context - what I'm selling is only partly determined by the source model I copied.
Now if I copied it and used it to undercut your original business - then clearly that's not fair use - but that's rather my point no?
These companies have clearly copied source material without permission on a huge scale - but because it's copying and the people haven't lost the original - there is in effect another test - do the original people lose out as result etc.
It's quite clear - say in the news industry which might be supported by advertising - that copying content and then presenting a summary version so that people never visit the source material is clearly damaging the underlying copyright holders.
> I edited it to make it more readable - not changed its meaning.
I think you might have added the copying bit, and that changed its meaning if not the whole topic. Then you claim I'm confusing competition with theft instead of addressing the "right to a living" part. That's kind of insincere and dishonest on your part, but fine, this other topic is interesting to me too.
I have no idea whether OpenAI, Google, Meta, Anthropic, or any other company got valid licenses for all of the books they trained on. If they didn't, they likely broke some specific laws. Go after them for that if you want. This is copyright violation, not theft.
But if I legally obtain a hundred books and pay a really smart kid to read them all to learn their style, then I pay that kid to write a new book using the style he's learned, that all seems fair and legal to me. It's the way things have been for a very long time.
For any argument you're going to make about this, please imagine a really smart kid doing it instead of a computer. And if you think there should be different laws for computers vs really smart kids, go get it into legislature.
> That's kind of insincere and dishonest on your part,
Ad hominem attack - great.
> This is copyright violation, not theft.
I'm arguing it's copyright violation because it's theft of revenue. If it didn't result in any loss of revenue then it would be hard to argue it wasn't fair use.
Note I'm not using theft in any special legal sense - just in the common sense English sense.
If 'learning their style' included incorporating large recognisable chunks - that smart kid would fail his English degree on plagiarism grounds.....
The point about LLMs is they can do everything - from an unrecognisable 'original' mashup - to what are quite clearly regurgitations of the input. Note also that the kid didn't steal the books he learnt from.....
The question is what's happening fair and good for society, not what is convenient for some very well funded companies in a hurry who see existing laws as annoying things getting in their way, rather than something to respect.
No, it was not. You moved the goal posts from artists deserving a livelihood to copyright issues, and I called you out for that.
> Note also that the kid didn't steal the books he learnt from.....
I should note this? I explicitly stated it as part of the hypothetical.
As I said above, there are already laws about copying. If you're sure they broke those laws, maybe you should criticize the powers that be for not enforcing them.
> The question is what's happening fair and good for society
I think this is a good question.
Personally, I think the benefit from having automated tutors that are attentive and patient and can answer questions about almost any topic known to man dwarfs the benefit from defending intellectual property. I hope they get cheap enough to be accessible to every person who can't afford a traditional education and accurate enough that we trust them more than typical teachers (not a high bar, unfortunately).
I donated to Wikipedia for years specifically because of its educational value while being freely available. Watching people I know learn from LLMs, and do useful interesting things with what they learned, I think the potential is much higher.
>Personally, I think the benefit from having automated tutors that are attentive and patient and can answer questions about almost any topic known to man dwarfs the benefit from defending intellectual property.
Why is it one or the other? Your argument is like saying we shouldn't pay nurses a fair wage because it gets in the way of great care for everyone.
It's not an either/or situation - it's how you allocate rewards for the different contributions to the new tech. Currently tech companies are saying there is zero value in that training data - that's clearly not the case.
I think I finally understand your misunderstanding - I'm not arguing AI should be banned as it destroys a musician's job because in the future all music will be AI generated. That's not my concern - I'm not saying anyone deserves a job in perpetuity.
My point is simply that in building the models they have to respect the current laws - and that means respecting the content owners' rights and either paying what they ask or not using it.
> Your argument is like saying we shouldn't pay nurses a fair wage because it gets in the way of great care for everyone.
This argument strategy, where you make a strained analogy/metaphor, and then apply it back to the original topic - it's fragile and depends on how comparable the two ideas are. If you're just interested in winning discussions, it's a bad tactic because it opens up a whole new avenue for your opponent to attack.
Can I COPY nurses into equally valuable robots? Because if I can, then yeah - the world would be a MUCH better place with abundant and affordable nurse robots, and the human nurses can go find other jobs. I have some friends who are nurses, and after watching them fight with the medical system for their own health issues, I'm pretty sure they'd agree.
Picking at tangential points while avoiding the main argument hmm....
Admit it - you misunderstood my original point and accused me of then changing the argument.
Bottom line - the original poster was implying there was no harm because simple copying doesn't create a loss. I was pointing out that a key test ( in considering copyright issues ) is whether such an action causes harm - and in this case there are many very good cases to be made about resulting loss of revenue.
Let's be clear, I think LLMs etc are a huge technical advance - I just think it's wrong to try and ignore the law because it gets in the way of large companies' attempts to make money.
> Picking at tangential points while avoiding the main argument hmm....
I've tried (and occasionally failed) to avoid the parts of what you wrote which were just the typical flame war bait. And of course I'm guilty of trying to antagonize you in a few places. The topic is interesting, but our conversation about it was not.
I appreciate the link to the UK law, but the rest of this comment thread is mostly two people talking past each other.
Because that's how it works in reality. Once the copyright holders get their teeth in something, it gets paywalled. For instance, poor people don't have (free/legal/easy) access to lots of research papers/articles which were paid for with government grants. And copyright industry associations (MPAA, RIAA, CCC, AAP, ...) lobby to extend the laws so that creative works take lifetimes to enter the public domain.
You think you're arguing in favor of the little guy who made a series of blog posts or digital art? That's naive.
> My point is simply that in building the models they have to respect the current laws
So go enforce those laws. The rent seekers will thank you.
> So go enforce those laws. The rent seekers will thank you.
Seems you have bought into the idea that companies like Google, Facebook and Microsoft are the poor little guys. Wow.
What we are talking about here is certain companies trying to gain a de facto monopoly on the sum of human knowledge - without paying any of those people who built it in the first place.
This is the real story.
Now it may well be their moat isn't as big as they thought it was and the greedy investors trying to do this heist will fail - but that's what they are attempting - and you are cheerleading for it.
> No, it was not. You moved the goal posts from artists deserving a livelihood to copyright issues, and I called you out for that.
Nope. I never said artists deserve a living - I said that people deserve protection from their living being stolen via copyright violation. You are confusing what I said with what you mistakenly understood. I don't understand why your original misunderstanding is somehow a character flaw of mine.
Note how there is an exception for automatic processing - but only for non-commercial use - and note the researchers still have to pay for access.
There is also a 'fair dealing' clause.
The key clause here is:
"does using the work affect the market for the original work?"
Content creators are strongly asserting that what these tech companies are doing does.
Also
"Is the amount of the work taken reasonable and appropriate? Was it necessary to use the amount that was taken? Usually only part of a work may be used"
Obviously the entire works are being consumed.
> Because the existing laws about theft and copyright don't cover "learning from billions of examples and interpolating or extrapolating from them".
Laws are written in a way where they attempt to predict the future - they are written from a first principles approach - such as the fair dealing clause above - so your claim that it doesn't explicitly ban it is a red herring.
I assumed you were in the US, where companies like OpenAI, Google, and Anthropic are located. If they're breaking UK laws, you should appeal to your government to enforce those laws in your territory, or ban access to them, or whatever you think is relevant.
> [...] so your claim that it doesn't explicitly ban it is a red herring.
Why should I care about your laws any more than any other country I don't live in?
The point I'm trying to make is that the way the laws are structured leaves them open to interpretation - and that's deliberate, because trying to nail down everything up front is bound to fail and will allow people to evade the spirit of the law.
ie what matters - both in interpreting today's laws and, if they are insufficiently clear, drafting future amendments - is why the law was created in the first place.
The UK law is well drafted and makes it very clear that the aim of copyright is not to stop you copying something per se - but to stop you copying something for commercial gain in a way that simultaneously damages the copyright owner. I suspect those principles are the same the world over - whatever the exact drafting.
So the question you have to address is whether the actions of the AI companies fit these criteria.
Are they copying copyright material without permission - tick.
Are they making money as a result ( it's a commercial operation ) - tick.
Are they damaging the original copyright holders in the process - this is the only one remotely in doubt - but I'd argue it's pretty clearly a tick in many areas - but a bit less clear in others.
In terms of lobbying - it's the big tech companies that are currently trying to get the law changed in the UK - to make what they have done legal ( while still arguing they haven't done anything wrong..... as laws are not typically changed retrospectively ).
From what I read, I agree. However, UK law isn't very relevant to the matter. It's a small market among many that doesn't create the models or much of the content to train them.
Perhaps - though the UK punches above its weight in English-language cultural output - I'm sure you have come across some of it.
By the way, the UK once ruled a large part of the world - including the US - but that empire wasn't sustainable for a small country as the other countries caught up in terms of development.
The US is currently facing that issue, and I have to say, not dealing with it very well. The US is going to need friends on the way down and right now all it's doing is making enemies.
Yeah, the US is doing some terribly stupid stuff. No disagreement there.
However, the UK might be more relevant if you hadn't withdrawn from the EU. From the outside it sure looks like you guys decided it was more important to keep out the poor people (or other ethnic backgrounds) than to be part of something with actual collective bargaining power.
The debate about leaving the EU was multi-faceted [1] - but a significant part was a kind of nostalgia for when Britain was indeed Great, and the idea that a UK free from the shackles of the EU could be great again. Take back control was the slogan - a complete misunderstanding of the difference between lost sovereignty and pooled sovereignty.
I see the same forces driving the US now as it undermines international organisations.
[1] and sure xenophobia played a depressingly large part.
I hadn't thought about it before, but I think the nostalgia angle can explain a bunch of the US attitudes. It seems like a lot of people across the spectrum think the 1950s were a better time: The bigots because of race issues, the incels because of sexual expectations, the young generations envying (and resenting) how the boomers "had it easy", and the hippies thinking new technology is ending the world.
I think they're all wrong, but there's no fixing it. Assuming there are no civil or world wars in the near future that radically change the trajectory, China will rise and the US will end up lower on the ladder.
> nostalgia for when Britain was indeed Great
I'm sure you didn't intend it, but that capital G sure sounds a lot like the slogan for a US political party.
Does spreadsheet software have to be trained directly on the copyright-protected work of the accountants it's replacing without their consent (and often against their loud protesting)?
I wouldn't call it stupid but I do expect that the legal cases will die somewhere along their way to the supreme courts.
Several reasons:
1) The cat is out of the bag, AI is a thing now. It's not going back in the bag. So, artists are going to have to adapt to that. And are already adapting.
2) AI affecting artists is not any different than photography wiping out the market for portrait painters. Or records wiping out the market for selling music in paper form so that professional musicians might reproduce that for an audience. Those things happened a long time ago of course and any copyright issues around that were resolved over time. Copyright law hasn't really changed much since before that happened. These are the same kind of questions that exist for AIs being obviously inspired by but not really perfectly copying songs, images, text, etc. And they have answers in many decades of case law. Judges are going to take all of that into account and are historically reluctant to introduce new interpretations.
3) AI companies are big enough to outright buy the larger copyright holders. That doesn't mean they will; but it suggests they might reach some settlement that doesn't involve pleading guilty in a court. In the end this is about money, not about principles. At least not for the parties paying the lawyers.
4) If a settlement doesn't happen, judges will be forced to look at existing cases to assess what is and isn't an infringement. And if you remove all the outrage and moralism from the equation (which judges tend to do), AI companies are simply not distributing copies of original works to anyone. They are using them, for sure. But it's distributing copies that gets you in trouble. Not using copies. That narrows it down to whether those copies, which are freely distributed on the internet, were obtained legally.
I think they are more scared that the AI models will reproduce their original content in odd contexts that may bring liability to them and cause overall pandemonium.
I find the argument “Nothing was stolen” very weak compared to “The infinite endeavour of preventing private people from copying bytes, shouldn’t be a prerogative of the state” / ideas are copiable and can’t be decently guaranteed by the public against copy.
That is, public funds, tribunals and lawmaking power shouldn’t be used to protect private interests.
Corollary: 1. DRM is ok. 2. If one cracks it, it’s ok too, 3. You gotta find other ways to conduct business than retaining information (licenses, movies, etc.) 4. For software companies, cloud is one of those ways, compared to licensing downloads, 5. Netflix is the cloud of the movies and you pay for earlier access to a shared experience that is synchronized with other friends who will watch the same thing, 6. Patents are another stupid attempt by the state to protect corporations against citizens.
I would just establish that all references to "theft" and "stealing" in the realm of copyright (with the notable exception of plagiarism) are metaphor and emotional rhetoric. Historically it would come from copyright interest groups who want(ed) to use criminal police to enforce their state-granted copyright privileges[1] against regular people.
Sadly these things are often decided by rhetoric in society, but then again, there's no actual debate if it's just throwing slogans.
Now some of the same rhetoric is used in the AI battle. The only question worth asking here is what's the social benefit, as human culture is by nature all commons and derivation. But in this case, the AI companies are also accumulating power, and LLMs are removing attribution which could be argued to discourage publishing new works more than piracy. A "pirate" may learn about you and later buy from you in different ways, a LLM user won't even know that you exist.
[1] Not even discussing how exaggerated these privileges are from what would be reasonable.
Because if you present yourself as the author, it follows that the actual author is deprived of attribution. So you are actually taking something from that person.
LLM could commit plagiarism if authorship of generated media was claimed for either the LLM or its creators.
Hopefully I can get you to do my taxes. When you’re done I’ll just make a copy and then not pay you, because I’ll have not stolen anything and so I don’t owe you any money.
In terms of property theft: exactly. Which is why your tax advisor would insist on a contract with you that outlines compensation for services rendered, time and material, etc.
not sure I get your point. So because a movie producer didn't sign a contract with a pirate it's okay for the pirate to copy the movie without compensation and that's the only difference?
To me, in both cases someone did some work that someone else wants. In both cases they should pay for that work. If they are not willing to pay the price the person who did the work is asking, then they should go get work from someone else. At no point should they just say "well, I never contracted with you so therefore making a copy of the work you did is totally cool"
I was talking about the definition of theft and whether what you did in your example constituted it.
If I did your taxes (as in: did all the calculations) and you took a picture of my results, copied the values, etc. no theft happened. If I filled out your tax return and you took that without paying then obviously theft happened, but I assumed you didn't mean that since then your example would have no connection to TFA.
What did happen, though, is exactly what you described: you asked someone to do work for you ("contracting"/"work for hire") and they did just that. Then you decided not to pay them, which is a simple civil law case of contract fulfillment.
EDIT: depending on the "creativity" of my tax calculations, copying them might be considered IP theft and I could come after you using the DMCA, but I guess creative accounting only goes so far ;)
He gives two examples that are meant to prove his point, but they don't really convince me.
The first example is the image: the AI has seen the famous photograph of the Ferguson riots, and (with 6 prompts?!) manages to get something fairly similar. But suppose a human had seen that photo, and then you asked that human to draw you a photo of the riots; and then continued prompting them to make it look similar. Is it really unrealistic that the human could generate something that looks as similar? Is the human themselves therefore inherently a violation of copyright?
The NYT article to begin with looks a bit more damning -- except that, it appears that they prompted the AI directly with the beginning of the article. My son, when a toddler, for a long time could recite nearly the full text of his favorite story books with minimal prompting -- does that mean he's inherently a violation of copyright? Because he can recite The Gruffalo almost verbatim when prompted, is he a walking violation of Julia Donaldson's copyright? What about people with photographic memory, that can recite long sections of books verbatim -- are they inherently violating copyright?
Now sure, in both cases, the output might be a violation of copyright, if it's clearly derived from it -- both for humans and for AI. But I don't think the fact that AI can be prompted to generate copyright-violating material is proof that the AI training itself has violated copyright, any more than the fact that a human can be prompted to generate copyright-violating material is proof that human training has violated copyright.
A human doesn't ingest half of the web and simultaneously deal with millions of people.
We've been through this time and time again. Justice didn't go after humans copying books by hand, it went after reprints of existing copyrighted material.
Music industry didn't go after people singing tunes in their kitchen, but after wide distribution networks.
Removing scale from the discussion leads to absurd conclusions.
> Justice didn't go after humans copying books by hand, it went after reprints of existing copyrighted material.
Not sure who "justice" is, but copyright owners most definitely did go after individuals copying even small excerpts of copyrighted material, in fact that is currently still going on in libraries:
I agree that we have a new situation developing; but we're not going to get any clarity unless we see clearly what the new situation is. There are several things you're still conflating:
1. The ability of an entity (human or AI) which can be prompted to produce copyright-infringing material.
Sure, if OpenAI is actually producing copyright-infringing material at scale, unprompted, then that needs to be addressed. If a common way around NYT's paywall were to copy & paste the first few lines into ChatGPT and then read the rest of the article, then yes, that's a hole that needs to be filled. But that's with the production and dissemination, not the training.
Regarding scale, yes, there is a difference here, but it's more subtle than you think. There are probably millions of toddlers who can recite The Gruffalo nearly verbatim. However, each of those toddlers was trained individually. Similarly, there are probably thousands, maybe tens of thousands of artists who, when prompted, could generate an image that would be similar enough to the presented image to violate copyright. But again, each of those individuals was trained separately.
The difference that modern tech companies have is that they can train their systems once, and then duplicate the same training across millions of instances.
One potential argument to make here would be to say: Training with this material is fair use; but fair use or not, those weights are now a derivative work. You can use exactly one copy of those weights, but you can't copy those weights millions of times, any more than it would be fair use to distribute one copy of that image to everyone in the company. You need to either train millions of copies, or pay licensing fees.
I'm not sure I agree with that argument, but at least it seems to me to bring the actual issues into more clarity.
It's obvious to me that the converse is true here as well... Fair Use is NOT theft.
I've seen plenty of photos of "The Bean" in Chicago, most of which aren't mine if I look really, really closely, but that's all ok. I'd be an idiot if I got upset by any other similar photos.
Clearly copyright needs to be updated for the modern era, but perhaps it's reputation that should replace it? Instead of focusing on the content, let's instead work to secure the sources of information, so that we know a story is from a given source, and everything has a digital provenance.
I'm not AI's biggest fan, but it is training its virtual brain on images and data, just like a creative human would, yet it is just far quicker at that.
Do we need to limit scraping and learning to only free publicly available data? Is a Reuters photo, even with a watermark, not still publicly available for learning the style of photo, composition, and lighting? What you pay for is the license to reproduce the image; you can still look at them beforehand.
What really matters is the society we want to live in. It doesn't matter much what kind of technology allows private entities to reproduce creativity. We can assume these brains are not virtual and are actually organic and more capable than human ones, or that they are magical black boxes.
Since it changes incentives and mechanics of the creativity market so much it forces us to reassess current approaches. I can't agree that the mechanism behind this tech and the approach to IP is of any consequence. We don't make laws, norms and judgements for the sake of our tools.
It's ok to say that we're not ready to arrange things in a proper way yet, without letting everything slide on some arbitrary technicality.
I think the main issue is that it uses monopolistic behaviour to press its advantage. It feels as if Google et al are scalping information from the actual producers.
There is also the nonconsensual aspect of it. I make things for humans. Just because it’s free for humans does not mean it’s free to be used by corporations, especially when they use my work to kill the economics of my industry. It feels like a parasitic relationship.
As a human I am not allowed to freely acquire anything I want. I have to pay, rent, license. This makes me unable to learn and train my brain in many ways I would like to.
Yes, all that data that ai is sucking up belongs to Google, twitter and Facebook! Lol. They are the owners of it. Those corporates didn't commit legal theft!
So ridiculous.
The whole idea of copyright is wrong. Anything that is popular and therefore successful owes that to the crowd that made it popular. The crowd is what gives it the interest.
I personally wouldn't even be averse to some sort of period of copyright, say 2 years, with the possibility to extend to say 5, but these things are common.
My problem with AI is that it has humanity's mediocre, filtered-for-the-masses culture baked into it by definition.
This is machinery that can not suggest experimental jazz- unless it is already mainstream. It can not go to the fringes and shove the species towards new and wild discoveries.
I find it hilarious though, that it may destroy the cultural behemoths with their IPs by mashup. Death by a million Beatles-clone songs, yesterday came suddenly indeed. And after all that is said and done - some of us might even venture out to the fringe and find the weird and wild parts again.
Humanity rarely ventures out to the fringes on its own anyway. Only the committed freaks out there between the creation of experimental jazz and the mainstreaming of LLMs.
The more it changes, the more it stays the same.
As a sibling commenter said in fewer words: the models will still consume the fringe content and will be able to regurgitate it given the right prompt; suggesting experimental jazz or whatever other fringe art pursuit the committed freaks have an itch for.
I'll still slowly progress my way through John Zorn's catalogue, and occasionally re-invigorate my appreciation for HR Giger's bleak works, and keep listening (and subscribing) to the local community radio station, and seeking out uncomfortable movies, safe in the knowledge that likely I'll be watching, listening, and appreciating them all in solitude.
I do not think that "culture" can be forced, it must be slowly, almost imperceptibly, absorbed.
I think most people are already pretty annoyed with how copyright law makes very reasonable applications of technology impossible. What these whiny retards want to do is put copyright between everyone and a technical revolution.
Do that and copyright will go away entirely. If you like copyright, try to come up with practical and reasonable compromises. Don't try to break things.
One of two legal outcomes must follow from OpenAI's piracy.
1. OpenAI must be fined according to law.
2. Piracy is decriminalized.
Failing to do either is an admission that the US has become a corporatocracy. That is, a form of oligarchy where the rule of law is not by a majority or plurality of people, but by a number of corporations you can count on one hand.
As much as I think that what these companies are doing has moral and legal issues, reframing copyright violations as "theft" will cut both ways and make the arguments for open culture more difficult.