Did OpenAI abide by my service’s terms of service when it ingested my data?

cortesoft · 2025-01-29T22:17:12 1738189032

Did OpenAI have to sign up for your service to gain access?

lolinder · 2025-01-29T22:22:11 1738189331

It probably ignored hundreds of thousands of "by using this site you consent to our Terms and Conditions" notices, many of which probably would be read as prohibiting training. But that's also a great example of why these implicit contracts don't really work as contracts.

otherme123 · 2025-01-29T22:39:43 1738190383

OpenAI scraped my blog so aggressively that I had to ban their IPs. They ignored the robots.txt (which is a kind of ToS) by 2 orders of magnitude, and they ignored the explicit ToS that I copy-pasted blindly from somewhere but which turns out to forbid what they did (something like "you can't make money with the content"). Not that I'm going to enforce it, but they should at least shut up.

freen · 2025-01-29T22:37:15 1738190235

Civil law is only available to deep pockets.

Contracts are enforceable to the degree to which you can pay lawyers to enforce them.

I will run out of money trying to enforce my terms of service against OpenAI, while they have a massive war chest to enforce theirs.

Ain’t libertarianism great?

blibble · 2025-01-29T23:09:01 1738192141

solution: live in a country where OpenAI can't get to you

e.g. China

staunton · 2025-01-30T07:32:42 1738222362

Are you suggesting it's easier to successfully sue OpenAI for copyright infringement if you live in China?

qup · 2025-01-30T10:32:22 1738233142

No, they're suggesting that DeepSeek avoids getting sued by OpenAI

bayindirh · 2025-01-29T22:53:18 1738191198

No, but some of the data is licensed.

For example, my digital garden is under GFDL, and my blog is CC BY-NC-SA. IOW, they can't remix my digital garden under any license other than GFDL, and they have to credit me if they remix my blog, and can't use it for any commercial endeavor, which OpenAI certainly does now.

So, by scraping my webpages, they agree to my licensing of my data. So they're de facto breaching my licenses, but they cry "fair use".

If I tell them they're breaching the license terms, they'd laugh at me, and maybe give me 2 cents of API access to mock me further. When somebody allegedly uses their API in breach of their unenforceable ToS, they scream like an agitated cockatoo (which is an insult to the cockatoo, BTW. They're devilishly intelligent birds).

Drinking their own poison was mildly painful, I guess...

BTW, I don't believe that DeepSeek has copied/used OpenAI models' outputs or training data to train theirs. Even if they did, "the cat is out of the bag", "they did something amazing so they needed no permissions", "they moved fast and broke things", and "all is fair-use because it's just research" would apply regardless of how they did it.

Heh.

Ukv · 2025-01-30T09:51:49 1738230709

> So, by scraping my webpages, they agree to my licensing of my data.

If the fair use defense holds up, they didn't need a license to scrape your webpage. A contract should still apply if you only showed your content to people who've agreed to it.

> and "all is fair-use because it's just research"

Fair use is a defense to copyright infringement, not breach of contract. You can use contracts, like NDAs, to protect even non-copyright-eligible information.

Morally I'd prefer what DeepSeek allegedly did to be legal, but to my understanding there is a good chance that OpenAI is found legally in the right on both sides.

bayindirh · 2025-01-30T11:07:52 1738235272

At this point, what I'm afraid of is that the justice system will be just an instrument in this whole Us vs. Them debate, so its decisions will not be bound by law or legality.

Speculation aside, from what I understood, something like this shouldn't hold a drop of water under the fair-use doctrine, because there's disproportionate damage, plus a huge monopolistic monetary gain because of what they did and how they did it.

On the other hand, I don't believe that DeepSeek used OpenAI (in any capacity, way or method) to develop their models, but again, how they did it doesn't matter at the current juncture.

What they successfully did was to upset a bunch of high level people, regardless of the technical things they achieved.

IMHO, the AI war has similar dynamics to MAD. The best way is not to play, but we are past the Rubicon now. The future looks dirty.

Ukv · 2025-01-30T11:56:08 1738238168

> from what I understood, something like this shouldn't hold a drop of water under fair-use doctrine, because there's a disproportional damage, plus a huge monopolistic monetary gain

"Something like this" as in what DeepSeek allegedly did, or the web-scraping done by both of them?

For what DeepSeek allegedly did, OpenAI wouldn't have a copyright infringement case against them because the US Copyright Office determined that AI-generated content is not protected by copyright, so there's no need here for DeepSeek to invoke fair use. It'll instead come down to whether they agreed to and breached OpenAI's contract.

For the web-scraping it's more complicated. Fair use is determined by the weighing of multiple factors - commercial use and market impact are considered, but do not alone preclude a fair use defense. Machine learning models do seem, at least to me, highly transformative - and "the more transformative the new work, the less will be the significance of other factors".

Additionally, since the market impact factor is the effect of the use of the copyrighted work on the market for that work, I'd say there's a reasonable chance it does not actually include what you may expect it to. For instance, if you're a translator suing Google Translate for being trained on your translated book, the impact may not be "how much the existence of Google Translate reduced my future job prospects" nor even "how many fewer people paid for my translated book because of the existence of Google Translate", but rather "how many fewer people paid for my translated book because that book was included in the training data", which is likely very minor.

addicted · 2025-01-30T01:29:07 1738200547

They probably did to access the NYTimes articles.

outside1234 · 2025-01-29T22:26:00 1738189560

That isn't required to be in violation of copyright

freen · 2025-01-29T22:32:10 1738189930

Actually, yes, they actively agreed to them. Clicked the button and everything.

baq · 2025-01-30T06:37:40 1738219060

Have their scraping bots consented to cookies?

thorncorona · 2025-01-29T22:21:22 1738189282

Can you steal someone else’s laptop if they stood up to get a drink?

addicted · 2025-01-30T01:31:18 1738200678

OpenAI itself has argued, to the degree that your analogy applies, that if the goal of stealing the laptop is to train AI then the answer is Yes.

cortesoft · 2025-01-30T01:47:07 1738201627

Wouldn't this analogy be more like, "can you read my laptop screen if I stood up to get a drink?"

freen · 2025-01-31T01:46:58 1738288018

And steal the IP from your startup and then go public.

gizajob · 2025-01-29T22:55:06 1738191306

If their OS is open to the internet and you can scrape it and copy it off while they're gone, then that would be about the right analogy. And OpenAI and DeepSeek have done the same thing in that case.

secstate · 2025-01-29T23:02:14 1738191734

Yes, if you can pay off any witnesses.

rpastuszak · 2025-01-29T22:56:52 1738191412

What?