If I were a VC with investments in Reddit who saw it was going nowhere, and on top of that learned that my other investment OpenAI had been gobbling up all its data entirely free of charge for the purpose of fencing it without any significant legal consequences, I would:
1. Try to cut off any upstart rivals of my better performing bet from playing the same trick by sabotaging easy data extraction
2. Cut my losses and spare myself future ones by taking Reddit down a notch, though not so far that it dies - I'm a sentimental guy, bite me
3. Ensure no one finds out I had anything to do with this by throwing someone else under the bus and buying them off if they ever figure it out (lucky for hypothetical VC me, Reddit's CEO seems arrogant, impulsive, and greedy enough that he probably wouldn't even realise he'd been played until it's too late, and could easily be convinced to take the fall for far less than it'd actually be worth)
Wouldn't it be something if I could have the subject of step 3 take care of steps 1 and 2 for me, perhaps both at once in a single masterstroke?
You assume that Reddit forms the core of OpenAI's data mining. While it may be large, I suspect that OpenAI has read a lot more of the internet than that.
What would be great is an LLM trained on that pirate library we all use, Z Lib I think, with all the books of the world, not just forum opinions.
To me, the data cat is out of the bag, and no single corp will ever put it back again.
> You assume that Reddit forms the core of OpenAI's data mining
Not at all, it's the only source mentioned simply because no others bear any relevance to the story.
Frustrating other parties' ability to access as much source material as possible makes sense if you forget about common moral values for a second and reduce everything to a zero-sum game: their loss equals your gain.
For additional evil comedic value, regarding your mention of ZLib: I just recalled it was taken down (for a while, anyway) by US authorities not too long ago, and it would be extremely sadfunny if that takedown ever turned out to have coincided (taking into account government/bureaucratic slowness) with OpenAI finishing downloading or processing all of the library's content.
They'd have to hack it, or pay/donate a lot to get all those books, though. Z Lib only allows you to download 10 books a day as a free user. The problem now is that the only way to donate seems to be crypto, or some Chinese gift cards. I'm not sure how much of this is because of US authorities directly vs. how many "high risk processors" were taken down by disconnecting Russia from SWIFT, but either way, it's not convenient to support them anymore. Not that people don't, of course.
About the same amount as the number of wheels on a tricycle ;)
Just to be clear, it wasn't meant as an allegation and no proof exists. I consider it closer to a terribly written fanfic than anything else, really.
*evil laugh*