Gormo · on May 6, 2024

> OpenAI has enough motivation to circumvent whatever anti-scraping measures stackoverflow could muster.

And even greater motivation to just cooperate with StackOverflow for mutual benefit, rather than engage in a ridiculous arms race with them.

> I assume stackoverflow's metrics (traffic, number of new questions and answers) are down by an amount they are not happy with, so they are eager to strike any deal before their ship sinks.

Even if that were true, I'm not sure I see the connection. The value StackOverflow brings to the table seems to be specifically a large dataset of human-curated technical knowledge. Both parties in this arrangement would have a strong interest in ensuring that StackOverflow continues to generate this data through its user-centric Q&A website. I'm not sure how a deal with OpenAI would keep their "ship" from "sinking" if that were the situation they were in.

> Personally, I'm as often on stackoverflow, as I've ever been, whereas my chatGPT usage is down to almost zero.

Same here. ChatGPT is a nice novelty, but I haven't found all that much productive use for it. Most people I know who use it regularly use it either to correct their spelling and grammar or as a conversational-interface search engine. I don't find either to be superior to proofreading my own writing or to evaluating information from its original sources after a conventional search.

But there might be a value-add for StackOverflow in the latter case: finding specific answers to complex questions can be a hit-or-miss proposition, and ChatGPT might at least provide a more efficient way of finding the articles that answer your questions, if implemented properly.

Of course, implementing it properly would likely involve designing the LLM to track the sources of the data it's tokenizing, and present a 'bibliography' for each of its answers, rather than just blindly compositing data from all sources into single probability values.
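
To make the 'bibliography' idea a bit more concrete, here's a rough sketch in Python. To be clear, this is not how ChatGPT or any real LLM works internally: it swaps in plain keyword-overlap retrieval instead of a model, and the `Passage` type, the `CORPUS` entries, and the `example-post-N` identifiers are all made up for illustration. The only point is that every snippet keeps a pointer to where it came from, so the answer can list its sources instead of blending them together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical identifier for the original post
    text: str


# Made-up corpus standing in for a set of Q&A answers.
CORPUS = [
    Passage("example-post-1", "Use a list comprehension to filter items in a Python list."),
    Passage("example-post-2", "In Rust, borrow checker errors often mean a reference outlives its owner."),
    Passage("example-post-3", "To filter a Python dict, use a dict comprehension."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,?!:;()'\"").lower() for word in text.split()} - {""}


def retrieve(question: str, corpus: list[Passage], top_k: int = 2) -> list[Passage]:
    """Rank passages by the number of words shared with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(p.text)), p) for p in corpus]
    # Drop passages that share no words at all with the question.
    scored = [(score, p) for score, p in scored if score > 0]
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def answer_with_bibliography(question: str, corpus: list[Passage]) -> str:
    """Return the matching snippets followed by a numbered source list."""
    hits = retrieve(question, corpus)
    snippets = [f"[{i}] {p.text}" for i, p in enumerate(hits, start=1)]
    sources = [f"[{i}] {p.source}" for i, p in enumerate(hits, start=1)]
    return "\n".join(snippets + ["", "Sources:"] + sources)


if __name__ == "__main__":
    print(answer_with_bibliography("How do I filter a Python list?", CORPUS))
```

A real system would presumably need something far better than word overlap for ranking, but the shape would be similar: the source travels alongside the text all the way to the final answer, so the user can always click through to the original post.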