
>Once you have that, further fine-tuning, specialisation, or other incremental changes are comparatively easy.

The problem for Google and OpenAI is that most websites are going to start blocking them in robots.txt unless they find some way to provide value back in exchange for being allowed to scrape and train on their content. Pretty much every other bot or search engine is already blocked by default, and Cloudflare helps block them too.
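For what it's worth, both companies publish dedicated user-agent tokens for their training crawlers (OpenAI's GPTBot and Google's Google-Extended), so a site can opt out of AI training while still allowing normal search indexing. A minimal robots.txt along those lines:

```
# Block AI-training crawlers only
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Regular search crawling stays allowed
User-agent: *
Allow: /
```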

If they don't find a way to balance this, they are going to kill their own golden goose at some point.



It's not like robots.txt is a great deterrent for crawlers. The only real barrier would be a paywall hiding the content from the web.
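Right — robots.txt is purely advisory: a crawler only honors it if it voluntarily parses the file and checks each URL before fetching. A quick sketch with Python's stdlib `urllib.robotparser` (the rules and URL here are illustrative):

```python
import urllib.robotparser

# Parse a hypothetical robots.txt that disallows GPTBot everywhere
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: GPTBot",
    "Disallow: /",
])

# A well-behaved crawler calls can_fetch() and skips disallowed URLs
print(rp.can_fetch("GPTBot", "https://example.com/article"))   # False
print(rp.can_fetch("OtherBot", "https://example.com/article")) # True

# A badly behaved crawler simply never makes this check --
# nothing at the protocol level enforces the rules.
```

That's the whole "deterrent": compliance is opt-in on the crawler's side.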



