
One of the questions I have is whether models are being trained on SEO {spam|blogspam|AdSense-optimized|spun} websites.


Almost certainly. The web crawl data that GPT and similar LLMs are trained on is far too large to be curated in full.



