
>>>There is an astronomical amount of data siloed by publishers, professional journals etc. that is yet to be tapped.

You seem to think these models haven't already been trained on pirated versions of this content, for some reason.



Yep, Books3 is what LLaMA was famously trained on before the dataset was taken down.

That’s not even considering AI crawlers or all the copyrighted text on archive.org.



