> If an AI model recognizably regurgitates the essay it trained on, then it has infringed.
I completely agree — that’s why I explicitly wrote ‘non-copy paintings’ in my example.
> The AI argument that passing original content through an algorithm insulates the output from claims of infringement because of "fair use" is pigwash.
Sure, but the argument that training an AI on content is necessarily infringement is equally pigwash. So long as the resulting model does not contain copies, it is not infringement; and so long as it does not produce a copy, it is not infringement.
> So long as the resulting model does not contain copies, it is not infringement
That's not true.
The article specifically deals with training by scraping sites. That necessarily involves copying content from the server to the machine(s) doing the scraping & training. If the site's TOS incorporates robots.txt or otherwise denies a license for such activity, it is arguably infringement. Sourcehut's TOS, for example, specifically prohibits the use of automated tools to obtain information for profit.