Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's not something to dismiss but it is something that has already been addressed. Authors Guild v Google. Google Books is built upon scanning millions of books from libraries without first gaining permission from copyright holders, this was found to not be a violation of copyright.

Building a product on top of copyright works that does not directly distribute those works is legal. More specifically, a computer consuming a copyright work is not a violation of copyright.



At the time the suit was launched, Google search would only display snippet views. The very nature presents the attribution to the user, enabling them to separately obtain a license for the content.

This would be more or less analogous to Copilot linking to lines in repositories. If Copilot was doing that, there wouldn't be much outrage.

The fact that they are producing the entire relevant snippet, without attribution and in a way that does not necessitate referencing the source corpus, suggests the transgression is different. It is further amplified by the fact that the output itself is typically integrated in other copyrighted works.


Attribution is irrelevant in Authors Guild, the books were not released under open source licenses where attribution is sufficient to meeting the licensing terms. Google never sought or obtained licenses from any of the publishers, and the court ruled such a license was not needed as Google's usage of the contents of the books (scanning them to build a product) did not represent a copyright infringement.

Attribution is mentioned in this filing because such attribution would be sufficient to meet the licensing terms for some of the alleged infringements.

It's an irrelevant discussion though, the suit does not make a claim that the training of Copilot was an infringement which is where Authors Guild is a controlling precedent.


Attribution goes directly to factors 1, 3, and 4 of the fair use test.


In some contexts it's used to characterize the purpose of the copying, but it's not a consideration that was made in Authors Guild.


> Authors Guild v Google. Google Books is built upon scanning millions of books from libraries

I agree it's relevant precedent, but not exactly the same. Libraries are a public good and more importantly Google books references the original works. In short, I don't think that's the final word in all seemingly related cases.

> More specifically, a computer consuming a copyright work is not a violation of copyright.

I don't agree with this way of describing technology, as if humans weren't responsible for operating and designing the technology. Law is concerned with humans and their actions. If you create an autonomous scraper that takes copyrighted works and distributes them, you are (morally) responsible for the act of distributing them, even if you didn't "handle" them or even see them yourself.

Neither of the important aspects – remixing and automation – is novel, but the combination is. That's what we should focus on, instead of treating AI as some separate anthropomorphized entity.


Your disagreement and feelings about how copyright and the law should work are valid, they have very little to do with how copyright is addressed judicially in the United States


>Authors Guild v Google

At which case Google paid some hundred million $ to companies and authors, created a registry collecting revenues and giving to rightsholders, provided opt-out to already scanned books, etc. Hey, doesn't sound that bad for same thing to happen with Copilot.


But Copilot has been shown to distribute (parts of) the copyrighted works used to create it. That’s the difference.


A) No it doesn't, there's nothing in the Copilot model or the plugin that represents or constitutes a reproduction of copyright code being distributed by GH/MS. The allegation is it generates code that constitutes a copyright violation. This distinction is not academic, it's significant, and represents an unexplored area of copyright law.

B) "parts of" copyright works are not themselves sufficient to constitute a copyright violation. The violation must be a substantial reproduction. While it's up to the court to determine if the alleged infringements demonstrated in the suit (I'm sure far more will be submitted if this case moves forward) meet this bar, from what I've seen none of them have.

Historically the bar is pretty high for software, hundreds or thousands of lines depending on use case. A purely mechanical description of an operation is not sufficient for copyright, you cannot copyright an implementation of a matrix transformation in isolation no matter what license you slap on the repo. Recall that the recent Google v Oracle case was litigated over tens of thousands of lines of code and found to be fair use because of the context of those lines.

I've yet to see a demonstrated case of Copilot generating code that is both non-transformative and represents a significant reproduction of the source work.


> The allegation is it generates code that constitutes a copyright violation.

The weights of the Copilot very likely contain verbatim parts of the copyrighted code, just like in a zip archive. It chooses semi-randomly which parts to show and sometimes breaks copyright by displaying large enough pieces.

https://news.ycombinator.com/item?id=33458603


Speculation, and furthermore the model itself isn't distributed to consumers.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: