It does address it, although not that clearly. This happens all the time with ne...

leni536 · on Oct 17, 2022

If that's true, then GitHub is just "washing its hands". Not at all reassuring for copyright holders and users of Copilot.

helsinkiandrew · on Oct 17, 2022

That code seems to appear in thousands of repositories on GitHub, I’m sure some of them haven’t copied the license.

The vast majority of people who would use a matrix transform function they got from code completion (or from a GitHub or Stack Overflow search) probably don’t care what the license is. They’ll just paste in the code. To many developers, publicly viewable code is in the public domain. Copilot just shortens the search by a few seconds.

Microsoft should try to do better (I’m not sure how), but the sad fact is that trying to enforce a license on a code fragment is like dropping dollar bills on the sidewalk with a note pinned to them saying “do not buy candy with this dollar”.

extropy · on Oct 17, 2022

I still remember the days when we had billion-dollar lawsuits over 20 lines of code (Oracle vs Google).

If Copilot makes everyone see how ridiculous that is, that's a win in my book.

yardstick · on Oct 17, 2022

What’s the most GitHub could reasonably be expected to do? If multiple licenses are found for the same code, then maybe it should be flagged for review, or the most restrictive license applied.

kitsune_ · on Oct 17, 2022

If it's possible for video and audio content (ContentID, YT), then I don't see why it shouldn't be possible for OSS.

rocqua · on Oct 17, 2022

Do we want that though? I personally believe copyright as implemented today is harmful. The fact that code largely is able to dodge this could be seen as arguing we should be laxer with copyright, rather than arguing for strict enforcement of copyright on code.

bjourne · on Oct 17, 2022

The point is that Copilot should not emit a word-for-word copy of someone else's work, because that is called plagiarism.

samastur · on Oct 17, 2022

Check timestamps of commits of replicated code to find the original.

LelouBil · on Oct 17, 2022

Timestamps of commits can't be trusted, just like commit authors.

Github can only trust push timestamps.

barsonme · on Oct 17, 2022

That would only work if the original was uploaded to GitHub before the copies. Like, somebody could copy from GitLab or BitBucket. And git histories don’t always help if they’re not copied over.

lokedhs · on Oct 17, 2022

But copyright law doesn't really care about how you prevent infringement, just that it doesn't happen. Isn't it up to Github to come up with a way to do it, or otherwise not do it at all?

yardstick · on Oct 17, 2022

GitHub just needs to show they have taken reasonable precautions, and if a conflict is identified, that they remediate it without undue delay.

It’s not a binary choice between doing it perfectly and not doing it at all. The law looks at intent, and so doesn’t punish mistakes or errors so long as you aren’t being malicious, reckless, or negligent.

minhazm · on Oct 18, 2022

Github is protected by section 230, which states:

> No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider

So the act of hosting copyrighted content is not actually a copyright violation for Github. They're not obligated to preemptively determine who the original copyright owner of some piece of code is, as they're not the judge of that in the first place. Even if you complain that someone stole your code, how is Github supposed to know who's lying? Copyright is a legal issue between the copyright holder and the copyright infringer. So the only thing Github is required to do is to respond to DMCA takedown notices.

hnbad · on Oct 17, 2022

Yes. GitHub can get away with "oh well, we're all learning" because if the code is violating copyright, it's the user who is infringing directly by publishing it, not GitHub via Copilot. Either the user would have to bring a case against GitHub demonstrating liability (good luck) or the copyright holder would have to bring a case against GitHub demonstrating copyright violation (again, good luck). Otherwise this is entirely between the copyright holder and the Copilot user, legally speaking.

Of course, if someone does manage to set a precedent that including copyrighted works in AI training data without an explicit license to do so is infringement, GitHub Copilot would be screwed and at best have to start over with a blank slate if they can't be grandfathered. But this would affect almost all products based on the recent advancements in AI, and they're backed by fairly large companies (after all, GitHub is owned by Microsoft, a lot of the other AI stuff traces back to Alphabet, and there are a lot of startups funded by huge and influential VC companies). Given the US's history of business-friendly legislation, I doubt we'll see copyright laws being enforced against training data unless someone upsets Disney.

svnt · on Oct 17, 2022

Do you think that as part of this GitHub discovered that essentially everyone was in violation of copyright? That copyright of material without public knowledge or review (which exists in music, but not most code) is basically unenforceable?

Then they decided to wade in and build a house of cards where the cards are everyone else’s code, just waiting for the grenade pin puller and we’ve potentially witnessed the moment?

That’s the only thing that makes sense to me here. They don’t care because opening the issue will bring down everyone else with them.

olliej · on Oct 17, 2022

Yeah, so if a news agency publishes a picture without knowing where it came from, the originator can sue them for violating copyright.

There is no “I don’t know who owns the IP” defense: the image has a copyright, a person owns that copyright, and publishing the image without licensing or purchasing the copyright is a violation. The fine is something like $100k per offense for a business.

hnbad · on Oct 17, 2022

FWIW, this in consequence means you can't legally use Copilot without becoming liable for copyright violations, because it's essentially a black box and you have no insight into where the code it generates originated. Even if it isn't a 1-to-1 copy, it might be a "derivative work".

This is why I'm gnashing my teeth whenever I hear companies being fine with their employees using Copilot for public-facing code. In terms of liability, this is like going back from package managers to copying code snippets off blogs and forum posts.

VonGallifrey · on Oct 17, 2022

> using Copilot for public-facing code

Why this restriction on public-facing code? Are you OK with Copilot being used for "private"/closed source code? I get that it would be less likely to be noticed if the code is not published, but (if I understand right) it is even worse for license reasons.

hnbad · on Oct 17, 2022

I don't advocate people use Copilot for anything but hobby toy projects.

I have lower expectations of the rigor with which companies police their internal codebases, though. Seeing Copilot banned for internal use too is a pleasant surprise. Companies tend to be a lot more "liberal" in what kind of legal liabilities they accept for their internal tooling in my experience.

b3morales · on Oct 17, 2022

Turn the parties in this argument around and see if you think it still holds.

J. Random Hacker acquires and uses a copy of some of GitHub's or Microsoft's source. When sued, the defense says that the code was not taken directly from GH/MS, just copied from a newsgroup where it had been posted. Does this get J. off the hook?

Dylan16807 · on Oct 18, 2022

Was J using automated methods based on false claims of ownership by the newsgroup posters, with no direct knowledge of the violation? If so J should not be punished.

itronitron · on Oct 17, 2022

I may be misinformed, but my understanding of copyright is that it protects the 'expression' of something (like an algorithm or recipe), so someone can rewrite a copyrighted chunk of code into another language and be free of the original copyright, while also being able to assert their own copyright on their new expression.

If that is true then one way to get around copyright restrictions on existing code is to create a new language.

kyleee · on Oct 17, 2022

Fascinating idea. Copilot could do the translations internally and also work towards widening the pool of suggestions to all languages instead of the individual language a user is using (but then again, they might be writing in the "new" language already).