It’s not “oops, we didn’t know”; it’s “someone published a project under a permissive license which included this code.”
If your standard is “Github should have an oracle to the US court system and predict what the outcome of a lawsuit alleging copyright infringement for a given snippet of code would be” then it is literally impossible for anyone to use any open source code ever because it might contain infringing code.
There is no chain of custody for this kind of thing which is what it would require.
This reminds me of my 4-year-old daughter. She often comes home from kindergarten with new toys. When I ask her where she got them, she tells me that a friend gave them to her as a gift. When I dig deeper and ask around, it turns out that the friends who were gifting her things were not the real owners of the gifts. I can see why it could be difficult for children to understand the concept of ownership, and that you should not gift things to others that are not your own.
So in this case Copilot just treats the situation as "someone gifted me this," and does not question if the person gifting was the real owner of the gift.
> and does not question if the person gifting was the real owner of the gift
Unless you can figure out a method of determining whether someone owns the code that doesn't involve "try suing in court for copyright infringement and see if it sticks," we're kinda stuck. Just because a codebase contains an exact or similar snippet from another codebase doesn't mean that snippet reaches the threshold of a copyrightable work. Conversely, just because two code snippets look wildly different doesn't mean it's not infringement, and detecting that automatically is solving the halting problem.
The thing you want for software to actually solve this is chain of custody, which we don't have. If you require everyone to assume that everyone else could be lying or mistaken about infringement, then using any open source project for anything lands you in legal hot water.
In fact, when you upload code to Github you grant them a license to do things like "display it," which you can't do if you don't actually own the copyright or have a license. So even before the code is ever slurped into Copilot, the exact same legal question arises as to whether Github is legally allowed to host the code at all. Can you imagine if, when you uploaded code to Github, you had to sign a document saying you owned the code and indemnifying Microsoft against any lawsuit alleging infringement? Oh boy, people would not enjoy that.
Exactly, the chain of custody is absolutely required for this to be legal because no oracle can exist. It must be able to attribute exactly who contributed the suspect code. It must be able to handle the edge case where some humans might publish code without permission.
Either that, or we effectively get rid of software copyright, as Copilot can be used (or even just claimed to be used) to launder code of its license restrictions. E.g. "No, I didn't copy your code. I used Copilot and it copied your code, so I did nothing wrong."
Right, so we need a system for when a dev goes and grabs code-snippets from blogs and open-source freely licensed projects on e.g. github in which they can say that the code is from so-and-so source?
So like a way to distribute and inherit git blame?
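As a rough sketch of what "inheritable git blame" might look like as a data structure (everything here, including the `Origin` and `Snippet` names and their fields, is hypothetical and not any existing tool's format): each snippet carries a claimed origin plus links to the snippets it was derived from, so the chain can be walked back. Note that this only records what people claim; it does nothing to verify that the claimed owner really is the owner.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Origin:
    """A *claimed* origin for a piece of code (hypothetical record format)."""
    source_url: str
    author: str
    license: str


@dataclass
class Snippet:
    """A piece of code plus the provenance it inherited."""
    text: str
    origin: Optional[Origin] = None  # None means nobody recorded where it came from
    derived_from: List["Snippet"] = field(default_factory=list)

    def custody_chain(self) -> List[Origin]:
        """Collect every recorded origin, this snippet's first, then its ancestors'."""
        chain: List[Origin] = []
        if self.origin is not None:
            chain.append(self.origin)
        for parent in self.derived_from:
            chain.extend(parent.custody_chain())
        return chain

    def has_full_custody(self) -> bool:
        """True only if this snippet and every ancestor has a recorded origin."""
        if self.origin is None:
            return False
        return all(parent.has_full_custody() for parent in self.derived_from)
```

A snippet copied from a blog into a project would then list the blog snippet in `derived_from`, and a single ancestor with no recorded origin makes `has_full_custody()` return `False` for everything downstream of it.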
If someone created an AI for making movies, and it started spitting out Star Wars and Marvel stuff, you can bet that them saying "we trained it on other materials that violated copyright" wouldn't be enough. They are banking on most devs not knowing, not caring, or not having the ability to follow through on this.
I am going to make a robot that burns your house down. You might think this is unethical, but what do you expect me to do? Implement an oracle to the US court system?
You might think it's unreasonable to build such a house-burning robot, but you have to realize that I actually designed it as a lawn-mowing robot. The robot will simply not value your life or property because its utility function is descended from my own, so it may burn your house down in the regular course of its duties (if it decides to use a powerful laser to trim the grass to the exact nanometer). Sorry, neighbor.
What do you expect me to do? NOT build this robot? How dare you stand in the way of progress!