I don't see the use itself as a problem; the problem is that the result is not treated as a derivative work of the input. If I train it on GPL code, the result should be GPL, too.
This is kind of like saying that any programmer who has ever learned something from reading GPL code can only use that knowledge when writing GPL code. It's not literally copying the code. The training set isn't stored on disk and regurgitated.
Also, Copilot has logic that checks whether a suggestion is an exact duplicate of code from its training set, and if it is, the suggestion is never sent to the user.
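To make that concrete (the actual filter isn't public, so this is purely an illustrative sketch, not Copilot's implementation, with all names hypothetical and matching done on whole snippets rather than the finer granularity a real system would use): conceptually it's just a lookup of each candidate suggestion against an index built from the training set.

```python
import hashlib
from typing import Optional

def normalize(code: str) -> str:
    """Collapse whitespace so trivial formatting differences don't hide a verbatim copy."""
    return " ".join(code.split())

def build_corpus_index(training_snippets):
    """Hash every normalized training snippet once, ahead of time (hypothetical offline index)."""
    return {hashlib.sha256(normalize(s).encode()).hexdigest() for s in training_snippets}

def filter_suggestion(suggestion: str, corpus_index) -> Optional[str]:
    """Suppress a suggestion that is an exact (whitespace-insensitive) copy of training data."""
    digest = hashlib.sha256(normalize(suggestion).encode()).hexdigest()
    if digest in corpus_index:
        return None  # exact duplicate of a training snippet: never shown to the user
    return suggestion

# Usage: the index is built offline; each generated suggestion is checked before display.
index = build_corpus_index(["def add(a, b):\n    return a + b"])
print(filter_suggestion("def add(a, b): return a + b", index))  # None (duplicate, suppressed)
print(filter_suggestion("def mul(a, b): return a * b", index))  # passes through unchanged
```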
But Copilot is not a programmer, Copilot is a program. Slapping the "ML" label on a program doesn't magically absolve its programmers of all responsibility, however hard tech companies over the past decade have tried to convince people otherwise.
I really dislike this false equivalence between human learning and machine learning. The two are significantly distinct in almost every way, both in their process and in their output. The scale is also vastly different. No human could possibly ingest all of the open source code on GitHub, much less regurgitate millions of snippets from what they “studied.”
> This is kind of like saying that any programmer who has ever learned something from reading GPL code can only use that knowledge when writing GPL code. It's not literally copying the code. The training set isn't stored on disk and regurgitated.
I wouldn't put any hard rules on it, but it does seem only fair for programmers who have learned a lot from GPL code to contribute back to GPL projects. I have learned from and used a lot of open source software, so whenever possible I try to make my own projects available to learn from or use.