
I don't follow. Setting aside the ethical and legal concerns built up in multiple societies to this point - if Microsoft loses this type of lawsuit, why would AI not continue to be researched? Don't they just have to build attribution into the tool, and then everything's legal again?


The terms of many licenses require more than just attribution; they also require the inclusion of the copyright notices and license text verbatim. MIT, BSD, MPL, GPL, LGPL, AGPL, etc. all require those at minimum.

Then there is the application of the licenses themselves. GPL-licensed code stipulates that its inclusion in a new work means the new work must be licensed under the GPL as well. AGPL and LGPL have similar terms, but with different stipulations.
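To make the first point concrete, here is a sketch (the project, author, and function are made up for illustration) of what carrying an MIT notice along with a copied snippet looks like; the full license text has to ride along verbatim:

    # Hypothetical example: a snippet copied from an MIT-licensed
    # project must keep the original notice with it.
    #
    # Copyright (c) 2021 Jane Example
    #
    # Permission is hereby granted, free of charge, to any person
    # obtaining a copy of this software and associated documentation
    # files (the "Software"), to deal in the Software without
    # restriction... (the full MIT text continues and must be
    # included in full)
    def left_pad(s: str, width: int, fill: str = " ") -> str:
        # The copied utility itself.
        return s.rjust(width, fill)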


Indeed, not to mention that you cannot just mix licenses. The vast majority of licenses are not compatible with one another. It just so happens that some popular ones are, but in general the combined work is either GPL-like, MIT-like, or the licenses are simply incompatible.


AFAIK Copilot suggestions are sourced from a model that is trained on many sources. Presumably when it offers a solution (an autocomplete suggestion), it is drawing on data taken from dozens, maybe hundreds, of examples. How does a person even provide attribution in this case?

It's like saying: I know 'polo' comes after 'Marco' not only because I was at your pool party where we all signed NDAs, but also because I've been to dozens of other pool parties. How do I know where to attribute this knowledge? In part that knowledge is due to the prevalence of examples, not any single example.


I'm not in ML; can the program not know where its suggestions are drawn from?


It depends; it could be a no if Copilot identifies solutions by looking for consensus matches.

To illustrate: Google Translate suddenly got better some time ago when it began searching literary texts for phrase matches and then cross-referencing known translations to produce a translation.

Copilot is perhaps doing something similar. Stretching the analogy somewhat, Copilot could be finding exactly one sample that it suggests, in which case attribution would be possible.

It is unlikely to offer any solution without a consensus, though. To extend the example: how can you be confident in that translation? What if there is not just one literary match but many thousands, and 99.9% of them agree?

Without that high match percentage, it would be difficult to know whether the result is accurate; likewise, with a small set of disagreeing matches, it would be hard to give an accurate answer. I don't know for sure, but seemingly Copilot would have some threshold of agreement across multiple matches before it suggests a solution.
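As a purely speculative sketch of that idea (this is not Copilot's published mechanism; the function name and threshold are invented for illustration), a consensus gate might look like:

    # Speculative sketch only: a consensus threshold over retrieved
    # candidate completions, not Copilot's actual mechanism.
    from collections import Counter

    def suggest(candidates, threshold=0.999):
        # candidates: completions retrieved for the current context.
        if not candidates:
            return None
        best, count = Counter(candidates).most_common(1)[0]
        if count / len(candidates) >= threshold:
            return best  # strong consensus: offer the suggestion
        return None      # disagreement: stay silent

    # With threshold=0.999 this mirrors the "99.9% agree" case above.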



