> So everything generated also GPLv2?

Almost certainly not everything.

But possibly things that were spit out verbatim from the training set, which the FAQ says happens about 0.1% of the time [1][2]. Another comment in this thread indicated that the model outputs something that's usable verbatim about 10% of the time. Taking those two numbers together: if you're using a whole generated function verbatim, a bit of caveat emptor re: licensing might not be the worst idea, at least until the origin tracker mentioned in the FAQ becomes available.
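To make "taking those two numbers together" concrete, here is a rough back-of-the-envelope sketch. Neither source says how the two figures overlap, so both scenarios below are assumptions of mine rather than anything GitHub has published, and the function names are made up for illustration. The worst case assumes every suggestion containing training-set text is also one you would accept as-is; the other case assumes the two properties are unrelated.

```python
# Figures quoted in this thread; how they combine is an assumption, not a published result.
P_COPIED = 0.001   # FAQ: about 0.1% of suggestions contain verbatim training-set snippets
P_USABLE = 0.10    # another commenter: about 10% of suggestions are usable verbatim


def worst_case_share(p_copied: float, p_usable: float) -> float:
    """Upper bound on the share of accepted suggestions that contain copied text.

    Hypothetical scenario: every suggestion with copied text is also one you
    would accept unchanged, so all of them land in the "usable" bucket.
    """
    return min(1.0, p_copied / p_usable)


def independent_share(p_copied: float) -> float:
    """Share of accepted suggestions with copied text if the two are unrelated.

    Under independence, accepting a suggestion tells you nothing about whether
    it was copied, so the rate stays at the overall figure.
    """
    return p_copied


if __name__ == "__main__":
    print(f"worst case:  {worst_case_share(P_COPIED, P_USABLE):.1%} of accepted suggestions")
    print(f"independent: {independent_share(P_COPIED):.1%} of accepted suggestions")
```

Either way it's a small fraction, but under the pessimistic assumption it is ten times the headline figure, which is why whole functions taken verbatim deserve the extra look.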

[1] https://docs.github.com/en/early-access/github/copilot/resea...

[2] "GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set. Here is an in-depth study on the model’s behavior. Many of these cases happen when you don’t provide sufficient context (in particular, when editing an empty file), or when there is a common, perhaps even universal, solution to the problem. We are building an origin tracker to help detect the rare instances of code that is repeated from the training set, to help you make good real-time decisions about GitHub Copilot’s suggestions."


