
lend000 on June 30, 2021 | on: GitHub Copilot as open source code laundering?

Perhaps someone at GitHub can chime in, but I suspect that open source code datasets (the kind these models are trained on) should require relatively permissive licenses in the first place. Perhaps they filter for MIT licenses in the GitHub projects and Stack Overflow answers used to train the models?

jonstaab on June 30, 2021

Nope, they explicitly note that the GPL showed up 700k times in the training data: https://twitter.com/eevee/status/1410067860299255810
