Perhaps someone at GitHub can chime in, but I suspect that the open source code datasets these models are trained on would need to be limited to relatively permissive licenses in the first place. Perhaps they filter GitHub projects and Stack Overflow answers for MIT licenses before using them to train the models?


Nope, they explicitly note that the GPL showed up 700k times in the training data: https://twitter.com/eevee/status/1410067860299255810