
Most code out there is badly written, and models reproduce what most of their dataset does.

I remember, fresh out of college, being shocked by the number of bugs in open source.



More recent models produce much higher quality code than models from 6/12/18 months ago. I believe a lot of this is because the AI labs have figured out how to feed them better examples during training - filtering for higher-quality open source libraries, or loading up on code that passes automated tests.
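
A minimal sketch of what that second kind of filter might look like - pytest-based, and assuming each sample comes bundled with its own tests (the file names and corpus layout are made up for illustration):

    import os
    import subprocess
    import tempfile

    def passes_tests(code: str, tests: str, timeout: int = 30) -> bool:
        """Run a sample's bundled tests in isolation; keep it only if they pass."""
        with tempfile.TemporaryDirectory() as tmp:
            with open(os.path.join(tmp, "sample.py"), "w") as f:
                f.write(code)
            with open(os.path.join(tmp, "test_sample.py"), "w") as f:
                f.write(tests)
            try:
                result = subprocess.run(
                    ["pytest", "-q", "test_sample.py"],
                    cwd=tmp, capture_output=True, timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                return False  # a hanging test suite counts as a failure
            return result.returncode == 0

    # corpus is a hypothetical list of (code, tests) pairs scraped from repos
    # filtered = [(c, t) for c, t in corpus if passes_tests(c, t)]

Anything that hangs or fails just gets dropped from the training set.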

A lot of model training these days uses synthetic data. Generating good synthetic code data is a whole lot easier than for any other category, since you can at least verify that the code you're generating is grammatically valid and runs without errors.
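
For instance, a cheap syntactic gate over synthetic Python samples might look like this (the sample strings are illustrative, and a real pipeline would sandbox the exec call rather than run untrusted code in-process):

    import ast

    def is_valid_sample(code: str) -> bool:
        """Grammatical check: does the sample even parse?"""
        try:
            ast.parse(code)
        except SyntaxError:
            return False
        return True

    def executes_cleanly(code: str) -> bool:
        """Stricter check: does the sample also run without raising?"""
        try:
            exec(compile(code, "<sample>", "exec"), {"__name__": "__sample__"})
        except Exception:
            return False
        return True

    samples = ["print('hello')", "def f(:", "1 / 0"]
    kept = [s for s in samples if is_valid_sample(s) and executes_cleanly(s)]
    # kept == ["print('hello')"]: "def f(:" fails to parse, "1 / 0" raises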


The dataset isn't making up fake dependencies.



