
How would gaming the system work here? Is there some flaw in the way the tasks are generated?


AI models have historically found lots of ways to game systems. My favorite example is exploiting bugs in simulator physics to "cheat" at games of computer tag. Another is a radiology model that picked up on biases in the diagnostic results by using the dates on the images. And of course, whenever people discuss a benchmark publicly, it leaks into the training set, so the benchmark becomes a worse measure.



