As far as I can tell, the only way to compare two models that cannot be easily gamed is to have both in open-weights form and then run them against a benchmark that was created after both models were created.
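To make the date condition concrete, here is a minimal Python sketch. The function name and the dates are made up for illustration, and it assumes you actually know when each model and the benchmark were created, which in practice may be the hard part.

```python
from datetime import date


def benchmark_postdates_models(benchmark_created: date, model_created: list[date]) -> bool:
    """Return True only if the benchmark was created strictly after every model."""
    return all(benchmark_created > created for created in model_created)


# Hypothetical creation dates, for illustration only.
model_a_created = date(2024, 3, 1)
model_b_created = date(2024, 6, 15)
models = [model_a_created, model_b_created]

# A benchmark created after both models passes the check.
print(benchmark_postdates_models(date(2024, 9, 1), models))

# A benchmark created between the two models does not, because the
# later model could have seen it during training.
print(benchmark_postdates_models(date(2024, 5, 1), models))
```

The check is strict on purpose: a benchmark created on the same day as a model is treated as unsafe. It says nothing about the open-weights half of the condition, which is what lets you run the benchmark yourself instead of trusting reported numbers.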

