Hacker News

Yes, I mentioned that in the comment on the linked post. I wish someone were running this methodology as an ongoing project covering new models.

Ideally, shouldn't this metric be included on every model card? It seems crucial.