
This response is very poor and ignores many very well-developed arguments in the Stanford paper (such as the incorrect NLP regex, or that exact-answer nonlinearity can still be measured more finely with a larger number of exact-answer test questions).


It’s literally addressed in his first bullet point…

“Response: While there is evidence that some tasks that appear emergent under exact match have smoothly improving performance under another metric, I don’t think this rebuts the significance of emergence, since metrics like exact match are what we ultimately want to optimize for many tasks.”


You haven’t read the paper, so you don’t understand what I just wrote or why Wei is not responding to the paper. It’s not a different metric.


I did read the paper, and Wei is responding to it and to others. What do you mean by "It's not a different metric"?



