
I am not talking about SoTA. I am talking about a deliberately poor baseline. A GSM8k score consists of two things: solving the problem and getting the output format correct. Just getting the output format correct gives 30% accuracy for the same model where they got 11%. SoTA is 97%.
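
To make the format point concrete, here is a rough sketch of how the scoring rule alone can move the number. It assumes the usual GSM8K convention that the final answer follows a "#### " marker. The function names and the toy outputs are made up for illustration, and this is not the harness that was used for the 11% or 30% figures.

```python
import re

def strict_extract(output):
    """Accept an answer only if it follows the '#### ' marker."""
    m = re.search(r"####\s*(-?\d[\d,]*(?:\.\d+)?)", output)
    return m.group(1).replace(",", "") if m else None

def lenient_extract(output):
    """Take the last number anywhere in the output, whatever the format."""
    nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", output)
    return nums[-1].replace(",", "") if nums else None

def accuracy(outputs, golds, extract):
    """Fraction of outputs whose extracted answer matches the gold string."""
    correct = sum(extract(o) == g for o, g in zip(outputs, golds))
    return correct / len(golds)

# Hypothetical model outputs: the first two reach the right answer,
# but only the first uses the expected format. The third is simply wrong.
outputs = [
    "16 - 3 - 4 = 9 eggs, 9 * 2 = 18\n#### 18",
    "She makes $18 per day.",
    "The answer is 5.",
]
golds = ["18", "18", "3"]

strict_acc = accuracy(outputs, golds, strict_extract)
lenient_acc = accuracy(outputs, golds, lenient_extract)
print(strict_acc, lenient_acc)
```

On these toy strings the strict scorer credits one of three answers and the lenient scorer credits two of three, even though the model outputs are identical. That is the gap between "solved the problem" and "solved it in the expected format".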

