
I've literally built a dynamic benchmark where I test reasoning models on how well they derive conclusions from assumptions using sequent calculus.

o3-mini (high effort) can derive chains that are 8 inference rules deep with >95% confidence. I didn't have the money to test it further. This is better than the average logic professor manages with pen and paper.
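
For anyone wondering what "dynamic" and "8 inference rules deep" could mean in practice, here is a minimal sketch. It is not the benchmark described above: it assumes a much simpler setting (only modus ponens over propositional atoms instead of full sequent calculus), and the names `make_chain_problem` and `derivation_depth` are made up for illustration.

```python
import random

def make_chain_problem(depth, n_distractors=3, seed=None):
    """Generate a fresh problem whose goal needs `depth` modus ponens steps."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    atoms = [f"p{i}" for i in range(depth + 1)]
    # Core chain: p0 is given, plus p0 -> p1, ..., p(depth-1) -> p(depth).
    rules = [(atoms[i], atoms[i + 1]) for i in range(depth)]
    # Distractors: their premises q0, q1, ... are never derivable,
    # so they cannot shorten the chain.
    for j in range(n_distractors):
        rules.append((f"q{j}", rng.choice(atoms[1:])))
    rng.shuffle(rules)
    return {"facts": {atoms[0]}, "rules": rules, "goal": atoms[-1]}

def derivation_depth(problem):
    """Forward-chain and return the rounds needed to reach the goal, or None."""
    known = set(problem["facts"])
    rounds = 0
    while problem["goal"] not in known:
        new = {c for (a, c) in problem["rules"] if a in known and c not in known}
        if not new:
            return None  # goal is not derivable from the assumptions
        known |= new
        rounds += 1
    return rounds

problem = make_chain_problem(depth=8, seed=0)
print(problem["goal"], derivation_depth(problem))
```

Because every problem is generated from a seed, the model never sees the same instance twice, and the reference checker gives the ground-truth depth to grade against.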

It seems like a course critiquing 5-year-old technology at this point.


