
Without wishing to denigrate the achievements of everyone involved in this tournament, or its entertainment value, I'd like to ask:

What is the statistical power of a 90-game round robin? Would this be a publishable result at p < 0.05 (or the newly proposed 0.005 threshold) against the null hypothesis that Stockfish and Houdini (2nd place) are of equal skill?



I don't know if you can really put a p-value on this result without a more specific null hypothesis, but anyway it looks like this tournament result provides extremely weak evidence that Stockfish is better than Houdini. In the round robin component, Stockfish and Houdini played two games against each other, each winning one and losing one. In the "superfinal", they had 15 draws, Stockfish won 3 games, and Houdini won 2 games.
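
To put a rough number on "extremely weak", here is a minimal sketch of a one-sided sign test on the decisive games only. It assumes one particular null hypothesis (draws carry no information, and each decisive game is equally likely to go either way), which is only one of several reasonable choices. The function name `sign_test_p_value` is made up for this illustration.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """One-sided p-value: probability that A wins at least `wins_a` of the
    decisive games if each decisive game is a fair coin flip.
    Draws are ignored (an assumption of this sketch)."""
    n = wins_a + wins_b
    tail = sum(comb(n, k) for k in range(wins_a, n + 1))
    return tail / 2 ** n


# Superfinal only: Stockfish 3 wins, Houdini 2 wins (15 draws ignored).
superfinal_p = sign_test_p_value(3, 2)

# Superfinal plus the two round-robin games (one win each): 4 wins vs 3.
combined_p = sign_test_p_value(4, 3)

print(superfinal_p, combined_p)
```

Under this null, both p-values come out at 0.5, nowhere near 0.05, let alone 0.005.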



