> Cicero participated anonymously in 40 games of Diplomacy in a “blitz” league on webDiplomacy.net
> Cicero ranked in the top 10% of participants who played more than one game and 2nd out of 19 participants in the league that played 5 or more games.
> As part of the league, Cicero participated in an 8-game tournament involving 21 participants, 6 of whom played at least 5 games. Participants could play a maximum of 6 games with their rank determined by the average of their best 3 games. Cicero placed 1st in this tournament.
This bit seems a little more impressive, I think. Being in the top 10% of people who’ve played at least two games might leave a lot of bad players to beat up on. Winning a tournament might (?) mean you have to beat at least a couple players who understand the thing.
It is sort of funny to think about — anyone who gets really legitimately good at anything competitive goes through multiple rounds of being the best in their social group, and then moving on from that group to a new one composed of people who were the top tier of that previous level. It isn’t obvious to me where on that informal ladder this tournament was.
But anyway, maybe the AI will follow the trajectory of chess AIs and quickly race away from human competition.
There's an interesting question of "how much do the literal best humans suck at this?" For example, in chess Magnus Carlsen might be able to beat Stockfish given a handicap of just a pawn or two. An even better computer player than Stockfish might give up three or more pawns, but even a perfect player would likely lose to Carlsen if giving up a rook. -- I'm making this up; I don't think anyone knows the real values, but as far as I know no one is remotely projecting that perfect play could overcome e.g. a queen handicap.
Similarly, in Go it seems unlikely that perfect play could overcome a nine-stone handicap (again, I could be wrong; I'm not remotely a dan-level player).
All to say, it seems likely that Diplomacy is a game where the difference between "the best human play" and "the best possible play" is much larger than in either Go or Chess.
We happened to talk about this at the Go club this evening. The strong chess players more or less agreed with you about the chess predictions, and the dan-level Go players say today's AIs can definitely give the best pros a 3-stone handicap (tried and tested), probably 4 or more, and perfect play is worth a few stones beyond that (unclear how many, but probably not many, so not 9 stones altogether).
I attend the Ramat Gan Go Club, but there are Go clubs everywhere around the world, and they tend to be in the same places HN commenters live, go figure. See e.g. https://www.usgo.org/where-play-go
> All to say, it seems likely that Diplomacy is a game where the difference between "the best human play" and "the best possible play" is much larger than in either Go or Chess.
Definitely, Diplomacy in general is substantially understudied compared to Go or Chess (largely because it's a tiny community). You can play for less than a year and reach top-level performance, and much of the established wisdom/strategy among players is fairly bad.
Even the best Diplomacy players are only scratching the surface of how good someone could be.
I'm a little bit suspicious of this. They're not explicit about the scoring, but taking the average of your top 3 results is a huge advantage for those who played more games.
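To make that concrete, here's a minimal simulation sketch (my own, not from the paper; it assumes each game's score is an independent standard-normal draw, which is not how webDiplomacy actually scores games). It shows that under "average of your best 3 games" ranking, simply playing more games raises your expected rank score:

```python
import random

def best3_average(num_games: int, trials: int = 100_000) -> float:
    """Expected average of a player's top 3 scores over num_games games.

    Hypothetical model: each game's score is an independent draw from a
    standard normal distribution (NOT webDiplomacy's real scoring rule).
    """
    total = 0.0
    for _ in range(trials):
        scores = [random.gauss(0.0, 1.0) for _ in range(num_games)]
        # Rank score is the average of the best 3 games, per the tournament rules.
        total += sum(sorted(scores, reverse=True)[:3]) / 3
    return total / trials

for n in (3, 4, 5, 6):
    print(f"{n} games played -> expected best-3 average: {best3_average(n):+.3f}")
```

Under those assumptions the expected best-3 average climbs with every extra game, because a player who enters 6 games gets more chances to land three good results. Two equally skilled players can end up ranked apart purely by how many games they entered.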
Diplomacy is a bit of a choose-your-own-adventure game too. Like, there's an objective criterion (average supply centers at the agreed end of game), but the human tendency is to try and win individual games. Humans will often choose to play sub-optimal strategies for better entertainment value.
I think the real accomplishment here is the ability to fool humans into thinking they're not playing a bot. That's an impressive thing to do even these days.