Grandmaster blunders: a statistical analysis

1/30/2015 – Everyone remembers the blunder from game six of the last world championship match, a move that caused shock and disbelief among fans and experts alike. Of course, grandmasters, and even a world champion, are only human, but how likely is such an oversight? Using an engine and a database of millions of games, Joe Doliner set out to find the mathematical truth.


By Joe Doliner

Sochi, Russia — Magnus Carlsen was 26 moves into game six of his title defense against Viswanathan Anand when he experienced the worst feeling in chess: the feeling that comes with the realization that you’ve left one of your pieces out to dry and there’s nothing left to do but pray. Blunders like this are all too common when I play chess, but they’re incredibly rare at this level. Anand and Carlsen are among the greatest ever to play the game; they (almost) never do things like this. What followed was even more incredible. Despite his blunder, Carlsen went on to win game six (and the match) thanks to Anand responding immediately with a blunder of his own. After the game Carlsen described it as “a comical exchange of blunders.”

Blunders at this level are rare, but just how lucky are we to have seen a turn of patzer play from this pair? In this post we’ll take an analytic approach to this question. We’ll start by developing a computational way to classify blunders. Then we’ll gather a year’s worth of chess games and store them in a distributed file system so that we can use a cluster of machines to analyze the games with a MapReduce engine. Full disclosure: I’m one of the founders of Pachyderm, the distributed file system and MapReduce engine that we’re going to be using. However, I’m not a data scientist.

Classification of all the moves played in 2014. Created using Crafty and Pachyderm

The first thing we need to settle is “what is a blunder?” A human will tell you that a blunder is a move which substantially decreases the player’s chances of winning. Good players can classify a move as a blunder with just a few seconds’ thought, but even that’s too slow for our purposes. Instead we’re going to be using a chess engine called Crafty.

Crafty computed that Carlsen’s move 26. Kd2 was 2.11 pawns worse than his best move 26. Rg3

For example, applying the engine to the Carlsen-Anand game shows that players hurt their positions by approximately two pawns with their blunders. This might not seem like a lot, but in high-level chess, a two-pawn deficit is almost always a loss.
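The classification itself is simple to express in code. Here is a minimal sketch (not the article's actual code) of the test: a move counts as a blunder if the engine scores it at least two pawns worse than the engine's best move in the same position. The absolute scores below are made up for illustration; only the gap matters.

```python
def is_blunder(best_score, played_score, threshold=2.0):
    """Scores are in pawns, from the mover's point of view.

    A move is a blunder if it scores at least `threshold` pawns
    worse than the engine's best move in the same position.
    """
    return (best_score - played_score) >= threshold

# Carlsen's 26.Kd2 scored 2.11 pawns below his best move 26.Rg3:
print(is_blunder(best_score=0.0, played_score=-2.11))  # True
```

The two-pawn threshold is a judgment call; a lower cutoff would sweep in ordinary inaccuracies, while a higher one would miss game-losing mistakes.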

Now that we have a way to classify blunders, we’ll need to bundle Crafty up in a Docker image so we can use it in Pachyderm. The source for our image is available on GitHub, or it can be pulled directly from the Docker registry. The image contains two HTTP servers: a map server, which takes chess games in PGN format and returns the players’ ratings along with a bucketed count of the engine’s scores for the moves, and a reduce server, which takes the results from the map server and aggregates them into buckets based on the players’ ratings.
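To illustrate the shape of the aggregation (the function and sample data below are hypothetical; the real map and reduce steps are HTTP servers in the linked repo), suppose the map step emits (rating, list-of-score-drops) pairs. The reduce step might then tally moves and blunders per rating band like this:

```python
from collections import defaultdict

def reduce_blunders(mapped, band=100, threshold=2.0):
    """Aggregate (rating, [score_drop, ...]) pairs into rating bands.

    A score drop of `threshold` pawns or more counts as a blunder.
    """
    buckets = defaultdict(lambda: {"moves": 0, "blunders": 0})
    for rating, drops in mapped:
        bucket = buckets[rating // band * band]  # e.g. 2782 -> 2700 band
        bucket["moves"] += len(drops)
        bucket["blunders"] += sum(1 for d in drops if d >= threshold)
    return dict(buckets)

sample = [(2782, [0.1, 2.11, 0.0]), (2785, [0.3, 2.5])]
print(reduce_blunders(sample))
# {2700: {'moves': 5, 'blunders': 2}}
```

Because the reduce step is associative, partial results from many workers can be merged in any order, which is what makes the job a natural fit for MapReduce.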

Our MapReduce job gives us a mapping from rating to a vector of blunders

Next we’ll need to get a Pachyderm cluster up and running and filled with data. Using data from a large database, we wrote a simple script to upload it to Pachyderm’s file system (pfs) and kick off the pipeline. The script and data are available in the repo along with more detailed instructions on how to reproduce the results yourself.

Crunching all the games from 2014 took about six hours on Google Compute Engine. In total, Crafty analyzed 4,899,067 moves and found that a scant 67,175 (1.37%) were two-pawn blunders or worse. Limiting ourselves to players with ratings above 2500 (grandmasters), that number falls to 1.07%. If we narrow it down to players above 2775, which both Carlsen and Anand were during the championship, it falls all the way to 0.96%. Assuming Anand and Carlsen’s blunders were independent events, what we saw was a one in 10,000 occurrence. In other words, one in every 10,000 pairs of moves exchanged by players at this level should result in a double blunder. Of course, the World Chess Championship consists of more than a single pair of moves. Assuming twelve games of about 50 moves each, we can expect to see 600 move pairs, which means seeing an exchange like this in a WCC event is more like a one in 20 event. So what we saw wasn’t actually that incredible, merely unlikely.
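The arithmetic behind those odds can be checked in a few lines, assuming (as above) that blunders are independent events occurring at the 0.96% per-move rate:

```python
p = 0.0096                 # per-move blunder rate for players above 2775
p_pair = p * p             # both sides blunder on the same pair of moves
print(round(1 / p_pair))   # about one in 10,000 per move pair

pairs = 12 * 50            # twelve games of roughly 50 moves each
p_match = 1 - (1 - p_pair) ** pairs  # at least one double blunder per match
print(round(1 / p_match))  # about one in 20 per championship match
```

Note the match-level figure uses the complement of "no double blunder in 600 tries" rather than simply multiplying 600 by the per-pair probability, though at these small probabilities the two agree closely.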

Blunders become exponentially less likely as rating increases

The data reveals a strong correlation between blunders and rating. As we’d expect, high-rated players blunder much less frequently than their lower-rated counterparts. Playing around with the data in Excel, we found exponential functions to be the best fit. The trendline above indicates that gaining 600 rating points halves the number of blunders a player makes. Chess, it seems, is a game of diminishing returns.
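That trendline corresponds to an exponential model with a 600-point "half-life" for blunder rates. A sketch, anchoring the curve (arbitrarily, for illustration) at the 1.07% rate observed for players above 2500:

```python
def blunder_rate(rating, base_rating=2500, base_rate=0.0107, half_life=600):
    """Exponential model implied by the trendline: the blunder rate
    halves for every `half_life` rating points gained."""
    return base_rate * 2 ** (-(rating - base_rating) / half_life)

# 600 points below the anchor, the modeled rate doubles;
# 600 points above, it halves:
print(blunder_rate(1900) / blunder_rate(2500))  # 2.0
print(blunder_rate(3100) / blunder_rate(2500))  # 0.5
```

The anchor point and base rate cancel out of these ratios, which is why the "halving per 600 points" claim is independent of where the curve is pinned.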

There are lots of great stories you could tell with this data. We limited ourselves to games from 2014; I’d be interested to see how blunder occurrence has changed over time. There are also a few obvious ways that our analysis could be improved. Due to cost limitations we had to limit Crafty to two seconds of analysis time per move, and we only looked at a fraction of the database's total corpus. We may look into doing an updated version of this post with a bigger budget.


