Grandmaster blunders: a statistical analysis

1/30/2015 – Everyone remembers the blunder from game six of the last world championship match, a move that caused shock and disbelief among fans and experts alike. Of course, grandmasters, and even world champions, are only human, but how likely is such an oversight? Using an engine and a database of millions of games, Joe Doliner set out to find the mathematical truth.


By Joe Doliner

Sochi, Russia – Magnus Carlsen was 26 moves into game six of his title defense against Viswanathan Anand when he experienced the worst feeling in chess: the feeling that comes with the realization that you’ve left one of your pieces out to dry and there’s nothing left to do but pray. Blunders like this are all too common when I play chess, but they’re incredibly rare at this level. Anand and Carlsen are some of the greatest to play the game, and they (almost) never do things like this. What followed was even more incredible. Despite his blunder, Carlsen went on to win game six (and the series) thanks to Anand responding immediately with a blunder of his own. After the game Carlsen described it as “a comical exchange of blunders.”

Blunders at this level are rare, but just how lucky are we to have seen a turn of patzer play from this pair? In this post we’ll take an analytic approach to this question. We’ll start by developing a computational way to classify blunders. Then we’ll gather a year’s worth of chess games and store them in a distributed file system so that we can use a cluster of machines to analyze the games with a MapReduce engine. Full disclosure: I’m one of the founders of Pachyderm, the distributed file system and MapReduce engine that we’re going to be using. However, I’m not a data scientist.

Classification of all the moves played in 2014. Created using Crafty and Pachyderm

The first thing we need to settle is “what is a blunder?” A human will tell you that a blunder is a move which substantially decreases the player’s chances of winning. Good players can classify a move as a blunder with just a few seconds’ thought, but even that’s too slow for our purposes. Instead we’re going to be using a chess engine called Crafty.

Crafty computed that Carlsen’s move 26. Kd2 was 2.11 pawns worse than his best move 26. Rg3

For example, applying the engine to the Carlsen-Anand game shows that both players hurt their positions by approximately two pawns with their blunders. This might not seem like a lot, but in high-level chess, a two-pawn deficit is almost always a loss.
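
The classification rule described above can be sketched in a few lines. This is only an illustration, not the code that ran in the pipeline: the function names and the 0.50 starting evaluation are made up for the example, while the 2.11-pawn loss and the two-pawn cutoff come from the article.

```python
# Minimal sketch of the blunder rule: a move is a blunder when it is at least
# two pawns worse than the engine's best move. Function names are hypothetical.

BLUNDER_THRESHOLD = 2.0  # pawns; the article counts "two-pawn blunders or worse"


def eval_loss(best_eval: float, played_eval: float) -> float:
    """Pawns lost by playing the chosen move instead of the engine's best move.

    Both evaluations are from the point of view of the player who is moving.
    """
    return best_eval - played_eval


def is_blunder(best_eval: float, played_eval: float,
               threshold: float = BLUNDER_THRESHOLD) -> bool:
    """True when the played move loses at least `threshold` pawns."""
    return eval_loss(best_eval, played_eval) >= threshold


# Illustrative numbers: the absolute evaluations are assumed; only their
# difference (2.11 pawns, Carlsen's 26. Kd2 versus 26. Rg3) is from the article.
best = 0.50
played = best - 2.11
print(is_blunder(best, played))   # a 2.11-pawn loss clears the 2.0 cutoff
print(is_blunder(0.50, -0.50))    # a 1.0-pawn loss does not
```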

Now that we have a way to classify blunders, we’ll need to bundle Crafty up in a Docker image so we can use it in Pachyderm. The source for our image is available on GitHub or it can be pulled directly from the Docker registry. The image contains two HTTP servers: a map server, which takes chess games in PGN format and returns the ratings of the players and a bucketed count of the engine's scores of the moves, and a reduce server, which takes the results from the map server and aggregates them into buckets based on the player’s rating.

Our MapReduce job gives us a mapping from rating to a vector of blunders
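
To make the shape of the two steps concrete, here is a rough sketch in plain Python. It is not the code in our image: the real servers speak HTTP and run Crafty on PGN input, whereas this sketch assumes the games are already analysed and passed in as dictionaries. The field names, the bucket widths and the sample games are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumed bucket sizes; the article does not state the real ones.
LOSS_BUCKET = 1.0     # pawns per score bucket
RATING_BUCKET = 100   # rating points per rating bucket


def map_game(game):
    """Map step: one analysed game -> (rating, bucketed loss counts) per player."""
    counts = {"white": Counter(), "black": Counter()}
    for color, loss in game["moves"]:
        # Clamp negative losses to zero, then bucket by whole pawns lost.
        bucket = int(max(loss, 0.0) // LOSS_BUCKET)
        counts[color][bucket] += 1
    return [(game["white_rating"], counts["white"]),
            (game["black_rating"], counts["black"])]


def reduce_results(mapped):
    """Reduce step: merge per-player counts into rating buckets."""
    totals = defaultdict(Counter)
    for rating, counts in mapped:
        rating_bucket = rating // RATING_BUCKET * RATING_BUCKET
        totals[rating_bucket].update(counts)
    return dict(totals)


# Two made-up games: each move is (side to move, pawns lost versus best move).
games = [
    {"white_rating": 2800, "black_rating": 2790,
     "moves": [("white", 0.0), ("black", 0.3), ("white", 2.11), ("black", 2.5)]},
    {"white_rating": 2810, "black_rating": 2450,
     "moves": [("white", 0.1), ("black", 1.2)]},
]

mapped = [pair for game in games for pair in map_game(game)]
totals = reduce_results(mapped)
for rating_bucket in sorted(totals):
    print(rating_bucket, dict(totals[rating_bucket]))
```

Keeping the map output as plain counts means the reduce step only has to add vectors together, which is what lets the work spread across a cluster.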

Next we’ll need to get a Pachyderm cluster up and running and filled with data. Using data from a large database, we wrote a simple script to upload it to Pachyderm’s file system (pfs) and kick off the pipeline. The script and data are available in the repo along with more detailed instructions on how to reproduce the results yourself.

Crunching all the games from 2014 took about six hours on Google Compute Engine. In total, Crafty analyzed 4,899,067 moves and found that a scant 67,175 (1.37%) were two-pawn blunders or worse. Limiting ourselves to players with ratings above 2500 (Grandmasters), that number falls to 1.07%. If we narrow it down to players above 2775, which both Carlsen and Anand were during the championships, it falls all the way to 0.96%. Assuming Anand and Carlsen’s blunders were independent events, what we saw was roughly a one in 10,000 occurrence (0.96% × 0.96%). In other words, about one in every 10,000 pairs of moves exchanged by players at this level should result in a double blunder. Of course, the World Chess Championship consists of more than a single pair of moves. Assuming twelve games of about 50 moves each, we can expect to see 600 move pairs, which means seeing an exchange like this in a WCC event is more like a one in 20 event. So what we saw wasn’t actually that incredible, merely unlikely.
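
The arithmetic behind those figures is easy to reproduce. The snippet below is a back-of-the-envelope check using only the numbers quoted above and the same independence assumption; the variable names are mine.

```python
# Figures quoted in the article.
total_moves = 4_899_067
blunders = 67_175
overall_rate = blunders / total_moves          # about 1.37% of all moves

p_blunder = 0.0096                             # rate for players rated above 2775

# Independence assumption: a blunder answered immediately by a blunder.
p_double = p_blunder * p_blunder               # roughly 1 in 10,000

# Twelve games of about 50 moves each gives 600 move pairs per match.
move_pairs = 12 * 50
expected_doubles = move_pairs * p_double       # expected double blunders per match
p_at_least_one = 1 - (1 - p_double) ** move_pairs

print(f"overall blunder rate: {overall_rate:.2%}")
print(f"double blunder: 1 in {1 / p_double:,.0f} move pairs")
print(f"expected double blunders per match: {expected_doubles:.3f}")
print(f"chance of at least one per match: {p_at_least_one:.1%}")
```

The chance of at least one double blunder in a match comes out a little above 5%, which is where the rounded "one in 20" comes from.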

Blunders become exponentially less likely as rating increases

The data reveals a strong correlation between blunders and rating. As we’d expect, high-rated players blunder much less frequently than their lower-rated counterparts. Playing around with the data in Excel, we found exponential functions to be the best fit. The trendline above indicates that gaining 600 rating points halves the number of blunders a player makes. Chess, it seems, is a game of diminishing returns.
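
For readers who want to try this outside Excel, here is a minimal sketch of what an exponential trendline amounts to: a straight-line fit to the logarithm of the blunder rate. The 600-point halving interval is the one from our trendline; the data points are synthetic, generated from the model itself with a made-up 4% starting rate, so the example only shows that the fit recovers the interval and says nothing new about the real data.

```python
import math

HALVING_POINTS = 600  # rating points per halving, from the article's trendline


def scaled_rate(base_rate, rating_gain, halving=HALVING_POINTS):
    """Blunder rate after gaining `rating_gain` points under the halving model."""
    return base_rate * 0.5 ** (rating_gain / halving)


def fit_halving_interval(ratings, rates):
    """Least-squares fit of ln(rate) = a + b * rating.

    Returns the number of rating points needed to halve the rate, -ln(2) / b.
    """
    n = len(ratings)
    logs = [math.log(rate) for rate in rates]
    mean_x = sum(ratings) / n
    mean_y = sum(logs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ratings, logs))
             / sum((x - mean_x) ** 2 for x in ratings))
    return -math.log(2) / slope


# Synthetic points (NOT measured data): a hypothetical 4% rate at 1800,
# extended with the halving model.
ratings = [1800, 2100, 2400, 2700]
rates = [scaled_rate(0.04, r - 1800) for r in ratings]

print(round(fit_halving_interval(ratings, rates)))
```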

There are lots of great stories you could tell with this data. We limited ourselves to games from 2014. I’d be interested to see how blunder occurrence has changed over time. There are also a few obvious ways that our analysis could be improved. Due to cost limitations we had to limit Crafty to two seconds of analysis time per move, and we only looked at a fraction of the database's total corpus. We may look into doing an updated version of this post with a bigger budget.

Discussion and Feedback


MichaelCurrie 5/18/2017 02:39
You could test your assumption that blunders are independent events using a time-series correlation.
NJD 1/31/2015 03:58
Blunders are mainly the result of loss of focus and concentration. After playing for several hours, mental fatigue can set in. That's why, just like in a lot of games and sports, getting old makes it tough.
daftarche 1/31/2015 09:40
nice comment, @TMMM.
TMMM 1/31/2015 02:11
One of the reasons why these statistics may be off (too high) is that once you get to a completely winning or losing position, sometimes you just go for a practical option: what wins easily if you are winning, or try to complicate the position and go for a cheapo if you are losing.

Example: Caruana - Vachier-Lagrave (Tata, R13): 33... Kf7 (-3.7) 34. Rfe1? (-1000) ... 0-1. Caruana probably saw it was completely losing, but thought he'd just go for the best "practical chance;" keep the material, and hope the opponent does not find the right moves. The analysis above would count Caruana's move as a blunder.

Example: Aronian - Ding (Tata, R13): 54... Ke7 (-50) 55. Qc2? (-1000) 55... Qd5? (-53) 0-1. Aronian saw he was completely lost, and just played 55. Qc2 with a chance of some cheapo checks. Ding did not want to take any risks with cheapos and, perhaps in time trouble, simply played the easily winning 55... Qd5 to take away all Aronian's hopes. The analysis above counts both these moves as major blunders.

Example: Carlsen - Radjabov (Tata, R9): 37... Qh6? (+13.7) 36. e5? (+5.9) ... 1-0. Carlsen sees that he's completely winning after 36. e5, and he misses a slightly faster computer win. Again, not really a "blunder," although it would be qualified as a blunder above.

If you did not take these effects into account in your analysis, then it is no wonder that these numbers may be higher than expected.
dfan 1/31/2015 01:35
MrL2014: I bet some of those "2 pawn blunders" just reduced the evaluation from +7 to +5.
MrL2014 1/31/2015 01:10
You need to check your program for accuracy; 1% blunders means that almost every other game a GM makes a 2 pawn blunder (I am assuming 50 moves per game, on average). Or, since there are 2 players for each game, each game has a 2 pawn blunder, statistically speaking.
Are you sure that you allowed enough time for analysis? Or is the 2 pawn number you used really 2 pawns, or is it 0.2 pawns?
KevinC 1/31/2015 12:03
Joe Doliner, you lost ALL credibility when you said that you use Crafty. It is nicknamed "Crapty" for a reason. You could have used Stockfish, or even the older free version of Houdini.