Following chess tournaments online, I often found myself asking the same question - who is most likely to win this tournament? While Kramnik might be leading Dortmund by a full point after three rounds, how likely is he to win it all? And if four players are tied with one round to go, how will tiebreaks affect their chances?
I made my best educated guess, but it was little more than that. Out of curiosity, I began to design a tool to help answer these questions. I built a spreadsheet that could simulate the outcomes of chess games, filling in a cross table with hypothetical wins, draws, and losses and tabulating the results over thousands of trials - a prediction engine for chess tournaments. But it soon bogged down, and adding new ideas or algorithms was impractical. What I needed was the power and flexibility of custom software.
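The underlying idea is simple enough to sketch in a few lines of Python. Below is a minimal Monte Carlo version of that spreadsheet: it fills in a double-round-robin cross table with random wins, draws, and losses and tallies the winners. The fixed win and draw probabilities are placeholders standing in for the real player models, and ties are broken at random rather than by the actual rules.

```python
import random
from collections import Counter

# Placeholder probabilities for (win by the first-named player, draw).
# The real model derives these from ratings, colours, player profiles, etc.
P_WIN, P_DRAW = 0.25, 0.55

PLAYERS = ["Caruana", "Nakamura", "Giri", "Anand",
           "Karjakin", "Aronian", "Topalov", "Svidler"]

def simulate_tournament(players):
    """Play one double round robin and return each player's score."""
    scores = {p: 0.0 for p in players}
    for a in players:
        for b in players:
            if a == b:
                continue  # each ordered pair is one game, so every pairing is played twice
            r = random.random()
            if r < P_WIN:
                scores[a] += 1.0
            elif r < P_WIN + P_DRAW:
                scores[a] += 0.5
                scores[b] += 0.5
            else:
                scores[b] += 1.0
    return scores

TRIALS = 100_000
wins = Counter()
for _ in range(TRIALS):
    scores = simulate_tournament(PLAYERS)
    top = max(scores.values())
    leaders = [p for p, s in scores.items() if s == top]
    wins[random.choice(leaders)] += 1  # crude stand-in for the real tiebreak rules

for player, count in wins.most_common():
    print(f"{player:10s} {count / TRIALS:6.1%}")
```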
Fortunately my friend Chris Capobianco is a software engineer at Google. He agreed to take the coding on as a fun side project, and we started spending some evenings brainstorming and problem solving. We filled whiteboards with equations describing rating calculations, aggression factors, player biases, adjusted draw percentages and more that he eventually turned into dynamic, flexible software.
Given the time and effort required, we were soon reaching out to others for help. We needed standardized data from a variety of sources and complicated statistical algorithms. We also needed deep-learning software to improve the accuracy of our estimates of each player’s probability of emerging from this complex system as the winner.
As with any human competition, there are naturally factors that don’t easily lend themselves to number crunching: the player who builds a big lead and then slows down with a string of draws to “protect” it, players taking a quick draw in the last round when they are out of contention for the top prize, or a player taking a last-round draw that will only get him into a tiebreak rather than risking a loss by playing for the win. While these factors can be hard to handle algorithmically, we have been modeling some of them and incorporating our findings into the software. We don’t claim our model will ever be perfect, but we are excited to push it as far as it will go.
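As an illustration of the kind of adjustment involved, here is one hypothetical way a player’s draw probability might be nudged upward when he is protecting a lead late in the event. The thresholds and multiplier below are invented for the example; they are not the values our model actually uses.

```python
def adjusted_draw_probability(base_draw_prob, lead_in_points, rounds_remaining,
                              protect_factor=1.25):
    """Hypothetical tweak: a player with a comfortable lead late in the
    tournament becomes more draw-prone. All numbers here are illustrative."""
    if lead_in_points >= 1.0 and rounds_remaining <= 3:
        return min(base_draw_prob * protect_factor, 0.95)
    return base_draw_prob
```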
With those caveats in mind, we first ran the software in advance of the 2016 Candidates Tournament. We simulated one million tournaments, tracking how many times each player won.
Caruana, Nakamura, and Giri head up the top of our simulated tournaments. But the results are very close, and the winning probabilities range from Caruana’s 19.0% to Svidler’s 5.8%. This is clearly a tournament that any of the eight could win.
With our software, however, a chart like the above is just the beginning. The real fun begins on Friday March 11th, when we start updating the program with actual tournament results. We entered wins in the first two rounds for Nakamura and saw his chances rise from 17.1% to 40.6%. There are several reasons for this rise, including the fact that one of those two wins is against Caruana, a key rival for first place. The two wins also go a long way toward getting Nakamura through the first two tiebreaks of the tournament. Topalov winning his fifth, sixth, and seventh rounds brings him to 50.2%. His seventh-round victim is Nakamura, who plunges to a 5.6% chance of winning the tournament as he then faces an uphill battle to gain back three points - and can no longer win a head-to-head tiebreak against Topalov. What if there are three players tied for first with one round to go? With the press of a button we can now discover the answer.
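Mechanically, updating the forecast just means fixing the results of games already played and simulating only the games that remain. A minimal sketch of that idea, building on the simulator above (the `fixed_results` format and `simulate_game` callback are assumptions for the example):

```python
def simulate_remaining(players, fixed_results, simulate_game):
    """Score games already played, then simulate only the rest.

    fixed_results: dict mapping an (a, b) pairing to a's score (1.0, 0.5 or 0.0).
    simulate_game: function returning a hypothetical score for a in the game a vs. b.
    """
    scores = {p: 0.0 for p in players}
    for a in players:
        for b in players:
            if a == b:
                continue
            result = fixed_results.get((a, b))
            if result is None:
                result = simulate_game(a, b)  # game not yet played
            scores[a] += result
            scores[b] += 1.0 - result
    return scores
```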
What single rule change would make Nakamura the predicted winner? Keep reading!
Gathering the data
Early in the process of building the software we knew that we would have a lot of data to gather to fuel the algorithms of the software. We started by looking at some of the public resources for data, such as the ages of the players and their recent FIDE ratings.
But all of this data brings up a difficult question. How far back do you go? Do you look at just the most recent ratings? Or do you take an average? And if so, over how many months/years?
We decided to look at many different data sets. One of our main tools in gathering the data was of course Mega Database 2016, which contains 6.4 million games. We used the ‘filter’ tool to create databases of the games of the eight competitors. For example, we created a set of games for each player in which both that player and his opponent were rated 2700 or above. This eliminated games against lower-rated opposition that might not be relevant to the Candidates Tournament, and also eliminated those games played when the candidates were young and not as highly rated.
We then eliminated other games that didn’t seem relevant, such as blindfold, Chess960, Basque, and Advanced Chess games, as well as thematic tournaments. Once that cleanup was done, we divided the data into classical, rapid, and blitz time controls. Mega Database 2016 made this straightforward, allowing us to spend more time on analysis and less on sorting games.
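We did this filtering inside ChessBase, but the same kind of cleanup can be sketched with the python-chess library over a PGN export. The 2700 threshold comes from the text above; the filename and event-name keywords are placeholders for illustration.

```python
import chess.pgn

EXCLUDED_EVENTS = ("blindfold", "chess960", "basque", "advanced", "thematic")

def keep_game(game, min_elo=2700):
    """Keep games in which both players are rated min_elo or above and the
    event is not one of the excluded formats."""
    headers = game.headers
    try:
        white, black = int(headers["WhiteElo"]), int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False  # missing or malformed ratings
    if white < min_elo or black < min_elo:
        return False
    event = headers.get("Event", "").lower()
    return not any(word in event for word in EXCLUDED_EVENTS)

kept = []
with open("candidates_export.pgn") as pgn:  # placeholder filename
    while True:
        game = chess.pgn.read_game(pgn)
        if game is None:
            break
        if keep_game(game):
            kept.append(game)
print(f"{len(kept)} games kept")
```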
One of the data sets we looked at was the games played amongst the candidates themselves, filtering down again to only 2700 vs. 2700. The data set included 864 games, though these were not evenly spread out. The most common matchup so far has been Anand/Aronian with 63 games, while the pairing of Svidler/Giri has happened just seven times.
But one of the biggest challenges was working through the issue of how far back to go with these data sets. Recent games offer only a small sample size, but games from too far back may not be relevant. We tried to look at different lengths of time and different sample sizes and then work with the ones that seemed to be the most effective.
Some of the data collection proved challenging. Digging up tiebreak rules from past tournaments, for example, meant scouring the web. Sometimes the official tournament website had no information, or no longer existed at all. Tournament reports sometimes indicated that tiebreaks were used, but didn’t specify the type. As a result, we had to piece together the information as best we could.
Discovering insights
In many ways we acted like scientists. We came up with a hypothesis, gathered data, and then determined whether or not the hypothesis was supported. How does age factor in? What role do openings play in determining a player’s overall draw percentage? Again, Mega Database 2016 was a key resource.
We have also incorporated recent software advances such as data mining and deep learning as a way to bring fresh insights into our model. To that end, we’ve been working with DDNT, a Sydney-based consultancy founded by Dalibor Frtunik that specializes in deep learning and predictive analysis. With deep learning, we could let sophisticated algorithms dig through large data sets to automatically surface hidden predictive factors.
Feeding the software
Once we had the data and insights, we tuned our model for the eight candidates. We used the data sets to do things like determine the likelihood of a player drawing, and the circumstances in which that should be adjusted upward or downward. We know that the expectation for white is about 54% (i.e. someone playing 100 games with the white pieces would be expected to score about 54 points), but under what circumstances can that vary, player by player, depending on the current tournament result?
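One standard way to turn ratings and the colour advantage into per-game probabilities is the Elo expected-score formula, with the expected score then split into win, draw, and loss around an estimated draw rate. The sketch below shows that generic approach rather than our exact calibration; the white-advantage offset and default draw rate are placeholders.

```python
def game_probabilities(white_elo, black_elo, white_advantage=35, draw_rate=0.55):
    """Return (p_white_win, p_draw, p_black_win) for a single game.

    white_advantage (in Elo points) and draw_rate are illustrative placeholders;
    a real model would tune them per player and per tournament situation.
    """
    diff = white_elo - black_elo + white_advantage
    expected = 1.0 / (1.0 + 10.0 ** (-diff / 400.0))           # White's expected score
    p_draw = min(draw_rate, 2 * expected, 2 * (1 - expected))  # keep probabilities valid
    p_white_win = expected - p_draw / 2.0
    p_black_win = 1.0 - p_white_win - p_draw
    return p_white_win, p_draw, p_black_win

# Two equally rated 2775 players: White's expected score comes out near 0.55,
# in line with the roughly 54% scoring rate mentioned above.
print(game_probabilities(2775, 2775))
```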
Once we were confident in the data, the algorithms, and the code, we were ready to try out the software to see what it could tell us about tournaments.
Tiebreaks
One of the benefits of being able to simulate an entire tournament is that you can begin to see the effects of tiebreak rules at the macro level. For the Candidates Tournament our model calculates the winning scores at the end of the tournament and then applies the tiebreak rules as laid out by FIDE. The first tiebreak is head-to-head score, then number of wins, then Sonneborn-Berger score. We call these the “math” tiebreaks, because they are determined simply by applying mathematical rules to the results of the tied players. If the math tiebreaks do not produce a single winner, “game” tiebreaks are applied: first a round of two rapid games, then if necessary two blitz games, and finally an Armageddon game.
Handling these tiebreaks required implementing a number of additional algorithms and extending our Player Profiles to include rapid and blitz proficiencies. With no formal rating list for Armageddon games, we improvised a bit. Once the tiebreak algorithms were finished, we were able to ask some fundamental questions about the impact of tiebreaks on the players’ win probabilities.
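To give a flavour of the math tiebreaks, here is a simplified sketch of how they can be applied to a group of tied players. The `head_to_head` and `wins_count` structures are assumptions for the example, and the real FIDE regulations contain refinements this sketch ignores.

```python
def sonneborn_berger(player, scores, head_to_head):
    """Sonneborn-Berger: for each opponent, the points scored against them
    multiplied by that opponent's total tournament score."""
    return sum(head_to_head.get((player, opp), 0.0) * opp_score
               for opp, opp_score in scores.items() if opp != player)

def break_tie(tied, scores, head_to_head, wins_count):
    """Apply the 'math' tiebreaks in order and return the surviving players.

    head_to_head[(a, b)]: a's total score against b over their games.
    wins_count[p]: number of games p won in the tournament.
    """
    # 1. Head-to-head score among the tied players only
    h2h = {p: sum(head_to_head.get((p, q), 0.0) for q in tied if q != p) for p in tied}
    tied = [p for p in tied if h2h[p] == max(h2h.values())]
    if len(tied) == 1:
        return tied
    # 2. Greater number of wins
    best_wins = max(wins_count[p] for p in tied)
    tied = [p for p in tied if wins_count[p] == best_wins]
    if len(tied) == 1:
        return tied
    # 3. Sonneborn-Berger
    sb = {p: sonneborn_berger(p, scores, head_to_head) for p in tied}
    tied = [p for p in tied if sb[p] == max(sb.values())]
    return tied  # more than one name left means the 'game' tiebreaks kick in
```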
How many tournaments need tiebreaks?
In our million-tournament simulation, about three quarters of the tournaments ended with a clear winner. Of the 24% that ended in a tie, the vast majority were decided by application of the math tiebreaks. Just 0.4% of tournaments required additional games to be played to break the tie. So Nakamura’s well-known rapid and blitz skills are unlikely to help him much in the Candidates Tournament - except of course during time pressure in the classical games.
How would Nakamura’s winning chances improve if the math tiebreaks were removed? We ran a million tournaments under those parameters to see what would happen.
While Caruana, Anand, and Karjakin see modest gains, Nakamura’s chances to win the tournament jump a couple of percentage points - enough to make him co-favorite with Caruana to win the tournament.
Winning score
With the players’ average FIDE rating at about 2775, this is clearly going to be a very difficult tournament to win. But what exactly is it going to take? Out of 14 games, would 9.0 points be enough? Or maybe 9.5 points? In his interview with Frederic Friedel, GM Peter Heine Nielsen mentioned that +4 (or 9.0 points) would be a likely winning score. We had our simulation track a million outcomes, with the results shown below.
Peter’s intuition seems to be spot on here, with scores of 8.5/14 and 9.0/14 comprising almost 60% of the simulated tournaments. But when you look at a million runs, some very unusual results turn up. Six times our simulation produced a winning score of 7.0/14, with all eight players tied on the same score and the winner determined by tiebreaks. At the other end of the spectrum, in three tournaments the winner managed an impressive 13.0/14 for a performance rating of about 3200 - enough to get him on the cover of every chess magazine on the planet.
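Tracking the winning-score distribution is only a small addition to the simulation loop. A sketch, reusing the `simulate_tournament` and `PLAYERS` names from the simulator shown in the introduction:

```python
from collections import Counter

TRIALS = 100_000
winning_scores = Counter()
for _ in range(TRIALS):
    scores = simulate_tournament(PLAYERS)   # from the earlier sketch
    winning_scores[max(scores.values())] += 1

for score in sorted(winning_scores):
    print(f"{score:4.1f}/14  {winning_scores[score] / TRIALS:6.2%}")
```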
Player profiles
In simulating the 2016 Candidates Tournament, we created a Player Profile for each participant, gathering a number of different factors that could affect their probability of winning and losing. This helped make our simulations more accurate, but it also gave us some intriguing insights into the players.
One example of a factor in our Player Profiles is a player’s probability of drawing a game. It helps us understand the variance of the player’s final score: players who draw more often will finish within a tighter band of scores than players with lower draw rates.
So when GM Peter Heine Nielsen said in his interview that Karjakin had a high draw percentage, we decided to compare it to the draw percentages of the other candidates.
But in our model, Karjakin falls closer to the middle of the pack. We hypothesized that older players might have higher draw percentages, but in this case we have 21-year-old Giri with the most draws and 40-year-old Topalov with the fewest!
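Computing a draw percentage from a filtered game collection is straightforward; here is a small sketch using python-chess over per-player PGN exports. The filenames are placeholders, not actual files from our data set.

```python
import chess.pgn

def draw_percentage(pgn_path):
    """Fraction of games in a PGN file that ended in a draw."""
    draws = total = 0
    with open(pgn_path) as pgn:
        while True:
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            total += 1
            if game.headers.get("Result") == "1/2-1/2":
                draws += 1
    return draws / total if total else 0.0

for name in ["giri", "karjakin", "topalov"]:   # placeholder files
    print(name, f"{draw_percentage(name + '_2700plus.pgn'):.1%}")
```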
Bookmakers
There are quite a few Internet bookmakers who will take bets on the 2016 Candidates Tournament. We took the posted odds from nine sites, calculated the implied probability of winning for each player, and then averaged those implied probabilities, as shown in the graph below.
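Converting posted odds into implied probabilities is a standard calculation: invert the decimal odds, then normalize so the probabilities sum to one, which strips out the bookmaker’s margin. The odds in the example below are made up purely to show the arithmetic; averaging across sites is then just a per-player mean of these normalized values.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to implied win probabilities, normalising away
    the bookmaker's overround so the probabilities sum to 1."""
    raw = {player: 1.0 / odds for player, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {player: p / total for player, p in raw.items()}

# Made-up odds purely to illustrate the calculation
example_odds = {"Caruana": 4.5, "Nakamura": 5.0, "Giri": 5.5, "Anand": 7.0,
                "Karjakin": 8.0, "Aronian": 8.0, "Topalov": 15.0, "Svidler": 17.0}
for player, prob in sorted(implied_probabilities(example_odds).items(),
                           key=lambda kv: -kv[1]):
    print(f"{player:10s} {prob:5.1%}")
```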
Conclusion
As the Candidates Tournament heats up, we will no longer have to wonder about each player’s chances. We will be generating simulation results in real time as it unfolds, so that we can enjoy the games and have a better understanding of how the tournament will play out.
About the authors
James Jorasch is the founder of Science House and an inventor named on more than 700 patents. He plays tournament chess, backgammon, Scrabble, and poker. He lives in Manhattan and is a member of the Marshall Chess Club.

Chris Capobianco is a software engineer at Google. He is a two-time finalist at the USA Memory Championships and has consulted on memory for finance, media, advertising and Fortune 500 companies such as Xerox and GE.