The Deloitte/FIDE Chess Rating Challenge

By Jeff Sonas

The contest's sponsor, Deloitte Australia, has provided the $10,000 prize to be awarded to the team that submits the most accurate predictions. Deloitte is a preeminent provider of analytics globally and helps companies capture, manage and analyze their data as part of their overall business strategy.

For more than four decades the FIDE Elo system has served as the primary yardstick in the world for estimating the strength of chess players. Yet despite the popularity of the Elo system, it has never been conclusively shown to be superior to other rating systems at predicting future results. A player's Elo rating provides a single number summarizing their past history of results, but does it really tell us how the player will perform in their next event?
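To make the "single number" idea concrete, here is a minimal sketch of how a rating difference is commonly turned into a prediction. It assumes the widely used logistic form of the Elo expected-score curve with a 400-point scale. The function name `expected_score` is illustrative, and FIDE's official tables may differ in detail.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B.

    Assumption: the common logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give an even expectation; a 200-point edge gives about 0.76.
print(round(expected_score(2600, 2600), 2))  # 0.5
print(round(expected_score(2700, 2500), 2))  # 0.76
```

The expected score blends win and draw chances into one figure: a value of 0.76 means the stronger player is expected to average about three quarters of a point per game.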

In an initial attempt to investigate this question last year, Jeff Sonas and Kaggle held an online "Elo versus the Rest of the World" contest, which was won by Yannis Sismanis (a research scientist from IBM’s Almaden Research Center). The contest, which required participants to develop predictive models that could forecast the results of chess games with great accuracy, was immensely popular among chess enthusiasts and data scientists, drawing more than 3,500 online submissions from 258 participating teams across 41 countries.

But even the massive efforts of this first contest were not sufficient to identify a clearly superior approach. Even among just the top ten prizewinners there was a wide variety of methodologies, and all of them documented their approaches in significant detail on the Kaggle website after the completion of the contest. The benchmark submission of the Elo formula finished far behind the prizewinners, in 141st place out of 258.

It is no longer really a question of "Elo versus the Rest of the World"; we must now hold a second contest and focus on identifying a truly superior system. This contest was recently launched on Kaggle and runs until May 3rd. Contest participants will "train" their rating systems using a training dataset of over 1.84 million game results for more than 54,000 chess players across a recent eleven-year period. Participants will then use their method to predict the outcome of a further 100,000 games played among those same players during the following three months. Contest entries will be scored automatically by the website, based on the accuracy of their predictions. Each team can submit two entries per day, and prizes will be determined according to each team's best-scoring single submission.
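The train-then-predict workflow described above can be sketched with a plain Elo baseline. This is only an illustration under stated assumptions, not the contest's actual data format or scoring method: the game tuples, the K-factor of 20, the starting rating of 1500, and the use of root-mean-square error as the accuracy measure are all hypothetical choices, and the advantage of the white pieces is ignored.

```python
import math
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def train_elo(games, k=20.0, start=1500.0):
    """Replay games in chronological order and return the final ratings.

    games: iterable of (white_id, black_id, white_score), with
    white_score being 1.0, 0.5 or 0.0 (hypothetical format).
    """
    ratings = defaultdict(lambda: start)
    for white, black, score in games:
        delta = k * (score - expected_score(ratings[white], ratings[black]))
        ratings[white] += delta
        ratings[black] -= delta
    return ratings


def predict(ratings, pairings):
    """Predicted white score for each (white_id, black_id) pairing."""
    return [expected_score(ratings[w], ratings[b]) for w, b in pairings]


def rmse(predictions, outcomes):
    """Root-mean-square error; a stand-in for the contest's own metric."""
    squared = [(p - o) ** 2 for p, o in zip(predictions, outcomes)]
    return math.sqrt(sum(squared) / len(squared))


# Tiny made-up example: train on three games, then predict two more.
training_games = [("A", "B", 1.0), ("B", "C", 0.5), ("A", "C", 1.0)]
ratings = train_elo(training_games)
test_pairings = [("A", "B"), ("C", "A")]
predictions = predict(ratings, test_pairings)
print(predictions, rmse(predictions, [1.0, 0.0]))
```

A contest entry would replace `train_elo` and `predict` with a more refined model, while the surrounding loop of training on past games and scoring predictions on later ones stays the same.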

The prize structure reflects the dual nature of what a "superior" system would be. First, for those interested in the more scientific questions of which parameterized mathematical models for chess performance and chess strength are most suitable, or which techniques work best to optimize predictive models, there is the $10,000 prize provided by the professional services firm Deloitte. This prize will be awarded to the team that submits the most accurate predictions, no matter how complex the predictive model may be. ChessBase has also donated chess software with signatures by famous players, for the three teams finishing in 2nd/3rd/4th place in this category.

Second, for those interested in the more practical questions about how the Elo system could be modified to become more accurate, or whether there is another approach that would retain the simplicity of the Elo system while predicting chess results more accurately, there is the special "FIDE prize". This prize will be awarded by FIDE representatives to what they consider to be the most promising approach, out of the ten most accurate entries that meet a restrictive definition of a "practical chess rating system". This restrictive definition is specified within the rules of the contest. For the selected winner, FIDE will provide round-trip air fare to Athens, Greece, full board for three nights in Athens, and payment toward other expenses, so that one person can present and discuss the system at a special FIDE meeting of chess rating experts in Athens.

Many lessons were learned from the first contest, and the design of the second contest reflects them. FIDE has provided a complete dataset of multiple years of game-by-game and tournament-by-tournament results that were used for calculating the official FIDE ratings, and Jeff Sonas has cleaned and expanded the data by cross-referencing it with ChessBase game collections. As a result, the second contest provides more than 30 times as many games as the first contest did, and a much larger population of chess players as well, reflecting the whole pool of FIDE-rated players rather than just the fraction of top players covered by the first contest. Until now it has never been possible to perform this level of analysis, because the dataset had not been assembled, and the contest website is the only place in the world where this data is available.

Week One Update:

At the end of Week One, with eleven weeks still remaining in the contest, the leaderboard was already very crowded! The winner of the previous chess prediction contest, Yannis Sismanis, seized the lead in the first few days of the current contest, but he was overtaken at the end of Week One by a collaborative team named “Pragmatic Theory”. You might recognize the team name; its members are Martin Chabbert and Martin Piotte, two members of the team that won the famous one-million-dollar “Netflix Prize” competition in 2009 to predict the movie preferences of Netflix users. The predictions made by team Pragmatic Theory are already 9% more accurate than the best predictions that Jeff Sonas could wring out of the Elo approach. We can only imagine what further progress in the theory and practice of chess ratings will occur over the next several weeks!

Jeff Sonas in California

Jeff Sonas is a statistical chess analyst who invented the Chessmetrics system for rating chess players, which is intended as an improvement on the Elo rating system. He is the founder and proprietor of the Chessmetrics.com website, which gives Sonas' calculations of the ratings of current players and historical ratings going back as far as January 1843. Sonas writes: "Since the summer of 1999, I have spent countless hours analyzing chess statistics, inventing formulas and other analysis techniques, and calculating historical ratings." He has written dozens of articles since 1999 for ChessBase.com and other chess websites. He was a participant in the FIDE ratings committee meeting in Athens in June 2010. [Wikipedia]

In September last year Jeff visited Ken Thompson at his house in Sea Ranch, California. The two spent many hours discussing the subject of chess ratings and enjoying the beautiful Pacific coast that runs just a few yards from the house. Here are some pictorial impressions.

Two great minds with Stupid: Jeff Sonas (shorts), computer pioneer Ken Thompson
(kneeling) and Frederic Friedel (striped shirt), who instigated the encounter

Jeff and his family at a luncheon with friends

Whale spotting: Ken shows everyone the size of the one that swam by last week

Jeff with daughter Emma, 11, and son Jamie, just turned 5

Skipping with kelp on the Californian beach

Emma and Katie, age 14, capture the sunset on the West Coast of the USA

Computer scientists at work late at night: Ken Thompson and Emma Sonas

Links

Kaggle is a platform that allows companies, researchers, governments and other organizations to post their problems and have statisticians worldwide compete to predict the future (produce the best forecasts) or predict the past (find the best insights hiding in data). Statisticians on Kaggle are rated and ranked based on past performance so the competition host will know who are the smartest people in the room. Statistician Jeff Sonas and Kaggle teamed up to present a new competition in this grand tradition: it is “Elo versus the Rest of the World”, pitting hundreds of statisticians against Arpad Elo’s greatest legacy: the Elo rating system. We reported on this.
