The Deloitte/FIDE Chess Rating Challenge
By Jeff Sonas
The contest's sponsor, Deloitte Australia, has provided the $10,000 prize to be
awarded to the team that submits the most accurate predictions. Deloitte
is a leading global provider of analytics and helps companies capture,
manage and analyze their data as part of their overall business strategy.
For more than four decades the FIDE Elo system has served as the primary yardstick
in the world for estimating the strength of chess players. Yet despite the popularity
of the Elo system, it has never been conclusively shown to be superior to other
rating systems at predicting future results. A player's Elo rating provides
a single number summarizing their past history of results, but does it really
tell us how the player will perform in their next event?
In an initial attempt to investigate this question last year, Jeff Sonas and
Kaggle held an online "Elo
versus the Rest of the World" contest, which was won
by Yannis Sismanis (a research scientist from IBM’s Almaden Research
Center). The contest, which required participants to develop predictive models
that could forecast the results of chess games with great accuracy, was immensely
popular among chess enthusiasts and data scientists, drawing more than 3,500
online submissions from 258 participating teams across 41 countries.
But even the massive efforts of this first contest were not sufficient to identify
a clearly superior approach. The methodologies varied widely even among the top ten
prizewinners, all of whom documented their approaches in significant
detail on the Kaggle website after the contest ended. The benchmark
submission based on the Elo formula finished far behind the prizewinners, in
141st place out of 258.
It is no longer really a question of "Elo versus the Rest of the World";
we must now hold a second contest and focus on identifying a truly superior
system. This contest was recently launched on Kaggle and runs until May 3rd.
Contest participants will "train" their rating systems using a training
dataset of over 1.84 million game results for more than 54,000 chess players
across a recent eleven-year period. Participants then use their method to predict
the outcome of a further 100,000 games played among those same players during
the following three months. The website scores contest entries automatically,
based on the accuracy of their predictions. Each team may submit two entries per
day, and prizes will be determined according to each team's best-scoring single
submission.
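To make the "train, then predict" workflow concrete, here is a minimal sketch of an Elo-style baseline of the kind the benchmark entry was built on. It replays training games in chronological order using the standard Elo expectation 1/(1 + 10^((Rb - Ra)/400)) and the update R' = R + K(S - E), then predicts an expected score for each new pairing. The game-record layout, the constant K-factor of 20, the starting rating of 2200, and the names `train` and `predict` are all hypothetical choices for illustration. They are not the contest's data format or FIDE's actual parameters, and this sketch ignores refinements such as White's first-move advantage.

```python
from collections import defaultdict

# Hypothetical record layout: (white_id, black_id, white_score),
# where white_score is 1.0 (win), 0.5 (draw) or 0.0 (loss).
Game = tuple[int, int, float]

K_FACTOR = 20.0          # assumed constant K; FIDE's real rules vary K by player
INITIAL_RATING = 2200.0  # assumed starting rating for players not yet seen


def expected_score(r_a: float, r_b: float) -> float:
    """Standard logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def train(games: list[Game]) -> dict[int, float]:
    """Replay training games in chronological order, updating both ratings."""
    ratings: defaultdict[int, float] = defaultdict(lambda: INITIAL_RATING)
    for white, black, score in games:
        # Both players' expectations use their ratings from *before* this game.
        e_white = expected_score(ratings[white], ratings[black])
        delta = K_FACTOR * (score - e_white)
        ratings[white] += delta  # White gains what Black loses (zero-sum update)
        ratings[black] -= delta
    return dict(ratings)


def predict(ratings: dict[int, float], white: int, black: int) -> float:
    """Expected score for White; unknown players fall back to the initial rating."""
    return expected_score(ratings.get(white, INITIAL_RATING),
                          ratings.get(black, INITIAL_RATING))


if __name__ == "__main__":
    history = [(1, 2, 1.0), (2, 3, 0.5), (1, 3, 1.0)]  # made-up toy games
    ratings = train(history)
    print(round(predict(ratings, 1, 3), 3))
```

A contest entry along these lines would call `train` on the training period and `predict` on each test game. Since the Elo benchmark finished 141st in the first contest, this should be read only as a baseline illustration, not as a competitive method.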
The prize structure reflects the dual nature of what a "superior"
system would be. First, for those interested in the more scientific questions
of which parameterized mathematical models for chess performance and chess strength
are most suitable, or which techniques work best to optimize predictive models,
there is the $10,000 prize provided by the professional services firm Deloitte.
This prize will be awarded to the team that submits the most accurate predictions,
no matter how complex the predictive model may be. ChessBase has also donated
chess software signed by famous players for the teams finishing
in 2nd, 3rd and 4th place in this category.
Second, for those interested in the more practical questions of how the
Elo system could be modified to become more accurate, or whether another
approach could retain the simplicity of the Elo system while predicting
chess results more accurately, there is the special "FIDE prize".
This prize will be awarded by FIDE representatives to what they consider to
be the most promising approach, out of the ten most accurate entries that meet
a restrictive definition of a "practical chess rating system". This
restrictive definition is specified within the rules of the contest. For the
selected winner, FIDE will provide airfare for a round-trip flight to Athens,
Greece, full board for three nights in Athens, and a contribution toward other
expenses, for one person to present and discuss their system at a special
FIDE meeting of chess rating experts in Athens.
Many lessons were learned from the first contest, and the design of the second
contest reflects this. FIDE has provided a complete dataset of multiple years
of game-by-game and tournament-by-tournament results that were used for calculating
the official FIDE ratings, and Jeff Sonas has cleaned and expanded the data
by cross-referencing it with ChessBase game collections. So the second contest
provides more than 30 times as many games as the first contest did, and a much
larger population of chess players as well, reflecting the whole pool of FIDE-rated
players, rather than just the fraction of top players covered by the first contest.
Until now it has never been possible to perform this level of analysis, because
the dataset had not been assembled, and the contest website is the only place
in the world where this data is available.
Week One Update:
At the end of Week One, with eleven weeks still remaining in the contest,
the leaderboard was already very crowded! The winner of the previous chess prediction
contest, Yannis Sismanis, seized the lead in the first few days of the current
contest, but Yannis was overtaken at the end of Week One by a collaborative
team named “Pragmatic Theory”. You might recognize the team name;
its members are Martin Chabbert and Martin Piotte, two members of the team that
won the famous one-million-dollar “Netflix Prize” competition in 2009 to predict
the movie preferences of Netflix
users. The predictions made by team Pragmatic Theory are already 9% more accurate
than the best predictions that Jeff Sonas could wring out of the Elo approach.
We can only imagine what further progress in the theory and practice of chess
ratings will occur over the next several weeks!
Jeff Sonas in California
Jeff Sonas is a statistical chess analyst who invented the Chessmetrics system
for rating chess players, which is intended as an improvement on the Elo rating
system. He is the founder and proprietor of the Chessmetrics.com
website, which gives Sonas' calculations of the ratings of current players
and historical ratings going back as far as January 1843. Sonas writes: "Since
the summer of 1999, I have spent countless hours analyzing chess statistics,
inventing formulas and other analysis techniques, and calculating historical
ratings." He has written dozens of articles since 1999 for ChessBase.com
and other chess websites. He was a participant in the FIDE ratings committee
meeting in Athens in June 2010. [Wikipedia]
In September last year Jeff visited Ken Thompson at his house in Sea Ranch,
California. The two spent many hours discussing the subject of chess ratings
and enjoying the beautiful Pacific coast that runs just a few yards from the
house. Here are some pictorial impressions.

Two great minds with Stupid: Jeff Sonas (shorts), computer pioneer Ken Thompson
(kneeling) and Frederic Friedel (striped shirt), who instigated the encounter

Jeff and his family at a luncheon with friends

Whale spotting: Ken shows everyone the size of the one that swam by last
week

Jeff with daughter Emma, 11, and son Jamie, just turned 5

Skipping with kelp on the Californian beach

Emma and Katie, age 14, capture the sunset on the West Coast of the USA

Computer scientists at work late at night: Ken Thompson and Emma Sonas
Links
Kaggle is a platform that allows companies, researchers, governments and other
organizations to post their problems and have statisticians worldwide compete
to predict the future (produce the best forecasts) or predict the past (find
the best insights hiding in data). Statisticians on Kaggle are rated and ranked
based on past performance, so that competition hosts know who the smartest
people in the room are. Statistician Jeff Sonas and Kaggle teamed up to present
a new competition in this grand tradition: it is “Elo
versus the Rest of the World”, pitting hundreds of statisticians against
Arpad Elo’s greatest legacy: the Elo rating system. We reported
on this.