Kaggle is a platform that allows companies, researchers, governments and other organizations
to post their problems and have statisticians worldwide compete to predict the
future (produce the best forecasts) or predict the past (find the best insights
hiding in data). Statisticians on Kaggle are rated and ranked based on past
performance so the competition host will know who are the smartest people in
the room. Statistician Jeff Sonas and Kaggle teamed up to present a new competition
in this grand tradition: it is “Elo versus
the Rest of the World”, pitting hundreds of statisticians against Arpad
Elo’s greatest legacy: the Elo rating system. We reported
on this.
Progress Update: Elo versus the Rest of the World
By Jeff Sonas
We have just passed the halfway mark of the "Elo vs. the Rest of the World"
contest, scheduled to end on November 14th. The contest is based upon the premise
that a primary purpose of any chess rating system is to accurately assess the
current strength of players, and we can measure the accuracy of a rating system
by seeing how well the ratings do at predicting players' results in upcoming
events. The winner of the contest will be the one whose rating system does the
best job at predicting the results of a set of 7,800 games played recently among
players rated 2200+.
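To make the premise concrete, here is a minimal Python sketch of how a rating system turns two ratings into a prediction, using the standard Elo expected-score formula. The function names and the default K-factor of 20 are illustrative assumptions, not values taken from the contest, and the sketch ignores any first-move advantage for White.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B,
    using the standard Elo logistic curve with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """Return the new rating after one game.

    `actual` is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    `k` (the K-factor) is an assumed value chosen for illustration only.
    """
    return rating + k * (actual - expected)


if __name__ == "__main__":
    # Hypothetical ratings for two players in the 2200+ range.
    prediction = elo_expected_score(2400.0, 2200.0)
    print(f"Predicted score for the higher-rated player: {prediction:.3f}")
    # If the higher-rated player only draws, the rating drops slightly.
    print(f"Rating after a draw: {elo_update(2400.0, prediction, 0.5):.1f}")
```

A rating system that predicts well is one whose expected scores, computed like this before the games are played, end up close to the actual results.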
So far we have had an unprecedented level of participation, with 162 different
teams submitting entries to the contest! There is also a very active discussion
forum to promote the free flow of ideas, although many teams are still hesitant
to share too many details about their approach (especially considering that
the winner will receive a copy of Fritz signed by Garry Kasparov, Viswanathan
Anand, Anatoly Karpov, and Viktor Korchnoi). Both Chessbase and Kaggle have donated
generous prizes, to be awarded to top-performing participants who are willing
to share their methodology publicly.

First prize is a copy of Fritz signed by all-time greats Garry Kasparov,
Viswanathan Anand, Anatoly Karpov, and Vladimir Kramnik!
The Kaggle website provides access to a training dataset of 65,000 recent chess
games, which participants can use to train their rating systems, as well as a test dataset
(a list of 7,800 recent chess games whose outcomes must be predicted and submitted).
The website (which knows the results of those 7,800 games) automatically scores
each submission and maintains a public leaderboard of each participant's best-performing
entry. A more complete score for each entry is kept secret so that participants
cannot "decode" the leaderboard, and this secret list determines the
final winners, to be announced at the end of the contest.
A wide range of approaches has been used, including almost every known chess
rating system as well as other attempts involving neural networks, machine learning,
data mining, business intelligence tools, and artificial intelligence. In fact,
over 1,600 different entries have been submitted so far, and we anticipate far
more submissions as the competition heats up over the final seven weeks.

Filipe Maia, a PhD student in Janos Hajdu's Molecular
Biophysics group at Uppsala University
The #1 spot is currently held by Portuguese physicist Filipe Maia, who confesses
to little knowledge about statistics or chess ratings, but is nevertheless managing
to lead the competition! He is also the author of El Turco, the first-ever Portuguese
chess engine. Out of the current top ten teams on the leaderboard, seven use
variants of the Chessmetrics rating system, two are modified Elo systems, and
one is a "home-grown variant of ensemble recursive binary partitioning".
That last approach belongs to the #3 team on the public leaderboard, a team
known as "Old Dogs With New Tricks". This team is a collaborative
effort between Dave Slate and Peter Frey, both prominent leaders in computer
chess for many years.
Although the "Old Dogs With New Tricks" team clearly has a lot of
chess expertise, and the #2 spot is held by Israeli mathematician and chess
player Uri Blass (FIDE rating 2051), the top ten or twenty teams consist primarily
of mathematicians, data miners, and other scientists having minimal
direct experience with chess or chess ratings. This suggests that experts on
chess rating theory might still have a lot to learn from experts in other fields,
which of course is one of the desired outcomes of this contest! We have attracted
interest from around the globe, with the top twenty made up of participants
from Portugal, Israel, USA, Germany, Australia, UK, Singapore, Denmark, and
Ecuador.
As the organizer of the contest, I have "benchmarked" several prominent
rating systems, starting with Chessmetrics, Elo, PCA, and Glicko/Glicko-2. Other
systems (including TrueSkill) will also be benchmarked in the near future. A
"benchmark" consists of implementing those systems, optimizing any
parameters for predictive power, submitting predictions based on their ratings,
and publicly describing the details of the methodology in the discussion forum.
These benchmark entries help other competitors to gauge the success of their
own entries and to get some ideas of what other people have tried in the past.
If you are interested in learning more about any of the benchmarked systems,
you can find detailed descriptions in the discussion forum on the contest website.
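As a rough illustration of what "optimizing any parameters for predictive power" can look like, the Python sketch below tunes a single Elo parameter (the K-factor) by rating players on training games and keeping the value that predicts held-out games best. Everything here is an assumption made for illustration: the tiny dataset, the starting rating of 2300, the candidate K values, and the use of root-mean-square error as the score. None of it is taken from the contest's own benchmark code or scoring rules.

```python
import math

# A game is (white_player, black_player, white_score), with white_score
# being 1.0 for a win, 0.5 for a draw and 0.0 for a loss.


def expected(rating_a, rating_b):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rate_players(games, k, start=2300.0):
    """Run plain Elo updates over the games in order and return the ratings."""
    ratings = {}
    for white, black, score in games:
        rw = ratings.get(white, start)
        rb = ratings.get(black, start)
        e = expected(rw, rb)
        ratings[white] = rw + k * (score - e)
        ratings[black] = rb - k * (score - e)  # zero-sum update
    return ratings


def rmse(ratings, games, start=2300.0):
    """Root-mean-square error between predicted and actual scores."""
    errors = [
        (score - expected(ratings.get(w, start), ratings.get(b, start))) ** 2
        for w, b, score in games
    ]
    return math.sqrt(sum(errors) / len(errors))


def best_k(train, holdout, candidates):
    """Pick the candidate K-factor with the lowest error on held-out games."""
    return min(candidates, key=lambda k: rmse(rate_players(train, k), holdout))


if __name__ == "__main__":
    # Hypothetical toy data: player "a" keeps beating player "b".
    train = [("a", "b", 1.0), ("b", "a", 0.0), ("a", "b", 1.0)]
    holdout = [("a", "b", 1.0)]
    print("Best K on this toy data:", best_k(train, holdout, [0.0, 16.0, 32.0]))
```

On real data the search would cover more parameters and far more games, but the loop is the same: fit on past games, score the predictions on games the system has not seen, and keep the setting with the lowest error.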
Currently, out of 162 teams, the benchmarks hold the following rankings:
Chessmetrics Benchmark: #10
Glicko-2 Benchmark: #38
Glicko Benchmark: #39
PCA Benchmark: #66
Elo Benchmark: #82
Thus it is becoming increasingly clear that there are many alternative approaches
that seem more accurate than the Elo system, being more effective at measuring
players' current strength and predicting players' results in upcoming events.
However, predictive power and accuracy are not the only yardsticks to use in
assessing a rating system; it is clear that inertia, familiarity, and simplicity
are powerful advantages of the Elo system…
The first half of the contest has been a great success and we look forward
to a very competitive and productive second half! Please come visit the contest
website and join the fun!