A statistician's view of the FIDE World Championship
By Jeff Sonas
The FIDE world championship tournament, which takes place from September 27th
through October 16th in San Luis (Argentina), should prove to be a fascinating
event. It is the first time in more than half a century that the FIDE (men's)
world champion will be determined by a round robin tournament, rather than
a two-player match or a huge knockout tournament. I have recently finished
a detailed random simulation of the possible tournament outcomes, and it has
produced some very interesting numbers.
By now you've probably read Nigel
Short's article in which he questions Garry Kasparov's claim of a 95% chance
that the tournament would either be won by Viswanathan Anand, Veselin Topalov,
or Peter Leko. Nigel offered to take the remaining five players (Peter Svidler,
Alexander Morozevich, Michael Adams, Judit Polgar, and Rustam Kasimdzhanov)
if Kasparov would give him 17-to-1 odds for a $100 wager on the tournament's
winner.
There is no doubt that a win by Anand, Topalov, or Leko is likely,
and of course there are many subjective considerations which are difficult
to analyze numerically. However, the statistics do paint a more uncertain picture
than that suggested by Kasparov. According to my calculations, the remaining
five players have a combined 41% chance (not a mere 5% chance) to win the tournament,
making those 17-to-1 odds seem pretty attractive! In addition, because Leko's
strong tendency toward draws makes a large plus score unlikely for him, and
it could easily require a +4 score to win the tournament, Peter Svidler is
actually given a slightly greater chance to win the tournament than is Leko
(12% vs. 11%), and Judit Polgar is right there in the same group with an 11%
chance to win.
Viswanathan Anand seems to be the clear favorite, with a 31% chance to win.
Veselin Topalov has the next-best prospects, with about a 17% chance, and each
of the other six players has somewhere between a 7% and 12% chance to win,
except for current FIDE champion Rustam Kasimdzhanov, who is easily the lowest-rated
participant and is only given one chance in thirty of winning the tournament.
We will talk some more about the individual players further down, but first
I want to discuss the format of the tournament itself, because it is a far
cry from the FIDE championship knockout tournaments of recent years. We could
debate endlessly about whether a tournament or a match is preferable, but instead
I would prefer to emphasize that out of the possible tournament formats,
this one is excellent.
It might be tempting to glance over the tournament rules and immediately start
criticizing the fact that, just as in the past, a shared first place could
ultimately be decided by rapid games, blitz games, or even the "Armageddon"
sudden-death game. Of course, nobody wants a rapid game to decide the championship,
but that is indeed a very real possibility in a knockout tournament, where
there just aren't that many tiebreaking options other than proceeding to the
faster-time-control games. You can't use head-to-head results, or number of
wins, because they are always the same for both players. And "first win"
(or "last win") is considered to be too unfair to the player who
has Black first (or last). The FIDE championship was in fact determined by
rapid games twice during the knockout era, in the first event when Anatoly
Karpov defeated Anand in 1998, and then the last one when Kasimdzhanov defeated
Michael Adams in 2004.
However, this concern is far less relevant in San Luis than it was for the
knockout tournaments. In a round-robin event, particularly a long one, it is
possible to set up the tiebreaking rules so that it’s almost certain
that first place will be resolved via the classical games themselves. Over
the course of the long tournament, there will be many differences in the quantity
of wins for each player, and various head-to-head results can be used as well.
Admittedly, some of the tiebreak criteria may seem somewhat arbitrary (e.g.,
why not “fewest losses” rather than “most wins”?) However,
at least this way the players know beforehand what they are getting themselves
into, and they can plan accordingly as the final rounds approach and the tiebreak
situation crystallizes. Of course, you can always play the “What If?”
game and revert to criticizing the ultimate tiebreaker (games at the faster
time control), but for this particular event, and these particular rules, it
is very unlikely that the championship will need to be resolved by the rapid/blitz
games. More specifically, the odds are 38-to-1 against a need for rapid games
to resolve the championship.
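Odds quoted as "n-to-1 against" can be turned into probabilities with a one-line conversion. The short sketch below is only an illustration of that arithmetic; the function names are invented for this example and are not part of the simulation model. Odds of 38-to-1 against correspond to a probability of 1/39, or roughly 2.6%.

```python
from fractions import Fraction

def odds_against_to_probability(n):
    """Probability of an event quoted at n-to-1 odds against it."""
    return Fraction(1, n + 1)

def probability_to_odds_against(p):
    """The n in 'n-to-1 against' for an event with probability p."""
    return (1 - p) / p

# 38-to-1 against needing rapid games
print(float(odds_against_to_probability(38)))

# An event that happens one time in forty is 39-to-1 against
print(probability_to_odds_against(Fraction(1, 40)))
```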
Let’s go through the various possibilities. First of all, remember that
this is a very long event, fourteen rounds. That means it’s pretty likely
that the players will sort themselves out enough; in fact, according to my
calculations there’s almost an 80% chance that there will be a clear
winner after fourteen rounds, meaning all of this concern about tiebreaks would
be quite irrelevant. In other words, if we played this tournament 40 different
times, we would see a clear first place winner 32 times, and we would see a
shared first place only 8 times. What happens in those remaining 8 cases?
According to the rules, the first tiebreak criterion is the head-to-head
results among the tied players. For instance, if there was a three-way tie,
we would look at the head-to-head results among just those three players. If
there is still a tie, then it falls through to the next criterion, which is
to count up the total number of games won during the tournament by each player
(against all opponents, even those who didn’t share first place). Most
of the time, these two criteria will suffice to determine a single winner. Only one
time out of forty would it actually move on to rapid games. And even if it
does get to rapid games, there’s about a 98% chance that the rapid tiebreaker
would only involve two players, rather than the weird multi-player mini-round-robin
that the rules provide for.
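The tiebreak sequence described above can be sketched in a few lines of Python. This is a simplified reading of the procedure, not the official regulations: the function name and the data layout (a dictionary mapping each (white, black) pairing to White's score) are assumptions made for the example, and it assumes that the "most wins" criterion is applied only to the players still tied after the head-to-head step.

```python
from collections import defaultdict

def resolve_first_place(players, results):
    """results maps (white, black) -> White's score: 1, 0.5 or 0."""
    score = defaultdict(float)
    wins = defaultdict(int)
    for (white, black), s in results.items():
        score[white] += s
        score[black] += 1 - s
        if s == 1:
            wins[white] += 1
        elif s == 0:
            wins[black] += 1

    top = max(score[p] for p in players)
    tied = [p for p in players if score[p] == top]
    if len(tied) == 1:
        return tied[0], "clear first"

    # Criterion 1: results of the games among the tied players only.
    h2h = {p: 0.0 for p in tied}
    for (white, black), s in results.items():
        if white in h2h and black in h2h:
            h2h[white] += s
            h2h[black] += 1 - s
    best = max(h2h.values())
    tied = [p for p in tied if h2h[p] == best]
    if len(tied) == 1:
        return tied[0], "head-to-head"

    # Criterion 2: total wins against all opponents.
    most = max(wins[p] for p in tied)
    tied = [p for p in tied if wins[p] == most]
    if len(tied) == 1:
        return tied[0], "most wins"

    # Still level: the remaining players go to the rapid playoff.
    return tied, "rapid playoff"

# Tiny three-player double round robin, for illustration only.
example = {
    ("A", "B"): 1, ("B", "A"): 0.5,
    ("A", "C"): 0.5, ("C", "A"): 0.5,
    ("B", "C"): 1, ("C", "B"): 0,
}
print(resolve_first_place(["A", "B", "C"], example))  # ('A', 'head-to-head')
```

In the example, A and B both finish on 2.5 points, and A takes first place because A scored 1.5 out of 2 in their two games against each other.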
While we’re on the topic of rapid games, I do want to point out one
other thing. I know that Rustam Kasimdzhanov’s victory in Tripoli last
year was a huge surprise, but it actually could have been somewhat anticipated
statistically, if rapid ratings had been used in the calculations. I didn’t
use any, because there wasn’t an official FIDE rapid list handy. But
in retrospect I do want to call attention to Stefan Fischl, who maintains a
web site that includes an unofficial “rapid rating list” going
back a few years. According to Stefan’s list, as of the start of the
2004 Tripoli tournament, Kasimdzhanov was ranked #2 at rapid chess among all
124 of the tournament participants, trailing only Veselin Topalov. Thus it
is perhaps not that surprising that Kasimdzhanov was able to eliminate Alejandro
Ramirez, Vassily Ivanchuk, Topalov, and finally Adams during the rapid games.
And by now Kasimdzhanov’s unofficial rapid rating is third in the world
among active players (behind only Viswanathan Anand and Garry Kasparov).
So if it does get to a rapid tiebreak, you might be interested to know that
Anand’s unofficial rapid rating is more than sixty points higher than
anyone else at San Luis, but the #2 and #3 spots (among San Luis participants)
are held by underdogs Kasimdzhanov and Alexander Morozevich, with Judit Polgar
far down on the rapid rating list, more than 200 points below Anand. I used
those rapid ratings in my simulation model, but it really isn’t too significant
since a rapid tiebreak is quite unlikely.
Enough about tiebreakers; let’s get back to rounds 1-14, the classical
part of the tournament, which (as I’ve said) has a 97% chance of being
sufficient to determine the next FIDE champion. Who is favored to win, and
why?
My simulation model took several factors into consideration. The most important
factor, of course, is the estimated strength for each player: their rating.
Rather than just using the FIDE ratings, I have chosen to use my more accurate
Chessmetrics rating formula to compute the strength of each player as of September
1st. I have also considered other factors such as White vs. Black strength
for each player, along with their draw frequencies with each color. I have
also looked for significant head-to-head results from the past, and finally
(after a lot of agonizing) I decided to include a bonus/penalty for players
who have done particularly well/poorly against 2700+ opposition. Taking all
of these factors into consideration, and randomly simulating the entire tournament
a million different times, this is how it turned out:
There are definitely some important differences between these numbers and
what you would have expected from the FIDE rating list. First and foremost,
Anand and Topalov are tied on the latest FIDE list, with Peter Leko being 25
rating points behind, and then there’s another 25-30 point gap before
we get to Peter Svidler and Judit Polgar. Why, then, do I have Anand so far
ahead of Topalov, and how did Svidler and Polgar catch up to Leko?
Well, it’s kind of hard to explain but I’ll give it a shot. You
might remember an article I wrote a couple of years ago where I suggested replacing
the current Elo formula with a simpler “linear” formula. My analysis
showed that the Elo formula creates an unintentional bias against the players
who tend to outrate their opponents by 100-200 points. For instance, if you
have a 150-point rating advantage over your opponent, empirical data says you
should score about 67%, whereas the Elo formula expects you to score 70%. So
if you played 100 games against opposition rated 150 points below you, and
let’s say you really did score 67/100 (exactly matching the empirical
predictions), then the Elo formula would claim that you should have scored
70/100, and thus you had scored a full 3 points below expectations, and you
would (undeservedly) lose 30 rating points.
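The arithmetic in this example can be checked with a short sketch. It uses the common logistic approximation to the Elo expected-score curve (FIDE's own tables are based on a slightly different curve) and assumes a K-factor of 10, which is what the "3 points equals 30 rating points" conversion implies. The 67% figure is the empirical score quoted above; the variable names are illustrative only.

```python
def elo_expected_score(rating_diff):
    """Logistic approximation to the Elo expected score."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

K = 10            # assumed K-factor for an established top player
GAMES = 100
ACTUAL = 67.0     # empirical score with a 150-point rating edge

expected = GAMES * elo_expected_score(150)   # a little over 70 points
rating_change = K * (ACTUAL - expected)      # negative: rating points lost

print(round(expected, 1), round(rating_change, 1))
```

The unrounded result is a loss of slightly more than 30 rating points, because the logistic curve puts the expected score a little above 70%.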
In this tournament field, the players other than Kasimdzhanov who typically
face the lowest-rated opponents (due to the events they tend to play in) are
Peter Svidler and Alexander Morozevich. These two players have had an average
rating advantage of about 100 points in their games from recent years, and
so their FIDE ratings are considerably lower than they really deserve, due to the
aforementioned bias. At the other end of the spectrum, Peter Leko and Veselin
Topalov face such strong opposition that on average they only outrate their
opponents by 25-30 points. Leko and Topalov have not had this Elo bias working
against them, so their FIDE ratings are a little higher than they deserve,
relative to the others. And so what looked like a 25-point gap in strength
between Leko and Svidler on the FIDE rating list, is revealed as just a function
of the kinds of events they play in, and in fact Leko and Svidler are probably
about the same strength, despite what the rating list would tell you. And whereas
Anand faces about the same caliber of opponent that Topalov does, Anand’s
rating is so high that he too outrates his typical opponent by about 100 points.
Since the Elo formula places an unreasonable expectation upon Anand, his FIDE
rating is also lower than he deserves.
That’s the simplistic explanation. There’s a lot more going on,
because the rating formulas really are very different. For instance, the FIDE
ratings don’t care that Judit Polgar took an entire year off, whereas
my ratings (which account for inactivity) are sensitive to the time lapse.
Topalov is playing much more frequently than he did a few years ago, and that
is also affecting his rating. If anything, it’s amazing that the FIDE
list and the Chessmetrics list match up so closely! However, you probably don’t
care too much about the intricacies of rating calculations, so let’s
just leave it at this: I have optimized my formula to provide maximal predictive
power, and it says that Svidler is indeed as strong as Leko, and that Anand
is somewhat stronger than Topalov, but who really knows the truth?
As long as we’re taking this nostalgic trip back through my past journalistic
efforts, let me remind you of another article I wrote a few years back, trying
to see whether past head-to-head results really have any bearing on future
results. I know that the players think they do, that there are certain opponents
they love to face and others they hate to face. In that article, I concluded
that there really was no such effect, that it didn’t matter whether you’d
over-performed or under-performed in the past against someone.
However, because the head-to-head results are so relevant in this tournament
(it’s the first tiebreaker criterion), I figured I should revisit that
investigation a little bit. So, I re-ran the analysis using the newer Chessmetrics
ratings, and I checked to see whether accounting for past head-to-head results
would in fact have improved the predictions of future matchups. It turns out
that once you pass a certain level of significance, it does improve the future
predictions if you correct for matchups where one player seems to have a particular
“knack” for beating another.
The most famous historical example is (of course) Vladimir Kramnik’s
career mastery of Garry Kasparov. Over the course of his career, Kramnik took
about 79 rating points from Kasparov (meaning that Kramnik scored 7.9 full
points more than expected, across all their games). This is by far the largest
amount in chess history. If they ever played each other again, this methodology
would award Kramnik a special 26-rating-point bonus when facing Kasparov. Tied
for second, with 60 rating points each, are Viktor Korchnoi’s domination
of Lev Polugaevsky, and Kramnik’s domination of Judit Polgar (which might
become particularly relevant if Polgar does manage to win this tournament;
Kramnik would receive a special 20-rating-point bonus against Polgar!) Down
at #33 on the historical list is Leko’s overperformance against
Topalov, and way down at #102 on the list is Anand’s overperformance
against Polgar. Those two San Luis matchups are the only ones that qualify
as being “significant”. Thus in my model, I give Leko an extra
14-point rating advantage when he faces Topalov, and I give Anand an extra
11-point rating advantage against Polgar. But all in all, I think this factor
is not particularly relevant.
I also decided to see whether certain players do particularly well (or particularly
poorly) when facing elite opposition. I set the boundary at a 2700-level, and
for each player I examined their historical results against players rated 2700+,
and against lower-rated players, to see whether there was any unusual difference
in results. Out of the eight participants, the three who have done unusually
well against 2700+ opponents were Peter Svidler, Peter Leko, and Rustam Kasimdzhanov,
and they all got a 7-rating-point bonus in my calculations (since this is such
an elite event). On the other hand, Alexander Morozevich has historically done
much better against lower-rated players, and not so well against 2700-level
opposition, so he got an 11-rating-point penalty in my calculations. Veselin
Topalov and Judit Polgar also received smaller penalties of this kind. Here
is a summary of the various modifiers that I used:
Player       | FIDE | CM   | 2700+ | Final | Draw | Chances
Anand        | 2788 | 2794 | –     | 2794  | 50%  | 31%
Topalov      | 2788 | 2767 | -7    | 2760  | 43%  | 17%
Svidler      | 2738 | 2742 | +7    | 2749  | 49%  | 12%
Leko         | 2763 | 2744 | +7    | 2751  | 57%  | 11%
Polgar       | 2735 | 2741 | -6    | 2735  | 41%  | 11%
Morozevich   | 2707 | 2727 | -11   | 2716  | 37%  | 8%
Adams        | 2719 | 2723 | +4    | 2727  | 53%  | 7%
Kasimdzhanov | 2670 | 2671 | +7    | 2678  | 38%  | 3%
In this list, I want to call your attention to the column about draw percentage.
Overall, I expect a draw percentage around 46%. Although it is an elite event,
and thus traditionally full of draws, the inclusion of Topalov, Polgar, Morozevich,
and Kasimdzhanov should ensure plenty of decisive games, and this could have
a very interesting result. Remember that a player’s total number of wins
is one of the tiebreakers, and so players who win a lot and lose a few will
have a tangible advantage over other players who win a few but never lose.
And there is another mathematical factor involved, which is that the riskier
players stand a better chance of putting a string of wins together and building
up a really high score to actually win the tournament.
To illustrate this, let’s compare Judit Polgar and Peter Leko. You can
see in the above list that after all the factors are considered, Leko is given
a “final San Luis rating” above 2750 whereas Polgar’s is
down at 2735. Nevertheless, they are given identical 11% chances to win the tournament.
Why is this? Again, kind of hard to explain but I’ll try. You’ve
made it this far so you must be at least marginally interested. Players with
a lot of decisive results have a “wider” bell curve of total scores,
meaning that they have a pretty significant chance of a major plus score, along
with a pretty significant chance of a large minus score. On the other hand,
drawish players don’t have such a wide variation in possible scores:

You can see that Leko’s curve has a higher peak at an even score or
a +1 score, so he is more likely than Polgar to end up with those scores. She
is clearly more likely than Leko to put up a big minus score like -4 or -5,
but despite her lower rating, once we get out in the +4 or +5 range, the unpredictable
Polgar actually has a better chance than the drawish Leko to finish with such
a high score. And since it is probably going to take at least a +3 score
to win the tournament, Polgar’s chances of winning the tournament
are quite comparable to Leko’s. Her average score is lower, but that
doesn’t matter as much as her chance of a high final score. You would
get the same explanation for why Svidler actually has a slightly
better chance to win the tournament than the higher-rated Leko, and for why
Morozevich has a slightly better chance to win the tournament than the higher-rated
Adams.
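To isolate the effect of draw frequency, here is a small sketch that computes the exact distribution of a player's final plus score over fourteen games. It is a deliberate simplification and not the model behind the numbers above: it assumes the same win, draw, and loss probabilities in every game, and it compares two hypothetical players who both have a 50% expected score, one with a 57% draw rate and one with a 41% draw rate (the draw rates listed for Leko and Polgar). The function names are invented for this example.

```python
def plus_score_distribution(expected_score, draw_rate, games=14):
    """Exact distribution of (wins - losses) over a number of games."""
    win = expected_score - draw_rate / 2
    loss = 1 - expected_score - draw_rate / 2
    if win < 0 or loss < 0:
        raise ValueError("draw rate too high for this expected score")
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for net, p in dist.items():
            for delta, q in ((1, win), (0, draw_rate), (-1, loss)):
                nxt[net + delta] = nxt.get(net + delta, 0.0) + p * q
        dist = nxt
    return dist

def chance_of_at_least(dist, target):
    """Probability of finishing on target plus score or better."""
    return sum(p for net, p in dist.items() if net >= target)

drawish = plus_score_distribution(0.50, 0.57)
decisive = plus_score_distribution(0.50, 0.41)

for target in (2, 4, 6):
    print(target,
          round(chance_of_at_least(drawish, target), 3),
          round(chance_of_at_least(decisive, target), 3))
```

With equal strength, the more decisive player is more likely to reach +4 or better, and by symmetry equally more likely to finish on -4 or worse. A real comparison would also have to account for the rating difference, which pulls in the other direction.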
Finally, I want to provide some justification for what I said in the last
paragraph about it requiring a +3 or +4 score to win the tournament. Remember
that I simulated this tournament a million different times. Sometimes a +2
score was good enough for clear first place, and sometimes a +9 score was not
even enough for a share of first place. There was even one simulated tournament
where all eight players tied with a score of 7/14, and the classical tiebreak
criteria were sufficient to bring it down to two players in the rapid tiebreak,
where Svidler defeated Kasimdzhanov to win the title! However, by taking an
aggregate average you can get a good sense of the overall trends in what it
will probably take to win.
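For readers who want to experiment, the general idea of such a simulation can be sketched as follows. This is a much cruder model than the one described in this article and will not reproduce its numbers. It uses only the "Final" ratings and draw percentages from the table, and everything else is an assumption made for the example: the logistic Elo curve, a 35-point edge for White, a draw probability equal to the average of the two players' draw rates, and splitting shared first places equally instead of applying the tiebreak rules.

```python
import random

# (final rating, draw rate) as listed in the summary table
FIELD = {
    "Anand": (2794, 0.50), "Topalov": (2760, 0.43),
    "Svidler": (2749, 0.49), "Leko": (2751, 0.57),
    "Polgar": (2735, 0.41), "Morozevich": (2716, 0.37),
    "Adams": (2727, 0.53), "Kasimdzhanov": (2678, 0.38),
}
WHITE_EDGE = 35  # assumed rating bonus for White, not from the article

def game_probabilities(white, black):
    """Return (White win probability, draw probability) for one game."""
    (rw, dw), (rb, db) = FIELD[white], FIELD[black]
    expected = 1 / (1 + 10 ** (-(rw + WHITE_EDGE - rb) / 400))
    draw = (dw + db) / 2
    win = min(max(expected - draw / 2, 0.0), 1 - draw)
    return win, draw

def simulate_tournament(rng):
    """Play one double round robin: every pairing once with each color."""
    score = dict.fromkeys(FIELD, 0.0)
    for white in FIELD:
        for black in FIELD:
            if white == black:
                continue
            win, draw = game_probabilities(white, black)
            r = rng.random()
            if r < win:
                score[white] += 1
            elif r < win + draw:
                score[white] += 0.5
                score[black] += 0.5
            else:
                score[black] += 1
    return score

def estimate_chances(n, seed=0):
    """Return (share of first places per player, fraction with a clear winner)."""
    rng = random.Random(seed)
    credit = dict.fromkeys(FIELD, 0.0)
    clear = 0
    for _ in range(n):
        score = simulate_tournament(rng)
        top = max(score.values())
        leaders = [p for p, s in score.items() if s == top]
        if len(leaders) == 1:
            clear += 1
        for p in leaders:
            credit[p] += 1 / len(leaders)  # shared first split equally
    return {p: c / n for p, c in credit.items()}, clear / n
```

Calling `estimate_chances(20000)` returns each player's estimated share of first places along with the fraction of simulated tournaments that produced a clear winner. Raising the number of simulated tournaments reduces the random noise in the estimates.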
I can tell you that more than 70% of the time, the tournament will be won
by somebody scoring either +3, +4, or +5, with the most likely winning score
being +4. A score of +2 is probably not going to be good enough to win the
tournament; in fact, the odds are about 17-to-1 against your becoming champion
if you finish with a +2 score. Managing a +3 score is obviously more promising,
but the odds are still 2-to-1 against you. On the other hand, a +4 score probably
gives you a 51% chance to win clear first, and an additional 11% chance to
share first and still win the title, meaning that overall you have a 62% chance
to become champion if you score +4. Here is a graphic illustrating these numbers
for all scores between +1 and +10.
Part of the fun of this is to watch as the numbers change over the course
of a tournament, and to try to figure out why. Oftentimes there is something
very significant going on, that I never would have noticed without digging
a little bit more. My plan is to provide statistical updates on each rest day
of the tournament, and to try to explain why the numbers have changed. I hope
that you have enjoyed this article. I know that the statistical perspective
is not the only perspective, and is not even the most important one. But perhaps
it will provide a useful counterpoint to some of the more subjective approaches.
In any event, I'll see you again on the first rest day. In the meantime, feel
free to visit my Chessmetrics site or send me email about any of this.
Previous articles by Jeff Sonas
The Greatest Chess Player of All Time – Part IV
25.05.2005
So tell us already, who was the greatest chess performance of all time?
After analysing and dissecting many different aspects of this question,
Jeff Sonas wraps it up in the final installment of this series, awarding
his all-time chess "Oscar" nomination to the overall greatest of all time.

The Greatest Chess Player of All Time – Part III
06.05.2005
What was the greatest chess performance of all time? Jeff Sonas has analysed
the duration different players have stayed at the top of the ratings
list, at the highest individual rating spikes and best tournament performances.
Today he looks at the most impressive over-all tournament performances
in history, and comes up with some very impressive statistics.

The Greatest Chess Player of All Time – Part I
24.04.2005
Last month Garry Kasparov retired from professional chess. Was he the greatest,
most dominant chess player of all time? That is a question that can
be interpreted in many different ways, and most answers will be extremely
subjective. Jeff Sonas has conducted extensive historical research
and applied ruthless statistics to seek a solution to an age-old debate.

FIDE Championship odds after round three
26.06.2004
The favorites are still the favorites, says Jeff Sonas. He has calculated
the odds of each player in the sweet sixteen winning the event. Nakamura's
122 to 1 doesn't look so good, unless you placed your bet back when
he was at 785 to 1! Plus, what is the best qualifying tournament system
for a world championship?

Revised statistics for FIDE championship
22.06.2004
A week ago we published full statistics for participants of the world
championship in Libya. After the first round the odds have changed
significantly, due to a number of factors – especially the non-appearance
of #2 seed Morozevich and the release of FIDE's July rating list. Jeff
Sonas has recalculated the odds.

Who will be the next FIDE world champion?
14.06.2004
We cannot be 100% sure, but at least the statistical odds are 13% in favour
of Morozevich or Topalov to win. Adams has a 9% chance, Grischuk 7%.
And there is a 1 in 100 million possibility that the next FIDE world
champion will be Tarik Abulhul of Libya. Read all about it in Jeff
Sonas' World Championship Statistics.

Putting his money where his stats are
11.11.2003
We hope you've been keeping up with statistician Jeff Sonas' fascinating
series of articles on man-machine chess. After looking at computer
ratings, human playing styles, openings, and tossing it all into the
analysis blender, he is ready to take the logical last step in this
fifth and final article: predicting the result of the Kasparov-X3D
Fritz match that starts today. The envelope please...

How (not) to play chess against computers
11.11.2003
Is there any way that human chess players can withstand the onslaught
of increasingly powerful computer opponents? Only by modifying their
own playing style, suggests statistician Jeff Sonas, who illustrates
a fascinating link between chess aggression and failure against computers.
There may still be a chance for humanity.

Physical Strength and Chess Expertise
07.11.2003
How can humans hope to hold their ground in their uphill struggle against
chess computers? Play shorter matches, stop sacrificing material, and
don't fear the Sicilian Defense, says statistician Jeff Sonas, who
also questions the high computer ratings on the Swedish SSDF list.
Here is his evidence.

Are chess computers improving faster than grandmasters?
17.10.2003
The battle between humans and machines over the chessboard appears to be
dead-even – in spite of giant leaps in computer technology. "Don't
forget that human players are improving too," says statistician Jeff
Sonas, who doesn't think it is inevitable that computers will surpass
humans. Here is his statistical evidence.

ChessBase in Slashdot
09.10.2003
Once again Slashdot, the gigantic discussion forum for technophiles, is
running a debate on one of our stories. If you have strong opinions
about Jeff Sonas' article on Man vs Machine go to the Slashdot
forum to debate the subject and vent your feelings.

Man vs Machine – who is winning?
08.10.2003
Every year computers are becoming stronger at chess, holding their own against
the very strongest players. So very soon they will overtake their human
counterparts. Right? Not necessarily, says statistician Jeff Sonas,
who doesn't believe that computers will inevitably surpass the top
humans. In a series of articles Jeff presents empirical
evidence to support his claim.

Does Kasparov play 2800 Elo against a computer?
26.08.2003
On August 24 the well-known statistician Jeff Sonas presented an article
entitled "How strong are the top chess programs?" In it he looked at the
performance of top programs against humans, and attempted to estimate
an Elo rating on the basis of these games. One of the programs, Brutus,
is the work of another statistician, Dr Chrilly Donninger, who replies
to Jeff Sonas.

Computers vs computers and humans
24.08.2003
The SSDF list ranks chess playing programs on the basis of 90,000 games.
But these are games the computers played against each other. How does
that correlate to playing strength against human beings? Statistician
Jeff Sonas uses a number of recent tournaments to evaluate the true
strength of the programs.

Dortmund statistics: who will win
04.08.2003
Five days ago we asked our visitors to predict who was going to win the
Super GM in Dortmund. We agreed to accept predictions based on any
of the great divination methods. Most used common sense, one, Jeff
Sonas, ran a multi-million-game simulation to reach his conclusions.
Here are the results.

The Sonas Rating Formula – Better than Elo?
22.10.2002
Every three months, FIDE publishes a list of chess ratings calculated by
a formula that Professor Arpad Elo developed decades ago. This formula
has served the chess world quite well for a long time. However, statistician
Jeff Sonas believes that the time has come to make some significant
changes to that formula. He presents his proposal in this milestone
article.

The best of all possible world championships
14.04.2002
FIDE have recently concluded a world championship cycle, the Einstein
Group is running their own world championship, and Yasser Seirawan has
proposed a "fresh start". Now statistician Jeff Sonas has analysed the
relative merits of these three (and 13,000 other possible) systems to find
out which are the most practical, effective, inclusive and unbiased. There
are some surprises in store (the FIDE system is no. 12,671 on the list,
Seirawan's proposal is no. 345).