San Luis World Championship – who will win?

A statistician's view of the FIDE World Championship

By Jeff Sonas

The FIDE world championship tournament, which takes place from September 27th through October 16th in San Luis (Argentina), should prove to be a fascinating event. It is the first time in more than half a century that the FIDE (men's) world champion will be determined by a round robin tournament, rather than a two-player match or a huge knockout tournament. I have recently finished a detailed random simulation of the possible tournament outcomes, and it has produced some very interesting numbers.

By now you've probably read Nigel Short's article in which he questions Garry Kasparov's claim of a 95% chance that the tournament would either be won by Viswanathan Anand, Veselin Topalov, or Peter Leko. Nigel offered to take the remaining five players (Peter Svidler, Alexander Morozevich, Michael Adams, Judit Polgar, and Rustam Kasimdzhanov) if Kasparov would give him 17-to-1 odds for a $100 wager on the tournament's winner.

There is no doubt that a win by either Anand, Topalov, or Leko is likely, and of course there are many subjective considerations which are difficult to analyze numerically. However, the statistics do paint a more uncertain picture than that suggested by Kasparov. According to my calculations, the remaining five players have a combined 41% chance (not a mere 5% chance) to win the tournament, making those 17-to-1 odds seem pretty attractive! In addition, because Leko's strong tendency toward draws makes a large plus score unlikely for him, and it could easily require a +4 score to win the tournament, Peter Svidler is actually given a slightly greater chance to win the tournament than is Leko (12% vs. 11%), and Judit Polgar is right there in the same group with an 11% chance to win.
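To see why those odds look attractive, the wager can be treated as a simple expected-value calculation. The sketch below is only an illustration: it assumes the bet pays 17 times the stake if one of the five outsiders wins and forfeits the stake otherwise, and the function names are made up for this example.

```python
def break_even_probability(odds_against: float) -> float:
    """Win probability at which a bet paid at odds_against-to-1 has zero expected profit."""
    return 1.0 / (odds_against + 1.0)


def expected_profit(stake: float, odds_against: float, win_probability: float) -> float:
    """Average profit per bet: collect odds_against * stake on a win, lose the stake otherwise."""
    win_amount = stake * odds_against
    return win_probability * win_amount - (1.0 - win_probability) * stake


if __name__ == "__main__":
    # Assumed terms: a $100 stake at 17-to-1, as described in the text.
    print(f"break-even chance: {break_even_probability(17):.1%}")
    print(f"expected profit at a  5% chance: {expected_profit(100, 17, 0.05):+.0f}")
    print(f"expected profit at a 41% chance: {expected_profit(100, 17, 0.41):+.0f}")
```

Under these assumptions the bet breaks even at a chance of 1 in 18, so it loses a little if Kasparov's 5% is right and is heavily profitable if the simulated 41% is closer to the truth.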

Viswanathan Anand seems to be the clear favorite, with a 31% chance to win. Veselin Topalov has the next-best prospects, with about a 17% chance. Each of the other players has somewhere between a 7% and a 12% chance to win, except for current FIDE champion Rustam Kasimdzhanov, who is easily the lowest-rated participant and is only given one chance in thirty of winning the tournament. We will talk some more about the individual players further down, but first I want to discuss the format of the tournament itself, because it is a far cry from the FIDE championship knockout tournaments of recent years. We could debate endlessly about whether a tournament or a match is preferable, but instead I would prefer to emphasize that out of the possible tournament formats, this one is excellent.

It might be tempting to glance over the tournament rules and immediately start criticizing the fact that, just as in the past, a shared first place could ultimately be decided by rapid games, blitz games, or even the "Armageddon" sudden-death game. Of course, nobody wants a rapid game to decide the championship, but that is indeed a very real possibility in a knockout tournament, where there just aren't that many tiebreaking options other than proceeding to the faster-time-control games. You can't use head-to-head results, or number of wins, because they are always the same for both players. And "first win" (or "last win") is considered to be too unfair to the player who has Black first (or last). The FIDE championship was in fact determined by rapid games twice during the knockout era, in the first event when Anatoly Karpov defeated Anand in 1998, and then the last one when Kasimdzhanov defeated Michael Adams in 2004.

However, this concern is far less relevant in San Luis than it was for the knockout tournaments. In a round-robin event, particularly a long one, it is possible to set up the tiebreaking rules so that it’s almost certain that first place will be resolved via the classical games themselves. Over the course of the long tournament, there will be many differences in the quantity of wins for each player, and various head-to-head results can be used as well. Admittedly, some of the tiebreak criteria may seem somewhat arbitrary (e.g., why not “fewest losses” rather than “most wins”?) However, at least this way the players know beforehand what they are getting themselves into, and they can plan accordingly as the final rounds approach and the tiebreak situation crystalizes. Of course, you can always play the “What If?” game and revert to criticizing the ultimate tiebreaker (games at the faster time control), but for this particular event, and these particular rules, it is very unlikely that the championship will need to be resolved by the rapid/blitz games. More specifically, the odds are 38-to-1 against a need for rapid games to resolve the championship.

Let’s go through the various possibilities. First of all, remember that this is a very long event, fourteen rounds. That means it’s pretty likely that the players will sort themselves out enough; in fact, according to my calculations there’s almost an 80% chance that there will be a clear winner after fourteen rounds, meaning all of this concern about tiebreaks would be quite irrelevant. In other words, if we played this tournament 40 different times, we would see a clear first place winner 32 times, and we would see a shared first place only 8 times. What happens in those remaining 8 cases?

According to the rules, the first tiebreak criterion is the head-to-head results among the tied players. For instance, if there were a three-way tie, we would look at the head-to-head results among just those three players. If there is still a tie, then it falls through to the next criterion, which is to count up the total number of games won during the tournament by each player (against all opponents, even those who didn’t share first place). Most of the time, these two criteria will suffice to determine a single winner. Only one time out of forty would it actually move on to rapid games. And even if it does get to rapid games, there’s about a 98% chance that the rapid tiebreaker would only involve two players, rather than the weird multi-player mini-round-robin that the rules provide for.
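The order of the tiebreak criteria described above can be sketched in a few lines of code. This is a rough illustration rather than the official regulations: it assumes each criterion is applied once, in order, to narrow the group of tied leaders, and the function names and the input format (a list of `(white, black, white_score)` tuples) are invented for the example.

```python
from collections import defaultdict


def leaders(scores, candidates):
    """Candidates holding the highest value in scores (missing entries count as 0)."""
    best = max(scores.get(p, 0) for p in candidates)
    return [p for p in candidates if scores.get(p, 0) == best]


def resolve_first_place(games):
    """games: iterable of (white, black, white_score), white_score in {1, 0.5, 0}.

    Returns (winner, tied). winner is None when the classical criteria leave
    several players level, i.e. a rapid playoff among `tied` would be needed.
    """
    games = list(games)
    points, wins = defaultdict(float), defaultdict(int)
    for white, black, white_score in games:
        points[white] += white_score
        points[black] += 1 - white_score
        if white_score == 1:
            wins[white] += 1
        elif white_score == 0:
            wins[black] += 1

    tied = leaders(points, list(points))
    if len(tied) > 1:
        # Criterion 1: points scored in games between the tied players only.
        head_to_head = defaultdict(float)
        for white, black, white_score in games:
            if white in tied and black in tied:
                head_to_head[white] += white_score
                head_to_head[black] += 1 - white_score
        tied = leaders(head_to_head, tied)
    if len(tied) > 1:
        # Criterion 2: total number of wins against the whole field.
        tied = leaders(wins, tied)
    return (tied[0] if len(tied) == 1 else None), tied
```

Feeding it the 56 games of a finished double round robin returns either a single champion or the short list of players who would have to go to the rapid games.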

While we’re on the topic of rapid games, I do want to point out one other thing. I know that Rustam Kasimdzhanov’s victory in Tripoli last year was a huge surprise, but it actually could have been somewhat anticipated statistically, if rapid ratings had been used in the calculations. I didn’t use any, because there wasn’t an official FIDE rapid list handy. But in retrospect I do want to call attention to Stefan Fischl, who maintains a web site that includes an unofficial “rapid rating list” going back a few years. According to Stefan’s list, as of the start of the 2004 Tripoli tournament, Kasimdzhanov was ranked #2 at rapid chess among all 124 of the tournament participants, trailing only Veselin Topalov. Thus it is perhaps not that surprising that Kasimdzhanov was able to eliminate Alejandro Ramirez, Vassily Ivanchuk, Topalov, and finally Adams during the rapid games. And by now Kasimdzhanov’s unofficial rapid rating is third in the world among active players (behind only Viswanathan Anand and Garry Kasparov).

So if it does get to a rapid tiebreak, you might be interested to know that Anand’s unofficial rapid rating is more than sixty points higher than anyone else at San Luis, but the #2 and #3 spots (among San Luis participants) are held by underdogs Kasimdzhanov and Alexander Morozevich, with Judit Polgar far down on the rapid rating list, more than 200 points below Anand. I used those rapid ratings in my simulation model, but it really isn’t too significant since a rapid tiebreak is quite unlikely.

Enough about tiebreakers; let’s get back to rounds 1-14, the classical part of the tournament, which (as I’ve said) has a 97% chance of being sufficient to determine the next FIDE champion. Who is favored to win, and why?

My simulation model took several factors into consideration. The most important factor, of course, is the estimated strength for each player: their rating. Rather than just using the FIDE ratings, I have chosen to use my more accurate Chessmetrics rating formula to compute the strength of each player as of September 1st. I have also considered other factors such as White vs. Black strength for each player, along with their draw frequencies with each color. I have also looked for significant head-to-head results from the past, and finally (after a lot of agonizing) I decided to include a bonus/penalty for players who have done particularly well/poorly against 2700+ opposition. Taking all of these factors into consideration, and randomly simulating the entire tournament a million different times, this is how it turned out:

There are definitely some important differences between these numbers and what you would have expected from the FIDE rating list. First and foremost, Anand and Topalov are tied on the latest FIDE list, with Peter Leko being 25 rating points behind, and then there’s another 25-30 point gap before we get to Peter Svidler and Judit Polgar. Why, then, do I have Anand so far ahead of Topalov, and how did Svidler and Polgar catch up to Leko?

Well, it’s kind of hard to explain but I’ll give it a shot. You might remember an article I wrote a couple of years ago where I suggested replacing the current Elo formula with a simpler “linear” formula. My analysis showed that the Elo formula creates an unintentional bias against the players who tend to outrate their opponents by 100-200 points. For instance, if you have a 150-point rating advantage over your opponent, empirical data says you should score about 67%, whereas the Elo formula expects you to score 70%. So if you played 100 games against opposition rated 150 points below you, and let’s say you really did score 67/100 (exactly matching the empirical predictions), then the Elo formula would claim that you should have scored 70/100, and thus you had scored a full 3 points below expectations, and you would (undeservedly) lose 30 rating points.
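The arithmetic in that example can be reproduced with a short calculation. The sketch below assumes the common logistic approximation of the Elo expectancy curve and a K-factor of 10 (which is what turns a 3-point shortfall into 30 rating points); both are assumptions made for illustration, and the function names are hypothetical.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score per game for a player rated rating_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_change(actual_points: float, games: int, rating_diff: float, k: float = 10.0) -> float:
    """Rating gained (or lost, if negative) after `games` games at a constant rating edge."""
    expected_points = games * elo_expected_score(rating_diff)
    return k * (actual_points - expected_points)


if __name__ == "__main__":
    print(f"Elo expectation at +150: {elo_expected_score(150):.1%}")
    # Scoring the empirical 67/100 against opposition rated 150 points lower.
    print(f"rating change: {rating_change(67, 100, 150):+.1f}")
```

With the unrounded expectation of a little over 70%, the computed loss comes out slightly larger than the round 30 points quoted above, but the direction and size of the bias are the same.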

In this tournament field, the players other than Kasimdzhanov who typically face the lowest-rated opponents (due to the events they tend to play in) are Peter Svidler and Alexander Morozevich. These two players have had an average rating advantage of about 100 points in their games from recent years, and so their FIDE ratings are quite a bit lower than they really deserve, due to the aforementioned bias. At the other end of the spectrum, Peter Leko and Veselin Topalov face such strong opposition that on average they only outrate their opponents by 25-30 points. Leko and Topalov have not had this Elo bias working against them, so their FIDE ratings are a little higher than they deserve, relative to the others. And so what looked like a 25-point gap in strength between Leko and Svidler on the FIDE rating list is revealed as just a function of the kinds of events they play in, and in fact Leko and Svidler are probably about the same strength, despite what the rating list would tell you. And whereas Anand faces about the same caliber of opponent that Topalov does, Anand’s rating is so high that he too outrates his typical opponent by about 100 points. Since the Elo formula places an unreasonable expectation upon Anand, his FIDE rating is also lower than he deserves.

That’s the simplistic explanation. There’s a lot more going on, because the rating formulas really are very different. For instance, the FIDE ratings don’t care that Judit Polgar took an entire year off, whereas my ratings (which account for inactivity) are sensitive to the time lapse. Topalov is playing much more frequently than he did a few years ago, and that is also affecting his rating. If anything, it’s amazing that the FIDE list and the Chessmetrics list match up so closely! However, you probably don’t care too much about the intricacies of rating calculations, so let’s just leave it at this: I have optimized my formula to provide maximal predictive power, and it says that Svidler is indeed as strong as Leko, and that Anand is somewhat stronger than Topalov, but who really knows the truth?

As long as we’re taking this nostalgic trip back through my past journalistic efforts, let me remind you of another article I wrote a few years back, trying to see whether past head-to-head results really have any bearing on future results. I know that the players think they do, that there are certain opponents they love to face and others they hate to face. In that article, I concluded that there really was no such effect, that it didn’t matter whether you’d over-performed or under-performed in the past against someone.

However, because the head-to-head results are so relevant in this tournament (they are the first tiebreak criterion), I figured I should revisit that investigation a little bit. So, I re-ran the analysis using the newer Chessmetrics ratings, and I checked to see whether accounting for past head-to-head results would in fact have improved the predictions of future matchups. It turns out that once you pass a certain level of significance, it does improve the future predictions if you correct for matchups where one player seems to have a particular “knack” for beating another.

The most famous historical example is (of course) Vladimir Kramnik’s career mastery of Garry Kasparov. Over the course of his career, Kramnik took about 79 rating points from Kasparov (meaning that Kramnik scored 7.9 full points more than expected, across all their games). This is by far the largest amount in chess history. If they ever played each other again, this methodology would award Kramnik a special 26-rating-point bonus when facing Kasparov. Tied for second, with 60 rating points each, are Viktor Korchnoi’s domination of Lev Polugaevsky and Kramnik’s domination of Judit Polgar (which might become particularly relevant if Polgar does manage to win this tournament; Kramnik would receive a special 20-rating-point bonus against Polgar!) Down at #33 on the historical list is Leko’s historical overperformance against Topalov, and way down at #102 on the list is Anand’s overperformance against Polgar. Those two San Luis matchups are the only ones that qualify as being “significant”. Thus in my model, I give Leko an extra 14-point rating advantage when he faces Topalov, and I give Anand an extra 11-point rating advantage against Polgar. But all in all, I think this factor is not particularly relevant.

I also decided to see whether certain players do particularly well (or particularly poorly) when facing elite opposition. I set the boundary at a 2700-level, and for each player I examined their historical results against players rated 2700+, and against lower-rated players, to see whether there was any unusual difference in results. Out of the eight participants, the three who have done unusually well against 2700+ opponents were Peter Svidler, Peter Leko, and Rustam Kasimdzhanov, and they all got a 7-rating-point bonus in my calculations (since this is such an elite event). On the other hand, Alexander Morozevich has historically done much better against lower-rated players, and not so well against 2700-level opposition, so he got an 11-rating-point penalty in my calculations. Veselin Topalov and Judit Polgar also received smaller penalties of this kind. Here is a summary of the various modifiers that I used:

Player	FIDE rating	Chessmetrics rating	2700+ modifier	Final rating	Draw rate	Chance to win
Anand	2788	2794	–	2794	50%	31%
Topalov	2788	2767	-7	2760	43%	17%
Svidler	2738	2742	+7	2749	49%	12%
Leko	2763	2744	+7	2751	57%	11%
Polgar	2735	2741	-6	2735	41%	11%
Morozevich	2707	2727	-11	2716	37%	8%
Adams	2719	2723	+4	2727	53%	7%
Kasimdzhanov	2670	2671	+7	2678	38%	3%

In this list, I want to call your attention to the column about draw percentage. Overall, I expect a draw percentage around 46%. Although it is an elite event, and thus traditionally full of draws, the inclusion of Topalov, Polgar, Morozevich, and Kasimdzhanov should ensure plenty of decisive games, and this could have a very interesting result. Remember that a player’s total number of wins is one of the tiebreakers, and so players who win a lot and lose a few will have a tangible advantage over other players who win a few but never lose. And there is another mathematical factor involved, which is that the riskier players stand a better chance of putting a string of wins together and building up a really high score to actually win the tournament.

To illustrate this, let’s compare Judit Polgar and Peter Leko. You can see in the above list that after all the factors are considered, Leko is given a “final San Luis rating” above 2750 whereas Polgar’s is down at 2735. Nevertheless, they are given an identical 11% chance to win the tournament. Why is this? Again, kind of hard to explain but I’ll try. You’ve made it this far so you must be at least marginally interested. Players with a lot of decisive results have a “wider” bell curve of total scores, meaning that they have a pretty significant chance of a major plus score, along with a pretty significant chance of a large minus score. On the other hand, drawish players don’t have such a wide variation in possible scores:

You can see that Leko’s curve has a higher peak at an even score or a +1 score, so he is more likely than Polgar to end up with those scores. She is clearly more likely than Leko to put up a big minus score like -4 or -5, but despite her lower rating, once we get out in the +4 or +5 range, the unpredictable Polgar actually has a better chance than the drawish Leko to finish with such a high score. And since it is probably going to take at least a +3 score to win the tournament, Polgar’s chances of winning the tournament are quite comparable to Leko’s. Her average score is lower, but that doesn’t matter as much as her chance of a high final score. The same explanation applies to why Svidler actually has a slightly better chance to win the tournament than the higher-rated Leko, and to why Morozevich has a slightly better chance to win the tournament than the higher-rated Adams.
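The "wider bell curve" effect can be checked with a small exact calculation. The sketch below compares two hypothetical players over 14 games: a drawish one (57% draws, expected score 52% per game) and a more decisive one (41% draws, expected score 50%). The draw rates are taken from the table, but the expected scores, the assumption that every game has the same probabilities, and the function names are illustrative guesses, not the inputs of the actual model.

```python
def net_score_distribution(p_win: float, p_draw: float, games: int = 14) -> dict:
    """Exact distribution of the plus score (wins minus losses) over `games` games."""
    p_loss = 1.0 - p_win - p_draw
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for net, prob in dist.items():
            for delta, p in ((1, p_win), (0, p_draw), (-1, p_loss)):
                nxt[net + delta] = nxt.get(net + delta, 0.0) + prob * p
        dist = nxt
    return dist


def chance_of_at_least(dist: dict, plus_score: int) -> float:
    """Probability of finishing on plus_score or better."""
    return sum(p for net, p in dist.items() if net >= plus_score)


def win_probability(expected_score: float, p_draw: float) -> float:
    """Per-game win probability implied by an expected score and a draw rate."""
    return expected_score - p_draw / 2.0


if __name__ == "__main__":
    # Hypothetical inputs: only the draw rates come from the table.
    drawish = net_score_distribution(win_probability(0.52, 0.57), 0.57)
    decisive = net_score_distribution(win_probability(0.50, 0.41), 0.41)
    for target in range(3, 8):
        print(f"+{target} or better: drawish {chance_of_at_least(drawish, target):.3f}, "
              f"decisive {chance_of_at_least(decisive, target):.3f}")
```

The drawish player is more likely to land near an even score, while the decisive player has more weight in both tails; exactly where the curves cross depends on the assumed inputs.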

Finally, I want to provide some justification for what I said in the last paragraph about it requiring a +3 or +4 score to win the tournament. Remember that I simulated this tournament a million different times. Sometimes a +2 score was good enough for clear first place, and sometimes a +9 score was not even enough for a share of first place. There was even one simulated tournament where all eight players tied with a score of 7/14, and the classical tiebreak criteria were sufficient to bring it down to two players in the rapid tiebreak, where Svidler defeated Kasimdzhanov to win the title! However, by taking an aggregate average you can get a good sense of the overall trends in what it will probably take to win.
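For readers who want to experiment, here is a heavily simplified sketch of this kind of simulation. It is not the model used for this article: it takes only the final ratings and draw rates from the table, assumes a logistic Elo curve, assumes a fixed 35-point edge for White (`WHITE_EDGE`), averages the two players' draw rates for each game, and splits shared first places evenly instead of applying the tiebreak rules. All names in it are made up for illustration.

```python
import random
from itertools import permutations

# (final rating, draw rate) copied from the table in this article.
PLAYERS = {
    "Anand": (2794, 0.50), "Topalov": (2760, 0.43), "Svidler": (2749, 0.49),
    "Leko": (2751, 0.57), "Polgar": (2735, 0.41), "Morozevich": (2716, 0.37),
    "Adams": (2727, 0.53), "Kasimdzhanov": (2678, 0.38),
}
WHITE_EDGE = 35  # assumed rating-point value of having White (not from the article)


def game_probabilities(white: str, black: str) -> tuple:
    """Return (P(White wins), P(draw)) under the simplified model described above."""
    (r_white, d_white), (r_black, d_black) = PLAYERS[white], PLAYERS[black]
    expected = 1 / (1 + 10 ** (-(r_white + WHITE_EDGE - r_black) / 400))
    p_draw = (d_white + d_black) / 2
    p_win = min(max(expected - p_draw / 2, 0.0), 1.0 - p_draw)
    return p_win, p_draw


def simulate_once(rng: random.Random) -> dict:
    """Play one double round robin (every pairing once with each color)."""
    points = dict.fromkeys(PLAYERS, 0.0)
    for white, black in permutations(PLAYERS, 2):
        p_win, p_draw = game_probabilities(white, black)
        roll = rng.random()
        if roll < p_win:
            points[white] += 1
        elif roll < p_win + p_draw:
            points[white] += 0.5
            points[black] += 0.5
        else:
            points[black] += 1
    return points


def first_place_shares(trials: int = 20_000, seed: int = 1) -> dict:
    """Fraction of simulated tournaments won by each player, ties split evenly."""
    rng = random.Random(seed)
    shares = dict.fromkeys(PLAYERS, 0.0)
    for _ in range(trials):
        points = simulate_once(rng)
        best = max(points.values())
        tied = [p for p, s in points.items() if s == best]
        for p in tied:
            shares[p] += 1 / len(tied)
    return {p: s / trials for p, s in shares.items()}


if __name__ == "__main__":
    for player, share in sorted(first_place_shares().items(), key=lambda kv: -kv[1]):
        print(f"{player:<13}{share:.1%}")
```

Because it leaves out color-specific strength, the head-to-head and 2700+ adjustments, the tiebreak rules, and the rapid ratings, its percentages should not be expected to match the ones quoted here; it only shows the overall shape of the approach.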

I can tell you that more than 70% of the time, the tournament will be won by somebody scoring either +3, +4, or +5, with the most likely winning score being +4. A score of +2 is probably not going to be good enough to win the tournament; in fact, the odds are about 17-to-1 against your becoming champion if you finish with a +2 score. Managing a +3 score is obviously more promising, but the odds are still 2-to-1 against you. On the other hand, a +4 score probably gives you a 51% chance to win clear first, and an additional 11% chance to share first and still win the title, meaning that overall you have a 62% chance to become champion if you score +4. Here is a graphic illustrating these numbers for all scores between +1 and +10.

Part of the fun of this is to watch as the numbers change over the course of a tournament, and to try to figure out why. Oftentimes there is something very significant going on, that I never would have noticed without digging a little bit more. My plan is to provide statistical updates on each rest day of the tournament, and to try to explain why the numbers have changed. I hope that you have enjoyed this article. I know that the statistical perspective is not the only perspective, and is not even the most important one. But perhaps it will provide a useful counterpoint to some of the more subjective approaches. In any event, I'll see you again on the first rest day. In the meantime, feel free to visit my Chessmetrics site or send me email about any of this.

Previous articles by Jeff Sonas

The Greatest Chess Player of All Time – Part IV
25.05.2005 So tell us already, who was the greatest chess performance of all time? After analysing and dissecting many different aspects of this question, Jeff Sonas wraps it up in the final installment of this series, awarding his all-time chess "Oscar" nomination to the overall greatest of all time.

The Greatest Chess Player of All Time – Part III
06.05.2005 What was the greatest chess performance of all time? Jeff Sonas has analysed the duration different players have stayed at the top of the ratings list, at the highest individual rating spikes and best tournament performances. Today he looks at the most impressive over-all tournament performances in history, and comes up with some very impressive statistics.

The Greatest Chess Player of All Time – Part I
24.04.2005 Last month Garry Kasparov retired from professional chess. Was he the greatest, most dominant chess player of all time? That is a question that can be interpreted in many different ways, and most answers will be extremely subjective. Jeff Sonas has conducted extensive historical research and applied ruthless statistics to seek a solution to an age-old debate.

FIDE Championship odds after round three
26.06.2004 The favorites are still the favorites, says Jeff Sonas. He has calculated the odds of each player in the sweet sixteen winning the event. Nakamura's 122 to 1 doesn't look so good, unless you placed your bet back when he was at 785 to 1! Plus, what is the best qualifying tournament system for a world championship?

Revised statistics for FIDE championship
22.06.2004 A week ago we published full statistics for participants of the world championship in Libya. After the first round the odds have changed significantly, due to a number of factors – especially the non-appearance of #2 seed Morozevich and the release of FIDE's July rating list. Jeff Sonas has recalculated the odds.

Who will be the next FIDE world champion?
14.06.2004 We cannot be 100% sure, but at least the statistical odds are 13% in favour of Morozevich or Topalov to win. Adams has a 9% chance, Grischuk 7%. And there is a 1 in 100 million possibility that the next FIDE world champion will be Tarik Abulhul of Libya. Read all about it in Jeff Sonas' World Championship Statistics.

Putting his money where his stats are
11.11.2003 We hope you've been keeping up with statistician Jeff Sonas' fascinating series of articles on man-machine chess. After looking at computer ratings, human playing styles, openings, and tossing it all into the analysis blender, he is ready to take the logical last step in this fifth and final article: predicting the result of the Kasparov-X3D Fritz match that starts today. The envelope please...

How (not) to play chess against computers
11.11.2003 Is there any way that human chess players can withstand the onslaught of increasingly powerful computer opponents? Only by modifying their own playing style, suggests statistician Jeff Sonas, who illustrates a fascinating link between chess aggression and failure against computers. There may still be a chance for humanity.

Physical Strength and Chess Expertise
07.11.2003 How can humans hope to hold their ground in their uphill struggle against chess computers? Play shorter matches, stop sacrificing material, and don't fear the Sicilian Defense, says statistician Jeff Sonas, who also questions the high computer ratings on the Swedish SSDF list. Here is his evidence.

Are chess computers improving faster than grandmasters?
17.10.2003 The battle between humans and machines over the chessboard appears to be dead-even – in spite of giant leaps in computer technology. "Don't forget that human players are improving too," says statistician Jeff Sonas, who doesn't think it is inevitable that computers will surpass humans. Here is his statistical evidence.

ChessBase in Slashdot
09.10.2003 Once again Slashdot, the gigantic discussion forum for technophiles, is running a debate on one of our stories. If you have strong opinions about Jeff Sonas' article on Man vs Machine go to the Slashdot forum to debate the subject and vent your feelings.

Man vs Machine – who is winning?
08.10.2003 Every year computers are becoming stronger at chess, holding their own against the very strongest players. So very soon they will overtake their human counterparts. Right? Not necessarily, says statistician Jeff Sonas, who doesn't believe that computers will inevitably surpass the top humans. In a series of articles Jeff presents empirical evidence to support his claim.

Does Kasparov play 2800 Elo against a computer?
26.08.2003 On August 24 the well-known statistician Jeff Sonas presented an article entitled "How strong are the top chess programs?" In it he looked at the performance of top programs against humans, and attempted to estimate an Elo rating on the basis of these games. One of the programs, Brutus, is the work of another statistician, Dr Chrilly Donninger, who replies to Jeff Sonas.

Computers vs computers and humans
24.08.2003 The SSDF list ranks chess playing programs on the basis of 90,000 games. But these are games the computers played against each other. How does that correlate to playing strength against human beings? Statistician Jeff Sonas uses a number of recent tournaments to evaluate the true strength of the programs.

Dortmund statistics: who will win
04.08.2003 Five days ago we asked our visitors to predict who was going to win the Super GM in Dortmund. We agreed to accept predictions based on any of the great divination methods. Most used common sense, one, Jeff Sonas, ran a multi-million-game simulation to reach his conclusions. Here are the results.

The Sonas Rating Formula – Better than Elo?
22.10.2002 Every three months, FIDE publishes a list of chess ratings calculated by a formula that Professor Arpad Elo developed decades ago. This formula has served the chess world quite well for a long time. However, statistician Jeff Sonas believes that the time has come to make some significant changes to that formula. He presents his proposal in this milestone article.

The best of all possible world championships
14.04.2002 FIDE have recently concluded a world championship cycle, the Einstein Group is running their own world championship, and Yasser Seirawan has proposed a "fresh start". Now statistician Jeff Sonas has analysed the relative merits of these three (and 13,000 other possible) systems to find out which are the most practical, effective, inclusive and unbiased. There are some surprises in store (the FIDE system is no. 12,671 on the list, Seirawan's proposal is no. 345).
