Championship Chessmetrics Analysis
By Jeff Sonas.
We are cursed to live in "interesting times" in the chess world. We have two different organizations sponsoring their own versions of the World Championship, and the top-rated player in the world wants no part of either championship. Countless proposals and unification plans have been suggested, and rejected, and there is no end in sight.
I am a relatively weak chess player, but a very strong computer programmer and statistician. Most of all, I am a big fan of chess, and I want to help. I have little to contribute in the arenas of business plans, organizational details, or negotiations, but nevertheless I do have something quite useful to offer. I have developed some very sophisticated statistical tools that enable me to objectively explore various chess topics, and in recent weeks I have devoted considerable time to analyzing thousands of different world championship formats. I would like to share the results of that analysis.
I am not affiliated with any chess organization, and I have no particular agenda to promote. What I do have, instead, is the distinct impression that people are making important decisions about the world championship, without adequate information. One possible explanation is that the decision-makers are simply unaware that it is often possible to use statistics to draw reasonably sound conclusions about some of these topics. Or maybe they don't even care about objective truth, and simply want to promote their own agendas or improve their own situations. I'm going to adopt the role of the optimist here, and assume that many people would love to have an objective analysis of the various options available for the world chess championship, but that nobody ever thought to ask for one. Well, here's your analysis...
Based on my calculations, I can now tell you whether one world championship format is "objectively better" than another one, and I can explain why. If you describe a typical world championship format to me, I can tell you, with reasonably good accuracy, the average percentage chance of the strongest player in the world winning the championship cycle. I call that percentage the "effectiveness" of a world championship format.
For instance, it turns out that the 128-player FIDE World Championship has an "effectiveness" of 38%, which means that 38% of the time, it will be won by the strongest player in the world (assuming no boycotts). In other words, five times out of eight the strongest player will fail to win the tournament. The Einstein Group's world championship cycle (which will debut in July in Dortmund, Germany) has a much better effectiveness of 50%, which still means that the best player will be champion only half of the time. By comparison, a slightly modified version of Yasser Seirawan's "Fresh Start" proposal is extremely effective, at 67%. In fact, none of the 13,000 formats under consideration managed to break the 70% barrier, so Yasser's proposal is almost maximally effective.
Through statistical analysis combined with random simulation, I have analyzed 13,000 different world championship formats in great detail, including Swiss tournaments, knockout tournaments, long matches, short matches, round-robin tournaments of various types, qualifier tournaments, and much more. I have tried to include all of the formats which have been used historically or which are currently under consideration, as well as many experimental formats. Out of those 13,000 formats, the FIDE World Championship format is ranked #12,671, which means that it is in the bottom 5% in effectiveness. Although the Einstein Group format is clearly better, a 50% effectiveness is still not very good: it ranks #10,945 on my list. The modified Seirawan proposal, by comparison, is way up at #181.
After that introduction, you might be chomping at the bit to learn what format is #1 on my list. However, I'm not going to tell you just yet, because "effectiveness" is not the only important factor. Without giving those other factors their due consideration, it doesn't make sense to talk yet about what is "best" or even "objectively best".
THE FOUR IDEAL CHARACTERISTICS OF A WORLD CHAMPIONSHIP
In evaluating various world championship formats, I believe there are four important characteristics to consider. I want to introduce a little bit of terminology here, in an attempt to make all this easier to talk about. An ideal world championship format would be "practical", "effective", "inclusive", and "unbiased". Let me briefly cover what I mean with each of those four words.
(1) "Practical" – The top players must be willing to participate, the sponsors must be willing to sponsor the tournaments and/or matches, and the playing sites must be available. Thus, World Championship formats that include relatively short events, or just one event, would be more "practical" than multi-stage formats or formats with very long matches or tournaments. And, of course, World Championship formats with greater prize money will also be more attractive to the players, although there are other important considerations for most players.
(2) "Effective" – The overall purpose of the World Championship is to allow the strongest player (whoever that may be) to demonstrate their superiority by winning the championship. For instance, World Championship formats with inadequate length or inefficient structure will frequently be won by weaker players, whereas more effective formats would provide that strongest player sufficient maneuvering space (even if they lose a game or two) to demonstrate their superiority by winning the championship.
(3) "Inclusive" – It is easy to tell which players have been the most successful in the recent past; just consult the rating list. However, ratings are known to be somewhat inaccurate as measures of players' actual strength, and it is quite conceivable that the strongest player is not actually the highest-rated player. Thus it is typically a good idea to include several players in the World Championship cycle, to give more people an option to demonstrate their ability. However, the tricky part is that many super-inclusive formats, such as the FIDE championships, are extremely ineffective at determining the strongest player. Nevertheless, it is still possible (though challenging) to be both "inclusive" and "effective" simultaneously.
(4) "Unbiased" – Traditionally, specific players in the world championship cycle have been given certain advantages, due to their past accomplishments. For instance, the defending champion might be seeded directly into the final match, or a recent semifinalist might automatically qualify as a Candidate without needing to play in an Interzonal. Other advantages have included "draw odds", and the champion's right to an automatic rematch, and first-round byes for high-rated players (as in the earlier 100-player FIDE knockout tournaments). These "biases" are often perceived as being unfair to everyone else, and should be avoided when possible. However, a "bias" is not inherently bad; it is simply an advantage granted to a particular player. It can be one way to make an event more "effective" without having to make it impractically long.
In all fairness to the FIDE and Einstein Group approaches, they do have their important advantages. The FIDE approach is extremely inclusive and unbiased, and reasonably practical (as long as there is sufficient funding for such an event). The Einstein Group's format is not particularly inclusive, though it has the large practical advantage that it bears some resemblance to the traditional way the championship has been run, and thus its winner might indeed be more accepted by the public, as a legitimate champion, than the FIDE champion often has been.
THE FIDE CHAMPIONSHIPS
The FIDE championship format takes a mere 22 days of play to reduce 128 competitors down to one champion. It is very inclusive, and has no biases in favor of any specific participant. For comparison, I identified 72 other formats that are 22 days or shorter, and also have no biases. Out of these possibilities, the FIDE format (38% effectiveness) is right in the middle, ranked 37th out of 73. Most options are in the 30%-40% range, and only one format managed to finish above 50%. If FIDE were to invite just the eight top-rated players to its knockout tournament, with two rounds of 6-game matches and then a 10-game final (22 playing days), it would have a 52% chance to be won by the strongest player in the world, slightly better than the Einstein Group approach. Another good unbiased and practical option would be to have four simultaneous single-round-robin tournaments with 10 players each (9 playing days), with the four winners advancing to two rounds of knockout matches (4-game semifinal matches and then an 8-game final match). That approach would be significantly more inclusive and only slightly less effective (46% effectiveness) than the eight-player knockout.
When there are no biases introduced (i.e., nobody gets automatically seeded into any later stage, and everyone is treated equally), a knockout event seems to be far better than a Swiss. For instance, the options to take the top two or four finishers from a 13-round Swiss tournament and then play short matches between those top finishers, turn out to be very ineffective, often lower than 20%. However, as you will see in a little while, a format based on a Swiss qualifier can actually be considerably more effective than a comparable format with a knockout qualifier. This discovery greatly surprised me, and I will go into more detail further down, when I discuss the Fresh Start proposal. However, first let's finish talking about the FIDE and Einstein Group approaches.
The major criticism of the FIDE championship, of course, is that the individual matches are too short. A single loss can mean almost certain elimination. Everyone loses a game now and then, so it seems an overly drastic punishment to be eliminated because you happened to have a minus score over the span of two games. The 2002 tournament made a half-hearted attempt to address this by lengthening the final match from 6 games to 8 games. As I mentioned before the tournament, that is hardly much of an improvement (it raised the effectiveness by 0.2%). It would have been better (39% effectiveness) to use those extra two days to make the quarterfinal round 4 games long, instead, although of course even better would be a 4-game quarterfinal AND an 8-game final (41% effectiveness).
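To make the point about short matches concrete, here is a minimal sketch that computes exactly how often the stronger player comes out ahead in a match of a given length. The per-game probabilities (30% win, 50% draw, 20% loss for the stronger player) and the even split of tied matches are illustrative assumptions, not figures from my analysis, and the function name is hypothetical.

```python
def match_win_probability(games, p_win=0.30, p_draw=0.50, tie_share=0.5):
    """Chance that the stronger player takes a match of `games` games.

    p_win and p_draw are ASSUMED per-game probabilities for the stronger
    player (illustrative values only). A tied match is credited as
    tie_share, standing in for a coin-flip tiebreak.
    """
    p_loss = 1.0 - p_win - p_draw
    # dist maps "wins minus losses" to its probability so far.
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for margin, prob in dist.items():
            for step, p in ((1, p_win), (0, p_draw), (-1, p_loss)):
                nxt[margin + step] = nxt.get(margin + step, 0.0) + prob * p
        dist = nxt
    ahead = sum(p for m, p in dist.items() if m > 0)
    tied = dist.get(0, 0.0)
    return ahead + tie_share * tied


# Compare the match lengths mentioned in the text.
for n in (2, 4, 6, 8, 14, 20):
    print(n, "games:", round(match_win_probability(n), 3))
```

Under these assumed probabilities the stronger player's edge in a two-game match is only modest, and it grows steadily as games are added, which is the effect that lengthening the quarterfinal or the final is meant to exploit.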
Another obvious option would be to change all of the 2-game matches into 4-game matches. Of course, this would have the unfortunate result of adding at least 10 days to the length of the event if it stayed a 128-player tournament. To compensate, the number of players could be reduced from 128 down to 64. Thus with 4-game matches throughout, leading to an 8-game final (32 playing days), the effectiveness would rise to 42%.
Unsurprisingly, the knockout tournament would become more and more effective, as we make it less and less inclusive and lengthen various rounds. If we were to halve the number of players again, a reasonably inclusive knockout tournament (32 players) could still be held, with four-game matches throughout, leaving room for either an 8-game final match (43% effectiveness) or a longer 14-game match (44% effectiveness). With sixteen players, the effectiveness could be improved to 46% by 4-game matches and a 14-game final. Finally, as I already mentioned, the most effective unbiased tournament would be an eight-player knockout tournament with six-game quarterfinal and semifinal matches, with a ten-game final, an overall effectiveness of 52%.
THE EINSTEIN GROUP CHAMPIONSHIPS
Now let us turn to the Einstein Group championship format. This is an amazing attempt to compress an entire Candidates Cycle and World Championship match into a mere 30 days of play. The format has come under severe criticism because the round-robin preliminaries and the subsequent two rounds of four-game matches are perilously short. In its current state, the only significant bias involved is that the defending champion gets to play in the final. So, I considered all of my formats lasting 30 or fewer playing days, with the single bias that the champion is seeded into the final automatically (assuming rapid tiebreaks throughout). There were 208 different formats, and the Einstein Group approach (50% effectiveness) ranked 148th, placing it in the bottom third.
The most effective approach (62% effectiveness), within these constraints, would be to only invite the four top-rated players (other than the champion). They would then play two rounds of six-game knockout matches to get from four players down to one, and the winner would play the defending champion in an 18-game match. Even just a 14-game match would still be a 61% effectiveness, and better than any other approach (given the constraints). If it were necessary to include eight candidates, plus the champion (as is the case in Dortmund), the 30 days would be better spent in three rounds of 4-game knockout matches, followed by an 18-game match against the defending champion (60% effectiveness).
If it were desirable to be even more inclusive (for instance so that a "wildcard" local participant like Christopher Lutz could be chosen, without impacting the odds too significantly), you could have two simultaneous 10-player single-round-robins, where the two winners play each other in a 4-game match, and the winner plays the defending champion in a 16-game match (56% effectiveness). Or you could even go the super-inclusive route, with a 196-player 13-round Swiss like Yasser Seirawan suggests. The two top finishers could play each other in a 4-game match, and the winner challenges the defending champion in an 8-game match. That would only last 25 days, and would still have an effectiveness of 55%. All of these options are significantly more effective than the actual format chosen by the Einstein Group, while still lasting no more than 30 playing days.
Of course, none of those options resemble the format that will actually happen in Dortmund. Are there less significant changes that would still greatly improve the effectiveness? Absolutely. For instance, the pair of 4-game knockout matches is hazardous. Even in a four-game match, it is very difficult to recover from a loss. How about getting rid of one of those matches? Instead of picking the top two players from each preliminary round-robin, you could just pick the top finisher from each round-robin. Then a single four-game match between those two winners, followed by the same 16-game final against the defending champion, would make the event four days shorter, and it would raise the effectiveness from 50% to 56%. It would be even better (60% effectiveness) to make use of the whole 30 days by playing matches of 10 and then 14 games (rather than 4 and then 16).
Finally, there was a way to be even more effective, within the 30-day constraint, although it did involve introducing another bias into the world championship cycle. It always helps the effectiveness of a format if you allow the highest-rated player to automatically bypass the qualifier event. For instance, you could have the highest-rated player compete in a 10-game match against the winner of a 4-player double-round-robin, and the winner would challenge the defending champion in a 14-game match. That would be a 64% effectiveness, and it seems likely that Garry Kasparov would have been more amenable to that option, although of course I have no idea what went on with the negotiations. I should point out that all of these numbers assume that nobody declines an invitation. With neither Kasparov, Viswanathan Anand, nor Ruslan Ponomariov participating, the effectiveness of the Einstein Group approach, in this particular cycle, will of course be way lower than 50%. It will probably be more like 20% or 25%, since it is reasonably likely that the best player in the world is either Kasparov, Anand, or Ponomariov, and there is less than a 50-50 chance that the best player in the world is actually participating in the championship cycle at all.
WHERE THESE NUMBERS COME FROM
I don't expect you to blindly accept all of these numbers. If you're still paying attention by this point, you might be wondering whether I'm just making up the numbers to serve my own purposes, or if I actually calculated them somehow. I don't want to bog you down with all of the gory details, but here is a brief summary of what I did.
I didn't want my conclusions to be skewed by any special characteristics of the current rating list, such as an unusually large gap between #2 and #3, or between #3 and #4. So I decided that my calculations would be based upon a "representative rating list", rather than an actual one. I did some analysis of rating list trends over the past few decades, and came up with a way to randomly simulate millions of "typical" rating lists. Thus sometimes there is a huge gap between #1 and #2, and sometimes it's very crowded at the top, with no clear leader. Sometimes the champion isn't even the top-rated player.
However, it is also important to acknowledge that ratings are inaccurate. They are merely estimates of players' true strengths, and those estimates have errors associated with them (a standard deviation of about 50, if you're interested). Somebody might have a rating of 2700, but their true strength could easily be 2580 or 2780. So, for each random rating list, I had to simulate a "true strength" for each player. The one player with the highest "true strength" is that elusive "strongest player in the world", whom we are trying to identify through the use of an effective world championship format. Thus sometimes the "strongest player in the world" might not be the world champion or the top-rated player; they might even be rated #8 or #10 or #20 in the world, though it's unlikely. That is why it is important to be inclusive with your world championship cycle; if you just use the top two or three players, you might easily leave out the strongest player.
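The idea of a hidden "true strength" can be sketched in a few lines. The example below assumes that rating errors are independent and normally distributed with a standard deviation of 50, and it uses a made-up rating list rather than one of my simulated "representative" lists; the function name is hypothetical.

```python
import random


def strongest_rank_distribution(ratings, sd=50.0, trials=20000, seed=1):
    """Estimate how often each rating-list position is truly the strongest."""
    rng = random.Random(seed)
    counts = [0] * len(ratings)
    for _ in range(trials):
        # True strength = published rating plus an assumed normal error.
        strengths = [r + rng.gauss(0.0, sd) for r in ratings]
        counts[strengths.index(max(strengths))] += 1
    return [c / trials for c in counts]


# Hypothetical list (not a real one): a clear #1, then a crowded field.
ratings = [2800, 2760, 2750, 2740] + [2720 - 5 * i for i in range(16)]
shares = strongest_rank_distribution(ratings)
print("P(#1 on the list is strongest)  =", round(shares[0], 3))
print("P(strongest is in the top 3)    =", round(sum(shares[:3]), 3))
print("P(strongest is outside the top 8) =", round(sum(shares[8:]), 3))
```

Even with a clear leader on paper, a noticeable share of the trials puts the strongest player further down the list, which is the argument for inclusiveness in numerical form.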
Armed with the ratings and true strengths of everyone on a simulated rating list, I could then proceed to simulate a world championship cycle. I tried various types of qualifier formats, different numbers of simultaneous qualifying tournaments, allowing the top-rated one or two players to bypass the qualifier, different ways of resolving tied matches, and/or allowing the champion to enter the cycle at various stages. The breakthrough was my realization that all popular world championship formats could in fact be expressed as an "Interzonal" qualifier followed by a series of knockout matches. This allowed me to tackle the problem systematically, rather than just trying a few options which I thought might be "ideal". For instance, the FIDE championships were treated as eight different qualifier tournaments (each of which was a 16-player knockout event won by a single player) and then a series of knockout matches among the final eight players. The Einstein Group championships were treated as two simultaneous qualifier tournaments (each of which was a 4-player double-round-robin tournament that qualified two players), and then there were three rounds of knockout matches, with the champion entering the cycle in the third and final round of knockouts. And so on. For each simulated championship cycle, I could see whether the "strongest player" actually won, and averaging over many thousands of iterations for each format would tell me the "effectiveness" of each world championship format.
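As a rough illustration of this kind of simulation (not my actual program), the sketch below estimates the effectiveness of a small seeded knockout. Everything specific in it is an assumption: the evenly spaced rating list, the standard Elo logistic expectation, the 50% draw rate, and a single weighted game as a stand-in for rapid tiebreaks. Its output should not be expected to match the percentages quoted in this article.

```python
import random


def expected_score(diff):
    """Standard Elo logistic expectation for a strength edge of `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def play_match(rng, a, b, games, draw_rate=0.5):
    """Return the winner of a match; a and b are (id, true_strength)."""
    e = expected_score(a[1] - b[1])
    # Split the expectation into win/draw/loss with an assumed draw rate,
    # capped so that no probability becomes negative.
    p_draw = min(draw_rate, 2.0 * min(e, 1.0 - e))
    p_win = e - p_draw / 2.0
    margin = 0
    for _ in range(games):
        x = rng.random()
        if x < p_win:
            margin += 1
        elif x >= p_win + p_draw:
            margin -= 1
    if margin == 0:
        # Tiebreak stand-in: one decisive game weighted by the expectation.
        return a if rng.random() < e else b
    return a if margin > 0 else b


def knockout_effectiveness(n_players, rounds_games, trials=4000,
                           sd=50.0, seed=7):
    """Share of simulated cycles won by the truly strongest player."""
    assert n_players == 2 ** len(rounds_games)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical rating list: evenly spaced below 2800.
        ratings = [2800 - 8 * i for i in range(n_players)]
        field = [(i, r + rng.gauss(0.0, sd)) for i, r in enumerate(ratings)]
        strongest = max(field, key=lambda p: p[1])[0]
        for games in rounds_games:
            # Pair first with last, second with second-to-last, and so on.
            half = len(field) // 2
            field = [play_match(rng, field[i], field[-1 - i], games)
                     for i in range(half)]
        hits += field[0][0] == strongest
    return hits / trials


short = knockout_effectiveness(8, [2, 2, 2])
long_ = knockout_effectiveness(8, [6, 6, 10])
print("8 players, 2-game matches throughout:", short)
print("8 players, 6-, 6- and 10-game matches:", long_)
```

The same loop extends to other formats by swapping in a different qualifier stage ahead of the knockout rounds, which is what treating every format as "qualifier plus knockout matches" makes possible.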
YASSER SEIRAWAN'S "FRESH START" PROPOSAL
I have to admit that I expected my analysis to reveal a searing criticism of Yasser Seirawan's "Fresh Start" proposal, with its Swiss qualifier. Swiss tournaments are generally perceived to be very ineffective, especially compared to knockout tournaments of comparable size. I expected that I would have to conclude that "it's all well and nice to play three rounds of long matches at the end of your world championship cycle, but what good is that when the majority of Candidates were chosen in a lottery?"
I was even advised to save myself the effort of trying to program Swiss tournaments in my simulations, since they were obviously so ineffective. A very prominent arbiter told me, "You do not need that for your simulation. It is perfectly obvious, if you want to obtain a winner who has the highest rating prior to the event, then the current FIDE knockout system is best." However, I really wanted to compare the FIDE and Einstein approaches against Yasser's proposal (which is based upon a Swiss qualifier), so I ultimately decided to include the Swiss qualifiers in my analysis.
Well, guess what? Out of the 13,000 world championship formats I evaluated, number TWO on the list, with an effectiveness of 69.4%, was the following structure: The world champion and the two highest-rated players (other than the world champion) bypass the qualifier and automatically become Candidates. They are joined by the top five finishers from a 196-player 13-round Swiss. Those eight players then play three rounds of knockout matches (16-game quarterfinal, 20-game semifinal, and 20-game final).
Does that sound familiar? It's almost exactly what Yasser Seirawan suggests for the next world championship cycle. He actually suggests a 10-game quarterfinal, a 14-game semifinal, and a 20-game final, and that shorter format (67% effectiveness) shows up at #181 on my list (still in the top 2% of formats). And there are details in his proposal about tiebreaks that were not included in my overall analysis (though I do cover them further down); I assumed rapid tiebreaks everywhere for the eight-player candidate cycles, since otherwise the calculations would have taken months to run all the possibilities! And Yasser doesn't actually say that it should be the two highest-rated players who bypass the qualifier; he specifically names Garry Kasparov and Ruslan Ponomariov as the two players.
The number one format on my list, with an effectiveness of 69.5%, was actually very similar to number two. In this scenario, only the top finisher from that same Swiss tournament qualifies, to play the #1-rated player in a 20-game match. The winner then plays the defending world champion in a 20-game match for the title. That is the single most effective world championship that I could find, but unfortunately it includes two biases: the world champion gets automatically seeded into the final round, AND the top-rated player doesn't have to play in the Swiss. Yasser's proposal would be somewhat less biased, as it is less of an advantage to be an "automatic Candidate" when there are eight Candidates rather than two, and of course in his proposal the defending world champion does not get automatically seeded into the final match.
Since we're on the topic, I should point out that the #3 format on my list has actually been tried, sort of, in the world championship. In 1959 Mikhail Tal won an eight-player quadruple-round-robin tournament in Yugoslavia, allowing him to play a 24-game match against the defending champion. In 1962 Tigran Petrosian won an identical format in Curacao. And that same format is #3 on my list, with an effectiveness of 69.3%, although it says that the winner of the round-robin should face the top-rated player rather than the defending champion. Thus if the defending champion was not the top-rated player, the champion would have to play in (and win) the round-robin tournament for the opportunity to play a championship match against the top-rated player. Also, it's not strictly like the 1959 and 1962 Candidates tournaments, because back then the eight players came from Interzonals, whereas this format recommends just taking the players from the top of the rating list. Presumably the bias in favor of the top-rated player is too much to make this format acceptable, although it is clearly very effective.
Of course, there is no real difference between 69.3% and 69.5%. The point is not so much that nine of the top twelve formats happened to have Swiss qualifiers. The real dazzler is that a Swiss qualifier can with any seriousness be called "optimal". Conventional wisdom tells us that knockout tournaments are more effective than Swiss tournaments of comparable length. It says that knockout tournaments work better, because the strongest players are in control of their own destiny, and nobody can finish ahead of you unless you are actually knocked out by someone. By contrast, in a Swiss you might do well but someone else might happen to do even better.
Why is conventional wisdom wrong? Well, I have two possible explanations. One has to do with information theory. In a multi-stage event such as a knockout tournament, it only matters if you make it to the next stage, whether that be from a 2-0 whitewash or a 3-3 standoff where somebody advances from a sudden-death game. After each round, the slate is wiped clean and all remaining players start with the same score. Obviously, that means discarding a considerable amount of information about how players have been performing. When the whole point is to identify the strongest player, it seems unwise to discard so much information. By contrast, in a Swiss tournament, your total score reflects the whole of your performance in the event. Of course, this "additional information" has to be balanced against the fact that players face different levels of opposition in a Swiss tournament, so a score of +2 might sometimes be more impressive than a score of +4. But there are obviously ways to address that by optimizing the pairings and/or scoring method, though that lies outside the scope of my analysis... for now.
To understand my other explanation, consider an alteration to Yasser's proposal. Rather than a large Swiss which generates five Candidates, you could instead have five different simultaneous 16-player knockout tournaments (2-game matches throughout), where the winner of each knockout tournament becomes a Candidate. That approach would be good (62%) but not as good as the Swiss approach (67%). With the knockout approach, you are basically splitting your field into five subgroups, and deciding to take the single top-performing player from each subgroup. If the strongest player in the world happens to be playing in the same subgroup as another player who is almost as strong, then it becomes reasonably likely (in the knockout approach) that the strongest player would lose a two-game match to the slightly weaker player. You can't qualify both players and resolve their differences later in a long match, since you are required to take exactly one player from each subgroup (i.e., the one who wins each knockout tournament). The numbers (62% vs. 67%) suggest that it would work much better to have all of the players intermingled in one big tournament, so the five strongest performances can advance, independent of who would have been in which subgroup.
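The subgroup argument can be illustrated with a deliberately simplified model. The sketch below simulates neither a real knockout nor a real Swiss: each player simply receives an event "performance" equal to true strength plus normally distributed luck, and the two selection rules are applied to the same performances. The strength spread, the noise level, and the function name are all assumptions made for illustration.

```python
import random


def qualification_rates(n_groups=5, group_size=16, strength_sd=60.0,
                        noise_sd=80.0, trials=10000, seed=3):
    """Compare 'one qualifier per subgroup' with 'top performers overall'.

    Returns (grouped_rate, pooled_rate): how often the truly strongest
    player qualifies under each rule.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    grouped_hits = pooled_hits = 0
    for _ in range(trials):
        strengths = [rng.gauss(2600.0, strength_sd) for _ in range(n)]
        # Event performance = true strength plus luck (assumed normal).
        perf = [s + rng.gauss(0.0, noise_sd) for s in strengths]
        best = strengths.index(max(strengths))
        # Grouped rule: players are in random order, so consecutive blocks
        # act as random subgroups; only each block's top performer advances.
        start = (best // group_size) * group_size
        block = range(start, start + group_size)
        grouped_hits += max(block, key=lambda i: perf[i]) == best
        # Pooled rule: the n_groups best performances overall all advance.
        top = sorted(range(n), key=lambda i: perf[i], reverse=True)[:n_groups]
        pooled_hits += best in top
    return grouped_hits / trials, pooled_hits / trials


grouped, pooled = qualification_rates()
print("strongest player qualifies, one per subgroup:", grouped)
print("strongest player qualifies, top five overall:", pooled)
```

In this toy model the pooled rule qualifies the strongest player more often, because a strong rival in the same subgroup no longer blocks the way. That is the same direction as the 62% versus 67% comparison, though the sizes of the two effects are not comparable.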
However, the Swiss tournament is not some magical solution that should be used everywhere; it is very easy to use it poorly. The Swiss only works well if the highest-rated players bypass it and automatically become Candidates. Thus the Swiss is best viewed as a super-inclusive way to sort through the rabble and find the rare player who is extremely under-rated (literally) and actually very strong. If we already know that a player is very strong (the defending champion, or one of the two top-rated players in the world), it is far better to allow them to bypass a Swiss where they might potentially lose a couple of games and fail to qualify. For instance, if you had everyone (including the defending champion) play in the Swiss, and picked the top eight finishers as your candidates, then the effectiveness would only be 17%. If you automatically qualified the defending champion, but the other seven qualifiers had to come from the Swiss, the effectiveness would only be 53%, barely better than the Dortmund style. The most important thing is to include at least the highest-rated player automatically, along with the defending champion. If the two automatic qualifiers are the defending champion and the (remaining) highest-rated player, the effectiveness jumps up to 64%. And as we've seen already, if the second-rated player is also allowed to bypass the qualifier, the effectiveness is a nearly-ideal 67%.
Another interesting question is whether the qualifier tournament becomes more effective if you make it more inclusive. We have seen earlier, in the discussion about the FIDE format, that a knockout loses effectiveness significantly when you double the number of players. In the case of a Swiss, however, the inclusion of extra players actually helps, rather than hurts, the effectiveness. For instance, if you modify the Seirawan proposal to only include 64 players, the effectiveness is 61%, but doubling the field of players, for a total of 128, raises the effectiveness to 65%, and tripling the field (to Yasser's suggested 196-player level) leads to the best effectiveness, the 67% already mentioned. Presumably this is because the weaker players don't get in the way as much in a Swiss, after the first round or two.
In a 128-player knockout, you have a large number of players who clearly are not the strongest players in the tournament, but who can have a huge impact on the outcome through the chance elimination of a top seed. We almost saw the extreme example of that in Moscow, where a single loss to the bottom seed just about resulted in the first-round elimination of #1 seed Viswanathan Anand. On the other hand, by having such an inclusive field in the large Swiss, you give yourself the possibility of identifying an extremely underrated player who actually deserves to play in the Candidate section.
If you're trying to get a feel for what level of player would typically finish in the top five in the 196-player Swiss tournament, I can tell you that an average set of five qualifiers would have ratings ranging from 2600 to 2780. A very strong set of five qualifiers (which would happen one time out of every ten) might be something like: Michael Adams, Alexei Shirov, Peter Leko, Alexander Morozevich, and Judit Polgar. A much weaker set of five qualifiers (which would also happen one time out of every ten) would be like: Viswanathan Anand, Zoltan Almasi, Konstantin Sakaev, Giorgi Giorgadze, and Xie Jun. On average, out of the five top Swiss finishers, there would be two or three players rated above 2700, and two or three players rated below 2700. Once every 25 or 30 tournaments, all five qualifiers would be rated below 2700, and once every 40 or 45 tournaments, all five qualifiers would be rated above 2700. About 45% of the time, at least one qualifier would be a sub-2600 player.
One controversial issue is whether rapid games are a good way to break ties. This only matters, of course, if a tie actually occurs, so it is a more significant factor when there are short events (such as the FIDE championships or the Dortmund qualifier), and it wouldn't matter as much in the Seirawan proposal (though of course it still could happen). There is a general perception that rapid and blitz games are more "random" than classical games. This is undoubtedly true, since time trouble always introduces an element of randomness into the outcome of a game. However, I recently analyzed the results of several thousand games played at various time controls over the past few years, and (statistically speaking) this issue doesn't seem to be a particularly significant one. The higher-rated player still manages about the expected percentage score, whether the game is played at classical, rapid, or blitz controls. Here is a picture to illustrate what I am talking about.
In this graph, we see the well-known trend that as the white player's rating advantage gets bigger and bigger, White tends to score a higher and higher percentage. If the two players have the same rating, then White scores 55%. If White has a rating advantage of 200 points, then White would score almost 70%. The blue line represents this relationship at classical time controls.
Now look at the red line, which represents rapid games. If rapid time controls really did make the game a lot more random, then the higher-rated player would tend to score closer to 50% than predicted, with either color. That means we would see the red line being flatter, more horizontal, than the blue line. This is true to a certain degree, especially on the right side of the graph, in those scenarios where White has a large rating advantage. This means that rapid games do indeed turn out more randomly when White is the big favorite; White is not able to score as high a percentage as the ratings would suggest. For instance, with a +300 rating point advantage, White would score 75% in classical games but only 72% in rapid games. However, when Black is the favorite by more than 100 rating points (the left side of the graph), the rapid results are exactly the same as classical. Thus, when outrated by 300 points, White scores an identical 33% whether it be classical or rapid. So, the conclusion to be drawn is that the advantage of the white pieces is not as large in rapid games as in classical games, especially when White is the higher-rated player. But the higher-rated player should do just about as well in rapid as in classical. Perhaps the real "randomness" comes from the fact that rapid matches are typically only two games long, rather than four or six.
The blitz data (the white line on my graph) is a little more suspect, because there are fewer results available to analyze. However, there is no compelling evidence that blitz games are "more random" than rapid or even classical games; the white line is not any more horizontal than the blue line. You can see a distinctive bend in the middle of the white line, suggesting that the advantage of the white pieces is magnified when the two players are of similar strength. For instance, when the two players have the same rating, White scores 58% in blitz but only 55% in classical. As I just mentioned, the advantage of the white pieces is not as large in rapid chess as it is in classical chess, so in rapid games, when the players have identical ratings, White only manages to score 53%. But again, I see no real evidence that the faster time controls are diminishing or obscuring the rating difference between the two players in blitz. Thus it seems that rapid games, or even blitz games if need be, are a reasonably effective way to resolve ties.
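The classical and rapid comparison can be checked with a little arithmetic. The Python sketch below uses only the percentages quoted in the text for a 300-point rating gap and for equal ratings. The function names are hypothetical, and the straight-line slope is a simplifying assumption, not the method used to draw the graph.

```python
# White's score (percent) as quoted in the text, keyed by White's rating
# advantage in points. Only the figures stated outright are used.
CLASSICAL = {-300: 33.0, 0: 55.0, 300: 75.0}
RAPID = {-300: 33.0, 0: 53.0, 300: 72.0}


def slope_per_100(scores):
    """Average rise in White's score per 100 rating points, from -300 to +300.

    A flatter (smaller) slope means the ratings predict the result less well.
    Assumption: a straight line between the two end points is good enough.
    """
    return (scores[300] - scores[-300]) * 100 / 600


def favorite_score(scores):
    """Score of a 300-point favorite, averaged over both colors.

    As White the favorite scores scores[300]; as Black the favorite scores
    whatever White (outrated by 300 points) fails to score.
    """
    as_white = scores[300]
    as_black = 100 - scores[-300]
    return (as_white + as_black) / 2


if __name__ == "__main__":
    for name, scores in (("classical", CLASSICAL), ("rapid", RAPID)):
        print(name, slope_per_100(scores), favorite_score(scores))
```

With the quoted figures, the 300-point favorite averages 71% over both colors at classical controls and 69.5% at rapid controls, so the rating difference is only slightly blunted.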
Now, it is certainly true that we see a lot more decisive results in the faster time controls, particularly in blitz. What do I mean by "a lot more"? Well, switching the time controls from classical to rapid has about the same effect (on the frequency of draws) as changing one of the players from Peter Leko to either Veselin Topalov or Alexei Shirov, or changing the opening from a 1.d4 game to a Sicilian Dragon. Further, switching the time controls from classical to blitz has about the same effect (on the frequency of draws) as changing a Peter Leko-Anatoly Karpov matchup into an Alexander Morozevich-Alexei Fedorov matchup, or changing a Petroff's Defense into a King's Gambit. This will indeed make the results slightly more random, which (as I said) could be addressed by making the rapid tiebreaks longer. I hate to sound like a broken record, but I should again point out that this exact approach (using 4-game matches if a rapid tiebreak becomes necessary) was already suggested by Yasser Seirawan in his "Fresh Start" proposal.
For instance, let's take a very simple unbiased case, where two simultaneous 10-player single-round-robin tournaments are held, and the winners play each other in a title match. First let's consider the case where the final match is only six games long. If a drawn match is to be resolved by the spin of a roulette wheel, the effectiveness of this format is 37.3%. Obviously, it would be better to actually play games to resolve the tie, since the stronger player would have a better-than-even chance to win the tiebreak. So if we use the rapid-blitz progression as in the FIDE championships, the effectiveness goes up to 39.2%. Since blitz games are more random, if we simply played a long set of 2-game rapid matches, it would be slightly better (39.3%). Finally, Yasser's suggestion of a rapid match four games long (rather than two) is the most effective tiebreak method (39.5% effectiveness).
You can see from those numbers that the tiebreak method doesn't matter too much, even for a mere six-game match; the effectiveness ranged from 37.3% (random) to 39.5% (4-game rapid match). Of course, as the match length is increased, the tiebreak method becomes less and less of a factor; for a 16-game match, the random option has an effectiveness of 41.1% and the other options are all 41.6% or 41.7%. And for a 24-game match, the random tiebreak has an effectiveness of 42.1% and all other options are tied at 42.3%. A drawn match is just too unlikely.
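To make the comparison concrete, here is a minimal Python sketch that tabulates the effectiveness figures quoted above and computes how much a played tiebreak gains over the roulette wheel at each match length. The dictionary layout and function names are hypothetical; "best_played" is simply the highest figure quoted for any played tiebreak at that length.

```python
# Effectiveness (percent) quoted in the text, keyed by final-match length.
# "random" is the roulette-wheel tiebreak; "best_played" is the highest
# figure quoted for any tiebreak that is actually played out.
EFFECTIVENESS = {
    6: {"random": 37.3, "best_played": 39.5},
    16: {"random": 41.1, "best_played": 41.7},
    24: {"random": 42.1, "best_played": 42.3},
}


def tiebreak_gain(games):
    """Percentage points gained by the best played tiebreak over a coin flip."""
    row = EFFECTIVENESS[games]
    return round(row["best_played"] - row["random"], 1)


def length_gain(short, long, tiebreak="random"):
    """Percentage points gained by lengthening the match, tiebreak held fixed."""
    return round(EFFECTIVENESS[long][tiebreak] - EFFECTIVENESS[short][tiebreak], 1)


if __name__ == "__main__":
    for games in sorted(EFFECTIVENESS):
        print(f"{games:2d} games: tiebreak choice is worth {tiebreak_gain(games)} points")
    print("6 -> 24 games, random tiebreak:", length_gain(6, 24), "points")
```

On these figures the choice of tiebreak is worth 2.2 points in a six-game match but only 0.2 points in a 24-game match, while lengthening the match from six games to 24 is worth 4.8 points even with a random tiebreak.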
However, sometimes this issue does not even arise. Specifically, if one of the players has been granted "draw odds" in a particular match, that player is automatically declared the winner in the event of a drawn match. Usually, the defending champion is granted draw odds in their match, and this is obviously a key part of Yasser's proposal, since it acknowledges two champions, and there are also the curious provisions about "inheriting" draw odds if you overcome them in your quarterfinal match. Generally, draw odds are not a good way to resolve ties. They are better than a roulette wheel (since on average the defending champion will be stronger than the challenger), but slightly less effective than any other tiebreak method. The main benefit of draw odds is that they provide an incentive for a defending champion to actually participate in a world championship cycle, since the draw odds are a bias that favors the defending champion.
Everything that I have said to this point applies to chess world championships in general. The conclusions would have been identical a decade ago, or fifteen years in the future, even with a completely different set of top players. However, at this point I must leave off my attempts to be "generic", because there is one final issue I want to cover, which must be handled "specifically". I want to discuss the topic of who would be favored by the various biases in the "Fresh Start" proposal, and in order to do that we must start talking about "Vladimir Kramnik" and "Garry Kasparov" and "Ruslan Ponomariov", rather than just "the defending champion" or "the highest-rated player".
WHO IS FAVORED BY THE FRESH START PROPOSAL?
The "Fresh Start" proposal has an interesting set of biases. Kramnik, Kasparov, and Ponomariov are all "rewarded" by being allowed to bypass the qualifier, but each in turn is "punished" by the fact that the other two players are also bypassing the qualifier. Ponomariov would presumably be happy to avoid the qualifier, but sad that Kasparov and Kramnik (probably his two strongest potential opponents) were guaranteed to qualify. Further, as champions of their respective organizations, Kramnik and Ponomariov are additionally granted another bias: draw odds in their quarterfinal and semifinal matches. Finally, Kasparov is "punished" by the fact that he will have to overcome draw odds in his semifinal match, whoever the opponent. So clearly Kramnik and Ponomariov would benefit from the match structure, and Kasparov would probably not benefit, but how big of a deal is this? What are the magnitudes of each player's advantages and disadvantages? This is an extremely important question, perhaps THE most important question about the relative merits of Yasser's proposal.
First of all, let's once again draw an important distinction between the meaning of "highest-rated player" and "strongest player". Ratings are inexact, and so the player with the highest rating might not actually be the strongest player. There is no way to exactly measure who the strongest player is; all we can do is talk about the "likelihood" that each player really is the strongest in the world. The rating list tells us (with great accuracy) who has been most successful recently, and gives us some idea of who will do best in the near future, but we should always remember that no rating difference is ever 100% conclusive; you have to deal with probabilities rather than absolutes.
By the way, I want to applaud the decision of the Einstein Group to use an average of the FIDE and Professional ratings for the invitations and seedings in their Dortmund qualifier. I had already mentioned a year ago that a simple average of the two ratings did an excellent job of masking the limitations of each individual one, so I think it was a great decision. To keep things consistent, I have done the same thing in the following analysis (using the April 1st 2002 rating lists), although I had to add 50 points to each Professional rating to make the numbers similar to the FIDE ratings. With these ratings, we can apply some simple statistics and calculate each player's likelihood of being the strongest in the world.
Unsurprisingly, it's probably either Garry Kasparov or Vladimir Kramnik. Kasparov (average FIDE/Prof rating 2842) has a 49% chance of being the strongest player, whereas Kramnik (2827) has a 34% chance. Veselin Topalov (2758), Ruslan Ponomariov (2751), and Viswanathan Anand (2751) each have about a 3% chance, and the rest of the world (2740 and below) has a combined 8% chance. In a perfect world championship format, whenever Kasparov was indeed the strongest player, he would win the championship. And likewise for Kramnik. Thus, in a perfect format, Kasparov would have a 49% chance overall to win the championship, and Kramnik would have a 34% chance, and so on.
However, the "perfect world championship" is only a myth. We've already seen (above) that no known world championship format is even 70% effective, so even in the best case, a third of the time the championship will be won by somebody who is not the strongest player. We have to keep the matches down to a reasonable and practical length, and sometimes that just isn't long enough for the strongest player to demonstrate their superiority over another very strong player.
I have spent several hours analyzing the statistical effect of draw odds, and I can state very confidently that the actual selection of Candidates is far more important than the question of who gets draw odds in a 10-game (or longer) match. For instance, even if there were no draw odds, Kasparov and Kramnik would still be "punished" by the fact that they have to play fairly short matches against players who are certainly weaker, but nevertheless have some chance to eliminate them. Indeed, I just told you that we can be 83% sure that either Kasparov or Kramnik is the strongest player in the world, but even after they bypassed the qualifier, there would still be more than a 25% chance that someone else would actually win the championship.
Ruslan Ponomariov is clearly the beneficiary of the most significant biases in the "Fresh Start" proposal. Although his combined rating of 2751 puts Ponomariov in a virtual tie for fourth in the world with Viswanathan Anand, he still has less than a 3% chance of actually being the strongest player in the world. Nevertheless, Ponomariov would have a 10.4% chance to actually win the championship. It turns out that if Ponomariov's rating were actually 2783 (rather than 2751), then the numbers would claim that Ponomariov did in fact have a 10.4% chance of being the strongest player. Thus we can say that the specific Fresh Start proposal "awards" Ponomariov 32 rating points, in effect.
This is a very large bias in favor of Ponomariov. To try and put that bias in more concrete terms, let's envision a fantasy scenario where Kasparov and Kramnik are the only two players who bypass the qualifier, so Ponomariov has to finish in the top six in the Swiss qualifier like anyone else. However, in this fantasy, Ponomariov gets a special advantage (in the Swiss and in the final rounds of matches): he receives the white pieces in five games out of every six, instead of one game out of every two. According to my calculations, that fantasy scenario gives Ponomariov about the same advantage that the actual Fresh Start proposal gives him. Is that an unfair advantage? Or is it commensurate with his position as FIDE World Champion? That is for someone else to decide, I suppose.
It would be tempting to say that +32 rating points is way too many to "award" Ponomariov, and that he should be granted an automatic place but not given draw odds. Well, that doesn't really help very much, because the lion's share of his advantage lies in his automatic Candidate status. Here is how the various biases are measured by my technique:
(1) Being an automatic qualifier for the three rounds of matches (10/14/20 games): Kasparov -14 rating points, Kramnik -7 rating points, and Ponomariov +22 rating points.
(2) Draw odds given to Kramnik and Ponomariov in the quarterfinal: Kramnik +4 rating points, Ponomariov +6 rating points.
(3) Draw odds given to Kramnik and Ponomariov in the semifinal: Kramnik +4 rating points, Ponomariov +4 rating points.
(4) Any player who eliminates Kramnik or Ponomariov in the quarterfinal, inherits draw odds for the semifinal: Kasparov -2 rating points.
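The four items above can be totalled per player. The Python sketch below uses only the rating-point figures listed; the names are hypothetical, and simple addition of the separate biases is an assumption (it does reproduce the +32 points quoted for Ponomariov).

```python
# Rating-point biases from items (1) to (4), as listed in the text.
BIASES = [
    ("automatic qualification", {"Kasparov": -14, "Kramnik": -7, "Ponomariov": 22}),
    ("draw odds in quarterfinal", {"Kramnik": 4, "Ponomariov": 6}),
    ("draw odds in semifinal", {"Kramnik": 4, "Ponomariov": 4}),
    ("inherited draw odds", {"Kasparov": -2}),
]


def net_bias(player):
    """Sum of all listed biases for one player (assumes they simply add)."""
    return sum(effects.get(player, 0) for _, effects in BIASES)


def share_from(label, player):
    """Fraction of a player's net bias that comes from one named item."""
    effects = dict(BIASES)[label]
    return effects.get(player, 0) / net_bias(player)


if __name__ == "__main__":
    for player in ("Kasparov", "Kramnik", "Ponomariov"):
        print(f"{player}: {net_bias(player):+d} rating points")
    share = share_from("automatic qualification", "Ponomariov")
    print(f"Ponomariov's share from automatic qualification: {share:.0%}")
```

Under that additive assumption the totals are -16 for Kasparov, +1 for Kramnik, and +32 for Ponomariov, with about 69% of Ponomariov's total coming from automatic qualification alone.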
Interestingly enough, this collection of small advantages for Kramnik, and small disadvantages for Kasparov, is sufficient to make Kramnik the statistical favorite if the Fresh Start proposal were to actually happen. Kramnik would have a 38% chance to win the championship, Kasparov would have a 36% chance to win the championship, and (as I've already said) Ponomariov would have just over a 10% chance to win the championship. Nevertheless, that is only because Kasparov and Kramnik are already so close together. In the bigger picture, this draw odds issue does not seem to merit the attention it gets. A +4 rating point advantage, across the entire world championship cycle, is less important statistically than the total advantage you would get from your opponent blundering a pawn in one single game, sometime during the cycle. Probably this is more of a prestige issue than anything else, or perhaps there is a huge psychological issue I am ignoring with my statistics (like the feeling that you are battling uphill from the start, if the other person has draw odds).
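As a rough illustration, the Python sketch below sets each player's quoted chance of being the strongest in the world beside his quoted chance of winning under the Fresh Start proposal. In a perfect format the two columns would match. The names are hypothetical, and Ponomariov's "about 3%" is entered as exactly 3.0, which is an approximation.

```python
# Chance (percent) of being the strongest player, as quoted in the text.
# Ponomariov's figure is given only as "about 3%"; 3.0 is an approximation.
STRONGEST = {"Kasparov": 49.0, "Kramnik": 34.0, "Ponomariov": 3.0}

# Chance (percent) of winning the title under the Fresh Start proposal.
WINS_TITLE = {"Kasparov": 36.0, "Kramnik": 38.0, "Ponomariov": 10.4}


def everyone_else(table):
    """Percentage left over for all players not named in the table."""
    return round(100 - sum(table.values()), 1)


def shift(player):
    """Winning chance minus chance of being strongest, in percentage points."""
    return round(WINS_TITLE[player] - STRONGEST[player], 1)


if __name__ == "__main__":
    for player in STRONGEST:
        print(f"{player}: {shift(player):+.1f} points")
    print("everyone else wins:", everyone_else(WINS_TITLE), "percent")
```

On these figures Kasparov gives up 13 points, Kramnik gains 4, and Ponomariov gains about 7.4.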
As I said way back at the beginning, I have no particular agenda to promote. However, I have had to re-examine many of my assumptions about chess, as a result of this analysis, and I hope that will happen for you as well. Among other things, I now have a much greater respect for Swiss tournaments than before, along with a greater respect for Yasser Seirawan's judgment and intuition about what makes a good tournament format! Perhaps some deeply-held beliefs about the "randomness" of rapid chess will also be challenged as a result of my analysis, but possibly that is too much to expect. Likewise for the "draw odds" debate, I suppose...
This essay is the culmination of many, many late-night hours of effort. However, I hope that it will prove to be a beginning, rather than an end. There are many problems with the current state of the chess world, and statistics will never be the only answer to any of them. Statistics are merely a tool, a source of information, to assist people in finding a better answer to some of their problems. There has been so much debate, and yet so little objective exploration of the facts, and so I hope that this will be the beginning of a new effort, a new kind of debate. I invite you to send me e-mail at email@example.com, and if there is enough interest perhaps I will publish a follow-up analysis which incorporates feedback from all of you.
I would like to conclude with a quote from baseball analyst Bill James: "It has always been my experience that if you can present a good argument and back up what you are saying, there are people who will be persuaded. It is sometimes possible to change the tenor of the debate by injecting information into the discussion." I hope, very much, that he is correct.
Thank you for taking the time to read this.