Elo oddities: the tortoise and the hare

by ChessBase
2/24/2012 – There has been ample debate on whether Magnus Carlsen would break Garry Kasparov's record 2851 rating. Though it might seem like quite a mountain to climb, an oddity in the FIDE rating system means he could do it without breaking a sweat. The good news is that if you are rated at least 1400 Elo, FIDE declares you have an 8% chance against the world number one. Expert discussion.




Chinese GM Li Chao recently rejoined the 2700 club after gaining 5.6 Elo points with his 9.0/9 victory at the 5th Colombo International Chess Festival 2012. While nine out of nine sounded impressive, the results page showed something quite odd: the Chinese player had a 2622 performance, considerably below his 2693 rating. Six of the eight rated opponents had ratings under 2000 and one was even rated 1405. In spite of this he had still managed to gain 5.6 Elo. How was that possible?

The reason is that today games are rated individually, and a win against any player rated 400 Elo or more below you will still yield 0.8 Elo. At a time when the minimum FIDE rating was 2200, the greatest disparity possible was around 600 Elo, but now games with a full 1400 Elo difference can take place. Could this be the source of ratings inflation? Perhaps Magnus Carlsen could find a much easier way to break Garry Kasparov’s record: win a few very weak opens, and presto! He would be the new record holder, even if he did not play a single opponent within a thousand Elo of his rating.
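To make the arithmetic concrete, here is a minimal sketch of a per-game rating change with the 400-point cap. It is not FIDE's actual procedure: FIDE uses a published lookup table, which we approximate here with Elo's normal-distribution model (a standard deviation of 200 points per player), and we assume a K-factor of 10. The function names are ours, chosen for illustration.

```python
import math

K_FACTOR = 10   # assumed K-factor for a top player
CAP = 400       # rating-difference cap described in the article


def expected_score(diff):
    """Expected score for a player rated `diff` points above the opponent.

    Assumption: Elo's normal model, where the difference in performance
    has standard deviation 200 * sqrt(2). This simplifies to erf(diff / 400).
    """
    return 0.5 * (1.0 + math.erf(diff / 400.0))


def capped_expected_score(diff, cap=CAP):
    """Expected score after clamping the rating difference to +/- cap."""
    clamped = max(-cap, min(cap, diff))
    return expected_score(clamped)


def rating_change(own, opponent, score, k=K_FACTOR, cap=CAP):
    """Rating change for one game; score is 1, 0.5 or 0."""
    return k * (score - capped_expected_score(own - opponent, cap))


# A 2693 player beating a 1405 player: the 1288-point gap is treated as 400.
gain = rating_change(2693, 1405, 1.0)
print(round(capped_expected_score(2693 - 1405), 2))  # 0.92
print(round(gain, 1))                                # 0.8
```

Under these assumptions the capped expected score comes out at about 0.92, which is where both the 0.8 Elo per win and the 8% figure for the weaker player come from.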

To get to the truth of the matter, we consulted experts and mathematicians Jeff Sonas, John Nunn, and Ken Thompson, and publish their exchange here.

Jeff Sonas, Statistician and creator of Chessmetrics, was the first to reply.

My understanding is that this rule was introduced for political reasons long ago, purportedly to give top players some incentive to play in open events. When FIDE proposed raising the cutoff from 350 points to 400 points a couple of years ago, and asked for my opinion, I agreed that it was a change in the right direction, and stated that it would be even better to go up to 500 instead. I have since changed my opinion on this, and decided we either should get rid of the rule altogether, or make it an "800-point rule" instead. You can see my most recent analysis from a year ago, where I discuss the 400-point rule alongside the two graphs that have large blue rectangle outlines, about halfway down.

There is a statistical rationale for the rule to some degree, since the chaotic nature of chess (and the variability of ratings) means you can never really be 100% sure that one player will beat another, and rarely can you even be 99% sure. It is certainly not a "guarantee" to score 100% in a whole event. For one thing, it is quite possible that opponents paired against the tournament leader in a late round of a Swiss would have done better than expected and in fact would be stronger than their pre-tournament rating would indicate.  I think an 8.5/9 score for Li Chao in such an event would be a reasonable prediction, and would not have resulted in a single point of Elo gain for him.

But yes, if someone strong were able to organize lots of events like this, and the rating administrators looked the other way, a strong player would likely be able to boost their own rating a bit. I certainly don't think it has much to do with the increased ratings of top players, though, and the artificially high rating would sort itself out eventually because it would also artificially increase the player's predicted score and they would therefore struggle to maintain the artificially high rating.  In this way the Elo system is supposedly “self-correcting”, although I’m not sure this really works very rapidly when opponents’ ratings are at all inaccurate.

Ken Thompson, computer chess pioneer

I don’t think it makes a difference whether you pick a large number (say 400 or 800) and employ this rule. I don’t think that human-human interactions will accurately follow the normal curve beyond this point. Even if it is true that this difference can be milked for rating, there are several practical issues working against it. The player has to spend lots of silly time pounding on weak players to gain points. He has to continually keep vigilance to accomplish this. One mistake and he will lose many times what he has slowly gained. Once the player has amassed his 3800 Elo, what is he to do with it? Compete and give it to strong players? Sit on it and brag about being on the top of the rating lists? I don’t think this is a cause of inflation since the weaker player loses as many points as the stronger player gains, i.e. zero sum, no inflation.

John Nunn, Grandmaster, mathematician

This rule was introduced for a simple reason: to prevent players losing Elo points by winning a game. If one player in a tournament is rated far below the others, then including that player can lower the average rating of your opponents by so much that your expected score is increased by more than one point. Then beating that player will leave you worse off than if you had not played him at all. This is not just an academic situation. For very strong players it could easily happen in cases where, for example, a strong tournament includes a 'local player' who is much weaker than the others, or in Olympiads where your first round opponent was relatively weak.

In recent years there have been a few cases in which relatively well-known GMs have apparently attempted to exploit this rule by playing in very weak tournaments. However, in these cases the player concerned only appears to have gained a few points (less than five). Of course, by doing the same thing over and over again a player could artificially boost his rating, but it would be very time-consuming.
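John Nunn's scenario of losing points by winning can be reproduced with a small sketch of tournament-based rating, where the expected score is computed from the average opponent rating. The field below is hypothetical (nine opponents at 2700 plus one 'local player' at 1700), the K-factor of 10 is an assumption, and the expected-score function is the normal-model approximation rather than FIDE's table.

```python
import math


def expected_score(diff):
    """Per-game expected score under Elo's normal model (an approximation)."""
    return 0.5 * (1.0 + math.erf(diff / 400.0))


def event_change(own, opponents, total_score, k=10):
    """Rating change when a whole event is rated on average opponent rating."""
    average = sum(opponents) / len(opponents)
    expected_total = len(opponents) * expected_score(own - average)
    return k * (total_score - expected_total)


strong_field = [2700] * 9  # hypothetical all-play-all field

# A 2700 player scores 50% against the strong field: no rating change.
without_weak = event_change(2700, strong_field, 4.5)

# Same result, plus a win against an added 1700-rated local player.
with_weak = event_change(2700, strong_field + [1700], 5.5)

print(round(without_weak, 1), round(with_weak, 1))
```

In this made-up field, adding the weak player drops the average opponent rating by 100 points and raises the expected total by nearly two points, so the extra win leaves the strong player several rating points worse off than if the game had never been played.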

Jeff Sonas (to John Nunn)

It seems to me that the scenario you describe (wanting to avoid a victory over a low-rated player causing you to lose rating points) motivated the rule to calculate rating change on a game-by-game basis instead of using average opponent rating for an event. It also motivates the related tiebreak rule that might calculate performance rating after first removing the lowest-rated opponent, where we don't want to penalize someone's performance rating due to a win against a very weak player.

However, I don't see how capping the expected score at 89% would prevent players losing Elo points due to a win.  As long as you are calculating rating change on a game-by-game basis, even with an expected score of 99% or 100% you would still never lose rating points from a win.
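Jeff Sonas's point can be checked directly: when each game is rated on its own, the change for a win is K times (1 minus the expected score), which cannot be negative because the expected score never exceeds 1. The sketch below uses no cap at all; the K-factor of 10 and the normal-model expected score are assumptions, not FIDE's lookup table.

```python
import math


def expected_score(diff):
    """Per-game expected score under Elo's normal model (an approximation)."""
    return 0.5 * (1.0 + math.erf(diff / 400.0))


def win_gain(diff, k=10):
    """Uncapped rating change for beating an opponent rated `diff` points lower."""
    return k * (1.0 - expected_score(diff))


# Gains shrink as the gap grows, but they never drop below zero.
gains = {diff: win_gain(diff) for diff in (350, 400, 800, 1400)}
for diff, gain in gains.items():
    print(diff, f"{gain:.6f}")
```

Under this model a 350-point gap gives an expected score of roughly 0.89, in line with the figure quoted above, and even at a 1400-point gap a win still yields a tiny positive amount.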

John Nunn (to Jeff Sonas)

The original rating rules had no 350-point rule. It must be remembered that in 1970 the rating list was very small and it was never envisaged that it would be extended substantially. Only top tournaments were rated (and these were invariably all-play-all events) and all calculations were done by hand. The 'Losing Rating Points For Winning A Game' problem didn't really arise. The top players always skipped the first round or two of an Olympiad, and in any case any weak opponents would be unrated, so the games simply didn't count.

But times changed, the rating list expanded and started including more and more weaker players. Open tournaments appeared, and the LRPFWAG problem started to become significant. I don't remember exactly when the 350-point rule was introduced to deal with this, but I believe it was around 1980. Later, in 1988 or 1989, Karpov was unhappy about losing points when he had won a tournament, arguing that the aim of taking part in a tournament was to win it, and if you have achieved that then you shouldn't lose points through not having won by a large margin. For a time the 'Karpov rule' was in effect, that you couldn't lose points for winning an event. This was later scrapped.

Rating games individually is a fairly recent innovation, and for the vast bulk of the lifetime of the Elo system it was done on a tournament basis (because it made the calculations easier, especially for all-play-all events). The current rule is in a way a hangover from the earlier times, but I think there is some logic in it, although one can argue over the exact details. Rapidly improving young players, for example, can be massively underrated.

What to make of all of this? To start with, one must reiterate Ken Thompson’s point that it is ultimately a zero-sum game. In other words, even if a 2700 player can win Elo by beating rank beginners, those beginners will also lose the same corresponding Elo. So, does this mean all is okay? Sort of.
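The zero-sum claim is easy to verify for a single game, with one caveat that the sketch makes explicit: it holds only when both players are rated with the same K-factor. The K values, the ratings, and the normal-model expected score below are illustrative assumptions.

```python
import math


def expected_score(diff):
    """Per-game expected score under Elo's normal model (an approximation)."""
    return 0.5 * (1.0 + math.erf(diff / 400.0))


def both_changes(rating_a, rating_b, score_a, k_a=10, k_b=10, cap=400):
    """Rating changes for both players in one game, with the difference capped."""
    diff = max(-cap, min(cap, rating_a - rating_b))
    expected_a = expected_score(diff)
    change_a = k_a * (score_a - expected_a)
    # Player B's score and expectation are the complements of player A's.
    change_b = k_b * ((1.0 - score_a) - (1.0 - expected_a))
    return change_a, change_b


# Equal K-factors: what the 2700 player gains, the 1500 player loses.
same_k = both_changes(2700, 1500, 1.0)

# Different (hypothetical) K-factors: the two changes no longer cancel.
mixed_k = both_changes(2700, 1500, 1.0, k_a=10, k_b=30)

print(sum(same_k), sum(mixed_k))
```

With equal K-factors the two changes cancel exactly; if the weaker player is rated with a larger K, the pair as a whole loses points on the game rather than gaining them.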

As explained by Jeff Sonas, if the purpose is to prevent a player from losing Elo because he won a game, then rating each game individually solves this even without a ratings cap. The artificial 400 Elo cap just means that the FIDE rating system will always consider that any player rated at least 1400 never has worse than an 8% chance against another player, whether he be Magnus Carlsen, Garry Kasparov, or Bobby Fischer. We beg to disagree.

Still, does that mean there is no inflation? Perhaps, perhaps not. Jeff Sonas explained:

In my opinion the main source of inflation is that newly rated players are receiving ratings that on average are too high, which is basically injecting rating points into the pool. This is possibly because rated players take it easy on unrated players in their games, and the unrated players therefore are showing an artificially high performance rating in those games. I have demonstrated through simulation that over a few years, these excess rating points (despite appearing only in new players) will gradually propagate through the entire rating pool, eventually reaching the very top.  I believe this to be one possible explanation for the effect seen in the following graph:

(taken from Rating inflation – its causes and possible cures)

If excess rating points were appearing at the bottom of the pool, and slowly distributing around, you would eventually see the rating of the #100 player (whoever that may be) go up, then the #50 player, and finally (maybe even years later) the #10 or #5 player.  As I said, I have demonstrated in FIDE meetings that a more aggressive (or more conservative) formula for new player ratings will ultimately raise (or lower) everyone’s rating, depending on players’ connectivity within the rating pool.  So a zero-sum operation will eventually settle down to become non-inflationary, I believe.  Within reason.  I certainly don’t think the “350 point rule” or “400 point rule” explains all of the 2800+ ratings and 2700+ ratings we are seeing; I think it is far more likely that it comes from new players’ initial ratings.

Copyright ChessBase
