Rating and K-factor: wrapping up the debate

by ChessBase
5/11/2009 – The discussion regarding the K-factor – the rate at which ratings go up or down when they are calculated – reaches its climax with a wrap-up article by Dr John Nunn, grandmaster and mathematician, who evaluates the arguments that have been presented by the different parties. After this it is up to FIDE, which has already initiated positive steps to settle the matter. Final installment.


Wrapping up the K-factor debate

GM Dr John Nunn, England

My short piece on the increase of the K-factor in the Elo system has provoked a number of responses, which I have read carefully. Some of the correspondents made interesting and worthwhile points, and I would like to reply, giving more detail than I did in my original comment, which was intended to be short and to the point.

I did read the 2002 ChessBase article by Jeff Sonas. In this article Sonas analyses the results of his rating system and provides evidence that it better predicts future results than the Elo system. It may be true that his system is better in this respect than the Elo system; it’s a complex question. However, it’s important to realise that the Sonas system is fundamentally different to the Elo system – for example, it is based on a linear function rather than a normal probability distribution. What FIDE are going to do is not to adopt the Sonas system in its entirety, but to simply increase the K-factor in their current system. Therefore the possible merits of the Sonas system are not relevant to the current discussion. I stick to my original point, that there’s no proof that increasing K to 20 will improve the current system.
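For readers who want the mechanics spelled out, the K-factor enters the calculation as a simple multiplier on every rating change. FIDE's tables are derived from the normal distribution, as noted above; the logistic expression below is the standard computational approximation:

$$R' = R + K\,(S - E), \qquad E = \frac{1}{1 + 10^{(R_{\text{opp}} - R)/400}}$$

where S is the actual score of a game (1, ½ or 0) and E the expected score given the two ratings.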

Several correspondents pointed out that the frequency of rating lists does affect your rating, and therefore my original comment to the contrary was inaccurate. For example, suppose X is a young player with a published rating of 2400. X has improved rapidly over the past few months and is now 2500 strength. For the sake of simplicity assume that X achieves a performance rating of 2500 in every tournament he now plays. Then each tournament results in a rating increase. These rating increases are, as it were, ‘in the bank’, but have no effect on X’s rating until the next list is published. If this publication is far in the future (at one time Elo lists only appeared once a year, for example) X’s rating will make a large jump when the next list is published, and may even be above 2500; in other words, his published rating may be higher than all his performance ratings. If the rating lists are published more frequently, say monthly, then X’s rating will increase to say 2415 after one month. Because his rating is now higher, the next time he plays at 2500 his rating increase will be less and his rating will eventually converge on 2500.
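The example is easy to verify with a few lines of code. The sketch below assumes the update rule given above and, purely for simplicity, models X's opposition as rated at his own published rating; the game counts are illustrative, not real data.

```python
def expected(r_player, r_opponent):
    """Expected score under the logistic approximation."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def simulate(published, true_strength, k, games_per_period, periods):
    """Rate a run of games for player X, republishing the rating only at
    the end of each period; within a period every game is rated at the
    same stale published rating."""
    for _ in range(periods):
        banked = 0.0
        for _ in range(games_per_period):
            # X scores like a `true_strength` player against opposition
            # rated at his published rating (whose expected score is 0.5).
            banked += k * (expected(true_strength, published) - 0.5)
        published += banked  # the 'in the bank' changes appear all at once
    return published

# 60 games in a year at K=10: one annual list vs. twelve monthly lists.
print(simulate(2400, 2500, 10, games_per_period=60, periods=1))  # ~2484
print(simulate(2400, 2500, 10, games_per_period=5, periods=12))  # ~2459
```

With K=20 and a single annual list the banked jump is about 168 points, giving a published rating near 2568 – higher than any of X's 2500-level performances, which is exactly the extreme anomaly described above.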

I will call this phenomenon ‘rating lag’ because it results from the lag between the playing of a game and its effect on published Elo ratings. The longer the interval between rating lists, the greater the effect of rating lag. When lists were published only once a year its influence was obvious. Rating lag is undesirable because it distorts ratings. Rapidly improving players get higher ratings than they would if ratings were calculated on a continuous basis, while rapidly declining players end up with lower ratings. As noted above, in extreme cases it can lead to serious anomalies such as a player having a rating higher than all his performance ratings. However, these extreme cases were unlikely even with annual lists, and with the current three-monthly lists they are virtually impossible. Rating lag under the current three-monthly system affects only a very small number of ratings and then only slightly. Of course, it’s possible to construct artificial examples (as some correspondents did) but with real-world data the effects of rating lag are almost always very small (I will return to this later).

It is worth noting that rating lag is not an inherent part of the Elo system; it results from the way the system is administered. Decades ago, tournament results had to be sent in by post and the calculations were all done by hand, so it wasn't feasible to update the list more often than once a year. Now, with results submitted over the Internet and all calculations done by computer, it is both possible and desirable to issue lists far more often, and that is what FIDE has done. The switch to one-monthly lists will have the desirable effect of further reducing rating lag, as well as giving players more up-to-date information.

Rating lag is a phenomenon which is basically a function of the frequency of rating lists and is not directly connected to the K-factor, although a higher K-factor makes it more noticeable. GM Bartlomiej Macieja’s argument is that he wants to increase the K-factor so as to have the same amount of rating lag with one-monthly lists as with the previous three-monthly lists. This is missing the point. Rating lag is a defect which should be eliminated so far as possible, not magnified. Also, his argument about two players with ratings of 2500 and 2600 who then have the same results is flawed. With a very low K-factor (say K=1) the gap between the players will remain large for a long time. But with a very high K-factor (say K=100) it is very likely that the 2600 player will end up with a lower published rating than the 2500 player, which is even more anomalous. Depending on the details of the results, there will be a K-factor which results in the two ratings being the same after a certain period of time, but this so-called ‘optimum’ K-factor depends entirely on the precise figures chosen, and by repeating the argument with different initial conditions and results you can end up with an ‘optimum’ K-factor of any value you like.
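This is again easy to check numerically. The sketch below assumes both players score 50% over 20 games against 2550-rated opposition, with every game falling inside a single rating period so that the whole change lands in one batch update; all the figures are illustrative.

```python
def expected(r_player, r_opponent):
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def after_one_list(rating, k, games, score, opposition):
    # One batch update at publication: R' = R + K * sum(S - E)
    return rating + k * games * (score - expected(rating, opposition))

for k in (1, 20, 100):
    print(k,
          round(after_one_list(2500, k, 20, 0.5, 2550)),   # the 2500 player
          round(after_one_list(2600, k, 20, 0.5, 2550)))   # the 2600 player

# K=1:   2501 and 2599 - the gap barely moves.
# K=20:  2529 and 2571.
# K=100: 2643 and 2457 - the published ratings cross over.
```

In this toy example the two ratings coincide at K ≈ 35, but changing the number of games or the strength of the opposition shifts that 'optimum' arbitrarily – which is precisely the objection above.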

This is because rating lag and K-factor are not directly related. A higher K-factor will magnify rating lag, since it magnifies all rating changes, but of the two changes (increasing the K-factor and having more frequent lists) increasing the K-factor has a much more profound effect. The reason is that as lists are published more and more often, the rating will converge to what one would get with a theoretical ‘continuous’ rating system, i.e., one which is updated every time a game is played. This convergence implies that increasing the frequency of lists has a smaller and smaller effect the more lists there are. Increasing the K-factor, however, has a linear effect; twice the K-factor, twice the rating change; three times the K-factor, three times the rating change.
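Since each published change takes the form

$$\Delta R = K \sum_i (S_i - E_i),$$

a player who banks a (hypothetical) score surplus of 1.5 points over a rating period gains 15, 30 or 42 rating points under K = 10, 20 or 28 respectively, while halving the interval between lists merely splits the same games into two smaller, partly self-correcting updates.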

An example is probably in order. I chose a player at random from the FIDE rating list (actually I looked through it alphabetically until I found a player with a 2500+ rating who had played fairly actively during 2008). The person I chose was Farid Abbasov. I then listed all the games he played in 2008 and, starting at 1 January 2008, I calculated how his rating would have changed by 1 January 2009 with different values of K and different frequencies of rating lists. It is worth noting that this player should show more of a 'rating lag' effect than most, as he played as many as 80 games in 2008 and also improved significantly in this period; these are precisely the circumstances in which rating lag would be expected to be most significant.

Here's the result:

Rating at 1 January 2009

                      K=10    K=20    K=28
  12-monthly lists    2576    2646    2701
  6-monthly lists     2559    2588    2598
  3-monthly lists     2557    2581    2585
  Monthly lists       2554    2572    2577

The table shows some of the effects already mentioned: the more often the rating list is updated, the less Abbasov's rating increases, but this effect is rather small; indeed, with K=10 the rating barely changes once lists are published at least twice a year. However, the change resulting from increasing K from 10 to 20 is much more substantial than any effect of altering the frequency of rating lists. And remember that this player was chosen to emphasise the effects of 'rating lag'. Readers may note that the 2557 rating differs from FIDE's published rating for Abbasov on the January 2009 list, which was 2565. This is because FIDE included a tournament he played in December 2007 in their 2008 calculations, whereas I went by the actual dates the games were played and so excluded this tournament (there were a couple of other minor effects due to FIDE's cut-off dates for rating lists).

Of course, it’s only one player. It would be interesting for someone with more computing skills than I have to extend this to a wider range of players, although I wouldn’t expect the results to be much different.
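Such an extension is mostly a matter of data preparation. The sketch below shows the core recalculation, assuming each player's year is supplied as a list of rating periods, each period a list of (opponent_rating, score) pairs; the function name and data format are hypothetical, not FIDE's.

```python
def expected(r_player, r_opponent):
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def replay(start_rating, periods, k):
    """Recalculate a run of results. Games within a period are all rated
    at the same published rating; the accumulated change is applied only
    when the next list appears."""
    rating = start_rating
    for games in periods:
        rating += sum(k * (score - expected(rating, opp))
                      for opp, score in games)
    return rating
```

Comparing list frequencies then amounts to regrouping the same games – one period of 80 games for an annual list, four for quarterly lists, twelve for monthly – and rerunning replay() with K = 10, 20 and 28.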

In summary, I stick to my two basic points: there is no proof that increasing the K-factor will improve the rating system, and there is no genuine connection between the K-factor and the frequency of rating lists. Both have an effect on ratings, but these effects are largely independent.

I considered my point about ‘rating cheats’ to be the most significant remark in my original piece, but to my surprise few correspondents took this up. Recent years have already seen a regrettable increase in cases of dishonesty in chess, and my concern is that with ratings having an ever-greater importance, making it easier to cheat on your rating is only going to encourage more people to do so. For professional players, their rating is perhaps the main factor in influencing their career, and their livelihood and the welfare of their families depends on it. FIDE were very slow to respond to the possibility of cheating using electronic devices; indeed, it could be said that they have still not fully got to grips with this issue. I hope that they will carefully consider the implications of the K=20 change and not exacerbate the problems that already exist in the chess world.

Dr John Nunn (born April 25, 1955) is one of the world’s best-known chess players and authors. He showed early promise in winning the British Under-14 Championship at the age of twelve and captured several other junior titles before winning the European Junior Championship in 1974-5. At the same time he was studying mathematics at Oxford, after entering the university at the unusually early age of 15. In 1978 he achieved a double success by gaining both his doctorate, with a thesis in algebraic topology, and the GM title by winning a tournament in Budapest. In 1981 he abandoned academic life for a career as a professional chess player. In 1984 he gained three individual gold medals at the Thessaloniki Olympiad, two for his 10/11 performance on board two for England and one for winning the problem-solving event held on a free day.

John's best period for over-the-board play was 1988-91. In 1989 he was ranked in the world top ten, and in the same year he finished sixth in the GMA World Cup series, which included virtually all the world’s top players. He also won the tournament at Wijk aan Zee outright in 1990 and 1991, to add to a previous tie for first place in 1982.

John Nunn was also active as a chess author in the late 1980s and 1990s, twice winning the prestigious British Chess Federation Book of the Year prize. When his playing career started to wind down in the 1990s, he devoted more energy to chess publishing and in 1997, dissatisfied with the existing chess publishers, he (together with Murray Chandler and Graham Burgess) founded Gambit Publications, which now has more than 200 books in print. When he effectively retired from over-the-board play in 2003, he revisited an early interest in chess problems and in 2004 won the World Chess Problem Solving Championship, at the same time adding a GM solving title to his earlier over-the-board title. In 2005 and 2006 he was part of the British team which won the team World Problem Solving Championship. In 2007 he repeated his earlier success by winning the World Chess Problem Solving Championship for a second time.

In 1995 he married the German woman player Petra Fink. They have one son, Michael, currently aged ten.

This, for the time being, is our final publication on the subject. Statistician Jeff Sonas will be performing some practical experiments with past FIDE ratings data, and FIDE itself is exploring the options available at the current time. There will be a meeting in June to decide the further course of action. Naturally you will be informed about any developments in this area. Until then, please accept that we have illuminated all sides of the argument adequately and will not be publishing further opinions or replies to any of the authors.
