Nunn on the K-factor: show me the proof!

4/30/2009 – With the debate raging over FIDE's decision to change or not to change the K-factor used in calculating players' ratings, we are glad to receive an important message from our voice-of-reason grandmaster. Dr John Nunn says "there seems no real evidence that K=20 will result in a more accurate rating system, while there are a number of risks and disadvantages." His explanation and reader feedback.

GM Dr. John Nunn, England

I just thought I’d make some comments on the K-Factor debate.

Changing the K-factor from 10 to 20 for the whole Elo list is a radical change, the most radical in the 40-year history of the Elo system. What is curious is that in all the debate nobody seems to have set out exactly what the supposed advantages to this change are. Certainly, people’s ratings will go up and down faster, but why should this make the ratings ‘better’?

Jeff Sonas states "Using a more dynamic K-factor (24 instead of 10) would result in ratings that more accurately predict players' future results, and thus I would call those ratings more ‘accurate’." But where’s the proof of this? It strikes me that this statement is in fact very unlikely to be true. The performance of players varies considerably from one tournament to the next. Increasing the K-factor effectively places greater weight on the most recent results, so that the rating is based less on an average of results and more on the latest tournament. Since this is subject to a wide random variation, there seems no particular reason to believe that the rating will more accurately reflect a player’s strength or better predict future results.

The argument put forward by GM Bartlomiej Macieja that an increase in the K-factor is a necessary consequence of more frequent rating lists doesn't stand up to examination. The K-factor and the frequency of rating lists are unrelated to one another. Rating change depends on the number of games you have played. If you have played 40 games in 6 months, it doesn't make any difference whether FIDE publishes one rating list at the end of six months or one every day; you've still played the same number of games and the change in your rating should be the same.

I am against the increase in the K-factor, for two reasons. The first is that the Elo system has worked well for 40 years, and while all rating systems have their defects, during this time the Elo system has gained respect as a good indicator of current playing strength. Why then change it in such a dramatic fashion?

The second reason doesn’t seem to have been mentioned so far, but I think this is the reason why many top players are against the K=20 change. With qualification to many important events, including the World Championship, being based on ratings, obtaining a higher rating can be extremely valuable. In the past there has been a certain amount of ‘rating cheating’, ranging from the buying of individual games to the construction of entire imaginary tournaments. With the stakes so high, this will doubtless also occur in the future. The problem is that it is much easier to cheat on your rating with K=20. With 20 points at stake in each game, it only takes a small amount of cheating to cause a massive surge in your rating. What the top players are concerned about is that the places in elite tournaments and even the World Championship which should rightfully go to them will instead go to ‘rating cheats’. Of course, you can cheat on your rating with K=10 too, but why make the task so much easier?
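To make "20 points at stake in each game" concrete, here is a minimal sketch of how much one result is worth under the usual update rule, new rating = old rating + K × (score − expected score). The logistic expected-score formula and the function names are assumptions made for illustration; FIDE itself uses a lookup table.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic approximation to the Elo expected score (an assumption;
    FIDE uses a conversion table that gives similar values)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def result_swing(k: float, rating: float, opponent: float) -> float:
    """Rating difference between winning and losing one game.

    The expected score cancels out, so the swing equals K whatever
    the opponent's rating is.
    """
    we = expected_score(rating, opponent)
    change_if_win = k * (1.0 - we)
    change_if_loss = k * (0.0 - we)
    return change_if_win - change_if_loss


# Hypothetical ratings, chosen only for illustration.
for k in (10, 20):
    print(k, round(result_swing(k, 2650, 2600), 6))
```

Under these assumptions, turning a single loss into a win is worth K points regardless of the opposition, so the value of each arranged result doubles when K goes from 10 to 20.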

On the whole, there seems no real evidence that K=20 will result in a more accurate rating system, while there are a number of risks and disadvantages.


GM Michal Krasenkow, Gorzow Wlkp., Poland
For me personally the FIDE decision to increase the K-factor was a bolt from the blue. Before the Dresden Congress it was only discussed in a purely theoretical manner within a small group of specialists. And suddenly it was put into practice – without wider discussion in the world chess community, without any computer simulation (was it really difficult to recalculate the events of, say, 2006-2008 according to the proposed rules?). Is it the right way to make such revolutionary changes?

My opinion is that the K factor can and should be increased, to 15 or 20 – that can be a subject of discussion and – let me repeat – a computer simulation. What is absolutely ridiculous is the introduction of K=30 for "before 2400" players. One of the main ideas of the Elo system was that the winner of a single game got as many rating points as the loser lost. That was infringed by the introduction of K=15 for players who have never reached 2400, which IMHO was the main cause of rating inflation in recent decades (until the most recent years, when the "350 rule", combined with game-by-game calculation, became an even stronger inflation factor). What will happen when we introduce K=30 – God knows. Definitely, ratings of "before 2400" players will have nothing to do with their playing strength, rather with the number of games they will manage to play during the two-month rating period. 36 games, i.e. two open tournaments a month – nothing special for an active player, he needn't do it "intentionally" – will lead to a 1.5-fold rating "overleap" as Mr. Lorscheid showed in his example (i.e. a player rated 2300, with an average performance of 2400, will get 2450!). I think it is obvious to everyone that any inertia in rating changes is better than an overleap. Besides, the inflation of ratings, with a lot of players getting 2400+ with K=30 and then dropping back with K=20, will increase to a level no-one can even predict. Therefore, in the present situation I fully support the decision to halt the changes. Then the wide discussion should be reopened, a computer simulation of different versions of changes (K=15 for all; K=20 for all etc.) should be made; only after that a new rating calculation system can be introduced.
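The "overleap" arithmetic can be checked with a short sketch. The assumptions here are not taken from Mr. Lorscheid's example, whose exact setup is not given in the letter: all 36 opponents are rated 2300, the player scores exactly what a 2400-strength player would expect against them, the rating stays frozen at 2300 for the whole period, and the logistic formula stands in for FIDE's table. The function names are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic approximation to the Elo expected score (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_after_period(rating, k, true_strength, opponent, games):
    """Rating after one period in which the list rating is held fixed.

    Each game is scored at the rate a player of `true_strength` would
    achieve, while the expected score uses the published `rating`.
    """
    scored_per_game = expected_score(true_strength, opponent)
    expected_per_game = expected_score(rating, opponent)
    return rating + k * games * (scored_per_game - expected_per_game)


# Compare several K values for a 2300 player performing at 2400 over 36 games.
for k in (10, 15, 20, 30):
    print(k, round(rating_after_period(2300, k, 2400, 2300, 36)))
```

With these assumptions the player gains roughly 0.14 points per game over expectation, so K=30 yields about 2451, beyond the 2400 performance, while K=10 yields about 2350.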

Elmer Dumlao Sangalang, Manila, the Philippines
The Current Rating Rn is given by the formula, Rn = Ro + K(W - We). Ro (original rating) is based on No games which is dependent on K (the rating point value of a single game). If K = 10, No = 80. If K = 15, No = 50. If K = 25, No = 30. During a competition, the player plays N new games. The Current Rating formula performs the operation of averaging the latest performance in N games into the prior rating so as to smoothly diminish the effect of the earlier performances, while retaining the full contribution of the latest performance. For the smooth blending of the new into the old, the number of games to be newly rated should not exceed the number of games on which Ro is based.
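The formula Rn = Ro + K(W - We) can be written out as a small sketch, reading W as the points actually scored and We as the points expected. The logistic expected-score function and all names below are assumptions for illustration; FIDE derives We from a table instead.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic approximation to the expected score for one game (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def current_rating(old_rating: float, k: float, opponents, score: float) -> float:
    """Rn = Ro + K * (W - We) for one rating period.

    `opponents` holds the opponents' ratings, `score` is W (points scored);
    We is the sum of the per-game expected scores.
    """
    we = sum(expected_score(old_rating, opp) for opp in opponents)
    return old_rating + k * (score - we)


# Hypothetical event: a 2400 player scores 6/9 against nine 2400-rated opponents.
print(current_rating(2400, 10, [2400] * 9, 6))  # 2415.0
print(current_rating(2400, 25, [2400] * 9, 6))  # 2437.5
```

The same 6/9 result moves the rating 15 points with K = 10 and 37.5 points with K = 25, which is the sense in which a larger K gives the latest event more weight relative to the games behind Ro.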

The reliability of a rating depends on the number of games used in the calculation of the rating. With 30 games, the rating is 95% reliable. With 50 games, 98.8%. With 80 games, 99.7%. When K = 25, we are satisfied with a rating that's based on 30 games, which is only 95% credible. With K = 15, we settle for a rating based on 50 games, which is 98.8% credible. With K = 10, we want a rating to be based on 80 games so that it will be 99.7% credible.

The number of times the Rating List is produced has nothing to do with the choice of K. The claim that two players should have the same rating if they score the same number of points against a common set of opponents in the same tournament is equivalent to demanding that their Performance Rating in the single event count as their Current Rating. The Current Rating is different from the Performance Rating. The Current Rating is made up of several Performance Ratings.

FIDE should not support the increase in the K Factor if it does not want the reliability of ratings to be diminished.

Johan Ostergaard, Copenhagen, Denmark
Jeff Sonas is brilliant – I only wish that the people at FIDE spent more time reading and understanding his results. Does he really do all this work for free?

John Rood, Holbrook, MA, USA
This is the infamous Jeff Sonas who around 2005 or so predicted it would be a long time before computers surpassed the top grandmasters in playing skill, right?

Bostjan Sirnik, Ljubljana, Slovenia
As a response to this debate I have decided to share with you my concerns about the effects of the new formulas for calculating chess ratings on rating inflation/deflation. I suggest empirically testing the following "common sense" conclusions that can be drawn regarding the proposed new rating system:

  1. The increase of the K-factor will boost the variance of the player's individual performance.

  2. Consequently players with a rating under 2400 and a higher K-factor will have a much higher probability of eventually reaching the limit of 2400 points and thus getting a lower K-factor (without significant improvement of their game).

  3. Consequently the percentage of players with a rating under 2400 and with the lower K-factor will become much higher than it is today.

  4. Basically this will lead to an important deflation of rating points in the pool of "just under 2400" players. It is even possible that this will undermine the stability (and validity) of the whole rating system.

Although these concerns cover only a minor part of the whole picture I believe that they should be discussed and above all empirically examined. I also want to point out my impression that FIDE chose a highly irresponsible and unprofessional way to put into practice such an important and revolutionary set of changes as the new formulas behind the rating system certainly are. In his letter published here at chessbase.com Jeff Sonas comments that "the question of rating inflation is a very difficult one and really the only way to tackle it is to look at actual data and see the result of various approaches". GM Bartlomiej Macieja also asked FIDE to perform statistical studies with the available empirical data: "Instead of waiting for a year or two in order to show consequences of the change of the value of the K-factor, it is much better and faster to calculate results from last two or even five years using the new value of the K-factor."

It's incomprehensible that FIDE has the empirical data about chess ratings from the past but is not willing to use them before making such decisions. If FIDE lacks the know-how then at least it could make these historical data public and invite chess players with the appropriate expertise (e.g. Mr. Sonas) to contribute their analyses and conclusions. Obviously corrections to the current rating system must be made, but in this case forcing ad hoc solutions without scientific justification will do more damage than good.

Mark Adams, Wales
I have been a rating officer for 25 years and I am convinced that we, as chess players, have lost the plot with ratings. I suggest we get rid of ratings and go back to the old system of classes, where progress is determined by norms. Imagine sitting down to a game and not worrying about losing rating points! A cure for the 'too many draws' issue? There will still be a need to determine some form of ranking for the top players – why not use a system similar to tennis where you get points for winning events? OK, too radical for most, as ratings are ingrained in a chess player's psyche. But try to think above this and maybe you'll agree?
