Wrapping up the K-factor debate
GM Dr John Nunn, England
My short piece on the increase of the K-factor in the Elo system has provoked
a number of responses, which I have read carefully. Some of the correspondents
made interesting and worthwhile points, and I would like to reply, giving more
detail than I did in my original comment, which was intended to be short and
to the point.
I did read the 2002
ChessBase article by Jeff Sonas. In this article Sonas analyses the results
of his rating system and provides evidence that it better predicts future results
than the Elo system. It may be true that his system is better in this respect
than the Elo system; it’s a complex question. However, it’s important to realise
that the Sonas system is fundamentally different to the Elo system – for
example, it is based on a linear function rather than a normal probability distribution.
What FIDE are going to do is not to adopt the Sonas system in its entirety,
but to simply increase the K-factor in their current system. Therefore the possible
merits of the Sonas system are not relevant to the current discussion. I stick
to my original point, that there’s no proof that increasing K to 20 will improve
the current system.
Several correspondents pointed out that the frequency of rating lists does
affect your rating, and therefore my original comment to the contrary was inaccurate.
For example, suppose X is a young player with a published rating of 2400. X
has improved rapidly over the past few months and is now 2500 strength. For
the sake of simplicity assume that X achieves a performance rating of 2500 in
every tournament he now plays. Then each tournament results in a rating increase.
These rating increases are, as it were, ‘in the bank’, but have no effect on
X’s rating until the next list is published. If this publication is far in the
future (at one time Elo lists only appeared once a year, for example) X’s rating
will make a large jump when the next list is published, and may even be above
2500; in other words, his published rating may be higher than all his performance
ratings. If the rating lists are published more frequently, say monthly, then
X’s rating will increase to say 2415 after one month. Because his rating is
now higher, the next time he plays at 2500 his rating increase will be less
and his rating will eventually converge on 2500.
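To make the convergence concrete, here is a minimal sketch in Python of a rapidly improving player whose games are rated in batches, one batch per published list. The schedule (five games a month against 2500-rated opposition, scoring 50%, with K=10) is purely illustrative, and the expected score uses the common logistic approximation rather than FIDE’s tables, but the qualitative behaviour is the same: with an annual list the whole gain is calculated against the old 2400 figure, while with monthly lists each successive gain is a little smaller.

```python
# A minimal sketch of 'rating lag': a player published at 2400 who now performs
# at 2500 strength, with games rated in batches, one batch per published list.
# The schedule and K-factor are purely illustrative, and the expected score
# uses the common logistic approximation rather than FIDE's tables.

def expected_score(own_rating, opp_rating):
    return 1.0 / (1.0 + 10 ** ((opp_rating - own_rating) / 400.0))

def rate_batch(published_rating, games, k):
    """Rate a batch of games against a fixed published rating (as happens
    within one rating period) and return the total change to apply when
    the next list appears."""
    return sum(k * (score - expected_score(published_rating, opp))
               for opp, score in games)

K = 10
month_batch = [(2500.0, 0.5)] * 5   # five 50% scores against 2500-rated opponents

# Monthly lists: the published rating is updated after every batch,
# so each successive month's gain is a little smaller.
monthly = 2400.0
for _ in range(12):
    monthly += rate_batch(monthly, month_batch, K)

# Annual list: all sixty games are rated against the old 2400 figure,
# and the whole gain appears at once when the list is published.
annual = 2400.0 + rate_batch(2400.0, month_batch * 12, K)

print(f"After one year: monthly lists {monthly:.0f}, annual list {annual:.0f}")
```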
I will call this phenomenon ‘rating lag’ because it results from the lag between
the playing of a game and its effect on published Elo ratings. The longer the
interval between rating lists, the greater the effect of rating lag. When lists
were published only once a year its influence was obvious. Rating lag is undesirable
because it distorts ratings. Rapidly improving players get higher ratings than
they would if ratings were calculated on a continuous basis, while rapidly declining
players end up with lower ratings. As noted above, in extreme cases it can lead
to serious anomalies such as a player having a rating higher than all his performance
ratings. However, these extreme cases were unlikely even with annual lists,
and with the current three-monthly lists they are virtually impossible. Rating
lag under the current three-monthly system affects only a very small number
of ratings and then only slightly. Of course, it’s possible to construct artificial
examples (as some correspondents did) but with real-world data the effects of
rating lag are almost always very small (I will return to this later).
It is worth noting that rating lag is not an inherent part of the Elo system;
it results from the way it is administered. Decades ago, tournament results
had to be sent in by post and the calculations were all done by hand, so it
wasn’t feasible to update the list more often than once a year. Now, with results
sent in by Internet and all calculations done by computer, it is both possible
and desirable to issue the lists far more often, and that is what FIDE has done.
The switch to one-monthly lists will have the desirable effect of further reducing
rating lag, as well as giving players more up-to-date information.
Rating lag is a phenomenon which is basically a function of the frequency of
rating lists and is not directly connected to the K-factor, although a higher
K-factor makes it more noticeable. GM Bartlomiej Macieja’s argument is that
he wants to increase the K-factor so as to have the same amount of rating lag
with one-monthly lists as with the previous three-monthly lists. This is missing
the point. Rating lag is a defect which should be eliminated so far as possible,
not magnified. Also, his argument about two players with ratings of 2500 and
2600 who then have the same results is flawed. With a very low K-factor (say
K=1) the gap between the players will remain large for a long time. But with
a very high K-factor (say K=100) it is very likely that the 2600 player will
end up with a lower published rating than the 2500 player, which is even more
anomalous. Depending on the details of the results, there will be a K-factor
which results in the two ratings being the same after a certain period of time,
but this so-called ‘optimum’ K-factor depends entirely on the precise figures
chosen, and by repeating the argument with different initial conditions and
results you can end up with an ‘optimum’ K-factor of any value you like.
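A small sketch, with hypothetical figures, makes the point: two players rated 2500 and 2600 achieve identical results (a 2550 performance, made up of 50% scores against 2550-rated opposition) over ten games per three-month list. With K=1 the 100-point gap barely moves in a year; with K=100 the ratings overshoot so badly that the former 2600 player appears below the former 2500 player on alternate lists.

```python
# A sketch of the two-player example: players rated 2500 and 2600 who then
# achieve identical results -- here a 2550 performance (50% against 2550-rated
# opposition), ten games per three-month list, over four lists.  The figures
# are hypothetical; only the qualitative behaviour matters.

def expected_score(own_rating, opp_rating):
    return 1.0 / (1.0 + 10 ** ((opp_rating - own_rating) / 400.0))

def next_list_rating(rating, k, games=10, opp=2550.0, score=0.5):
    """Rating on the next list: all games in the period are rated against
    the currently published rating."""
    return rating + games * k * (score - expected_score(rating, opp))

for k in (1, 20, 100):
    a, b = 2500.0, 2600.0
    history = []
    for _ in range(4):
        a, b = next_list_rating(a, k), next_list_rating(b, k)
        history.append((round(a), round(b)))
    print(f"K={k:3}: {history}")
```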
This is because rating lag and K-factor are not directly related. A higher
K-factor will magnify rating lag, since it magnifies all rating changes, but
of the two changes (increasing the K-factor and having more frequent lists)
increasing the K-factor has a much more profound effect. The reason is that
as lists are published more and more often, the rating will converge to what
one would get with a theoretical ‘continuous’ rating system, i.e., one which
is updated every time a game is played. This convergence implies that increasing
the frequency of lists has a smaller and smaller effect the more lists there
are. Increasing the K-factor, however, has a linear effect; twice the K-factor,
twice the rating change; three times the K-factor, three times the rating change.
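(For reference, the rating change for a single game in the Elo system is simply K multiplied by the difference between the actual score and the expected score, and the expected score depends only on the difference between the two players’ ratings; the change from each game is therefore directly proportional to K.)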
An example is probably in order. I chose a player at random from the FIDE rating list
(actually I looked through it alphabetically until I found a player with a 2500+
rating who had played fairly actively during 2008). The person I chose was Farid
Abbasov. I then listed all the games he played in 2008 and, starting at 1 January
2008, I calculated how his rating would have changed by 1 January 2009 with
different values of K and different frequencies of rating lists. It is worth
noting that this player should show more of a ‘rating lag’ effect than most,
as he played as many as 80 games in 2008 and also showed a significant improvement
in this period; these are precisely the circumstances in which rating lag would
be expected to be most significant.
Here’s the result:

Rating at 1st January 2009

                      K=10    K=20    K=28
12-monthly lists      2576    2646    2701
6-monthly lists       2559    2588    2598
3-monthly lists       2557    2581    2585
Monthly lists         2554    2572    2577
The table shows some effects already mentioned; the more often the rating list
is updated, the less Abbasov’s rating increases, but this effect is rather small;
indeed, one can see that with K=10 the rating doesn’t change much once the lists
are at least twice a year. However, the change resulting from increasing K from 10
to 20 is much more substantial than any effect resulting from altering the
frequency of rating lists. And remember that this player was chosen to emphasise
the effects of ‘rating lag’. Readers may note that the 2557 rating differs from
FIDE’s published rating for Abbasov on the January 2009 list, which was 2565.
This is because FIDE included a tournament he played in December 2007 in their
2008 calculations, whereas I went by the actual dates the games were played
and so this tournament was excluded (there were a couple of other minor effects
due to FIDE’s cut-off dates for rating lists).
Of course, it’s only one player. It would be interesting for someone with more
computing skills than I have to extend this to a wider range of players, although
I wouldn’t expect the results to be much different.
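For anyone tempted to try, a sketch of the sort of calculation involved is given below. It assumes the year’s games are available as (month, opponent rating, score) records; the function and data names are my own invention, and the logistic approximation again stands in for FIDE’s tables.

```python
def expected_score(own_rating, opp_rating):
    return 1.0 / (1.0 + 10 ** ((opp_rating - own_rating) / 400.0))

def year_end_rating(start_rating, games, k, months_per_list):
    """Recompute a player's rating over one year.  All games inside a list
    period are rated against the rating published at the start of that period;
    the total change is applied when the next list appears.
    `games` is a list of (month, opponent_rating, score) tuples, month 0-11."""
    rating = start_rating
    for period_start in range(0, 12, months_per_list):
        period = [g for g in games
                  if period_start <= g[0] < period_start + months_per_list]
        rating += sum(k * (score - expected_score(rating, opp))
                      for _, opp, score in period)
    return rating

# Hypothetical usage, assuming the games have been read from some database
# (start_rating would be the player's published rating on the first list of the year):
# games = load_games("Abbasov", year=2008)
# for k in (10, 20, 28):
#     for months in (12, 6, 3, 1):
#         print(k, months, round(year_end_rating(start_rating, games, k, months)))
```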
In summary, I stick to my two basic points: that there’s no proof that increasing
the K-factor will improve the rating system, and that there’s no genuine connection
between the K-factor and the frequency of rating lists. Both have an effect on ratings,
but these effects are largely independent.
I considered my point about ‘rating cheats’ to be the most significant remark
in my original piece, but to my surprise few correspondents took this up. Recent
years have already seen a regrettable increase in cases of dishonesty in chess,
and my concern is that with ratings having an ever-greater importance, making
it easier to cheat on your rating is only going to encourage more people to
do so. For professional players, their rating is perhaps the main factor in
influencing their career, and their livelihood and the welfare of their families
depends on it. FIDE were very slow to respond to the possibility of cheating
using electronic devices; indeed, it could be said that they have still not
fully got to grips with this issue. I hope that they will carefully consider
the implications of the K=20 change and not exacerbate the problems that already
exist in the chess world.
Dr John Nunn (born April 25, 1955) is one of the world’s
best-known chess players and authors. He showed early promise by winning
the British Under-14 Championship at the age of twelve and captured several
other junior titles before winning the European Junior Championship in
1974-5. At the same time he was studying mathematics at Oxford, after
entering the university at the unusually early age of 15. In 1978 he achieved
a double success by gaining both his doctorate, with a thesis in algebraic
topology, and the GM title by winning a tournament in Budapest. In 1981
he abandoned academic life for a career as a professional chess player.
In 1984 he gained three individual gold medals at the Thessaloniki Olympiad,
two for his 10/11 performance on board two for England and one for winning
the problem-solving event held on a free day.

John's best period for over-the-board play was 1988-91. In 1989 he was
ranked in the world top ten, and in the same year he finished sixth in
the GMA World Cup series, which included virtually all the world’s top
players. He also won the tournament at Wijk aan Zee outright in 1990 and
1991, to add to a previous tie for first place in 1982.
John Nunn was also active as a chess author in the late 1980s and 1990s,
twice winning the prestigious British Chess Federation Book of the Year
prize. When his playing career started to wind down in the 1990s, he devoted
more energy to chess publishing and in 1997, dissatisfied with the existing
chess publishers, he (together with Murray Chandler and Graham Burgess)
founded Gambit Publications,
which now has more than 200 books in print. When he effectively retired
from over-the-board play in 2003, he revisited an early interest in chess
problems and in 2004 won the World Chess Problem Solving Championship,
at the same time adding a GM solving title to his earlier over-the-board
title. In 2005 and 2006 he was part of the British team which won the
team World Problem Solving Championship. In 2007 he repeated his earlier
success by winning the World Chess Problem Solving Championship for a
second time.
In 1995 he married the German woman player Petra Fink. They have one
son, Michael, currently aged ten.
This, for the time being, is our final publication on the subject. Statistician
Jeff Sonas will be performing some practical experiments with past FIDE ratings
data, and FIDE itself is exploring the options available at the current time.
There will be a meeting in June to decide the further course of action.
Naturally you will be informed about any developments in this area. Until then
please accept that we have illuminated all sides of the argument adequately
and will not publish any further opinions or replies to any of the authors.