Impressions from FIDE rating conference 2010

6/10/2010 – The FIDE ratings conference, held last week in Athens, Greece, spent quite a bit of time discussing the problem of rating inflation. Two different opinions met head on: one of chess statistician Jeff Sonas, USA, and one represented by Polish GM Bartlomiej Macieja. The subject matter is not easy to understand, but our colleague Michalis Kaloumenos made a serious effort to do so. Food for thought.


By Michalis Kaloumenos

I was well prepared to attend the FIDE rating conference held in Athens, Greece from June 1st to June 4th. I had made a plan and written down questions whose answers would help me complete my task. I introduced myself and told the participants that I wanted to write an article, in simple words, about the rating system and the proceedings of the meeting, so that ordinary people could understand the debate. “Good luck,” said Stewart Reuben, and I soon found out that his wish was wise: there were no easy answers to my questions!

The participants of FIDE rating conference 2010 (left to right): Mikko Markkula (Chairman of FIDE Qualification Commission), Stewart Reuben (Secretary of FIDE QC), Nick Faulks (Councillor of FIDE QC), David Jarrett (FIDE Executive Director and moderator of the panel), Jeff Sonas, GM Bartlomiej Macieja and Vladimir Kukaev (Director of FIDE Elista Office)

I joined the Wednesday afternoon session, when Jeff Sonas presented his graphs on the “rating inflation” problem. The well-known statistician has spent years analyzing the rating system using the FIDE lists as input. In order to apply his own ideas retrospectively to past years, he even reconstructed tournament tables from chess databases, because prior to 2006 FIDE required only a player’s final score and the average Elo of his opponents in order to calculate ratings, not the individual results in detail. However, despite so many years of analysis, the “rating inflation” problem lacks a widely accepted definition, and a situation can only be accepted as a problem once a definition identifies it. So Jeff Sonas plots the Elo rating of the player ranked number five (and likewise #10, #20, #50 and #100) on the FIDE lists over the years and finds that the rating of this #5 player has risen.
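To make this measurement concrete, here is a minimal Python sketch of tracking the rating held by the player at a fixed rank across successive lists. The function name `rating_at_rank` and the two toy lists are my own illustrations and are not real FIDE data or Jeff Sonas’s actual code.

```python
def rating_at_rank(ratings, rank):
    """Return the rating of the player at a 1-based rank (1 = highest rated)."""
    if rank < 1 or rank > len(ratings):
        raise ValueError("rank is outside the list")
    return sorted(ratings, reverse=True)[rank - 1]


# Hypothetical toy lists, invented for illustration only (NOT real FIDE data).
toy_lists = {
    "earlier list": [2700, 2650, 2640, 2600, 2590, 2500],
    "later list": [2750, 2720, 2700, 2680, 2660, 2600],
}

# Rating of the #5 player on each list; a rising value is what this
# rank-based definition would call inflation.
trend = {name: rating_at_rank(r, 5) for name, r in toy_lists.items()}
print(trend)
```

With these invented numbers the #5 rating moves from 2590 to 2660, which is the kind of upward drift the graphs were meant to show.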

Well, every coin has a flip side. A common observation related to the “rating inflation” problem is that the May 2010 list includes 37 players rated 2700 or above, compared to only 11 in the July 2000 list. When the rating system was introduced in 1971, only Bobby Fischer was above this threshold. Bartlomiej Macieja had an explanation for this. He presented a graph of the number of players rated in each interval of 100 points. But instead of using the absolute number of players in every interval, he used the ratio of these players to the total population of the rating list. The result was amazing: this normalization showed that the share of players in each interval did not change over the years. According to Bartek, the “rating inflation” problem is a result of the expansion of the population. Nowadays there are almost 112,000 players with FIDE Elo ratings, compared to 33,384 in the year 2000.
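The normalization can be sketched in a few lines of Python. This is only an illustration of the idea as I understood it, not Bartek’s actual computation; the function name `share_per_band` and the toy populations are hypothetical.

```python
from collections import Counter


def share_per_band(ratings, width=100):
    """Map each band's lower bound to the fraction of all players inside it."""
    if not ratings:
        raise ValueError("the rating list is empty")
    counts = Counter((r // width) * width for r in ratings)
    total = len(ratings)
    return {band: counts[band] / total for band in sorted(counts)}


# Hypothetical toy populations (NOT real FIDE data): the second list is
# simply three times larger, with the same mix of playing strengths.
small_list = [2050, 2150, 2150, 2250]
large_list = small_list * 3

print(share_per_band(small_list))
print(share_per_band(large_list))
```

In the toy example the absolute number of players in the 2100 band triples from 2 to 6, yet the share of the population in that band is the same in both lists. That is the sense in which a growing population can put more players above a threshold without the distribution itself changing.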

This example is characteristic of the different approaches of the two main speakers of the conference. Both of them want to improve the accuracy of the rating formula, but they understand this simple word “accuracy” in different ways. For Jeff Sonas the formula is accurate if the result distribution it predicts is identical to the actual result distribution in all circumstances (unrated players, established players, higher rated, lower rated). Wherever a deviation is observed, a correction is necessary. GM Bartlomiej Macieja, by contrast, asks to be accurately ranked in the FIDE rating list. (A simple reason for this: sometimes Elo points alone qualify players for FIDE tournaments.) Do you understand the difference?
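For readers who prefer code to words, Jeff’s notion of accuracy can be sketched as a comparison between expected and actual scores. The sketch below assumes the widely used logistic form of the Elo expectation; FIDE’s own regulations work from a conversion table, so treat the formula as an approximation. The function names and the sample games are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Logistic Elo expectation for player A against player B (assumed form)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def calibration_gap(games):
    """Average of (actual - expected) score per game, from A's point of view.

    games: iterable of (rating_a, rating_b, score_a), score_a in {1, 0.5, 0}.
    A value near zero means the formula predicts this group of games well.
    """
    games = list(games)
    if not games:
        raise ValueError("no games given")
    expected = sum(expected_score(a, b) for a, b, _ in games)
    actual = sum(score for _, _, score in games)
    return (actual - expected) / len(games)


# Hypothetical games (NOT real data): a 2400 player facing 2000-rated opponents.
sample = [(2400, 2000, 1), (2400, 2000, 1), (2400, 2000, 0.5)]
print(calibration_gap(sample))
```

Running the same check separately for each group of players (unrated, established, higher rated, lower rated) would show where the formula deviates, which is where Jeff would want to adjust it.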

For Jeff, the numbers provide the input required to improve the rating system. For Bartek, further interpretation of the numbers is needed. This research may lead to new parameters that must be taken into account. (For example, did you ever notice that there are players who select their tournaments with care so that a possible bad performance will not result in a big loss of points?) When David Jarrett opened the topic “towards the future”, Bartlomiej Macieja recommended a radical change: abandon the Elo system completely and adopt the Glicko system instead. He compared Arpad Elo’s system to Newton’s mechanics and Mark Glickman’s system to Einstein’s theory of relativity. That is, the Elo system is a special case of the Glicko system. Glicko allows the evaluation and use of quality factors that the Elo system completely ignores. However, constructing an accurate, widely accepted formula that everybody agrees with is far from straightforward.


Jeff Sonas – always in front of his computer

It suddenly became clear to me that this situation was not a dipole but rather a triangle. It is FIDE’s responsibility to organize and conduct a creative dialogue among all interested individuals in order to find the right solution, one able to survive for at least the next ten years. As David Jarrett pointed out (and everybody agreed), they would rather carefully prepare a solid proposal for the 2012 General Assembly and use plaster to fix the holes until then.

From a technical point of view every solution is applicable. As Vladimir Kukaev confirmed, six lists per year provide more accurate ratings than four lists per year. Even a monthly list is possible (if this is decided). In fact, the system provides live rating changes (available in the profile of each player individually) as soon as tournament results enter the system. However, it is impossible to force tournament directors to submit results every day. In addition, tournament organizers do not submit the results themselves: the National Federations are the members of FIDE accredited to fulfill this task. As a result, a number of days usually pass before results enter the system.


Vladimir Kukaev is responsible for publishing the FIDE rating list six times per year

Bear in mind that even a slight change to the present system might cause strong objections. For example, a single K-factor for all players (K=25) may lead highly rated players to deliberate partial inactivity. Another example: doubling the number of games an unrated player needs (currently 9) in order to obtain his initial Elo rating is an accepted mathematical solution for a more accurate first entry. But I believe that certain National Federations would not like this, especially if they are struggling to develop chess in their country and bring a lot of newcomers into the system with limited financial support that does not allow them to organize many tournaments. Further changes (not to mention a revolutionary one) require diplomacy, politics and a well-prepared proposal that can finally be voted on by the FIDE General Assembly.
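A small Python sketch shows why a larger K-factor could make top players cautious. It assumes the standard Elo update (rating change = K × (actual score − expected score)) with the logistic expectation as an approximation of FIDE’s table, and it assumes K=10 as the value top players have today, purely for comparison. The function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Logistic Elo expectation for player A against player B (assumed form)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_change(k, rating, opponent, score):
    """Standard Elo update for one game: K * (actual score - expected score)."""
    return k * (score - expected_score(rating, opponent))


# One lost game between two equally rated 2700 players.
# K=10 is an assumed comparison value; K=25 is the single K discussed above.
for k in (10, 25):
    print(k, rating_change(k, 2700, 2700, 0))
```

Under these assumptions the same single loss costs 5 points with K=10 and 12.5 points with K=25, so every game carries two and a half times the risk.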

So what was this conference all about? The main speakers presented the data from their recent analysis, examined the significance of every parameter of the formula and exchanged opinions with the rest of the participants. Finally, the team decided to write down which changes they should consider in the future, if any changes to the present system are required. I would rather not report on this part of the proceedings, for two reasons: a) the conference had a purely advisory purpose, since the participants are going to propose possible changes to the Presidential Board and then address the General Assembly, and b) I expect that an official announcement will soon become available. Furthermore, I expect that Jeff Sonas and Bartlomiej Macieja are going to publish their own ideas and conclusions and provide all technical details in order to feed a public dialogue. In the end everyone was satisfied with the particularly productive four days they spent in Athens. As Mikko Markkula pointed out, the FIDE rating system is stable and highly appreciated, and the conference found the right path of discussion in order to make it even better.

 

Mikko Markkula (Chairman of FIDE QC). In front of him a heavily used copy of Arpad Elo’s book “The Rating of Chessplayers, Past and Present” (currently published by Ishi Press International). Bartek’s copy was in much better condition.


Michalis Kaloumenos and Stewart Reuben

It seems that as chess develops globally, ratings become a more complex and dynamically changing system that requires continuous attention and re-evaluation. Was it easier to calculate ratings in the past? I really do not know, but given that the first list of July 1971 included only 592 players, it must have been an easier task to produce it. By the way, I discovered that this first list was allegedly prepared by Mrs. Elo alone (not Mr. Elo) in her kitchen, and she didn’t even use a calculator. This amazing piece of information was confidentially revealed to me by SR, a respectable 71-year-old Englishman who wishes his name to remain secret.

Appendix

Just as I was about to mail my article to ChessBase, I received an e-mail from GM Bartlomiej Macieja stating:

Jeff has the following approach: he wants a player ranked Nth (for instance 10th or 100th) to have the same rating throughout history. If it doesn't happen, he says we have inflation or deflation. For me, such an approach is not very interesting. If I want to see which player was closer to the top at his times, I simply check his position on a rating list published that time. Why do I need to double columns in order to get the same information?

What I would like to measure is what happens with ratings of players who keep their level. If their rating increases, I call it inflation, if their rating decreases, I call it deflation.

It is clear for everybody that Kasparov is much stronger than Steinitz was. In Jeff's approach, ratings of Kasparov and Steinitz would be more or less the same. You can see it in his "chessmetrics ratings":

In my approach, there should be a huge gap in ratings between Kasparov and Steinitz. You can look at it also from another side. As there are more and more good players, in Jeff's approach if a player keeps his level, his rating should decrease. It may lead to situations in which players would not like to play (to be active), because on average, in Jeff's approach, they should lose rating by playing games. In fact there is no escape for anybody, because Jeff wants to punish inactive players as well.

In other words, in Jeff's approach, a player needs to constantly make progress to keep his rating stable. In my approach, a player needs to keep his level in order to keep his rating.


A few words about the author

Michalis Kaloumenos is an electrical and computer engineer who graduated from the National Technical University of Athens. He lives in Athens with his wife and three children. Michalis is a ChessBase software expert. From 2006 to 2009 he was responsible for the column “chess and computers” for the Greek chess magazine “Skaki gia olous”. He conducted and edited many interviews for the magazine including one with Georgios Makropoulos and another one with chess engine Fritz 10. His current chess project regards the construction and management of the yet-to-be-launched www.e-skaki.gr website (only in Greek) together with his old friends from the editorial team of the magazine.

Copyright ChessBase

