Ranking chess players according to the quality of their moves

by Frederic Friedel
4/27/2017 – How do you rate players from different periods? An AI researcher has undertaken to do it based not on the results of the games, but on the quality of the moves played. Jean-Marc Alliot used a strong chess engine running on a 640-processor cluster to analyse over two million positions that occurred in 26,000 games of World Champions since Steinitz. From this he produced a table of probable results between players of different eras. Example: Carlsen would have beaten Smyslov 57:43.


Artificial Intelligence evaluates chess champions

The Elo rating system in chess, well known to all of us, is based on the results of players against each other. Devised by the Hungarian-American physics professor and chess master Árpád Imre Élő and adopted by FIDE in 1970, the system is used to predict the probability of a rated player winning or losing against other rated players. If a player performs better or worse than predicted, rating points are added to or deducted from his rating. However, the Elo system does not take into account the quality of the moves played during a game and is therefore unable to reliably rank players who played in different periods of history.
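For reference, the basic Elo mechanics can be written down in a few lines of Python. This is a generic sketch of the standard expected-score formula and rating update, not anything specific to Alliot's work; the K-factor of 20 is merely an illustrative choice:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_rating(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """Adjust a rating after a game; actual is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (actual - expected)

# Example: a 2850-rated player draws against a 2750-rated player.
e = expected_score(2850, 2750)            # about 0.64
new_rating = update_rating(2850, e, 0.5)  # rating drops by roughly 2.8 points
```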

Now computer scientist and AI researcher Jean-Marc Alliot of the Institut de Recherche en Informatique de Toulouse has come up with a new system (reported in the journal of the International Computer Games Association) that does exactly that: it ranks players by evaluating the quality of their actual moves. He does this by comparing the moves of World Champions with those of a strong chess engine – the program Stockfish running on a supercomputer – under the assumption that the engine plays almost perfect moves.

Alliot has evaluated 26,000 games played by World Champions since Steinitz, estimating the probability of a mistake – and its magnitude – for each position in their games. From this he derived a probabilistic model for each player and used it to compute the win/draw/loss probabilities for a match between any two of them. The predictions, he says, have proven not only to be extremely close to the results of actual encounters between the players, but also to fare better than predictions based on Elo ratings. The results, he claims, demonstrate that the level of chess players has been steadily increasing. The current World Champion, Magnus Carlsen, tops the list, while Bobby Fischer is third.
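The paper's model is considerably more refined, but the basic data-gathering step – scoring every move a player made against a strong engine's preferred continuation – can be sketched as follows. This is a minimal illustration, assuming a local Stockfish binary and the python-chess library, and it only computes a per-move centipawn loss rather than the full error model used in the study:

```python
import chess
import chess.engine
import chess.pgn

STOCKFISH_PATH = "/usr/local/bin/stockfish"   # assumed path to a local engine binary
LIMIT = chess.engine.Limit(depth=18)          # illustrative search depth, not the paper's setting

def centipawn_losses(pgn_path: str):
    """Yield the centipawn loss of every move in every game of a PGN file.
    A loss of 0 means the move kept the engine's evaluation; larger is worse."""
    with open(pgn_path) as pgn, \
         chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
        while (game := chess.pgn.read_game(pgn)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                mover = board.turn
                # Evaluation of the position before the move, from the mover's point of view.
                before = engine.analyse(board, LIMIT)["score"].pov(mover).score(mate_score=100000)
                board.push(move)
                # Evaluation after the move actually played, same point of view.
                after = engine.analyse(board, LIMIT)["score"].pov(mover).score(mate_score=100000)
                yield max(0, before - after)
```

Aggregating such per-move losses – how often a player deviates from the engine and by how much – is the kind of raw material from which Alliot builds his probabilistic model of each player.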

Here are the predicted scores between the different World Champions, each taken in his best year. Each entry is the expected percentage score of the player in the row against the player in the column:

               Ca  Kr  Fi  Ka  An  Kh  Sm  Pe  Kp  Ks
Carlsen         –  52  54  54  57  58  57  58  56  60
Kramnik        49   –  52  52  55  56  56  57  55  59
Fischer        47  49   –  51  53  57  56  57  56  59
Kasparov       47  49  50   –  53  54  54  54  53  57
Anand          44  46  48  48   –  54  52  53  53  57
Khalifman      43  45  44  47  47   –  50  51  52  53
Smyslov        43  45  45  47  49  51   –  50  51  53
Petrosian      43  44  45  47  49  50  51   –  52  53
Karpov         44  46  45  48  48  49  50  49   –  51
Kasimdzhanov   41  43  42  45  45  48  48  48  50   –

Under current conditions, Alliot feels, this new ranking method cannot immediately replace the Elo system, which is easier to set up and implement. However, increases in computing power will make it possible to extend the new method to an ever-growing pool of players in the near future.

Read the full detailed paper published by Jean-Marc Alliot in the ICGA Journal, Volume 39, No. 1, April 2017. Mathematically proficient readers are welcome to comment on his method and his results.


Editor-in-Chief emeritus of the ChessBase News page. Studied Philosophy and Linguistics at the University of Hamburg and Oxford, graduating with a thesis on speech act theory and moral language. He started a university career but switched to science journalism, producing documentaries for German TV. In 1986 he co-founded ChessBase.

Discuss


benedictralph benedictralph 4/27/2017 02:22
There are things, not uncommon in games between human players, that the research does not account for. For instance, mistakes (human fallibility) that either side has a chance of recovering from. This may lead to unbalanced positions (materially) that could go either way. We see this quite often, even in games at the highest levels. A computer engine simply would not allow for this sort of thing. Those positions and games are pruned out of existence in terms of what constitutes the "correct" or "best" move in any given situation. Basically, the research fails to take into account the "art" of human play, for lack of a better word. Just replace Stockfish with Carlsen (and remove Carlsen from the list of players analyzed). How viable does the research look now? Consider then that Stockfish isn't even human, yet is being used as a benchmark of sorts for humans. Again, how viable does the research look now? I fail to see the point of this kind of study, sorry to say.
WildKid WildKid 4/27/2017 01:17
I have one further statistical criticism. The authors show that their measure 'predicts' World Championship results better than ELO, and imply that this shows their measure is better than ELO. This is fallacious, since they are retrofitting the very data from which their model is derived.

In general, if we have a large dataset D and a subset S, a reasonable measure based on S will almost always retrofit S better than a reasonable model based on D. For example, a theory T(post) derived to fit the data after an experiment will almost always fit the data better than a theory T(prior) devised before the experiment. That does NOT mean that T(post) is a better theory than T(prior). This fallacy comes up all the time in Evidence-Based Medicine, among other fields.

To make their inference valid, the authors would need to compare to an ELO-like measure based only on the set of games they are using, rather than based on all ELO-valid games.
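A toy simulation (purely illustrative, with made-up data, not from the paper) shows the retrofitting effect this comment describes: a model fitted only on a subset will almost always reproduce that subset at least as well as a model fitted on the full dataset, regardless of which generalises better.

```python
import random

random.seed(1)

# Toy data: 1000 game outcomes drawn with a true win probability of 0.55.
full_data = [1 if random.random() < 0.55 else 0 for _ in range(1000)]
subset = full_data[:50]                     # the small slice both models are judged on

p_full = sum(full_data) / len(full_data)    # "model" fitted on the whole dataset
p_subset = sum(subset) / len(subset)        # "model" fitted only on the subset

def fit_error(p, data):
    """Mean squared error of predicting every outcome with probability p."""
    return sum((x - p) ** 2 for x in data) / len(data)

# The subset-fitted model always matches the subset at least as well as the
# full-data model -- which says nothing about which one predicts unseen games better.
print(fit_error(p_subset, subset) <= fit_error(p_full, subset))   # True
```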
KevinC KevinC 4/27/2017 12:55
Or often second-best moves are the clearest way to win for a human.
anamanam anamanam 4/27/2017 12:04
Statistics is a fantastic science.

1. "it is now possible to predict the outcome of a match between any World Champion from any active year with any other Champion
taken in any active year; it is even possible to predict the result of Fischer 1970 against Fischer 1971."
Looking forward to that, too, not only best-year base ranking.

2. "for each player, the “best year” was found by searching for the year where the player had the largest number of victories against all other players and all other years".
How different would it be if the "best year" were defined by the highest conformance, i.e. the year of the closest match between actual and computer moves?
satman satman 4/27/2017 11:53
So Khalifman at his best would have had the edge on both Petrosian and Karpov.
You learn something every day!
nimzobob nimzobob 4/27/2017 11:44
Are there some factors that are not being considered here? Sub-optimal moves can be played for a reason – to reduce counter-play in a winning position, to complicate the game, to give better winning chances, to keep the game alive, etc.

What is the margin of error here? When the margins are very small it is easy to lie with statistics :-)
WildKid WildKid 4/27/2017 11:36
I have read the whole paper with some attention. On the whole it seems to be a good piece of work, but I do have the following suggestions for improvement.

A) The methodology of measuring the mean difference between chosen and optimal moves is biased in favor of 'safe' players who stick to balanced lines, and thus rarely make big mistakes. It would evaluate Petrosian as a much better player than Tal, for example. Tal would lose points for 'unsound' sacrifices where the correct defense is hard to see, and also for possibly suboptimal play in highly tactical situations that still may result in a won game. The algorithm could be improved by giving credit to players when their OPPONENTS frequently make mistakes (presumably because they are in positions that are very difficult for a human to evaluate for either side, but that some players such as Tal and Mamedyarov(?) prefer).

B) Another improvement would allow e.g. Stockfish to retrospectively re-evaluate a position where evaluation of subsequent moves proves the initial engine valuation to be mistaken. For example, there is a very famous game where Kasparov, as White, sacrificed a knight, and engines ruled the sacrifice unsound. However, three or four moves later, the engines evaluate the position as good for White, without being able to identify any error in Black's play. The algorithm as written would punish Kasparov for disagreeing with the engine, rather than reward him for being smarter than it!

That said, I think this paper is a good start.