Ranking chess players according to the quality of their moves

by Frederic Friedel
4/27/2017 – How do you rate players from different periods? An AI researcher has undertaken to do it based not on the results of the games, but on the quality of the moves played. Jean-Marc Alliot used a strong chess engine running on a 640-processor cluster to analyse over two million positions that occurred in 26,000 games of World Champions since Steinitz. From this he produced a table of probable results between players of different eras. Example: Carlsen would have beaten Smyslov 57:43.


Artificial Intelligence evaluates chess champions

The Elo rating system in chess, well known to all of us, is based on the results of players against each other. Devised by the Hungarian-American physics professor and chess master Árpád Imre Élő and adopted by FIDE in 1970, the system is used to predict the probability of a rated player winning or losing against other rated players. If a player performs better or worse than predicted, rating points are added to or deducted from his rating. However, the Elo system does not take into account the quality of the moves played during a game and is therefore unable to reliably rank players who played in different periods of history.
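For reference, the basic Elo mechanics can be written down in a few lines of Python. This is a generic sketch of the standard expected-score formula and rating update, not anything specific to Alliot's work; the K-factor of 20 is merely an illustrative choice:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_rating(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """Adjust a rating after a game; actual is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (actual - expected)

# Example: a 2850-rated player draws against a 2750-rated player.
e = expected_score(2850, 2750)            # about 0.64
new_rating = update_rating(2850, e, 0.5)  # rating drops by roughly 2.8 points
```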

Now computer scientist and AI researcher Jean-Marc Alliot of the Institut de Recherche en Informatique de Toulouse has come up with a new system (reported in the journal of the International Computer Games Association) that does exactly that: it ranks players by evaluating the quality of their actual moves. He does this by comparing the moves of World Champions with those of a strong chess engine – the program Stockfish running on a supercomputer – under the assumption that the engine plays almost perfect moves.

Alliot has evaluated 26,000 games played by World Champions since Steinitz, estimating the probability of a mistake – and its magnitude – for each position in their games. From this he derived a probabilistic model for each player and used it to compute the win/draw/loss probabilities for a match between any two of them. The predictions, he says, have proven not only to be extremely close to the results of actual encounters between the players, but also to fare better than predictions based on Elo ratings. The results, he claims, demonstrate that the level of chess players has been steadily increasing. The current World Champion, Magnus Carlsen, tops the list, while Bobby Fischer is third.
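The paper's model is considerably more refined, but the basic data-gathering step – scoring every move a player made against a strong engine's preferred continuation – can be sketched as follows. This is a minimal illustration, assuming a local Stockfish binary and the python-chess library, and it only computes a per-move centipawn loss rather than the full error model used in the study:

```python
import chess
import chess.engine
import chess.pgn

STOCKFISH_PATH = "/usr/local/bin/stockfish"   # assumed path to a local engine binary
LIMIT = chess.engine.Limit(depth=18)          # illustrative search depth, not the paper's setting

def centipawn_losses(pgn_path: str):
    """Yield the centipawn loss of every move in every game of a PGN file.
    A loss of 0 means the move kept the engine's evaluation; larger is worse."""
    with open(pgn_path) as pgn, \
         chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
        while (game := chess.pgn.read_game(pgn)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                mover = board.turn
                # Evaluation of the position before the move, from the mover's point of view.
                before = engine.analyse(board, LIMIT)["score"].pov(mover).score(mate_score=100000)
                board.push(move)
                # Evaluation after the move actually played, same point of view.
                after = engine.analyse(board, LIMIT)["score"].pov(mover).score(mate_score=100000)
                yield max(0, before - after)
```

Aggregating such per-move losses – how often a player deviates from the engine and by how much – is the kind of raw material from which Alliot builds his probabilistic model of each player.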

Here are the predicted scores between the different World Champions, each taken in his best year. Each entry is the expected percentage score of the player in the row against the player in the column:

               Ca  Kr  Fi  Ka  An  Kh  Sm  Pe  Kp  Ks
Carlsen         –  52  54  54  57  58  57  58  56  60
Kramnik        49   –  52  52  55  56  56  57  55  59
Fischer        47  49   –  51  53  57  56  57  56  59
Kasparov       47  49  50   –  53  54  54  54  53  57
Anand          44  46  48  48   –  54  52  53  53  57
Khalifman      43  45  44  47  47   –  50  51  52  53
Smyslov        43  45  45  47  49  51   –  50  51  53
Petrosian      43  44  45  47  49  50  51   –  52  53
Karpov         44  46  45  48  48  49  50  49   –  51
Kasimdzhanov   41  43  42  45  45  48  48  48  50   –

Under current conditions, Alliot feels, this new ranking method cannot immediately replace the Elo system, which is easier to set up and implement. However, increases in computing power will make it possible to extend the new method to an ever-growing pool of players in the near future.

Read the full detailed paper published by Jean-Marc Alliot in the ICGA Journal, Volume 39, No. 1, April 2017. Mathematically proficient readers are welcome to comment on his method and his results.


Editor-in-Chief emeritus of the ChessBase News page. Studied Philosophy and Linguistics at the University of Hamburg and Oxford, graduating with a thesis on speech act theory and moral language. He started a university career but switched to science journalism, producing documentaries for German TV. In 1986 he co-founded ChessBase.

Discuss


benedictralph benedictralph 4/27/2017 02:22
There are things, not uncommon in games between human players, that the research does not account for. For instance, mistakes (human fallibility) that either side has a chance of recovering from. This may lead to unbalanced positions (materially) that could go either way. We see this quite often, even in games at the highest levels. A computer engine simply would not allow for this sort of thing. Those positions and games are pruned out of existence in terms of what constitutes the "correct" or "best" move in any given situation. Basically, the research fails to take into account the "art" of human play, for lack of a better word. Just replace Stockfish with Carlsen (and remove Carlsen from the list of players analyzed). How viable does the research look now? Consider then that Stockfish isn't even human, yet is being used as a benchmark of sorts for humans. Again, how viable does the research look now? I fail to see the point of this kind of study, sorry to say.
WildKid WildKid 4/27/2017 01:17
I have one further statistical criticism. The authors show that their measure 'predicts' World Championship results better than ELO, and imply that this shows their measure is better than ELO. This is fallacious, since they are retrofitting the very data from which their model is derived.

In general, if we have a large dataset D and a subset S, a reasonable measure based on S will almost always retrofit S better than a reasonable model based on D. For example, a theory T(post) derived to fit the data after an experiment will almost always fit the data better than a theory T(prior) devised before the experiment. That does NOT mean that T(post) is a better theory than T(prior). This fallacy comes up all the time in Evidence-Based Medicine, among other fields.

To make their inference valid, the authors would need to compare to an ELO-like measure based only on the set of games they are using, rather than based on all ELO-valid games.
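A toy simulation (purely illustrative, with made-up data, not from the paper) shows the retrofitting effect this comment describes: a model fitted only on a subset will almost always reproduce that subset at least as well as a model fitted on the full dataset, regardless of which generalises better.

```python
import random

random.seed(1)

# Toy data: 1000 game outcomes drawn with a true win probability of 0.55.
full_data = [1 if random.random() < 0.55 else 0 for _ in range(1000)]
subset = full_data[:50]                     # the small slice both models are judged on

p_full = sum(full_data) / len(full_data)    # "model" fitted on the whole dataset
p_subset = sum(subset) / len(subset)        # "model" fitted only on the subset

def fit_error(p, data):
    """Mean squared error of predicting every outcome with probability p."""
    return sum((x - p) ** 2 for x in data) / len(data)

# The subset-fitted model always matches the subset at least as well as the
# full-data model -- which says nothing about which one predicts unseen games better.
print(fit_error(p_subset, subset) <= fit_error(p_full, subset))   # True
```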
KevinC KevinC 4/27/2017 12:55
Or often second-best moves are the clearest way to win for a human.
anamanam anamanam 4/27/2017 12:04
Statistics is a fantastic science.

1. "it is now possible to predict the outcome of a match between any World Champion from any active year with any other Champion
taken in any active year; it is even possible to predict the result of Fischer 1970 against Fischer 1971."
Looking forward to that, too, not only best-year base ranking.

2. "for each player, the “best year” was found by searching for the year where the player had the largest number of victories against all other players and all other years".
How different would it be if the "best year" were defined by the highest conformance, i.e. the year of the closest match between actual and computer moves?
satman satman 4/27/2017 11:53
So Khalifman at his best would have had the edge on both Petrosian and Karpov.
You learn something every day!
nimzobob nimzobob 4/27/2017 11:44
Are there some factors that are not being considered here? Sub-optimal moves can be played for a reason – to reduce counter-play in a winning position, to complicate the game, to give better winning chances, to keep the game alive, etc.

What is the margin of error here? When the margins are very small it is easy to lie with statistics :-)
WildKid WildKid 4/27/2017 11:36
I have read the whole paper with some attention. On the whole it seems to be a good piece of work, but I do have the following suggestions for improvement.

A) The methodology of measuring the mean difference between chosen and optimal moves is biased in favor of 'safe' players who stick to balanced lines, and thus rarely make big mistakes. It would evaluate Petrosian as a much better player than Tal, for example. Tal would lose points for 'unsound' sacrifices where the correct defense is hard to see, and also for possibly suboptimal play in highly tactical situations that still may result in a won game. The algorithm could be improved by giving credit to players when their OPPONENTS frequently make mistakes (presumably because they are in positions that are very difficult for a human to evaluate for either side, but that some players such as Tal and Mamedyarov(?) prefer).

B) Another improvement would allow e.g. Stockfish to retrospectively re-evaluate a position where evaluation of subsequent moves proves the initial engine valuation to be mistaken. For example, there is a very famous game where Kasparov, as White, sacrificed a knight, and engines ruled the sacrifice unsound. However, three or four moves later, the engines evaluate the position as good for White, without being able to identify any error in Black's play. The algorithm as written would punish Kasparov for disagreeing with the engine, rather than reward him for being smarter than it!

That said, I think this paper is a good start.