Review of "Computer Analysis of World Chess Champions"
by Matej Guid and Ivan Bratko, published in ICGA Journal, Vol 29, No.
2, June 2006, pages 65-73, republished by ChessBase.com.
By Dr. Søren Riis, Oxford, UK
In the paper the authors present a method that is claimed to identify who
is the best chess player of all times. The basic idea is to compare the moves
played by world champions with the evaluation of those moves given by a strong
computer chess program.
If we are to believe the authors, it is possible to determine a player’s
strength by having a version of Crafty (one that always searches to a fixed
depth of 12 plies) judge the quality of the champions’ moves. The quality of a
move is measured by how many pawns (according to the program’s own evaluation)
the player’s chosen move falls short of the move the program judges to be best.
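As a rough sketch of this scoring scheme (my own illustrative reconstruction, not the authors' actual code, and with made-up evaluation numbers): for each position, subtract the engine's evaluation of the played move from its evaluation of its preferred move, and average the loss in pawns.

```python
# Illustrative sketch of the scoring scheme described above (not the
# authors' code; the evaluation numbers below are hypothetical).
# Each position contributes the loss, in pawns, of the played move
# relative to the engine's preferred move; a lower average means play
# closer to the engine's choices.

def mean_pawn_loss(positions):
    """positions: list of (eval_best, eval_played) pairs in pawns,
    both from the point of view of the player to move."""
    losses = [max(0.0, best - played) for best, played in positions]
    return sum(losses) / len(losses)

# Two moves match the engine's choice, one falls 0.8 pawns short:
sample = [(0.5, 0.5), (0.3, 0.3), (1.0, 0.2)]
print(round(mean_pawn_loss(sample), 3))  # 0.267
```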
First, let me note that if we tried to decide which contemporary chess program
is the strongest based on the authors’ method, we would almost certainly
get some quite absurd results!
There are different versions of Crafty, but none of them has a rating of
more than 2700 on the latest rating lists. The version used by the authors
is a modified (“amputated”) version of Crafty that searches each position to
a fixed depth (6 moves, i.e. 12 plies; 6.5 moves in the endgame) before evaluating
the quality of each move available in the position. The strength of the program
becomes quite unreliable because the horizon effect sets in. Anyway, let us assume
(for the sake of the argument) that this amputated version of Crafty plays
at roughly the same level as the standard version of Crafty.
- Absurdly, at the top of the list we would (by definition!) have the
"amputated" version of Crafty itself (used by the authors).
- Almost as absurdly, we would expect the standard version of Crafty
to be near the top as well.
On the other hand, some top programs (especially when run on fast 4-CPU machines)
are much stronger than Crafty, and would almost literally shred Crafty to pieces.
Yet, essentially, the stronger a program is, the less likely it is to behave like Crafty.
Thus, to put it in a somewhat simplified way, Crafty would tend to rank
all engines rated above 2700 in reverse order, with the weakest at
the top of the list and the strongest engines appearing further down.
But, maybe the method makes sense when testing former world champions? No!
What the authors are testing is simply which of the world champions played
chess most in the style of the "amputated" version of Crafty. Capablanca
played quite simple chess, where the way to make progress is apparently within
reach of Crafty. On the other hand, Kasparov played numerous games that are
well above the grasp of Crafty. It is worth noticing that, quite frequently,
engines of the level of Crafty (but also much stronger engines) misjudge positions
and moves considerably. Most newsgroups on computer chess are full of such
examples. The frequency of computers misjudging moves and positions varies
with the type of position, etc. However, there is no doubt that some players
play chess that is simply too deep to be fully appreciated by an engine at
this level.
In fact many pretty standard moves are completely missed by Crafty at search
depth 12. Crafty penalises Fischer’s Rxh5! against Larsen (played in Portoroz)
by 0.41 pawns. Crafty at depth 12 thinks Bxg7 is the better move, while in
fact only Fischer’s move leads to a clear win. Kasparov’s Bh6! against
Short in Zurich 2001 is crushing, and might be the only winning move. Yet Crafty
searching at depth 12 penalises Kasparov’s brilliant concept with more
than two pawns. In fact, Crafty does not even have Bh6 among its 20 best moves!!
It is not unlikely that Kasparov based part of his attack on the possibility
of Bh6, and saw this move even earlier in the game. This is utterly beyond
what Crafty can handle.
In fact, my conclusion is (based on more examples) that in general Crafty
completely fails to understand the depth of Kasparov’s play. Capablanca
plays chess that Crafty apparently finds easier to appreciate even though Crafty
occasionally also punishes Capablanca unfairly (though I did not find any examples
where Crafty completely fails to understand a move by Capablanca).
Now suppose we do the same test with significantly stronger engines. Would
the proposed method then make sense? I'm afraid this would not give very good
results either. First of all, it would favor safe positional players over wild
attacking players – I suppose that Capablanca would still look much better
than Tal. In fact Tal might look like a hopeless patzer who was lucky to be
playing against other patzers.
But, suppose we really just want to be "objective" without any preconception
and simply ask who of the world’s champions plays the most perfect chess.
What is so wrong with taking the strongest programs – assuming that they
are significantly better than Crafty – and ask for their opinion? To
pinpoint the problem I will look at the issue from a somewhat theoretical perspective.
Objectively, each chess position is either won for white, drawn or won for
black. For won positions the quality of a move can be judged on how much closer
it moves the position to mate. The best achievable is +1 (i.e. one move closer).
From this abstract perspective a move of minus 10 is a "mistake"
in the sense that it changes the position to a position where there are 10
more moves to the mate. I will call such mistakes "harmless" mistakes.
A much more serious type of mistake occurs if the player makes a move that
converts a won position to a drawn or lost position. From this highly abstract
view let us call a move that changes a won position (drawn position) to a drawn
position (lost position) a "serious mistake", and a move that changes
a won position to a lost position a "double serious mistake".
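This taxonomy can be stated compactly in code. The sketch below assumes a hypothetical oracle that labels each position WIN, DRAW or LOSS from the perspective of the player about to move; nothing like such an oracle exists for full chess, of course.

```python
# Sketch of the mistake taxonomy above. `before` and `after` are the
# game-theoretic values of the position before and after the move, both
# from the moving player's perspective (supplied by a hypothetical oracle).

ORDER = {"WIN": 2, "DRAW": 1, "LOSS": 0}

def classify_move(before, after):
    drop = ORDER[before] - ORDER[after]
    if drop <= 0:
        return "no mistake (at worst a 'harmless' one)"
    elif drop == 1:
        return "serious mistake"         # WIN -> DRAW, or DRAW -> LOSS
    else:
        return "double serious mistake"  # WIN -> LOSS

print(classify_move("WIN", "DRAW"))  # serious mistake
print(classify_move("WIN", "LOSS"))  # double serious mistake
```

Note that a "harmless" mistake (a slower win) never changes the oracle's label, which is why it falls into the first branch.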
Is there any way we can measure the quality of moves in positions that are
objectively drawn? There is not! This is where psychology
and knowledge of the opponent enter the equation. A move that is best against
one opponent (i.e. most likely, in the long run, to lead the opponent into a
"mistake" and a lost position) might differ from what is
the best move against another opponent. From a purely theoretical (and logical)
perspective, there is no objective reason why one move is better than another
move as long as the position stays balanced (i.e. is objectively drawn). All
moves guaranteeing a draw are equally good against perfect play. But, in a
real game the opponent is not perfect. The task is to produce moves
that maximise the likelihood that the opponent at some stage makes a serious
mistake leading to a lost position. But what is the best way to achieve this depends
to some extent on the opponent and his or her strengths and weaknesses. Maybe
Capablanca’s way of playing was good enough to get convincing results
in 1920. However, Capablanca’s way of playing balanced positions might
not have worked very well against contemporary masters. In modern
chess, some players find it more important to create complex difficult positions,
rather than positions with a cosmetic advantage that are unlikely to cause
the opponent great difficulties.
There are, of course, some general principles for how best to put pressure on
the opponent. Chess is a game of skill with relatively clear criteria for good
play, so grandmasters often have quite similar ways of judging positions.
It is, however, important to realise that the evaluation of balanced chess
positions is to some extent, an art, and that the greatest players (like Kasparov)
to some extent, also take psychological factors and strengths and weaknesses
of the opponent into account when playing.
Chess engines in the future might play at such a high level that all games
essentially result in a draw, even when one engine is given
much less time than the other! Still, different engines (though in some
sense playing perfectly) might evaluate balanced positions somewhat
differently. Thus even future (almost) perfect engines might not agree
on which of the champions was the greatest.
To let Crafty judge who was the greatest World Chess Champion is an insult.
It is like having a tone-deaf judge decide who was the greatest composer.
Søren Riis is a Computer Scientist at
Queen Mary University of London. He has a PhD in Pure Maths from University
of Oxford. He is Danish but currently living near Oxford. He used to play competitive
chess around 20 years ago (Elo 2300). Riis has been briefly involved with chess
programming, and his interests include theoretical aspects of computer chess.
The following letter was sent to us independently of Soren Riis's article.
It was in reaction to some of the letters that follow below, and to messages
that were posted on different computer forums.
Computer Analysis of World Chess Champions – answer to some comments
We would like to thank the readers for their interest in our article on computer
analysis of chess champions (ChessBase, 30 October 2006).
We would also like to answer a frequent comment by the readers. The comment
goes like this: “A very interesting study, but it has a flaw in that
program Crafty, whose rating is only about 2620, was used to analyse the performance
of players stronger than this. For this reason the results cannot be useful”.
Some readers speculate further that the program will give better ranking to
players that have a similar rating to the program itself.
These reservations are perhaps based on a straightforward intuition that the
program used must necessarily be stronger than the players analysed. However,
things are not so simple, and the intuition seems to be misguided in this case.
Simple math shows, perhaps surprisingly, that:
(a) To obtain a sensible ranking of players, it is not necessary to use a
computer that is stronger than the players themselves. There is a good chance
of obtaining a sensible ranking even using a computer that is weaker than the
players.
(b) The (fallible) computer will not exhibit a preference for players of similar
strength to itself.
These points can be illustrated by a simple example. Let there be three players
and let us assume that it is agreed what is the best move in every position.
Player 1 plays the best move in 90% of positions, player 2 in 80%, and player
3 in 70%. Assume that we do not know these percentages, so we use a computer
program to estimate the players’ performance. Let the program available
for the analysis only play the best move in 70% of the positions. In addition
to the best move in each position, let there be 10 other moves that are inferior
to the best move, but the players occasionally make mistakes and play one of
these moves instead of the best move. For simplicity, we assume that each of these
moves is equally likely to be chosen by mistake by a player. So player 1 who
plays the best move 90% of the time, will distribute the remaining 10% equally
among these 10 moves, giving 1% chance to each of them. Similarly, player 2
will choose any of the inferior moves in 2% of the cases, etc. We also assume
that mistakes by all the players, including the computer, are probabilistically
independent.
In what situations will the computer, in its imperfect judgement, credit a
player for the “best” move? There are two possibilities:
- The player plays the best move, and the computer also believes that this
is the best move;
- The player makes an inferior move, and the computer also confuses this
same inferior move for the best.
By simple probabilistic reasoning we can now work out the computer’s
approximations of the players’ performance based on the computer’s
analysis of a large number of positions. Using the formula given below, we find
that the computer will report the estimated percentages of correct moves as
follows: player 1: 63.3%, player 2: 56.6%, and player 3: 49.9%. These values
are quite a bit off the true percentages, but they nevertheless preserve the
correct ranking of the players. The example also illustrates that the computer
did not particularly favour player 3, although that player is of similar strength
to the computer.
The simple example above does not exactly correspond to our method which also
takes into account the cost of mistakes. But it should help to bring home the
point that for sensible analysis we do not necessarily need computers stronger
than human players. This is of course not to say that a stronger program, if
available, would not be more desirable. Also, it should be noted that our method
makes other, more subtle assumptions. Our results should therefore be interpreted
in the light of these assumptions.
Ivan Bratko and Matej Guid
P.S. Formula to compute computer’s estimates:
p’ = p * pc + (1 – p) * (1 – pc) / n
p = probability of the player making the best move
pc = probability of the computer making the best move
p’ = computer’s estimate of player’s accuracy p
n = number of inferior moves in a position
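The percentages quoted in the three-player example can be reproduced directly from this formula (with pc = 0.7 and n = 10):

```python
# Check the estimates from the three-player example using the formula above:
# p' = p * pc + (1 - p) * (1 - pc) / n

def estimate(p, pc=0.7, n=10):
    return p * pc + (1 - p) * (1 - pc) / n

for p in (0.9, 0.8, 0.7):
    print(f"true {p:.0%} -> estimated {estimate(p):.1%}")
# true 90% -> estimated 63.3%
# true 80% -> estimated 56.6%
# true 70% -> estimated 49.9%
```

The estimates compress toward the computer's own 70% accuracy, but, as the authors argue, the ranking of the three players is preserved.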
Peter Ballard, Adelaide, Australia
Here's what I don't understand: the author only analysed world championship
matches. Capa squashed an aging Lasker in a 14 game match, but then was outplayed
by Alekhine in a 30+ game marathon. So that equates to a relatively poor record
in WC matches. I'd like to see the results for individual matches. In how many
matches did the loser score a better "quality of play" index than
the winner? If Capa scored better on "quality of play" than Alekhine
in the 1927 match, what does that say about their methodology?
Albert Silver, Rio de Janeiro, Brazil
When the authors write "The basic criterion for evaluating World Champions
was the average difference between moves played and best evaluated moves by
computer analysis", they are basically stating that the moves of the world
champions at tournament time controls are less likely to be correct than Crafty
at 15-30 seconds a move (the rough time taken to reach the depth chosen by
the authors). After all, instead of seeing whether the computer can find the
moves of the champions, as is often the case of test suites, here the champions
have the unenviable burden of having to play like Crafty. So, who do you trust
more on average? Kasparov (or Karpov, Kramnik, etc) at 3 minutes a move, or
Crafty at 15-30 seconds?
The histogram of the average error also implies that that is the edge
Crafty would have over said World Champion. So if Kasparov has an average error
rate over all his moves of 0.1292, this means that in a match where Crafty
is given 12 plies limit to play, Kasparov, with all his ability and positional
judgement, should lose 5-4 even with 6 times more thinking time. Capablanca
being much stronger would only lose 6-5... If the edge is in pawns and not
points, then it means that for every 8 moves on average, Crafty expects to
gain an advantage of one extra pawn over Kasparov. Have they any idea how utterly
absurd that sounds??
Julian Wan, Ann Arbor, USA
Thank you for the very interesting article. It makes several points:
- It shows that raw analysis of how often a player mirrors a computer program's
choice of move is not necessarily "proof" of computer assistance
– in simpler situations and positions, it may actually reflect that
player's great sense and judgement.
- It may open up new avenues of research – note that the games used
were only from the matches – if one were to analyse games over a period
of time, it might objectively show a shift in style.
- It shows how style of play is a complex issue – note that Kasparov,
who is known for his aggressive style, is actually quite close to Karpov, who
is often viewed as having a different, more positional style.
Mohamed Nisthar, Riyadh Saudi Arabia
Capablanca has been nominated as the best of the champions. But if
I am not mistaken, he was thoroughly defeated in a match by one player from
the Indian Subcontinent, Sultan Khan!!! Please check on this and it would be
valuable if an analysis is made of those games.
Editor's note: Sultan Khan was one of a few players
who had a plus record against Capablanca (as well as against Frank Marshall
and Savielly Tartakower). But we only know one game between the two: it was
Sultan Khan's victory with the white pieces over Capablanca at the Hastings tournament
of 1930: 1.Nf3 Nf6 2.d4 b6 3.c4 Bb7 4.Nc3 e6 5.a3 d5 6.cxd5 exd5 7.Bg5 Be7
8.e3 O-O 9.Bd3 Ne4 10.Bf4 Nd7 11.Qc2 f5 12.Nb5 Bd6 13.Nxd6 cxd6 14.h4 Rc8
15.Qb3 Qe7 16.Nd2 Ndf6 17.Nxe4 fxe4 18.Be2 Rc6 19.g4 Rfc8 20.g5 Ne8 21.Bg4
Rc1+ 22.Kd2 R8c2+ 23.Qxc2 Rxc2+ 24.Kxc2 Qc7+ 25.Kd2 Qc4 26.Be2 Qb3 27.Rab1
Kf7 28.Rhc1 Ke7 29.Rc3 Qa4 30.b4 Qd7 31.Rbc1 a6 32.Rg1 Qa4 33.Rgc1 Qd7 34.h5
Kd8 35.R1c2 Qh3 36.Kc1 Qh4 37.Kb2 Qh3 38.Rc1 Qh4 39.R3c2 Qh3 40.a4 Qh4 41.Ka3
Qh3 42.Bg3 Qf5 43.Bh4 g6 44.h6 Qd7 45.b5 a5 46.Bg3 Qf5 47.Bf4 Qh3 48.Kb2
Qg2 49.Kb1 Qh3 50.Ka1 Qg2 51.Kb2 Qh3 52.Rg1 Bc8 53.Rc6 Qh4 54.Rgc1 Bg4 55.Bf1
Qh5 56.Re1 Qh1 57.Rec1 Qh5 58.Kc3 Qh4 59.Bg3 Qxg5 60.Kd2 Qh5 61.Rxb6 Ke7
62.Rb7+ Ke6 63.b6 Nf6 64.Bb5 Qh3 65.Rb8 1-0
Benoit Chamuleau, Istanbul, Turkey
First of all, thank you ChessBase for the very diversified news you offer on
the chess world – for three years or so, I have checked it on an almost daily
basis! Being fascinated by the studies that attempt to compare the strongest
chessplayers in the world, I feel that one dimension is continuously overlooked:
the fact that chess theory was not as advanced as it is today. If, for example,
the number of times that the best move is played is assessed, does this mean
the best moves as known today or the best moves as theoretically known at the
era of play? Whichever the case is, it means an 'unfair' comparison: players
that played strongly in the past, may be weak in today's competition due, for
example, to the much greater complexity (in terms of number of 'good' moves)
of today's games.
Indeed, people have a certain ceiling in their capacity, which was not as
quickly apparent in the past – due to the relatively lower level, or
simpler play – as it is today. I therefore think it may be interesting
that studies like the one now published on ChessBase, consider the theoretical
knowledge known at the era of play, and assess the players accordingly. It should
always be noted that the capacities of players from different times cannot
objectively be compared: maybe Steinitz would not have done badly at all against
Kramnik, had he known the theory of today. On the other hand, maybe even you or I
could have been World Champion in Steinitz's time!
Tobias Nordquist, Sandviken, Sweden
This computer thing could even be new ground for a new rating system. ChessBase
should develop an algorithm or program that works like the "Analyse game"
feature in Fritz; the difference is that the program should spit out a rating number.
For example, Kramnik's error in game 2 should not be punished so much, because
the opponent didn't see the error. But Topalov's error in game 9 should be punished.
Why? Because his opponent saw it. Why is this so important? Because the computer
cannot understand the non-objective things in chess. We call them subjective
things, and as long as an error doesn't get punished, it's not as wrong as the
computer says it is. Hope you caught that!
Pavel Dimnik, Toronto, Canada
One of the reasons I love chess is its depth, its amazing ability
to regularly surprise and intrigue. On that note, I feel I must comment on
an important facet of chess that this study does not really touch upon. To
be fair, with regard to this type of mathematical analysis, perhaps it is
impossible to do so. The facet that was not touched upon is the ability for
great players to create the type of positions they want. Kramnik stymied Kasparov,
Tal always found an explosion in the position, and Capablanca always found
himself in a logical, positional game (to name a few). I do not think that
a quantitative analysis can fully capture this ability, except to act as a
comparison between two players, but even then it would favor the player who
could most force the play into the form that would benefit his own style.
I applaud the effort taken, but for me this study serves to highlight and
remind me of the fact that quantitative analysis can never appreciate the beauty
of chess, or the true genius of its champions. It provides a useful tool of
course, and chess programs are practically stronger than ever, but without
the human 'deus ex machina' to oversee and appreciate what happened on the
board, there would be no chess.
Paul Muljadi, USA
Thank you for sharing the Bratko-Guid paper. While this is a scientific and
worthwhile attempt to identify the best chess player of all time, I think
the study and paper need much improvement before we can get closer to the truth.
First of all, Capablanca being the best chess player of all time does not surprise
me at all. I've always concluded the same because I think he is the best endgame
player of all time. I think there is strong positive correlation between being
the best endgame player and the best chess player. The paper needs to address
different phases of the game and their best players. Secondly, we need to include
other great chess players who have not been recognized formally as the world
champions, such as Philidor, Morphy, Keres, etc. Chess titles are important,
but some of the great players never had a chance for the world chess championship
titles. And finally, the paper needs to address the psychological, physical,
and other external aspects of chess competition as well. Lasker and Botvinnik
made significant contributions in these areas.
Frank Dixon, Kingston, Canada
On the whole, this is an outstanding piece of work. I want to thank the two
scientists who have put this together, and also to thank ChessBase for making
it available. I want to add my additional opinion that GM Vladimir Kramnik,
who won the reunification match with GM Topalov a few days ago, may be the
first computer-trained World Champion in the history of chess. He was coming
up at a time when computers started to reach GM levels of playing strength,
and when the silicon beasts started to be used extensively for actual training
by top players. This should be taken into account when commenting upon Kramnik's
very low percentage of errors. The two writers have not elaborated on this
important point. Kramnik's early top-level play, starting in 1991 when he was
16 years of age and made his debut at high levels, was highly tactical while
still being strategically sound, for the most part. As he matured, the tactical
nature sharply diminished in favour of the strategical style of play, clear
evidence of the impact of computer training upon his talent. Players before
him, such as Fischer, did not have this opportunity to train with computers.
Fischer displayed outstanding tactical prowess in complex positions throughout
his career (up to and including 1972), tempering it with more strategical insight
as he matured and gained in strength.
Gary Furness MD, Santa Rosa, California, USA
Thanks for that very interesting article. I certainly never thought there would
be so many ways to measure the champions within their own peer group.
David Korn, Seattle, USA
First, hearty thanks to both Matej Guid and Ivan Bratko for their
excellent article attempting to objectively quantify the relative strengths
of the fourteen World Chess Champions. I found this fascinating and read every
line several times carefully, and totally delighted in their straightforward
application of simple but well conceived metrics to the performance of the
champions of chess.
I am wondering: Mr. Sonas has already charted the historical Elo ratings
of these same players (and others as well) on his ChessMetrics.com site,
and in other articles comparing the greatest performances over time he
highlights both Karpov and Kasparov for their cumulative strength in major
tournaments over many years. What insights do Mr. Guid and Mr. Bratko have
on how to reconcile the general perception of Garry Kasparov, and at times
Karpov, as perhaps the greatest chess player of all time, given that both
won many games in super-strong tournaments over long periods of time?
Let me hasten to repeat: in no way do I wish to offer anything but heartfelt
and gutsy applause for their work and this wonderful article. Many times we
have heard of the depth of Capablanca's play, and how he just seemed to 'know'
the right moves; similarly, this confirms what we all hear about Kramnik, that
he has the deepest understanding of chess. Some say deeper than Kasparov.
Now here is the rub: if, as this article says, Capablanca and Kramnik lead
when accuracy and blunder rate in more complex positions are taken together,
while other analyses point solidly to Kasparov and Karpov, then can we not
say that these two may not have had the most absolutely accurate play, but
that through willpower, guile, determination, combativeness, tenacity, the
will to win, etc., they led the world for so long due to factors external
to accuracy alone, if you follow? Not to split hairs.
But Kramnik cannot have played nearly as many games as the other two K's,
Karpov and Kasparov. Similarly, Capablanca did indeed lose few games, but
he also did not play that many games, just as Fischer did not, relative
to other world chess champions.
So, all agreed, as Mr. Guid and Mr. Bratko suggest; but Kasparov and Karpov
also won for so long that their results indicate supremacy for reasons including,
but not limited to, accuracy and blunder reduction. Also, there is something
to be said for knowing when to play the absolute best moves, or when
to make the deepest calculations.
This relates the discussion not just to raw calculation, but to physiological
endurance and a sense of emotional economy, and at times to the will to win –
the emotional or physical resolve to try to win every game, as distinct from
accuracy alone – again bespeaking Garry Kasparov's and Robert J. Fischer's
well-known command and desire to win every game.
Matt Goddard, Atlanta, USA
It's nice to see an engine-based evaluation of historic player strength,
but the study has made at least one dubious move, and at least one blunder.
On the dubious side, we have the mechanism for adjusting strengths based on
a subset of complicated positions, to account for playing style, when in reality
that playing style itself may be the greater factor in a player's strength.
It's understandable that we'd want to measure a player's ability to handle
complications, especially since these are things we can most accurately assess
through an engine, but there is no correlating measure of a player's ability
to avoid (or create) complications.
The blunder is that the study doesn't take into account opening theory. Nowadays,
Kramnik plays many more moves that are prepared by theory and confirmed by
computer engines; the assessment of his games is going to show a higher percentage
of "perfect" moves in the opening than an assessment of Steinitz's
games, and as a result, players scores will be inflated over time. It may be
argued that a knowledge of opening theory is part and parcel of a player's
strength, but that would run counter to the intent of comparing players across
historic periods in the first place. Instead, it would seem that the study
would need to find some line of demarcation in the games, between opening theory
and over-the-board play. Simply looking at the games from, say, move 20 is
problematic, since the depth of (accurate) opening theory has increased over
time, and also because such hedging would provide a greater sample for positional
players, who tend to have longer games. An idea to base the demarcation on
the first dubious move would also have inherent problems: King's Gambit theory
would all be sampled, for instance, and also there may be an imbalance imposed
when playing strength is measured after one side is already at a disadvantage.
Using the computer's opening books as a demarcation also fails: theory to a
modern player may well be over-the-board play for a historic one; also, the entirety
of Kramnik's preparation is not contained in the files. So, it's not an easy
problem to overcome.