Historical Chess Ratings – dynamically presented

by Frederic Friedel
4/8/2016 – There is always a certain fascination with comparisons between strengths of chess players. Who was the greatest of them all? Was Rubinstein in his day the equal of Korchnoi in his? Rod Edwards, Professor of Mathematics, has developed a rating theory based on a network of interactions between chess players over time. His results are remarkable, and a presentation made by youthful California programmer Cary Huang is truly mesmerizing.

Rod Edwards' historical chess ratings

The Edo Historical Chess Rating system is a novel approach to the retroactive rating of chess players over time. Ratings over the whole period are calculated simultaneously by an iterative method (Bradley-Terry). Similar iterative methods have been used before (by Elo, for example, to initialize the FIDE ratings in 1970), but in a static way, to estimate playing strengths at a particular time.

The Edo system uses the simple (but new when first developed, in 2004) idea of treating a single player in different years as separate players, and positing hypothetical drawn self-games between the same player in consecutive years. For example, Staunton-1845 is considered to have played a 30-game match with Staunton-1844 with a resulting score of 15-15. This keeps each player's rating from changing too dramatically in response to every year's performance, which is subject to random variations. Then, an adjustment is made to account for a known underlying distribution of playing strengths of all chess players in general - very high or very low ratings being inherently less likely than average ones. This adjustment decreases the effect of anomalous scores for players with very few game results.
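To make the mechanics concrete, here is a minimal, hypothetical sketch of the idea in Python. This is not Edwards' actual implementation: it runs the standard Bradley-Terry minorization-maximization iteration, adds the 30-game drawn self-matches between consecutive years, and omits Edo's adjustment for the underlying distribution of playing strengths. The player data and the 2500-point anchor are invented for illustration.

```python
import math
from collections import defaultdict

def edo_style_ratings(results, self_games=30, iters=500):
    """Bradley-Terry ratings with Edo-style virtual self-matches.

    results: list of (a, b, score_a, n_games) aggregates, where players
    are (name, year) tuples and score_a counts a's wins plus half a
    point per draw against b.
    """
    players = {p for a, b, _, _ in results for p in (a, b)}

    # The Edo trick: link the same player in consecutive years with a
    # hypothetical drawn match (e.g. Staunton-1845 "drew" 15-15 over
    # 30 games with Staunton-1844), damping year-to-year swings.
    games = list(results)
    for (name, year) in players:
        if (name, year + 1) in players:
            games.append(((name, year), (name, year + 1),
                          self_games / 2, self_games))

    score = defaultdict(float)  # total points per player
    npair = defaultdict(float)  # games played per ordered pair
    for a, b, s, n in games:
        score[a] += s
        score[b] += n - s
        npair[(a, b)] += n
        npair[(b, a)] += n
    opponents = defaultdict(set)
    for a, b in npair:
        opponents[a].add(b)

    # Minorization-maximization iteration for Bradley-Terry strengths:
    # p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ], then renormalize.
    p = dict.fromkeys(players, 1.0)
    for _ in range(iters):
        new = {}
        for i in players:
            denom = sum(npair[i, j] / (p[i] + p[j]) for j in opponents[i])
            new[i] = score[i] / denom if denom else p[i]
        gmean = math.exp(sum(map(math.log, new.values())) / len(new))
        p = {k: v / gmean for k, v in new.items()}

    # Map strengths to an Elo-like scale around an arbitrary 2500 anchor.
    return {pl: 2500 + 400 * math.log10(p[pl]) for pl in players}
```

With two invented players, "A" beating "B" 7-3 in 1850 and drawing 5-5 in 1851, the self-match links pull A's 1851 rating above B's even though their head-to-head score that year was level.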

More about the Edo historical rating system

Before we present the video that so fascinated us, here are some caveats that Rod Edwards expressed:

The video is misleading in presenting the early players (pre-1840 or 1850, say) by their ranking, as if I were claiming that this were based on complete information. I try to make it clear on my website that, because I have definite results for only a few early players, many strong players are simply missing in those years, and the ratings that are given are very uncertain (large variance). So I would certainly not claim that Alexandre, for example, was the best player in the world in 1818, only that he has the highest very tentative Edo rating of the few players I can rate in that year. Deschapelles was certainly stronger, and so probably was Petrov, despite his appearing lower on the list that year. So it is clear that the early rating lists especially are very tentative, and very incomplete.

Note also that my ratings are continually updated as I find new information. In my next update for example, Kolisch's rating curve will change and, even more dramatically, John Cochrane's, based on new information I've just found.

Here, without further ado, is the video presentation that fascinated us:

To get the proper effect, switch to full screen and lean back to enjoy the six-minute presentation.

And this is a five-minute presentation of the world's best Go players (including AlphaGo)

We have not (yet) been able to track down the authors of the above animations. The name given is "Abacaba", and the profiles on YouTube and Facebook say only: "We make videos about data and math."

We assume they are the Huang twins, Michael and Cary, pictured on the right at the age of fourteen when, in 2012, they made this truly incredible Scale of the Universe video, which shows the size of things from the entire universe down to quantum foam. Not just the video: there is also a "Scale of the Universe" animation in which you can actively scroll from one end of the range to the other. They spent a year and a half making it. They have also made tons of Flash videos.

Addendum: In the meantime Michael Huang contacted us. "Abacaba is a YouTube channel of my twin brother Cary Huang and me," he wrote, "although the ratings video was made by Cary alone. We are two teenagers from California making these videos in our free time. Cary wrote the code for the whole video (including the graphs and labels) in Processing, a programming language for making visualizations. Since the three sources (linked in the description of the video) each span a short period of time, he had to stitch them together, crossfading during the periods where they overlapped."

Rod Edwards is a Professor of Mathematics at the University of Victoria, Canada. While his academic research mainly deals with the theory of network dynamics in biological contexts, he has always had a fascination for chess and its history. Combining these two interests led him to develop a rating theory based on a network of interactions between chess players over time, and, as a spare-time project, he has been collecting historical data on chess players and results of their contests in the 19th and early 20th centuries. The result is the Edo Historical Chess Ratings project, continually being updated at http://www.edochess.ca.


Editor-in-Chief of the ChessBase News Page. Studied Philosophy and Linguistics at the University of Hamburg and Oxford, graduating with a thesis on speech act theory and moral language. He started a university career but switched to science journalism, producing documentaries for German TV. In 1986 he co-founded ChessBase.



Discuss


algorithmy 4/8/2016 11:38
The latter part of the video shows that Carlsen is the true successor of the older generation that starts with Morphy and runs up to Kasparov. Carlsen, too, has this distinguished leap over the others. He is truly a classical world champion!

But the most fascinating part is how it shows the great duels for the top through the ages: Steinitz vs Zukertort, Lasker vs Steinitz, Lasker vs Capablanca, Alekhine vs Capablanca... up to the great duel between Karpov and Kasparov. I hope we will see another duel between Carlsen and a challenger instead of these one-sided battles!

Also remarkable is how the video shows the rise and downfall of the great players through the ages. Really, really fascinating video!
x_ileon@yahoo.co.uk 4/8/2016 12:20
A lovely diversion! Watched it several times, with frequent pauses.
Mr TambourineMan 4/8/2016 12:39
They let Morphy play matches against himself until he died. Fischer, however, was not allowed to play games with himself, even though he probably would have checkmated the other guy. Kasparov got eliminated. Maybe they thought he was gunned down in some sort of Russian defence?
Rational 4/8/2016 01:36
Like the animation, though I'd put other music on in the background. Some fascinating moments involving less well-known players, for me: Blackburne nearly catching Steinitz, Maroczy nearly overtaking Lasker. Then the interesting dance between Lasker and Capablanca, with Lasker coming back to top place after New York 1924.
I guess the earlier ratings are based on far fewer games than the modern players get through. It's fascinating how there keeps being one player who separates himself from the more closely matched following pack.
KevinC 4/8/2016 02:15
One thing I hate about these historical comparisons is that they never really take into account that we now know more about chess. With the tools today like ChessBase, engines, and Internet-based chess clubs (and access to top GMs at a click), players are A LOT stronger on average than they were even two decades ago.
yesenadam 4/8/2016 02:32
The graph shows Fischer's rating declining sharply and smoothly for a few years after '72, when he had in fact stopped playing. I wonder how much else of the graph contains bits of fictional nonsense like that?
flachspieler 4/8/2016 05:13
In total a nice piece of work.

Three proposals for improvement:
(i) There is a bit too much information in the diagram. I would prefer to have first names deleted (so, only family names and age; maybe even age deleted too).
(ii) Morphy should be taken out from the moment he stopped playing.
(iii) Instead of the "plastic music" I would prefer something smoother in the background, for instance "Air on the G String".
flachspieler 4/8/2016 06:07
Just seen: there exists a similar video from the Go world. The visualisation in that video is much better (only the music is not to my taste):
https://www.youtube.com/watch?v=oRvlyEpOQ-8
Aighearach 4/8/2016 06:47
The animation is great, the concept is great, the fake ratings are truly absurd and really take a lot away from it IMO. I don't mean the historical players who didn't have modern ratings; I mean the recent-history players who did have ratings, but who are graphed with fake ratings instead.

Play has improved over time; that doesn't mean that the players of history were as good as today's, it means they were NOT as good as today's. Seems... existentially unavoidable to me.

I'd rather see it with real ratings, including historical estimated ratings based on current views of actual play strength, not just relative ratings that presume that whoever was the top player in an era was 2800. That would improve the progression over time, but still maintain the primary entertainment value, which is watching the players of the same era rise and fall relative to each other.

@yesenadam it also shows him over 2800. Way over. Better than Carlsen, even. And yet, his max rating was 2785.
jajalamapratapri 4/8/2016 08:13
Cool. I look forward to the results from all the commentators here that know how to do this so much better, lol.
Chvsanchez 4/8/2016 08:33
The issue with these back-rankings is that they are ahistorical. For example, according to them we now know that Najdorf was the second-strongest player. However, people at the time (the late 1940s) did not see it that way: at most he was the best player in South America.
Rational 4/8/2016 09:45
The ratings are not meant to be Elo ratings: this is the Edo rating, which uses a different methodology; that's why, e.g., Fischer is over 2800 at one point. Earlier ChessBase webpages covered how different rating methods might have stronger predictive power than Elo. It is not clear to me that modern chess players are stronger than the old chess players at the very top. See how Philidor showed how to win with rook and bishop vs rook, while Caruana and Svidler could not master his methods in the recent Candidates tournament, despite having already seen Philidor's method and having the benefit of about 250 years of examples and computer tablebases.
The main point of the animation is to dynamically show the relative changes in rating at particular times, rather than to compare different eras.
jarreth22 4/8/2016 10:48
Thank you guys for posting it, very entertaining; quite an incredible sensation to travel across all these years in a few minutes! The rise and fall of great powers has always fascinated human beings!
James Satrapa 4/9/2016 05:08
Measured as a function of accuracy as quantified by engines, there is no doubt modern players are better players than historical players, but this means little in and of itself as modern players stand on the shoulders of giants. It's like saying Hawking was better than Newton - the comparison is ridiculous.

One thing the video shows is the relative dominance of the players. Truly, the representations of Morphy's and Fischer's relative ratings should have stopped when they stopped playing, but otherwise a player's greatness is to be measured as much by innovation and dominance as by the accuracy of their play, which is a product of the evolution of theory (carbon and silicon) as much as, if not more than, ability alone.

For all its flaws, it's a delightful video.
A7fecd1676b88 4/9/2016 06:12
Garbage in, garbage out - and that program is producing pure garbage, but that clearly does not prevent it from getting published.

Simply put, if you could accurately determine who is better from an algorithm, you would not need chess matches, now would you.

Further, as somebody alluded to with Hawking and Newton: The average beginning University physics student is a better physicist than Newton ever was, because that student knows calculus and much of classical physics and some modern physics. But none of these students could have formulated Newton's laws and derived their consequences by themselves without any examples to follow, and then write the Principia. A similar statement may be made but with Einstein and any beginning graduate student in physics.

Fischer was very clear on this idea. You can see this in YouTube videos of Fischer talking about chess talent. Sure, modern masters might know more than, say, Morphy, but if you measure who was the best using talent as the measure, then Morphy would maybe rank the highest of anybody, except maybe Fischer. Please note, the algorithm does not measure talent. What does it actually measure? LOL, nobody knows.


The_Tenant 4/9/2016 08:05
Interesting data. Seems to correlate well with what the computers have said previously about Capablanca and Fischer being the strongest players in history (see Guid and Bratko). It will be interesting to see what AIs in the future think of historical player strength. Probably won't be that much different from what we see here.
Hawkman 4/9/2016 09:38
Based on ratings inflation, Carlsen would have been #3 in January 1984.
vladivaclav 4/9/2016 10:19
Something's fishy. Alekhine defeated Capablanca in a long match while Capa had a +100 Elo point advantage...
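As context for the commenter's point, the standard Elo expected-score formula shows why a 100-point gap is far from decisive: it predicts only about a 64% score per game, so the higher-rated player losing a long match is an upset, not an anomaly. The ratings below are hypothetical illustrative values:

```python
def expected_score(r_a, r_b):
    """Elo expected score per game for a player rated r_a facing r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# A 100-point edge predicts roughly a 0.64 score per game -
# far short of a guarantee over a long match.
edge = expected_score(2700, 2600)  # hypothetical ratings
```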
James Satrapa 4/9/2016 11:25
"What it actually measures? LOL, nobody knows. "

Maybe relative dominance over contemporaries based on results as measured via a consistently applied if arbitrary rating system, in this case Edo rather than Elo, applied through time to maintain historical continuity?
lajosarpad 4/9/2016 01:44
@Rational, I am sure these guys could beat an in-form Philidor anytime, blindfold and at a rapid time control, while Philidor could see the board and take his time. They know so much more about opening theory, they have studied a lot of endgame types unknown in Philidor's time, they know a lot more about positional play than Philidor, and they have had the opportunity to play a lot on the internet against varied opposition, while Philidor could only play when meeting someone in person. When it comes to talent, it is difficult to tell whether Caruana, for instance, is more or less talented than Philidor, but on the board he would crush Philidor without any problems.

You compare apples with blackberries. They played one endgame badly in a real game, under pressure, and yet you consider them weaker than Philidor, because he studied that endgame type and found the solution calmly, taking his time.
A7fecd1676b88 4/9/2016 04:41
@James Satrapa - "Maybe relative dominance over contemporaries based on results as measured via a consistently applied if arbitrary rating system, in this case Edo rather than Elo, applied through time to maintain historical continuity? "

Thank you for illustrating my point. You have no idea what it measures, you only know what it claims to measure. They are in fact wildly different things.

However, to make it concrete for you, it is sufficient to consider the case of Najdorf. Najdorf was never considered equal to the Soviet GMs, yet the algorithm would have us believe otherwise, it would appear. When you write an algorithm, you should at least test it against well-known cases. If I want to critique an algorithm, I can either spend the time to read and understand it, or I can just look at its output.

You also suggest "measured via a consistently applied if arbitrary rating system". The rating system is not consistent. If you vary the pool of players, or have isolated pools, then the ratings will evolve differently. Not consistent at all.
For example, if you travel across the country to play in tournaments, you notice that a player in an isolated region, say Washington or Oregon, with a rating of 2000 is not the same as a player in New York with a rating of 2000. That is just one example of inconsistency due to geography. Others have mentioned the well-known problem of rating inflation. There are now over 1000 GMs. A dime a dozen. Originally, there were just five.
chessbibliophile 4/9/2016 05:54
On great masters past and present:
"If I have seen further than others, it is by standing upon the shoulders of giants." - Isaac Newton
jrdls 4/9/2016 07:58
If I got the list right, the top 5 players of all time by peak rating are:

1. Fischer 2893 (nov 1971)
2. Botvinnik 2882 (nov 1945)
2. Kasparov 2882 (mar 1990, apr 1993)
4. Carlsen 2878 (may 2014)
5. Capablanca 2870 (oct 1919)

Of all 5, the one that stands out to me is Botvinnik. I know he was a very good champion but he didn't come across as such a strong player.
James Satrapa 4/9/2016 11:01
"Najdorf was never considered equal to the Soviet GMs"

That sounds to me like a subjective judgment, not something against which you can satisfactorily test algorithms which need to be tested against something a bit more objective, like actual game results or measurements of accuracy of moves.

"You also suggest "measured via a consistently applied if arbitrary rating system". If you vary the pool of players, or have isolated pools, then the ratings will evolve differently. Not consistent at all. "

Geographically inconsistent pools of ratings are a new one to me, especially in the modern day of Elo ratings. It makes notional sense, but it's something I'd be skeptical about until there's more info. There is the well-known case of the Burmese FMs who kept themselves isolated for the purposes of bumping their ratings to GM levels, but that was probably a rort.

My full quote was "measured via a consistently applied if arbitrary rating system...applied through time to maintain historical continuity?" Temporal separations are also tricky, but the key words here are "maintain historical continuity", indicating players dropping off the continuum due to retirement or death, while others join the fray as they come of age and become competitive, as plainly indicated during the progress of the graphic.

Back in the day, there were very few really good players by modern standards. They tended to play each other, much like the top group today frequently play each other in invitationals, and as they aged, a player would drop out and another appear. This process is what I meant by maintaining historical continuity, as the evolving pool of top players gradually morphs over time, constantly maintaining links with the past, just as the Elo rating system does today.

How accurate these measurements are is another factor altogether, as the graph would have had to rely on results rather than any rigorous measurements of accuracy and that could distort the results to a significant extent. But it seems clear what the graph is actually trying to measure.

"Others have mentioned the well known problem of rating inflation. There are now over 1000 GMs. Dime a dozen. Originally, there were just 5. "

The five players that Czar Nicholas somewhat arbitrarily designated as "grandmasters" back in 1914 are still a matter of historical debate, one that is frequently referred to with some scorn as being unlikely. Of the modern group of players understood to be officially designated grandmasters, there were 27 named by FIDE in 1950 as the first official grandmasters. As you noted there are now over a thousand, and the reason this may have devalued the title is due much less to rating inflation than to FIDE consistently making the qualification criteria easier to achieve - this is a whole different and interesting, if occasionally tawdry, history which there is insufficient space to go into here, but it has little to do with rating inflation.
A7fecd1676b88 4/10/2016 05:55
"That sounds to me like a subjective judgment" -- And yet you will not be able to find one strong GM contemporary of Najdorf who thought he was equal to the Soviets. When the Soviets and Najdorf competed in a tournament, the thinking was never "I wonder if the Soviets can beat Najdorf".

Looking at the life results of Najdorf in the book "Najdorf: Life and Games", we see the best he ever did against a Soviet GM in a match was a draw, against Tal in 1970. He lost to Bronstein in a match in 1954. Against US players, he lost matches to Reshevsky (twice!) and had a draw against Fine. Najdorf NEVER WON A MATCH AGAINST A TOP GM. Yet the algorithm has him at number 2!? That would be funny if it weren't so sad.

"algorithms which need to be tested against something a bit more objective, like actual game results or measurements of accuracy of moves. " -- Game results, yes. Move accuracy, NO.
This idea, no doubt put forward by the people who write these algorithms, is that move accuracy is important. It is not.
Garbage in, garbage out....and the idea of accurate moves is pure garbage.

The master does not look for the accurate move, he looks for the move that works. For example, if a master can win a piece against Kasparov, but has to walk a fine line of only moves for say 10 moves in a complicated position ( which a computer can easily do), or the master can win a pawn against Kasparov, and the resulting position is simple with no complications, then the master will take the pragmatic approach and win the pawn, not the piece, even though the computer would say winning the piece is more accurate.
Tal's play up to his winning of the World Championship was filled with non-accurate moves, yet he won. He won because he was not playing a computer, but humans who could not find their way in the complications that his non-accurate moves created. Accurate moves you say? LOL.

Nothing you have stated has supported your claim that the rating system is consistent. It has always had problems, which is why the USCF, for example, has continually had to modify its rating formula.

"and as they aged, a player would drop out and another appear. This process is what I meant by maintaining historical continuity as the evolving pool of top players gradually morphs over time constantly maintaining links with the past, just as the Elo rating system does today." ------ Wow. You evidently don't understand how the process of old players leaving and young players entering has historically changed the rating pool, at least in the USCF. It has been a big problem in the USCF, so much so that they had to change their formula to address the distortions in the pool it causes (remembering that what we would like to have is a rough approximation to a normal distribution). When an old strong player drops out, he TAKES HIS HIGHER RATING WITH HIM. The younger players then don't have the chance to play against him and win his points. This process results in rating deflation. There is no historical continuity, except in your imagination of how you think ratings should work.

Again, the algorithm may be measuring something, but what it is nobody actually knows.
James Satrapa 4/10/2016 11:54
Yep. Subjective. Only a small minority of games are played in matches. Most are played in tournaments, and the results of all of a player's games are presumably counted, not just match results.

Move accuracy - yes. In any individual game, what you say may apply, but the best players overall are the ones who make the fewest and smallest mistakes. In other words, the most accurate players are the game winners.

"Tal's play up to his winning of the World Championship was filled with non-accurate moves, yet he won." True, but his style rattled Botvinnik who ended up making even more non-accurate moves.

"Nothing you have stated has supported your claim the rating system is consistent. It has always had problems which is why the USCF, for example, has continually had to modify it's rating formula. "

Not talking about USCF, which is different from Elo.

"he TAKES HIS HIGHER RATING WITH HIM. The younger players then don't have the chance to play against him and win his points. "

Of course not; they replace him in the pool, by which time the new player's rating is already at a high level, able to enter that top pool, maybe even at the same level as the retiring player, who has probably declined in strength.

A7fecd1676b88 4/10/2016 07:39
@James Satrapa -- There is no other way to say it, but you simply don't know what you are talking about, whether it is ratings, player strength, or algorithms. Good luck.
James Satrapa 4/10/2016 10:55
Ditto, friend.
yesenadam 4/11/2016 07:22
A7fecd1676b88: Your condescension, and your repeatedly resorting to telling the other guy he doesn't know what he's talking about, remind me strongly of a certain recent ChessBase author. That's not a good sign.
imdvb_8793 4/14/2016 07:20
"One thing I hate about these historical comparisons is that they never really take into account that we now know more about chess. With the tools today like ChessBase, engines, and Internet-based chess clubs (and access to top GMs at a click), players are A LOT stronger on average than they were even two decades ago."

Yes, but how well would those old masters have done with today's resources available to them? I shudder to think what kind of preparation Bobby Fischer, with his ambition and work ethic, would have come up with, had he had databases and engines at his disposal... And how seldom (if ever), with his calculating ability, he would have made any kind of mistakes, had he known all of the typical positions modern grandmasters know, and had all of the extra opening/endgame knowledge they do. He would have been even better than Kasparov, in my humble opinion. And Capablanca would have been a much, much stronger version of Carlsen. MUCH stronger...

"it also shows him over 2800. Way over. Better than Carlsen, even. And yet, his max rating was 2785."

Guess we don't believe in inflation, for some reason... (Despite the fact that it's mathematically incontestable. If there were no inflation, the average rating at the top - and not only at the top - quite simply would have stayed the same; there's no way around it. It would not have gone up by 150-odd points, because the Elo system doesn't in any way measure the differences between generations or the progress in playing skill - or measure playing skill at all, in fact, other than by comparison with one's opponents and the current rating pool - but only how contemporary players compare to each other. Therefore, inflation is NOT a result of a better standard of play. Not in any mathematical way, that's for sure.)
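The zero-sum property this argument relies on is easy to verify: in a single Elo update with equal K-factors, the points one player gains are exactly the points the other loses, so play alone cannot raise the pool's average. A minimal sketch (the K-factor and ratings are arbitrary illustrative values):

```python
def elo_update(r_a, r_b, score_a, k=10):
    """One Elo game between A and B; score_a is 1, 0.5 or 0."""
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # A's expected score
    # A gains k*(score - expected); B's change is exactly the negative.
    return r_a + k * (score_a - e_a), r_b + k * (e_a - score_a)

# Whatever the result, the two rating changes cancel, so the sum
# (and hence the pool average) is unchanged by the game.
a, b = 2700.0, 2600.0
for result in (1.0, 0.5, 0.0):
    new_a, new_b = elo_update(a, b, result)
    assert abs((new_a + new_b) - (a + b)) < 1e-9
```

Average ratings can therefore drift only through players entering or leaving the pool (or unequal K-factors), which is the crux of the inflation debate in this thread.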

"Based on ratings inflation, Carlsen would have been #3 in January 1984."

Exactly.

@A7fecd1676b88 - I suggest you switch to some place that doesn't discuss an article about math-based ratings, instead, if you think they're garbage and have no interest in counter-arguments! You're not telling us anything revolutionary or blowing our minds in any way, you know... So what's the value of your comments? Or the reason for your even being here in the first place - is it just in order to be a dick to people without facing repercussions, taking advantage of the safety of the online world? Looks like it to me.
Jan Boot 4/17/2016 09:02
Reeds
1