8/24/2003 – The SSDF list ranks chess-playing programs on the basis of 90,000 games. But these are games the computers played against each other. How well does that correlate with playing strength against human beings? Statistician Jeff Sonas uses a number of recent tournaments to evaluate the true strength of the programs.

How strong are the top chess programs?

In recent months I have been devoting considerable effort to gathering and analyzing data about computer chess programs, with the assistance of David Levy. Later on, I will be publishing my results on the ICGA and ChessBase websites, but in view of the recent performances by Shredder in Argentina and Brutus in Germany, I would like to give a quick evaluation while our memory of those tournaments is still fresh.

In July I would have said that there was one computer program at 2800 strength (Junior), three programs at 2700-2740 strength (Shredder, The King, and Fritz), and five more (Brutus, Rebel, Chess Tiger, Hiarcs, and Yace) in the 2630-2690 range. In the past couple of weeks, Shredder posted identical 8.5/10 scores in the GM and IM divisions of a tournament in Argentina, while Brutus just finished the Lippstadt GM tournament with an impressive 9/11 score. How do those results change my view of the top programs?

Well, Shredder's 8.5/10 score in the GM tournament was certainly very impressive. With an average opposition rating of 2445, that's about a 2740 performance rating. However, you have to balance that against the IM event at the same location. Against opposition averaging 120 points lower, Shredder still gave up 3 draws to reach the same 8.5/10 score. Any way you look at it, that's a performance rating 120 points lower (2620) in that event.
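The arithmetic behind these two figures can be sketched with the standard logistic Elo formula. That choice is an assumption: the article does not say which formula or lookup table was used, and the function name `performance_rating` is purely illustrative.

```python
import math

def performance_rating(avg_opponent_rating, points, games):
    """Performance rating under the logistic Elo curve.

    Assumed formula, not necessarily the one used in the article.
    """
    score = points / games
    if not 0.0 < score < 1.0:
        # A 0% or 100% score has no finite performance rating here.
        raise ValueError("undefined for 0% or 100% scores")
    return avg_opponent_rating + 400.0 * math.log10(score / (1.0 - score))

# Figures from the text: 8.5/10 against a 2445 average (GM event),
# and 8.5/10 against a field 120 points weaker (IM event).
gm_event = performance_rating(2445, 8.5, 10)
im_event = performance_rating(2445 - 120, 8.5, 10)
print(round(gm_event), round(im_event))
```

Under this formula the two events come out at roughly 2746 and 2626, a few points above the rounded figures quoted above. Because the score is the same in both events, the gap between the two performance ratings is exactly the 120-point gap in opposition.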

It seems weird to talk about an 85% score being at all disappointing, but that's just how it goes when you are playing against such "weaker" opposition. Based on its pre-match rating, Shredder should have scored at least +9 if not a perfect 10 in the weaker event. My revised estimates show Shredder, The King, and Fritz all together in the 2700-2720 range. Here is a summary of Shredder's tournament results over the past 20 months:

I should point out that the math gets a little unclear when you are a 300+ rating point favorite. Shredder would have to play in a much stronger human tournament in order for us to really make much of a claim as to whether its strength is 2700 or 2750 or 2800. Hopefully it will get that opportunity very soon!
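A rough way to see why such events cannot separate 2700 from 2800 is to compute the expected score for each candidate strength against the 2445-average GM field. The sketch below assumes the logistic Elo curve, which the article does not confirm. The name `expected_points` is illustrative, and the three candidate strengths are hypothetical inputs.

```python
def expected_points(rating, avg_opponent_rating, games):
    """Expected total points under the logistic Elo curve (an assumed model)."""
    diff = rating - avg_opponent_rating
    expected_per_game = 1.0 / (1.0 + 10.0 ** (-diff / 400.0))
    return games * expected_per_game

# Hypothetical "true" strengths, each facing the 2445-average field over 10 games.
for strength in (2700, 2750, 2800):
    print(strength, round(expected_points(strength, 2445, 10), 2))
```

Under these assumptions the three strengths are expected to score roughly 8.1, 8.5 and 8.9 points out of 10. A 100-point difference in strength is therefore worth less than one point over the whole event, and a single extra draw can hide it.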

Also, there is some very interesting evidence I've uncovered, suggesting two things:

  1. The strongest computers do progressively worse than expected against humans when the computer has a rating advantage of more than 100 points.
  2. The strongest computers do better than expected against 2600+ players, and this is increasingly true against progressively stronger humans.

I'm still in the research phase, but that effect certainly could be exaggerating the perceived gap between Junior and Shredder, since Junior's human opponents, on average, have been almost 300 points stronger than Shredder's (over the past year and a half).

Let's move on to Brutus. Its performance at Lippstadt was very impressive and also very informative. Before the tournament, I had zero games in my database between Brutus and FIDE-rated humans. This made my #5 ranking of Brutus somewhat suspect. My analysis has shown that games against humans are twice as significant as games against computers, when you are trying to figure out how strong a computer is. Thus we were really missing two-thirds of the story about Brutus because we didn't know how it does against humans.
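The "two-thirds" figure follows from the 2:1 weighting if the two kinds of games are otherwise available in equal numbers. That equal-count reading is an assumption, as are the names and the simple linear weighting in the sketch below. The article does not describe how the actual rankings combine the two sources.

```python
HUMAN_GAME_WEIGHT = 2.0     # from the text: human games count twice as much
COMPUTER_GAME_WEIGHT = 1.0  # baseline weight for computer-vs-computer games

def human_evidence_share(human_games, computer_games):
    """Fraction of the total evidence weight contributed by games against humans."""
    human = HUMAN_GAME_WEIGHT * human_games
    computer = COMPUTER_GAME_WEIGHT * computer_games
    total = human + computer
    if total == 0:
        raise ValueError("no games to weigh")
    return human / total

# Hypothetical game counts, chosen only to illustrate the ratio.
print(human_evidence_share(100, 100))  # equal counts: humans carry 2/3 of the weight
print(human_evidence_share(0, 100))    # no human games, as for Brutus before Lippstadt
```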

Well, now we have a much better idea of the overall strength of Brutus. Despite a pretournament expectation of a +3 or +4 score, Brutus managed a +7 score (82%). That's a lower percentage than Shredder managed, but remember that Brutus faced a much stronger field than either of the Argentina events; the tournament field at Lippstadt was just short of a 2500 average. It works out to a performance rating of about 2765, clearly the best result of its career:

Of course, there is a big difference between a performance rating and an actual rating. For example, nobody was saying that Junior was suddenly 2950-strength, just because it scored 3.5/4 in a match last year against Mikhail Gurevich. Nevertheless, 11 games is a pretty good sample. My computer rankings place a lot of emphasis on very recent results, and a lot of emphasis on games against humans. These results suggest that Brutus now deserves to be ranked #2 in the world among computer chess players, trailing only Junior, with an approximate strength of 2730-2740.

Now that we finally have a good amount of data on how Shredder and Brutus have done against human players, here is a graphical comparison, summarizing the performance ratings of the top computers over the past 20 months, including only verified games against FIDE-rated humans:

As I mentioned earlier, I have a lot more to say about computers in chess, but for now you'll have to be satisfied with these graphs. However, please feel free to send me e-mail if you have any questions, comments, or suggestions.

Jeff Sonas is a statistical chess analyst who has written dozens of articles since 1999 for the Kasparov Chess website. He has invented a new rating system and used it to generate 150 years of historical chess ratings for thousands of players. You can explore these ratings on his Chessmetrics website. Jeff is also V.P. of Engineering for Ninaza, providing web-based medical software for the health care industry.

