Let's Check: the elite are better than you know

by Albert Silver
10/17/2022 – There are several ways to check a player's performance using an engine. One is to simply ask an engine to analyze every move and highlight every disagreement however small. Another is to use the tool in ChessBase and Fritz known as Let's Check. Here are the results from the recent Sinquefield Cup including a 100% match and the curiously high results by...


There are three main ways today to evaluate a player using an engine. This is not to be confused with annotating a game.

The most sophisticated one is used by analysts such as Dr. Ken Regan, who compile the average error rates of players per level. For example, I might lose an average of 0.15 pawns per move compared to the engine's best, while a top GM might lose only 0.02. His system goes deeper than this, but that is still the foundation on which it rests, and it is better at catching 'smart cheaters' than a more basic system such as the ones below.
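To make the idea of an average loss per move concrete, here is a minimal sketch in Python. It is not Dr. Regan's model, only the bare average described above, and the function name, the input format and the sample evaluations are all assumptions made up for illustration.

```python
from statistics import mean

def average_loss(best_evals, played_evals):
    """Mean evaluation lost per move, in pawns.

    best_evals[i]   : engine evaluation after its own best move at move i
    played_evals[i] : engine evaluation after the move actually played
    Both are taken from the mover's point of view. A negative difference
    is treated as search noise and clamped to zero (an assumption).
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("need one evaluation pair per move")
    return mean(max(0.0, b - p) for b, p in zip(best_evals, played_evals))

# Hypothetical evaluations for a five-move sample (not real game data)
best = [0.30, 0.25, 0.40, 0.10, 0.50]
played = [0.30, 0.05, 0.40, 0.10, -0.05]
print(round(average_loss(best, played), 2))  # 0.15
```

In this made-up sample, two inaccuracies of 0.20 and 0.55 pawns spread over five moves give the 0.15 average used in the example above.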

The simplest is just to analyze a game with an engine and ask it to highlight every move it disagrees with, however small the difference. The obvious risk is that in some positions there might be three roughly equal moves, and three engines might each prefer a different one.

Imagine you are analyzing with only Stockfish, and it says that five moves out of ten are not a match. This might overlook that two of the moves that don't match its choices are chosen by another top engine such as Komodo Dragon 3. In other words, only five match Stockfish, but seven in all match top engine choices. That is the underlying point of Let's Check. When you analyze a game with it, it will not only tell you what a variety of engines thought of each move, it will also give you a summary called Engine Correlation at the top, showing the percentage of a player's moves that matched the top choice of an engine.

However, unlike a plain engine comparison, it won't compare with just one engine's top move. It will compare with several, and if the move matches the top choice of any of those engines, it counts as a match for Engine Correlation.
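The "match any engine" rule can be sketched in a few lines of Python. This is only an illustration of the counting described above, not the actual Let's Check implementation. The function name, the data layout and the placeholder moves are assumptions.

```python
def engine_correlation(played_moves, engine_choices):
    """Percentage of played moves matching at least one engine's top choice.

    played_moves   : list of moves as strings
    engine_choices : dict mapping an engine name to its list of top choices,
                     one entry per played move
    """
    if not played_moves:
        raise ValueError("no moves to compare")
    matches = 0
    for i, move in enumerate(played_moves):
        # A move counts once, no matter how many engines agree with it
        if any(choices[i] == move for choices in engine_choices.values()):
            matches += 1
    return 100.0 * matches / len(played_moves)

# Placeholder data reproducing the ten-move example: Stockfish agrees with
# five moves, and Komodo Dragon 3 agrees with two of the remaining five.
played = [f"move{i}" for i in range(10)]
stockfish = played[:5] + ["other"] * 5
dragon = ["alt"] * 5 + played[5:7] + ["alt"] * 3

print(engine_correlation(played, {"Stockfish": stockfish}))  # 50.0
print(engine_correlation(played, {"Stockfish": stockfish,
                                  "Komodo Dragon 3": dragon}))  # 70.0
```

Adding an engine can only keep the figure the same or raise it, which is worth keeping in mind when comparing percentages obtained with different sets of engines.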


Sinquefield Cup

Recently there were several claims about high Engine Correlation matches between Hans Niemann's games and the Let's Check choices, so out of curiosity I ran a complete Let's Check on all the games in the recent Sinquefield Cup and I must say the results were unexpected.

The first result to come out was that one player did actually obtain a 100% match. This was not the result of some ultra-short draw, since Let's Check ignores theory moves and games with too few moves played. For example, a game that was 28 moves long but had 20 moves of theory will not be eligible for an Engine Correlation result. Who is this engine-matching wonder? Wesley So.
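The eligibility rule can be pictured with a short Python sketch. The exact cutoff Let's Check uses is not stated here, so the default of 10 scored moves below is purely a hypothetical value, as are the function and parameter names.

```python
def scored_moves(total_moves, theory_moves):
    """Moves left to score once opening theory is set aside."""
    return max(0, total_moves - theory_moves)

def is_eligible(total_moves, theory_moves, min_scored_moves=10):
    """True if enough non-theory moves remain for a correlation figure.

    min_scored_moves is a hypothetical threshold, NOT the real cutoff
    used by Let's Check, which is not given in the text.
    """
    return scored_moves(total_moves, theory_moves) >= min_scored_moves

# The 28-move game with 20 moves of theory leaves only 8 moves to score
print(scored_moves(28, 20))   # 8
print(is_eligible(28, 20))    # False under the assumed threshold
print(is_eligible(45, 12))    # True: 33 moves remain
```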

In his game against Ian Nepomniachtchi, the American player achieved a 100% Engine Correlation score. However, he was not the star performer overall in terms of such measurements, since it was his only game over 80%. No, one player managed to score three times in excess of 90% engine correlation. Aha! I hear you cry out. We have him! So who is this chess engine-like god?

Levon Aronian had several of the highest quality games according to Let's Check

Meet Levon Aronian, late-bloomer extraordinaire, who had an engine correlation of 92% against Caruana (who himself had a 96% correlation in that same game) over 45 moves, 91% against Wesley So in 43 moves, and 91% against Magnus Carlsen in 36 moves. Plus two more games with over 80%.

He was not quite alone though: none other than Ian Nepomniachtchi had two games over 90% as well, plus several over 80%, showing the quality of play that led him to win the Candidates this year. Note that he had an average 78% engine correlation for the entire Candidates, 11% more than second-best Caruana.

The burning question on your mind, dear reader, is what about Hans? In terms of engine correlation, Hans was the worst. His best game, with an 88% match over 55 moves, was in round seven against Maxime Vachier-Lagrave. In his game against Carlsen it was a modest 68%, but of course Magnus was playing dreadfully that day, and had only 37%.

The mythical 100%

So how rare is 100% after all? It is rare, but not as rare as you might think. I ran some random checks through games from 1999-2000, as I was curious about Kasparov and Kramnik. All in all I had some 150 eligible games, maybe fewer, yet it turned up a higher-than-expected number of perfect matches.

For example, the rapid games of the Amber tournament included several 100% matches, including one by Jeroen Piket and another by Kramnik. And against Topalov no less... Memories of Toiletgate. There were also two(!) by Kasparov in Bosnia in 1999, another in Bosnia in 2000, one more by Kramnik in the World Knockout event against Korchnoi over 41 moves, and later one by Michael Adams against Vlad in that same event.

However, there is a caveat that must be mentioned when using such tools. It is eminently possible to game the system to show a 100% match where it normally might not. You see, when doing a Let's Check analysis within Fritz, you have the option of providing your own engine, and then telling it to use that engine only for moves that did not match engine choices. In other words, you are trying to find an engine the move will match. And if it does... the engine correlation will improve.
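Here is a minimal Python sketch of why this works, assuming a game with ten scored moves of which only one went unmatched. The move numbers, the placeholder moves and the extra engine are all made up for illustration, and this is not how Fritz stores its results.

```python
def correlation(matched, total):
    """Engine correlation as a percentage of scored moves."""
    return 100.0 * len(matched) / total

def consult_extra_engine(played, matched, extra_choices):
    """Ask an extra engine only about moves nobody has matched so far.

    played        : dict of move number -> move actually played
    matched       : set of move numbers already matched by some engine
    extra_choices : dict of move number -> the extra engine's top choice
    Returns the enlarged set of matched move numbers.
    """
    rescued = {n for n, move in played.items()
               if n not in matched and extra_choices.get(n) == move}
    return matched | rescued

# Hypothetical game: scored moves 15 to 24, with move 16 unmatched
played = {n: f"move{n}" for n in range(15, 25)}
played[16] = "cxd5"
matched = set(played) - {16}
print(correlation(matched, len(played)))  # 90.0

# A hand-picked (hypothetical) engine that happens to choose the missing move
matched = consult_extra_engine(played, matched, {16: "cxd5"})
print(correlation(matched, len(played)))  # 100.0
```

Since the extra engine is never asked about moves that already matched, it can only add matches and never remove one, which is exactly what makes the figure easy to inflate.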

 

Originally, this game was only a 90% match, with no engine choosing Garry Kasparov's 16.cxd5 for example. After trying several, I found an engine that chose it, and entered it as another Let's Check choice. Now the tally reads:

So yes, the results can absolutely be manipulated by the unscrupulous. A telltale sign might be in the engines listed. If a new game shows Stockfish 14+, Komodo 12+ and so on, it should be fine, but if you see some very old engines or odd names for that same new game, be on your guard, as they may have been used only to get an extra match. 


Regardless, here is the signature win by Kasparov with notes from Mega Database:

 

Conclusion

Does this in any way invalidate the use of a tool such as Let's Check? Of course not, but like all such tools, it must be used with good sense and judgement. The fact that modern elite players can rattle off multiple games with such extraordinarily high engine matches is a testament to the increasing overall quality of the chess players, since the engines they are matching today are also hundreds of Elo stronger than the engines of a decade ago. These players are also studying and learning from the engines, and that increase in pure ability is a consequence of it.

 


Born in the US, he grew up in Paris, France, where he completed his Baccalaureat, and after college moved to Rio de Janeiro, Brazil. He had a peak rating of 2240 FIDE, and was a key designer of Chess Assistant 6. In 2010 he joined the ChessBase family as an editor and writer at ChessBase News. He is also a passionate photographer with work appearing in numerous publications, and the content creator of the YouTube channel, Chess & Tech.


e-mars 10/17/2022 06:32
Someone seems to forget that Carlsen NEVER said he thought he lost his game to Niemann at the Saint Louis event because of Niemann's cheating. Carlsen was planning to withdraw from the tournament before it even started, but then opted to play. He was upset to lose against a cheater REGARDLESS of whether Niemann cheated in the actual game.
Science22 10/17/2022 05:19
The evaluation of the Carlsen - Niemann game is absurd. I wish everybody would go check the analysis of Stockfish https://www.youtube.com/watch?v=BbEiW-60hf0.

Stockfish concludes at the end of the video that Niemann did not make a single move that was not within Stockfish's first three choices, and 93.4% were Stockfish's first choice.

Niemann makes a lot of only-win moves in the complicated endgame, and Carlsen never had a clear draw after the opening. Notice the word clear. Some of Niemann's moves in the endgame are fantastic, like those of an old world champion with many years of experience.

I am very curious. How can adding more engines lead to a drop from 93.4% to 57%? You have just told us that if just one machine correlates, it is a match. But Stockfish agrees with Niemann's moves all the way!

Why did the world's best player, Carlsen, think like hell to find solutions in the endgame? I'll tell you why: he played a supercomputer. Wait and see.
Science22 10/17/2022 04:44
In my opinion Albert Silver delivers a conclusion on a statistically inconclusive basis.

The claim is that all super grandmasters play like computers along the lines of Niemann. That is not correct when one looks at the crucial stages, the middlegame and endgame.

To underline his claim, Silver highlights the game Kasparov - Sokolov from the year 2000. The game lasts 26 moves, and Sokolov had played it all before up to move 14, which Kasparov was naturally prepared for. After this, some natural developing moves follow, and already on move 19 (h3) Kasparov starts a combination that ends the game quickly.

Seven moves later it's all over after a forced combination that is made simpler by Sokolov's misplay. No problem for one of the best tactical players ever on this planet.

The game never left the opening phase. In stark contrast to this is the Carlsen - Niemann game from the 2022 Sinquefield Cup, 57 moves long with both a middlegame and an endgame, where Niemann's very precise (only-win) moves came in the middlegame and endgame.

What has been decisive for my own assessment has been the huge difference in performance between the tournaments Niemann plays without live transmission, where he has lost Elo every time over the last 3 years, and those with live transmission, where he has gained rating every time. No statistical analysis can explain this as a natural deviation.
joachimus 10/17/2022 04:43
I think the entire issue is with the concept of cheating, that all moves have to be correlated to top engine choices.
This is an approach which I do not find useful, because for GMs it is enough to receive a signal at the key moment when the evaluation changes and then follow the route to victory. That's why I find the Niemann game case so controversial.
If he had thought and analysed over the board, then he wouldn't have a problem with sharing his thoughts about the games in post-game analysis, and certainly would not be creating stories like "by miracle I checked this game this morning", which he repeated a few times. Not to mention his victory over Yoo in a very complicated position, which he just commented on as a wonderful game which speaks for itself. Come on, really?
Albert Silver 10/17/2022 04:28
@MeisterZinger - In the game of Backgammon, there was a revolution as the new neural net AIs were able to outplay the best players. The common approach is not to try to make such distinctions of human vs machine, but rather to accept that the machine is right, and try to explain it in a way that a human can use to guide future decisions.
MeisterZinger 10/17/2022 04:03
"The fact that modern elite players can rattle off multiple games with such extraordinarily high engine matches is a testament to the increasing overall quality of the chess players," you say. Is it? Or is it a demonstration that today's players have learned to play more like machines? There's a difference.
KnightOnTheRim 10/17/2022 03:40
I think you shouldn't compare the % accuracy of draws and wins.
WildKid 10/17/2022 02:43
That certainly seems to back up the view that Niemann's win against Carlsen was due to the latter's bad play, rather than Niemann cheating, or even playing especially well.
arzi 10/17/2022 01:45
Oh, what a disappointment, Niemann's result with the Let's Check tool over 9 games was the worst of all players, 57%. Carlsen had 37% on that important day he lost against Niemann (whose score in that game was 68%). No mythical 100%. Should we ask MacGyver for help?