The Hans Niemann case: Numbers – what they reveal and what they do not reveal

by Andrea Carta
10/24/2022 – There is love at first sight, as everyone knows, and there are, somewhat surprisingly, "statistics at first sight". What is that? In the infamous matter now known as "the Hans Niemann cheating case" (a title, and a mystery, that S.S. Van Dine would have greatly appreciated), such "statistics at first sight" have attracted the undivided attention of every party involved, and above all of those not involved.

Picture: The rating development of Hans Niemann


This has happened because, so far, nobody has ever caught Hans Niemann cheating, at least not over the board (aka OTB, an acronym that became immensely popular in a matter of days). Not for lack of imagination on the part of the audience: after the "anal beads" mentioned by no less than Elon Musk, all sorts of devices have been suggested, up to the hilarious "transmission of signals directly into the ear" (described at https://www.ll.mit.edu/news/laser-can-deliver-messages-directly-your-ear-across-room), a technique that would require complex laser equipment placed close to the player, not to mention the enormous cost of such a device.

That’s why statistics have been widely used to determine whether Hans Niemann has really cheated over the board in the past (Carlsen played so badly against him at the Sinquefield Cup that cheating seems unlikely to have played a role in that game). First, Professor Ken Regan, widely regarded as the world’s leading expert on cheating detection, studied the matter and found no reason to suspect Hans Niemann of cheating. His findings are discussed in an interview he gave to Albert Silver on the 20th of September (https://en.chessbase.com/post/is-hans-niemann-cheating-world-renowned-expert-ken-regan-analyzes), but that did not put an end to the matter.

The general idea, at least among people convinced that Hans Niemann is indeed a compulsive cheater, is that "Ken Regan’s tool" is obsolete, because it only "relies on centipawn losses" (the differences between a player’s moves and the engines’ best ones) and because it is so well known worldwide that "careful cheaters" can avoid detection. In fact, "Ken Regan’s tool" does much more than evaluate the average centipawn loss (aka ACPL, another brand-new acronym), but that does not bother the new experts, as new statistics and new tools have surfaced. The best known is probably the one presented by "Gambit-Man", a Chess.com user and self-described expert on the matter. This user employed the "Let’s Check" tool provided by ChessBase to evaluate the games Hans Niemann played during the last 3 years, a period in which he played, almost frantically, more than 400 games.

The "Let’s Check" tool, as Albert Silver explained in a subsequent article (https://en.chessbase.com/post/let-s-check-the-elite-are-better-than-you-know#discuss), "will give you a summary called Engine Correlation at the top, showing the percentage of times a player's moves matched the top choice of an engine". As FM Nate Solon also explained in an article published on his own blog on the 4th of October (https://zwischenzug.substack.com/p/did-hans-niemann-cheat?r=av0j7&utm_campaign=post&utm_medium=web), the more a game is analysed with engines, the higher the correlation tends to be, and it may increase every time a new engine is employed; for the same reason, since different games are analysed by different engines, no meaningful comparison can be made between different games, let alone different players. That’s why ChessBase says the tool shouldn't be used for cheat detection (or, to be accurate: "the correlation isn’t a sign of computer cheating, because strong players can reach high values in tactically simple games. Only low values say anything, because these are sufficient to disprove the illegal use of computers in a game").
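To make Solon’s objection concrete, here is a minimal Python sketch of an "engine correlation" figure in the sense of the Let’s Check description quoted above: the share of a player’s moves that match an engine’s top choice. How Let’s Check actually pools its analyses is not described here, so the sketch simply assumes (hypothetically) that a move counts as a match when any engine that analysed the position ranked it first. The function name and the move lists are invented for illustration.

```python
def engine_correlation(played, *engine_top_moves):
    """Percentage of played moves that equal the top choice of at least one engine.

    played: the moves actually played, one per analysed position.
    engine_top_moves: one list per engine, giving that engine's top move
    for each of the same positions (hypothetical data format).
    """
    if not played:
        raise ValueError("no analysed moves")
    matches = 0
    for i, move in enumerate(played):
        # Count a match if any engine's top choice for position i is the move played.
        if any(line[i] == move for line in engine_top_moves):
            matches += 1
    return 100.0 * matches / len(played)


# Invented example: six analysed positions, two engines that sometimes disagree.
played = ["Nf3", "d4", "c4", "Nc3", "Bg5", "e3"]
engine_a = ["Nf3", "d4", "e4", "Nc3", "Bf4", "e3"]
engine_b = ["g3", "d4", "c4", "Nc3", "Bg5", "h3"]

print(engine_correlation(played, engine_a))            # about 66.7
print(engine_correlation(played, engine_a, engine_b))  # 100.0
```

Under this assumption, adding an engine can never lower the figure, only raise it, which is why correlations obtained from different sets of engines cannot be compared.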

Despite that, streamer and FM Yosha Iglesias published a video called "The most incriminating evidence against Hans Niemann" (https://youtu.be/jfPzUgzrOcQ), promoting Gambit-Man’s research and highlighting 10 games with a perfect 100% correlation (not to mention 23 more at 90% or above). According to Iglesias, no other player in the world, not even Carlsen, boasts so many 100% games; moreover, Hans Niemann’s average correlation across all the tournaments (65%) is comparable to that of "super GMs" (players rated at least 2700 Elo), even though Niemann has never reached super-GM status. Hikaru Nakamura also promoted the video and Gambit-Man’s findings.

But this is "statistics at first sight". Even setting aside Nate Solon’s strong objections, it is impossible to ignore Albert Silver’s findings from analysing the Sinquefield Cup games with the "Let’s Check" tool, presented in the article mentioned above. Not only is Hans Niemann’s correlation in the infamous game against the World Champion just "a modest 68%", but the player with the best correlation at the Sinquefield Cup (3 games over 90% and 2 more over 80%) is… Levon Aronian. He is one of the three players who underperformed at the Sinquefield Cup; he currently seems to be going through a crisis, has lost a lot of rating points in his last tournaments (Olympiad, Sinquefield Cup and now the US Championship), and is by now back to his 2005 rating.

The fact that Aronian performed so well, according to the Let’s Check tool, in five games (out of 8), while his real performance was mediocre at best, should ring a bell. Another player, Wesley So, had a perfect game with a 100% correlation, but only because just 8 moves (out of 28) were considered worthy of analysis (the others, being pure theory, were discarded by the tool). With this in mind, let’s check (for real) the 10 "100% games" played by Hans Niemann in the original Gambit-Man table (found at https://docs.google.com/spreadsheets/d/127lwTsR-2Daz0JqN1TbZ8FgITX9Df9TyXT1gtjlZ5nk/edit#gid=0):

  • World Youth Open U16, 10/2019, 8th round (out of 11): against FM Miguel Angel Soto (2283), won in 27 moves.
  • Marshall GM Norm, 2/2020, 7th round (out of 9): against IM Christopher Woojin Yoo (2430), won in 22 moves.
  • CCCSA Fall Invitational, 10/2020, 6th round (out of 9): against IM Aleksandr Ostrovskiy (2427), won in 28 moves.
  • 7th Sunway Sitges, 12/2020, 6th round (out of 10): against GM Matthieu Cornette (2558), won in 36 moves.
  • 1st GM Mix Bassano, 3/2021, 5th round (out of 9): against IM Jesus Martin Duque (2454), won in 28 moves.
  • 14th Philadelphia International, 6/2021, 1st round (out of 9): against Eddy Tian (2204), won in 31 moves.
  • US Junior Closed, 7/2021, 6th round (out of 9): against IM Ben Li (2376), won in 34 moves.
  • 2nd Tras-Os-Montes, 8/2021, 7th round (out of 9): against FM Isak Storme (2398), won in 38 moves.
  • 4th Sharjah Masters, 9/2021, 2nd round (out of 9): against GM Cristhian Camilo Rios (2466), won in 45 moves.
  • Kvika Reykjavik Open, 4/2022, 5th round (out of 9): against GM Hjorvar Steinn Gretarsson (2542), won in 37 moves.

What lies behind all these "perfect" games? One possible explanation is, of course, cheating. But there are also alternative explanations. The most obvious is the length of these games: half of them lasted fewer than 32 moves, and we already know from So’s game at the Sinquefield Cup that in such cases only a few non-theoretical moves remain, making a 100% correlation much more likely. Furthermore, only one game lasted more than 40 moves, the 45-move victory against Rios, and the analyses show this game to be anything but perfect: for example, both Stockfish and the well-known Chess.com utility (in the picture below) point out many moves that can in no way be deemed "the engines’ best", including some inaccuracies. Which engines suggested to the Let’s Check tool that this game was perfect remains a mystery; and while cheating cannot yet be ruled out, neither can some other kind of foul play (something first highlighted by Nate Solon). Above all, such analyses strongly suggest that the way the Let’s Check tool works is difficult to understand fully, so the tool itself cannot be regarded as reliable, at least not for cheating detection. Why not simply trust ChessBase itself, which clearly states just that?
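The length argument can be checked against the list above, and a back-of-the-envelope calculation shows why short games favour "perfect" scores. The sketch below is a hypothetical model, not how Let’s Check works: it assumes each analysed move independently matches the engine with probability p, taking p = 0.65 from the 65% average correlation quoted for Niemann, so that a 100% game occurs with probability p to the power n. Note that n counts analysed (non-book) moves, not total game length; the values 8 and 20 are illustrative, and the function name is invented.

```python
# Lengths (in moves) of the ten "100%" games in the list above, in the same order.
lengths = [27, 22, 28, 36, 28, 31, 34, 38, 45, 37]

n_short = sum(1 for n in lengths if n < 32)
n_long = sum(1 for n in lengths if n > 40)
print(f"{n_short} of {len(lengths)} games under 32 moves, {n_long} over 40 moves")


def perfect_game_probability(p_match, analysed_moves):
    """Chance that every analysed move matches the engine.

    Hypothetical model: each move matches independently with probability p_match.
    """
    return p_match ** analysed_moves


p = 0.65  # from the 65% average correlation; independence is an assumption
for n in (8, 20):
    print(f"{n} analysed moves: {perfect_game_probability(p, n):.4%}")
```

Under these assumptions, a game with 8 analysed moves is more than 100 times as likely to come out "perfect" as one with 20, which is the effect seen in So’s game.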

But even if these "perfect" games were evidence of cheating, what logic could possibly lie behind them? As the "perfect" games are sporadic, there should be some criterion that prompted Hans Niemann to select them for cheating. One could speculate that these were the tournaments’ final games, chosen to achieve the best possible placement without raising suspicion, but this is not the case: none of these games was played in the last, or second-to-last, round. Or one could imagine that Hans Niemann only cheated against the strongest opponents, trusting his own skills in all other cases: but this is also not true, as the strongest opponent he beat with a "perfect" game was GM Matthieu Cornette (2558), in 2020, while in 2022 alone he faced at least 70 stronger players without achieving a "perfect" game against any of them.

While the mystery of the "perfect" games is unlikely to be solved soon, other "statistics at first sight" remain to be discussed. On the 6th of October Chess.com published a report intended to explain why Hans Niemann was banned from their site and their online tournaments. This long-awaited report, while explaining (even emphasizing) that Niemann cheated online a lot, also admitted, just as Ken Regan did before, that there is no evidence of him cheating over the board. Not satisfied with its own conclusion, Chess.com added a few small statistics that nonetheless raise strong suspicions again: on page 12 it shows that no other player improved as much as Niemann between the ages of 11 and 19 (the comparison was made with many other famous young players), as shown in the picture below. But not only does this finding rest on the mysterious "Strength Score" (a parameter only Chess.com uses, whose working principles nobody knows, and whose main purpose is to detect online, not over-the-board, cheating); it is also a good example of cherry-picking, as the same comparison could be made using Elo ratings over any other time span, with completely different results: for example, Hans Niemann gained no Elo points in 2019, at 16, while Firouzja, at the same age and in the same year, gained 105. Keymer, at the same age, the following year, gained 64 points. And so on.

On page 15 another impressive statistic is shown: Hans Niemann earned his GM title at the age of 17, while all the other so-called "youngsters" (Firouzja, Keymer and many others) did so earlier, some even at the age of 12 (Gukesh). Again, this is cherry-picking: if the comparison is made not with the "youngsters" but with other players of similar (current) strength, the outcome is completely different. For example, both Tomashevsky (rated 2696) and Wojtaszek (2693) earned the title at the age of 18, later than Niemann, and so did a younger player such as Alekseenko (2691), also at 18. Niemann may not become as strong as Firouzja or Keymer, but right now there’s nothing strange about him becoming a GM at the age of 17.

The last "statistics at first sight" appeared on YouTube on the 2nd of October, when the Brazilian streamer Rafael Leite published a video with the impressive title "TOP URGENT! Strong EVIDENCE of CHEATING has been found on NIEMANN's Controversy". The following day came another video, called "HUGE FINDING: Hans Niemann has 2500 Strength". The day after, an article by him appeared at https://medium.com/@rafaelvleite82/how-i-found-perfect-correlation-between-chess-player-rating-and-acpl-and-stdcpl-bea9485055de, with further explanations: the conclusion was that "I found out that Chess Player Hans Niemann has a 2500–2550 Strength, even being rated near 2700", immediately followed by the "big question", "What can possibly explain a 2500 strength player achieve 2700 rating?", a question whose intended answer was clearly hinted at: cheating! Even ChessBase gave Rafael Leite’s findings wide exposure, reprising them in an article at https://en.chessbase.com/post/statistical-analysis-of-the-games-of-hans-niemann.

What findings? First, Rafael Leite found the obvious: the stronger a player is, the fewer mistakes he makes. Then he analysed a large number of games (many thousands), aiming to correlate ACPL with the players’ ratings, and eventually computed the following table, which also accounts for STDCPL (Standard Deviation of CentiPawn Loss). This table looks so impressive you might expect to find a big "42" somewhere inside (unluckily, the closest we get to the Answer to the Ultimate Question of Life, the Universe, and Everything is a "41" on the right).

All the "usual suspects" were checked: the youngsters (Gukesh, Keymer, Praggnanandhaa, Erigaisi), plus Carlsen and Caruana, and for every one of them the ACPL inferred from their games matches their rating. Niemann is different: his ACPL is 25, meaning his "true" strength is "only" 2550 Elo (conveniently rounded down to 2500), even though he had just reached the 2700 barrier (2699 just before the US Championship). This is the "strong evidence of cheating" Leite found on October 2. But is it really that? The difference between Niemann’s "true" strength and his Elo rating may be caused by many things, but cheating cannot be one of them, as cheating implies making "engine moves", which would obviously decrease the ACPL.
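For readers unfamiliar with the two measures, here is a minimal sketch of how ACPL and STDCPL can be computed from engine evaluations. The input format is an assumption (for each move, the engine’s evaluation of its best move and of the move actually played, in centipawns from the mover’s point of view), and Leite’s exact procedure, including whether he used the population or the sample standard deviation, is not known here; the function names and numbers are invented. The last lines illustrate the point made above: replacing a human move with an "engine move" (zero loss) can only lower the ACPL.

```python
from statistics import mean, pstdev


def centipawn_losses(best_evals, played_evals):
    """Loss per move: how much worse the played move is than the engine's best.

    Both lists hold evaluations in centipawns from the mover's point of view;
    losses are clamped at 0 in case the played move evaluates slightly better.
    """
    return [max(0, best - played) for best, played in zip(best_evals, played_evals)]


def acpl_and_stdcpl(losses):
    """Average centipawn loss and its (population) standard deviation."""
    return mean(losses), pstdev(losses)


# Invented evaluations for five moves.
best = [30, 25, 40, 10, 0]
played = [30, 5, 40, -50, 0]
losses = centipawn_losses(best, played)  # [0, 20, 0, 60, 0]
acpl, stdcpl = acpl_and_stdcpl(losses)   # 16 and about 23.3

# "Cheating" on the worst move: the engine's choice is played instead (loss 0).
engine_assisted = [0, 20, 0, 0, 0]
print(acpl, round(stdcpl, 1), mean(engine_assisted))  # ACPL drops from 16 to 4
```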

The answer to the "big question", "What can possibly explain a 2500 strength player achieve 2700 rating?", is not difficult to find. We must keep in mind how the Elo system works: the rating is not a fixed, immutable value, but may vary a lot depending on the player’s form, even after full maturity is reached. Caruana’s rating, for example, fluctuated between 2763 and 2844 over the last 10 years, staying most of the time at 2810-2820 (likely his "true" strength). So, first of all, there is nothing inherently strange about a difference (even a big one) between Elo rating and "true" strength. Of course, a difference of 150 points may look excessive, but there is a well-known problem in the Elo system that may explain it. Say, for example, that a player is rated 2000 today, retires, studies a lot and eventually comes back, having improved to a "true" strength of 2200. If he plays in a tournament against opponents rated 2000 on average, his expected score will be 4.5 (assuming the usual 9 games); but as he will likely perform at 2200, he will score about 7 points, for a gain of 50 Elo points with a K-factor of 20 (20 × (7 − 4.5) = 50). After the tournament his rating will be 2050, still far from his "true" strength, and he will need many more tournaments to reach 2200. That’s why the notorious K-factor is 40 for young players (below 2300) and 20 for everyone else below 2400 (so it drops to 10 only for strong players, who are presumably close to their "true" strength): a high K-factor speeds up the rise of young players, reducing the effect of the problem described above. However, everyone knows that’s not enough, and strong young players are usually underrated (albeit not by much).
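The arithmetic of this example can be reproduced with the standard logistic Elo formula, E = 1 / (1 + 10^((R_opp − R) / 400)), and the update ΔR = K × (score − expected). This is a simplified sketch: FIDE’s regulations rate each game individually and use a lookup table with a cap on large rating differences, so real figures may differ slightly. The function names are illustrative.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def rating_change(k, actual_score, expected_total):
    """Rating change after an event: K times (actual minus expected score)."""
    return k * (actual_score - expected_total)


games = 9
expected_at_rating = games * expected_score(2000, 2000)  # 4.5: what the 2000 rating predicts
likely_score = games * expected_score(2200, 2000)        # about 6.84: what a "true" 2200 scores

for k in (20, 40):
    new_rating = 2000 + rating_change(k, 7, expected_at_rating)
    print(f"K={k}: new rating {new_rating:.0f}")  # 2050 with K=20, 2100 with K=40
```

With K = 40, the same tournament would close half of the 200-point gap at once, which is the reason for giving young players a higher K-factor.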

Let’s now go back to Gambit-Man’s table and compute the average rating of Niemann’s opponents. To simplify matters, the first eight tournaments may be left out (Niemann did not play particularly well, losing many Elo points, and some links to the games are missing), as well as the last ones (Sinquefield Cup and US Championship), when the cheating allegations may have disrupted his (and his opponents’) concentration. This leaves 42 tournaments, accounting for 369 games. Let’s split these 42 tournaments into 3 equal groups of 14 and see what was expected from Niemann:

  • First group, from the 103rd Marshall Chess Club Championship (11/2019) to the 1st Spring Weekend Bassano (3/2021): 118 games, average opponents’ rating 2391. Niemann’s rating was usually 2460-2480, so he was expected to score about 61%, but scored 68.22% instead (+67 =27 -24).
  • Second group, from the First GM Mix Bassano (3/2021) to the Serbian Premier League (9/2021): 128 games, average opponents’ rating 2423. Niemann’s rating varied from 2520 to 2630, so he was expected to score about 70%, but scored 74.22% instead (+77 =36 -15).
  • Third group, from the 4th Sharjah Masters (9/2021) to the Turkish Super League (8/2022): 123 games, average opponents’ rating 2560. Niemann’s rating varied from 2630 to 2690, so he was expected to score about 64%, but scored 66.26% instead (+59 =45 -19).
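The figures for the three groups can be recomputed from the win/draw/loss records with the same logistic Elo formula. As a simplifying assumption, Niemann’s rating in each group is taken as the midpoint of the range given above, and the expected score is computed against the average opponent rating rather than game by game, so the expected percentages are approximate.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


# (wins, draws, losses, Niemann's rating range, average opponent rating), from the list above
groups = {
    "first": (67, 27, 24, (2460, 2480), 2391),
    "second": (77, 36, 15, (2520, 2630), 2423),
    "third": (59, 45, 19, (2630, 2690), 2560),
}

results = {}
for name, (wins, draws, losses, (low, high), opp) in groups.items():
    games = wins + draws + losses
    actual = (wins + 0.5 * draws) / games
    expected = expected_score((low + high) / 2, opp)  # midpoint of the range: an assumption
    results[name] = (games, actual, expected)
    print(f"{name}: {games} games, expected {expected:.1%}, scored {actual:.2%}, "
          f"overperformance {actual - expected:+.1%}")
```

The overperformance shrinks from group to group (roughly 7, 4 and 2 percentage points).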

In other words, Niemann consistently overperformed during the last 3 years, gaining a lot of points until he reached the 2700 barrier. His progress from 11/2019 to 8/2022 can be found on the FIDE website.

 

Let’s now go back to the "big question": "What can possibly explain a 2500 strength player achieve 2700 rating?". The answer, as we already know, cannot be "cheating". Instead, it’s now clear that during the last 3 years Niemann was constantly underrated, because he was, and still is, a young player improving quickly, and his K-factor, which was always 10 (his rating was already above 2400 in November 2019), was not high enough to avoid the problem explained above. During the last 3 years Niemann’s "true" strength has likely risen from 2500 to 2700, while his rating slowly closed the gap, as suggested by the difference between his expected and actual performance shrinking over time. By now the gap has likely closed, and Niemann could really be worth 2700 Elo points; but his average "true" strength over these three years was likely around 2600, not far from the 2550 estimated by Rafael Leite from his ACPL.

Furthermore, Nate Solon (in the article mentioned above) found that "the thing that struck me when looking over Niemann’s games is his aggression. Most of the top grandmasters like to avoid risks when possible. Niemann seems more willing to take the game into murky territory, and especially to sacrifice material". Other chess experts have even compared Niemann’s style of play to Tal’s. In any case, such a risky style is naturally prone to inaccuracies and mistakes and, even when successful (as it clearly is in Niemann’s case), inevitably increases the player’s ACPL. So we now have the answer to the "big question": Niemann gained many Elo points because in the last three years he became a better player, and his "true" strength today really is 2700, the same as his rating. The difference between his past average "true" strength, 2600, and the 2550 estimated by Rafael Leite is due to his risky style of play.
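The "underrated young player" mechanism can be illustrated with a toy simulation. Everything in it is a hypothetical assumption rather than Niemann’s actual history: a fixed "true" strength of 2600, a starting rating of 2500, 9-game events against 2450-rated opposition, and a player who always scores exactly what the true strength predicts. It counts how many events the rating needs to get within 25 points of the true strength with K = 10 versus K = 40; the function names are invented.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def events_to_catch_up(rating, true_strength, opponent_rating, k,
                       games=9, tolerance=25, max_events=1000):
    """Number of events until the rating is within `tolerance` of the true strength.

    Toy model: in every event the player scores exactly what the true strength
    predicts, while the rating system expects what the current rating predicts.
    """
    events = 0
    while true_strength - rating > tolerance and events < max_events:
        actual = games * expected_score(true_strength, opponent_rating)
        expected = games * expected_score(rating, opponent_rating)
        rating += k * (actual - expected)
        events += 1
    return events, rating


for k in (10, 40):
    n_events, final = events_to_catch_up(2500, 2600, 2450, k)
    print(f"K={k}: {n_events} events, rating {final:.0f}")
```

The model keeps the true strength fixed; if it keeps rising, as it does for a fast-improving junior, the rating lags behind even longer.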

In the end we have found that "statistics at first sight", all of them, look like strong evidence of Hans Niemann cheating, and cheating a lot. But at second sight, all the statistics show instead a picture typical of a young player rising fast, with no evidence of cheating whatsoever. Ken Regan was right.

Does this mean that Hans Niemann never cheated over the board? It’s still difficult to say. The opinions of strong players cannot be dismissed, nor can those of expert commentators like Alejandro Ramirez (his opinion can be read at https://en.chessbase.com/post/alejandro-ramirez-it-does-seem-very-likely-that-hans-cheated-over-the-board, with a link to a podcast in which the matter is fully discussed). But it’s extremely unlikely that statistics alone will ever settle the matter, and unless some clever Philo Vance manages to deduce his method and catch him "on the spot", the mystery will never be solved. Chess, already diminished by the overwhelming dominance of engines, is on the verge of losing its charisma completely. Hysteria is spreading fast: spectators are already barred from watching important tournaments in person, and live broadcasts are quickly disappearing. Will the "old times" ever come back?

 


Born in Italy, an IT engineer, he has written some Go software, published several papers on reconstructing Go games from videos by means of AI techniques, and taken part in two scientific conferences (Liberec 2015 and Pisa 2018) held during the corresponding European Go Congresses. Like Ingo Althoefer, who organized those conferences, he has above all been a chess fan since the Spassky-Fischer match, and has even attended many World Championships since then. He considers himself a good amateur, despite never having reached the 2000 barrier (which will forever remain his forbidden dream).
