Sonas: Assessment of the EU performance calculation

by ChessBase
4/16/2011 – The 2011 European Individual Championship left 29 players with a tied score vying for eight places in the next World Cup. To break the tie the ECU used performance ratings, but calculated them in a way that led to some bizarre results – and to a formal protest by at least one player. Jeff Sonas introduces us to other, more logical systems. As usual his report is presented with exceptional clarity.


Assessment of the EU performance calculation

By Jeff Sonas

With almost 400 players competing at the recently-concluded 2011 European Individual Championship in Aix-les-Bains, France, including 85 players rated 2600+, it is no surprise that the final leaderboard was extremely crowded at the top. Four players tied for the overall lead scoring +6 (8.5/11), eleven players finished half a point back with +5 (8/11), and there was a big group of 29 players on +4 (7.5/11). The exact final ranks were not determined by rapid playoffs, but by a series of tiebreaker calculations, the first of which was performance rating. The performance rating calculation can be fraught with controversy, and at least one player has already lodged an official protest against the final standings.

Because there was so much at stake in the performance rating calculation here (including determining the individual European Champion as well as a number of qualification spots into the World Cup) I thought it might be interesting to take a closer look at the calculations and the resultant numbers.

Part 1 – Reading the rules

Of course we ought to start with the official rules for the tournament itself. Section 8 clarifies that (per ECU Tournament Rules D3.2) there are no tie-break matches, and that the tiebreaker rules will follow section B6.2 of the ECU Tournament Rules, namely:

6.2 Tie-breaking in individual competitions.
The order of players that finish with the same number of points shall be determined by application of the following tie-breaking procedures in sequence, proceeding from (a) to (b) to (c) to (d) to the extent required:
     (a) Performance Rating
     (b) Median-Buchholz 1, the highest number wins;
     (c) Buchholz, the highest number wins;
     (d) Number of wins, the highest number wins.
In case of (a) the highest and the lowest rated opponent will be deleted and the maximum rating difference of two players shall be 400 points.
In the case of unplayed games for the calculation of (a), (b) and (c) the current FIDE Tournament Rules shall be applied.

Note that if you had an unplayed game (such as a forfeit or a bye), it mentions "the current FIDE Tournament Rules". I think it is referring to this part of the FIDE Rating Handbook, where it simply says that in the handling of unplayed games for tie-break purposes, the result shall be counted as a draw against the player himself.

Okay, fair enough. So if players finish on the same score, we will break the tie by calculating a "Performance Rating", where the highest and the lowest rated opponent will be deleted and the maximum rating difference of two players shall be 400 points. Unfortunately there is no further detail provided in either the Aix-les-Bains or EU regulations regarding the exact method for calculating performance rating.

Part 2 – Deciding how to calculate performance rating

So with all of those ties at the top, and 23 qualification spots available for the World Cup, the specific details of this performance rating calculation are quite significant. Unfortunately there are a few different ways to calculate performance rating; let's look at some of these.

(2a) Simple Performance Rating

The simplest approach, and I think the most generally accepted one, is to treat all of a player's games identically, where we average the opponent ratings and then add a modifier, taken from the standard Elo table (see section 8.1(a) here) for converting from fractional score to rating difference. That follows the recommendation provided by FIDE (see section 5. Tiebreak Rules using Ratings). As an example let's look at the four players who finished on +6:

Player Name     | V.Potkin      | R.Wojtaszek   | J.Polgar      | A.Moiseenko
Player Rating   | 2653          | 2711          | 2686          | 2673
Round 1         | 1-0 vs. 2347  | 1/2 vs. 2423  | 1/2 vs. 2415  | 1-0 vs. 2388
Round 2         | 1-0 vs. 2532  | 1-0 vs. 2422  | 1-0 vs. 2409  | 1-0 vs. 2543
Round 3         | 1-0 vs. 2595  | 1-0 vs. 2590  | 1-0 vs. 2547  | 1-0 vs. 2592
Round 4         | 1-0 vs. 2575  | 1/2 vs. 2615  | 1-0 vs. 2600  | 1/2 vs. 2598
Round 5         | 1-0 vs. 2616  | 1-0 vs. 2611  | 1/2 vs. 2601  | 1/2 vs. 2584
Round 6         | 1/2 vs. 2673  | 1/2 vs. 2626  | 0-1 vs. 2614  | 1/2 vs. 2584
Round 7         | 1/2 vs. 2677  | 1-0 vs. 2631  | 1-0 vs. 2595  | 1-0 vs. 2626
Round 8         | 1/2 vs. 2711  | 1/2 vs. 2653  | 1-0 vs. 2584  | 0-1 vs. 2626
Round 9         | 1/2 vs. 2598  | 1-0 vs. 2673  | 1-0 vs. 2626  | 1-0 vs. 2616
Round 10        | 1-0 vs. 2707  | 1-0 vs. 2634  | 1-0 vs. 2626  | 1-0 vs. 2534
Round 11        | 1/2 vs. 2686  | 1/2 vs. 2730  | 1/2 vs. 2653  | 1-0 vs. 2683
Overall Score   | 8.5/11        | 8.5/11        | 8.5/11        | 8.5/11
Pct Score       | 77%           | 77%           | 77%           | 77%
Elo Modifier    | +211          | +211          | +211          | +211
Avg Opp Rating  | 2611          | 2601          | 2570          | 2579
Performance     | 2822          | 2812          | 2781          | 2790

So by this simple measure, the final standings would have been #1 Potkin, #2 Wojtaszek, #3 Moiseenko, and #4 Polgar. However, the actual standings had Polgar finishing in third place, and Moiseenko finishing in fourth place. Why was this? It is because this particular tournament used a different performance rating calculation.

There is a concern that extreme rating differences can skew the performance rating too much. It is easy to construct examples in which a player's performance rating goes down despite a victory, because the latest opponent's rating was so low, or conversely goes up despite a loss, because the latest opponent's rating was so high. For instance, if you start a ten-round event by scoring 50% in nine games against 2200-rated average opposition, that is (of course) a 2200 performance rating through nine rounds. If you then defeat a 1200-rated player in the final round, you end up with a 55% score against average opponent strength of 2100, for a performance rating of 2136 (less than 2200 despite your victory). Conversely, if you had instead lost to a 2700-rated player in the final round, that gives a 45% score against average opponent strength of 2250, for a performance rating of 2214 (more than 2200 despite your loss). These examples, though somewhat artificial, explain the desire to "blunt" the impact of extreme rating differences, which is what the approach taken at Aix-les-Bains attempts to do.
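To make the arithmetic concrete, here is a minimal Python sketch of this simple calculation (an illustration only, not the tournament's official software). The DP_EXCERPT dictionary holds just the handful of score-to-rating-difference conversions quoted in this article; the full FIDE table covers every whole percentage.

```python
# Simple ("traditional") performance rating: average the opponents' ratings and add
# the Elo modifier for the rounded percentage score.
# DP_EXCERPT is only the slice of the FIDE conversion table used in this article.
DP_EXCERPT = {45: -36, 55: 36, 72: 166, 77: 211, 78: 220}

def simple_performance(results):
    """results: list of (score, opponent_rating) pairs, e.g. (1, 2347) for a win."""
    n = len(results)
    pct = round(100 * sum(score for score, _ in results) / n)
    avg_opp = round(sum(rating for _, rating in results) / n)
    return avg_opp + DP_EXCERPT[pct]

# Potkin's eleven results, re-keyed from the table above:
potkin = [(1, 2347), (1, 2532), (1, 2595), (1, 2575), (1, 2616), (0.5, 2673),
          (0.5, 2677), (0.5, 2711), (0.5, 2598), (1, 2707), (0.5, 2686)]
print(simple_performance(potkin))       # 2822, as in the table above

# The ten-round example from the previous paragraph: nine draws against 2200s,
# then a final-round win over a 1200-rated player.
ten_rounds = [(0.5, 2200)] * 9 + [(1, 1200)]
print(simple_performance(ten_rounds))   # 2136 -- lower than 2200 despite the win
```

Replacing that final entry with a loss to a 2700-rated player, (0, 2700), reproduces the 2214 figure from the other half of the example.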

(2b) EU 2011 Performance Rating

The key difference at Aix-les-Bains comes from that special clause "the highest and the lowest rated opponent will be deleted". Instead of an eleven-game percentage score and an average of eleven opponent ratings, the calculation used for this tournament identifies each player's highest-rated and lowest-rated opponents and ignores both of those games. It is thus effectively looking at a nine-game percentage score and an average of nine opponent ratings, both taken from the selected subset of nine games. So instead we get this:

Player Name     | V.Potkin        | R.Wojtaszek     | J.Polgar        | A.Moiseenko
Player Rating   | 2653            | 2711            | 2686            | 2673
Round 1         | 1-0 vs. 2347 *  | 1/2 vs. 2423    | 1/2 vs. 2415    | 1-0 vs. 2388 *
Round 2         | 1-0 vs. 2532    | 1-0 vs. 2422 *  | 1-0 vs. 2409 *  | 1-0 vs. 2543
Round 3         | 1-0 vs. 2595    | 1-0 vs. 2590    | 1-0 vs. 2547    | 1-0 vs. 2592
Round 4         | 1-0 vs. 2575    | 1/2 vs. 2615    | 1-0 vs. 2600    | 1/2 vs. 2598
Round 5         | 1-0 vs. 2616    | 1-0 vs. 2611    | 1/2 vs. 2601    | 1/2 vs. 2584
Round 6         | 1/2 vs. 2673    | 1/2 vs. 2626    | 0-1 vs. 2614    | 1/2 vs. 2584
Round 7         | 1/2 vs. 2677    | 1-0 vs. 2631    | 1-0 vs. 2595    | 1-0 vs. 2626
Round 8         | 1/2 vs. 2711 *  | 1/2 vs. 2653    | 1-0 vs. 2584    | 0-1 vs. 2626
Round 9         | 1/2 vs. 2598    | 1-0 vs. 2673    | 1-0 vs. 2626    | 1-0 vs. 2616
Round 10        | 1-0 vs. 2707    | 1-0 vs. 2634    | 1-0 vs. 2626    | 1-0 vs. 2534
Round 11        | 1/2 vs. 2686    | 1/2 vs. 2730 *  | 1/2 vs. 2653 *  | 1-0 vs. 2683 *
Overall Score   | 7/9             | 7/9             | 7/9             | 6.5/9
Pct Score       | 78%             | 78%             | 78%             | 72%
Elo Modifier    | +220            | +220            | +220            | +166
Avg Opp Rating  | 2629            | 2606            | 2579            | 2589
Performance     | 2849            | 2826            | 2799            | 2755

(* = game deleted for the tiebreak: the player's highest- or lowest-rated opponent)

You can see that for the first three players, removing the games against the highest-rated and lowest-rated opponents removed 1.5/2, leaving a 7/9 (78%) score, whereas for Moiseenko it removed 2/2, lowering his score to 6.5/9 (72%). This has such a large impact that it hardly matters what you calculate for the average opponent ratings; Moiseenko already has a gap of more than 50 points to make up (since the others will get +220 for their 78% scores and he will only get +166 for his 72% score). And ultimately his performance rating under this scenario (2755) is more than 40 points behind the other three.
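Here is the same kind of sketch for the Aix-les-Bains calculation, assuming the straightforward reading of the rule: sort the games by opponent rating, delete the first and last, and take both the score and the average from the remaining nine. (Ties for the highest- or lowest-rated opponent would need a further rule, a point that comes up below.)

```python
# Sketch of the tiebreak actually used at Aix-les-Bains: the games against the
# highest- and lowest-rated opponents are deleted outright, so both the percentage
# score and the average opponent rating come from the remaining nine games.
DP_EXCERPT = {72: 166, 77: 211, 78: 220}   # slice of the FIDE conversion table

def eu2011_performance(results):
    """results: list of (score, opponent_rating) pairs for all eleven games."""
    by_rating = sorted(results, key=lambda game: game[1])
    kept = by_rating[1:-1]                 # delete lowest- and highest-rated opponents
    n = len(kept)
    pct = round(100 * sum(score for score, _ in kept) / n)
    avg_opp = round(sum(rating for _, rating in kept) / n)
    return avg_opp + DP_EXCERPT[pct]

# Moiseenko's results, re-keyed from the table above; both deleted games were wins,
# which is exactly what cost him in this tiebreak.
moiseenko = [(1, 2388), (1, 2543), (1, 2592), (0.5, 2598), (0.5, 2584), (0.5, 2584),
             (1, 2626), (0, 2626), (1, 2616), (1, 2534), (1, 2683)]
print(eu2011_performance(moiseenko))       # 2755
```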

You can go through all four players and point to small variations that would have made a big difference to their performance rating calculations. Potkin's opponents in rounds 8 and 10 were within five Elo points of each other; had their ratings been switched, it would have been Potkin who had 2/2 subtracted from his score, and he would have ended up in fourth place instead. Wojtaszek's first two opponents were rated 2423 and 2422; had those been switched, he would have had only 1.0/2 subtracted, leading to a fabulous 83% score from the remaining nine games and an easy first place. The same goes for Polgar: her first two opponents were rated 2415 and 2409, and had those been switched she would have had only 1.0/2 subtracted and would have won first place. And for Moiseenko, until the final round his two strongest opponents were both rated 2626 (one of whom he beat and one of whom he lost to); in that situation it would have taken a further tiebreaker to decide whether he had 2/2 subtracted from his score (and fourth place in the tiebreak) or only 1.0/2 (and likely first place).

I don't like this approach. I think it is too discontinuous, too dependent on tiny rating point differences as we have just seen. And I find it quite unfortunate that in an attempt to compensate for perceived problems in averaging opponents' ratings together, the result was a calculation that was dominated by the outcomes (rather than the opponent ratings) in those two removed games. It is basically saying that the secondary goal for these players, in addition to scoring as many overall points as possible, is to do particularly well against the middle of the pack. I really think it would be fairest to say that the 11 outcomes all have equal relevance.

(2c) Performance Rating Removing Top/Bottom Ratings

A methodology that would perhaps have been more in the spirit of the "middle of the pack" approach would be to retain all eleven game outcomes, but discard the highest-rated and lowest-rated opponents for the purpose of calculating the average opponent rating. One might even argue that this is a perfectly valid way to read the phrase "the highest and the lowest rated opponent will be deleted". It certainly seems to me a fairer and smoother way. Under this approach, these four players would still retain their 77% score, but we would only use the middle nine opponent ratings when calculating the average opponent rating:

Player Name     | V.Potkin        | R.Wojtaszek     | J.Polgar        | A.Moiseenko
Player Rating   | 2653            | 2711            | 2686            | 2673
Round 1         | 1-0 vs. 2347 *  | 1/2 vs. 2423    | 1/2 vs. 2415    | 1-0 vs. 2388 *
Round 2         | 1-0 vs. 2532    | 1-0 vs. 2422 *  | 1-0 vs. 2409 *  | 1-0 vs. 2543
Round 3         | 1-0 vs. 2595    | 1-0 vs. 2590    | 1-0 vs. 2547    | 1-0 vs. 2592
Round 4         | 1-0 vs. 2575    | 1/2 vs. 2615    | 1-0 vs. 2600    | 1/2 vs. 2598
Round 5         | 1-0 vs. 2616    | 1-0 vs. 2611    | 1/2 vs. 2601    | 1/2 vs. 2584
Round 6         | 1/2 vs. 2673    | 1/2 vs. 2626    | 0-1 vs. 2614    | 1/2 vs. 2584
Round 7         | 1/2 vs. 2677    | 1-0 vs. 2631    | 1-0 vs. 2595    | 1-0 vs. 2626
Round 8         | 1/2 vs. 2711 *  | 1/2 vs. 2653    | 1-0 vs. 2584    | 0-1 vs. 2626
Round 9         | 1/2 vs. 2598    | 1-0 vs. 2673    | 1-0 vs. 2626    | 1-0 vs. 2616
Round 10        | 1-0 vs. 2707    | 1-0 vs. 2634    | 1-0 vs. 2626    | 1-0 vs. 2534
Round 11        | 1/2 vs. 2686    | 1/2 vs. 2730 *  | 1/2 vs. 2653 *  | 1-0 vs. 2683 *
Overall Score   | 8.5/11          | 8.5/11          | 8.5/11          | 8.5/11
Pct Score       | 77%             | 77%             | 77%             | 77%
Elo Modifier    | +211            | +211            | +211            | +211
Avg Opp Rating  | 2629            | 2606            | 2579            | 2589
Performance     | 2840            | 2817            | 2790            | 2800

(* = opponent rating excluded from the average; the game result still counts)

And so we can see that under this approach it would have been Moiseenko, rather than Polgar, who finished in third place. I will also show at the end how the World Cup qualifiers would have differed, depending on which performance rating calculation was used.
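As a sketch, the only change from the previous calculation is which games feed the percentage score: all eleven results still count, and only the two extreme ratings are dropped from the average (again using Moiseenko's games from the table above).

```python
# Sketch of the "Removing Top/Bottom Ratings" reading: every result counts toward the
# percentage score, but the highest and lowest opponent ratings are excluded from the
# average opponent rating.
DP_EXCERPT = {72: 166, 77: 211, 78: 220}   # slice of the FIDE conversion table

def middle_ratings_performance(results):
    """results: list of (score, opponent_rating) pairs for all eleven games."""
    pct = round(100 * sum(score for score, _ in results) / len(results))
    ratings = sorted(rating for _, rating in results)
    middle = ratings[1:-1]                 # drop only the extreme ratings, keep all results
    avg_opp = round(sum(middle) / len(middle))
    return avg_opp + DP_EXCERPT[pct]

moiseenko = [(1, 2388), (1, 2543), (1, 2592), (0.5, 2598), (0.5, 2584), (0.5, 2584),
             (1, 2626), (0, 2626), (1, 2616), (1, 2534), (1, 2683)]
print(middle_ratings_performance(moiseenko))   # 2800
```

Under this reading Moiseenko keeps credit for his wins against his highest- and lowest-rated opponents, which is exactly why he edges past Polgar here.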

(2d) Thompson Performance Rating

Ken Thompson has previously suggested an eminently reasonable calculation for performance rating, though it is much harder to calculate than the above approaches. The idea is actually quite simple: if somebody scores 8.5/11 against a particular set of ratings, then the "performance rating" will be whatever FIDE rating would have yielded an expected score of exactly 8.5/11 against those exact opponents. More generally, performance rating ought to represent the rating that would have led to an expected score exactly the same as the score that was actually achieved, given the individual opponents' ratings.

Another great thing about this calculation is that we don't have to care at all what the player's current rating is, yet we can still apply the "400-point rule" that limits the expected score in the case of very large rating differences. It does seem a bit strange to penalize/reward a player in their performance rating calculation, based on what their current rating actually is, but any approach that wants to implement that "maximum rating difference of two players shall be 400 points" constraint would have to either use the Thompson-like approach, or use the player's own rating, which seems a bit counter-intuitive. The whole point of performance rating is that it provides a way to describe the player's performance skill based only on results, without having to use the player's current rating.

This calculation becomes trickier than we would like, because the Elo Expectancy Table is not continuous; it only provides the expected score to two decimal places, and sometimes this leads to zero solutions or multiple solutions. So ideally you would use Elo's exact formula instead of the Expectancy Table, although I like to split the difference and use linear interpolation on the official Expectancy Table. In addition, the only way to really perform the Thompson calculation is to do a "binary search", where you keep trying different proposed ratings, checking whether each leads to an expected score that is too low or too high, and adjusting the proposed rating up or down accordingly until you get closest to the desired expected score. Finding the best fit to within 0.1 Elo points gives us these performance ratings for the four top finishers:

Player          | Thompson Performance
V.Potkin        | 2841.0
R.Wojtaszek     | 2825.8
J.Polgar        | 2792.0
A.Moiseenko     | 2800.5
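Here is a sketch of that search. For simplicity it uses the continuous Elo expectancy formula rather than linear interpolation of the printed table, so its output can differ from the figures above by a point or so, and it applies the 400-point rule by clamping each rating difference.

```python
# Sketch of a Thompson-style performance rating: find, by binary search, the rating
# whose expected score against these exact opponents equals the score actually achieved.
# The continuous Elo expectancy formula stands in for the interpolated FIDE table here.

def expected_score(rating, opponent_ratings):
    total = 0.0
    for opp in opponent_ratings:
        diff = max(-400, min(400, rating - opp))   # the 400-point rule
        total += 1 / (1 + 10 ** (-diff / 400))
    return total

def thompson_performance(score, opponent_ratings, low=1000.0, high=3500.0):
    # Expected score rises monotonically with the proposed rating, so the search converges.
    while high - low > 0.1:
        mid = (low + high) / 2
        if expected_score(mid, opponent_ratings) < score:
            low = mid
        else:
            high = mid
    return round((low + high) / 2, 1)

potkin_opponents = [2347, 2532, 2595, 2575, 2616, 2673, 2677, 2711, 2598, 2707, 2686]
print(thompson_performance(8.5, potkin_opponents))   # close to the 2841.0 given above
```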

Part 3 – Alternative final standings

In the detailed examples provided above, we can see that for three reasonable alternative ways to calculate performance rating, the third place finisher would have been Moiseenko rather than Polgar. And as you might expect, there would be similar variations in the qualification for the World Cup. There were 23 World Cup slots available in this tournament, with three of the top finishers (V.Potkin, P.Svidler, and R.Mamedov) already having qualified, as had two other players who also finished on +4 (B.Jobava and A.Pashikian). Since the top four scored +6, and the next eleven scored +5, and three of those fifteen players were already in, that meant twelve of the 23 World Cup spots were claimed by players on +6 or +5, leaving 11 slots for the players having the best tiebreak scores among the whole group on +4. So the lucky ones were those finishing in #16 through #26, and the unlucky ones were those finishing in #27 through #44:

(In each column the players are ordered by that performance-rating measure; the number in parentheses is the performance rating, and the number before each name is the player's actual final placement. The first gap separates the +6/+5 score groups from the +4 group, and the second gap marks the cut-off for the final eleven World Cup qualification spots.)

EU 2011 Performance | Traditional Performance | Removing Top/Bottom Ratings | Thompson Performance
(2849) 1 Potkin | (2822) 1 Potkin | (2840) 1 Potkin | (2841) 1 Potkin
(2826) 2 Wojtaszek | (2812) 2 Wojtaszek | (2817) 2 Wojtaszek | (2826) 2 Wojtaszek
(2799) 3 Polgar Judit | (2790) 4 Moiseenko | (2800) 4 Moiseenko | (2801) 4 Moiseenko
(2755) 4 Moiseenko | (2781) 3 Polgar Judit | (2790) 3 Polgar Judit | (2792) 3 Polgar Judit
(2819) 5 Vallejo | (2768) 6 Ragger | (2792) 6 Ragger | (2790) 6 Ragger
(2783) 6 Ragger | (2764) 5 Vallejo | (2775) 7 Feller | (2775) 7 Feller
(2766) 7 Feller | (2763) 7 Feller | (2774) 5 Vallejo | (2769) 5 Vallejo
(2751) 8 Svidler | (2757) 8 Svidler | (2760) 8 Svidler | (2764) 8 Svidler
(2751) 9 Mamedov | (2754) 9 Mamedov | (2760) 9 Mamedov | (2760) 9 Mamedov
(2741) 10 Vitiugov | (2744) 10 Vitiugov | (2750) 10 Vitiugov | (2750) 10 Vitiugov
(2732) 11 Zhigalko | (2740) 13 Korobov | (2747) 13 Korobov | (2750) 13 Korobov
(2719) 12 Jakovenko | (2735) 14 Inarkiev | (2745) 14 Inarkiev | (2737) 14 Inarkiev
(2697) 13 Korobov | (2731) 11 Zhigalko | (2741) 11 Zhigalko | (2737) 11 Zhigalko
(2695) 14 Inarkiev | (2704) 12 Jakovenko | (2728) 12 Jakovenko | (2728) 12 Jakovenko
(2633) 15 Postny | (2676) 15 Postny | (2683) 15 Postny | (2704) 15 Postny

(2776) 16 Azarovi | (2735) 29 Parligras | (2762) 29 Parligras | (2762) 29 Parligras
(2771) 17 Khairullin | (2723) 16 Azarovi | (2747) 20 Zherebukh | (2759) 20 Zherebukh
(2754) 18 Kobalia | (2720) 17 Khairullin | (2743) 16 Azarovi | (2746) 16 Azarovi
(2739) 19 Guliyev | (2716) 18 Kobalia | (2738) 17 Khairullin | (2737) 17 Khairullin
(2739) 20 Zherebukh | (2716) 22 Iordachescu | (2733) 22 Iordachescu | (2734) 22 Iordachescu
(2728) 21 Riazantsev | (2712) 20 Zherebukh | (2724) 26 Motylev | (2728) 31 Esen Baris
(2725) 22 Iordachescu | (2710) 26 Motylev | (2721) 18 Kobalia | (2728) 27 Ivanisevic
(2722) 23 Lupulescu | (2707) 32 Nielsen | (2720) 27 Ivanisevic | (2723) 18 Kobalia
(2718) 24 Mcshane | (2704) 27 Ivanisevic | (2715) 31 Esen Baris | (2718) 19 Guliyev
(2717) 25 Fridman | (2693) 33 Cheparinov | (2711) 32 Nielsen | (2715) 26 Motylev
(2716) 26 Motylev | (2692) 37 Saric | (2706) 33 Cheparinov | (2714) 32 Nielsen

(2712) 27 Ivanisevic | (2687) 21 Riazantsev | (2706) 19 Guliyev | (2708) 37 Saric
(2711) 28 Jobava | (2687) 34 Gustafsson | (2704) 37 Saric | (2698) 33 Cheparinov
(2709) 29 Parligras | (2684) 25 Fridman | (2695) 34 Gustafsson | (2695) 34 Gustafsson
(2709) 30 Romanov | (2684) 24 Mcshane | (2695) 21 Riazantsev | (2694) 21 Riazantsev
(2707) 31 Esen Baris | (2677) 23 Lupulescu | (2689) 23 Lupulescu | (2692) 23 Lupulescu
(2703) 32 Nielsen | (2675) 36 Smirin | (2685) 24 Mcshane | (2691) 25 Fridman
(2698) 33 Cheparinov | (2673) 40 Bologan | (2684) 25 Fridman | (2689) 24 Mcshane
(2687) 34 Gustafsson | (2672) 42 Rublevsky | (2682) 40 Bologan | (2685) 36 Smirin
(2669) 35 Kulaots | (2669) 31 Esen Baris | (2680) 41 Beliavsky | (2685) 41 Beliavsky
(2668) 36 Smirin | (2668) 41 Beliavsky | (2680) 42 Rublevsky | (2682) 30 Romanov
(2651) 37 Saric | (2668) 30 Romanov | (2678) 28 Jobava | (2680) 28 Jobava
(2649) 38 Pashikian | (2656) 28 Jobava | (2676) 30 Romanov | (2679) 42 Rublevsky
(2634) 39 Edouard | (2652) 19 Guliyev | (2676) 36 Smirin | (2678) 40 Bologan
(2629) 40 Bologan | (2640) 38 Pashikian | (2657) 38 Pashikian | (2648) 35 Kulaots
(2627) 41 Beliavsky | (2633) 35 Kulaots | (2636) 35 Kulaots | (2647) 38 Pashikian
(2627) 42 Rublevsky | (2630) 43 Volkov | (2633) 43 Volkov | (2643) 43 Volkov
(2625) 43 Volkov | (2602) 39 Edouard | (2602) 44 Sjugirov | (2620) 39 Edouard
(2594) 44 Sjugirov | (2584) 44 Sjugirov | (2601) 39 Edouard | (2600) 44 Sjugirov

 Clearly the most unlucky player was Mircea-Emilian Parligras, who had a top-ten performance rating by most measures, the second-best Median Buchholz and Buchholz scores in the entire tournament (those were the tiebreakers after performance rating), and in fact had the best performance rating out of the entire +4 group in all three of the alternate calculations. Yet he finished in 14th out of the 29 players on +4, for an overall placement of #29 in the tournament, mostly because the two results that were deleted were both wins, and so he failed to qualify for the World Cup. Ivan Ivanisevic and Peter Heine Nielsen also would have qualified for the World Cup under any of these other three measures, yet failed to qualify under the calculations actually used (finishing #27 and #32, respectively). And of course you can see from the above listings that several players would have failed to qualify for a World Cup spot under any of the three alternative calculations, yet did qualify, typically because they had 1.0/2 removed from their score during the tiebreak calculation. 8 of the 12 who had 1.0/2 removed from their score did qualify, whereas none of the five who had 2.0/2 removed from their score (including Parligras) managed to qualify.

I don't really have a strong recommendation for what should be done here, because I don't know what was communicated to the players beforehand, or even during the tournament. It wouldn't seem right to use either the traditional or Thompson performance measures, given the wording of the rules about discarding the highest and lowest rated opponents, but on the other hand I really dislike the idea of discarding the actual outcomes and letting the remaining percentage score play such a large relative role. So if you had asked me in the middle of the tournament, I would have suggested interpreting the rules to use the "Removing Top/Bottom Ratings" calculation. If you had asked me after the tournament… is it too late?

So certainly one major lesson here is that tournament regulations need to be unambiguous about how performance rating is calculated. This applies wherever performance rating is used: tiebreak rankings, board prizes, and so on. However, it still doesn't answer a very interesting question: which method of calculating performance rating works best? I am currently analyzing several of these formulas for predictive accuracy, and the preliminary evidence suggests that the most accurate performance rating measure (of the four listed above) is either the standard performance rating formula or the "Removing Top/Bottom Ratings" formula.

Interestingly, for 9-round events in particular, the "Removing Top/Bottom Ratings" formula appears to be slightly more accurate than the standard calculation. GM David Navara has also made a very interesting suggestion wherein you calculate the standard performance rating, but exclude any wins that have decreased your overall performance rating, or any losses that have increased your overall performance. This is a neat idea that needs to be carefully assessed, as it does lead to the possibility of comparing performance ratings between two players across differing numbers of games, which is not ideal. In a few days I will release the results of my analysis of predictive accuracy.
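For what it's worth, here is one plausible reading of Navara's suggestion as a sketch. Both the single-pass filtering rule and the continuous stand-in for the standard calculation are my own assumptions for illustration, not Navara's exact proposal: the idea is to drop each win whose removal would raise the performance rating and each loss whose removal would lower it, then recompute on what remains.

```python
import math

def approx_performance(results):
    # Continuous stand-in for the standard calculation: average opponent rating plus
    # the logistic Elo conversion of the percentage score (clamped away from 0%/100%).
    n = len(results)
    p = min(max(sum(score for score, _ in results) / n, 0.01), 0.99)
    avg_opp = sum(rating for _, rating in results) / n
    return avg_opp + 400 * math.log10(p / (1 - p))

def navara_performance(results):
    """One single-pass interpretation of Navara's idea (an iterated version is also
    conceivable): drop wins whose removal would raise the performance rating and
    losses whose removal would lower it, then recompute on the remaining games."""
    base = approx_performance(results)
    kept = []
    for i, (score, opp) in enumerate(results):
        rest = results[:i] + results[i + 1:]
        if score == 1.0 and approx_performance(rest) > base:
            continue   # a win that was dragging the performance rating down
        if score == 0.0 and approx_performance(rest) < base:
            continue   # a loss that was propping the performance rating up
        kept.append((score, opp))
    if not kept:       # degenerate case: everything was filtered out
        kept = results
    return round(approx_performance(kept))

potkin = [(1, 2347), (1, 2532), (1, 2595), (1, 2575), (1, 2616), (0.5, 2673),
          (0.5, 2677), (0.5, 2711), (0.5, 2598), (1, 2707), (0.5, 2686)]
print(navara_performance(potkin))   # under this reading, only the win vs. 2347 is excluded
```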

Copyright Jeff Sonas / ChessBase


