Assessment of the EU performance calculation
By Jeff Sonas
With almost 400 players competing at the recently-concluded 2011 European Individual
Championship in Aix-les-Bains, France, including 85 players rated 2600+, it
is no surprise that the final leaderboard was extremely crowded at the top.
Four players tied for the overall lead scoring +6 (8.5/11), eleven players finished
half a point back with +5 (8/11), and there was a big group of 29 players on
+4 (7.5/11). The exact final ranks were not determined by rapid playoffs, but
by a series of tiebreaker calculations, the first of which was performance rating.
The performance rating calculation can be fraught with controversy, and at least
one player has already lodged
an official protest against the final standings.
Because there was so much at stake in the performance rating calculation here
(including determining the individual European Champion as well as a number
of qualification spots for the World Cup), I thought it might be interesting
to take a closer look at the calculations and the resulting numbers.
Part 1 – Reading the rules
Of course we ought to start with the
official rules for the tournament itself. Section 8 clarifies that (per
ECU Tournament Rules D3.2) there are no tie-break matches, and that the tiebreaker
rules will follow section B6.2 of the ECU
Tournament Rules, namely:
6.2 Tie-breaking in individual competitions.
The order of players that finish with the same number of points shall be determined
by application of the following tie-breaking procedures in sequence, proceeding
from (a) to (b) to (c) to (d) to the extent required:
(a) Performance Rating
(b) Median-Buchholz 1, the highest number wins;
(c) Buchholz, the highest number wins;
(d) Number of wins, the highest number wins.
In case of (a) the highest and the lowest rated opponent will be deleted and
the maximum rating difference of two players shall be 400 points.
In the case of unplayed games for the calculation of (a), (b) and (c) the
current FIDE Tournament Rules shall be applied.
Note that if you had an unplayed game (such as a forfeit or a bye), it mentions
"the current FIDE Tournament Rules". I think it is referring to this
part of the FIDE Rating Handbook, where it simply says that in the handling
of unplayed games for tie-break purposes, the result shall be counted as a draw
against the player himself.
Okay, fair enough. So if players finish on the same score, we will break the
tie by calculating a "Performance Rating", where the highest and the
lowest rated opponent will be deleted and the maximum rating difference of two
players shall be 400 points. Unfortunately there is no further detail provided
in either the Aix-les-Bains or EU regulations regarding the exact method for
calculating performance rating.
Part 2 – Deciding how to calculate performance rating
So with all of those ties at the top, and 23 qualification spots available
for the World Cup, the specific details of this performance rating calculation
are quite significant. Unfortunately there are a few different ways to calculate
performance rating; let's look at some of these.
(2a) Simple Performance Rating
The simplest approach, and I think the most generally accepted one,
is to treat all of a player's games identically: we average the opponents'
ratings and then add a modifier, taken from the standard Elo table (see section
8.1(a) here), which converts the fractional score into a rating difference.
That follows the recommendation
provided by FIDE (see section 5. Tiebreak Rules using Ratings). As an example
let's look at the four players who finished on +6:
Player Name | V.Potkin | R.Wojtaszek | J.Polgar | A.Moiseenko
Player Rating | 2653 | 2711 | 2686 | 2673
Round 1 | 1-0 vs. 2347 | 1/2 vs. 2423 | 1/2 vs. 2415 | 1-0 vs. 2388
Round 2 | 1-0 vs. 2532 | 1-0 vs. 2422 | 1-0 vs. 2409 | 1-0 vs. 2543
Round 3 | 1-0 vs. 2595 | 1-0 vs. 2590 | 1-0 vs. 2547 | 1-0 vs. 2592
Round 4 | 1-0 vs. 2575 | 1/2 vs. 2615 | 1-0 vs. 2600 | 1/2 vs. 2598
Round 5 | 1-0 vs. 2616 | 1-0 vs. 2611 | 1/2 vs. 2601 | 1/2 vs. 2584
Round 6 | 1/2 vs. 2673 | 1/2 vs. 2626 | 0-1 vs. 2614 | 1/2 vs. 2584
Round 7 | 1/2 vs. 2677 | 1-0 vs. 2631 | 1-0 vs. 2595 | 1-0 vs. 2626
Round 8 | 1/2 vs. 2711 | 1/2 vs. 2653 | 1-0 vs. 2584 | 0-1 vs. 2626
Round 9 | 1/2 vs. 2598 | 1-0 vs. 2673 | 1-0 vs. 2626 | 1-0 vs. 2616
Round 10 | 1-0 vs. 2707 | 1-0 vs. 2634 | 1-0 vs. 2626 | 1-0 vs. 2534
Round 11 | 1/2 vs. 2686 | 1/2 vs. 2730 | 1/2 vs. 2653 | 1-0 vs. 2683
Overall Score | 8.5/11 | 8.5/11 | 8.5/11 | 8.5/11
Pct Score | 77% | 77% | 77% | 77%
Elo Modifier | +211 | +211 | +211 | +211
Avg Opp Rating | 2611 | 2601 | 2570 | 2579
Performance | 2822 | 2812 | 2781 | 2790
So by this simple measure, the final standings would have been #1 Potkin, #2
Wojtaszek, #3 Moiseenko, and #4 Polgar. However, the actual standings had Polgar
finishing in third place, and Moiseenko finishing in fourth place. Why was this?
It is because this particular tournament used a different performance rating
calculation.
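Before turning to that calculation, here is a minimal sketch, in Python, of the simple version above, so the numbers in the table can be reproduced. Note that it is only an illustration: the helper names are mine, the results list is transcribed from the table above, and the continuous Elo formula stands in for the official FIDE conversion table, so it can land a point or two away from the lookup-table values.

```python
import math

def elo_modifier(p):
    """Continuous stand-in for the FIDE table that converts a fractional
    score into a rating difference (the official method uses a lookup table)."""
    p = min(max(p, 0.01), 0.99)          # avoid infinities at 0% and 100%
    return -400.0 * math.log10(1.0 / p - 1.0)

def simple_performance(results):
    """results: list of (points, opponent_rating) for every game played."""
    score = sum(pts for pts, _ in results)
    avg_opp = sum(r for _, r in results) / len(results)
    return avg_opp + elo_modifier(score / len(results))

# Potkin's eleven results, transcribed from the table above
potkin = [(1, 2347), (1, 2532), (1, 2595), (1, 2575), (1, 2616), (0.5, 2673),
          (0.5, 2677), (0.5, 2711), (0.5, 2598), (1, 2707), (0.5, 2686)]
print(round(simple_performance(potkin)))   # 2823; the table, using the official +211, shows 2822
```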
There is a concern that extreme rating differences can skew the performance
rating too much. It is easy to construct examples wherein a player's performance
rating can go down despite a victory, because their latest opponent's rating
was so low, or conversely that it can go up despite a loss, because their latest
opponent's rating was so high. For instance if you start a ten-round event by
scoring 50% in 9 games against 2200-rated average opposition, that is (of course)
a 2200 performance rating through 9 rounds. If you then defeat a 1200-rated
player in the final round, it yields a 55% score against average opponent strength
of 2100, for a performance rating of 2136 (less than 2200 despite your victory).
Conversely, if you had instead lost to a 2700-rated player in the final round,
that gives a 45% score against average opponent strength of 2250, for a performance
rating of 2214 (more than 2200 despite your loss). These examples, though
somewhat artificial, motivate the desire to "blunt" the impact
of extreme rating differences, as in the approach taken at the Aix-les-Bains
tournament.
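The discontinuity is easy to verify with a few lines of Python. A minimal sketch, again substituting the continuous Elo formula for the official conversion table, so the figures land a point or two away from the 2136 and 2214 quoted above:

```python
import math

def performance(score, games, avg_opp):
    """Average opponent rating plus the rating-difference equivalent of the score."""
    p = min(max(score / games, 0.01), 0.99)
    return avg_opp - 400.0 * math.log10(1.0 / p - 1.0)

print(round(performance(4.5, 9, 2200)))                     # 2200 through nine rounds at 50%
print(round(performance(5.5, 10, (9 * 2200 + 1200) / 10)))  # ~2135: lower, despite a win
print(round(performance(4.5, 10, (9 * 2200 + 2700) / 10)))  # ~2215: higher, despite a loss
```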
(2b) EU 2011 Performance Rating
The key difference with Aix-les-Bains comes from that special clause
"the highest and the lowest rated opponent will be deleted". Instead
of an eleven-game percentage score and an average of eleven different opponent
ratings, the calculation used for this tournament identifies each player's highest-rated
opponent and lowest-rated opponent, and ignores both of those games. Thus it effectively
looks at a nine-game percentage score and an average of nine different
opponent ratings, based on the selected subset of nine games. So instead we
get this:
Player Name | V.Potkin | R.Wojtaszek | J.Polgar | A.Moiseenko
Player Rating | 2653 | 2711 | 2686 | 2673
Round 1 | 1-0 vs. 2347 (deleted) | 1/2 vs. 2423 | 1/2 vs. 2415 | 1-0 vs. 2388 (deleted)
Round 2 | 1-0 vs. 2532 | 1-0 vs. 2422 (deleted) | 1-0 vs. 2409 (deleted) | 1-0 vs. 2543
Round 3 | 1-0 vs. 2595 | 1-0 vs. 2590 | 1-0 vs. 2547 | 1-0 vs. 2592
Round 4 | 1-0 vs. 2575 | 1/2 vs. 2615 | 1-0 vs. 2600 | 1/2 vs. 2598
Round 5 | 1-0 vs. 2616 | 1-0 vs. 2611 | 1/2 vs. 2601 | 1/2 vs. 2584
Round 6 | 1/2 vs. 2673 | 1/2 vs. 2626 | 0-1 vs. 2614 | 1/2 vs. 2584
Round 7 | 1/2 vs. 2677 | 1-0 vs. 2631 | 1-0 vs. 2595 | 1-0 vs. 2626
Round 8 | 1/2 vs. 2711 (deleted) | 1/2 vs. 2653 | 1-0 vs. 2584 | 0-1 vs. 2626
Round 9 | 1/2 vs. 2598 | 1-0 vs. 2673 | 1-0 vs. 2626 | 1-0 vs. 2616
Round 10 | 1-0 vs. 2707 | 1-0 vs. 2634 | 1-0 vs. 2626 | 1-0 vs. 2534
Round 11 | 1/2 vs. 2686 | 1/2 vs. 2730 (deleted) | 1/2 vs. 2653 (deleted) | 1-0 vs. 2683 (deleted)
Overall Score | 7/9 | 7/9 | 7/9 | 6.5/9
Pct Score | 78% | 78% | 78% | 72%
Elo Modifier | +220 | +220 | +220 | +166
Avg Opp Rating | 2629 | 2606 | 2579 | 2589
Performance | 2849 | 2826 | 2799 | 2755
You can see that for the first three players, removing the games against the
highest-rated and lowest-rated opponents removed 1.5/2, leaving a 7/9 (78%)
score, whereas for Moiseenko it removed 2/2, lowering his score to
6.5/9 (72%). This has such a large impact that it
hardly matters what you calculate for the average opponent ratings; Moiseenko
already has a gap of more than 50 points to make up (since the others will get
+220 for their 78% scores and he will only get +166 for his 72% score). And
ultimately his performance rating under this scenario (2755) is more than 40
points behind the other three.
You can go through all four players and point to small variations that would
have made a big difference to their performance rating calculations. Potkin's
opponents in rounds 8 and 10 were within five Elo points of each other; had
their ratings been switched then it would have been Potkin who had 2/2 subtracted
from his score, and he would have ended up in 4th place instead. Wojtaszek's
first two opponents were rated 2423 and 2422; had those been switched then he
would have only had 1.0/2 subtracted, leading to a fabulous 83% score from the
remaining nine games and an easy first place. The same goes for Polgar: her
first two opponents were rated 2415 and 2409; had those been switched then she
would have only had 1.0/2 subtracted, and she would have won first place. And
for Moiseenko, until the final round his two strongest opponents were both rated
2626 (one of which he beat and the other he lost to) so it would have been a
further tiebreaker that would have left him either having 2/2 subtracted from
his score (and fourth place in the tiebreak) or having 1.0/2 subtracted from
his score (and likely first place).
I don't like this approach. I think it is too discontinuous, too dependent
on tiny rating point differences as we have just seen. And I find it quite unfortunate
that in an attempt to compensate for perceived problems in averaging opponents'
ratings together, the result was a calculation that was dominated by the outcomes
(rather than the opponent ratings) in those two removed games. It is basically
saying that the secondary goal for these players, in addition to scoring as
many overall points as possible, is to do particularly well against the middle
of the pack. I really think it would be fairest to say that the 11 outcomes
all have equal relevance.
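For concreteness, here is a minimal sketch of the calculation as it appears to have been applied at Aix-les-Bains: the single game against the highest-rated opponent and the single game against the lowest-rated opponent are dropped entirely, and both the percentage score and the average opponent rating come from the remaining nine games. The continuous Elo formula again stands in for the official conversion table, and the handling of opponents with identical ratings is my own assumption, since the regulations do not specify it.

```python
import math

def dp(p):
    """Continuous stand-in for the FIDE fractional-score conversion table."""
    p = min(max(p, 0.01), 0.99)
    return -400.0 * math.log10(1.0 / p - 1.0)

def eu2011_performance(results):
    """results: list of (points, opponent_rating).
    Drop the game against the lowest-rated and the game against the
    highest-rated opponent (ties broken arbitrarily by sort order here),
    then compute the performance from the remaining games."""
    ordered = sorted(results, key=lambda game: game[1])
    remaining = ordered[1:-1]
    score = sum(pts for pts, _ in remaining)
    avg_opp = sum(r for _, r in remaining) / len(remaining)
    return avg_opp + dp(score / len(remaining))

# Moiseenko's results from the table above: both dropped games are wins,
# so his remaining score falls to 6.5/9 (72%).
moiseenko = [(1, 2388), (1, 2543), (1, 2592), (0.5, 2598), (0.5, 2584), (0.5, 2584),
             (1, 2626), (0, 2626), (1, 2616), (1, 2534), (1, 2683)]
print(round(eu2011_performance(moiseenko)))   # 2755, matching the table
```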
(2c) Performance Rating Removing Top/Bottom ratings
A methodology that perhaps would have been more in the spirit of the "middle
of the pack" approach would be to retain all eleven game outcomes,
but discard the highest-rated and lowest-rated opponents only for purposes
of calculating the average opponent rating. One might even argue that this is
a perfectly valid way to read the phrase "the highest and the lowest rated
opponent will be deleted". It certainly seems to me to be a fairer and
smoother way. So under this approach, these four players would still retain
their 77% score, but we would only use the middle nine opponent ratings when
calculating the average opponent rating:
Player Name | V.Potkin | R.Wojtaszek | J.Polgar | A.Moiseenko
Player Rating | 2653 | 2711 | 2686 | 2673
Round 1 | 1-0 vs. 2347 | 1/2 vs. 2423 | 1/2 vs. 2415 | 1-0 vs. 2388
Round 2 | 1-0 vs. 2532 | 1-0 vs. 2422 | 1-0 vs. 2409 | 1-0 vs. 2543
Round 3 | 1-0 vs. 2595 | 1-0 vs. 2590 | 1-0 vs. 2547 | 1-0 vs. 2592
Round 4 | 1-0 vs. 2575 | 1/2 vs. 2615 | 1-0 vs. 2600 | 1/2 vs. 2598
Round 5 | 1-0 vs. 2616 | 1-0 vs. 2611 | 1/2 vs. 2601 | 1/2 vs. 2584
Round 6 | 1/2 vs. 2673 | 1/2 vs. 2626 | 0-1 vs. 2614 | 1/2 vs. 2584
Round 7 | 1/2 vs. 2677 | 1-0 vs. 2631 | 1-0 vs. 2595 | 1-0 vs. 2626
Round 8 | 1/2 vs. 2711 | 1/2 vs. 2653 | 1-0 vs. 2584 | 0-1 vs. 2626
Round 9 | 1/2 vs. 2598 | 1-0 vs. 2673 | 1-0 vs. 2626 | 1-0 vs. 2616
Round 10 | 1-0 vs. 2707 | 1-0 vs. 2634 | 1-0 vs. 2626 | 1-0 vs. 2534
Round 11 | 1/2 vs. 2686 | 1/2 vs. 2730 | 1/2 vs. 2653 | 1-0 vs. 2683
Overall Score | 8.5/11 | 8.5/11 | 8.5/11 | 8.5/11
Pct Score | 77% | 77% | 77% | 77%
Elo Modifier | +211 | +211 | +211 | +211
Avg Opp Rating | 2629 | 2606 | 2579 | 2589
Performance | 2840 | 2817 | 2790 | 2800
And so we can see that under this approach, it would have been Moiseenko rather
than Polgar who finished in third place. I will also show you at
the end how the World Cup qualifiers would have differed, depending on which
performance rating calculation was used.
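A sketch of this alternative reading, under the same assumptions as before: every game outcome still counts toward the percentage score, and only the extreme opponent ratings are excluded from the average.

```python
import math

def dp(p):
    """Continuous stand-in for the FIDE fractional-score conversion table."""
    p = min(max(p, 0.01), 0.99)
    return -400.0 * math.log10(1.0 / p - 1.0)

def middle_ratings_performance(results):
    """results: list of (points, opponent_rating).
    Keep all game results, but drop one copy each of the highest and
    lowest opponent rating when computing the average."""
    score = sum(pts for pts, _ in results)          # all eleven games still count
    middle = sorted(r for _, r in results)[1:-1]    # middle nine opponent ratings
    return sum(middle) / len(middle) + dp(score / len(results))

moiseenko = [(1, 2388), (1, 2543), (1, 2592), (0.5, 2598), (0.5, 2584), (0.5, 2584),
             (1, 2626), (0, 2626), (1, 2616), (1, 2534), (1, 2683)]
print(round(middle_ratings_performance(moiseenko)))   # 2802 here; the table's +211 modifier gives 2800
```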
(2d) Thompson Performance Rating
Ken Thompson has previously suggested an eminently reasonable calculation for
performance rating, though it is much harder to calculate than the above approaches.
The idea is actually quite simple: if somebody scores 8.5/11 against a particular
set of ratings, then the "performance rating" will be whatever FIDE
rating would have yielded an expected score of exactly 8.5/11 against those
exact opponents. More generally, performance rating ought to represent the rating
that would have led to an expected score exactly the same as the score that
was actually achieved, given the individual opponents' ratings.
Another great thing about this calculation is that we don't have to care at
all what the player's current rating is, yet we can still apply the "400-point
rule" that limits the expected score in the case of very large rating differences.
It does seem a bit strange to penalize/reward a player in their performance
rating calculation, based on what their current rating actually is, but any
approach that wants to implement that "maximum rating difference of two
players shall be 400 points" constraint would have to either use the Thompson-like
approach, or use the player's own rating, which seems a bit counter-intuitive.
The whole point of performance rating is that it provides a way to describe
the player's performance skill based only on results, without having to use
the player's current rating.
This calculation becomes trickier than we would like, because the Elo Expectancy
Table is not continuous; it only provides the expected score to two decimal
places, and sometimes this leads to zero solutions or multiple solutions. So
you would ideally use Elo's exact formula instead of the Expectancy Table, although
I like to split the difference and use linear interpolation on the official
Expectancy Table. In addition, the only practical way to perform the Thompson calculation
is to do a "binary search", where you keep trying different proposed
ratings, then see whether it leads to an expected score that is too low or too
high, and then adjust the proposed rating up or down accordingly until you get
closest to the desired expected score. Finding the best fit out to 0.1 Elo points
gives us these performance ratings for the four top finishers:
Player | Thompson Performance
V.Potkin | 2841.0
R.Wojtaszek | 2825.8
J.Polgar | 2792.0
A.Moiseenko | 2800.5
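Here is a minimal sketch of that binary search. It uses the continuous Elo expectancy formula rather than linear interpolation on the official table, and it applies the 400-point rule by clamping each rating difference before computing the expected score; both choices are my own reading of how the rule would be implemented, so the output lands close to, but not exactly on, the figures above.

```python
def expected_score(own, opp, cap=400):
    """Expected score for one game, with the rating difference capped at +/-400."""
    diff = max(-cap, min(cap, own - opp))
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def thompson_performance(results, lo=1000.0, hi=3500.0, tol=0.1):
    """Binary search for the rating whose total expected score against
    these opponents equals the score actually achieved."""
    actual = sum(pts for pts, _ in results)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, opp) for _, opp in results) < actual:
            lo = mid     # proposed rating too low: expected score falls short
        else:
            hi = mid     # proposed rating too high: expected score overshoots
    return (lo + hi) / 2.0

# Potkin's eleven results, transcribed from the earlier tables
potkin = [(1, 2347), (1, 2532), (1, 2595), (1, 2575), (1, 2616), (0.5, 2673),
          (0.5, 2677), (0.5, 2711), (0.5, 2598), (1, 2707), (0.5, 2686)]
print(round(thompson_performance(potkin), 1))   # roughly 2841-2842, vs. 2841.0 in the table
```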
Part 3 – Alternative final standings
In the detailed examples provided above, we can see that under each of the three
reasonable alternative ways to calculate performance rating, the third-place finisher
would have been Moiseenko rather than Polgar. And as you might expect, there would
be similar variations in the qualification for the World Cup. There were 23
World Cup slots available in this tournament, with three of the top finishers
(V.Potkin, P.Svidler, and R.Mamedov) already having qualified, as had two other
players who also finished on +4 (B.Jobava and A.Pashikian). Since the top four
scored +6, and the next eleven scored +5, and three of those fifteen players
were already in, that meant twelve of the 23 World Cup spots were claimed by
players on +6 or +5, leaving 11 slots for the players having the best tiebreak
scores among the whole group on +4. So the lucky ones were those finishing in
#16 through #26, and the unlucky ones were those finishing in #27 through #44:
EU 2011 Performance | Traditional Performance | Removing Top/Bottom Ratings | Thompson Performance
(2849) 1 Potkin | (2822) 1 Potkin | (2840) 1 Potkin | (2841) 1 Potkin
(2826) 2 Wojtaszek | (2812) 2 Wojtaszek | (2817) 2 Wojtaszek | (2826) 2 Wojtaszek
(2799) 3 Polgar Judit | (2790) 4 Moiseenko | (2800) 4 Moiseenko | (2801) 4 Moiseenko
(2755) 4 Moiseenko | (2781) 3 Polgar Judit | (2790) 3 Polgar Judit | (2792) 3 Polgar Judit
(2819) 5 Vallejo | (2768) 6 Ragger | (2792) 6 Ragger | (2790) 6 Ragger
(2783) 6 Ragger | (2764) 5 Vallejo | (2775) 7 Feller | (2775) 7 Feller
(2766) 7 Feller | (2763) 7 Feller | (2774) 5 Vallejo | (2769) 5 Vallejo
(2751) 8 Svidler | (2757) 8 Svidler | (2760) 8 Svidler | (2764) 8 Svidler
(2751) 9 Mamedov | (2754) 9 Mamedov | (2760) 9 Mamedov | (2760) 9 Mamedov
(2741) 10 Vitiugov | (2744) 10 Vitiugov | (2750) 10 Vitiugov | (2750) 10 Vitiugov
(2732) 11 Zhigalko | (2740) 13 Korobov | (2747) 13 Korobov | (2750) 13 Korobov
(2719) 12 Jakovenko | (2735) 14 Inarkiev | (2745) 14 Inarkiev | (2737) 14 Inarkiev
(2697) 13 Korobov | (2731) 11 Zhigalko | (2741) 11 Zhigalko | (2737) 11 Zhigalko
(2695) 14 Inarkiev | (2704) 12 Jakovenko | (2728) 12 Jakovenko | (2728) 12 Jakovenko
(2633) 15 Postny | (2676) 15 Postny | (2683) 15 Postny | (2704) 15 Postny
--- | --- | --- | ---
(2776) 16 Azarovi | (2735) 29 Parligras | (2762) 29 Parligras | (2762) 29 Parligras
(2771) 17 Khairullin | (2723) 16 Azarovi | (2747) 20 Zherebukh | (2759) 20 Zherebukh
(2754) 18 Kobalia | (2720) 17 Khairullin | (2743) 16 Azarovi | (2746) 16 Azarovi
(2739) 19 Guliyev | (2716) 18 Kobalia | (2738) 17 Khairullin | (2737) 17 Khairullin
(2739) 20 Zherebukh | (2716) 22 Iordachescu | (2733) 22 Iordachescu | (2734) 22 Iordachescu
(2728) 21 Riazantsev | (2712) 20 Zherebukh | (2724) 26 Motylev | (2728) 31 Esen Baris
(2725) 22 Iordachescu | (2710) 26 Motylev | (2721) 18 Kobalia | (2728) 27 Ivanisevic
(2722) 23 Lupulescu | (2707) 32 Nielsen | (2720) 27 Ivanisevic | (2723) 18 Kobalia
(2718) 24 Mcshane | (2704) 27 Ivanisevic | (2715) 31 Esen Baris | (2718) 19 Guliyev
(2717) 25 Fridman | (2693) 33 Cheparinov | (2711) 32 Nielsen | (2715) 26 Motylev
(2716) 26 Motylev | (2692) 37 Saric | (2706) 33 Cheparinov | (2714) 32 Nielsen
--- | --- | --- | ---
(2712) 27 Ivanisevic | (2687) 21 Riazantsev | (2706) 19 Guliyev | (2708) 37 Saric
(2711) 28 Jobava | (2687) 34 Gustafsson | (2704) 37 Saric | (2698) 33 Cheparinov
(2709) 29 Parligras | (2684) 25 Fridman | (2695) 34 Gustafsson | (2695) 34 Gustafsson
(2709) 30 Romanov | (2684) 24 Mcshane | (2695) 21 Riazantsev | (2694) 21 Riazantsev
(2707) 31 Esen Baris | (2677) 23 Lupulescu | (2689) 23 Lupulescu | (2692) 23 Lupulescu
(2703) 32 Nielsen | (2675) 36 Smirin | (2685) 24 Mcshane | (2691) 25 Fridman
(2698) 33 Cheparinov | (2673) 40 Bologan | (2684) 25 Fridman | (2689) 24 Mcshane
(2687) 34 Gustafsson | (2672) 42 Rublevsky | (2682) 40 Bologan | (2685) 36 Smirin
(2669) 35 Kulaots | (2669) 31 Esen Baris | (2680) 41 Beliavsky | (2685) 41 Beliavsky
(2668) 36 Smirin | (2668) 41 Beliavsky | (2680) 42 Rublevsky | (2682) 30 Romanov
(2651) 37 Saric | (2668) 30 Romanov | (2678) 28 Jobava | (2680) 28 Jobava
(2649) 38 Pashikian | (2656) 28 Jobava | (2676) 30 Romanov | (2679) 42 Rublevsky
(2634) 39 Edouard | (2652) 19 Guliyev | (2676) 36 Smirin | (2678) 40 Bologan
(2629) 40 Bologan | (2640) 38 Pashikian | (2657) 38 Pashikian | (2648) 35 Kulaots
(2627) 41 Beliavsky | (2633) 35 Kulaots | (2636) 35 Kulaots | (2647) 38 Pashikian
(2627) 42 Rublevsky | (2630) 43 Volkov | (2633) 43 Volkov | (2643) 43 Volkov
(2625) 43 Volkov | (2602) 39 Edouard | (2602) 44 Sjugirov | (2620) 39 Edouard
(2594) 44 Sjugirov | (2584) 44 Sjugirov | (2601) 39 Edouard | (2600) 44 Sjugirov
Clearly the most unlucky player was Mircea-Emilian Parligras,
who had a top-ten performance rating by most measures, the second-best Median
Buchholz and Buchholz scores in the entire tournament (those were the tiebreakers
after performance rating), and in fact had the best performance rating out of
the entire +4 group in all three of the alternate calculations. Yet he finished
14th out of the 29 players on +4, for an overall placement of #29 in the
tournament, mostly because the two results that were deleted were both wins,
and so he failed to qualify for the World Cup. Ivan Ivanisevic and Peter Heine
Nielsen also would have qualified for the World Cup under any of these other
three measures, yet failed to qualify under the calculations actually used (finishing
#27 and #32, respectively). And of course you can see from the above listings
that several players would have failed to qualify for a World Cup spot under
any of the three alternative calculations, yet did qualify, typically because
they had only 1.0/2 removed from their score during the tiebreak calculation. Eight of
the twelve who had 1.0/2 removed from their score did qualify, whereas none of the
five who had 2.0/2 removed from their score (including Parligras) managed to
qualify.
I don't really have a strong recommendation for what should be done here, because
I don't know what was communicated to the players beforehand, or even during
the tournament. It wouldn't seem right to use either the traditional or Thompson
performance measures, given the wording of the rules about discarding the highest
and lowest rated opponents, but on the other hand I really dislike the idea
of discarding the actual outcomes and letting the remaining percentage score
play such a large relative role. So if you had asked me in the middle of the
tournament, I would have suggested interpreting the rules to use the "Removing
Top/Bottom Ratings" calculation. If you had asked me after the tournament…
is it too late?
So certainly one major lesson here is that tournament regulations need to be
unambiguous with regard to how performance rating is calculated. This applies
wherever performance rating is used: tiebreak rankings, board prizes, etc. However,
it still doesn't answer a very interesting question: what method for calculating
performance rating works best? I am currently working on analyzing several of
these formulas for predictive accuracy, and the preliminary evidence suggests
that the most accurate performance rating measure (out of the four listed above)
would be either the standard performance rating formula or the "Removing
Top/Bottom Ratings" formula.
Interestingly, for 9-round events in particular, the "Removing Top/Bottom
Ratings" formula appears to be slightly more accurate than the standard
calculation. GM David Navara has also made a very interesting suggestion wherein
you calculate the standard performance rating, but exclude any wins that have
decreased your overall performance rating, or any losses that have increased
your overall performance. This is a neat idea that needs to be carefully assessed,
as it does lead to the possibility of comparing performance ratings between
two players across differing numbers of games, which is not ideal. In a few
days I will release the results of my analysis of predictive accuracy.
Copyright Jeff Sonas / ChessBase