Evaluating Tournament Strength
By Felix Pîrvan, Bucharest
The strength of chess tournaments has long been measured and compared.
In the following, I will briefly discuss two of the already established evaluation
methods and then introduce a new method for strength assessment. Then, I will
show how this new method applies not only to super-tournaments, but to tournaments
in general, regardless of the competition format (be it round robin, Swiss,
knock-out, or any other), and even to matches between two players. Finally,
I'll try to identify other factors that make a tournament important or memorable,
and how the strength itself contributes to the importance of a chess event.
Known Measures of (Super-)Tournament Strength
The categories introduced by FIDE, and still in informal use today, base the
classification on the average Elo rating of the players involved. Known disadvantages
of this method are:
- It does not do justice to older tournaments, due to the inflation of the
Elo ratings.
- It does not evaluate how much each player is involved in the tournament.
- It does not give a good measure for the open tournaments, where the ratings
are rather widely spread.
Another kind of evaluation, described on Jeff Sonas's Chessmetrics
site, assigns a number of points to each top-ten player who competes in
the tournament: 4, 4, 3, 3, 2, 2, 1, 1, 1, and 1 points go to the players
ranked #1 down to #10, respectively. Summing up the points gives the tournament
class. Compared to the FIDE categories, this method:
- Offers better representation to older tournaments, as it uses a time-independent
criterion
- Still does not evaluate how much each top ten player is involved in the
tournament
- Only applies to super-elite tournaments, as players below #10 are not taken
into account
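The point scheme described above amounts to a small lookup table. The following is a minimal sketch in Python; the names `TOP_TEN_POINTS` and `tournament_class` are illustrative and are not taken from Chessmetrics.

```python
# Points per world rank, as listed above for #1 down to #10.
TOP_TEN_POINTS = {1: 4, 2: 4, 3: 3, 4: 3, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1}


def tournament_class(world_ranks):
    """Sum the points of every participant ranked in the top ten."""
    return sum(TOP_TEN_POINTS.get(rank, 0) for rank in world_ranks)


# A field with the complete top ten, and a field with #1, #2 and an outsider.
print(tournament_class(range(1, 11)))  # 22
print(tournament_class([1, 2, 50]))    # 8
```

Players outside the top ten contribute nothing, which is exactly why the method only distinguishes between super-elite events.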
Some Examples
Let's first discuss some examples, involving the following tournaments:
- Tournament A: single round robin (RR for short) between the top eight players
- Tournament B: double round robin (RR2) between the top eight players
- Tournament C: knock-out event with two games per round (KO2) between the
top eight players
- Tournament D: RR between the top 15 players
Question: Which is stronger?
Argument 1: A is stronger than C. In tournament A, 28 games
will be played, in contrast to only 14 games in tournament C (not counting
tie-breaks). This is because in tournament C some of the players leave
the competition early, so they do not get to play many games (some will play
only 2 games, some only 4 games). Even the finalists of the KO2 event will
have played only 6 games in the entire tournament (in contrast to 7 games per
player in the RR event), so it can be said that even they leave the competition
a bit early!
Argument 2: D is stronger than A. Let's take the player #8
as an example. In the tournament A, she will be playing against #1 to #7, and
then go home. In the tournament D, she would be playing against #1 to #7, but
also against #9 to #15, so she would have a harder time in the tournament D.
The same judgment also applies to any other player involved in both tournaments:
the tournament D would be harder for each of them.
Argument 3: B is stronger than D. Again, let's take the player
#8 as an example. In the tournament D, she will be playing a total of 14 games,
against #1 to #7 players, and also against #9 to #15 players. In the tournament
B, she will also be playing 14 games, but first against #1 to #7 players, and
then again against #1 to #7 players. Obviously she would have a harder time
in the tournament B, because the competition there would be stronger. The same
judgment also applies to any other player involved in both tournaments: the
tournament B would be harder for each of them.
So the conclusion is: B is stronger than D, which is stronger than A, which
is stronger than C.
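The game counts used in these arguments can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are illustrative, and tie-breaks are ignored, as in the text.

```python
def round_robin_games(players, cycles=1):
    """Every pair of players meets `cycles` times."""
    return cycles * players * (players - 1) // 2


def knockout_games(players, games_per_match=2):
    """Each minimatch eliminates one player, so there are players - 1 minimatches."""
    return (players - 1) * games_per_match


games = {
    "A": round_robin_games(8),            # RR, top eight
    "B": round_robin_games(8, cycles=2),  # RR2, top eight
    "C": knockout_games(8),               # KO2, top eight
    "D": round_robin_games(15),           # RR, top 15
}
print(games)  # {'A': 28, 'B': 56, 'C': 14, 'D': 105}
```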
I will now try to draw some ideas from the examples above, which would lead
to a formula for the strength evaluation.
The Strength Evaluation
To evaluate the strength, I have taken into account the following criteria:
- The number of games played by each player should count as an indicator of
the tournament strength
- The strength should be an additive measure, i.e. if a tournament can be
split in smaller tournaments, its strength should be the sum of the strengths
of these smaller tournaments.
- The stronger the games played in a tournament are, the stronger the tournament
should be
Now one conclusion comes to mind: let the strength of a tournament be the sum
of the strengths of all the games played. That leaves us with assessing the
strength of a game. I consider that this can be assessed from two factors:
- The sum of the strengths of the players involved: the stronger the players,
the stronger the game
- The difference between the strengths of the players involved: the smaller
the strength difference between the players, the stronger the game
With the symbols: G for game strength, S for the strength of the stronger player
and W for the strength of the weaker player, we could evaluate G, according
to the criteria above, as: G = (S + W) - (S - W) = 2*W. Dropping the constant
factor as redundant, we get that a measure for the strength of a game between
two players can be taken as the strength of the weaker player.
Note: Jeff Sonas at Chessmetrics adopts the same measure regarding matches
between two players, although I am not aware of his reasons.
So the strength of a tournament would be the sum of the strengths of all games
played in that tournament, while the strength of a game would be the strength
of the weaker player in that game.
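The two definitions above translate directly into code. In this minimal sketch the function names and the numeric strengths in the usage example are placeholders chosen for illustration only.

```python
def game_strength(strength_a, strength_b):
    """(S + W) - (S - W) = 2*W; dropping the factor 2 leaves W, the weaker strength."""
    stronger = max(strength_a, strength_b)
    weaker = min(strength_a, strength_b)
    return ((stronger + weaker) - (stronger - weaker)) / 2


def event_strength(games):
    """Strength of an event: the sum of the strengths of all its games."""
    return sum(game_strength(a, b) for a, b in games)


# Three games between players with illustrative strengths 5, 2 and 1.
print(event_strength([(5, 2), (2, 1), (5, 1)]))  # 4.0
```

Because the measure is a plain sum over games, it is additive by construction: splitting the list of games into two smaller events and adding their strengths gives the same total.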
Now we are left with assigning strengths to various players. As the measure
for the game strength takes into account only the weaker player, the #1 player
in the world doesn't need to be assigned any strength, as she would never be
the weaker player in any game. I propose the following ranking classes and strength
scale:
Class | Ranks | Strength |
A | #2 | 5 |
B | #3-5 | 2 |
C | #6-10 | 1 |
D | #11-20 | 0.5 |
E | #21-50 | 0.2 |
F | #51-100 | 0.1 |
G | #101-200 | 0.05 |
H | #201-500 | 0.02 |
I | #501-1000 | 0.01 |
...and so on...
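The scale can be sketched as a lookup by world rank. Two choices below are assumptions of this sketch rather than part of the proposal: player #1 is given an infinite strength so that she is never the weaker player, and ranks beyond #1000 are given 0, since the scale is not spelled out further.

```python
import math

# (best rank, worst rank, class, strength), as in the scale above.
RANKING_CLASSES = [
    (2, 2, "A", 5), (3, 5, "B", 2), (6, 10, "C", 1), (11, 20, "D", 0.5),
    (21, 50, "E", 0.2), (51, 100, "F", 0.1), (101, 200, "G", 0.05),
    (201, 500, "H", 0.02), (501, 1000, "I", 0.01),
]


def player_strength(world_rank):
    """Strength assigned to a player by world rank."""
    if world_rank < 1:
        raise ValueError("ranks start at 1")
    if world_rank == 1:
        return math.inf  # assumption: #1 is never the weaker player
    for best, worst, _name, strength in RANKING_CLASSES:
        if best <= world_rank <= worst:
            return strength
    return 0.0  # assumption: players beyond #1000 are neglected


def game_strength(rank_a, rank_b):
    """Strength of a game: the strength of the weaker of the two players."""
    return min(player_strength(rank_a), player_strength(rank_b))


print(game_strength(1, 2))   # 5
print(game_strength(4, 30))  # 0.2
```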
For a tournament where all the games to be played are known in advance, the
tournament strength will also be known in advance. For a round robin (RR) event,
the tournament strength would be:
R = sum(Ni*Si) where
R - the strength of the round robin tournament
Ni - the number of participants ranked better than the player #i
Si - the strength of the player #i
For an RR2 event, R = 2*sum(Ni*Si), and so on.
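The round robin formula can be sketched as follows, taking as input the number of participants in each ranking class. The function name and the dictionary layout are illustrative; the class counts in the usage example are those the article lists for AVRO 1938 (one #1, one A, three B, three C, double round robin), for which it reports a strength of 82.

```python
# Strength per ranking class, ordered from best to worst. The value for "#1" is a
# placeholder: nobody is ranked above #1, so her term N_i * S_i is always zero.
CLASS_STRENGTH = {"#1": 0.0, "A": 5, "B": 2, "C": 1, "D": 0.5, "E": 0.2,
                  "F": 0.1, "G": 0.05, "H": 0.02, "I": 0.01}


def round_robin_strength(class_counts, cycles=1):
    """R = cycles * sum(N_i * S_i), with N_i the number of better-ranked participants."""
    total = 0.0
    better_ranked = 0  # N_i for the next player in ranking order
    for ranking_class, strength in CLASS_STRENGTH.items():
        for _ in range(class_counts.get(ranking_class, 0)):
            total += better_ranked * strength
            better_ranked += 1
    return cycles * total


avro_1938 = {"#1": 1, "A": 1, "B": 3, "C": 3}
print(round_robin_strength(avro_1938, cycles=2))  # 82.0
```

Players within the same class are interchangeable here, because only the class strength of the weaker player enters each game.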
For a tournament where not all the games to be played are known in advance,
the tournament strength can be estimated. For Swiss tournaments, an estimate
can be made as follows: suppose that all N players are involved in a
round robin but, instead of playing all N-1 rounds, they play only M rounds
(usually 9, 11, or 13). The strength of the tournament would then be:
S = R * M / (N-1) where
S - the strength of the Swiss tournament
R - the strength of a round robin tournament played by all the players involved
in the Swiss
M - the number of rounds in the Swiss
N - the number of players in the Swiss
Note: I did a quick check on this evaluation on the European Individual Championship
from 2003. The actual strength of all the games played differed by only 2.8%
from the estimated strength (31.32 instead of 32.22; see below the table of
Swiss tournaments).
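As a sketch of the Swiss estimate, the code below scales the round robin strength of the whole field by M / (N-1). The class counts are those listed for Istanbul 2003 in the table of Swiss tournaments. Counting the "Others" as strength 0 is an assumption of this sketch; with it, the result agrees with the estimate of 32.22 quoted in the note.

```python
# Strength per ranking class, best to worst. "#1" is a placeholder (she is never the
# weaker player); "Others" (outside the scale) is set to 0 as an assumption.
CLASS_STRENGTH = {"#1": 0.0, "A": 5, "B": 2, "C": 1, "D": 0.5, "E": 0.2,
                  "F": 0.1, "G": 0.05, "H": 0.02, "I": 0.01, "Others": 0.0}


def round_robin_strength(class_counts):
    """R = sum(N_i * S_i) over the field, ordered from best to worst class."""
    strengths = [CLASS_STRENGTH[c] for c in CLASS_STRENGTH
                 for _ in range(class_counts.get(c, 0))]
    # The position in the ordered field equals the number of better-ranked players.
    return sum(better_ranked * s for better_ranked, s in enumerate(strengths))


def swiss_strength(class_counts, rounds):
    """S = R * M / (N - 1): the field plays only M of the N - 1 round robin rounds."""
    players = sum(class_counts.values())
    return round_robin_strength(class_counts) * rounds / (players - 1)


istanbul_2003 = {"D": 2, "E": 18, "F": 29, "G": 55, "H": 50, "I": 22, "Others": 31}
print(round(swiss_strength(istanbul_2003, rounds=13), 2))  # 32.22
```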
For a knock-out tournament, a simple estimation could assume that in each round
the better ranked player qualifies. Since this will obviously not always
happen, the actual strength of all games played will almost always be
lower than this estimation. The bigger and more frequent the surprises, the
lower the actual strength will be.
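The knock-out estimate can be sketched as below. Besides the rule that the better ranked player always qualifies, the sketch assumes a seeded bracket in which every round pairs a player from the upper half of the remaining field with one from the lower half, so that the weaker players of a round are exactly its lower half. Players outside the scale ("Other") are counted as 0, which is also an assumption. With the class counts listed for the Moscow 2001 knock-out, the result agrees with the 62.28 given in the article's table.

```python
# Strength per ranking class; "Other" (outside the scale) is set to 0 as an assumption.
CLASS_STRENGTH = {"A": 5, "B": 2, "C": 1, "D": 0.5, "E": 0.2, "F": 0.1,
                  "G": 0.05, "H": 0.02, "I": 0.01, "Other": 0.0}


def knockout_estimate(class_counts, games_per_round):
    """Estimated strength if the better ranked player wins every minimatch."""
    field = sorted((CLASS_STRENGTH[c] for c, n in class_counts.items() for _ in range(n)),
                   reverse=True)
    if len(field) != 2 ** len(games_per_round):
        raise ValueError("field size must be 2 ** number of rounds")
    total = 0.0
    for games in games_per_round:
        half = len(field) // 2
        total += games * sum(field[half:])  # lower half = the weaker player of each pair
        field = field[:half]                # the upper half qualifies
    return total


moscow_2001 = {"B": 3, "C": 5, "D": 10, "E": 19, "F": 25,
               "G": 26, "H": 22, "I": 10, "Other": 8}
# Five rounds of 2 games, semifinals of 4 games, final of 8 games.
print(round(knockout_estimate(moscow_2001, [2, 2, 2, 2, 2, 4, 8]), 2))  # 62.28
```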
The Strongest Tournaments
Now let's look at some real tournaments and compute their strength. For recent
tournaments (from July 2000 onwards), I used the most recent FIDE rankings that
applied on the date the tournament started, in order to identify the ranking classes
of the top 100 players involved in the tournament. As I didn't have any information
for the players ranked #101 or lower, I estimated the number of such players
in each of the ranking classes #101-200, #201-500, and #501-1000, based on the
January 2011 FIDE ratings, for which I had the threshold ratings of the players
#200, #500, and #1000, and deducted the inflation observed for the player #100.
I neglected the players ranked lower than #1000, not being able to estimate
their ranking class accurately enough. Anyway, those players only marginally
count for something when the strongest tournaments are measured.
For older tournaments, I used the Chessmetrics rankings from the month the
tournament started (unless otherwise noted). I am aware that using two different
ranking systems implies there is no common base for the evaluations, but I had
no better choice at hand, and besides, these evaluations are given here only
as an example. The formula stays, while the number of players in each ranking
class could change according to different ranking systems. In all the tables
below, except the matches’ table, the Strength column represents the estimated
strength before the tournament starts. So here they are, the strongest tournaments
ever:
Tourn. | Year | #1 | A | B | C | D | E | F | Other | Type | Strength |
London | 1883 | 1 | 1 | 3 | 2 | 4 | 1 | | 2 | RR2-RR6* | 153.7 |
Zürich | 1953 | 1 | | 3 | 5 | 5 | 1 | | | RR2 | 144.6 |
Vienna | 1898 | | 1 | 3 | 3 | 6 | 4 | 2 | 1 | RR2** | 141.2 |
Vienna | 1882 | 1 | 1 | 3 | 4 | 3 | 1 | | 5 | RR2 | 132.8 |
Carlsbad | 1929 | | 1 | 3 | 5 | 5 | 6 | 1 | 2 | RR | 91.3 |
St Petersburg | 1914 | 1 | 1 | 3 | 1 | 3 | | | 2 | RR+RR2*** | 84.5 |
London | 1899 | 1 | | 3 | 3 | 2 | 3 | 1 | 2 | RR2 | 83.4 |
Bled**** | 1931 | 1 | | 3 | 3 | 2 | 3 | 1 | 1 | RR2 | 83.4 |
Baden-Baden | 1870 | 1 | 1 | 2 | 4 | 1 | | | 1 | RR2 | 82 |
AVRO | 1938 | 1 | 1 | 3 | 3 | | | | | RR2 | 82 |
* Drawn games were replayed; only a third consecutive draw finally counted as
a draw. This made each minimatch have from two to six games. The estimation
is based on a 1/1/1 ratio between the possible results of a game, i.e. 1-0
/ 0.5-0.5 / 0-1. In the end, there were a total of 73 draws, but I don’t have
information about how many draws each minimatch had. Given that all the draws
count for about 10.4 more rounds (of 7 games per round), I estimate the actual
strength (after the tournament ended) at 149.1.
** The player ranked #65 played only 8 games, and then withdrew. There was
a tiebreak of 4 games between #2 and #5, who finished equal first. These facts
brought the actual strength to 146.4.
*** This was an RR between 11 players, then an RR2 final between the best
five. The estimated strength is calculated supposing the strongest five would
qualify for the final. But in fact, #2, #3, #4, #6, and #12 qualified, and
that brought the actual strength down to 60.5.
**** Chessmetrics uses the January 1931 rankings for this tournament, although
it started in August. I did so also, to maintain consistency.
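The estimate for London 1883 in the first footnote can be reproduced with a short calculation. The sketch assumes, as the footnote does, that each game is drawn with probability 1/3, and it counts the two players outside the scale ("Other") as 0, which is an assumption of this sketch.

```python
from fractions import Fraction

# London 1883 field from the table: (class, number of players, strength), best to worst.
# The strength for "#1" is a placeholder, as she is never the weaker player.
FIELD = [("#1", 1, 0.0), ("A", 1, 5), ("B", 3, 2), ("C", 2, 1),
         ("D", 4, 0.5), ("E", 1, 0.2), ("Other", 2, 0.0)]

strengths = [s for _cls, count, s in FIELD for _ in range(count)]
# Single round robin: each player counts once per better-ranked opponent.
single_cycle = sum(better_ranked * s for better_ranked, s in enumerate(strengths))

# A scheduled game is replayed after a first draw and again after a second draw,
# so the expected number of games per scheduled game is 1 + 1/3 + 1/9.
p_draw = Fraction(1, 3)
expected_games = 1 + p_draw + p_draw ** 2

estimate = 2 * single_cycle * float(expected_games)
print(expected_games)      # 13/9
print(round(estimate, 1))  # 153.7
```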
Below is the list of strongest tournaments after Zurich 1953:
Tournament | Year | #1 | A | B | C | D | E | F | G | H | I | Other | Type | Strength |
Linares | 1993 | 1 | 1 | 3 | 4 | 4 | | 1 | | | | | RR | 71.3 |
Linares | 1999 | 1 | 1 | 2 | 3 | 1 | | | | | | | RR2 | 67 |
Moscow | 2001 | | | 3 | 5 | 10 | 19 | 25 | 26 | 22 | 10 | 8 | KO2/KO4/KO8* | 62.28 |
Linares | 1992 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | | | | | RR | 60.4 |
Linares | 1998 | 1 | 1 | 2 | 3 | | | | | | | | RR2 | 60 |
Wijk aan Zee | 2001 | 1 | 1 | 3 | 4 | 1 | | 3 | 1 | | | | RR | 57.45 |
Linares | 1994 | 1 | 1 | 3 | 3 | 2 | 2 | 2 | | | | | RR | 56.2 |
Las Palmas | 1996 | 1 | 1 | 3 | 1 | | | | | | | | RR2 | 56 |
Montreal | 1979 | 1 | | 2 | 3 | 2 | 2 | | | | | | RR2 | 55.8 |
New Delhi/Tehran | 2000 | | | 3 | 5 | 9 | 17 | 20 | 21 | 14 | 5 | 6 | KO2/KO4/KO6** | 54.56 |
Moscow | 1967 | | 1 | 2 | 3 | 4 | 6 | 2 | | | | | RR | 51.3 |
Wijk aan Zee | 2008 | 1 | 1 | 1 | 4 | 4 | 3 | | | | | | RR | 51.2 |
Linares-Morelia | 2007 | 1 | 1 | 1 | 3 | 1 | 1 | | | | | | RR2 | 50.8 |
* The first 5 rounds were best of 2 games (KO2), the semifinals were best of
4 games (KO4), and the final was best of 8 games (KO8). Tie-breaks are not
counted in the strength estimation. The estimated strength is calculated supposing
that in each round, the better ranked player qualifies. The players were ranked
and paired according to the FIDE ranking list from July 2001, although a more
recent list was available. I used this list as well.
** The first 5 rounds were best of 2 games (KO2), the semifinals were best of
4 games (KO4), and the final was best of 6 games (KO6). Tie-breaks are not
counted in the strength estimation. The estimated strength is calculated supposing
that in each round, the better ranked player qualifies. The first round had
only 36 minimatches, instead of 64, because only 100 players took part, not
128. The players were ranked and paired according to the FIDE ranking list from
July 2000, although a more recent list was available. I used this list as well.
Below is the list of some of the strongest Swiss tournaments for which I could
find information:
Tourn. | Year | #1 | A | B | C | D | E | F | G | H | I | Others | Type | Strength | Comments |
Istanbul | 2003 | | | | | 2 | 18 | 29 | 55 | 50 | 22 | 31 | S13* | 32.22 | European Championship |
Ohrid | 2001 | | | | | 3 | 19 | 20 | 43 | 61 | 23 | 34 | S13* | 27.16 | European Championship |
Warsaw | 2005 | | | | 1 | 2 | 15 | 24 | 33 | 55 | 30 | 69 | S13* | 20.16 | European Championship |
Moscow | 2006 | | | | | 2 | 13 | 15 | 30 | 30 | 3 | | S9** | 16.48 | Aeroflot |
Plovdiv | 2008 | | | | | | 6 | 35 | 42 | 79 | 40 | 135 | S11*** | 15.65 | European Championship |
* 13-round Swiss
** 9-round Swiss
*** 11-round Swiss
Below there is a list of some other recent tournaments, usually believed to
be among the strongest:
Tournament | Year | #1 | A | B | C | D | E | F | Type | Strength | Comments |
Ciudad de Mexico | 2007 | 1 | | 2 | 2 | 3 | | | RR2 | 44 | WCC |
Moscow | 2009 | | 1 | 3 | 4 | 2 | | | RR | 42.5 | Tal Memorial |
Dortmund | 2001 | | 1 | 3 | 2 | | | | RR2 | 42 | strongest Dortmund |
San Luis | 2005 | | 1 | 2 | 2 | 2 | 1 | | RR2 | 39.8 | WCC |
Astrakhan | 2010 | | | | 3 | 7 | 3 | 1 | RR | 31.9 | strongest Grand Prix |
Sofia | 2005 | | 1 | 2 | 2 | 1 | | | RR2 | 31 | strongest Sofia |
Bilbao | 2008 | 1 | | 2 | 2 | 1 | | | RR2 | 31 | strongest Bilbao |
Nanjing | 2010 | 1 | 1 | 1 | | 1 | 2 | | RR2 | 24.6 | strongest Nanjing |
Elista | 2007 | | | 1 | 2 | 6 | 5 | 2 | KO6* | 16.2 | Candidates |
* Two rounds of 6-game minimatches. The actual strength was a bit lower,
as not always the best player qualified for the second round, and not always
all 6 games were played.
This evaluation method can be applied to any kind of chess event, including
team competitions and matches. The strongest team events were certainly the
Chess Olympiads. Let's take the last Olympiad (Khanty-Mansiysk 2010) as an example.
It was contested over 4 boards and 11 rounds. In the evaluation, I didn't take
into account the reserve player, assuming only the first 4 players play all
the games. This makes the event equivalent to 4 independent Swiss tournaments,
so the strength of the entire Olympiad would be the sum of the strengths of
these four Swiss tournaments. Here they are:
Tournament | #1 | A | B | C | D | E | F | G | H | I | Others | Type | Strength |
Olympiad 2010, board 1 | 1 | 1 | 2 | 3 | 5 | 11 | 4 | 17 | 18 | 8 | 79 | S11 | 11.4 |
Olympiad 2010, board 2 | | | | 1 | 2 | 10 | 6 | 7 | 18 | 9 | 96 | S11 | 3.73 |
Olympiad 2010, board 3 | | | | 1 | 1 | 3 | 9 | 6 | 9 | 15 | 105 | S11 | 1.86 |
Olympiad 2010, board 4 | | | | | 1 | 1 | 4 | 10 | 12 | 10 | 111 | S11 | 1.13 |
Olympiad 2010, Total | 1 | 1 | 2 | 5 | 9 | 25 | 23 | 40 | 57 | 42 | 391 | S11*4 | 18.2 |
Many of the World Championship matches did not have an a priori fixed length,
so I have taken into account the actual number of games played. These are the
strongest matches ever played (all involved the players #1 and #2):
World Championship Match | Year | Games | Strength | Comments |
Karpov – Kasparov | 1984 | 48 | 240 | strongest event of any kind |
Capablanca – Alekhine | 1927 | 34 | 170 | |
Karpov – Korchnoi | 1978 | 32 | 160 | |
The Importance of a Tournament
Finally, I will introduce a measure to assess the importance of a tournament.
An important event is a rare event. Rare means there is enough time (or space)
around it. The time span dominated by a tournament A is composed of:
- The time span T1 extending from the last previous tournament that was at least
as strong until the tournament A
- The time span T2 extending from the tournament A until the next tournament
that is at least as strong
Of the two time spans, however, the one carrying more meaning is T1. If T1
is large, the tournament A will be remembered as the first tournament of its
strength after many years, or, as they say, it will make history. Also, T1 can
be computed at the time the tournament takes place, depending only on the past.
On the other hand, the size of T2 only means the tournament is followed by a
long period of weaker events. Rankings can be done based on each time span,
or on both. I will only list here the most important tournaments according to
the length of T1, which I consider more meaningful. They are ordered chronologically
for clarity.
Tournament | Year | #1 | A | B | C | D | E | F | Others | Type | Strength | T1 [years] |
Baden-Baden | 1870 | 1 | 1 | 2 | 4 | 1 | | | 1 | RR2 | 82 | strongest tournament so far |
Vienna | 1882 | 1 | 1 | 3 | 4 | 3 | 1 | | 5 | RR2 | 133 | strongest tournament so far |
London | 1883 | 1 | 1 | 3 | 2 | 4 | 1 | | 2 | RR2-RR6 | 154 | strongest tournament so far |
Carlsbad | 1929 | | 1 | 3 | 5 | 5 | 6 | 1 | 1 | RR | 91.3 | 31 (strongest since Vienna 1898) |
Zürich | 1953 | 1 | | 3 | 5 | 5 | 1 | | | RR2 | 145 | 70 (strongest since London 1883) |
Montreal | 1979 | 1 | | 2 | 3 | 2 | 2 | | | RR2 | 55.8 | 26 (strongest since Zürich 1953) |
Linares | 1992 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | | RR | 60.4 | 39 (strongest since Zürich 1953) |
Linares | 1993 | 1 | 1 | 3 | 4 | 4 | | 1 | | RR | 71.3 | 40 (strongest since Zürich 1953) |
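The T1 column can be recomputed from a chronological list of events. The sketch below uses strengths taken from the article's tables. It only gives the right answer if every earlier tournament that was at least as strong is present in the list, which is assumed to hold for this selection.

```python
# (tournament, year, estimated strength), in chronological order.
EVENTS = [
    ("Baden-Baden", 1870, 82.0),
    ("Vienna", 1882, 132.8),
    ("London", 1883, 153.7),
    ("Vienna", 1898, 141.2),
    ("Carlsbad", 1929, 91.3),
    ("Zürich", 1953, 144.6),
    ("Montreal", 1979, 55.8),
    ("Linares", 1992, 60.4),
    ("Linares", 1993, 71.3),
]


def t1_spans(events):
    """Years since the latest earlier event that was at least as strong.

    None means no earlier event in the list was as strong ("strongest so far").
    """
    spans = []
    for index, (name, year, strength) in enumerate(events):
        earlier_years = [y for _n, y, s in events[:index] if s >= strength]
        t1 = year - max(earlier_years) if earlier_years else None
        spans.append((name, year, t1))
    return spans


for name, year, t1 in t1_spans(EVENTS):
    print(name, year, "strongest so far" if t1 is None else t1)
```

The same loop run over the reversed list, with the subtraction reversed, would give T2.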
Conclusions
Although the strength of a tournament may be praised, tournaments are often
remembered (and sometimes forgotten) for other reasons also, not measured here.
Because of that, the results, which strictly address the matter of strength, have
probably already raised some eyebrows, as they do not match the common perception too well.
Factors besides strength that may induce a long-lasting impression are:
- The outstanding domination of a certain player. Among overwhelming performances,
those of Alekhine at Bled 1931 (a 5.5-point lead over 26 games) and Karpov
at Linares 1994 (a 2.5-point lead over 13 games) come to mind.
- The mood of the players. Thrilling, spectacular, or highly complex games
enhance the tournament's reputation over the years. On the other hand, if
the players are not in a fighting mood, the fame of the tournament will
suffer.
- Various factors unrelated to chess: organization, prizes, conflicts between
players, political issues, etc.
To conclude, the importance of a tournament can be judged by its strength,
but also by putting the tournament in a historic perspective. On the other hand,
many factors combine to give a memorable tournament, strength being only one
of them.
About the author
Felix Pîrvan, 34 years old, was born in Pitesti, Romania. He did
intensive swimming training in his four early school years. He graduated from
the Politehnica University of Bucharest in the field of Artificial Intelligence
and worked in Bucharest as a programmer for over ten years. He worked for
one year in the field of Natural Language Processing at the Romanian
Institute for Artificial Intelligence.
As of 2008, Felix is working at MB Telecom as a programmer in the field
of Image Processing. He takes a keen interest in Computer Vision, Machine
Learning and Data Clustering. He also has a passion for Statistics. In
his free time he plays online correspondence chess and some OTB chess tournaments,
and also enjoys distance running and mountain biking.
Copyright
Felix Pirvan/ChessBase