On the Importance of Embracing Risk in Tournaments
By Darse Billings
In two-player games like chess, if we want to know who the best player is,
we usually play tournaments. However, the player who wins the most tournaments
is not necessarily the best player!
Read that last sentence again. The objectively best player may not be the
favourite to win a tournament. That is a fact.
Here's the key point to explain this apparent riddle: some players may have
a propensity to draw more games, and that hurts their chances of winning a
tournament. Their quiet, positional style reduces the risk of a loss; whereas
a sharp tactical style tends to win or lose games, rather than draw. The more
conservative player might be the strongest player in the world – the favourite
to win a match against any other player – but still win less than his fair
share of tournaments. (And by "his", we of course mean "his, her, or its".)
In terms of game theory, a tournament with n players is really an n-player
meta-game, even though each game within the tournament is played between two
players.
Over many tournaments, the strongest player will earn the highest average
score, and the best win-loss record, but might nevertheless be far behind the
leaders in terms of most tournament wins (first place, or equal first).
Conversely, the sharp win-or-lose player will win more than his fair share
of tournaments. The sharp player will also have more than his fair share of
last-place finishes, but that is relatively unimportant. (In terms of game
theory, we say that the utility function is non-uniform: the reward for
finishing first far outweighs the penalty for finishing last.) The higher variance
in outcomes produces a wider, less centrally concentrated distribution of possible scores,
resulting in a higher probability of finishing first (or last). We might say
"variance is our friend".
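A small worked calculation may make the variance argument concrete. The sketch below assumes a player whose games are drawn with probability draw_prob and are otherwise won or lost with equal probability; the name score_variance is ours, purely for illustration, and is not taken from the simulation described in this article.

```python
def score_variance(draw_prob: float) -> float:
    """Variance of a single game score (win = 1, draw = 0.5, loss = 0).

    Assumes wins and losses are equally likely whenever the game is decisive.
    """
    win = loss = (1.0 - draw_prob) / 2.0
    mean = win * 1.0 + draw_prob * 0.5 + loss * 0.0
    second_moment = win * 1.0 ** 2 + draw_prob * 0.5 ** 2 + loss * 0.0 ** 2
    return second_moment - mean ** 2


# The expected score is 0.5 in every case; only the spread changes.
for draw_prob in (0.0, 0.5, 1.0):
    print(draw_prob, score_variance(draw_prob))
```

Under these assumptions the variance works out to (1 - draw_prob) / 4, so a player who never draws has the widest spread of results, and a player who always draws has none at all.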
To demonstrate this fact empirically, a simple computer simulation was written
that reflects players of equal strength but different styles. Each player is
defined with a number that reflects their personal proclivity toward a drawn
game.
Let's look at the two extremes. Player A (aggressive) abhors draws, and will
do anything within reason to avoid them, always looking for a decisive result,
win or lose. Player Q (quiet) never takes any unnecessary risks, strives to
accrue small advantages, hoping to convert them into a win in the endgame,
but is always content with a draw if that doesn't work out. If the game is
not a draw, then each player has a 50-50 chance of winning it, so
the two are of objectively equal strength.
We will assign each player a value for "drawishness", from 0% to
100%. Player A is our 0% model, Player Q is 100%. Other players are in between
these extremes. When two players play each other, we will average their drawishness
values to determine the probability that the game ends in a draw. In the case
of Player A versus Player Q, 50% of the games will end in a draw, 25% in a
win for A, and 25% in a win for Q. If players with drawishness of 40% and 80%
play each other, then 60% of games end in draws, 20% in wins for the first
player, and 20% in wins for the second.
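The averaging rule above can be sketched in a few lines. This is only an illustration of the rule as described, for players of equal strength; the function name game_probabilities is an assumption of ours.

```python
from typing import Tuple


def game_probabilities(draw_a: float, draw_b: float) -> Tuple[float, float, float]:
    """Return (P(draw), P(first player wins), P(second player wins)).

    The draw probability is the average of the two drawishness values;
    the remaining probability is split evenly, since strengths are equal.
    """
    p_draw = (draw_a + draw_b) / 2.0
    p_decisive = 1.0 - p_draw
    return p_draw, p_decisive / 2.0, p_decisive / 2.0


# Player A (0%) versus Player Q (100%), then the 40% versus 80% pairing.
for pairing in ((0.0, 1.0), (0.4, 0.8)):
    print([round(p, 2) for p in game_probabilities(*pairing)])
```

The two pairings reproduce the figures quoted in the text, up to floating-point rounding.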
Now we will define a roster of players, and play many complete round-robin
tournaments. The same conclusions will hold for Swiss system tournaments –
if anything more strongly, because short-term variance matters even more in that format.
In these experiments we will include all players using 10% increments in drawishness,
giving us 11 players in total.
We ran 1.1 million round-robin tournaments. For each tournament, all the players
sharing first place split the tournament win evenly (for example, a three-way
tie for first would give each player one-third of a tournament win). Since
all 11 players are of equal strength, each would be expected to win 100,000
tournaments, all else being equal.
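For readers who want to try this themselves, here is a minimal sketch of such an experiment. It is a reconstruction, not the original program: it assumes a single round-robin (each pair meets once), uses far fewer tournaments than the 1.1 million reported here, and the names play_tournament and tally_wins are ours.

```python
import random


def play_tournament(drawishness, rng):
    """Play one single round-robin among equal-strength players; return scores."""
    n = len(drawishness)
    scores = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            p_draw = (drawishness[i] + drawishness[j]) / 2.0
            r = rng.random()
            if r < p_draw:
                scores[i] += 0.5
                scores[j] += 0.5
            elif r < p_draw + (1.0 - p_draw) / 2.0:
                scores[i] += 1.0
            else:
                scores[j] += 1.0
    return scores


def tally_wins(drawishness, num_tournaments, seed=0):
    """Count tournament wins, splitting shared first places evenly."""
    rng = random.Random(seed)
    wins = [0.0] * len(drawishness)
    for _ in range(num_tournaments):
        scores = play_tournament(drawishness, rng)
        best = max(scores)
        leaders = [k for k, s in enumerate(scores) if s == best]
        for k in leaders:
            wins[k] += 1.0 / len(leaders)
    return wins


# Eleven players: drawishness 0%, 10%, ..., 100% (Player A first, Player Q last).
roster = [k / 10 for k in range(11)]
wins = tally_wins(roster, num_tournaments=10_000)
for d, w in zip(roster, wins):
    print(f"drawishness {d:.0%}: {w:8.1f} tournament wins")
```

Scores are multiples of one half, so comparing them for exact equality when looking for ties is safe. The exact counts depend on the seed and on the number of tournaments played.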
However, style is important for winning tournaments, and we find that the
sharper styles win more tournaments than drawish styles. In fact, Player A
wins almost 50% more than his fair share of tournaments, while Player Q wins
less than half of his! The complete results are shown in Table 1 (with the number
of tournament wins rounded off to the nearest thousand).

Table 1. Tournament wins among 11 players of equal strength, but different
playing styles.
We previously asserted that an objectively stronger player might win fewer
tournaments than his weaker peers, based on differences in style. To demonstrate
this, we now add a strength factor to the definition of each player. This is
a relative factor that determines the probability of winning a game that ends
in a win or a loss.
For example, a strong player with a strength factor of 70% playing against
a weaker player with a strength factor of 50% would end up winning 60% of the
decisive games between them. If the two players have an average drawishness
factor of 50%, then the total outcome from a long series of games would be
50% draws, 30% wins for the stronger player, and 20% wins for the weaker player.
In this example, the difference in strength between these players would be
roughly 40 rating points.
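The arithmetic of this example can be sketched as follows. The formula in decisive_win_probability is an assumption inferred from the 70% versus 50% example (half the difference in strength factors, added to 50%); the article does not spell it out. The rating conversion uses the standard logistic Elo curve, which is also an assumption, since the text does not say which conversion was used. All function names are ours.

```python
import math


def decisive_win_probability(strength_a: float, strength_b: float) -> float:
    """Chance that A wins a decisive game (assumed rule: 50% plus half the gap)."""
    return 0.5 + (strength_a - strength_b) / 2.0


def outcome_probabilities(strength_a, strength_b, draw_a, draw_b):
    """Return (P(draw), P(A wins), P(B wins)) for one game."""
    p_draw = (draw_a + draw_b) / 2.0
    p_a = (1.0 - p_draw) * decisive_win_probability(strength_a, strength_b)
    p_b = (1.0 - p_draw) - p_a
    return p_draw, p_a, p_b


def elo_gap(expected_score: float) -> float:
    """Rating difference implied by an expected score on the logistic Elo curve."""
    return 400.0 * math.log10(expected_score / (1.0 - expected_score))


p_draw, p_win, p_loss = outcome_probabilities(0.70, 0.50, 0.50, 0.50)
expected = p_win + 0.5 * p_draw  # a draw counts as half a point
print(round(p_draw, 2), round(p_win, 2), round(p_loss, 2))
print(round(elo_gap(expected)))
```

Under these assumptions the stronger player's expected score is 55%, which the logistic curve places in the mid-30s of rating points, in the same neighbourhood as the round figure of 40 quoted above. Note that the assumed rule only yields a valid probability when the two strength factors differ by no more than 100 percentage points.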
We will now run tournaments where the strongest players are also the most
drawish players. Although they earn the highest scores on average, this does
not automatically translate into tournament victories.
In Table 2, we let the 11 players with increasingly drawish styles be
progressively stronger, raising the strength factor by 1% per step (roughly two rating points per step).
We find that the weakest players still win the most tournaments, while the
strongest player wins by far the fewest.

Table 2. Tournament wins among 11 players of increasing strength but increasingly
drawish playing styles.
In Table 3, we repeat the experiment, with double the increase in objective
playing strengths. Now the weakest players are winning fewer tournaments, but
so are the strongest players. Although the distribution is flatter, the players
winning the most tournaments are those of moderate strength, with moderate
acceptance of risk. We can see that style has a slightly greater impact than
strength under these conditions.
Table 3. Tournament wins among 11 players of increasing strength but increasingly
drawish playing styles.
In Table 4, we double the rate of increasing strength again, giving an even
greater objective advantage for the more conservative players. Now the balance
is tipped in favour of the stronger players, but the strongest player in the
group is still below par in total wins.
Table 4. Tournament wins among 11 players of increasing strength but increasingly
drawish playing styles.
In fact, even if we skew the rating differences to a 200-point spread overall,
the strongest player still doesn't win the most tournaments! This is shown
in Table 5.
Table 5. Tournament wins among 11 players of rapidly increasing strength
but increasingly drawish playing styles.
Note that the best player never loses a single game to the worst player in
this scenario, but does draw half of the time, for a 75% overall score.
Despite the large disparity in playing strengths, the strongest player still
does not dominate the tournament circuit, ranking behind the other strong players
with slightly less conservative playing styles.
Under these extreme conditions the weaker players seldom win tournaments,
but they nevertheless score those upsets much more frequently than their strength
alone would predict.
To put these results in perspective, Table 6 looks at the same range of playing
strengths, but with all players having the same propensity toward drawn games
(50% draws in all cases). We can see that under these conditions, the strongest
player wins by far the most events – twice as many as Player Q in the above
setting, and almost one third of all the tournaments played. Player 6 has not
changed, but wins half as many tournaments now that the top players are unshackled.
Clearly, style counts for a lot when it comes to winning tournaments!

Table 6. Tournament wins among 11 players of rapidly increasing strength
and equally drawish playing styles (50% draws).
A few caveats are needed to prevent possible misinterpretation of these results.
First, we've shown that, all else being equal, a player who tends to score a win and
a loss is better off than a player who tends to score two draws. However, if
trying to avoid a draw increases the chance of a loss more than the chance
of a win, then all bets are off. You cannot casually sacrifice expected value
just to increase variance.
Second, the utility of a draw changes as a tournament progresses. As an extreme
example, suppose Player A leads by a half point over Player Q going into the
last round of a major tournament, and both are well clear of the rest of the
field. Then a draw is as good as a win for A, but is worthless to Q. To maximize
their chances, they'd both better be willing to do a role reversal!
Finally, it is not a simple dichotomy between positional and tactical styles.
Although they are correlated with high and low draw frequencies, they are not
the same thing. Familiar tactical positions may lead to more draws than novel
positional situations. Moreover, some positions demand tactics, others demand
an emphasis on strategic considerations. Trying to force a square peg into
a round hole is not the best approach. As Sun Tzu said, the best style is no
style – you must be able to fluidly adapt to the circumstances.
For those who understand the dynamics of tournament strategy, none of these
conclusions should come as a surprise. Extreme results and increased risk go
hand in hand – on both ends of the spectrum. A player who is willing to risk
the embarrassment of finishing last also enjoys a decided advantage toward
achieving the highest glory.
In the past, tournament results may have counted more toward the impression
of a player's strength than match results. Perhaps the drawish masters were
better than their reputations. Perhaps players like Petrosian deserve more
credit than they usually receive. Perhaps the chess world should begin distinguishing
between the top tournament players (such as Anand and Topalov), and the top
match players (such as Kramnik). Food for thought.
Regardless, the lesson is clear: if you want to win tournaments, play to win!
Biography
Dr. Darse Billings holds a Ph.D. in computer science, and has a keen
interest in games of all descriptions. He is the architect of the strongest
poker-playing programs in the world, having founded that area of artificial
intelligence research in 1992. In 2000, his Lines of Action program,
Mona, won the de facto world championship, defeating all of the top human
players, and winning every game it ever played.
Darse retired from competitive chess in 1989, with a rating of 2065,
having suffered from a drawish positional style.