Popularity of chess openings over time

6/24/2014 – Recently we published an article that asked: 1.e4 or 1.d4 – which is the better move? Now Randy Olson, a third-year PhD engineering student, has investigated which openings were the most popular in 650,000 tournament games from 1850-2014. He found 4,000 unique openings, with the 30 most popular accounting for 70% of all chess games. Randy illustrates his results with some wonderful graphics.

Popularity of openings over time

A data-driven exploration of the evolution of chess by Randy Olson

For this article, which explores a data set of over 650,000 chess tournament games dating back to the 15th century, I wanted to look at how chess openings have grown and waned in popularity over time. I only have reliable data on chess games from 1850 onward, so that year will be my starting point.

The first few moves of a chess game, known as the opening, are among the most-studied aspects of the game, largely because of how important they can be. If you don’t start off with a good opening, you could doom yourself to defeat before the game really even begins. It’s therefore no surprise that one of the key steps to becoming a skilled chess player is studying and memorizing the many varieties of openings. Hundreds of openings have been developed since 1850, so it should make for an interesting exercise to see how they have evolved over that time.

White’s first move

It’s a well-known fact that White, who moves first, has a small advantage at the beginning of the game. To maintain this advantage, White should use it to take over the middle of the board as quickly as possible. White’s most popular first moves from 1850-2014 are shown below. Note that all of these are fairly aggressive openings that build toward control of the middle of the board.

In 1850, White openings were fairly homogeneous: most chess experts played King’s Pawn (moving a pawn to e4). Chess players didn’t begin to explore alternatives to the King’s Pawn in earnest until the 1890s, when Queen’s Pawn (moving a pawn to d4) started to replace King’s Pawn in some players’ repertoires. The 1920s saw another burst of innovation with the rising popularity of the Zukertort Opening (moving the knight to f3) and the English Opening (moving a pawn to c4), which completed the set of staple first moves that are still in regular use today.
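The area charts in this section plot, for each year, the fraction of games that begin with each first move. Below is a minimal Python sketch of that tabulation, assuming the games have already been reduced to hypothetical (year, opening) pairs. The function name, the labels and the `other_below` cutoff for folding rare openings into an "Other" band are illustrative assumptions; applying the cutoff per year may not match how the article actually built its "Other" category.

```python
from collections import Counter, defaultdict


def yearly_shares(games, other_below=0.0):
    """Return {year: {opening: share of that year's games}}.

    Openings whose share in a given year falls below `other_below`
    are merged into a single "Other" entry for that year.
    """
    counts = defaultdict(Counter)
    for year, opening in games:
        counts[year][opening] += 1
    shares = {}
    for year, counter in sorted(counts.items()):
        total = sum(counter.values())
        row = {}
        for opening, n in counter.items():
            share = n / total
            key = opening if share >= other_below else "Other"
            row[key] = row.get(key, 0.0) + share
        shares[year] = row
    return shares


# Toy records for illustration only, not the article's data.
games = [(1850, "1.e4")] * 3 + [(1850, "1.d4"), (1920, "1.d4"), (1920, "1.Nf3")]
print(yearly_shares(games, other_below=0.3))
# {1850: {'1.e4': 0.75, 'Other': 0.25}, 1920: {'1.d4': 0.5, '1.Nf3': 0.5}}
```

Each year's shares sum to 1, which is what lets them be stacked as an area chart.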

Black’s first move

Many of Black’s opening moves are more defensive in nature and attempt to undermine White’s initial advantage. In 1850 it was standard fare for Black to match the ever-popular King’s Pawn by moving a pawn to e5 (the Open Game). Although I typically group unpopular openings into the “Other” category, I wanted to point out the short-lived spike in popularity of the Pirc Defence in the 1850s. Though the Pirc is typically thought of as a relatively new opening, Moheschunder Bannerjee used this opening almost exclusively in his 50+ games against John Cochrane, winning 40% of them (far above his overall 24% win rate as Black).

The rise of the Queen’s Pawn in the 1890s, meanwhile, brought with it the rise of the Closed Game (Black replying by moving a pawn to d5). Black openings similarly saw a burst of innovation in the 1920s, with the Indian Defence emerging as a response to the Queen’s Pawn and the ever-popular Sicilian Defence taking off as a response to the standard King’s Pawn. By 2014, the Open Game is well past its glory days and seems to be on its way out.

The French Defence seems to have been a staple Black opening for the past 164 years, consistently comprising 5%-10% of all chess games. Amusingly, the French Defence has a reputation for solidity and resilience, which is also reflected in its historical usage.

White’s second move

Here’s where things get complicated. I noted in the first section that the most popular first moves for White have historically been King’s and Queen’s Pawn, which is why the most popular second moves for White all follow one of those two. The Zukertort and English Openings simply haven’t become popular enough yet for their follow-up moves to show up here.

With the waning popularity of the Open Game over time, it’s no surprise that the responses to it have similarly declined. By 2014, the typical response to the Open Game is to play the King’s Knight, with the once-popular King’s Gambit and Vienna Game becoming all but extinct. The Sicilian Defence’s explosive rise to popularity is again reflected here, with the Open Sicilian (knight to f3) becoming White’s standard response. Again, White’s response to Black’s French Defence (moving a pawn to d4) has remained consistently popular over time, rarely dropping below 5% of the games played each year.

To avoid being overly wordy here, I’ll allow the visualization to speak for itself and leave the reader to explore the remaining trends as they please.

Black’s second move

If you’re familiar with chess, you know how quickly the set of possible moves grows with each move a player makes. After White’s and Black’s first moves, the board can be in one of 400 unique positions. After their second moves, there are 197,281 possible move sequences, and after only three moves each, about 119 million. This means that if you play enough chess, it’s highly likely that you will play a game that no one has ever played in the history of our universe. You can only imagine how difficult it would be to visualize all possible chess moves even up to the third move.
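This kind of growth is usually measured with a "perft" count, which enumerates every legal move sequence to a fixed number of plies (half-moves); the standard published values from the starting position are 20, 400, 8,902 and 197,281 for one to four plies. Below is a minimal Python sketch under stated assumptions: it uses a hypothetical 10x12 "mailbox" board, and its move generator deliberately omits castling, en passant and promotion. None of those can occur within the first four plies, so the counts are exact up to that depth but would be wrong beyond it. All function names are illustrative and not taken from any chess library.

```python
# Hypothetical 10x12 mailbox: " " = off-board padding, "." = empty square,
# uppercase = White, lowercase = Black. Index 21 is a8 and index 98 is h1.
KNIGHT = (-21, -19, -12, -8, 8, 12, 19, 21)
DIAGONAL = (-11, -9, 9, 11)
ORTHOGONAL = (-10, -1, 1, 10)
DIRECTIONS = {
    "N": KNIGHT,
    "B": DIAGONAL,
    "R": ORTHOGONAL,
    "Q": DIAGONAL + ORTHOGONAL,
    "K": DIAGONAL + ORTHOGONAL,
}


def initial_board():
    ranks = ["rnbqkbnr", "pppppppp"] + ["........"] * 4 + ["PPPPPPPP", "RNBQKBNR"]
    board = [" "] * 120
    for r, rank in enumerate(ranks):
        for c, piece in enumerate(rank):
            board[21 + 10 * r + c] = piece
    return board


def pseudo_moves(board, white):
    """Yield (from, to) pairs, ignoring checks, castling, en passant, promotion."""
    own = str.isupper if white else str.islower
    for sq, piece in enumerate(board):
        if piece in " ." or not own(piece):
            continue
        kind = piece.upper()
        if kind == "P":
            step = -10 if white else 10
            on_start_rank = (81 <= sq <= 88) if white else (31 <= sq <= 38)
            if board[sq + step] == ".":
                yield sq, sq + step
                if on_start_rank and board[sq + 2 * step] == ".":
                    yield sq, sq + 2 * step
            for target in (sq + step - 1, sq + step + 1):
                victim = board[target]
                if victim not in " ." and not own(victim):
                    yield sq, target
            continue
        for d in DIRECTIONS[kind]:
            target = sq + d
            while board[target] != " " and not own(board[target]):
                yield sq, target
                if board[target] != "." or kind in "NK":
                    break  # captures and single-step pieces stop here
                target += d


def perft(board, white, depth):
    """Count legal move sequences of `depth` plies; the board is restored afterwards."""
    if depth == 0:
        return 1
    total = 0
    for frm, to in list(pseudo_moves(board, white)):
        captured = board[to]
        board[to], board[frm] = board[frm], "."
        king = board.index("K" if white else "k")
        # The move is legal only if no opponent move could land on our king.
        if not any(t == king for _, t in pseudo_moves(board, not white)):
            total += perft(board, not white, depth - 1)
        board[frm], board[to] = board[to], captured
    return total


board = initial_board()
print([perft(board, True, d) for d in (1, 2, 3)])  # [20, 400, 8902]
```

Calling `perft(board, True, 4)` should reproduce 197,281, though it takes noticeably longer in pure Python; going deeper requires adding the missing rules.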

Despite the near-endless possibilities in chess, there appears to be a strong bias toward a small subset of openings. In my data set there were roughly 4,000 unique openings, and the 30 most popular ones comprise 70% of all chess games. Below is a visualization of the distribution of those 30 most popular openings from 1850-2014.

Have any thoughts on a better way to visualize this data? Please leave them in the comments! I’ve already reached the limit of what area charts can effectively visualize by Black’s second move.
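The "30 openings cover 70% of games" figure is a top-k coverage statistic: count each opening, take the k most common, and divide their games by the total. A minimal Python sketch, assuming every game has already been labelled with exactly one opening name (how the openings were classified isn't described here, so `top_k_coverage` and the toy labels are hypothetical):

```python
from collections import Counter


def top_k_coverage(openings, k):
    """Return (number of unique openings, fraction of games in the k most common)."""
    counts = Counter(openings)
    total = sum(counts.values())
    covered = sum(n for _, n in counts.most_common(k))
    return len(counts), covered / total


# Toy labels, not real data: 10 games spread over 4 distinct openings.
labels = ["Sicilian"] * 5 + ["French"] * 3 + ["Pirc", "English"]
print(top_k_coverage(labels, 2))  # (4, 0.8)
```

Ties at the k-th place don't affect the result, since tied openings contribute the same number of games whichever one is picked.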

Interestingly, chess appears to be becoming more diverse over time. Whereas there were fewer than 100 unique openings by the end of both players’ second moves in 1850, there were over 1,000 unique openings by 2014. This may be an artifact of the data set, however, because it contains far more games from the 21st century.
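One way to probe whether the rise in unique openings is just a sample-size effect is to report the number of recorded games next to the number of unique openings, and optionally subsample every year down to the same number of games before counting. That subsampling is a simple rarefaction-style check swapped in here for illustration; the article itself does not describe doing it. The function name and the toy data are hypothetical, and years with fewer games than `sample_size` are left whole, so `sample_size` should be no larger than the smallest year being compared.

```python
import random
from collections import defaultdict


def diversity_by_year(games, sample_size=None, seed=0):
    """Return {year: (games_recorded, unique_openings)} from (year, opening) pairs.

    With sample_size set, any year holding more games than that is randomly
    subsampled down to sample_size before unique openings are counted.
    """
    by_year = defaultdict(list)
    for year, opening in games:
        by_year[year].append(opening)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    result = {}
    for year in sorted(by_year):
        openings = by_year[year]
        if sample_size is not None and len(openings) > sample_size:
            openings = rng.sample(openings, sample_size)
        result[year] = (len(by_year[year]), len(set(openings)))
    return result


# Toy data, not the article's: the later year has more games and more variety.
games = [(1850, "A"), (1850, "A"), (1850, "B"),
         (2014, "A"), (2014, "B"), (2014, "C"), (2014, "D")]
print(diversity_by_year(games))  # {1850: (3, 2), 2014: (4, 4)}
```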

About the author

Randy Olson is a Computer Science graduate research assistant at Michigan State University, specializing in artificial intelligence, artificial life, and evolutionary computation. He runs a research blog where he writes about data visualization, scientific computing, evolution, and AI. Randy is an ardent advocate of open science and regularly travels the U.S. to teach researchers scientific computing skills at Software Carpentry workshops.





Discuss


ajedrezenalbacete@gmail.com 6/24/2014 04:01
About the question of how to visualize the data, more specifically in the video of the 30 most popular openings, you may try one or all of these ideas:
- to use the mean of the last 3 years (or 2 or 4 years) to make the curve smoother.
- to change the color of the bar, turning it red at a local maximum and blue at a local minimum.
- to fill the gap between years with interpolated values using some function, for example a linear one:
> 1930: 5%, 1931: 9% --> 1930: 5%, 1930.25: 6%, 1930.50: 7%, 1930.75: 8%, 1931: 9%

About the static graphs of the evolution of first moves, you may also try the mean idea.
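The two smoothing ideas in this comment, a trailing mean over the last few years and linear interpolation between yearly points, can be sketched in a few lines of Python. Both helper names are hypothetical, and the sketch assumes the yearly percentages are already available as plain lists or (year, value) pairs.

```python
def trailing_mean(values, window=3):
    """Average each value with up to window-1 values before it."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def densify(points, steps=4):
    """Insert linearly interpolated (x, y) points between consecutive pairs."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for s in range(steps):
            t = s / steps
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(points[-1])  # keep the final original point
    return out


# The commenter's example: 1930: 5%, 1931: 9%, in quarter-year steps.
print(densify([(1930, 5), (1931, 9)]))
# [(1930.0, 5.0), (1930.25, 6.0), (1930.5, 7.0), (1930.75, 8.0), (1931, 9)]
```

Note that the first `window - 1` years are averaged over fewer values, so the start of a smoothed curve is noisier than the rest.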

Good luck
jones_michaelt@hotmail.com 7/5/2014 11:42
I would say that, rather than showing the total percentage of games which used a particular opening, it would be more meaningful to show the percentage of those games which contained all the previous moves. So rather than showing the overall percentage of games in which the French was played, you could show the percentage of games in which 1. e4 was played, to which the response was 1... e6. If 1. e4 is played in about 40% of recent games and 1. e4 e6 in about 8% (trying to judge by eye from the charts), then of those games which start 1. e4, 20% are Frenches (with about 40% Sicilians, 20% open and 20% other replies). If the total percentage of Frenches played decreases over time, it could be either because the French is decreasing in popularity as a reply to 1. e4 (with a corresponding increase in Sicilians, symmetrical games or other replies), or that 1. e4 is itself declining in popularity, while the French is maintaining its popularity relative to other replies.
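The comment's arithmetic is a conditional share: the fraction of 1.e4 games answered by 1...e6 is the share of all games that start 1.e4 e6 divided by the share that start 1.e4. A small Python sketch using the commenter's by-eye figures, plus two made-up scenarios in which the overall French share falls to the same 4% for different reasons (the function name and the scenario numbers are hypothetical):

```python
def reply_share(line_share, first_move_share):
    """Fraction of games with a given first move that continue with a given reply."""
    return line_share / first_move_share


# Commenter's by-eye estimates (percent of all games): 1.e4 ~40%, 1.e4 e6 ~8%.
print(reply_share(8, 40))  # 0.2

# Hypothetical scenarios: the overall French share is 4% in both.
print(reply_share(4, 40))  # 0.1 -> the French is losing ground as a reply to 1.e4
print(reply_share(4, 20))  # 0.2 -> the French holds steady; 1.e4 itself is rarer
```

The overall percentage alone cannot tell these two cases apart; the conditional share can.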