Inside the (deep) mind of AlphaZero

by Albert Silver
12/7/2018 – It was a long time coming, but the wait is over. After nearly a full year of being ping-ponged from one peer reviewer to the next, the final paper on AlphaZero is out, shedding light on a number of hitherto unknown or misunderstood elements of its construction, not to mention some clarifications and corrections. These include sample code to help implement the work, and all the games of the match against Stockfish, of which 20 were specially chosen by GM Matthew Sadler. | Graphic: DeepMind


Full AlphaZero paper is published

When AlphaZero was first announced late last year, it is no exaggeration to say it caused feelings of shock and awe. After all, a new paradigm had been ushered into the somewhat stodgy world of computer chess, challenging decades of accepted truths and promising wondrous things for players all around the world.

Here was a program that eschewed conventional wisdom on how one should be built, challenging even that most basic premise: faster is better. Not only did it not run remotely as fast as Stockfish, the standard it was tested against, but it was a good 900 times slower, yet still stronger by some margin.

Accompanying this eye-opening news was a tantalising pre-paper that shared many of its intimate details with those who could understand them and were willing to do the work to implement them. Still, there were many who cried foul, protesting that the test match had been grossly unfair, since AlphaZero ran on a ‘supercomputer’ while Stockfish did not, and that Stockfish had been nothing short of crippled.

AlphaZero: Shedding new light on the grand games of chess, shogi and Go 

Match conditions

The final paper, published in Science, a serious journal that demands the utmost scrutiny and peer review before accepting a paper, brings a number of corrections regarding the match conditions as well as clarifications on the hardware. In the pre-paper, the hardware ascribed to Stockfish had been 64 threads generating 70 million positions per second, with 32MB (megabytes) for hash tables. That last detail caused no shortage of outrage, since so small a hash allocation would severely handicap an engine searching at that speed. Then there was the matter of the 100-game match at one minute per move, and finally, last but not least, the mysterious four TPUs that AlphaZero was running on. While many today can appreciate what a strong GPU brings to the table, a TPU is hard to quantify.

The final paper brings a number of changes, making it unclear whether the original conditions were as stated or simply misreported. Whatever the case, the games shared at the DeepMind website are different from those in the pre-paper, and while there is still no shortage of brilliancies, they are different brilliancies.

In this final paper, the match was not only rerun, with roughly the same result (a +104 Elo performance), but under much better conditions for Stockfish, to put to rest the complaints that it had been crippled. This time Stockfish was running on 44 threads on 44 cores (two 2.2GHz Intel Xeon Broadwell CPUs with 22 cores each), with a hash size of 32GB and Syzygy endgame tablebases, at a time control of three hours per game plus 15 seconds per move. Furthermore, Stockfish 8 was not the only version tested; Stockfish 9 was given its chance as well. The relative difference in nodes per second was maintained at roughly 900 to 1, so that much was not changed. The authors also measured the overall average nodes per second for each player across whole games, instead of just at the start position, as had been the case in the pre-paper. All in all, they report the total results of 1000 games, though only 210 are actually published at the website.
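For readers curious how a match score turns into an Elo figure like the one above, the short sketch below converts a win/draw/loss tally into an Elo performance difference using the standard logistic model. The numbers in the example are purely illustrative and are not the actual match tally.

import math

def elo_performance(wins, draws, losses):
    """Elo performance difference implied by a match score,
    using the standard logistic (Elo) model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games   # score fraction between 0 and 1
    if score in (0.0, 1.0):
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400 * math.log10(1 / score - 1)

# Purely illustrative: a 64.5% score works out to roughly +104 Elo.
print(round(elo_performance(wins=300, draws=690, losses=10)))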

As to AlphaZero and its first generation TPUs, the authors help narrow down its strength by explaining that while not the same, the inference performance is equivalent to a Titan V. The Titan V is without question a superb professional grade GPU, but its performance is nearly identical to that of the newly released Nvidia RTX 2080 Ti, a $1200 GPU. Powerful? Without question, but hardly a supercomputer unless comparing to machines from years back.


Furthermore, the authors tested a variety of conditions, not just games without books. They tried allowing Stockfish to use an opening book while AlphaZero did not, and even played a TCEC-style match using the exact same openings TCEC used in a superfinal a couple of years back, as well as time-handicap matches in which AlphaZero got one-third of Stockfish's time, or even one-tenth. Have you ever wanted to know how AlphaZero would have fared in the TCEC superfinal against Stockfish? Here is the result.

More importantly, all the games for these matches have been released: over 200 games, including a fine selection by Sadler, who took the liberty of choosing those he felt were not to be missed.

The article brought much more detailed explanations, as well as graphs to aid understanding

Shogi fans were not overlooked either. Not only were 100 games played by the shogi version of AlphaZero published, but ten were chosen by Yoshiharu Habu, the 'Kasparov' of shogi.

One knowledgeable aficionado who went over them was flabbergasted. As he explained, “I've been looking at some of the shogi games...and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center but does the chess equivalent of heading out to the middle of the board in the middle game before coming back to the corner for safety and then winning. Astounding!”

In the Science publication where the AlphaZero paper appears, additional commentary was provided by luminaries such as Murray Campbell, a leader in AI research and one of the key names behind Deep Blue, as well as an editorial by Garry Kasparov, who gave his own perspective on it, noting:

(...) I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own. The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games. But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive. Programs usually reflect priorities and prejudices of programmers, but because AlphaZero programs itself, I would say that its style reflects the truth. This superior understanding allowed it to outclass the world's top traditional program despite calculating far fewer positions per second. It's the embodiment of the cliché, 'work smarter, not harder'.

AlphaZero shows us that machines can be the experts, not merely expert tools. Explainability is still an issue — it's not going to put chess coaches out of business just yet. But the knowledge it generates is information we can all learn from.

Be sure to read the entire editorial.

Openings

In the pre-paper, numerous fascinating graphs had been published on the opening preferences of AlphaZero as it evolved, as well as its results in test matches against Stockfish. This time the statistics are presented more visually, with colour bars that make it easy to see where it won or lost more often.

There is also a fascinating breakdown of its favourite 6-ply sequence in self-play as it evolved: in other words, what it would play as the best opening for both sides over the first six plies. AlphaZero was trained for a total of 700 thousand steps (think of these as lessons in its evolution), and here we can see what it thought was ideal after just 50 thousand steps, then 143 thousand steps, and so forth until its pinnacle of opening play… get ready to grimace: the Berlin.

The Berlin as the logical evolution of theory?

Some might see the Berlin, AlphaZero's final word on openings, as a sign of regression. After all, after 608 thousand steps it had thought the classic Ruy Lopez was ideal.

What we learned

For developers and programmers, this was a godsend, as it finally put a large number of questions to rest regarding the parameters used in training and play, and offered some truly eye-opening revelations. For those wondering about the exact implementations, DeepMind has provided what it calls sample pseudocode, enough to show how some of the algorithms might be coded. Among the more exciting items on a technical level was a formula that changes the breadth of the search according to the number of nodes reached for a move: the deeper it looked, the wider the search became.
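As a concrete illustration, here is a minimal sketch of that idea, modelled on the released pseudocode: the exploration weight applied to the network's move priors grows logarithmically with the number of simulations already spent on a position, so a longer search spreads its attention more widely. The two constants are the values given in that pseudocode; the rest should be read as an illustrative sketch rather than the exact implementation.

import math

# Constants as given in DeepMind's released pseudocode.
PB_C_BASE = 19652
PB_C_INIT = 1.25

def exploration_weight(parent_visits):
    """Exploration coefficient that grows slowly (logarithmically) with the
    number of simulations already spent on the parent position."""
    return math.log((parent_visits + PB_C_BASE + 1) / PB_C_BASE) + PB_C_INIT

def ucb_score(parent_visits, child_visits, child_prior, child_value):
    """PUCT-style selection score: the network prior scaled by the
    visit-dependent exploration weight, plus the child's averaged value."""
    pb_c = exploration_weight(parent_visits)
    pb_c *= math.sqrt(parent_visits) / (child_visits + 1)
    return pb_c * child_prior + child_value

# The weight rises as the search invests more nodes in a position,
# which is what makes the search broaden as it deepens:
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} visits -> exploration weight {exploration_weight(n):.2f}")

In plain terms, early in the search the selection leans heavily on the network's few most promising moves; as the node count climbs, the growing weight nudges the search to revisit moves it would otherwise have dismissed.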

So does this wrap up AlphaZero for good? Hardly. As Demis Hassabis was quick to point out recently, a new AlphaZero has been developed that is stronger than the one referenced in the paper. Be ready for new announcements!


GM King analysis

Grandmaster Daniel King analyses several of the new games from AlphaZero for his PowerPlay Show.


Replay all AlphaZero's games

 
1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6 4.d3 Bc5 5.Bxc6 dxc6 6.0-0 Nd7 7.Nbd2 0-0 8.Qe1 f6 9.Nc4 Rf7 10.a4 Bf8 11.Kh1 Nc5 12.a5 Ne6 13.Ncxe5 fxe5 14.Nxe5 Rf6 15.Ng4 Rf7 16.Ne5 Re7 17.a6 c5 18.f4 Qe8 19.axb7 Bxb7 20.Qa5 Nd4 21.Qc3 Re6 22.Be3 Rb6 23.Nc4 Rb4 24.b3 a5 25.Rxa5 Rxa5 26.Nxa5 Ba6 27.Bxd4 Rxd4 28.Nc4 Rd8 29.g3 h6 30.Qa5 Bc8 31.Qxc7 Bh3 32.Rg1 Rd7 33.Qe5 Qxe5 34.Nxe5 Ra7 35.Nc4 g5 36.Rc1 Bg7 37.Ne5 Ra8 38.Nf3 Bb2 39.Rb1 Bc3 40.Ng1 Bd7 41.Ne2 Bd2 42.Rd1 Be3 43.Kg2 Bg4 44.Re1 Bd2 45.Rf1 Ra2 46.h3 Bxe2 47.Rf2 Bxf4 48.Rxe2 Be5 49.Rf2 Kg7 50.g4 Bd4 51.Re2 Kf6 52.e5+ Bxe5 53.Kf3 Ra1 54.Rf2 Re1 55.Kg2+ Bf4 56.c3 Rc1 57.d4 Rxc3 58.dxc5 Rxc5 59.b4 Rc3 60.h4 Ke5 61.hxg5 hxg5 62.Re2+ Kf6 63.Kf2 Be5 64.Ra2 Rc4 65.Ra6+ Ke7 66.Ra5 Ke6 67.Ra6+ Bd6 0–1



Born in the US, he grew up in Paris, France, where he completed his Baccalaureat, and after college moved to Rio de Janeiro, Brazil. He had a peak rating of 2240 FIDE, and was a key designer of Chess Assistant 6. In 2010 he joined the ChessBase family as an editor and writer at ChessBase News. He is also a passionate photographer with work appearing in numerous publications, and the content creator of the YouTube channel, Chess & Tech.
