Inside the (deep) mind of AlphaZero

by Albert Silver
12/7/2018 – It was a long time coming, but the wait is over. After nearly a full year, being ping-ponged from one peer reviewer to the next, the final paper on AlphaZero is out, shedding light on a number of hitherto unknown or misunderstood elements in its construction, not to mention some clarifications and corrections. These include sample code to help implement their work and all the games of the match against Stockfish, of which 20 were specially chosen by GM Matthew Sadler. | Graphic: Deep Mind


Full AlphaZero paper is published

When AlphaZero was first announced late last year, it is no exaggeration to say it caused feelings of shock and awe. After all, a new paradigm had been ushered into the somewhat stodgy world of computer chess, challenging decades of accepted truths and promising wondrous things for players all around the world.

Here was a program that eschewed conventional wisdom on how one should be built, challenging even the most basic premise: faster is better. Far from matching the speed of Stockfish, the standard it was tested against, it searched a good 900 times fewer positions per second, yet was still stronger by some margin.

Accompanying this eye-opening news was a tantalising pre-paper that shared many of its intimate details with those who could understand them and were willing to do the work of implementation. Still, there were many who cried foul, protesting that the test match had been grossly unfair: AlphaZero ran on a ‘supercomputer’ while Stockfish did not, and Stockfish, they claimed, had been nothing short of crippled.

AlphaZero: Shedding new light on the grand games of chess, shogi and Go 

Match conditions

The final paper, published in Science, a serious journal that demands the utmost scrutiny and peer review before accepting a submission, brings a number of corrections regarding the match conditions as well as clarifications on the hardware. In the pre-paper, the hardware ascribed to Stockfish had been 64 threads generating 70 million positions per second, with 32MB (megabytes) for hash tables. That last detail caused no shortage of outrage, since such a minuscule amount could barely benefit it. Then there was the matter of the 100-game match at one minute per move, and finally, last but not least, the mysterious four TPUs that AlphaZero ran on. While many today can appreciate what a strong GPU brings to the table, a TPU is hard to quantify.

The final paper brings a number of changes, and it is unclear whether the original conditions were as stated or simply misreported. Whatever the case, the games shared on the DeepMind website are different from those in the pre-paper, and while there is no shortage of brilliancies (that is unchanged), they are different brilliancies.

In the final paper, the match was not only rerun, with roughly the same result (a +104 Elo performance), but under much better conditions for Stockfish, putting to rest the complaints that it had been crippled. This time Stockfish ran 44 threads on 44 cores (two 22-core 2.2GHz Intel Xeon Broadwell CPUs), with a 32GB hash, Syzygy endgame tablebases, and a time control of three hours per game plus 15 seconds per move. Furthermore, Stockfish 8 was not the only version tested; Stockfish 9 was given its chance as well. The relative difference in speed was maintained at roughly 900 to 1, so that much did not change, though the authors now measured the overall average nodes per second across whole games rather than just at the start position, as had been done in the pre-paper. All in all, they report the totals of 1,000 games, though only 210 are actually published on the website.
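To put a figure like +104 Elo in context, the standard logistic Elo model converts a rating edge into an expected match score. A minimal sketch, assuming the conventional formula (the authors may compute their figure differently):

```python
import math

# Standard logistic Elo model (an assumption here; the paper's exact
# computation may differ): expected score for a given rating difference.
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    # Inverse: rating difference implied by an average match score (0 < score < 1).
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(expected_score(104), 3))  # roughly 0.645, i.e. about 64.5% of the points
```

In other words, a +104 Elo edge corresponds to scoring roughly 64–65% of the available points over a long match.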

As to AlphaZero and its first-generation TPUs, the authors help narrow down their power by explaining that, while not identical, the inference performance is equivalent to a Titan V. The Titan V is without question a superb professional-grade GPU, but its performance is nearly identical to that of the newly released Nvidia RTX 2080 Ti, a $1,200 GPU. Powerful? Without question, but hardly a supercomputer unless compared to machines from years back.


Furthermore, the authors tested a variety of conditions, not just bookless play. They tried allowing Stockfish to use an opening book while AlphaZero did not, ran a TCEC-style match using the exact same openings TCEC used in a superfinal a couple of years back, and even played time-handicap matches with AlphaZero getting one-third, or even one-tenth, of the time Stockfish got. Have you ever wondered how AlphaZero would have fared in the TCEC superfinal against Stockfish? Here is the result.

More importantly, all the games for these matches have been released — over 200 games, including a fine selection by Sadler who took the liberty of choosing those he felt were not to be missed.

The article brings much more detailed explanations, as well as graphs to aid understanding

Shogi fans were not overlooked either. Not only were 100 games played by the shogi version of AlphaZero published, but ten of them were chosen by Yoshiharu Habu, the 'Kasparov' of shogi.

One knowledgeable aficionado who went over them was flabbergasted. As he explained, “I've been looking at some of the shogi games...and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center but does the chess equivalent of heading out to the middle of the board in the middle game before coming back to the corner for safety and then winning. Astounding!”

In the Science publication where the AlphaZero paper appears, additional commentary was provided by luminaries such as Murray Campbell, a leader in AI research and one of the key names behind Deep Blue, as well as an editorial by Garry Kasparov, who gave his own perspective on it, noting:

(...) I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own. The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games. But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive. Programs usually reflect priorities and prejudices of programmers, but because AlphaZero programs itself, I would say that its style reflects the truth. This superior understanding allowed it to outclass the world's top traditional program despite calculating far fewer positions per second. It's the embodiment of the cliché, 'work smarter, not harder'.

AlphaZero shows us that machines can be the experts, not merely expert tools. Explainability is still an issue — it's not going to put chess coaches out of business just yet. But the knowledge it generates is information we can all learn from.

Be sure to read the entire editorial.

Openings

In the pre-paper, numerous fascinating graphs had been published on the opening preferences of AlphaZero as it evolved, as well as its results in test matches against Stockfish. This time the statistics are presented more visually, with colour bars making it easy to see where it won or lost more often.

There is also a fascinating breakdown of its favourite six-ply sequence in self-play as it evolved: in other words, what it would play as the best opening for both sides over six plies (half-moves). AlphaZero was trained for a total of 700 thousand steps (think of these as lessons in its evolution), and here we can see what it thought was ideal after just 50 thousand steps, then 143 thousand steps, and so forth until the pinnacle of its opening play… get ready to grimace: the Berlin.

The Berlin as the logical evolution of theory?

Some might see AlphaZero's final word on openings being the Berlin as a sign of regression. After all, after 608 thousand steps it had thought the classic Ruy Lopez was ideal.

What we learned

For developers and programmers this was a godsend, as it finally put to rest a large number of questions regarding the parameters used in training and playing, and delivered some truly eye-opening revelations. For those wondering about the exact implementation, DeepMind has provided what it calls sample pseudocode, enough to show how some of the algorithms might be coded. Among the more exciting items on a technical level was a formula by which the exploration rate of the search grows with the number of nodes visited per move: the deeper it looked, the wider the search became.
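DeepMind's released pseudocode expresses this widening through an exploration rate that grows logarithmically with a node's visit count. A minimal sketch in that spirit (the constant values match the published pseudocode; the helper names here are illustrative):

```python
import math

# Constants as given in DeepMind's published pseudocode (pb_c_base, pb_c_init).
PB_C_BASE = 19652
PB_C_INIT = 1.25

def exploration_rate(parent_visits):
    # Grows logarithmically with the parent's visit count, so the search
    # explores more broadly the more simulations it has run.
    return math.log((parent_visits + PB_C_BASE + 1) / PB_C_BASE) + PB_C_INIT

def ucb_score(parent_visits, child_visits, child_prior, child_value):
    # Selection score for a child node: value estimate plus a prior-weighted
    # exploration bonus that shrinks as the child is visited more often.
    pb_c = exploration_rate(parent_visits) * math.sqrt(parent_visits) / (child_visits + 1)
    return child_value + pb_c * child_prior
```

For small searches the rate stays close to 1.25, but over hundreds of thousands of simulations it rises noticeably, which is what makes the search widen as it deepens.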

So does this wrap up AlphaZero for good now? Hardly. As Demis Hassabis was quick to point out recently, a new AlphaZero has been developed that is stronger than the one referenced in the paper. Be ready for new announcements!


GM King analysis

Grandmaster Daniel King analyses several of the new games from AlphaZero for his PowerPlay Show.


Replay all AlphaZero's games

 



Born in the US, he grew up in Paris, France, where he completed his Baccalaureat, and after college moved to Rio de Janeiro, Brazil. He had a peak rating of 2240 FIDE, and was a key designer of Chess Assistant 6. In 2010 he joined the ChessBase family as an editor and writer at ChessBase News. He is also a passionate photographer with work appearing in numerous publications.
Discussion and Feedback

Lyricist 12/7/2018 01:07
Promising but still unconvincing. The Deep Mind company is suspiciously dancing around it and the author of this article is dancing around it too.
celeje 12/7/2018 11:40
@ rokko: They had to run new games because the reviewers insisted on it, so it's the same paper but with revisions the reviewers forced them to do to patch over obvious flaws & get it accepted.
celeje 12/7/2018 11:35
This article is full of factual errors, just like the last one.
morphic6 12/7/2018 11:21
The playing field is changing, accept it people!
IvankoH 12/7/2018 11:18
they make big business of everything
let the engine go 1/1 and then see it: Stockfish 10 vs AlphaZero
Tom Box 12/7/2018 11:03
It is a great shame that Stockfish was not allowed to play at its full strength. I presume this was because of public relations.
rokko 12/7/2018 09:51
Are you sure that this paper provides details about the first match?

From another website I got the impression that this was a new match held in January 2018.