Alpha Zero: Comparing "Orangutans and Apples"

by André Schulz
12/13/2017 – In time for the start of the London Chess Classic, DeepMind, a subsidiary of Google, published a remarkable report about the success of its "Machine Learning" project Alpha Zero. Alpha Zero is a chess program that won a 100-game match against Stockfish by a large margin. But some questions remain. Reactions from chess professionals and fans.



From Zero to Chess

The company DeepMind Technologies was founded in London in 2010 by Demis Hassabis, Shane Legg and Mustafa Suleyman. In January 2014, Google bought the start-up for an undisclosed amount, estimated at about USD 500 million. The company became Google DeepMind, with the vision to "understand artificial intelligence". To that end, it wants to adapt the capacities of the human brain to the approaches of "Machine Learning".

Machine learning

In October 2015, DeepMind had its first big success with the game of Go. Go is a very complex game that demands strategic skill in particular. For a long time it had been impossible to translate the requirements of Go into mathematical formulas that would allow Go programs to compete with the best human players. But with special self-learning heuristics the DeepMind program AlphaGo got better and better and was finally strong enough to beat Go professionals. In October 2015, AlphaGo defeated the several-time European Champion Fan Hui; in March 2016 the program won 4:1 against the South Korean Go professional Lee Sedol, a 9-dan player. Both matches were played under tournament conditions.

The architecture of the AlphaGo program is based on an interaction of two neural networks, a "policy network" to define candidate moves, and a "value network" to evaluate positions. A Monte Carlo approach connects the two networks to a search tree. With the help of a database with 30 million moves the program learnt to predict the moves of humans.


In the match against Fan Hui, AlphaGo ran on a computer cluster of 1202 CPUs and 178 GPUs and used 40 "search threads". In the following match against Lee Sedol it had 1920 CPUs and 280 GPUs. For the learning phase before the matches the Google Cloud platform with its Tensor Processing Units (TPUs, ASICs for the software collection TensorFlow) was used.

In May 2017 AlphaGo took part in the "Wuzhen Future of Go Summit 2017" in Wuzhen, China, and won three games against the world's number one, Ke Jie. The program also won against five leading Go players who could consult with each other during the game.

The next development step was the program AlphaGo Zero, and in October 2017 DeepMind published a report about its development. AlphaGo Zero started at zero and with a reduced hardware setup: the program knew the rules of Go but had no previous knowledge whatsoever about the game, and it improved solely by playing against itself. Four Tensor Processing Units were used as hardware. With the help of TensorFlow it took AlphaGo Zero only three days to play better than the previous AlphaGo version, which had beaten the best human Go player. AlphaGo Zero defeated that predecessor 100-0.

Since Hassabis had been a good chess player as a junior it did not come as a surprise when DeepMind turned to chess after its success with Go. From the beginning of computer development chess has been considered the touchstone of artificial intelligence (AI).

(Above) Monte Carlo method applied to approximating the value of π. After placing 30,000 random points, the estimate for π is within 0.07% of the actual value. | Source: By nicoguaro CC BY 3.0, via Wikimedia Commons
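The idea in the caption can be sketched in a few lines of Python. This is only an illustration of the general Monte Carlo principle, not code from DeepMind or from the figure's author; the function name `estimate_pi` and the fixed seed are assumptions made for reproducibility. Random points are thrown into the unit square, and the fraction that lands inside the quarter circle approximates π/4.

```python
import random


def estimate_pi(num_points: int, seed: int = 0) -> float:
    """Estimate pi by sampling random points in the unit square.

    A point (x, y) lies inside the quarter circle of radius 1
    when x*x + y*y <= 1. The quarter circle has area pi/4, so the
    hit ratio multiplied by 4 approximates pi.
    """
    rng = random.Random(seed)  # fixed seed: illustrative assumption
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points


if __name__ == "__main__":
    # 30,000 points, as in the figure; the exact error depends on the seed.
    print(estimate_pi(30_000))
```

How close the estimate gets depends on the random draws, so a different seed will give a slightly different error than the 0.07% quoted in the caption.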

DeepMind's Video about AlphaGo Zero

The last big leap forward in the development of computer chess happened a bit more than ten years ago, when Fabien Letouzey published a new approach to the search tree with his program "Fruit". Vasik Rajlich, the developer of Rybka, significantly improved this approach. His program Rybka was later decompiled, and several programmers used the Rybka code as a point of departure for writing further developed and improved chess programs of their own.

The basis of all these programs is an optimised Alpha-Beta search in which certain evaluation parameters (material, possibilities to develop, king safety, control of squares, etc.) establish the best moves for both sides. The more lines the program can eliminate from the search tree as irrelevant, the more efficient the search becomes, and the deeper the program can go into the crucial main line. The program with the deeper search wins against the other programs. However, the drawing rate in top-level computer chess is very high.
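The pruning idea described above can be illustrated on a tiny hand-made game tree. The following is a minimal sketch of textbook Alpha-Beta search, not the search used by Stockfish or any other real engine; the tree, its leaf values and the leaf counter are invented purely for illustration. Inner lists are positions, integers are evaluations from the point of view of the maximising side.

```python
from math import inf
from typing import List, Union

# A leaf is an int (static evaluation); an inner node is a list of children.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float,
              maximizing: bool, visited: List[int]) -> float:
    """Return the minimax value of `node`, skipping lines that cannot matter.

    `visited[0]` counts evaluated leaves so the saving is visible.
    """
    if isinstance(node, int):
        visited[0] += 1
        return node
    if maximizing:
        best = -inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # opponent already has a better alternative
                break
        return best
    best = inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:  # we already have a better alternative
            break
    return best


if __name__ == "__main__":
    tree = [[3, 5], [2, 9], [0, 1]]  # hypothetical two-ply tree
    visited = [0]
    print(alphabeta(tree, -inf, inf, True, visited), visited[0])
```

In this toy tree the first reply guarantees the maximising side a value of 3, so the second and third branches are abandoned as soon as one refutation is found, and only four of the six leaves are evaluated. Real engines add move ordering and many further pruning heuristics on top of this scheme.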



Alpha Zero's Monte Carlo search tree is a completely different approach. At every point the program plays a number of games against itself, all starting from the current position. In the end it counts the results to arrive at an evaluation. In their paper "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" the authors described this approach in more detail.
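The basic principle of evaluating a position by playing it out many times and counting results can be sketched on a toy game. The example below is an assumption-laden illustration, not Alpha Zero's algorithm: it uses purely random playouts, whereas the paper describes a tree search guided by a neural network. The game (a pile of stones, each side removes one to three, whoever takes the last stone wins) and all function names are chosen only for this sketch.

```python
import random
from typing import Dict


def random_playout(stones: int, to_move: int, rng: random.Random) -> int:
    """Play random legal moves until the pile is empty; return the winner (0 or 1).

    Requires stones >= 1. The player who removes the last stone wins.
    """
    player = to_move
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player


def evaluate_moves(stones: int, playouts: int = 2000,
                   seed: int = 0) -> Dict[int, float]:
    """For each legal move of player 0, estimate the share of playouts won."""
    rng = random.Random(seed)  # fixed seed: illustrative assumption
    scores: Dict[int, float] = {}
    for take in range(1, min(3, stones) + 1):
        remaining = stones - take
        if remaining == 0:
            scores[take] = 1.0  # taking the last stone wins immediately
            continue
        wins = sum(random_playout(remaining, 1, rng) == 0
                   for _ in range(playouts))
        scores[take] = wins / playouts
    return scores


if __name__ == "__main__":
    scores = evaluate_moves(5)
    print(scores, "-> preferred move:", max(scores, key=scores.get))
```

With five stones the playout statistics favour taking one stone, which happens to agree with the known best move in this game (leave the opponent a multiple of four). In chess, random playouts alone would be far too noisy, which is why a learned evaluation is needed to steer the search.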

In a learning phase (training) Alpha Zero used 5000 "first-generation" TPUs from the Google hardware park to play games against itself. 64 "second-generation" TPUs were used for the training of the neural network. After only four hours of training Alpha Zero played better than Stockfish.

During the training phase Alpha Zero also played matches against Stockfish, always a hundred games, 50 with White and 50 with Black, each starting from one of ten popular openings. Alpha Zero won the majority of these matches but not all of them: in the Queen's Gambit the program lost 1-2 with Black (47 games were drawn). In the Grünfeld (which DeepMind erroneously calls "Kings Indian") Alpha Zero lost 0-2 with Black while 48 games ended in a draw. In the Kan Variation of the Sicilian it lost 3-7 with 40 draws. With colours reversed Alpha Zero always won clearly.


The "well-trained" Alpha Zero program then played a 100-game match against Stockfish, in which it used a computer with four TPUs while Stockfish was running on hardware with "64 threads". Of the 28 games Alpha Zero won in this match, 25 were won with White and only three with Black. That is a very unusual result. Statistically, White usually scores only about 55% in chess. In the Go and Shogi matches the difference between playing with White and playing with Black was much less marked.

Source: DeepMind

Incidentally, this result equals a 65% success rate or an Elo difference of about 130 points, which is roughly the difference between Magnus Carlsen and a 2700 player.
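The relation between a match score and a rating gap can be checked with the standard logistic Elo formula. The sketch below is an assumption about which model to apply, since the article does not say how its figure was derived; the function names are illustrative. The expected score for a rating advantage d is 1 / (1 + 10^(-d/400)), and inverting it gives d = 400 · log10(p / (1 - p)).

```python
import math


def elo_difference(score: float) -> float:
    """Rating gap implied by an average score per game (0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(diff: float) -> float:
    """Expected score per game for a player rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    print(round(elo_difference(0.65), 1))  # gap implied by a 65% score
    print(round(expected_score(130), 3))   # score implied by a 130-point gap
```

Under this particular model a 65% score works out to a little under 110 points, and a 130-point gap corresponds to a score of roughly 68%, so the figure quoted above is best read as an approximation. Other conversion tables give slightly different numbers.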

GM Daniel King shares AlphaZero vs. Stockfish highlights:



Reaction and reception

The reaction of the international press was enthusiastic, comparable to the reaction when Deep Blue won the match against Garry Kasparov 20 years ago. Back then the value of IBM shares rose considerably. Google DeepMind certainly would not be unhappy if that happened to its parent company. But the reactions were also markedly uncritical. Breathless reporting along the lines of: a great supercomputer just taught itself chess in a couple of hours and is now better than the best chess program. Mankind took a great step forward (to where?). After all, this is the very impression the publication wanted to create.

Tord Romstad from the Stockfish team had the following to say about the match:

The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly).

He goes on to note that the version of Stockfish was not the most current one and that the specifics of its hardware setup were unusual and untested. By contrast, the "4 hours of learning" is actually misleading considering the hardware resources underlying that work.

But in any case, Stockfish vs AlphaZero is very much a comparison of apples to orangutans. One is a conventional chess program running on ordinary computers, the other uses fundamentally different techniques and is running on custom designed hardware that is not available for purchase (and would be way out of the budget of ordinary users if it were).

Romstad admits that the comparison between two entirely different approaches has its charms and might give better impulses for future developments than the previous races in computer chess where one program with the same calculation methods is only slightly better than another.

Viswanathan Anand

Anand after Round 6 | Source: Saint Louis Chess Club on YouTube

Several players weighed in on the London Chess Classic live webcast. Some of the most interesting remarks came from Viswanathan Anand:

"Obviously this four hour thing is not too relevant, though it's a nice punchline, but it's obviously very powerful hardware, so it's equal to my laptop sitting for a couple of decades. I think the more relevant thing is that it figured everything out from scratch and that is scary and promising if you look at it... I would like to think that it should be a little bit harder. It feels annoying that you can work things out with just the rules of chess that quickly."

Indeed, for chess players who work with computer programs, the breakthrough of Alpha Zero has, for now, no practical use at all. In the short run, no adequate hardware for Alpha Zero will be available. For chess programmers the results of the research project were rather disillusioning. And even if an Alpha Zero program were at some point to run on common hardware, the powerful development environment it requires would still be unaffordable. But if the project eventually spawns an open source cousin, one that could provide the necessary computing performance, it would spell the end of the individual and varied chess programs as we know them today. Until then, the likes of Houdini and Komodo are still top dogs in the chess engine market.

GM Larry Kaufman from the Komodo team lauded the news with a caveat on Facebook:

Yes, it's big news. I'm not sure yet how it will affect what we do. It depends on whether Google releases the program or keeps it proprietary. It wasn't a fair match in all respects, but nevertheless impressive.



Other grandmasters took to Twitter:

IM Sagar Shah of ChessBase India recorded an instructive lecture at the Indian Institute of Technology Madras on the role of technology in chess, and discussed AlphaZero at length, including analysis of some of its games:

To sum up

The DeepMind team achieved a remarkable success with the Alpha Zero project. It showed that it is possible to use a Monte Carlo method to reach an enormous playing strength after only a short training period — if you use the Google Cloud with 5000 TPUs for training, of course!

Unfortunately, the comparison with Stockfish is misleading. The Stockfish program ran on parallel hardware which is, if one understands Tord Romstad correctly, only of limited use to the program. It is not clear precisely how the hardware employed ought to be compared. The match was played without an opening book and without endgame tablebases, both of which are integral components of a program like Stockfish. The chosen time control is totally unusual, even nonsensical, in chess, and particularly in computer chess.

Of the 100 games of the match, DeepMind only published ten wins by Alpha Zero, unfortunately without information about search depths and evaluations.

But the entire chess world is eagerly awaiting more experiments and information on future development plans.



Translation from German: Johannes Fischer
Additional reporting: Macauley Peterson

André Schulz started working for ChessBase in 1991 and is an editor of ChessBase News.
Discussion and Feedback



celeje 12/14/2017 10:38
@DrCliche: Re. "they completely circumvented time management and used not 1 min/move TC but movetime uci command. This means that search was interrupted in the middle of the iteration, does not matter what happened at that point - not resolved fail low or best move change."
How did the writer of those words know this had happened (since the AZ team is basically hiding and not giving any details)?
What does "resolve a fail low" mean?
DrCliche 12/14/2017 09:02
@davidrimshnick It doesn't get "stuck" at depth 39, it merely takes more time to deepen the search the deeper you go. Let it keep running and you will get to depth 41, though it might take a while if your computer is slow.

Regardless, on TCEC10's beefy hardware, all the top engines were routinely reaching middlegame depths in the 40s, with Stockfish consistently going the deepest. Deep Mind claimed their copy of Stockfish was calculating 70-80 million nodes per second, which is ostensibly faster than the TCEC machine! However 1GB hash for 64 threads is ABSURDLY low, resulting in massive playing strength reduction. Ideally, you'd have a minimum of 1GB hash per thread, though 2GB per thread would be better. I can't conceive of a reason the Deep Mind team couldn't have easily made that happen. Google literally has the largest computing infrastructure in the world. They probably have a million times the CPU and memory resources used for Deep Mind's copy of Stockfish just sitting idle at any given moment.

Also, from the Stockfish forums, "they completely circumvented time management and used not 1 min/move TC but movetime uci command. This means that search was interrupted in the middle of the iteration, does not matter what happened at that point - not resolved fail low or best move change. To me it looks like substantial elo loss in comparison even to 1min/move time control."

So not only did Deep Mind needlessly gimp Stockfish by forcing it to use precisely 1 min/move rather than letting it manage its own time (resulting in massive playing strength reduction), the Deep Mind team didn't even set *that* up correctly, resulting in situations where moves were selected essentially at random because Stockfish wasn't allowed to resolve a fail low. (And, indeed, if you go through the published games with a correctly configured Stockfish on even modest home hardware, you'll find many instances, often multiple *per game* where your copy of Stockfish will almost instantly reject whatever move Deep Mind's Stockfish ended up playing. And if you let your evaluation keep running to high depths, the move that Deep Mind's Stockfish played will never enter the top PVs.)

A correctly configured Stockfish Master with books, ETBs, and reasonable hash size would have had similarly dominating results against Deep Mind's Stockfish 8-, though the wins wouldn't look "beautiful" to a human because Stockfish's playing style is considerably less "human" than AlphaZero's at the moment. It's quite possible, even probable, that AlphaZero as tested wasn't even close to the strongest chess playing entity in the world (much less the strongest, as Deep Mind's paper coyly leads you to believe), despite the fact it was essentially running on a supercomputer.

All that being said, AlphaZero is clearly capturing and successfully applying chess knowledge that current engines don't understand as readily. There's little doubt that neural network based chess evaluation can and will be used to improve top engines over the coming years. There's also little doubt that AlphaZero's purposefully very general approach would have benefited from intelligent application of domain-specific knowledge and more advanced search strategies. For example, using tablebases to 100% correctly score endgames during training would probably have made AlphaZero stronger. It's also probably true that while playing, a search algorithm that mixes alpha-beta with MCTS would be an improvement over MCTS alone. (You might need to stick to 100% MCTS during training, though, since alpha-beta propagates errors rather than averaging them out.)
drgenial 12/14/2017 08:34
Considering that DeepMind decided not to allocate more time to AlphaZero to self train (I am implying it would get better and better as it went on training), I want to believe that AlphaZero, with all the Google resources and their seemingly breakthrough algorithms, could not be much better than this. Even less, since they crippled Stockfish to show their point. Yes, probably there is a breakthrough, but why then not make it glasklar (more clear)?
drgenial 12/14/2017 08:29
Who determined the 4 hours learning process? Was it AlphaZero itself which said "ok, I am better than those engines now"?

Had Deep Mind had more patience to make Alpha train for one week (less? more?), it would come beating Stockfish 100-0. That, yes, would be undisputed glory. But they did not. So I guess this 4 hours self learning was the best they could achieve.
davidrimshnick 12/14/2017 06:22
@drcliche, if you say so, it seems stuck at depth 39 recommending b4 when I run it. Anyway, I'm sure they will enter one of the computer chess tourneys eventually and win to make the point eventually.

My issue is all this skepticism is masking how important this is for chess.
felonge 12/14/2017 04:05
It's Tord Romstad not Tore Romstad. :)
PCMorphy72 12/14/2017 12:40
Fortunately, we can read some serious observations also from the real scientific approach:
Werewolf 12/14/2017 11:23
The last point in your post is a good one. It seems virtually certain that Stockfish could be improved by 100 elo by making the changes I suggested in my post as well as upgrading to the latest version. So the 130 elo gap suddenly shrinks to 30 elo. Also, as impressive as Alpha Zero is, it is running on MUCH faster hardware.
tjallen 12/14/2017 09:34
We see from the games that Stockfish's aggressively pruned search tree omits moves whose intermediate values seem unpromising. This reintroduces the old horizon effect. Yes, some variations are examined to 40 ply, but others get pruned away before a depth that would reveal their true value.
We can characterize these positions that SF prunes away by reviewing the games - positions where multiple pieces are left en prise, positions with irrelevant material imbalances (the extra material is blocked and immobile), positions where pawns nearing the eighth rank become extremely valuable. An average player can see these grand themes; there are probably more.
We also see more proof of the modern adage that a concrete winning variation is better than any intuitions as to what might work. SF uses human rules to prune away variations that seem unpromising, but SF is clearly not finding these concrete variations that wind through unpromising territory.
celeje 12/14/2017 07:09
@fgkdjlkag, @fons2: No, I think that Andre Schulz, this article's author, just misunderstood. It is highly likely that it just trained against itself. But of course that does mean that the 1200 games with set openings are against the fully trained AZ, so those results definitely do count, meaning AZ lost plenty of games.
sunya1989 12/14/2017 06:11
first A0 was trained by only playing against itself, second, the "powerful" hardware was used only in the training phase, third, the opening book was used (otherwise the research will be disqualified, and the deepmind will not be allowed to publish this paper) overall the match between A0 and stockfish must be fair in every aspect
DrCliche 12/14/2017 05:53
@davidrimshnick Actually,'s Stockfish finds 21. Bg5 at depths routinely reached in computer tournaments. The move enters the top 5 PVs in the mid 30s. By depth 41, Stockfish thinks Bg5 is the best or second best move. [It varies because Stockfish isn't deterministic.] Part of the reason there's controversy surrounding Deep Mind's results is because it's quite easy for casual users to verify that an updated, bookless, ETBless copy of Stockfish on modest hardware is a significantly better chess player than whatever AlphaZero was playing against. It's so trivial for even complete neophytes to download the latest stable version of Stockfish and to configure it properly that it's inescapable to conclude that the world leading, professional computer scientists at Deep Mind used a gimped, gutted, misconfigured Stockfish intentionally. Anyway, try the position yourself:
peteypabpro 12/14/2017 05:23
The author is incorrect, the games against stockfish were not part of training.
fons2 12/14/2017 05:22
So Alpha Zero plays 1200 training games against Stockfish to see what openings work best, then plays the "official" 100 game match with Stockfish's opening book disabled? Hmm...

Also doesn't this invalidate the "learn from zero" thing? I'm confused.

(Still a nice accomplishment for AI though, I guess we can be clear on that.)
fgkdjlkag 12/14/2017 04:55
It gets even worse. Alphazero got to train itself AGAINST stockfish before the actual match? What would happen if Alphazero played against Houdini, or didn't get to train itself against stockfish before the match? As @e-mars pointed out, this is not self-learning.

If the difference is 130 elo points, then what would be the difference if stockfish was not handicapped? 50 points?
celeje 12/14/2017 03:18
@Andre Schulz: Also, did you find independent confirmation that there was no opening book used for Stockfish? That was reported by one article, but I'm not sure where that chess journalist got that info from.
celeje 12/14/2017 03:17
@Andre Schulz: Are you sure those opening-themed games against Stockfish were in the training phase?? I haven't seen that written anywhere. Did you just assume that or have you been told that by someone who knows? I think that was all _after_ the training phase.
peteypabpro 12/14/2017 02:23
Where do you get that no opening books were used? DeepMind said that AlphaZero didn't use one (obviously), but never said Stockfish didn't.
fons 12/14/2017 01:29
So Alpha Zero plays 1200 training games against Stockfish to see what openings work best, then plays the "official" 100 game match with Stockfish's opening book disabled?

What a joke.

Also: doesn't this invalidate the "learn from zero" headline?
I'm confused.

(Still a nice accomplishment for AI though, I guess we can be clear on that.)
mdamien 12/13/2017 09:49
Apart from the "match conditions" the AlphaZero team used to test their AI against Stockfish (that is, degree to which Stockfish was crippled for their testing purposes), and apart from the question of whether or not AlphaZero learned from training against Stockfish (thereby "learning" from its opening book) ... apart from these interesting points, the select games released from the match are simply astounding. If you did not know that the participants were computers, you might think AlphaZero was a reincarnated master from the Golden Age, prepped in modern theory: playing confidently with a material deficit when opposing pieces are out of play, assured by tactics that would be afforded his active pieces. Then, to realize that the defending side is not a player from the 1800's, but a machine heretofore known to simply make no tactical mistakes ... it's breathtaking. It is as though Humanity has found its champion against the computers, and it's another computer -- albeit one that plays like Morphy.
Hhorse 12/13/2017 08:50
Anyone with even a rudimentary understanding and experience of deep learning knows that "learning" can not take place without "training". Be it a simple vision problem e.g. recognize fruits or a more complex one as recognize speed of upcoming car. So to say that AZ just figured out space is important is like saying an object recognition algorithm just figured out speed of upcoming car is important without ever crashing! These days companies are eager to get marketing story way ahead of the engineering story and then try to play catch up. Most crash and burn. Some with deep pockets like the Big-5 have enough cash to burn and still remain standing. So until more information on the technology is available, even without the hardware specifics, this is just a stunt well performed.
davidrimshnick 12/13/2017 08:32
Where are you getting that Stockfish didn't have an opening book? Also how do you account for the fact that you could give Houdini or Stockfish days and they still wouldn't find Bg5? You are deluding yourself if you think that Stockfish could have won under different circumstances. Anyway, the AlphaZero of this week is probably already significantly better than last week. The days of handcrafting chess engines is done, deal with it.
Werewolf 12/13/2017 06:40
It's impressive stuff from Alpha Zero, but it wasn't a fair fight.

I'd like to see Stockfish back under these conditions:
- A faster Dual Xeon Workstation with Hyper Threading OFF
- Much more Hash, say 32 GB rather than the silly 1GB used
- A tournament quality opening book
- Proper tablebases
- A longer time control where the engine can decide where to allocate time

I believe the 130 elo difference would mostly disappear if all these steps were followed.
HollyHampstead 12/13/2017 05:56
I believe that the Deep Mind team have indeed solved intelligence. The traditional way for Chess (and other strategy games) programs to be developed is (in a simplified nutshell) to identify the key features of the problem domain, such as material, mobility, centre control, etc etc (for Chess), and then to create an evaluation function in which each feature is given a numerical weighting to reflect its relative importance.

What I believe Deep Mind have done is to develop a system which can examine a problem domain and work out for itself what the key features are. In other words (for Chess) it is like a human who looks at the initial position, knowing the rules but nothing else, and then deciding what are the most important features and what are their relative weights. So just by examining the rules of the game Deep Mind's system can understand the problem and how to solve it. That is true intelligence, not artificial intelligence.

When, a few years ago, the London newspaper "Evening Standard" declared Demis Hassabis to be the second most important person in London, after Boris Johnson, they got it the wrong way round. What Demis and his team have done will contribute far more to mankind than Bojo could ever dream of.

I predict that Demis Hassabis will eventually be awarded a Nobel Prize for this achievement.
e-mars 12/13/2017 05:38
It is finally clear now that AZ played "training" matches against Stockfish before the "real" match. If I am not mistaken, that was one of the "obscure" points behind the impressive achievement. Hence, it didn't learn all by itself.