MuZero figures out chess, rules and all

by Frederic Friedel
12/12/2019 – Just imagine you had a chess computer, the auto-sensor kind. Would someone who had no knowledge of the game be able to work it out, just by moving pieces? Or imagine you are a very powerful computer. By looking at millions of images of chess games, would you be able to figure out the rules and learn to play the game proficiently? The answer is yes, because that has just been done by Google's DeepMind team. For chess, shogi, Go and 57 Atari games. It is interesting, and slightly disturbing. | Graphic: DeepMind

In 1980 the first chess computer with an auto response board, the Chafitz ARB Sargon 2.5, was released. It was programmed by Dan and Kathe Spracklen and had a sensory board and magnetic pieces. The magnets embedded in the pieces were all of the same kind, so the board could only detect whether there was a piece on a square or not. It would signal its moves with LEDs located on the corner of each square.

Chafitz ARB Sargon 2.5 | Photo: My Chess Computers

Some years after the release of this computer I visited the Spracklens in their home in San Diego, and one evening had an interesting discussion, especially with Kathe. What would happen, we wondered, if we set up a Sargon 2.5 in a jungle village where nobody knew chess? If we left the people alone with the permanently switched-on board and pieces, would they be able to figure out the game? If they lifted a piece, the LED on that square would light up; if they put it on another square, that LED would light up briefly. If the move was legal, there would be a reassuring beep; the square of a piece of the opposite colour would light up, and if they picked up that piece another LED would light up. If the original move wasn’t legal, the board would make an unpleasant sound.

Our question was: could they figure out, by trial and error, how chess was played? Kathe and I discussed it at length, over the Sargon board, and in the end came to the conclusion that it was impossible: they could never figure out the game without human instructions. Chess is far too complex.

Now, three decades later, I have to modify our conclusion somewhat: maybe humans indeed cannot learn chess by pure trial and error, but computers can...

DeepMind’s MuZero teaches itself

You remember how AlphaGo and AlphaZero were created, by Google's DeepMind division. The programs Leela and Fat Fritz were generated using the same principle: tell an AI program the rules of the game, how the pieces move, and then let it play millions of games against itself. The program draws its own conclusions about the game and starts to play master-level chess. In fact, it can be argued that these programs are the strongest entities to have ever played chess — human or computer.

Now DeepMind has come up with a fairly atrocious (but scientifically fascinating) idea: instead of telling the AI software the rules of the game, just let it play, using trial and error. Let it teach itself the rules of the game, and in the process learn to play it professionally. DeepMind combined a tree-based search (where a tree is a data structure used for locating information from within a set) with a learning model. They called the project MuZero. The program must predict the quantities most relevant to game planning — not just for chess, but for 57 different Atari games. The result: MuZero, we are told, matches the performance of AlphaZero in Go, chess, and shogi.

And this is how MuZero works (description from VentureBeat):

"Fundamentally MuZero receives observations — images of a Go board or Atari screen — and transforms them into a hidden state. This hidden state is updated iteratively by a process that receives the previous state and a hypothetical next action, and at every step the model predicts the policy (e.g., the move to play), value function (e.g., the predicted winner), and immediate reward (e.g., the points scored by playing a move)."
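To make the quoted description more concrete, here is a minimal sketch in Python of the three pieces it mentions: a function that turns an observation into a hidden state, a function that updates the hidden state for a hypothetical action and predicts a reward, and a function that predicts policy and value. All names (ToyModel, represent, dynamics, predict, unroll), the sizes and the random untrained weights are assumptions made for illustration. This is not DeepMind's code, and the real system uses trained neural networks.

```python
import math
import random

HIDDEN = 8       # size of the hidden state (arbitrary for this sketch)
NUM_ACTIONS = 4  # size of the toy action space (arbitrary)


def _matrix(rows, cols, rng):
    """Random weight matrix standing in for a trained network layer."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def _softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyModel:
    """Untrained stand-in for the three learned functions (hypothetical names)."""

    def __init__(self, obs_size, seed=0):
        rng = random.Random(seed)
        self.w_repr = _matrix(HIDDEN, obs_size, rng)
        self.w_dyn = _matrix(HIDDEN, HIDDEN + NUM_ACTIONS, rng)
        self.w_reward = _matrix(1, HIDDEN + NUM_ACTIONS, rng)
        self.w_policy = _matrix(NUM_ACTIONS, HIDDEN, rng)
        self.w_value = _matrix(1, HIDDEN, rng)

    def represent(self, observation):
        """Observation (e.g. flattened board image) -> hidden state."""
        return [math.tanh(x) for x in _matvec(self.w_repr, observation)]

    def dynamics(self, state, action):
        """Previous hidden state + hypothetical action -> next state, reward."""
        one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
        joined = state + one_hot
        next_state = [math.tanh(x) for x in _matvec(self.w_dyn, joined)]
        reward = _matvec(self.w_reward, joined)[0]
        return next_state, reward

    def predict(self, state):
        """Hidden state -> policy (move probabilities) and value in [-1, 1]."""
        policy = _softmax(_matvec(self.w_policy, state))
        value = math.tanh(_matvec(self.w_value, state)[0])
        return policy, value


def unroll(model, observation, actions):
    """Imagine a sequence of actions without consulting the real game."""
    state = model.represent(observation)
    policy, value = model.predict(state)
    steps = [(policy, value, 0.0)]  # no reward before any action is taken
    for action in actions:
        state, reward = model.dynamics(state, action)
        policy, value = model.predict(state)
        steps.append((policy, value, reward))
    return steps
```

Calling unroll(ToyModel(obs_size=6), [0.1] * 6, [0, 2, 1]) returns four (policy, value, reward) triples: one for the real observation and one for each imagined action. The point of the sketch is only that the real game is never consulted after the first observation has been encoded.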

Evaluation of MuZero throughout training in chess, shogi, Go, and Atari — the y-axis shows Elo rating | Image: DeepMind

As the DeepMind researchers explain, one form of reinforcement learning, the technique in which rewards drive an AI agent toward goals, involves models. This form models a given environment as an intermediate step, using a state transition model that predicts the next step and a reward model that anticipates the reward. If you are interested in this subject you can read the article on VentureBeat, or visit the DeepMind site. There you can read this paper on the general reinforcement learning algorithm that masters chess, shogi and Go through self-play. Here's the abstract:

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

That refers to the original AlphaZero development, which has now been extended to MuZero. It turns out that it is possible not just to become highly proficient at a game by playing it millions of times against yourself, but also to work out the rules of the game by trial and error.
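The model-based idea mentioned earlier, a state transition model plus a reward model, both learned by trial and error, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny HiddenLineWorld environment, the lookup-table models and the exhaustive lookahead are hypothetical stand-ins, far simpler than the neural networks and tree search MuZero uses. The learner never reads the rules in step(); it only observes what happens when it tries a move.

```python
import random


class HiddenLineWorld:
    """Toy environment whose rules the learner never reads (hypothetical)."""

    SIZE = 5  # positions 0..4 on a line; reaching position 4 scores a point

    def step(self, state, action):
        nxt = min(max(state + action, 0), self.SIZE - 1)
        reward = 1.0 if nxt == self.SIZE - 1 and state != nxt else 0.0
        return nxt, reward


def learn_model(env, trials=500, seed=0):
    """Trial and error: try random moves and record what the world did."""
    rng = random.Random(seed)
    transition, reward_model = {}, {}
    for _ in range(trials):
        state = rng.randrange(env.SIZE)
        action = rng.choice((-1, 1))
        nxt, reward = env.step(state, action)
        transition[(state, action)] = nxt      # state transition model
        reward_model[(state, action)] = reward  # reward model
    return transition, reward_model


def plan(transition, reward_model, state, depth):
    """Exhaustive lookahead that uses only the learned tables.

    Returns (best total reward, best first action).
    """
    if depth == 0:
        return 0.0, None
    best = (float("-inf"), None)
    for action in (-1, 1):
        if (state, action) not in transition:
            continue  # never tried this move, so the model cannot imagine it
        future, _ = plan(transition, reward_model,
                         transition[(state, action)], depth - 1)
        total = reward_model[(state, action)] + future
        if total > best[0]:
            best = (total, action)
    return best if best[1] is not None else (0.0, None)
```

After transition, reward_model = learn_model(HiddenLineWorld()), a call such as plan(transition, reward_model, 2, 2) searches two moves ahead entirely inside the learned tables. A lookup table only works because this toy world is tiny and deterministic; games like chess need a model that generalises to positions it has never seen.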

I have just now learned about this development and need to think about the consequences and discuss it with experts. My first, somewhat flippant reaction to a member of the DeepMind team: "What next? Show it a single chess piece and it figures out the whole game?"

Editor-in-Chief emeritus of the ChessBase News page. Studied Philosophy and Linguistics at the Universities of Hamburg and Oxford, graduating with a thesis on speech act theory and moral language. He started a university career but switched to science journalism, producing documentaries for German TV. In 1986 he co-founded ChessBase.

Discussion and Feedback

Yasser Seirawan 12/14/2019 05:51

It seems to me that the next step is for MuZero to devise its own board game. One that it would consider sufficiently complex. That is, after many "millions of training steps" it would still be challenged, learning and improving at its own creation. It would create its own board (battleground) as well as two armies.

Once MuZero finished its task, it would then be up to us to figure out the rules of play.

besominov 12/13/2019 12:35
Yes, but who is telling the program what moves are legal or not?

Like in the hypothetical example with the Sargon computer and the jungle villagers: in that case the board is telling the user what moves are legal or not.

So somebody who knows the rules is telling the program what the rules are, just not in a direct way.

So it does not really seem like such a big step from the previous "zero" approach, just a bit more generalized (and cumbersome).
XecutionStyle 12/13/2019 10:34
The final hyperbole emphasizes what just happened.
Injecting knowledge was an important objective to learning, especially reinforcement learning, as it dramatically reduced the trials necessary for optimal policy. AlphaZero was given such knowledge through the rules of the game; thereon, its predictions were purely statistical: "based on my experience, this move given the current configuration yields a favorable outcome with a probability of...". It knew nothing of the game. It's like throwing a ball, and predicting motion through kinematics - knowing "why" i.e. Gravity, air-friction etc. was not necessary.
MuZero, in the ball example, models those entities involved such as gravity, and uses them to understand how the system behaves. Therefore its predictions are based on fundamentally answering "why", rather than "how" like AlphaZero.
The benefits reach beyond the obvious. Not only will it be safer (as simulating on a model of the environment is harmless compared to needing actual experience), require less experience (understanding gravity exists is far more useful than implicitly accounting for it using statistics), but with a model, we'll be able to explain most system-dynamics such as how a butterfly could/couldn't trigger a hurricane without needing data of every permutation involving butterflies and hurricanes.
ketchuplover 12/13/2019 04:09
Regarding the jungle village example, I say you can learn how the pieces move but not castling, en passant and pawn promotion.
Keshava 12/12/2019 08:23
"Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess..." Well, Lc0 shows how AlphaZero would currently do in fair competition (run by third parties) against other top engines. On the CCRL 40/4 Rating List Lc0 is one of the top 4 engines, and it is a close 2nd to Stockfish on the CCRL 40/40 Rating List. If you run your own tests you can usually find a way to get the results you want.