AlphaGo vs Lee Sedol: history in the making

History in the making

To appreciate just how extraordinary an achievement AlphaGo is, a comparison with chess programs, which have been punishing the elite for decades now, is in order. What is so special about Go that it resisted programming efforts and genius for so long?

What is Go

Woman Playing Go (Tang Dynasty c. 744)

Go, which translates to “encircling game”, is a game whose roots and history easily rival those of chess, with written records going back to the 4th century BC. It is played on a 19 x 19 grid, with each player placing stones on the intersections of the board. Black moves first, and then White, with the stones never moving from their intersections, though they can be removed if captured. The goal, as the translation of its name implies, is to have surrounded a larger total area of the board with one's stones than the opponent by the end of the game.

Korean couple, in traditional dress, play in a photograph dated between 1910 and 1920

This ultra-simplified introduction to Go is necessary to understand the complexities and challenges involved in programming it, compared to a game such as chess. Chess programming is dominated by the search and the evaluation function. The evaluation starts with the most fundamental aspect: the differing values of the pieces, while the search is about pruning down the number of moves to calculate and then looking ahead as many moves as possible to reach a quality decision. In Go, both of these are instantly problematic.

Comparing Go and chess programming

The search function in chess engines boils down to selecting a number of moves and steadily looking deeper and deeper. At the beginning of a chess game, White has twenty possible moves. After that, Black also has twenty possible moves. Once both sides have played, there are 400 possible board positions. Go, by contrast, begins with an empty board, where Black has 361 possible opening moves, one at every intersection of the 19 by 19 grid. White can follow with 360 moves. That makes for 129,960 possible board positions after just the first round of moves. It is easy to see that even with the most severe pruning techniques, a program would only be able to see ahead a few moves at best. However, the situation is even worse than that, since while the average chess game is roughly 60-80 moves (white moves and black moves), the average game of Go lasts 200-240 moves.
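The arithmetic above is easy to check. Here is a minimal sketch, with a hypothetical helper name, that counts move sequences when every stone played removes one free point. It ignores captures, so beyond the first round it gives an upper bound on sequences rather than a count of distinct positions.

```python
def sequences_after(first_moves, plies):
    """Count move sequences when each ply leaves one option fewer.

    Simplifying assumption: no captures, so every stone played
    permanently removes one intersection from the list of choices.
    """
    total = 1
    for i in range(plies):
        total *= first_moves - i
    return total


# Chess: 20 first moves for White, 20 replies for Black.
chess_first_round = 20 * 20

# Go: 361 points for Black, then 360 for White.
go_first_round = sequences_after(361, 2)

print(chess_first_round)  # 400
print(go_first_round)     # 129960
```

Calling `sequences_after(361, 4)` shows how quickly the number explodes after just two rounds of moves.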

An overview of the third game from AlphaGo vs Lee Sedol. White won it via resignation, hence the W+Res in the result space after 176 moves

Next comes the evaluation function, or determining what constitutes a bad position or a good position, to choose between moves. In chess this starts with the value of the pieces, where the king is priceless (capture it and win), the queen is worth nine pawns, a rook is five, and so on. This is then tempered by various well-defined aspects such as isolated pawns, centralized knights, and so forth. Go begins with no difference in value of any of its pieces, and the board situations are so large and complex that simple rules such as a doubled-pawn make no sense. Now that you understand why chess programming strategies have failed so abysmally in Go, what is the solution?
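To make the chess side concrete, here is a minimal sketch of the material count that such an evaluation starts from. The queen and rook values come from the text above. The values for knight and bishop (three pawns each) are the conventional ones and are an assumption here, as are the function name and the string encoding of the pieces.

```python
# Conventional piece values in pawns. The king is left out because it
# can never be traded; knight and bishop values are an assumption.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_balance(pieces):
    """Return White's material minus Black's, measured in pawns.

    `pieces` is a string of piece letters: uppercase for White,
    lowercase for Black. Unknown letters (such as K) count as zero.
    """
    score = 0
    for ch in pieces:
        value = PIECE_VALUES.get(ch.upper(), 0)
        score += value if ch.isupper() else -value
    return score


# White: king, queen, rook, three pawns. Black: king, two rooks, three pawns.
print(material_balance("KQRPPPkrrppp"))  # 4
```

Nothing comparable exists in Go: every stone is identical, so a count of stones on the board says very little about who is ahead.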

For a long time, there was none really, and as a result, until 2005-2006, the best programs in the world were weak amateurs at best, equivalent to a 1400-1600 player in chess. The comparison in levels in chess and Go is difficult due to the range of levels in Go, but that will be the topic of the final article on the match. Don’t think for an instant this was due to a lack of resources invested, since the promised payback was huge: tens of millions of Go players in East Asia who would line up to buy a strong program.

The Monte Carlo Tree Search revolution

The change came with the French programmer Rémi Coulom. Coulom was a programming prodigy who at the age of ten, less than a year after receiving his first computer, had programmed Mastermind. Within four years, he had created an AI that could play Connect Four. Othello followed shortly thereafter, and by 18, Coulom had written his first chess program. The Frenchman eventually earned a PhD for work on how neural networks and reinforcement learning can be used to train simulated robots to swim.

Coulom had exchanged ideas with a fellow academic named Bruno Bouzy, who believed that the secret to computer Go might lie in a search algorithm known as Monte Carlo. Rather than having to search every branch of the game tree, Monte Carlo would play out a series of random games from each possible move, and then deduce the value of the move from an analysis of the results.
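The idea of judging a move by random playouts can be sketched in a few lines. Go itself is far too large for a short example, so this sketch uses a toy game as a stand-in: a pile of stones from which players alternately take one, two or three, and whoever takes the last stone wins. The game, the function names and the number of playouts are all assumptions made for illustration.

```python
import random


def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [n for n in (1, 2, 3) if n <= stones]


def random_playout(stones, to_move, rng):
    """Play random moves until the pile is empty and return the winner.

    The player who takes the last stone wins. Requires stones > 0.
    """
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return to_move
        to_move = 1 - to_move


def flat_monte_carlo(stones, player, playouts=2000, seed=0):
    """Estimate each move's win rate from random games, pick the best."""
    rng = random.Random(seed)
    win_rate = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            win_rate[move] = 1.0  # taking the last stone wins outright
            continue
        wins = sum(
            random_playout(remaining, 1 - player, rng) == player
            for _ in range(playouts)
        )
        win_rate[move] = wins / playouts
    best = max(win_rate, key=win_rate.get)
    return best, win_rate


best, rates = flat_monte_carlo(3, player=0)
print(best, rates)
```

With three stones left, the sketch finds the immediate win. On larger piles the random playouts can mislead it, because a move that is good against random replies is not necessarily good against a careful opponent.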

While Bouzy was unable to make the idea work, Coulom hit upon a novel way of combining the virtues of tree search with the efficiency of Monte Carlo. He christened the new algorithm Monte Carlo Tree Search, or MCTS, and in January of 2006, his program Crazy Stone won its first tournament. He published his landmark concept in a paper that changed Go programs, setting a dividing point for programs before MCTS and those after.
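To show roughly what combining tree search with Monte Carlo looks like, here is a minimal sketch of the four usual steps: selection, expansion, simulation and backpropagation. It uses the UCB1 selection rule, a common textbook formulation that is not necessarily what Crazy Stone used. It also plays a toy take-away game (take one to three stones, last stone wins) rather than Go. All class and function names are hypothetical.

```python
import math
import random


class Node:
    def __init__(self, stones, to_move, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.to_move = to_move    # player about to move (0 or 1)
        self.parent = parent
        self.move = move          # move that led to this node
        self.children = []
        self.untried = [n for n in (1, 2, 3) if n <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved into this node


def ucb1(child, parent_visits, c=1.4):
    """Win rate plus an exploration bonus for rarely visited children."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, to_move, rng):
    """Random playout from a position; returns the winning player."""
    if stones == 0:
        return 1 - to_move        # the previous player took the last stone
    while True:
        stones -= rng.choice([n for n in (1, 2, 3) if n <= stones])
        if stones == 0:
            return to_move
        to_move = 1 - to_move


def mcts(stones, player, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes using UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one unexplored move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.to_move, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: finish the game with random moves.
        winner = rollout(node.stones, node.to_move, rng)
        # 4. Backpropagation: update statistics back up to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.to_move:
                node.wins += 1
            node = node.parent
    # The most visited move is the one the search trusts most.
    return max(root.children, key=lambda ch: ch.visits).move


print(mcts(5, player=0))
```

The difference from plain random playouts is that the statistics steer the search: promising replies get visited more often, so the tree grows deeper where it matters instead of treating every continuation equally.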

Rémi Coulom (left) and his computer program, Crazy Stone, take on 9-dan professional Norimoto Yoda

This led to a revolution in Go programs, which experienced a massive burst in strength. The very best version of Crazy Stone, running on a 64-core PC, was able to hold its own against a pro, albeit a 65-year-old one, with only a four-stone handicap in an exhibition game. Again, this is hard to quantify in chess terms, but it would probably be equivalent to a 2200 Elo performance or so. That is a long way from becoming world champion, but a massive leap forward from the recent heyday of weak amateur software. Until now, Crazy Stone has remained the absolute best Go program, together with rival Zenith Go, likewise based on Coulom’s concept.

A screenshot of Crazy Stone, distributed by Unbalance

Where does AlphaGo come into the story? The problem with the Monte Carlo technique was that there was no obvious way forward. Doubling the CPU power does not lead to a significant increase in playing ability. In chess it equates to roughly one extra ply, or about 50 Elo, but not in Go. There, gaining a single move of depth would require far more than just doubling the computing power, and even then it would be one move ahead in a game that lasts over 200 moves. In other words, a drop in the proverbial bucket. Unless some other way to progress was found, Go software had hit a new wall.
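A rough back-of-the-envelope model shows why doubling the hardware buys so little. If a search must examine about b positions for every extra move of depth, then a speedup of k buys log(k) / log(b) extra moves. The branching figures below are illustrative assumptions: an effective factor of 2 for a heavily pruned chess search (which matches the "one ply per doubling" rule of thumb), and 250 as an often-quoted average number of legal moves in Go. A Monte Carlo program does not search to a fixed depth, so this is an analogy rather than a measurement.

```python
import math


def extra_plies(speedup, branching):
    """Extra search depth gained from a hardware speedup.

    Assumes the cost of searching grows as branching ** depth.
    """
    return math.log(speedup) / math.log(branching)


# Illustrative, assumed branching factors (not measured values).
print(extra_plies(2, 2))    # chess-like pruned search: one full ply
print(extra_plies(2, 250))  # Go-like branching: a small fraction of a move
```

Under these assumptions, a Go program would need hardware about 250 times faster, not twice as fast, to look one move further ahead.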

Given this understanding, even if Google and DeepMind were to somehow get the world’s most powerful supercomputer behind a refined version of Crazy Stone, it might be a bit smarter and see a couple of moves deeper, but it would be nothing a world-class player need concern himself with. It is therefore not strange that Lee Sedol’s prediction of a 4-1 win at worst was even seen as generous... for the program! Four days later, after three wins by the program, the sheer shock of the programming and Go communities is understandable. Somehow DeepMind had conjured up some magic that took Go software from weak master at best to world beater! The question is simply: how?

AlphaGo and DeepMind

DeepMind Technologies, the developer of AlphaGo, was co-founded in 2010 by British artificial intelligence researcher Demis Hassabis, and specializes in building general-purpose learning algorithms. Hassabis is a former chess prodigy in his own right: he was rated no. 2 in the world in the under-14 category, just 35 Elo behind Judit Polgar at no. 1. He has always had a particular fascination for games, and not just chess. He won the world games championship a record five times, and is also an expert in Shogi, poker and Diplomacy, to name a few.

Demis Hassabis, Lee Sedol, and Sergey Brin, co-founder of Google (photo: Google DeepMind)

DeepMind Technologies' stated goal is to "solve intelligence", which it is trying to achieve by combining "the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms". As opposed to other AIs, such as IBM's Deep Blue or Watson, which were developed for a pre-defined purpose and only function within their scope, DeepMind claims that its system is not pre-programmed: it learns from experience, using only raw pixels as data input. Technically, it uses deep learning on a convolutional neural network, with a novel form of Q-learning, itself a form of model-free reinforcement learning. The system has been tested on video games, notably early arcade games such as Space Invaders or Breakout. Without any alteration to the code, the AI begins to understand how to play the game, and after some time plays, for a few games (most notably Breakout), a more efficient game than any human ever could.
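For readers unfamiliar with Q-learning, here is a minimal sketch of the basic update rule. It is the plain tabular version, with a lookup table standing in for the convolutional network and a tiny made-up corridor standing in for arcade-game pixels, so it is not DeepMind's system. The environment, the function name and all parameter values are assumptions.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor.

    States are 0..n_states-1; action 0 steps left, action 1 steps right.
    Reaching the rightmost state pays a reward of 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Explore occasionally, otherwise act greedily (ties go right).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 1 if q[state][1] >= q[state][0] else 0
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Core update: nudge the estimate toward the reward plus the
            # discounted value of the best action in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


for state, (left, right) in enumerate(q_learning()):
    print(state, round(left, 3), round(right, 3))
```

The program is never told that walking right is the goal. It only sees rewards, and the table ends up rating "right" above "left" in every state, which is the sense in which such a method is model-free.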

Excellent overview of Go and AlphaGo with explanations by project leader David Silver, and DeepMind CEO Demis Hassabis

David Silver, the lead project manager, explains how AlphaGo came into existence: "AlphaGo is actually around two years old, if we have to give it an age. It was a research project with myself and Aja Huang, and Chris Maddison, an intern from Google Brain. We wanted to ask this question, whether a neural network using deep learning can actually learn to understand the game of Go well enough to play reasonably. And so this was a pilot research project. We tried some experimental things, we tried a whole bunch of ideas, and around a year ago we published a first paper on this result and we discovered that actually the neural network by itself could perform remarkably well. It could actually reach the level of an amateur dan-level player without any lookahead at all. Without even adding any search tree in. When I saw this result, I was really taken aback. I'm an amateur player myself. I'm not a very strong player, but I'm aware as a player of the importance of reading out situations, and I kind of found it mind-blowing that a neural network without any explicit reading of the positions would be able to understand a position well enough to reach amateur dan level. And at that time I felt that this was something with a lot of potential and I sat down and talked with Demis, the CEO of DeepMind, and I said 'I really think someone is going to take these deep learning techniques and actually achieve the highest levels of play. I think it's really going to happen. This is something that's in the cards now', and he said, 'let's make sure it's us.' And he really powered up the project."

The end result was published on January 27, 2016, in a paper in the journal Nature, revealing not only the existence of AlphaGo but also its incredible results by then. AlphaGo was evaluated as being roughly 1000 Elo stronger than Crazy Stone. Is it any wonder no one could believe how strong it is?

The ground-breaking paper started with the statement:

"We introduce a new approach to computer Go that uses value networks to evaluate board positions and policy networks to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte-Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte-Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs."
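The abstract does not spell out how the search uses the two networks, so the following is only a sketch of one commonly described arrangement, not DeepMind's code. The policy network's move probabilities act as a prior that tells the search which moves to try first, and the value network's estimate is blended with the result of a playout to score a position. The function names, the constants and the plain numbers standing in for network outputs are all assumptions.

```python
import math


def select_move(priors, visit_counts, mean_values, c_puct=1.0):
    """Pick the move with the best value plus prior-weighted bonus.

    priors:       move -> probability from a (hypothetical) policy network
    visit_counts: move -> how often the search has tried the move
    mean_values:  move -> average result seen so far for the move
    """
    total = sum(visit_counts.values())

    def score(move):
        bonus = (c_puct * priors[move] * math.sqrt(total)
                 / (1 + visit_counts[move]))
        return mean_values[move] + bonus

    return max(priors, key=score)


def leaf_value(value_net_estimate, rollout_result, mix=0.5):
    """Blend a (hypothetical) value network estimate with a playout result."""
    return (1 - mix) * value_net_estimate + mix * rollout_result


priors = {"a": 0.7, "b": 0.3}

# Move "b" has never been tried, so its bonus is large and it gets explored.
print(select_move(priors, {"a": 10, "b": 0}, {"a": 0.1, "b": 0.0}))  # b

# Once both are well explored, the better average result wins out.
print(select_move(priors, {"a": 10, "b": 10}, {"a": 0.6, "b": 0.4}))  # a
```

The point of the prior is to shrink the problem described earlier: instead of considering hundreds of legal moves equally, the search spends most of its time on the handful that the policy network considers plausible.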

In March, a match with the legendary player Lee Sedol was organized with a prize fund of $1 million. DeepMind effectively won the match with an incredible start of 3-0, though the fourth and fifth games were to be played out regardless. It was not a complete whitewash for the machine, though, as Lee Sedol struck back in game four to prove the machine was not invincible, at least not yet.

The games are all streamed live on the internet with superb commentary by Chris Garlock and professional 9-dan player Michael Redmond, the only Westerner to ever achieve this rank. DeepMind has said it would donate the winnings to charities such as UNICEF.

After a shock loss, Lee Sedol exits the press conference (photo: Google DeepMind)

Broadcasting to the world

The reception and audience of the match have been incredible, with over 60 million viewers in China alone, 100 million people around the world, and no fewer than 3,300 articles in South Korea alone after game one.

Aside from the excellent live commentary in English provided on DeepMind's official YouTube channel, which makes it possible for even non-players to feel as if they understand what is going on, there are guest appearances by the AlphaGo team before the games start, and 15-minute video summaries of the games are posted afterwards.

15-minute summary of the key Game 3, analyzed by Michael Redmond, 9-dan professional, and Chris Garlock. At DeepMind's official YouTube channel, there are summaries of the other games, as well as the complete archived videos of the games.

However, the event's broadcast is hardly limited to this. It is also being streamed and broadcast freely in several languages everywhere in video and on Go servers. There are numerous live video streams, several in English, Chinese, Korean, Russian, you name it. Not only do the organizers not forbid them or restrict coverage, they encourage and applaud them, setting a wonderful example.

Also broadcasting live with Cho Hyeyeon (left), a 9-dan professional, is the official American Go Association, providing superb commentary. On some days Kim Myungwan, also a 9-dan pro, joins in. Great stuff.

The Japanese also have multiple channels with live commentary by pros

Even the Russian Go Federation brings commentary by Natalia Kovaleva, a strong 5-dan amateur, and Ilya Shikshin, a 1-dan pro.

Needless to say, the major servers such as IGS (Internet Go Server), KGS, and others were broadcasting to the many fans.

In our final report on this historic match, we will bring info on the players, the game, how to play, and more. Stay tuned!
