Standing on the shoulders of giants

by Albert Silver
9/18/2019 – The expression conveys the idea that someone’s work and achievements were only possible thanks to the predecessors they built upon, and this is unquestionably true of Fat Fritz. You may think you know when or where the tale starts, but what if we told you that it all really started with a visionary computer scientist 60 years ago, in 1959? | Photo: IBM


The start of the revolution

When you think of AI or machine learning you may conjure up images of AlphaZero, or even some science fiction reference such as HAL 9000 from 2001: A Space Odyssey. However, the true forefather, who set the stage for all of this, was the great Arthur Samuel.

Samuel was a computer scientist, visionary, and pioneer, who wrote the first checkers program for the IBM 701 in the early 1950s. His program, "Samuel’s Checkers Program", was first shown to the general public on TV on February 24th, 1956, and the impact was so powerful that IBM stock went up 15 points overnight (a huge jump at that time). This program also helped set the stage for all the modern chess programs we have come to know so well, with features like look-ahead, an evaluation function, and a minimax search that he would later develop into alpha-beta pruning. So while he may have been one of the forefathers of chess engines such as Stockfish, how does that make him a forefather of AI?
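The look-ahead and pruning ideas Samuel pioneered remain the skeleton of classical engines to this day. A minimal sketch in Python may make them concrete — the toy game tree, evaluation scores, and depth below are invented purely for illustration, not taken from any real engine:

```python
def alphabeta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Minimax search with alpha-beta pruning.

    `evaluate` scores a leaf position; `children` lists a node's successors.
    Branches that cannot influence the final choice are cut off as soon as
    beta <= alpha -- the pruning that Samuel's work foreshadowed.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, evaluate, children))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # opponent would never allow this line
        return best
    else:
        best = float("inf")
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, evaluate, children))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best

# Toy two-ply tree: the leaves carry hypothetical evaluation scores.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
scores = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
value = alphabeta("root", 2, float("-inf"), float("inf"), True,
                  lambda n: scores.get(n, 0), lambda n: tree.get(n, []))
```

In this toy position the search settles on a score of 3, and note that leaf `b2` is never evaluated: once branch `b` is shown to be worse than branch `a`, it is pruned.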

Arthur Samuel plays checkers with an IBM 704 computer in Poughkeepsie, New York | Photo: history-computer.com

Arthur Samuel was not content to simply develop the world’s first checkers program, and he began to develop the first techniques of actual machine learning — a term he himself coined — including experiments in which his program played itself thousands of times to try to improve. He laid the groundwork for future developments in the field of reinforcement learning, and published his seminal paper “Some Studies in Machine Learning Using the Game of Checkers” in July 1959. He began with rote learning, but soon moved on to techniques that would become the precursor to Temporal Difference Learning, which would lead to a pivotal point in reinforcement learning 30 years later.
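The temporal-difference idea Samuel anticipated can be stated in a few lines: nudge the estimated value of a position toward the value of the position that follows it. A minimal TD(0) sketch — the states, rewards, and learning rate here are made up for illustration:

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """One TD(0) step: move V(s) toward reward + gamma * V(s')."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return values

# Hypothetical three-state episode: start -> mid -> win (reward 1 at the end).
values = {"start": 0.0, "mid": 0.0, "win": 0.0}
td0_update(values, "mid", "win", reward=1.0)    # "mid" learns from the win
td0_update(values, "start", "mid", reward=0.0)  # "start" learns from "mid"
```

After these two updates, credit for the win has already begun trickling back from the final position toward the opening one — the same backing-up of evaluations that, scaled to a neural network, powered TD-Gammon.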

While Samuel’s program still relied on hand-crafted values that it merely fine-tuned, it wasn’t until 1992 that the first truly ‘Zero’ neural network was developed for a game: Gerald Tesauro’s groundbreaking TD-Gammon. The image on the left, from IBM's Think Magazine, December 1992, shows Gerald operating his backgammon program.

TD-Gammon was the first time true model-free reinforcement learning (this is what DeepMind’s ‘Zero’ means) was applied to a board game, and from the very start it produced extraordinary results. The idea of model-free reinforcement learning is that the neural network, the AI’s ‘brain’, starts with only bare-bones knowledge of the game, and must learn how to play it entirely on its own.

The concept was not invented by the IBM researcher; even Tesauro was building on the work of Richard Sutton [pictured right, from his home page], a distinguished research scientist at DeepMind and a professor of computing science at the University of Alberta.

Nevertheless, Gerald Tesauro was the first person to apply model-free reinforcement learning to a board game. TD-Gammon used a combination of techniques to create a neural network model that learned from pure self-play games, from which it developed its own strategies. Prior to it, backgammon programs had been pitifully weak, trounced by experts with ease. TD-Gammon developed a very fine positional sense, and its strategies were to become the foundation of modern backgammon theory. After 1.5 million self-play games, it had reached a standard that could challenge the best players of the day. Two-time backgammon world champion Bill Robertie wrote a book about his time studying and playing with it called “Learning from the Machine”. This concept was to be the same one that informed Matthew Sadler’s book on AlphaZero, Game Changer, over 25 years later. Also, much like DeepMind was to do decades later, Dr. Tesauro published his work and all the details in a scientific paper, “Temporal Difference Learning and TD-Gammon”, in the Communications of the ACM in 1995. Today you can find a version ported to TensorFlow.

This changed everything, and it led to the birth of ever greater backgammon neural networks that could provide world-class competition as well as world-class analysis. The first great program to follow and raise the standard was Jellyfish, after which came Snowie, and even a magnificent open-source project: GNU Backgammon, which to this day is the second strongest backgammon software available. It too can be found at its source site. For documentation, refer to my online manual, “All About GNU”.

DeepMind, Go and Leela

Curiously, in spite of the enormous success in backgammon, there was no comparable follow-up for other board games. In chess, the earliest fully developed neural network engine was Stoofvlees, a 2007 program by Gian-Carlo Pascutto, trained from grandmaster games. While considerably ahead of its time in concept, it lacked the hardware needed to run at speeds that might make it truly competitive. Years later, another neural network effort would be made: Giraffe by Matthew Lai, who soon joined the DeepMind team that developed AlphaZero.

AlphaGo logo

It wasn’t until 2016 that we saw a repeat of history of sorts, as DeepMind used a new and powerful technique known as Deep Reinforcement Learning to produce the breakthrough program AlphaGo. Although it demonstrated clear superiority over one of the top Go players in the world — the Korean genius Lee Sedol — it still used a combination of human games and self-play games. The project, headed by David Silver, published a paper, and it was an immediate bolt of lightning to the Go world: top commercial programs such as Crazy Stone all came out with Deep Learning versions based on the new techniques.

Online, a new free program came out with a nice interface that also implemented these new techniques, and its author was none other than Gian-Carlo Pascutto (pictured right). He had already written a top engine called Deep Sjeng using the standard engine techniques employed by Fritz, Rybka, Houdini and others, but had since moved into computer Go. Here was a field that had been wide open prior to AlphaGo, with barely a program that could play at master level, much less world champion caliber. The name of his pet project was Leela.

Then in 2017, DeepMind took a bold step forward by adopting an advanced neural network structure developed at Microsoft Research called Residual Networks, or ResNets for short. In their paper, DeepMind attributes no less than 600 Elo of improvement to this structural change. It also marked a shift to the model-free reinforcement learning used by Tesauro, which they dubbed ‘Zero’. This meant that the new Go program was free of any outside knowledge or biases, and was taught nothing beyond the rules of the game. Everything it learned would be the product of massive self-play and the conclusions it reached on its own. Tens of millions of games later, AlphaGo Zero, as this new version was called, was head and shoulders above AlphaGo, which had already been the unchallenged number one. Analysts, including some of the best players in the world, were blown away. They saw clear areas where the groundwork for new theory was being laid, and thanks to the fresh scientific article detailing it all, there was now a roadmap to this Holy Grail of Go.
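The residual idea itself is disarmingly simple: instead of learning a full mapping directly, each block learns a correction that is added back to its input, y = x + F(x), which lets information and gradients flow cleanly through very deep stacks. A toy sketch in plain Python — the fixed transform below is an arbitrary stand-in for a learned layer, not DeepMind’s actual architecture:

```python
def residual_block(x, transform):
    """y = x + F(x): the block outputs its input plus a learned correction."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# Stand-in "layer": a fixed elementwise transform (a real network learns this).
double_shift = lambda v: [2 * vi + 1 for vi in v]

x = [1.0, 2.0, 3.0]
y = residual_block(x, double_shift)                      # [4.0, 7.0, 10.0]
identity = residual_block(x, lambda v: [0.0] * len(v))   # F(x) = 0 gives y = x
```

The second call shows why deep residual stacks train so well: a block that has learned nothing (F(x) = 0) simply passes its input through unchanged, so adding more blocks can never make the network worse at representing what it already knows.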

Again, Gian-Carlo Pascutto took up the gauntlet, but faced a new challenge: resources. While DeepMind might boast supercomputers to generate the tens of millions of games needed to create their god of Go, Pascutto had no such option, and the problem was daunting. Consider that even with the most powerful computer on the market, with the most powerful GPU (graphics processing unit) needed to accelerate the work, only a few hundred games per hour could be created. There might be a roadmap to the Holy Grail, but it would require a decade to get there. He therefore created a new open-source project called Leela Zero and added a brilliant idea: a client program anyone could download, which would generate self-play games and automatically send them to a centralized server. The idea was to leverage the computing power of fans and dreamers alike and let them help build this community-driven AlphaGo Zero neural network, available to all. Needless to say, the actual search and training algorithms were those published by DeepMind in their papers. Pundits from chess were fascinated, envious, and in complete agreement: these newfangled neural networks might solve Go, which is more pattern recognition than brute calculation, but they would not work for chess. Right?

AlphaZero and Leela Chess Zero

To call it a clap of thunder in a clear blue sky would be a gross understatement. In December 2017, DeepMind announced AlphaZero, a new, improved version of their AlphaGo Zero template, this time applied to three games: Go, shogi, and… chess! The result: slack-jawed chess players around the world. DeepMind claimed not only to have produced a neural network that played chess at the highest level, but one that did so under extraordinary conditions. First of all, it used almost 1000 times fewer nodes per second than the reigning number one, Stockfish — a handicap that would cost at least 500 Elo were Stockfish itself trying this stunt. Second, it played positional chess to make a top GM weep with joy, with speculative plays and attacks that no conventional engine would try (unless configured with suicidal settings).

Had this been their first announcement they might have been met with derision and disbelief, but with the track record of AlphaGo behind them, it just set chess players dreaming. Sergey Karjakin, the World Championship challenger of 2016, said that he would pay a million dollars to have AlphaZero. Whether or not this was a genuine monetary offer, the spirit of the comment was widely shared. Once again, DeepMind published the recipe to their success.

Demis Hassabis

An outsider might wonder why DeepMind would even bother, given the massive success and publicity they had garnered from AlphaGo, but the key lay in the founder of DeepMind: Demis Hassabis. Hassabis had been a chess prodigy in his own right, reaching master level (Elo 2300+) at the age of 13. While he may have strayed from the chess path as a professional player, his first love was not to be forgotten. With the success of AlphaGo Zero, and the development pipeline by now so finely tuned, it was time to see if something special could be done in chess as well. The rest, as they say, is history.

Almost immediately after the news and the publication of the first draft of the paper, someone tried to convert Gian-Carlo Pascutto’s open-source Leela Zero code to chess, but the result was a non-functional mess. This time Gary Linscott (pictured right) came to the rescue, a programmer who was a main developer of and contributor to the chess engine Stockfish. He cleaned up the code, and with the help of other skilled programmers equally eager to see it work, they set about making the transition from Go to chess. As one of them later commented, though, without the Leela Zero codebase to work from, they would have been unlikely even to attempt this mammoth project. I wrote about this development in my article Leela Chess Zero: AlphaZero for the PC in April 2018.

Within about a month the code had been tested and worked, and, giving credit where credit was due, it was called Leela Chess Zero. Just as in the case of Leela Zero (the Go program), a key component was to leverage community resources via a client program that anyone could run on their own computer. Again, producing sufficient games to train the project posed a major challenge. DeepMind claimed to have needed roughly 44 million games to train the AlphaZero chess neural network, and while they managed this feat in a matter of hours with the help of 5,000 processors specially designed for AI development, the average Joe could come nowhere near this. Nevertheless, LCZero had been born.

Since there was no desire to reinvent the wheel, a number of basic pieces of Stockfish code were used to enable the switch to chess, such as the move generator. Another, more significant problem was the raw speed of the search and the binary. Using their custom-made AI processors known as TPUs, DeepMind had not really needed to worry about speed. With just four TPUs they reached an average of over 60,000 nodes per second, while a user with a 1080 Ti, the most powerful GPU at the time, could barely hope for 2,000 nodes per second with a similar network running in the LCZero binary. Two key contributors were to team up and solve these problems: Alexander Lyashuk and Ankan Banerjee.

Alexander Lyashuk [left, image from Chessprogramming.org], also known as Crem on the official Discord channel, is a lead programmer in the Leela Chess project, overseeing the contributions made by others and vetting the quality of the code to keep it from getting bogged down in bugs and issues. He was also the original author of the new lc0 (as in Leela Chess Zero) binary, which replaced the old Stockfish code with clean code that would no longer raise eyebrows regarding its origins.

Ankan Banerjee [image right, from Chessprogramming.org] is a programmer who has written drivers for Nvidia video cards, and with his singular knowledge and skills he hand-wrote code that leverages their speed, leading to a five- to ten-fold increase. Suddenly that same 1080 Ti was producing upwards of 9,000 nodes per second instead of the previous 1,000-2,000, and when the new generation of cards came out supporting FP16, he wrote further code so that the new RTX 2080 was flying at 30,000 nodes per second.

While many continued to fine-tune and debug the main binary, it still used the core search advocated and designed by DeepMind. Naturally, modern additions were included such as tablebase support, but it is still AlphaZero at its heart.

Progress was steady, if not without hiccups. Some of the training parameters at first had to be guessed at and tested, since the preprint that DeepMind posted gave the essentials but failed to share some seemingly minor, yet important, details. Eventually, with time, the biggest issues were overcome, multiple iterations of the main network were trained, and it realized its dream of defeating Stockfish under the rigorous conditions of TCEC, a highly respected online computer tournament.

Deus X and Fat Fritz

I entered the fray around March of 2018 as merely one more enthusiast, fascinated and confused. When I read about DeepMind’s disavowal of outside content such as human or engine games, I was mystified. Granted, computer Go before AlphaZero was almost rudimentary by comparison, so using engine games from other sources made little sense there, but chess was different. It had enormous databases of incredibly high-quality games, whether human or engine, which surely surpassed the quality of a self-play game played in seconds. I approached numerous experts and tried to convince at least one to undertake the experiment, but no one would bite. Not only was there an almost instant refusal based on the word of DeepMind, but there was a secondary school I came to dub ‘zeroists’, who refused even to consider anything that was not ‘zero’ — meaning it had to adhere to the philosophy of zero outside input of any form.

As I explored the codebase it dawned on me that I might be able to try this on my own. That is the beauty of open-source, and in this case the tools were mostly in place. The training code used to train the Leela Chess network, itself based on the AlphaZero paper, would gladly accept games from any source, so long as they fit the format it expected. In theory, the binary could convert them, but since the approach was deemed a dead end, the code was broken and no one had any interest in repairing it. One contributor familiar with it, Dietrich Kappe, came to my rescue; he had been playing with this idea himself and shared fixes that allowed PGNs to be converted into trainable content.

Another substantial challenge was that I had nowhere near the 44 million games AlphaZero used. This time, scrutinizing the literature on neural network training, I came across methods that differed from those used by AlphaZero, and by extension Leela, promising to accelerate both development and the quality of the result. Over the course of months of constant experimentation, this and many other changes led to surprising results. This first neural network was called Deus X (i.e. Deus Ex Machina) and was trained entirely on human games, achieving a success beyond what I had hoped. Running in the lc0 binary as a new neural network module, it played in a strong, distinctive style that was the result of its study of human games.

Encouraged by the feedback of observers and the Leela developers, I went on to pursue more ambitious projects. One big help was Daniel Uranga, an Argentine programmer, who made numerous changes to both the binary and the training code to aid my efforts. After all, the purpose of my efforts was not to reproduce AlphaZero or Leela, but to try something different. Eventually this larger and more ambitious project began to shine, and ChessBase threw in its support as well, which is what led to Fat Fritz.

"What on Earth is that?"

Still, much like every project in this article, while Fat Fritz stands unquestionably on the shoulders of giants, it too brings something new to the table, and will enrich your chess analysis and enjoyment.

Born in the US, he grew up in Paris, France, where he completed his Baccalaureat, and after college moved to Rio de Janeiro, Brazil. He had a peak rating of 2240 FIDE, and was a key designer of Chess Assistant 6. In 2010 he joined the ChessBase family as an editor and writer at ChessBase News. He is also a passionate photographer with work appearing in numerous publications.
Discussion and Feedback

Bertman Bertman 9/20/2019 08:54
@RayLopez First of all, no relation to my namesake. Second, there is no Alpha-Beta in it. It uses MCTS, which averages the scores of the options, rather than seeking a final best move and issuing its score. Also, it chooses moves by the number of visits. I recommend you read the paper on AlphaZero linked in the article as it will provide more information. Lastly, you are 100% correct that 'zero' does not eliminate biases per se; it merely eliminates the biases of any outside source of information it might use. For example, if during its learning it decides that the Ruy Lopez Berlin is the best opening, you can be sure that the vast majority of its self-play games will use that opening, meaning there is a bias, just not a human or external bias.
Mr TambourineMan Mr TambourineMan 9/19/2019 09:16
Yes, Lachesis, I thought about it first! And I feel extremely honored to just be mentioned by Frederic. It made my day!

Hallo Fred! One day I hope to be able to visit ChessBase and hope you are there then. But as slow as I am, you probably have others in top positions for the company, or you'll have retired by then. You should know that a special loyalty to ChessBase has developed since 1989, the year I got a copy of Fritz 1 on a floppy from a friend. And to be able to use it, I had to buy my first computer; it was bananas, but I coughed up €1,000 in today's monetary value for a simple IBM. Pls tell Matthias this story. Okay, nothing new. Just saying.
RayLopez RayLopez 9/19/2019 02:51
As a sometime hobbyist programmer myself (C#), I think, without even spending a few minutes to research it, this statement is misleading: "This meant that the new Go program was now free of any knowledge or biases, and was taught nothing beyond the rules of the game". Probably not. I'm pretty sure the Alpha-Beta algorithm is used in Go, just like with neural net chess programs. Alpha-Beta is nothing more than "he moves there, I do this; he moves there, I do that" for a couple of moves. Since you cannot do this for more than a few ply due to O(n!) complexity, the trick is the evaluation function at the end of your moves, which in neural networks is informed more by actual games and patterns achieved rather than strict mechanical rules and numerical evaluations like 2R = p + Q. Another way of putting this is prior chess engine programmers underestimated GM M. Tal. And is David Silver related to Albert Silver?
besominov besominov 9/18/2019 08:20
"the impact was so powerful that IBM stock went up 15 points overnight"

So that's where they got the idea to rig the match with Kasparov.
mikolov mikolov 9/18/2019 06:38
First I want to compliment Mr. Silver on a very well written article. Second I am intrigued in where will these types of engines lead us in the understanding of chess.
ortsac2014 ortsac2014 9/18/2019 06:05
Any possibility of Fat Fritz competing in TCEC?
Lachesis Lachesis 9/18/2019 01:54
Excellent question Mr TambourineMan, I wish I would have thought of it lol :)
Frederic Frederic 9/18/2019 12:57
@Mr TambourineMan: interesting thought. I will discuss with Albert and Matthias.
Mr TambourineMan Mr TambourineMan 9/18/2019 11:28
Very good article Albert! Could Fat Fritz perhaps be helped by Lets Check function?
Lachesis Lachesis 9/18/2019 11:15
I have always enjoyed and appreciated articles by Albert Silver. Wish he will write more for Chessbase in the future.