The Learning Curve for Chess Skill

by Robert Howard
4/15/2018 – ChessBase recently reported on the new artificial intelligence program “AlphaZero” and its astoundingly steep learning curve for chess skill. Given just the rules of chess, in 24 hours of playing games only against itself, it improved to a superhuman level. Along the way, AlphaZero discovered and ultimately abandoned standard openings such as the French and Caro-Kann Defences. | Pictured: The learning curve projection for World Champion Magnus Carlsen overlaid on the actual rating trend.


The Power Law applied to chess

Compared to DeepMind's AlphaZero, the human learning curve for chess skill is much shallower and develops over many years. But what does it actually look like? Does skill typically improve at a constant rate? Are there lengthy plateaus with no improvement, perhaps eventually overcome by study, coaching and playing in stronger tournaments? Are a player’s skill trajectory and maximum performance level readily predictable from early on? Is the learning curve shape the same for very talented and less talented players? Are “J-curves” common, whereby performance declines soon after rating list entry but then improves greatly?

Rating profiles over time are interesting but are insufficient to answer these questions well. In his book “The rating of chess players, past and present”, Arpad Elo presented rating trajectories over age of some famous players. He mostly found curves that initially increased, peaked around age 35 and then declined as the mental slow-down and decreased ability to cope with novelty that come with ageing set in. However, chess learning curves are best examined over number of rated games instead of time, with players taken to their maximum performance limit, and only up until the typical peak age of 35. Playing rated games seems to be the major determinant of skill level, and curves over time and over games can differ greatly. A player might play few or no games over a few years but many in another span. A lengthy plateau over time may simply mean that few games were played, whereas a plateau over number of games does indicate no improvement.

Researchers in psychology and economics have extensively studied learning curves for many different skills. In general, skill development tends to follow a similar pattern, regardless of the skill’s complexity and the timescale. Improvement, usually measured by time to perform the task and/or by performance accuracy, typically is rapid at first and then its rate progressively declines until a maximum performance level is approached. Most improvement thus occurs early on. Lengthy plateaus followed by rises can occur, perhaps when the learner discovers a new way to do the given task, but are not that common. For very complex skills such as solving differential equations, carrying out scientific research or piloting a modern jet fighter, learning curves may be flat for many persons because the task is just too difficult.

In the 19th century, polymath Francis Galton proposed that natural talent for a given skill determines the rate of gain and the asymptotic performance level. He likened skill development to gains after beginning a weight-lifting program. Initially, muscle strength can rise rapidly with exercise but gains begin to level off and eventually cease. The more genetically talented improve faster and reach higher maximum strength levels.

When a skill asymptote is reached after extensive practice, performance can become automatic, performed without conscious awareness. For instance, a skilled touch typist can operate on automatic pilot. Ask where the letter p is on the keyboard and he or she may go through the motions of typing the p key, watching where the fingers go.

Chess playing at a high level is a “hyper-complex skill”, without a real automation phase, like such skills as running a large corporation in a changing economy or commanding an army in battle. Continual monitoring of changing conditions, adaptation to change, creativity, calculation of future possibilities, and other factors are needed to avoid “Kodak moments” where automatic responses fail.

Certainly, some component skills of chess playing can become automatic, such as recognizing a smothered mate pattern or playing a simple king and pawn ending with an outside passed pawn. Indeed, some players conduct blitz games in automatic mode, using only “chess intuition”; not calculating, planning or speculating on what the opponent is up to. But in slow games against strong opponents, much conscious thought is needed, even for highly practised players.

Interestingly, research shows that the typical learning curve for many skills often can be described reasonably well by an equation called a power function, expanded on below. This pattern even has a name: the “power law of practice”. With a simple skill such as making cigars from paper and tobacco, performance time typically drops rapidly at first and then begins to level off according to the power law, slowly approaching the physical limit of how quickly the task can be performed. The power law can also describe well the development of complex skills such as using a new surgical technique. Surgery performance time progressively decreases with number of initial operations according to the power law, and perhaps alarmingly, patients suffer progressively fewer complications in the surgeon’s later operations.

The power law is used in industry for making predictions about how much further training is needed until an acceptable performance level is reached. How many operations does a surgeon need before complications are unlikely? Which of several fighter pilot candidates is likely to reach the higher performance level and after how much practice? A factory manager may use the power law for costing. Say a new production process is used to produce X units of a new product. Some workers will learn their new task faster and the actions of all need to be coordinated. Overall factory time to produce successive units can be predicted to some degree with the power law.      

Projecting Elo ratings

A few years ago, I did a study of chess skill development with 387 FIDE-rated players. Most were grandmasters. They entered the FIDE rating list at age 14 or less in July 1985 or later, when FIDE first reported game numbers, and had played at least 750 FIDE-rated games. I determined ratings in categories of approximately 50 games and fitted various equations to their pattern of development. On average, they tended to reach near asymptote after about 750 rated games. Some players improved very little from list entry. Some improved a great deal and their development pattern was well described by the power law. Less talented players tended to peak earlier than 750 games and then started to show a performance decline, perhaps because various life events intervened or they realized the limits of their natural talent.
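One way to picture the 50-game categories is the short Python sketch below. The study's exact binning rule is not spelled out here, so the rule used (take the last published rating inside each 50-game band) is an assumption, and both the `bin_ratings` helper and the sample history are hypothetical.

```python
def bin_ratings(history, width=50):
    """Group (cumulative_games, rating) records into ~`width`-game categories.

    Assumption: each category is represented by the last rating published
    while the player's cumulative game count was still inside that band.
    """
    bins = {}
    for games, rating in sorted(history):
        category = games // width      # 0 -> games 0-49, 1 -> games 50-99, ...
        bins[category] = rating        # later rating lists overwrite earlier ones
    # Label each category by its upper game count (49, 99, 149, ...).
    return {(cat + 1) * width - 1: rating for cat, rating in sorted(bins.items())}


# Hypothetical rating-list history: (cumulative rated games, rating).
history = [(12, 2050), (38, 2085), (61, 2110), (97, 2160), (140, 2190)]
print(bin_ratings(history))
```

Labelling the bands 49, 99, 149 and so on matches the game counts quoted for the figures.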

But the study had some drawbacks. One problem is that players usually learn the moves years before getting an official rating, and it is difficult to count all unrated or nationally rated games. Perhaps some players improved little because they had been playing for many years already in nationally rated tournaments and had already reached their talent limit by list entry.

To get around these problems, recently I did another study. I used FIDE rating data of all individuals who entered the list from July 1985 at age nine or less (instead of age 14) and who had played at least 1000 FIDE-rated games by July 2016 and who gained at least 300 rating points since list entry. The 300 point requirement gives much performance upside to see curve shapes, eliminating players who improved little. There were 23 such players. They would have been playing a few years before FIDE list entry and some entered the rating list at a high level. Nevertheless, it is interesting to see the learning curves from near their career starts.

Figure 1 (below) shows the mean curve of all 23 players to 1049 games, with ratings in approximate 50 game categories, and with 21 data points in all. The large dots are the actual average ratings and the line shows the fit of the power law. The curve starts at 49 games rather than zero. The fit is almost perfect, as most points are very close to the theoretical curve generated from the power law equation. Most improvement happens early on and then performance starts to level off toward a near asymptote after around 1000 games. No individual showed a J-curve.

Figure 1


The power law can predict where the curve will go from relatively early in practice. Figure 2 shows the Figure 1 power law curve (derived from data up to 1049 games, using 21 data points) and a second curve based on just four data points: the ratings at 49, 99, 149 and 199 games. In other words, taking the average ratings only up to 199 games, I fitted a power function to these four data points and used the equation to project the curve out to 1049 games. The two curves are very close, with only a slight under-prediction by the projection from 199 games. The predicted rating at 1049 games from the curve based on four data points to 199 games is about 2536, quite close to the actual value of about 2549. So, the power law gives a reasonable idea of where the ratings will go from just 199 games.
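The projection step can be sketched in Python as follows. This is a minimal illustration, not the study's own procedure: the fit is an ordinary least-squares line on log-log axes, the helper names `fit_power_law` and `project` are made up for the example, and because the individual average ratings are not listed in the article, the four "early" points are stand-ins generated from the mean equation quoted later (rating = 1580 times games to the power 0.069).

```python
import math


def fit_power_law(games, ratings):
    """Least-squares fit of rating = a * games**b, done on log-log axes."""
    xs = [math.log(g) for g in games]
    ys = [math.log(r) for r in ratings]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope on log-log axes = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept on log-log axes = ln(a)
    return a, b


def project(a, b, games):
    """Rating predicted by the fitted curve at a given game count."""
    return a * games ** b


# The 21 category labels used for the mean curve: 49, 99, ..., 1049.
categories = list(range(49, 1050, 50))

# Stand-in data (NOT the study's ratings): generated from the quoted
# mean equation so that the example is reproducible.
early_games = categories[:4]             # 49, 99, 149, 199
early_ratings = [1580 * g ** 0.069 for g in early_games]

a, b = fit_power_law(early_games, early_ratings)
print(f"a = {a:.1f}, b = {b:.3f}")
print(f"projected rating at 1049 games: {project(a, b, 1049):.0f}")
```

With real averages the four early points would not lie exactly on the final curve, which is why the article's projection from 199 games lands slightly below the full-data curve.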

Figure 2


Individual learning curves can be more ragged than this average curve but the power function still can fit well. Figures 3 and 4 give data of the two highest rated players of the 23. They learned the moves at age 4 and 5 and played a few years before entering the FIDE rating list. Both learning curves are good fits to the simple power function and it gives a good idea of where the curves will go from 199 games.

Figure 3


Figure 4


Figures 5 and 6 show, for each of the players in Figures 3 and 4, the fitted curve together with the curve predicted from 199 games. In both cases the curve from 199 games slightly under-predicts ratings at 1049 games, but not by much.

Figure 5


Figure 6


Figure 7 gives another example; the FIDE rating learning curve for Magnus Carlsen, up to July 2016. He learned the moves at age five years, took the game up seriously at age eight and entered the FIDE rating list in 2001 at age 10.

For the first 199 games, the learning curve is very steep indeed, verging on the vertical. But the rate of improvement then levels off and the rating peaks around 1000 games. His very impressive top rating of 2882 still is the record. The simple power function fits all the data well. However, the projection from data up to 199 games is much less accurate, so here the power law does not predict well. It predicts a rating of 3023 at 749 games versus the actual value of 2826, and 3174 at 1049 games instead of the actual 2853.
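The size of the miss is easy to work out from the four figures just quoted. The snippet below is only arithmetic on those numbers; the dictionary layout and the `over_prediction` helper are illustrative names.

```python
# Projected vs. actual ratings for Carlsen, as quoted in the text.
checkpoints = {
    749: {"predicted": 3023, "actual": 2826},
    1049: {"predicted": 3174, "actual": 2853},
}


def over_prediction(entry):
    """Rating points by which the early projection exceeds the actual rating."""
    return entry["predicted"] - entry["actual"]


for games, entry in checkpoints.items():
    error = over_prediction(entry)
    share = 100 * error / entry["actual"]
    print(f"{games} games: over-prediction of {error} points ({share:.1f}%)")
```

The gap widens as the projection is pushed further out, from 197 points at 749 games to 321 points at 1049 games.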

Figure 7


Using mathematics to describe and predict complex and variable human behaviour can be dicey but is useful with skill learning, within limits. The power law works best with averages over many individual curves but can help describe and predict chess learning curves for some individuals when data are available from early in their careers.  

Equations applied to learning curves

Researchers have applied many different equations to learning curves. The power law has several complex versions with added parameters for the amount of previous experience and maximum performance level, but these do not work well with FIDE data. The simple version used here is

$$Y = a \cdot X^{b}$$

where Y is rating, X is number of games, and a and b are fitted parameters. The parameter a sets the curve's starting point (it is the fitted rating at X = 1) and b sets the rate of change. If b is negative, the curve goes downwards, and if it is positive, the curve goes upwards. For example, the equation for the mean data of 23 players up to 1049 games is: rating = 1580 times number of games raised to the power of 0.069.
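To see what the mean equation implies, it can be evaluated at a few game counts. The Python sketch below uses only the two fitted values quoted above (a = 1580, b = 0.069); the function name `mean_rating` and the chosen game counts are illustrative.

```python
def mean_rating(games, a=1580.0, b=0.069):
    """Mean power-law curve quoted in the article: rating = a * games**b."""
    return a * games ** b


previous = None
for games in (49, 99, 149, 199, 499, 1049):
    rating = mean_rating(games)
    gain = "" if previous is None else f" (+{rating - previous:.0f} since previous row)"
    print(f"{games:>5} games: {rating:.0f}{gain}")
    previous = rating
```

Each additional block of 50 games adds fewer points than the one before, which is the "most improvement happens early on" pattern described for Figure 1.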

Microsoft Excel will fit the power function to data. Input the cumulative games count in one column and the ratings at each count in another, click “insert a line diagram” and then click  “fit trend line” and “fit power function” (depending on the Excel version). The equation with fitted parameters is displayed on the diagram and can predict future performance by plugging in different game values.
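For readers who want to check a spreadsheet fit by hand: power trendlines are generally obtained by taking logarithms of both variables and fitting a straight line by least squares. The working below assumes that approach and uses the same symbols as the equation above, with bars denoting means over the data points.

```latex
\ln Y = \ln a + b \ln X

b = \frac{\sum_i \left(\ln X_i - \overline{\ln X}\right)\left(\ln Y_i - \overline{\ln Y}\right)}
         {\sum_i \left(\ln X_i - \overline{\ln X}\right)^2},
\qquad
a = \exp\!\left(\overline{\ln Y} - b\,\overline{\ln X}\right)
```

Because the fit is done on logarithms, all game counts and ratings must be positive, which is one practical reason for starting the curve at 49 games rather than zero.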

I have only tested the power law with data up to at least 199 games and with FIDE ratings from players who enter the domain quite young. How well it works with rated games from a national federation is uncertain but is worth a try. As noted, the power law works best with grouped data of many individuals so predictions from a few games for any one person should not be taken unduly seriously.


Bibliography

Elo, A. E. (1978). The rating of chess players, past and present. New York: Arco. (2008 edition, New York: Ishi Press.)

Gaschler, R., Progscha, J., Smallbone, K., Ram, N., & Bilalic, M. (2014). Playing off the curve - testing quantitative predictions of skill acquisition theories in development of chess performance. Frontiers in Psychology, 5, 923.

Howard, R. W. (2014). Learning curves in highly skilled chess players: A test of the generality of the power law of practice. Acta Psychologica, 151, 16-23.

Howard, R. W. (in press). Development of chess skill from domain entry to near asymptote. American Journal of Psychology.

Papachristofi, O., Jenkins, D., & Sharples, L. D. (2016). Assessment of learning curves in complex surgical interventions: a consecutive case-series study. Trials, 17, 266.




Robert Howard holds a PhD in psychology from the University of Queensland in Australia and has research interests in human intelligence, learning and memory, and in the development of expertise. He has carried out many research studies examining expertise in general, using chess data. Until recently, he taught at the University of New South Wales in Sydney. He has authored five books, the latest being Islands in the Orient Sea: Travels in the Edgy 21st-Century Philippines, published in 2012.
Discussion and Feedback

celeje 4/24/2018 03:19
I hear there's a CB article coming soon on Leela.
Hope it's ACCURATE this time.
celeje 4/21/2018 04:47
@ fons3:
Dunno how you got that idea. I never said anything about failing. Why do you read my comments that way???
We know the numbers re. AZ training time. Must I pretend it's not a concern or else be "on a mission to prove lc0 will fail"???
I said what I hoped for.
The only failure in all this is DeepMind's. That's why lc0 etc is interesting. It's all we've got.
fons3 4/21/2018 01:40
@ celeje: I dunno, you seem to be on a mission to prove lc0 will fail, whatever that means.
celeje 4/20/2018 03:13
@ fons3:

I don't expect anything. It doesn't matter what people think except if it changes what happens. It's all about computer time. So they're right to worry about that.
People always want to do stuff they don't realize means dropping the 0 in lc0. But the zero is the most important part. Maybe the only important part.
Of course some zero project will remain. But if there are forks, then there's even less computer time for each.

What would you like out of lc0? Have you got involved?
Leela's not the only one. But I don't know if any other has had community support = computer time.
fons3 4/20/2018 02:26
@ celeje:

This is an open source project, so yes people will ask questions, some of them skeptical. That's how this works. Everybody contributes and things are discussed.

What do you expect, endless comments of people slapping each other on the back, telling each other how great they are or how great lc0 is? From what I've seen overly pessimistic or critical comments are usually corrected by other users and very often they are the result of not properly understanding how this stuff works.

Ultimately this is an experiment and nobody can know for sure how strong lc0 will eventually become. But it's the million dollar question so it's discussed a lot.
celeje 4/19/2018 02:39
@fons3:

>>> The real elo of lczero is in its name
>>That was obviously a joke, which you can see if you open the thread.

A joke of course. But real concerns behind the joke. I read the whole thread, not just the title.

>>> They started too optimistic so they're now pessimistic.
>>Could you link to an example of this because that is not the impression >>I'm getting.

Will try but hard to find again maybe because of multiple forums etc. & do google group posts have search? One thread somewhere questioned whether they should continue at all. People in it asked if they'd been fooled.

I mean, questions:
How strong is AZ?
How strong should Leela be now?
How strong should Leela be far but not too far from now?

I guess some overestimated the first 2 big-time. Else there's no reason for them to worry about what Leela does now.

What I'd like: Same-size NN as AZ. Follow AZ as closely as they can. At least 10 million training games. Then they're past the point where AZ mostly flatlined. Then just see what Leela's like.
Dunno how upgrading NNs from smaller sizes complicates things. Agree bug-fixing on the fly is a problem.
fons3 4/19/2018 12:26
@ celeje:

>> Yeah, maybe Gary L. can clarify what he did.

Why not just ask?

>> The real elo of lczero is in its name

That was obviously a joke, which you can see if you open the thread.

>> They worry Leela will never learn tactics or openings.

I think that people are mainly worried about bugs potentially messing things up or holding things back. It's still early days relatively speaking. Also this is complicated stuff so people are misinformed or just asking questions.

Just play through some of the matches, for example: https://lichess.org/study/kLXrkT4M and it seems to me that lc0 knows how to handle openings or tactics well enough.

>> They started too optimistic so they're now pessimistic.

Could you link to an example of this because that is not the impression I'm getting.

What I do see is a lot of people not properly understanding this project, which is understandable because this is extremely complicated stuff. It's tempting to suggest all sorts of stuff without properly understanding all the ramifications.
celeje 4/19/2018 06:36
@ fons3: Yeah, maybe Gary L. can clarify what he did. Someone asked on one of those chats & no one replied. Didn't think you can make the old NN a subset of the new NN, so dunno.

I wasn't pessimistic. Before thinking about NN size etc. I saw they had 5 million games already & said I don't see why they cannot equal AZ now, so everyone can just forget about AZ & find out everything from Leela. Perhaps still true if they upgrade to same-size NN.

talkchess has threads like: "The real elo of lczero is in its name". e.g. They worry Leela will never learn tactics or openings. Then they start arguing about whether AZ is legit.


Maybe depends what you mean by pessimistic. They started too optimistic so they're now pessimistic. I know the thing I'm typing this comment on would need to have been running since the Roman Empire times to match the computer training time AZ had. (No joke.) So I'm optimistic about what Leela's done so far.
fons3 4/18/2018 04:35
@ celeje: I don't think that the new network is untrained, that would show up in the progress chart.

Pessimistic? I dunno, seems like you're the one who's pessimistic.
celeje 4/18/2018 11:57
@ fons3:

Yes, I saw they upgraded once. It raises questions about how they do this upgrade. It's replacing the old 'brain' with a 'bigger brain' but the new brain is then untrained. So did they retrain the new one with all the old games?

It's smaller than AZ's. So this may be a problem. I should take back my confidence about Leela's final performance. But this makes sense. I wondered before how they would get enough CPU time. Clearly they needed to use a smaller NN because of that.

Some chat on those forums seems pretty pessimistic. Maybe they just expected too much. Computing power really matters.
fons3 4/18/2018 11:20
@ celeje:

I was referring to this:

Gary Linscott:
We are on 10x128 now!
There will be a slight drop in game speed on GPUs, so you should run a full tune with the new network. CPUs take more of a hit unfortunately. But the new network should have a pretty high ceiling hopefully :).
10x128 means 10 blocks, 128 filters. It roughly quadruples the power of the 6x64 net.

Jesse Jordache:
what does 10x128 (or 6x64) refer to?

jkiliani: The size (and representation power) of the neural network. The larger the net, the better it can evaluate positions, so you increase playing strength by upgrading the architecture, even if it loses speed.

From: https://groups.google.com/forum/#!topic/lczero/c9pkjRVY144


LC0 is already notching up ~2700 performances:
http://talkchess.com/forum/viewtopic.php?t=67147&highlight=lc0
brabo_hf 4/17/2018 04:59
On my Dutch blog I showed how the learning curve looks for amateurs using Belgian rating data which works the same way as fide rating data: http://schaken-brabo.blogspot.be/2017/02/ervaring.html
I used statistics of all +4000 players so much more than what was used in above report.

However I came to similar conclusion as the author Robert Howard. "On average, they tended to reach near asymptote after about 750 rated games."
I wrote in my article that between 400 and 800 rated games there is only a marginal gain of 60 points. In other words above 800 for amateurs we see no real progress anymore.

Much more can be found on my blog about it.
celeje 4/17/2018 11:44
@ fons3.

We can't judge learning, only performance. And for performance we can only squint at tiny graphs in their paper.

In general you shouldn't take anything DeepMind says too seriously. But the Go performance curve in the paper was different. It's still going up very slowly. So that claim is okay for Go. The chess performance curve is a flat line.
The shogi performance curve is sort of flat but jittering up & down.


The network does not get bigger. You have to choose its size at the start. You can choose a huge one to begin with if you want. But then it takes more computer time to train. Need to see if Leela chose the same size as AZ.
fons3 4/17/2018 11:30
@ celeje: >> But AZ flatlined long before the end.

I dunno. Was it performance or learning that flatlined?
At any given strength there will be a performance limit.

https://futurism.com/deepmind-never-found-the-limit-of-alphago-zeros-intelligence/
DeepMind CEO and co-founder Demis Hassabis, speaking at Google’s Go North conference, said, about AGZ, “We never actually found the limit of how good this version of AlphaGo could get. We needed the computers for something else.”

Of course I suppose at some point there will be a practical hardware limit. (As the network gets smarter it gets bigger and the computational cost increases.)

But I am no expert. ;)
celeje 4/17/2018 10:39
@fons3: Thanks.

Re. rating curve, I think it just needs 2 things to be remembered. Its 0 rating is random play. It's a curve from self-testing results, not testing against other programs.

Heard about it at the start but thought they'd struggle to get enough CPU time. That's why I wonder about TC. The curve shows 5 million training games already. AZ had 44 million in total. But AZ flatlined long before the end. Seems no reason now why Leela won't equal AZ's performance & everyone can then ignore AZ and look at Leela.

Re. coverage on chess websites: it's still early days for Leela.
& Leela has no propaganda machine to manipulate media.
fons3 4/17/2018 10:14
@ celeje: Here you will find more info on the project:
http://chessprogramming.wikispaces.com/LCZero
https://github.com/glinscott/leela-chess/blob/master/README.md

I think it's best to go to the talkchess forums for questions.

I'm surprised this project hasn't gotten any coverage yet on the major chess websites.

As to the rating curve for LCZero from what I have gathered it can not be interpreted in the usual way and needs an explanation from an expert.
Martin Fierz 4/17/2018 10:06
Your power law fit makes little sense to me. A fit should always be correct asymptotically, and the power law fit is clearly not, as all players level off and their elo goes towards a horizontal plateau before the decline kicks in. Your power law predicts an always increasing elo. I would try to fit with a function that has a horizontal asymptote!
celeje 4/17/2018 09:03
@fons3:
Thanks for link.
Do you know much about what they're doing with Leela?
e.g. are self-training games same TC as AZ used?
Is total CPU time used shown somewhere?
A bit surprised Leela can get so much CPU time.
fons3 4/16/2018 05:52
How does LCZero's progress fit in? ;)

http://162.217.248.187
jmafoko 4/15/2018 09:50
amusing. esp that there is order in the midst of chaotic data.
wb_munchausen 4/15/2018 09:28
I would be interested in the graphs representing amateur players' improvement over time or # of games.
adbennet 4/15/2018 08:30
Magnus Carlsen could perhaps have peaked at 3100 if he had had a bunch of 2999 players to play against. The alternative, as the author suggests, is not to take the prediction too seriously for an individual player.