What makes problems and studies beautiful?
A computer program takes a look.
By Azlan Iqbal (with Harold van der Heijden, Matej Guid and Ali Makhmali)
At a Glance
Computers today can easily play chess at the grandmaster level but they
cannot tell a beautiful combination or move sequence from a bland one.
We humans tend to appreciate beauty or aesthetics in the game almost as
much as winning itself. A curious combination or spectacular sacrifice
will often gain our attention and praise, even centuries hence. In this
research, which has been ongoing for seven years, we show that a computer
can indeed be programmed to recognize and evaluate beauty or aesthetics
in (at least) three-move mate problems and more recently endgame studies.
The computational aesthetic evaluations for these domains are experimentally validated
and correlate positively and well with domain-competent human assessment.
This technology therefore presents us with the ability to data-mine beautiful
sequences from databases containing millions of sequences (too large to
be explored by human eyes), and also assist human judges in composition
Though the computational approach may seem different or even inadequate
compared to how human experts say they evaluate beauty, the results are
comparable. Just as computers play chess in a way quite unlike how humans
do, they appear to evaluate beauty differently as well. We provide a freely
downloadable public version of the award-winning aesthetics evaluation
program, Chesthetica Endgame (see below). While this version does not
feature the precise aesthetic scores that we used in our experiments,
it will rank aesthetically the sequences in a (PGN) database of three-movers
or studies in a way not unlike a human expert. In this article, we also
briefly consider deeper aspects of beauty perception in the game, some
directions for future work and certain wider implications computational
aesthetics research has beyond the domain of chess and artificial intelligence.
Dr. Mohammed Azlan Bin Mohamed Iqbal (or Azlan Iqbal, for
short) – pictured above receiving his Ph.D. degree at the University of
Malaya's convocation ceremony in 2009 – is 34 years old and also has a
bachelor's and a master's degree in computer science from Universiti Putra Malaysia.
For the last ten years he has worked as a lecturer and senior lecturer in the
College of Information Technology, Universiti Tenaga Nasional (Selangor, Malaysia).
Four of those were spent as a part-time student, doing his Ph.D. in computer
science – specifically in the area of computational aesthetics, a sub-field
of AI. Presently, he also does research in the area of computational creativity.
Azlan doesn't have an official chess rating, but he puts himself, conservatively,
on the level of an infrequent club player. He also enjoys reading books on the
many different sciences and is a casual piano player.
Chess players are likely to have at least heard of the world of chess composition.
This includes, for example, mate-in-two (#2) and mate-in-three (#3) problems,
and also endgame studies which are typically a bit longer and do not necessarily
end in mate. Chess compositions – to use a collective term – can be considered
‘works of art’ because they are intentionally designed to feature unexpected
moves, themes or ideas that human players find appealing or beautiful in ways
that are not always easy to put into words. The following is an example of what
we mean by ‘beauty’ in the game. The solution can be found at the end of the article.
A. S. Gurvich, Bakinski Rabochi, 1927
White to play and win
Figure 1: A classic example of beauty in chess
It was this feeling that inspired me in late 2005 to undertake the topic of
chess aesthetics for my doctoral dissertation. I wanted to develop a computational
aesthetics model for mate-in-three problems, perhaps the most common type of
composition, in which White always wins (by checkmate) in three moves against any defence.
Since even back then computers could already beat the best human players, why
couldn’t they also ‘appreciate’ or at least ‘recognize’ beauty in move sequences
like we do? Here was a research gap that should be filled, I thought. I demonstrated
experimentally (for the first time) that a computer could indeed identify beauty
in three-movers and do so in a way that correlated positively and well
enough with competent human player aesthetic assessment.
Not long after, I was contacted by endgame study expert Harold van der Heijden
(who had read one of my research papers) and he suggested that I look into adapting
or extending my three-mover aesthetics model to endgame studies. I thought it
was an interesting challenge that could further validate the model, so I applied
for a research grant for it. Matej Guid, an AI researcher and FIDE Master player,
also joined the team as we realized our research interests overlapped. We were
successful in obtaining a modest grant, and together with our research assistant,
worked on the project for about 18 months. Our results are published in full
In this article, we will attempt to summarize for our readers what we did and
roughly how. We have intentionally left out many of the technical details and
specifics to accommodate a broader audience; however, interested readers may
choose to obtain the final, published version of the article above – available
through most university libraries worldwide – and refer also to the associated
references within it for more complete information. At the end of this article,
and in the spirit that new technology should be made available to the public
as soon as possible, we have included a download link to the Chesthetica Endgame
(CEG) computer program installer package. It can be used to rank the aesthetics
of endgame studies (White to play and win), three-movers – and as preliminary,
presently unpublished research would suggest – two-movers as well. Single moves
and whole games remain of lesser interest and uninvestigated for now.
Advancing from the Three-Mover Model to the Studies Model
The characteristics of a beautiful composition are not explicitly defined –
perhaps it is impossible for humans to do so – but they are described in sufficient
clarity in several publications that date as far back as the early 20th century.
These characteristics or features include, for instance, the use of chess themes
(e.g. pin, fork), pieces that travel longer distances across the board, the
fruitful sacrifice of material, and the ‘economical’ use of pieces to achieve
a particular objective, to name just a few. Some chess themes are not easily
defined and can be quite challenging to detect computationally; in one case
this led to the discovery of the ‘invisible fork’ by Chesthetica, the very
first experimentally-validated chess aesthetics recognition computer program
(it analyzed only #3 problems).
In the original three-mover aesthetics model (the essence of my doctoral dissertation),
seventeen features like the ones just mentioned were identified based on the
literature and painstakingly formalized, i.e. represented using mathematical
formulae dependent on the logic of the game. These formulae, in summation, were
used to derive a crisp numerical score for a problem that represented its perceived
aesthetics or beauty. Surprisingly to some in the field of artificial intelligence
(AI) and outside, no ‘machine learning’ was required. Experimentally, I tested
the model’s ability to differentiate between domains (forced three-movers taken
from real games and #3 compositions) and its inability to differentiate
within each domain. This means it was predicted that, aesthetically,
on average, the compositions would score significantly higher than the real
game sequences, but the real game sequences would score similarly to each other
and so would the compositions. These predictions proved true.
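As a minimal sketch of the idea of summing formalized feature scores into one fixed value and then comparing domains by their average, consider the following. The feature names and numbers are placeholders invented for illustration, not the seventeen formalizations or any scores from the research, and `crisp_score` and `domain_mean` are hypothetical helper names.

```python
from statistics import mean


def crisp_score(feature_scores):
    """Sum every per-feature score into one fixed ('crisp') aesthetic score."""
    return sum(feature_scores.values())


def domain_mean(sequences):
    """Average crisp score over all sequences belonging to one domain."""
    return mean(crisp_score(s) for s in sequences)


# Placeholder feature values, invented for illustration only.
composition = {"pin": 0.5, "sacrifice": 1.0, "distance": 0.75}
game_sequence = {"pin": 0.5, "distance": 0.25}

print(crisp_score(composition), crisp_score(game_sequence))
```

With real data, one domain mean would be computed over the compositions and another over the real-game sequences, and the two compared with a suitable significance test.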
The computer’s scores were also tested against average human player aesthetic
ratings (using an online survey of hundreds of respondents from an online chess
community) and they correlated positively and well with them. It is perhaps
worthwhile to note that experienced human composers and regular (competent)
players usually have different ideas about what constitutes ‘beauty’ in a move
sequence. Composers tend to focus on what we call ‘depth appeal’ (e.g. a deep
idea that emerges after careful thought and analysis) whereas the majority of
players are more easily impressed with ‘visual appeal’ (e.g. a manoeuvre
that looks good and can be understood quickly). The computational aesthetics
model is more suited to the latter category; though a case could be made for
either type as to which constitutes ‘true’ beauty, without ignoring the spectrum
in between. Two contrasting examples are provided in Appendix A of our IEEE
paper cited above. Two new contrasting examples are provided here; the study
and its accompanying explication are courtesy of Harold.
Sergyi Didukh (Ukraine) & Siegfried Hornecker (Germany)
1st Honourable Mention Olympia Dunyasi 2010
White to play and draw
Figure 2: An example of ‘depth appeal’
At first sight the only plausible white move here is 1.bxa3? but this loses
as we will see later. The right key move is 1.g6! forcing 1…hxg6
and only now 2.bxa3. Both sides rush their pawns to queen: 2…g5 3.a4 g4
4.a5 g3 5.a6 g2 6.a7 g1Q 7.a8Q Qg8+ 8.Kb7 Qxa8 9.Kxa8 Kc6 10.Ka7! Kb5 11.a4+!
Kxa4 (11…Kxb4 12.Kb6 Kxa4 13.Kc5) 12.Kb6! Kxb4 13.Kc6 Kc4 14.Kd6 Kd4
15.Ke6 Ke4 16.Kf6 Kf4. Here Black has a pawn at h6. Had White
played 1.bxa3 immediately, play would run parallel to the solution, but the black
pawn would have been at h7 instead! That is a major difference, because in the
solution White can now play 17.Kg6 drawing, while in the try 16.Kg7 fails
Pollock - Consultants, 1893
White to play and win
Figure 3: An example of ‘visual appeal’
In Fig. 3, we have a clear example of visual appeal where White wins through
a forced series of moves, albeit not necessarily immediately obvious ones. The
winning sequence is as follows: 1.Qd7+ (a queen sacrifice) Bxd7
2.Nd6+ (a discovered/double check so the king must move) Kd8
3.Nf7+ (a clean fork of king and rook even though that is hardly the
main objective here) Kc8 4.Re8+ (a clearance sacrifice) Bxe8 5.Rd8#.
A sequence like this does not require much effort or specialized knowledge beyond
the basic rules of the game to understand or appreciate.
Despite the experimental validations of the original model, there seemed to
be room for improvement beyond three-movers. In adapting the three-mover model
to endgame studies, we had to modify a few of the formalizations in such a way
that worked for endgame studies, but was still ‘backward compatible’ with three-movers.
We also learned that not all seventeen aesthetic features need to be summed,
but rather only a selection of the highest-scoring five or six. Perhaps most
curious was the seemingly necessary inclusion of a stochastic element (i.e.
some randomness) in choosing those five or six. Without some randomness, the
experimental results were still reliable, but not the best we could get. Furthermore,
the probability of choosing five or six needed to be split 20-80 or a 20% chance
of selecting the top five features and an 80% chance of selecting the top six
features. Anything else just did not seem to work as well. The statistical approach
we used to determine this relates to the expected ‘normality’ of the distribution
of aesthetic scores and what is considered an acceptable positive correlation
strength with human assessment; it is explained in Appendix B of our IEEE paper.
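The selection rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the program's actual code: `stochastic_score` is a hypothetical name, the seventeen input values are placeholders, and how the real model computes each feature score is not shown.

```python
import random


def stochastic_score(feature_scores, rng=random):
    """Sum the top five (20% chance) or top six (80% chance) feature scores."""
    k = 5 if rng.random() < 0.2 else 6
    ranked = sorted(feature_scores, reverse=True)
    return sum(ranked[:k])


# Seventeen placeholder feature scores, invented for illustration only.
example_scores = [i / 4 for i in range(1, 18)]

rng = random.Random(2012)  # seeded only so that a run can be repeated
print([stochastic_score(example_scores, rng) for _ in range(5)])
```

Calling the function repeatedly on the same feature scores can return two different totals, depending on whether five or six features were drawn.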
We performed experiments similar to those done with the original three-mover
model but were faced with the difficulty of finding analogous sequences in real
games that were similar enough to composed studies to be compared with them.
Our study expert, Harold van der Heijden, was indispensable in ensuring the
real game sequences we selected were as close to studies as forced
three-movers from real games were to three-move problems. The following is an
example of a ‘study-like’ sequence taken from a real game.
Meduna, Eduard (2450) vs. Schoeneberg, Manfred (2390),
Leipzig BKL, 1981
White to play and win
Figure 4: A ‘study-like’ sequence taken from a real game
This is the position after 62…Rg8-c8+? (other moves, like 62…Rg1, would have
drawn). 63.Nc4 (threatens 64.Ra5 mate) 63…Rc5! 64.Re6!! (threatens
65.Ra6+ and mate. But the similar 64.Re8 would have failed to 64…Kb5) 64…Rc7
(now 64…Kb5 65.Rb6+ Ka4 66.Rb4 mate. 64…Rh5 65.Ra6+ Kb5 66.Ra5+ wins the rook)
65.Rb6 (threatens 66.Rb4 mate; an artistic mate picture) 65…Rb7! 66.Nb2+
(avoiding 66.Rxb7 stalemate) and Black resigned.
The adapted model could now differentiate between (but not within) domains
for studies (three-movers were also tested again with success). However, since
the appreciation of studies is somewhat esoteric, we could not perform an online
survey with chess players and simply lacked the funds to elicit the cooperation
of many experienced composers. So instead we had our experts and team members
Harold and Matej represent the viewpoints of expert composer and player, respectively.
They were asked to rate aesthetically 30 randomly-selected composed studies
and 30 real-game sequences that resembled studies to be compared against the
computer’s evaluations. Correlation with each of them was positive and good;
better than the original three-mover model achieved. Correlation with their
average ratings (both Harold and Matej combined, divided by two) was even better.
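The comparison just described can be sketched as below. The article does not say which correlation coefficient was used, so Pearson's is assumed here purely for illustration; `pearson` and `average_ratings` are hypothetical helpers, and all numbers are made-up placeholders rather than ratings from the experiment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ratings(first, second):
    """Element-wise mean of two raters' scores (sum divided by two)."""
    return [(a + b) / 2 for a, b in zip(first, second)]


# Placeholder values, invented for illustration only.
computer = [2.0, 3.5, 1.0, 4.0, 2.5]
composer = [2, 4, 1, 5, 3]
player = [3, 3, 2, 4, 2]

for label, ratings in [("composer", composer),
                       ("player", player),
                       ("average", average_ratings(composer, player))]:
    print(label, round(pearson(computer, ratings), 3))
```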
Combined with the experimental validations of the original three-mover model,
the new model – which could also evaluate three-movers – looked even more convincing.
It suggested that a computer could indeed be used, at the very least, to draw
attention to potentially ‘interesting’ or beautiful studies and problems. Considering
the hundreds of thousands – perhaps millions – out there, one benefit is obvious.
Computationally, it would seem to break new ground in computer chess, if nothing
else. An interesting aspect of the new aesthetics model is that the score it
produces for a problem or study now is no longer ‘crisp’ or fixed as before
(due to the necessary stochastic element). This means that, given a ‘second
look’, the computer program could rate the problem or study slightly differently.
Individually, this represents a negligible difference in scores but in a large
collection of compositions or a small one in which some are of similar aesthetic
value, overall rankings can be changed. The precise scores themselves are therefore
not as important.
We would probably be faced with the same problem if a human judge were asked
to ‘re-rank’ a collection of studies, or if several judges were used for this
purpose. Following that, we cannot always expect to agree with their decision(s),
even if they bothered to explain themselves. In our IEEE paper, we put the famous
Saavedra position ‘under the microscope’ and based on the computer’s evaluation,
it would seem that it is not quite as beautiful on its own merits as most of
us would like to think. At first, this came as a surprise even to us because
the study is very well known. The computer could not explain itself – it certainly
‘knows’ nothing about the study to begin with – but knowing how the aesthetics
model works, we could understand its decision. For one thing, the main line
of the Saavedra is not quite ‘forced’ even though White still wins.
Personally, I am quite pleased with the results of our research into computational
chess aesthetics over the years. While such technology can be used to enhance
existing chess playing and database packages, and possibly to assist human judges
and composers, the model serves as a building block toward even better and greater
things in artificial intelligence. For instance, there is the possibility of
automatically generating compositions of higher quality than before and
publishing the very first books of chess compositions to have been ‘written’
or contributed to entirely by a machine.
There are also contributions to be made specifically in the AI sub-field of
computational creativity (making machines not just intelligent, but actually
‘creative’). One of the main issues in this area is the evaluation of creative
artefacts or objects produced by a computer. Who decides and how? If a chess
composition such as a problem or study may be considered such an artefact, the
new aesthetics model can be used as a more ‘objective’ or reliable means of
determination; not to mention a cost effective one. This gives researchers more
freedom to work on the computational approaches of creating creative objects
than worrying about how to evaluate them as well. This is, in fact, relevant
to a new research project Matej and I are involved in, in which we hope to develop
a generic method (i.e. a widely applicable one like artificial neural networks)
that is capable of accepting information from any domain and producing objects
of creative value. Our main domain of investigation is, once again, chess;
but we hope to demonstrate applicability of the generic method in at least one
other domain as well.
It is difficult to say if our research sheds any light on the psychological
aspect of chess composition evaluation. Just because the new aesthetics model
evaluates a selection of common features and focuses on 5 or 6 of the most prominent
ones, with a touch of randomness or ‘personal taste’, if you will, it does not
necessarily mean that human judges do something similar. However, humans are
biochemical ‘machines’ in as much as computers are silicon ones. Whatever the
processes used in each, the results are comparable and correlate. Does it matter
that the computer does not play chess in the way that we do? If not,
then why should it matter if it evaluates beauty as we (think we) do? If a computer
could look into our skulls and deep into our neurons as we look at its algorithms
and code, what would it say? And would it be impressed?
Solution to Figure 1
The g-pawn looks as if it will queen. However, White wins with 1.Ne4!
So if 1…g1Q+? 2.Nf2+ Qxf2+ 3.Qxf2 and White wins. Black defends with 1…Nd3.
So if 2.Qxd3 g1Q+ leaving White with insufficient material advantage to win.
2.Qf2+!! Nxf2 (if 2…Nf1 3.Qh4+ wins. 2…g1Q 3.Ng3+). 3.Ng3+! Kg1 4.Ng5,
a zugzwang with mate on the next move.
- Mohammed Azlan Bin Mohamed Iqbal (2008). A Discrete Computational Aesthetics
Model for a Zero-Sum Perfect Information Game, Ph.D. Thesis, University
of Malaya, Kuala Lumpur, Malaysia.
- Mohammed Azlan Bin Mohamed Iqbal (2010). A Computational Approach to
Modelling Multi-Dimensional Human Aesthetic Perception (01-02-03-SF0188;
March 2010-October 2011). Other members: Dr. Harold van der Heijden (International
Judge for Chess Composition, Netherlands), Dr. Matej Guid (FIDE Master, AI
Laboratory, University of Ljubljana, Slovenia), Ali Makhmali (Research Assistant).
Ministry of Science, Technology and Innovation (MOSTI): eScienceFund.
- Azlan Iqbal (2012). Knowledge Discovery in Chess Using an Aesthetics
Approach, Journal of Aesthetic Education, Vol. 46, No. 1, pp. 73-90.
- Azlan Iqbal (2011). Increasing Efficiency and Quality in the Automatic
Composition of Three-Move Mate Problems, in Entertainment Computing -
ICEC 2011, Lecture Notes in Computer Science, Vol. 6972, pp. 186-197. Anacleto,
J.; Fels, S.; Graham, N.; Kapralos, B.; Saif El-Nasr, M.; Stanley, K. (Eds.).
1st Edition., 2011, XVI. Springer. ISBN 978-3-642-24499-5.
- Mohammed Azlan Bin Mohamed Iqbal (2012). A Computational Model of Human
Creativity in a High Complexity Domain (01-02-03-SF0240; May 2012-October
2013). Other members: Dr. Matej Guid (FIDE Master, AI Laboratory, University
of Ljubljana, Slovenia), Dr. Cameron Browne (Queensland University of Technology,
Australia; Computational Creativity Group, Imperial College, UK), Dr. Simon
Colton (Department of Computing, Imperial College, UK), Jana Krivec (Woman
Grandmaster, Department of Intelligent Systems, Jožef Stefan Institute, Slovenia),
Boshra Talebi Haghighi and Shazril bin Azman (Research Assistants). Ministry
of Science, Technology and Innovation (MOSTI): eScienceFund.
Chesthetica Endgame (CEG) was developed as a research tool based on the
original Chesthetica program. It was the sole recipient of The Foreign
Special Award by the Association of Polish Inventors and Rationalizers:
Crystal Statuette as the Prize of Association at the Malaysia Technology
Expo 2012. This public version supports PGN files containing three-move
mate problems (as Chesthetica does) and also composed endgame studies
where White wins (draws were not tested to maintain experimental integrity).
In principle, all other mate-in-n (n >= 2) problems are also supported
but only the aforementioned has been experimentally verified (the rest
remain untested). In this version, the precise aesthetic scores are not
displayed because they distract from the main utility of this program,
which is to rank problems and studies based on their aesthetics. As such,
there should be at least two compositions in a PGN. Due to the stochastic
technology employed, CEG’s rankings may also change slightly each
time the database is evaluated. This is an unintended side-effect of the
approach that produced the best results experimentally. Note that individual
human judges may factor into their rankings other things such as originality
and adherence to certain composition conventions, which are not necessarily
associated with ‘visual beauty’ (as perceived by human players).
CEG runs on Windows XP SP3 and Windows 7. On Windows 7 systems, the size
of text, if changed to higher than 100%, might cause the interface of
CEG to become distorted due to DPI ‘awareness’ issues. Users
facing this problem are recommended to leave the size at 100% but to use
instead the Personalize > Windows Color > Advanced Appearance Settings
to change the font sizes of each item manually. For Windows 7 systems
that still have problems (e.g. file access issues), it may have to do
with UAC security and the workaround is to right click the program icon
and ‘Run as Administrator’. Please view the ‘Read Me’
PDF file included in the installation for information about how to use
the program. Finally, even though you may find yourself in disagreement
with the program’s evaluations, it is perhaps helpful to remember
that this is considered normal between humans, even experts.
(Download Installer: EXE,
Main Interface of Chesthetica Endgame
Azlan Iqbal received the B.Sc. and M.Sc. degrees in computer science
from Universiti Putra Malaysia (2000 and 2001, respectively) and the Ph.D.
degree in computer science (artificial intelligence) from the University
of Malaya in 2009. He has been with the College of Information Technology,
Universiti Tenaga Nasional since 2002, where he is senior lecturer. He is
a member of the IEEE and AAAI, and chief editor of the electronic Journal
of Computer Science and Information Technology (eJCSIT). His research interests
include computational aesthetics and computational creativity in games.
Harold van der Heijden (1960) finished his HBO-B (university of
applied sciences) study in Biochemistry in 1981. He has been a research
technician in a veterinary institute (GD, Animal Health Service) in the
Netherlands since 1982. In 2009 he obtained a Ph.D. degree from the veterinary
faculty of Utrecht University. Since 2010 he has headed the research
and development laboratory of GD. In the domain of endgame studies, he has
been active as a collector (largest endgame study collection of the world),
writer (three books), main editor of EBUR (1993-2006) the magazine of the
Dutch/Flemish endgame circle ARVES, and main editor of the international
magazine EG (since 2007). He obtained the title of international judge of
endgame studies from FIDE in 2001, and organized and judged dozens of endgame
study tourneys. As of 2012, he is also FIDE master of chess composition.
Matej Guid received his B.Sc. (2005) and Ph.D. (2010) degrees
in computer science from the Faculty of Computer and Information Science
at the University of Ljubljana, Slovenia. He is a researcher at the Artificial
Intelligence Laboratory, University of Ljubljana. His research interests
include heuristic search, computer game-playing, automated explanation and
tutoring systems, and argument-based machine learning. Chess has been one
of his favorite hobbies since childhood. He was also a junior champion of
Slovenia a couple of times, and holds the title of FIDE master.
Ali Makhmali completed his B.Sc. degree in computer science at
Universiti Tenaga Nasional, Malaysia (2009). He is now completing his M.Sc.
(Artificial Intelligence) at the same university and for 18 months served
as research assistant to Azlan Iqbal under the eScienceFund research grant
(01-02-03-SF0188). His main tasks under the project were to program and
test CHESTHETICA EG. His research interests include computational aesthetics
in computer games, Web development, and computer programming.