A computer program to identify beauty in problems and studies

by Azlan Iqbal
12/15/2012 – Computers today can play chess at the grandmaster level, but cannot tell a beautiful combination from a bland one. In this research, which has been on-going for seven years, the authors of this remarkable article show that a computer can indeed be programmed to recognize and evaluate beauty or aesthetics, at least in three-move mate problems and more recently endgame studies. Fascinating.

What makes problems and studies beautiful?
A computer program takes a look.

By Azlan Iqbal (with Harold van der Heijden, Matej Guid and Ali Makhmali)

At a Glance

Computers today can easily play chess at the grandmaster level but they cannot tell a beautiful combination or move sequence from a bland one. We humans tend to appreciate beauty or aesthetics in the game almost as much as winning itself. A curious combination or spectacular sacrifice will often gain our attention and praise, even centuries hence. In this research, which has been on-going for seven years, we show that a computer can indeed be programmed to recognize and evaluate beauty or aesthetics in (at least) three-move mate problems and more recently endgame studies. The computational aesthetic evaluations for these domains are experimentally-validated and correlate positively and well with domain-competent human assessment. This technology therefore presents us with the ability to data-mine beautiful sequences from databases containing millions of sequences (too large to be explored by human eyes), and also assist human judges in composition tourneys.

Though the computational approach may seem different or even inadequate compared to how human experts say they evaluate beauty, the results are comparable. Just as computers play chess in a way quite unlike how humans do, they appear to evaluate beauty differently as well. We provide a freely downloadable public version of the award-winning aesthetics evaluation program, Chesthetica Endgame (see below). While this version does not feature the precise aesthetic scores that we used in our experiments, it will rank aesthetically the sequences in a (PGN) database of three-movers or studies in a way not unlike a human expert. In this article, we also briefly consider deeper aspects of beauty perception in the game, some directions for future work and certain wider implications computational aesthetics research has beyond the domain of chess and artificial intelligence (AI).

Dr. Mohammed Azlan Bin Mohamed Iqbal (or Azlan Iqbal, for short) – pictured above receiving his Ph.D. degree at the University of Malaya's convocation ceremony in 2009 – is 34 years old and also has a bachelor's and a master's degree in computer science from Universiti Putra Malaysia. For the last ten years he has worked as a lecturer and senior lecturer in the College of Information Technology, Universiti Tenaga Nasional (Selangor, Malaysia). Four of those were spent as a part-time student, doing his Ph.D. in computer science – specifically in the area of computational aesthetics, a sub-field of AI. Presently, he also does research in the area of computational creativity. Azlan doesn't have an official chess rating, but he puts himself, conservatively, on the level of an infrequent club player. He also enjoys reading books on the many different sciences and is a casual piano player.

Introduction

Chess players are likely to have at least heard of the world of chess composition. This includes, for example, mate-in-two (#2) and mate-in-three (#3) problems, and also endgame studies which are typically a bit longer and do not necessarily end in mate. Chess compositions – to use a collective term – can be considered ‘works of art’ because they are intentionally designed to feature unexpected moves, themes or ideas that human players find appealing or beautiful in ways that are not always easy to put into words. The following is an example of what we mean by ‘beauty’ in the game. The solution can be found at the end of the article.

A. S. Gurvich, Bakinski Rabochi, 1927

White to play and win

Figure 1: A classic example of beauty in chess

It was this feeling that inspired me in late 2005 to undertake the topic of chess aesthetics for my doctoral dissertation [1]. I wanted to develop a computational aesthetics model for mate-in-three problems; perhaps the most common type of composition. White always won (checkmate) in three moves, against any defence. Since even back then computers could already beat the best human players, why couldn’t they also ‘appreciate’ or at least ‘recognize’ beauty in move sequences like we do? Here was a research gap that should be filled, I thought. I demonstrated experimentally (for the first time) that a computer could indeed identify beauty in three-movers and do so in a way that correlated positively and well enough with competent human player aesthetic assessment.

Not long after, I was contacted by endgame study expert Harold van der Heijden (who had read one of my research papers) and he suggested that I look into adapting or extending my three-mover aesthetics model to endgame studies. I thought it was an interesting challenge that could further validate the model, so I applied for a research grant for it. Matej Guid, an AI researcher and FIDE Master player, also joined the team as we realized our research interests overlapped. We were successful in obtaining a modest grant [2], and together with our research assistant, worked on the project for about 18 months. Our results are published in full in:

Azlan Iqbal, Harold van der Heijden, Matej Guid and Ali Makhmali (2012). Evaluating the Aesthetics of Endgame Studies: A Computational Model of Human Aesthetic Perception, IEEE Transactions on Computational Intelligence and AI in Games: Special Issue on Computational Aesthetics in Games, Vol. 4, No. 3, pp. 178-191. ISSN 1943-068X. e-ISSN 1943-0698.

In this article, we will attempt to summarize for our readers what we did and roughly how. We have intentionally left out many of the technical details and specifics to accommodate a broader audience; however, interested readers may choose to obtain the final, published version of the article above – available through most university libraries worldwide – and refer also to the associated references within it for more complete information. At the end of this article, and in the spirit that new technology should be made available to the public as soon as possible, we have included a download link to the Chesthetica Endgame (CEG) computer program installer package. It can be used to rank the aesthetics of endgame studies (White to play and win), three-movers and, as preliminary, presently unpublished research would suggest, two-movers as well. Single moves and whole games remain of lesser interest and uninvestigated for now.

Advancing from the Three-Mover Model to the Studies Model

The characteristics of a beautiful composition are not explicitly defined – perhaps it is impossible for humans to do so – but they are described with sufficient clarity in several publications that date as far back as the early 20th century. These characteristics or features include, for instance, the use of chess themes (e.g. pin, fork), pieces that travel longer distances across the board, the fruitful sacrifice of material, and the ‘economical’ use of pieces to achieve a particular objective, to name just a few. Some chess themes are not easily defined and can be quite challenging to detect computationally; in one case this led to the discovery of the ‘invisible fork’ [3] by Chesthetica, the very first experimentally-validated chess aesthetics recognition computer program (it analyzed only #3 problems).

In the original three-mover aesthetics model (the essence of my doctoral dissertation), seventeen features like the ones just mentioned were identified based on the literature and painstakingly formalized, i.e. represented using mathematical formulae dependent on the logic of the game. These formulae, in summation, were used to derive a crisp numerical score for a problem that represented its perceived aesthetics or beauty. Surprisingly to some in the field of artificial intelligence (AI) and outside, no ‘machine learning’ was required. Experimentally, I tested the model’s ability to differentiate between domains (forced three-movers taken from real games and #3 compositions) and its inability to differentiate within each domain. This means it was predicted that, aesthetically, on average, the compositions would score significantly higher than the real game sequences, but the real game sequences would score similarly to each other and so would the compositions. These predictions proved true.
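The summation idea can be illustrated with a minimal sketch. The feature names and values below are hypothetical placeholders chosen for illustration; they are not the seventeen formalizations from the dissertation, which are not reproduced in this article.

```python
from typing import Dict


def aesthetic_score(feature_scores: Dict[str, float]) -> float:
    """Sum the per-feature scores into one crisp (deterministic) number."""
    return sum(feature_scores.values())


# Hypothetical feature values for a single three-mover (illustrative only).
example = {
    "pin": 0.5,
    "fork": 0.75,
    "sacrifice": 1.0,
    "distance": 0.25,
    "economy": 0.5,
}

print(aesthetic_score(example))  # prints 3.0
```

Because nothing here is random, the same move sequence always receives the same score, which is what 'crisp' means in this context.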

The computer’s scores were also tested against average human player aesthetic ratings (using an online survey of hundreds of respondents from an online chess community) and they correlated positively and well with them. It is perhaps worthwhile to note that experienced human composers and regular (competent) players usually have different ideas about what constitutes ‘beauty’ in a move sequence. Composers tend to focus on what we call ‘depth appeal’ (e.g. a deep idea that emerges after careful thought and analysis) whereas the majority of players are more easily impressed with ‘visual appeal’ (e.g. a manoeuvre that looks good and can be understood quickly). The computational aesthetics model is more suited to the latter category; though a case could be made for either type as to which constitutes ‘true’ beauty, without ignoring the spectrum in between. Two contrasting examples are provided in Appendix A of our IEEE paper cited above. Two new contrasting examples are provided here; the study and its accompanying explication are courtesy of Harold.

Sergyi Didukh (Ukraine) & Siegfried Hornecker (Germany)
1st Honourable Mention Olympia Dunyasi 2010


White to play and draw

Figure 2: An example of ‘depth appeal’

At first sight the only plausible white move here is 1.bxa3? but this loses as we will see later. The right key move is 1.g6! forcing 1…hxg6 and now 2.bxa3. Both sides rush their pawns to queen: 2…g5 3.a4 g4 4.a5 g3 5.a6 g2 6.a7 g1Q 7.a8Q Qg8+ 8.Kb7 Qxa8 9.Kxa8 Kc6 10.Ka7! Kb5 11.a4+! Kxa4 (11…Kxb4 12.Kb6 Kxa4 13.Kc5) 12.Kb6! Kxb4 13.Kc6 Kc4 14.Kd6 Kd4 15.Ke6 Ke4 16.Kf6 Kf4. Here Black has a pawn at h6. Had White played 1.bxa3 immediately, the solution would have run parallel, but the black pawn would have been at h7 instead! That is a major difference, because in the solution White can now play 17.Kg6 drawing, while in the try 16.Kg7 fails to 16…h5.

Pollock - Consultants, 1893

White to play and win

Figure 3: An example of ‘visual appeal’

In Fig. 3, we have a clear example of visual appeal where White wins through a forced series of moves, albeit not necessarily immediately obvious ones. The winning sequence is as follows: 1.Qd7+ (a queen sacrifice) Bxd7 2.Nd6+ (a discovered/double check so the king must move) Kd8 3.Nf7+ (a clean fork of king and rook even though that is hardly the main objective here) Kc8 4.Re8+ (a clearance sacrifice) Bxe8 5.Rd8#. A sequence like this does not require much effort or specialized knowledge beyond the basic rules of the game to understand or appreciate.

Despite the experimental validations of the original model, there seemed to be room for improvement beyond three-movers. In adapting the three-mover model to endgame studies, we had to modify a few of the formalizations in such a way that they worked for endgame studies, but were still ‘backward compatible’ with three-movers. We also learned that not all seventeen aesthetic features need to be summed, but rather only a selection of the highest-scoring five or six. Perhaps most curious was the seemingly necessary inclusion of a stochastic element (i.e. some randomness) in choosing between five and six. Without some randomness, the experimental results were still reliable, but not the best we could get. Furthermore, the probability of choosing five or six needed to be split 20-80, i.e. a 20% chance of selecting the top five features and an 80% chance of selecting the top six features. Anything else just did not seem to work as well. The statistical approach we used to determine this relates to the expected ‘normality’ of the distribution of aesthetic scores and what is considered an acceptable positive correlation strength with human assessment; it is explained in Appendix B of our IEEE paper.
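The 20-80 selection rule described above might be sketched as follows. This is an illustration under stated assumptions, not the program's actual code: the function name, the use of Python's random module and the placeholder feature values are all hypothetical, and the real feature formalizations are not shown.

```python
import random
from typing import Dict, Optional


def stochastic_score(feature_scores: Dict[str, float],
                     rng: Optional[random.Random] = None) -> float:
    """Sum only the top five (20% chance) or top six (80% chance) features."""
    rng = rng or random.Random()
    # Assumed mechanism: one uniform draw decides how many features count.
    k = 5 if rng.random() < 0.2 else 6
    top = sorted(feature_scores.values(), reverse=True)[:k]
    return sum(top)


# Seventeen hypothetical feature values, f1 = 1.0 ... f17 = 17.0.
features = {f"f{i}": float(i) for i in range(1, 18)}
print(stochastic_score(features))  # either the top-five or the top-six sum
```

With these placeholder values the result is one of two numbers, depending on the draw: the sum of the five largest features or the sum of the six largest.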

We performed experiments similar to those done with the original three-mover model but were faced with the difficulty of finding sequences in real games that were similar enough to composed studies to be compared with them. Our study expert, Harold van der Heijden, was indispensable in ensuring the real game sequences we selected were as close to studies as forced three-movers from real games were to three-move problems. The following is an example of a ‘study-like’ sequence taken from a real game.

Meduna, Eduard (2450) vs. Schoeneberg, Manfred (2390),
Leipzig BKL, 1981


White to play and win

Figure 4: A ‘study-like’ sequence taken from a real game

This is the position after 62…Rg8-c8+? (other moves, like 62…Rg1, would have drawn). 63.Nc4 (threatens 64.Ra5 mate) 63…Rc5! 64.Re6!! (threatens 65.Ra6+ and mate. But the similar 64.Re8 would have failed to 64…Kb5) 64…Rc7 (now 64…Kb5 65.Rb6+ Ka4 66.Rb4 mate. 64…Rh5 65.Ra6+ Kb5 66.Ra5+ wins the rook) 65.Rb6 (threatens 66.Rb4 mate; an artistic mate picture) 65…Rb7! 66.Nb2+ (avoiding 66.Rxb7 stalemate) and Black resigned.

The adapted model could now differentiate between (but not within) domains for studies (three-movers were also tested again with success). However, since the appreciation of studies is somewhat esoteric, we could not perform an online survey with chess players and simply lacked the funds to elicit the cooperation of many experienced composers. So instead we had our experts and team members Harold and Matej represent the viewpoints of expert composer and player, respectively. They were asked to rate aesthetically 30 randomly-selected composed studies and 30 real-game sequences that resembled studies to be compared against the computer’s evaluations. Correlation with each of them was positive and good; better than the original three-mover model achieved. Correlation with their average ratings (both Harold and Matej combined, divided by two) was even better. Combined with the experimental validations of the original three-mover model, the new model – which could also evaluate three-movers – looked even more convincing.
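The comparison against the two experts can be sketched in a few lines. This is only an illustration: the article does not say which correlation coefficient was used, so Pearson's r is an assumption here, and all the numbers are made up for the example rather than taken from the experiments.

```python
from math import sqrt
from typing import List, Sequence


def average_ratings(r1: Sequence[float], r2: Sequence[float]) -> List[float]:
    """Combine two raters' scores for each sequence and divide by two."""
    return [(a + b) / 2 for a, b in zip(r1, r2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: computer scores and two human raters for five sequences.
computer = [2.1, 3.4, 1.2, 4.0, 2.8]
composer = [3.0, 4.0, 2.0, 5.0, 3.0]
player = [2.0, 4.0, 1.0, 4.0, 4.0]

combined = average_ratings(composer, player)
print(pearson(computer, composer), pearson(computer, player),
      pearson(computer, combined))
```

A value near +1 would indicate that the computer and the human raters order the sequences in much the same way.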

It suggested that a computer could indeed be used, at the very least, to draw attention to potentially ‘interesting’ or beautiful studies and problems. Considering the hundreds of thousands – perhaps millions – out there, one benefit is obvious. Computationally, it would seem to break new ground in computer chess, if nothing else. An interesting aspect of the new aesthetics model is that the score it produces for a problem or study now is no longer ‘crisp’ or fixed as before (due to the necessary stochastic element). This means that, given a ‘second look’, the computer program could rate the problem or study slightly differently. Individually, this represents a negligible difference in scores but in a large collection of compositions or a small one in which some are of similar aesthetic value, overall rankings can be changed. The precise scores themselves are therefore not as important.
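The effect of a 'second look' on rankings can be shown with a small sketch. It reuses the assumed 20-80 top-five/top-six rule as a stand-in for the real model; the study names and feature values are hypothetical.

```python
import random
from typing import Dict, List


def stochastic_score(features: List[float], rng: random.Random) -> float:
    """Sum the top five (20% chance) or top six (80% chance) feature values."""
    k = 5 if rng.random() < 0.2 else 6
    return sum(sorted(features, reverse=True)[:k])


def rank_collection(collection: Dict[str, List[float]],
                    rng: random.Random) -> List[str]:
    """Return composition names ordered from highest to lowest score."""
    scores = {name: stochastic_score(f, rng) for name, f in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Two studies of similar value and one clearly weaker (illustrative values).
collection = {
    "study_A": [10.0, 10.0, 10.0, 10.0, 10.0, 10.0],
    "study_B": [10.0, 10.0, 10.0, 10.0, 10.0, 5.0],
    "study_C": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

rng = random.Random(1)
orders = {tuple(rank_collection(collection, rng)) for _ in range(200)}
print(orders)
```

In this toy collection the two similar studies occasionally swap places across evaluations, while the clearly weaker one always stays last, which mirrors why the overall ranking matters more than any single precise score.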

We would probably be faced with the same problem if a human judge were asked to ‘re-rank’ a collection of studies, or if several judges were used for this purpose. Following that, we cannot always expect to agree with their decision(s), even if they bothered to explain themselves. In our IEEE paper, we put the famous Saavedra position ‘under the microscope’ and based on the computer’s evaluation, it would seem that it is not quite as beautiful on its own merits as most of us would like to think. At first, this came as a surprise even to us because the study is very well known. The computer could not explain itself – it certainly ‘knows’ nothing about the study to begin with – but knowing how the aesthetics model works, we could understand its decision. For one thing, the main line of the Saavedra is not quite ‘forced’ even though White still wins.

Conclusions

Personally, I am quite pleased with the results of our research into computational chess aesthetics over the years. While such technology can be used to enhance existing chess playing and database packages, and possibly to assist human judges and composers, the model serves as a building block toward even better and greater things in artificial intelligence. For instance, there is the possibility of automatically generating compositions of higher quality than before [4] and publishing the very first books of chess compositions to have been ‘written’ or contributed to entirely by a machine.

There are also contributions to be made specifically in the AI sub-field of computational creativity (making machines not just intelligent, but actually ‘creative’). One of the main issues in this area is the evaluation of creative artefacts or objects produced by a computer. Who decides and how? If a chess composition such as a problem or study may be considered such an artefact, the new aesthetics model can be used as a more ‘objective’ or reliable means of determination; not to mention a cost-effective one. This gives researchers more freedom to work on the computational approaches to creating creative objects rather than worrying about how to evaluate them as well. This is, in fact, relevant to a new research project that Matej and I are involved in, in which we hope to develop a generic method (i.e. a widely applicable one like artificial neural networks) that is capable of accepting information from any domain and producing objects of creative value [5]. Our main domain of investigation is, once again, chess; but we hope to demonstrate applicability of the generic method in at least one other domain as well.

It is difficult to say if our research sheds any light on the psychological aspect of chess composition evaluation. Just because the new aesthetics model evaluates a selection of common features and focuses on five or six of the most prominent ones, with a touch of randomness or ‘personal taste’, if you will, it does not necessarily mean that human judges do something similar. However, humans are biochemical ‘machines’ in as much as computers are silicon ones. Whatever the processes used in each, the results are comparable and correlate. Does it matter that the computer does not play chess in the way that we do? If not, then why should it matter if it evaluates beauty as we (think we) do? If a computer could look into our skulls and deep into our neurons as we look at its algorithms and code, what would it say? And would it be impressed?

Solution to Figure 1

The g-pawn looks as if it will queen. However, White wins with 1.Ne4! So if 1…g1Q+? 2.Nf2+ Qxf2+ 3.Qxf2 and White wins. Black defends with 1…Nd3. So if 2.Qxd3 g1Q+ leaving White with insufficient material advantage to win. 2.Qf2+!! Nxf2 (if 2…Nf1 3.Qh4+ wins. 2…g1Q 3.Ng3+). 3.Ng3+! Kg1 4.Ng5, a zugzwang with mate on the next move.


References

  1. Mohammed Azlan Bin Mohamed Iqbal (2008). A Discrete Computational Aesthetics Model for a Zero-Sum Perfect Information Game, Ph.D. Thesis, University of Malaya, Kuala Lumpur, Malaysia.
  2. Mohammed Azlan Bin Mohamed Iqbal (2010). A Computational Approach to Modelling Multi-Dimensional Human Aesthetic Perception (01-02-03-SF0188; March 2010-October 2011). Other members: Dr. Harold van der Heijden (International Judge for Chess Composition, Netherlands), Dr. Matej Guid (FIDE Master, AI Laboratory, University of Ljubljana, Slovenia), Ali Makhmali (Research Assistant). Ministry of Science, Technology and Innovation (MOSTI): eScienceFund.
  3. Azlan Iqbal (2012). Knowledge Discovery in Chess Using an Aesthetics Approach, Journal of Aesthetic Education, Vol. 46, No. 1, pp. 73-90.
  4. Azlan Iqbal (2011). Increasing Efficiency and Quality in the Automatic Composition of Three-Move Mate Problems, in Entertainment Computing - ICEC 2011, Lecture Notes in Computer Science, Vol. 6972, pp. 186-197. Anacleto, J.; Fels, S.; Graham, N.; Kapralos, B.; Saif El-Nasr, M.; Stanley, K. (Eds.). 1st Edition., 2011, XVI. Springer. ISBN 978-3-642-24499-5.
  5. Mohammed Azlan Bin Mohamed Iqbal (2012). A Computational Model of Human Creativity in a High Complexity Domain (01-02-03-SF0240; May 2012-October 2013). Other members: Dr. Matej Guid (FIDE Master, AI Laboratory, University of Ljubljana, Slovenia), Dr. Cameron Browne (Queensland University of Technology, Australia; Computational Creativity Group, Imperial College, UK), Dr. Simon Colton (Department of Computing, Imperial College, UK), Jana Krivec (Woman Grandmaster, Department of Intelligent Systems, Jožef Stefan Institute, Slovenia), Boshra Talebi Haghighi and Shazril bin Azman (Research Assistants). Ministry of Science, Technology and Innovation (MOSTI): eScienceFund.

Chesthetica Endgame

Chesthetica Endgame (CEG) was developed as a research tool based on the original Chesthetica program. It was the sole recipient of The Foreign Special Award by the Association of Polish Inventors and Rationalizers: Crystal Statuette as the Prize of Association at the Malaysia Technology Expo 2012. This public version supports PGN files containing three-move mate problems (as Chesthetica does) and also composed endgame studies where White wins (draws were not tested to maintain experimental integrity). In principle, all other mate-in-n (n >= 2) problems are also supported but only the aforementioned have been experimentally verified (the rest remain untested). In this version, the precise aesthetic scores are not displayed because they distract from the main utility of this program, which is to rank problems and studies based on their aesthetics. As such, there should be at least two compositions in a PGN. Due to the stochastic technology employed, CEG’s rankings may also change slightly each time the database is evaluated. This is an unintended side-effect of the approach that produced the best results experimentally. Note that individual human judges may factor into their rankings other things such as originality and adherence to certain composition conventions, which are not necessarily associated with ‘visual beauty’ (as perceived by human players) per se.

CEG runs on Windows XP SP3 and Windows 7. On Windows 7 systems, the size of text, if changed to higher than 100%, might cause the interface of CEG to become distorted due to DPI ‘awareness’ issues. Users facing this problem are recommended to leave the size at 100% but to use instead the Personalize > Windows Color > Advanced Appearance Settings to change the font sizes of each item manually. For Windows 7 systems that still have problems (e.g. file access issues), it may have to do with UAC security and the workaround is to right click the program icon and ‘Run as Administrator’. Please view the ‘Read Me’ PDF file included in the installation for information about how to use the program. Finally, even though you may find yourself in disagreement with the program’s evaluations, it is perhaps helpful to remember that this is considered normal between humans, even experts.

(Download Installer: EXE, ZIP, 4.1 MB)


Main Interface of Chesthetica Endgame

The authors

Azlan Iqbal received the B.Sc. and M.Sc. degrees in computer science from Universiti Putra Malaysia (2000 and 2001, respectively) and the Ph.D. degree in computer science (artificial intelligence) from the University of Malaya in 2009. He has been with the College of Information Technology, Universiti Tenaga Nasional since 2002, where he is senior lecturer. He is a member of the IEEE and AAAI, and chief editor of the electronic Journal of Computer Science and Information Technology (eJCSIT). His research interests include computational aesthetics and computational creativity in games.
Harold van der Heijden (1960) finished his HBO-B (university of applied sciences) study in Biochemistry in 1981. He has been a research technician in a veterinary institute (GD, Animal Health Service) in the Netherlands since 1982. In 2009 he obtained a Ph.D. degree from the veterinary faculty of Utrecht University. Since 2010 he has headed the research and development laboratory of GD. In the domain of endgame studies, he has been active as a collector (largest endgame study collection in the world), writer (three books), main editor of EBUR (1993-2006), the magazine of the Dutch/Flemish endgame circle ARVES, and main editor of the international magazine EG (since 2007). He obtained the title of international judge of endgame studies from FIDE in 2001, and has organized and judged dozens of endgame study tourneys. As of 2012, he is also a FIDE master of chess composition.
Matej Guid received his B.Sc. (2005) and Ph.D. (2010) degrees in computer science from the Faculty of Computer and Information Science at the University of Ljubljana, Slovenia. He is a researcher at the Artificial Intelligence Laboratory, University of Ljubljana. His research interests include heuristic search, computer game-playing, automated explanation and tutoring systems, and argument-based machine learning. Chess has been one of his favorite hobbies since childhood. He was also a junior champion of Slovenia a couple of times, and holds the title of FIDE master.
Ali Makhmali completed his B.Sc. degree in computer science at Universiti Tenaga Nasional, Malaysia (2009). He is now completing his M.Sc. (Artificial Intelligence) at the same university and for 18 months served as research assistant to Azlan Iqbal under the eScienceFund research grant (01-02-03-SF0188). His main tasks under the project were to program and test CHESTHETICA EG. His research interests include computational aesthetics in computer games, Web development, and computer programming.

Articles by/about the authors

Can computers be made to appreciate beauty?
02.09.2009 – Or at least to identify and retrieve positions that human beings consider beautiful? While computers may be able to play at top GM level, they are not able to tell a beautiful combination from a bland one. This has left a research gap which Dr Mohammed Azlan Mohamed Iqbal, working at Universiti Tenaga Nasional, Malaysia, has tried to close. Here's his delightfully interesting PhD thesis.
76,132 studies – It's the thought that counts
02.03.2012 – If you looked at a study for five minutes, eight hours a day, it would take you over three years to go through all the endgame studies that Harold van der Heijden has collected in his database: 85% of all studies ever composed. He worked for fifteen years on the project, but also managed to have a family, do a doctorate, and work as a research scientist. Steve Giddins tells us his remarkable story.
Using chess engines to estimate human skill
11.11.2011 – In sports and games, rating systems are a widely accepted method for estimating skill levels of the players. They are based on the outcome of direct competitions only. Matej Guid and Ivan Bratko of the University of Ljubljana propose a different approach: to assess skill level at chess by applying chess engines to analyse positions and moves played. Interesting academic study.
Computers choose: who was the strongest player?
30.10.2006 – Who is the best chess player of all time? Entire books have been devoted to the subject, but all have one major flaw: they are mainly subjective. Necessarily so, since there is no direct way of comparing Morphy to Fischer, Lasker to Kasparov. Or is there? Two scientists from Slovenia try it with computers and statistics. The results might surprise you.

Copyright ChessBase



Dr. Azlan Iqbal has a Ph.D. in artificial intelligence from the University of Malaya and is a senior lecturer at Universiti Tenaga Nasional, Malaysia, where he has worked since 2002. His research interests include computational aesthetics and computational creativity in games. He is a regular contributor at ChessBase News.