Revisiting whether women play more beautiful chess

by Azlan Iqbal
9/17/2018 – Dealing with subjects such as aesthetics and gender is generally prone to criticism. After all, the concepts that are used as bases for analysis tend to compel some degree of subjectivity. Nevertheless, AZLAN IQBAL has been exploring this issue for years. After publishing a controversial article a couple of years ago, he informs us about the improvements made to his original research. The question remains the same: do women play more beautiful chess than men? | Photos: Pascal Simon / Simon Bohnenblust

No significant difference

About three years ago, I presented a research paper comparing the aesthetics of play in chess between men and women. I even wrote an article about it here on ChessBase. It was not received very well for a variety of reasons but that is what the science nevertheless suggested and I defended the work. Even so, I did take into account some constructive criticisms pertaining to the depth and scope of the original research. For instance, player strength (e.g. Elo rating) was not taken into account and the scope was limited to just forced three-movers. As we know, science is often only as good as the time and funding that can be afforded to it. The furore did, however, spark some interest in the question and I was able to obtain a dedicated research grant from my university in order to look somewhat deeper into the issue. I even had money now to hire a research assistant.

To sum up the findings now, the answer as to whether women play more beautiful chess than men is still no. Rather, this time around, the study regarding aesthetics of play between the genders, on average, produces the result that there is no difference. More details can be found in the full paper.

We used two books as sources. The first was Play Like a Girl: Tactics by 9 Queens by Jennifer Shahade — the only “expert-selected games by women” book we could find. The second was The Ultimate Chess Puzzle Book by John Emms, a comparable book that contained games that (incidentally) were almost exclusively played by men.

In general, the games that were included were those ending in mate (of various lengths) and a selection of ‘study-like’ positions, where the winner simply had a decisive advantage by the end of the sequence. So not just forced three-movers, as in the previous research. These sequences were processed by Chesthetica based on its aesthetics model in order to generate an aesthetics score or value for each. Readers interested in how the model works may want to read the linked paper first, but suffice to say here that the computer evaluations have been shown to correlate positively and well with domain-competent human assessment. The Elo ratings of the players at the times those games were played, if they were available, were also taken into account. Again, this was something that was not done in the previous research and a fair point of contention with regard to any grand conclusions that might have been drawn from that work.

The results

In total, we were able to derive 39 games between women, 40 games between women and men, and 115 games between men from the books as usable for our purposes. For each group, it was first determined whether the Elo ratings of the players were comparable. This meant that, on average, the Elo ratings of the White players and the Black players were not statistically different. Furthermore, across all groups, the women averaged an Elo rating of 2,394 and the men averaged an Elo rating of 2,427. This meant that, statistically, on average, players of both genders were equally strong. Playing strength was therefore ruled out as a factor. The table shows the aesthetic score comparisons.

There was no statistically significant difference between groups 1 and 3. Across all three groups there was no statistically significant difference either. Even between groups 1 and 2 there was no statistically significant difference detected. So, in conclusion, on average, the sequences taken from games played by women and by men were no different in aesthetic quality. This research therefore contradicts the previous research, which showed that games between men were more aesthetically pleasing than those between women (within the scope of forced three-move sequences and not taking Elo ratings into account).
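
For readers curious what a group comparison of this kind can look like in practice, here is a minimal sketch in Python. It is not the procedure from the paper, which this article does not spell out: it swaps in a simple two-sided permutation test on the difference in mean scores, and both the function name `permutation_test` and the score lists are invented purely for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate can never be exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical aesthetic scores, NOT data from the study.
group_a = [2.1, 1.8, 2.5, 2.0, 1.9, 2.3]
group_b = [2.0, 2.2, 1.7, 2.4, 2.1, 1.9]

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.3f}")
```

A large p-value here would be read the same way as the result above: no detectable difference between the two groups, which is not the same thing as proof that they are identical.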

In 2017, Hou Yifan deliberately lost a game at the Gibraltar Masters in protest against the high number of women she was paired against | Photo: Alina l'Ami

The new findings suggest that, in chess, both men and women have aesthetic playing preferences that are not unlike each other. Regardless, those who were critical of the previous research by claiming that the aesthetics model could not be trusted or was unreliable might want to be critical of the present research as well, since the same model was used. As for everyone else, the present work should be taken as more thorough and reliable (i.e. an update) compared to the previous findings. As always, future research in the area could, again, produce different results.

Dr. Azlan Iqbal has a Ph.D. in artificial intelligence from the University of Malaya and is a senior lecturer at Universiti Tenaga Nasional, Malaysia, where he has worked since 2002. His research interests include computational aesthetics and computational creativity in games. He is a regular contributor at ChessBase News.

Discussion and Feedback

lajosarpad 9/22/2018 10:00
@Azlan

I have nothing against trying to find out more about such philosophical concepts as beauty, and I think mathematics and applied mathematics are the best school of thought for moving forward. But because there is no standard definition of beauty, which comes from common sense, I think factual knowledge about the essence of the question is next to impossible to gather without preliminary studies of what beauty is. Whatever the expert says helps you have an idea, but when there is a contradiction between your taste and what the expert says, then either the expert is not an expert, or the concept is too broad to cover all tastes. However, I find your approach to be very constructive and important if we define the results accurately: based on the opinion of experts you have constructed a product which, based on empirical knowledge of a subset of the set of subjects, constructed a hypothesis which seems to be true. I think this is the accurate way to describe the state of affairs.

@dumkof

I remember that game. It was fantastic. I think we need several objective criteria to cover several possible tastes. If I want to see beautiful chess games and I like brutal attacks and elegant defenses, then it would be awesome to have a system which automatically finds games which are beautiful in those terms. And if you consider precision to be more beautiful, then it would be awesome to find precise games for you. However, here we have a problem: engines are not perfect either and they may consider a move precise and an alternative imprecise, when the opposite is true.
dumkof 9/21/2018 10:31
@Lajosarpad

Here is the ChessBase article of that game:

https://en.chessbase.com/post/game-8-leko-wins-to-take-the-lead
dumkof 9/21/2018 10:17
@Lajosarpad

It was game 8.
Leko played black and his historical move was 26 ... Bxf3!!
It's surprising that even today's top engines need some time to find this move. The evaluation score changes dramatically, once it finds the correct move. A scientifically proven beauty! :)

"Beauty" is a matter of taste, it's very subjective. Tastes are not debatable. But Chess is pure objective and mathematical. That's why we need some objective criteria, like engine correlation, in order to call a move/game "beautiful".
azlan 9/21/2018 01:37
@lajosarpad:

“I experienced this in chess and chess problems as well and I am not sure relying on the experts will represent the whole subculture of chess. In fact I am skeptical, but, of course, with facts I am convincible.”

The issue is, in academic work, expert validation is essentially a requirement or at least takes precedence. But I wholly agree with you that even weaker players and non-master composers (myself certainly included) can and do appreciate beauty/aesthetics in the game. I wouldn’t have done any of this kind of research otherwise (including my PhD on the topic) because enough interest simply wouldn’t have been there.

“I still think that the subjectivity of the concept of beauty and the difficulty of measuring it make this task very difficult. You need to use every piece of information you can get about the game to improve precision, as precision is very, very difficult when it is not clear what we need to correlate to.”

I think you will find that the experiments conducted and explained in my thesis and the other paper provide some empirical grounding about the concept of beauty in chess. Generally, in science, when a computational model can be used to make correct predictions in the domain, it is considered validated (though not necessarily perfectly so, as nothing in science is or can be).

“Even body beauty as a concept changes over time. In ancient times fat ladies were considered to be most beautiful, now the word "fat" is an insult. You are probably right when you state that some patterns were considered to be beautiful over time, but can we be sure that those patterns will be considered to be beautiful in the future (if there is a future for humanity) as well, or is it possible that the given patterns were not devalued due to mere chance? I think none of us will answer that question.”

Our DNA and brains, as a result, probably come preloaded with “wetware” regarding many things. Even newborn babies have been found to prefer more "beautiful" faces. Yes, there is room for change and refinement through evolution over the ages but if that pattern of change can be modeled as well, I don’t see why it cannot be used to adapt any kind of aesthetics model accordingly too. I think no one will really bother to answer that question, however, as there are far more pressing issues in science competing for funding these days (and probably in the future too). Then again, it could be just the kind of thing that leads to a major breakthrough in some other field, which tends to happen in science quite often.

“Anyway, it is a good idea to try to convert beauty into algorithms and I am glad my skepticism was taken well. If I read your thesis and if I have some suggestions or observations, then I will surely send them to you.”

Sure, I appreciate it.
azlan 9/21/2018 01:37
@lajosarpad:

“You stated that the ratings are not relevant aside from ensuring that the women were not weaker than the men. This is a self-defeating statement, since if a rating level difference between the genders would have distorted the results, then players of different strength play differently in terms of aesthetics in general, and then the question arises of how beautifully men and women play in the different rating ranges.”

The ratings were kept equal this time to rule out the possibility that playing strength could affect the aesthetic quality of a person’s play. Previously, when men were found to have a higher aesthetic quality of play, on average, than women, it was pointed out that perhaps the women were simply weaker and therefore “obviously” couldn’t play as well or as beautifully. It’s a fair point of contention. And yes, it would be yet another research question (and project) to find enough valid game sequences to analyze in order to determine how, exactly, being a stronger player correlates with a higher aesthetic quality of play. My suspicion is that there is a cut-off point.

“Will the question of how beautifully the different genders play have the same answer for players around 1800 as in the case you have analyzed, where the players were around 2350?”

I suspect there is a range where aesthetics plays a role and the result would be the same, i.e. no difference between the genders (my guess is starting around 1,600 to about 2,400). Below that the players aren’t “competent” enough and above that aesthetics tends to take a back seat to practicality (i.e. precise play and winning at all costs). Chess engines, for example, playing at the 3,000+ level may not play aesthetically at all by our human standards. All this is really just guesswork at this point. A new study or several new studies will be required just to answer this. It does not affect the findings of the present work, however. One may limit one’s conclusions (if any) to players at around the 2,400 range if one likes, since that was the range in which the expert selections (in the books) happened to fall and be available for testing.

“Unless the aesthetics displayed by 1800 players are analyzed as well, this question is unanswerable and any statement which renders the rating unimportant beside the fact that there is no significant difference between genders in terms of strength is a premature assumption.”

Again, one may limit any conclusions to around the 2,400 range which happened to be what was tested. I should point out once more that the kind of data required (i.e. expert-selected game sequences *with* player ratings *and* of the #3, #4, #5, study-like variety) is not as easily available as one might assume. I was lucky the first book (Jennifer’s) existed or even this study could not have been done at all. (continued...)
lajosarpad 9/21/2018 11:04
@azlan I am sure you will agree that there is a difference between what the engine can do and what the engine does. I am discussing the way you and your engine analyzed the aesthetics of a game. Of course the implementation is probably capable of being used differently either out of the box or with small changes. The ratings of the players might have been relevant. You stated that the ratings are not relevant aside from ensuring that the women were not weaker than the men. This is a self-defeating statement, since if a rating level difference between the genders would have distorted the results, then players of different strength play differently in terms of aesthetics in general, and then the question arises of how beautifully men and women play in the different rating ranges. Will the question of how beautifully the different genders play have the same answer for players around 1800 as in the case you have analyzed, where the players were around 2350? Unless the aesthetics displayed by 1800 players are analyzed as well, this question is unanswerable and any statement which renders the rating unimportant beside the fact that there is no significant difference between genders in terms of strength is a premature assumption. It is certainly an improvement to rely on the subjectivity of several other people rather than on your own; the improvement is that more people are involved.

I was present at some wine tastings and there were expert tasters, who knew their stuff. It was in fact amusing, since they really liked some prize-winning wines, while all the other noobs, such as myself, found it difficult to drink the best wine. I experienced this in chess and chess problems as well and I am not sure relying on the experts will represent the whole subculture of chess. In fact I am skeptical, but, of course, with facts I am convincible. I still think that the subjectivity of the concept of beauty and the difficulty of measuring it make this task very difficult. You need to use every piece of information you can get about the game to improve precision, as precision is very, very difficult when it is not clear what we need to correlate to.

Even body beauty as a concept changes over time. In ancient times fat ladies were considered to be most beautiful, now the word "fat" is an insult. You are probably right when you state that some patterns were considered to be beautiful over time, but can we be sure that those patterns will be considered to be beautiful in the future (if there is a future for humanity) as well, or is it possible that the given patterns were not devalued due to mere chance? I think none of us will answer that question.

Anyway, it is a good idea to try to convert beauty into algorithms and I am glad my skepticism was taken well. If I read your thesis and if I have some suggestions or observations, then I will surely send them to you.

@Dumkof I think you deem precise chess more aesthetic than romantic chess. I am sure there are others sharing your taste as well, but I know for a fact that there are other tastes as well. I think every popular taste in chess beauty is worth approaching scientifically. I wonder which win you are speaking of in the Kramnik - Lékó match? The fifth or the eighth game?

@Pieces in Motion Do you have any proof for your statement that men are smarter than women?
Pieces in Motion 9/21/2018 05:29
Zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz, much ado about nothing. It's biological, women are not as smart as men and hence don't play Chess as well. Look at reality to see the truth and not deny it by wasting time on pointless research.
azlan 9/20/2018 01:30
@dumkof: Points taken. There is also likely to be a point at which “perfect solutions” by computer (perhaps 20 or 30 moves deep and that meet all your criteria) are even beyond human conceptions of beauty. I suppose we have to draw the line somewhere, as humans. I believe I did mention in my thesis that even being exposed to minor flaws in a “beautiful” sequence from a game between humans (i.e. flaws that “cook” it, even somewhat) can take away from our aesthetic appreciation of it but this is something we have to live with if we are ever to consider sequences from real games between humans as beautiful (given that we are inherently fallible creatures, unlike computers). One approach I used in my experimentation (to validate the model) was to ensure that when comparing real-game sequences against human compositions, the real-game sequences used were sound computationally as well (i.e. they didn’t have alternate or shorter solutions).
azlan 9/20/2018 01:30
@lajosarpad: As I mentioned, the aesthetics model can indeed evaluate any sequence of moves, be it in the opening, middle or endgame. The sequences used in the present research were what they were, as selected by the authors of the books used, both of whom are grandmasters (those are who I meant, not the players, who were indeed experts too but not necessarily grandmasters). The player ratings were actually not relevant aside from ensuring that the women were not weaker, on average, than the men. Again, the selection by the authors of the books (i.e. the grandmasters) was to rule out bias by me, assuming I was the one who selected the games from their books based on my personal taste (which I didn’t do). Presumably the authors who wrote the puzzle books above exercised some objectivity when making their selections for public consumption.

I merely worked with what they chose, given their clear expertise in the domain. There were many mate-in-1 and mate-in-2 sequences which we couldn’t use, for example, because Chesthetica only evaluates mates in 3, 4 and 5 moves, and study-like constructs/sequences. The point is that this is a distinct improvement over the earlier work that only looked at mate-in-3s and is a better (even though not perfect) representation of aesthetics in human play. In addition, the present research ensured the women were not weaker than the men (i.e. the significance of their Elo ratings); again, a distinct improvement over the earlier work.

I see nothing wrong, in principle, with the aesthetics model being applied to *every* move in a game for some kind of “total” or “fuller” aesthetics score but as I mentioned, there would be no precedent by humans and the results, for that reason, would be deemed even more unreliable than just evaluating expert-selected sequences in games.

"The concept of beauty changes over time. Complicated, risky play was viewed to differently before the computer age. Now, chess fans with their engines immediately scream "blunder" when their engine changes the evaluation of the position. The same is true with some positional elements. Now we can appreciate a lot of positional elements people were not really aware of before the pioneers of positional chess, like Steinitz. And so on."

Perhaps, but there are likely basic aesthetic principles that remain the same. Not unlike particular body shapes/measurements and facial features that humans, in general, have been scientifically shown to find attractive across time and across cultures.

"Are you sure that this surprise element is needed for beauty to be found? It is possible that you already have a win, but you execute it very very nicely. I think your definition above is very restrictive."

No, it’s not “necessary”. I just gave that as an example of beauty not being entirely in the eye of the beholder (philosophically appealing as that may be). As I mentioned, you should look at all the aesthetic principles and themes (and their mathematical formalizations) in my thesis and the paper linked to earlier. Individually, they may not even make perfect sense to any individual reader or player but collectively, and on average, somehow, they withstood experimental validation quite well (several different types of experiments, in fact).

"I will try to allocate time to read the thesis."

Please do, if you can, and I appreciate your interest in the subject. I spent quite a few years on it myself. You should also know that my focus is now computational creativity and I have kind of moved on from computational aesthetics (except perhaps maybe to simply “apply” the model in research such as this). Your suggestions, nevertheless, are fair for the development of a perhaps improved model in the future (by someone else building upon my work or maybe even taking an entirely different approach). They would still need to validate it experimentally, however, as I did.
dumkof 9/20/2018 12:07
Some games from the romantic era are considered to be "beautiful", only because of all the bloody action and sacrifices on the board, but when you go through these games with an engine, you would notice dramatic fluctuations in evaluation. It's mostly the huge blunders of the weaker player, that allow such "beautiful" games.

A game should be considered beautiful only when it both involves nice (and deep) ideas and is played nearly accurately by both players (according to the engine). So, it takes both sides to produce a beautiful game.

Some moves played by humans are so deep that even engines manage to realize them only a few moves later, but not initially. Games involving such deep moves are beautiful. For example, the win of Leko over Kramnik in their world championship match involved such a deep move. The immortal Kasparov - Topalov game also involved a series of such deep moves, which could not be correctly evaluated by the engine initially.
lajosarpad 9/20/2018 10:52
@azlan

By end of the game I mean end of the game, it might be a study-like position in the middlegame where we know the result, or even a mate in the opening. By the term "end of the game" I meant the latter part of the game, where your system knows the evaluation. In the article we can read the average ÉLŐ of the players, so I think calling the players collectively grandmasters is an exaggeration, but they are experts nevertheless.

Earlier moves contribute to the aesthetics. If I play something super solid, then my approach is probably safety first. If I play the Albin Countergambit, or the Fried Liver Attack, then there is a higher chance that I will reach a more aesthetic position for those who like complications and paradoxes. I do think that risky play with the intention of reaching a very interesting position should be rewarded by an aesthetic heuristic, instead of assuming backwards the intention from the result.

"The focus in my aesthetics research has therefore always been on the actual sequences deemed aesthetic by the experts. "

Ok, now I understand how you implemented it.

I think the lack of analysis for all the moves by human experts has a very specific reason: the limitations of a human to digest many positions, so they filter the moves. If I am right in my assumption, then giving some kind of heuristic point to the moves would be an improvement.

The concept of beauty changes over time. Complicated, risky play was viewed differently before the computer age. Now, chess fans with their engines immediately scream "blunder" when their engine changes the evaluation of the position. The same is true with some positional elements. Now we can appreciate a lot of positional elements people were not really aware of before the pioneers of positional chess, like Steinitz. And so on.

"Beauty is, by its nature, to be found only in a win or a draw when all seemed lost."

Are you sure that this surprise element is needed for beauty to be found? It is possible that you already have a win, but you execute it very very nicely. I think your definition above is very restrictive.

I will try to allocate time to read the thesis.

I am happy to learn that you actually collected preference data from players and fans. I still think a site where users could choose their preferences would be superior, but I am perfectly aware that it would mean additional work to be done. Nevertheless, this is a constructive idea.

This is indeed not the place for the math part, we would exclude other participants from this discussion.

Also, it might be a good idea to categorize preference settings, like 'aggressive preference', 'positional preference', 'precision preference' and so on. You would only need to change the weights of the given criteria and the result would be more personalized, that is, virtually every player would be able to find his or her preference.
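
A minimal sketch of that weighting idea in Python, under loud assumptions: the criterion names, the profile weights and the function `personalized_score` are all hypothetical and are not taken from Chesthetica or the thesis. It only shows that the same per-criterion scores can be ranked differently by swapping the weights.

```python
# Hypothetical preference profiles: each maps a criterion to a weight.
# None of these names or numbers come from the actual aesthetics model.
PROFILES = {
    "aggressive": {"sacrifice": 0.6, "positional": 0.1, "precision": 0.3},
    "positional": {"sacrifice": 0.1, "positional": 0.7, "precision": 0.2},
    "precision": {"sacrifice": 0.1, "positional": 0.2, "precision": 0.7},
}


def personalized_score(criterion_scores, profile):
    """Weighted sum of per-criterion scores under the chosen profile.

    Criteria missing from `criterion_scores` count as 0.0.
    """
    weights = PROFILES[profile]
    return sum(w * criterion_scores.get(name, 0.0) for name, w in weights.items())


# Made-up per-criterion scores for one sequence, for illustration only.
sequence = {"sacrifice": 0.9, "positional": 0.2, "precision": 0.4}

for profile in PROFILES:
    print(profile, round(personalized_score(sequence, profile), 2))
```

The underlying evaluation stays the same; only the weights change per user, so supporting a new taste means adding one more entry to the profile table.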
azlan 9/19/2018 01:28
@lajosarpad: Many of the sequences from the books used in the present research were not really from the endgame (if that’s what you meant by “end of the game”). They were sequences chosen by the authors (who are grandmasters) and some were also from the middle-game. One of the assumptions of the research was that since these sequences were chosen by experts, they must therefore possess some aesthetic merit. I agree that even earlier parts of the game or perhaps every move in the game could contain some amount of aesthetic merit but I have never seen or even heard of an “entire game” winning a brilliancy prize or considered by anyone to be beautiful. Usually, it’s a particular sequence in the game which brands it, perhaps incorrectly, “a beautiful game”. The focus in my aesthetics research has therefore always been on the actual sequences deemed aesthetic by the experts.

Having said that, the aesthetics model is, in principle, capable of being used to evaluate the aesthetics of every move in a game for a collective score as a whole but for the aforementioned reason I have never programmed Chesthetica to do so. As I mentioned, even human experts don’t do it (they focus on a particular sequence in a game deemed to be “beautiful”) so there would be nothing for me to base it on. The concept of beauty, in general, may be fluid, as you suggest but unless the rules of chess change, I have good reason to believe that the “principles of beauty” in the game should be fairly consistent throughout the ages as long as the players are sufficiently competent. For instance, a player could not seriously claim that *losing* was beautiful. Beauty is, by its nature, to be found only in a win or a draw when all seemed lost. Sacrifices are beautiful too; especially unexpected ones. If you would like to learn more about the concept of beauty that the aesthetics model is based on, you should read at least section 3.2 of my PhD thesis:

https://www.researchgate.net/publication/230855649_A_Discrete_Computational_Aesthetics_Model_for_A_Zero-Sum_Perfect_Information_Game

In short, it is sort of a compromise between beauty as defined by chess composers and what players find beautiful in games. The model was validated experimentally against the aesthetic assessments of humans (expert composers and players included) which is also explained in my thesis and in the aesthetics paper linked to the article above. Here is the link again:

https://www.researchgate.net/publication/255907604_Evaluating_the_Aesthetics_of_Endgame_Studies_A_Computational_Model_of_Human_Aesthetic_Perception

I’m sorry it’s not something I can explain properly here. It’s fairly complicated and uses formalized mathematical representations of many aesthetic principles and themes grounded in chess literature (including the Levitt book ‘sp0623’ was talking about). Please also refer to the acknowledgements section of my thesis for further credits with regard to the development of the aesthetics model. The stochastic element in the aesthetics model is actually very minor and an unintended side-effect of what the statistical analysis work at the time revealed to be an improvement in aesthetic assessment. Please refer to Appendix B of the second link I provided above. I agree somewhat with you on the “noise” aspect, however. Still, I would argue that the model cancels out most of it because when compared against the aesthetic assessment of domain-competent humans (i.e. the experts), the model correlated positively and quite well. The advantage of such a computational aesthetics model is, of course, consistency and the ability to evaluate hundreds or thousands of sequences which would be impossible for humans to do with any level of consistency. Again, the results are based on averages so any one person looking at any one assessment by the computer is likely to find some if not total disagreement.
lajosarpad 9/19/2018 11:22
@azlan

Did you use problemists' definition for what beauty is?

Are you sure that the concept you have implemented to analyze beauty matches that of the majority of chess players?
lajosarpad 9/19/2018 11:14
@azlan

Thank you for correcting me and explaining that you have been using study-like positions for which the result of the game is known.

This changes my criticism, but not its essence. Your analysis was not limited to finding out which gender delivers more beautiful mates, it was broader, it was searching for the answer to the following question:

"Which gender achieves more beautiful wins"

If we replace in my initial criticism every place where I have mistakenly spoken about delivering more beautiful mates with achieving more beautiful wins, then my criticism no longer contains the mistake, coming from my oversight, which was correctly fixed by you.

However, as far as I understood, the analysis is still focusing on the end of the game. The choice of the opening is at least as important as the end of the game. Also, there are motifs well before the end of the game (study-like positions or mate combinations) which could be added to the score. You could introduce the concept of move heuristics, which would be higher if the move is a sacrifice, a positional maneuver considered to be beautiful, a double-edged move, or a move leading to complications (all these can be determined by exploring the game tree of an engine's analysis; for instance, the variations in a double-edged move's subtree are evaluated as more extreme than the variations of the alternatives). Extra credit should be given to taking risks, that is, when the position has a correct and safe move and a double-edged move makes it more complicated. You would add up the move heuristics of the players, which would lead to a score, and if a study-like position or mate is reached, evaluate it with your current evaluation and somehow have a result containing both. Also, I think the study-like positions and mates can be interpreted as particular cases of the move heuristic concept. But all in all, we still have a problem with subjectivity, which is not limited to individual taste, but extends to fashion as well. It is possible that whatever is considered to be beautiful now will be disregarded in the future as not so beautiful.

The concept of beauty is therefore unreliable. To be able to determine what beauty is nowadays, you will need to include the masses in the decision. Maybe a website where you present positions and combinations and people give marks would show you what beauty is nowadays. And if that site functioned for many years, you would be able to see how the concept of chess beauty changes. This is still not a correct method, but at least it is responsive to the concept of beauty and silences a lot of critics by giving them a say in the question.
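
As a rough illustration of how marks from such a website could be tracked over time, the sketch below averages the marks each position receives per year. The vote format `(position_id, year, mark)` and the function name are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean


def mean_mark_by_year(votes):
    """Average the marks per (position, year) pair.

    `votes` is an iterable of hypothetical (position_id, year, mark) tuples.
    """
    buckets = defaultdict(list)
    for position_id, year, mark in votes:
        buckets[(position_id, year)].append(mark)
    return {key: mean(marks) for key, marks in buckets.items()}
```

Comparing the averages for the same position across years would show whether its perceived beauty drifts with fashion.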

You are working with heuristics and fuzzy elements, you are working with stochastics, so I think it is not necessary to explain why reaching a given beautiful or less beautiful position is random. You state that the players' style is a factor in reaching that position and you are correct, but it is still random, because:

1. The game was non-deterministic before it started (a triviality)
2. The players are trying to outsmart each other, which affects the game
3. The situation affects the game
4. The opening struggle strongly affects which positions will be reached (if you play the Berlin as Black, there is a high chance you will play a queenless position, even if your White opponent likes to play with queens on the board)
5. The perceived playing style of the opponent is an important factor

So, their playing style is just one of the many factors which contribute to reaching a given position, and there is so much white noise that it is unreliable to draw any conclusions about the aesthetics of their play from the position they reached. Yes, if they have reached a beautiful position, you can say with a certain probability that they might be playing more beautiful chess, but I would advise you not to bet on it.
sp0623 sp0623 9/19/2018 10:15
I first became acquainted with the notion of 'chess beauty' through the 1995 book 'Secrets of Spectacular Chess' by Jonathan Levitt and David Friedgood, published by Batsford. In it, the authors describe paradox, depth, geometry, and flow as the four elements of chess beauty. In part III of that book, the authors cover samples from practical play, studies, and problems. I highly recommend that book to those interested in the field of chess aesthetics.
geraldsky geraldsky 9/18/2018 06:00
Most of the time, Carlsen plays simple, dry positions that usually lead to an endgame.
azlan azlan 9/18/2018 03:24
@scriabin369: The Elo strength was not arbitrarily chosen. The ratings just so happened to be like that based on the games that could be successfully derived and processed by Chesthetica from the books. Fortunately, the average Elo ratings of men and women did not differ to a statistically significant degree, or we would have had to arbitrarily remove some games to equalize for playing strength. That would have compromised the research findings somewhat. The issue I see with analyzing games or sequences by weak players, as you suggest, is that there is some correlation between aesthetic quality and playing strength. The argument against any findings there will naturally be that either the females or males simply didn't know enough about chess for the aesthetics of their play to even matter.

Still, your point is taken that very strong players (whose brains are already “honed” to chess) may not be “normal” enough to draw any conclusions regarding the general population. Another issue I should point out is simply the lack of reliable data (i.e. games) such that many different types of comparisons can be done in the first place. On your final point, I would say that “position analysis and calculation” capabilities are beyond the scope of the present research and there may indeed be differences between the genders in that regard.

@lajosarpad: The games we analyzed were not limited to just mates. They also included ‘study-like’ positions (mentioned in the article above). I agree that “beauty” is difficult to define or measure, especially from an individual’s perspective. We each have our own “taste”. However, this is why we typically use the average or mean value in such cases, wherever possible. I am not sure that the sequences (both the mates and the study-like endings) are simply “random results of a struggle”. Players tend to have playing styles that build up to an ending. Even in actual mates, there can be several choices of *how* to mate the opponent that the player chose consciously or unconsciously. Some variations are more aesthetically pleasing than others (and score higher). Your final point was also alluded to at the end of the article. Future research findings in this area could very well be different (for any number of reasons).
KevinC KevinC 9/18/2018 03:00
It is easier to beat weaker players more beautifully, and perhaps the most important thing would be the rating disparity between players, although random matchups of style are a major factor too.
lajosarpad lajosarpad 9/18/2018 10:00
What is beauty? A philosophical question with a subjective answer, which most people are unable to put into words. Yet, in order to do such a study you need to define it in very exact terms, since at the end of the day the concept of beauty will be compiled by the compiler or interpreted by the interpreter, depending on the programming language you use. Alternatively you can create a learning system, which gets "beautiful" and "ugly" positions as an input and will have to recognize the patterns which make a position beautiful or ugly. This, by itself, is a very difficult task and I have my doubts about the system's ability to determine what beauty is.

Also, there is a human bias. There are some criteria chess problemists define which make a mating pattern beautiful or not. For example, the mating combination does not start with a check or capture, and in the final position every square around the king, including the king's own square, is controlled by a single piece (there can be more than one mating piece, but any given square around the king is controlled by only one of them), and so on. These rules can be programmed into a system, but is the sub-culture of problemists able to determine what beauty is? Their problems are very beautiful to them, but not necessarily beautiful to everyone. I tend to prefer brutal attacks and king chases to paradoxical first moves.

Also, since you are analyzing only games which ended by mate, the real question you are analyzing is not whether men or women are playing more beautiful chess, but whether men or women are delivering nicer mates.

The mates we see in games were not planned from the start of the game; they are random results of a struggle. The question of beauty and beauty comparison, if we restrict the analysis to mates only, can only arise if there are multiple possible mates from a given position and their beauty differs. The players are not starting from the same position on the board when a mating position arises and are not in the same situation (a friendly game is very different from being forced to win in the last round), which you cannot take into account.

Finally, statistics are drawn from your system's empirical findings. Due to the finite nature of experiments and findings, even if your system and method were correct and the number of experiments were large enough, this research could only yield a hypothesis, since the fact that women played more beautiful chess than men in the past (or not) is by no means factual proof that the finding, whatever it was, will hold in the future as well.
scriabin369 scriabin369 9/18/2018 01:26
Dear Azlan, you present a very interesting article; however, I feel that one factor is still holding you back from discovering a truer answer to your question regarding the different aesthetics of chess playing between men and women. That is the question of Elo strength, which you arbitrarily chose to be around 2400 as a control for strength. I would suggest that you use very average beginners who know the rules and, furthermore, have not self-selected to be interested in chess but are instead random people from society. The reason is that I strongly believe there is a self-selection bias among people who choose to play chess (brains that are good at chess), especially those who are capable of reaching 2400, i.e. not your average man's or woman's brain. I'm a physician and marvel at the human body regularly enough to appreciate the subtle differences between man and woman. Furthermore, I was a neuroscience major in undergrad and remember very clearly participating in studies with the exact idea of showing how brilliantly both men and women can solve the same problem very differently. There were also very early studies at that time suggesting that male and female problem-solving patterns can be linked to both anatomic and genetic markers. Suffice it to say that using the average male or female brain is going to be a more controlled assessment of the difference between them. Of course it's just a gut feeling and I have not looked at the literature for years. I wish you the best and I hope what you find is that both men and women have their own special way of discovering chess. Lastly, perhaps the question of aesthetics is not a good way to look at it; perhaps it's better to look at how the sexes differ in their position analysis and calculation.