What gender gap in chess?

by Wei Ji Ma
10/15/2020 – If you want to compare chess achievements between men and women, writes Professor Wei Ji Ma of NYU, given their vastly unequal numbers, it is a very bad idea to focus on the top male and female players. If you do, you will need to account for the participation gap using an analysis similar to the one he presents. Prof. Ma supplies the tools needed to refute the theory of female inferiority.

Long before fake news was a thing, there were articles about the supposed inferiority of women to men in chess. In most other domains of life, such ideas would be considered reactionary and repulsive; yet, when writing about chess, they are somehow not only acceptable but even mainstream. A few days ago, we saw the latest installment in this unsavory series: an article on the Indian website Mint, titled “Why do women lose in chess?” and reprinted here on ChessBase. Like so many of its predecessors, this article asserts a gender gap in chess achievement, and then speculates about possible contributing factors, such as male gatekeepers, lack of role models, and biological differences. It quotes GM Humpy Koneru as saying that men are just better players, “You have to accept it.”

In good tradition, the article is incredibly sloppy in arguing that there is an achievement gap between women and men to begin with. The author notes that the top two female players, Hou Yifan and Koneru Humpy, are ranked only 86th and 283rd in the world, that no woman has been world champion, and that the gap between the best female and best male player is 205 Elo points. These arguments are all variations on a common theme: whatever metric of top players you use, women are clearly worse than men. There is a huge flaw in this argument: to fairly compare an underrepresented to an overrepresented group, you should never use the top individuals. That is a form of statistical malpractice that wouldn’t stand in an introductory college course.

The Mint article starts out promising. It points out that only 16% of the players registered with the All India Chess Federation are female, and states, correctly, “Fewer participants at the entry level results in fewer chances for the top slots.” It then promptly abandons this key argument while giving extensive coverage to folk psychology about “killer instinct” and “emotional sensitivity”.

A thought experiment

Why is this a key argument? It’s really quite simple. Let’s say I have two groups, A and B. Group A has 10 people, group B has 2. Each of the 12 people gets randomly assigned a number between 1 and 100 (with replacement). Then I use the highest number in Group A as the score for Group A and the highest number in Group B as the score for Group B. On average, Group A will score 91.4 and Group B 67.2. The only difference between Groups A and B is the number of people. The larger group has more shots at a high score, so will on average get a higher score. The fair way to compare these unequally sized groups is by comparing their means (averages), not their top values. Of course, in this example, that would be about 50 for both groups – no difference!
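The two averages in the thought experiment can be checked directly. The Python sketch below is not part of the original analysis, and the function names `expected_max` and `simulated_max` are made up for illustration. It computes the exact expected maximum of a group of independent draws from 1 to 100 and cross-checks the result by simulation.

```python
import random
from fractions import Fraction


def expected_max(group_size, top=100):
    """Exact expected maximum of `group_size` independent uniform draws from 1..top."""
    # P(max >= k) = 1 - ((k - 1) / top) ** group_size, and E[max] is the sum
    # of P(max >= k) over k = 1..top.
    total = sum(1 - Fraction(k - 1, top) ** group_size for k in range(1, top + 1))
    return float(total)


def simulated_max(group_size, trials=20_000, top=100, seed=0):
    """Monte Carlo estimate of the same quantity (seed fixed for repeatability)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += max(rng.randint(1, top) for _ in range(group_size))
    return total / trials


if __name__ == "__main__":
    for name, size in (("Group A", 10), ("Group B", 2)):
        print(name, round(expected_max(size), 1), round(simulated_max(size), 1))
```

The exact values round to 91.4 for ten people and 67.2 for two, matching the figures quoted above, even though every individual number comes from the same distribution.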

Indian women play as well as men on average

At this point, you might think that this is just a theoretical argument – surely, when looking at chess ratings, it cannot be that simple? So let’s have a closer look at chess ratings. I downloaded the October 6, 2020 FIDE Standard rating list, selected all players of the Indian federation, and removed all junior players (born 2000 or later), since their ratings are often not reliable. I was left with 19,064 players, of whom 17,899 (93.9%) were male and 1,165 (6.1%) were female. The best male player was a certain Viswanathan Anand at 2753, and the best female player was Humpy Koneru at 2586 – a gap of 167 points. GM Koneru, ranked 15th among all Indian players, is the only female in the top 20. On the surface, these facts seem to point to a gender gap in achievement.

They don’t. With our thought experiment in mind, let’s look at the full rating distributions of male and female Indian players. They look like this (binned from 1000 to 2800 in bins of 50):

The huge discrepancy between the blue and orange lines reflects the participation gap. To compare the distributions more easily, we change the vertical axis from number of players to proportion of players (within each gender):

The line for female players is more jagged because there are fewer of them, but other than that, these two distributions don’t look radically different from each other. Indeed, the average ratings of men (1434) and women (1466) are comparable. And averages are the fairer metric for comparing men and women.
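The rescaling from player counts to within-group proportions can be sketched in a few lines of Python. This is an illustration only, not the author's code: the helper names `proportion_histogram` and `synthetic_ratings` are hypothetical, and the ratings are made-up numbers rather than FIDE data. Only the bin layout (1000 to 2800 in steps of 50) and the two group sizes come from the text.

```python
import random
from statistics import mean


def proportion_histogram(ratings, low=1000, high=2800, width=50):
    """Share of a (non-empty) group's players that falls into each rating bin."""
    counts = [0] * ((high - low) // width)
    for rating in ratings:
        if low <= rating < high:
            counts[(rating - low) // width] += 1
    # Dividing by the group size removes the participation gap from the picture.
    return [count / len(ratings) for count in counts]


def synthetic_ratings(n, rng):
    """Made-up integer ratings for illustration only; NOT the FIDE data."""
    return [min(2799, max(1000, round(rng.gauss(1450, 250)))) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    large_group = synthetic_ratings(17_899, rng)
    small_group = synthetic_ratings(1_165, rng)
    print(round(mean(large_group)), round(mean(small_group)))
    print(proportion_histogram(large_group)[:5])
    print(proportion_histogram(small_group)[:5])
```

Because each histogram is divided by its own group size, both sum to one and can be overlaid directly. The smaller group's curve is noisier simply because each bin holds fewer players.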

Is 167 points an unexpectedly large gap?

But this does not answer our questions. Is, for example, a gap of 167 points between the male and female top players unexpectedly large? To answer this question, we are now going to look at all ratings as a single pool, dropping the gender identifiers altogether. We then randomly draw 17,899 ratings from this pool. These form the “overrepresented” group, and the remaining 1,165 ratings form the “underrepresented” group. These numbers are exactly the numbers of male and female players in our data, but we have instead created completely arbitrary groups with these numbers of individuals. We record the top rating in both groups. We repeat this process 100,000 times. (For the aficionados: we are following the logic of permutation tests.)

Guess what? The difference between the top ratings in the Overrepresented and Underrepresented groups is a whopping 153 points on average (with a standard deviation of 93). Again, remember that these groups are completely identical to each other except in their number of individuals. The mere fact that the underrepresented group constitutes only 6.1% of the population causes a large difference in top ratings. In this light, the real gap of 167 points could easily be due to chance instead of due to a real difference between women and men, just like the gap in our thought experiment. It is that simple.
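The resampling procedure described above can be sketched in Python. The author's own implementation is in Matlab and is not reproduced here. The function names, the signed definition of the gap (top of the larger group minus top of the smaller group), and the synthetic pool are all assumptions made for illustration. Choosing a random subset of positions is used here as an equivalent, faster stand-in for shuffling the whole pool.

```python
import random
from statistics import mean, stdev


def top_rating_gaps(ratings, n_under, repetitions=1_000, seed=0):
    """Top-rating gap between two randomly formed groups, one value per repetition.

    Assumes 0 < n_under < len(ratings).
    """
    rng = random.Random(seed)
    ranked = sorted(ratings, reverse=True)  # index 0 holds the top rating
    gaps = []
    for _ in range(repetitions):
        # Randomly pick which positions form the arbitrary "underrepresented" group.
        under = set(rng.sample(range(len(ranked)), n_under))
        top_under = ranked[min(under)]
        # The best-ranked position not picked above tops the "overrepresented" group.
        top_over = next(r for i, r in enumerate(ranked) if i not in under)
        gaps.append(top_over - top_under)
    return gaps


def share_at_least(gaps, observed_gap):
    """Fraction of repetitions whose gap is at least as large as the observed one."""
    return sum(gap >= observed_gap for gap in gaps) / len(gaps)


if __name__ == "__main__":
    # Synthetic stand-in pool; replace it with the real pooled ratings to
    # repeat the analysis. The article uses 100,000 repetitions.
    rng = random.Random(42)
    pool = [min(2800, max(1000, round(rng.gauss(1450, 250)))) for _ in range(19_064)]
    gaps = top_rating_gaps(pool, n_under=1_165)
    print(round(mean(gaps)), round(stdev(gaps)), share_at_least(gaps, 167))
```

With the real rating pool in place of the synthetic one, the mean and standard deviation of `gaps` are the quantities reported in the text. The share of repetitions at or above the observed 167 points is one way to judge how unusual the real gap is.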

Other widely used metrics don’t show evidence for a gender gap either. For example, based on participation alone, one would expect only 1.2 female players in the top 20 overall. So Humpy Koneru being the only female in the Indian top 20 is completely in line with statistical expectations based on the participation gap.
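The figure of 1.2 follows from the participation numbers alone: 20 × 1,165 / 19,064. The Python sketch below reproduces it and, as an addition that is not in the original analysis, uses a hypergeometric model to estimate how likely it is to see at most one woman in the top 20 if gender were unrelated to rank. The function names are hypothetical.

```python
from math import comb

N_TOTAL = 19_064   # non-junior Indian players in the article's data
N_FEMALE = 1_165   # of whom female
TOP_K = 20


def expected_in_top(n_total=N_TOTAL, n_female=N_FEMALE, top_k=TOP_K):
    """Expected number of women in the top `top_k` if gender is unrelated to rank."""
    return top_k * n_female / n_total


def prob_at_most(k_max, n_total=N_TOTAL, n_female=N_FEMALE, top_k=TOP_K):
    """Hypergeometric P(at most k_max women among the top_k), same assumption."""
    favourable = sum(
        comb(n_female, k) * comb(n_total - n_female, top_k - k)
        for k in range(k_max + 1)
    )
    return favourable / comb(n_total, top_k)


if __name__ == "__main__":
    print(round(expected_in_top(), 1))  # 1.2, as in the text
    print(prob_at_most(1))              # chance of seeing one woman or none
```

Under this model, a top 20 containing one woman or none is the more likely outcome, which is consistent with the point made above.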

Conclusion

We conclude that at least among non-junior FIDE-rated Indian players, there is no evidence that the “achievement gap” is anything but a participation gap. That is not to deny the first-person perspective of top female players, who might feel that they have reached a personal ceiling in their performance. But statistically, there is nothing to suggest that top female players are underperforming given the overall ratio of female to male players. In fact, taking into account the systemic injustices and biases that they had to overcome to get where they are, they are likely overperforming.

Take-aways:

  1. If you want to compare chess achievements between men and women, given their vastly unequal numbers, it is a very bad idea to focus on the top male and female players.

  2. If you insist on focusing on the top players, you will have to account for the participation gap using an analysis similar to the one presented here. Just a casual remark won’t do it!

  3. Even if, hypothetically, you were to find a gender difference based on average ratings (rather than top ratings), you cannot jump to the conclusion that that difference is due to innate or biological factors. The first place to look would be systemic disadvantages and stereotype threat experienced by female players.

The statistical arguments presented in this article are elementary enough for an introductory statistics class in university. In case you want to repeat the analysis for different countries, you can check my Matlab code for the details. If you prefer to read a published paper, look no further than this excellent paper by Merim Bilalić, Kieran Smallbone, Peter McLeod, and Fernand Gobet (2009). (The free PDF can be found through Google Scholar.) Its title is the first question everyone should be asking: Why are (the best) women so good at chess? And everyone’s second question should be: How can we reduce the insane participation gap?

About the author

Wei Ji (also Whee Ky) Ma is a Dutch FM rated 2324 and a Professor of Neuroscience and Psychology at New York University. He previously explained for Chessbase how a “genius culture” in chess might contribute to excluding women.


Discuss


rls53 10/16/2020 03:04
I believe the problem with this calculation is that the results hold only for India – they don’t replicate with other countries.
I downloaded the data and repeated the calculations described by Professor Ma. For India, my results agree with his. Another way to present the results is the proportion of permutation samples where the male-female difference is greater than the real data – what statisticians call a p-value. In this case, the p-value is 0.39. Statisticians usually regard a p-value of less than 0.05 as "significant", but the result for India is consistent with random permutations.
However, in the same test for Russia, the difference between the strongest male and female players is 230, with a p-value of 0.0006. FIDE lists Kasparov (2812) as the highest-rated Russian player, but if we base the calculation on Nepomniachtchi (2784), the p-value is still 0.003, which is still "highly significant". For Russia, regardless of whether Kasparov or Nepomniachtchi is the male reference point, the difference cannot be explained by random permutations.
I repeated the calculation for several other countries, with p-values: China, 0.04; USA, 0.007; Germany, 0.023; France, 0.016; England, 0.12; Netherlands, 0.01. Only for England does the p-value exceed 0.05.
I also made the calculations for the tenth-highest male and female players rather than the highest - this is a more "robust" calculation that is less susceptible to outliers. However, the results are still significant (except for India) - for England, in this case, the p-value is 0.0015.
What do I learn from this? Maybe India really is doing something different, for example getting girls involved in chess at a young age. But the general hypothesis advanced by Professor Ma does not stand up to close scrutiny. If his explanation is correct, it should apply to many other countries and not just India.
SimonReinhard 10/15/2020 11:52
Regarding such factors, it has been well-known in psychology that self-perception and the idea of one's potential can significantly influence people's performance. A famous example is the study of a math test for seniors: Two groups of seniors, both chosen to have roughly equal basic mathematical ability and educational background, had to do a math test. One group was told that seniors usually/often have a decline in cognitive ability and that it was expected that they would not do well. The other group was told that there was no reason that they should not get a high score. What happened? The first group scored 30% worse in the test. As far as I know, these results can be reproduced. So, if such effects exist, then it does not seem impossible that the absence of a female world champion and the lack of female top players (and other factors) also have a detrimental psychological effect on many young female players. Not all, for sure, but enough to skew the distribution again, in addition to the unequal numbers of participants.

All in all, a very tricky subject, but I hope that soon another Polgar or an even stronger female player emerges. It would be exciting to see :).
SimonReinhard 10/15/2020 11:52
I welcome it very much that the author has taken this approach. It has been my opinion for quite some time that before analyzing other factors one should first take a close look at the different numbers of men and women playing chess.

In my opinion it is a fallacy to only look at the best players and, based on that, to claim that a higher potential of men to be good at chess exists.

To the contrary, if one takes the approach that basically every player, male or female, has a certain chance to have a certain rating (some probability distribution) and that higher ratings are rarer, then the whole situation can be compared to a lottery or a drawing of lots. If the set "men" has more lots than the set "women", then obviously the chance will be higher that some of the "men" lots are from the highest categories, even if (what I deem a natural basic assumption) men and women have the same potential to excel at chess.

A useful comparison would be: Take the ratio of women playing chess, which might be around 10%. Then check what happens if you take a random sample of 10% from the male pool and see what kind of distribution you get. If there is then still some discrepancy between the expected distribution and the real distribution, then the discussion can start about the extent to which socially adverse factors contribute to that.
Jack Nayer 10/15/2020 11:26
Okay, I read the article by Bilalic et al (all men by the way). There are potential female Firouzjas, but we do not know them because they do not play chess.
In this country, there are more female students in higher education than males. There are more females taking a masters than males. We have more female PhD students than males. There are female concert pianists, female CEOs, female scientists, female bishops, you name it. I am not suggesting that everything is equal, unfortunately far from it. There are, however, very few females taking a PhD in mathematics. Why? Is it a lack of role models? What is a role model anyway? Is it the patriarchy discouraging their daughters to study maths? Is it society at large? How exactly does that work? What keeps young girls away from chess clubs? Is it discrimination or tradition? Could it be less innate interest? What explains the existence of the participation gap? As long as you do not explain this, you explained nothing.
fche 10/15/2020 11:23
> To answer this question, we are now going to look at all ratings as a single pool, dropping the gender identifiers altogether.

In what world is this a valid statistical transformation? It *presumes* that the ratings are independent of gender, which is precisely the proposition being analyzed. It's a circular argument.
Eric Boesch 10/15/2020 11:12
This claim that there is no gender gap in achievement at the highest level of chess beyond that which can be explained purely in terms of participation is simply wrong. The choice of India, which the writer knows is home to the #2 female chess player in the world, is pure cherry-picking. If we suppose that a 6% participation rate for women is a global average, and that a priori distributions are the same, then it is easy to see that the expected number of top 100 and top 300 female chess players would be 6 and 18, not 1 and 2. If we accept the author's claim of no gender gap, the probability of just 2 of the top 300 being women, assuming 6% of all rated players are women and identical a priori distributions of rating, would be 1.6 out of 1 million. Whether there are innate differences, I don't know -- I just know that there is a gap in top level results beyond that which would be expected based solely on participation levels.
paulo1176 10/15/2020 10:58
I'd like to congratulate Professor Wei Ji for your excellent article and Mr. Frederic for your constant "food for thought" here on ChessBase. I must say too that I love your fine irony and sense of humour, Fred!
The subject stirs passions, and it's impressive how frequently "science" or, more accurately, pseudoscience is invoked to justify the supposed superiority of a human group based on gender, race, ethnic origin and so on.
This article addresses one important point: that it makes no sense to compare artificial groups (like divisions of human beings by gender or race) that are unequal in size without re-weighting the data. It also stresses that a difference found in "results" must not be explained by simply assuming an innate difference.
I suppose @chessgod0 would promptly classify me as "egalitarian", but I'd like to point out that these gender or race divisions per se do not exist in nature, but are rather human classifications. If we consider the chromosomes, Vishy Anand and Weissmuller are men, Koneru Humpy and Ledecky are women, but certainly there are extreme differences between all of them. When we compare men and women, we are comparing not only the innate and finished product of chromosomes, but also an infinity of social and cultural patterns, and how people of both genders deal with them. The theme is valid mainly because we are searching for underlying causes that could explain why the participation gap is so large. Making chess more interesting to everybody is the challenge.
Jack Nayer 10/15/2020 10:44
Quandary: Save me time and just tell me on the basis of the ‘participation gap’ why Carlsen is better than Yifan. Why Caruana is better than Yifan. Continue doing it all the way to number 86. Or take Firouzja. He is 17. His rating is 2728. With the exception of 28-year-old Judit Polgar, no woman chess player ever came close. Why not? The best women are lagging behind the best men by ca. 200 Elo. The fact remains that in chess the best men are better than the best women and quite significantly so.
Frederic 10/15/2020 10:16
@chessgod0: You don't feel even a slight tinge of pleasure to see how far female swimmers have progressed since the 1920s? You don't feel that should be celebrated, because men still remain superior? You think that is committing an "apex fallacy," as defined by Men's Rights Activism? I was just feeling really good to see Katie Ledecky prove how a female can do things that nobody imagined a hundred years ago: I did not mechanically add that men (Michael Phelps) of course remain far superior.

Gosh, I hope I don't end that way.

@royce campbell: You have eagle eyes, my friend. It took me an hour to put the montage together, and one tends to lose track.
Quandary 10/15/2020 09:48
@Jack Nayer That is exactly what this article is addressing, maybe you should read it!
IntensityInsanity 10/15/2020 07:53
I find it an interesting discussion. I have been reading about this topic for years. Truth is, I am not on any particular side. I noticed that almost all posters here have a very definite position on this topic. However, I find it OK to just say that I don't know the truth. There is a big difference between the achievements of both genders in chess, and I do, indeed, wonder why. However, I have not yet been convinced by any one side. This article was good, and so far not one of the critics in these comments has been able to debunk it, other than just writing words like "ridiculous", etc.
royce campbell 10/15/2020 07:07
Interesting that the montage included Nazi Paikidze twice.
chessgod0 10/15/2020 04:44
@Frederic

1) The whole angle about Ledecky is a bit ridiculous as with a modern training and diet regimen, it's obvious that he would have defeated her.

2) "Stereotype threat" has been repeatedly debunked. It is simply not real, even if feminists and other egalitarians need it to be:

https://quillette.com/2020/02/22/lee-jussim-is-right-to-be-skeptical-about-stereotype-threat/
https://www.psypost.org/2018/10/study-fails-to-find-any-evidence-of-stereotype-threat-impairing-womens-cognitive-control-and-math-ability-52334
https://psycnet.apa.org/doiLanding?doi=10.1037%2Fapl0000420

Introducing "stereotype threat" into a scientific discussion means you are not actually having a "scientific discussion" at all.

3) Looking at the high achievers and drawing erroneous conclusions about the broad distribution is called the "apex fallacy". Feminists and other kinds of egalitarians do this all the time.

4) It's really disappointing to see these kinds of potentially divisive and polarizing discussions make headway here. I'm guessing it won't be too long before this space becomes overtly politicized and people with dissenting views are cancelled and banned. I sincerely hope I'm wrong.
Frederic 10/15/2020 03:53
@Stupido: "The constant blabber.." is also known as a "scientific discussion" (google that). There is a huge discrepancy in the number of women in the top 10, top 100 and in general in chess. Since no muscular power is involved (except for long castling) the question in every scientifically inquisitive mind arises: why? I have discussed it at length with top female players, who are close friends. They do not consider these discussions "blabber".

Incidentally, you guys might be interested in a not directly related quiz question: how would 19-year-old Katie Ledecky fare in a swimming contest against Johnny Weissmuller, the greatest swimmer of the first half of the 20th Century? He won five Olympic gold medals and fifty-two U.S. National Championships. He set sixty-seven world records, and was never beaten in official competition during his entire career. The answer is she could finish the 800 meters, climb out of the water and stand at attention for the entire American anthem before Weissmuller finished the race in second place. Details here: https://medium.com/@frederic_38110/swimming-now-the-girls-are-better-cb36996dc0fa
pfitschigogerl 10/15/2020 03:13
The question of whether women are under- or overperforming is pointless and rather boring. If you want to explode the myth that women are inferior at chess, just give them the choice and opportunities to play the game (at whatever level) or do something else. Let's finally take women and their abilities seriously and stop this nonsense with "women's chess". It is antifeminist, discriminatory and a political and human disgrace.
fgkdjlkag 10/15/2020 02:54
"A 2500-rated woman on average is going to make a lot more money than a 2500-rated man. A strange sort of "systemic injustice"."

I don't consider it injustice. It can only be considered injustice if you believe that players should be financially compensated based on rating, but there are many other factors.

I haven't taken a look but I wonder if the mean ratings by gender are the same in the world and in other countries as in the analysis above.
Jack Nayer 10/15/2020 02:39
I used to teach statistics (SPSS) at a university in a previous life and I do not agree. While there is evidently a participation gap (and then, indeed, why?), it does not explain why Carlsen has a rating of 2800-something and Hou has a rating of 2500-something. It just doesn’t. For reasons no one can satisfactorily explain, the best women are not as good at chess as the best men. The evidence for this statement is overwhelming.
e-mars 10/15/2020 02:25
OK, given that this article relies on some better, statistical, scientific approach, it just moves the question to "why females are less represented?". Females - from what we know - are not weaker "at" chess, but they are certainly weaker "in" chess. They might be less represented because of exactly the same reasons other articles are claiming for being weaker "at"... If they drop out of playing chess because of "biological differences, changes in life priorities, gender inequality, cultural barriers and such" well, we are at the starting point again: how do you fix this inequality so that females will be as good as men "in" chess?
Has someone done any comparative work against other non-physical, intellectual and/or emotional only activity? e.g. translation/localisation jobs, or nursery jobs, where it seems clear that females have the upper hand?
tomohawk52 10/15/2020 02:18
A 2500-rated woman on average is going to make a lot more money than a 2500-rated man. A strange sort of "systemic injustice".
kapil857 10/15/2020 01:18
Question: The article states that 16% of the players registered with AICF are female. But then, a rating list is taken where junior players are removed, and that leaves 94% males to just 6% females. That is a huge difference from 84% to 16%.

Why is there this difference? I can't imagine it is on account of junior players only, unless there are a lot more junior females registered as opposed to junior males.

So I am guessing it is because a large number of those 16% females are inactive and thus don't have an active rating? And then, is this inactivity because females may quit at an earlier age on average? Possibly to start families? And while some (like Humpy) return to chess, most don't?
flachspieler 10/15/2020 12:24
One aspect is "heavy tails". A distribution (real-valued) is said to have heavy tails if it has many more large and small values than a normal distribution. In many fields, men have relatively heavy tails compared with women (in both directions: strong and weak). This might hold also in chess with respect to ratings.
fede666 10/15/2020 12:05
Chessbase should stick to reporting real chess news... Unfortunately, political correctness is all the rage now... By the way, nobody is preventing women from playing better chess moves...
Stupido 10/15/2020 11:35
The constant blabber about this topic is becoming annoying, even when you are convinced that women and men can compete equally in chess (I am).
kamamura 10/15/2020 11:14
Amazing. You can use similar methodology to prove that turtles run as fast as rabbits.