A study of gender bias claimed that orchestras holding “blind auditions,” in which performers are hidden from the judges so that their sex cannot be known, were 50 percent more likely to hire women. This is how one newspaper reported the study results in 2013:
Bias cannot be avoided, we just can’t help ourselves. Research shows that we apply different standards when we compare men and women. While explicit discrimination certainly exists, perhaps the more arduous task is to eliminate our implicit biases — the ones we don’t even realise we have…
Curt Rice, The Guardian
In the 1970s and 1980s, orchestras began using blind auditions. Candidates are situated on a stage behind a screen to play for a jury that cannot see them. In some orchestras, blind auditions are used just for the preliminary selection while others use it all the way to the end, until a hiring decision is made.
Even when the screen is only used for the preliminary round, it has a powerful impact; researchers have determined that this step alone makes it 50% more likely that a woman will advance to the finals. And the screen has also been demonstrated to be the source of a surge in the number of women being offered positions.
The study, by economists from Harvard and Princeton, has been cited roughly 1500 times since being published in 2000. It is one of the most-cited papers on the subject of gender bias. It has been cited in journals, newspapers and TED talks, and even in court.
In early 2019, a data scientist named Jonatan Schaumburg-Müller Pallesen wrote an article on Medium titled “Orchestrating false beliefs about gender discrimination”, criticising the reported findings. He later wrote a follow-up:
The surprising thing about this, is that even though the study does not contain any good evidence for their hypothesis, it is extremely well-known and often used as an example of scientifically proven gender discrimination. …
…This table unambiguously shows that men are doing comparatively better in blind auditions than in non-blind auditions.
…However, as I describe in my original post, this is observational data, and could be confounded in many ways. Thus I would not rely on it to draw any conclusions. But it is noticeable that the most significant result in the paper is in the opposite direction of the overall conclusion drawn.
Jonatan Pallesen
His writings attracted much attention, but they didn’t stop people citing the original report of the study as though it were accurate. Andrew Gelman, a statistician at Columbia University, examined the study himself. Gelman confirmed Pallesen’s conclusions, i.e. the study’s claims don’t hold water:
Did blind orchestra auditions really benefit women?
This is not very impressive at all. Some fine words but the punchline seems to be that the data are too noisy to form any strong conclusions. And the bit about the point estimates being “economically significant”—that doesn’t mean anything at all. That’s just what you get when you have a small sample and noisy data, you get noisy estimates so you can get big numbers.
Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science
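Gelman’s point about small samples producing big numbers can be illustrated with a short simulation. The following is only a sketch under invented assumptions: the group sizes, the 20 percent success rate and the 1.5 threshold are hypothetical and are not taken from the study. Both groups are given exactly the same true chance of success, and the code counts how often one group nonetheless looks at least 50 percent better than the other purely by chance.

```python
import random

def share_of_big_gaps(n_per_group, true_rate=0.2, threshold=1.5,
                      trials=2000, seed=0):
    """Fraction of simulated studies in which group A's observed success
    count is at least `threshold` times group B's, even though both
    groups share the same true success rate (hypothetical numbers)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each candidate succeeds independently with probability true_rate.
        a = sum(rng.random() < true_rate for _ in range(n_per_group))
        b = sum(rng.random() < true_rate for _ in range(n_per_group))
        # Equal group sizes, so comparing counts is comparing rates.
        if b > 0 and a / b >= threshold:
            hits += 1
    return hits / trials

small = share_of_big_gaps(n_per_group=30)    # small, noisy samples
large = share_of_big_gaps(n_per_group=2000)  # much larger samples

print(f"30 per group:   {small:.1%} of runs show a gap of 50% or more")
print(f"2000 per group: {large:.1%} of runs show a gap of 50% or more")
```

With small groups, an apparent gap of 50 percent or more should turn up in a noticeable share of runs even though no real difference exists, while with large groups it should all but disappear. That is why a large point estimate reported without its standard error says little.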
As for the claim that women’s chances of being hired improved by 50 percent in blind auditions, Gelman at first could not find any basis for it in the study. Later, however, in the comments on his own post, he identified it: “there’s no way you should put that sort of claim in the conclusion of your paper unless you give the standard error. And if you look at the numbers in that table, you’ll see that the standard errors for these differences are large.” He goes on:
And one problem with the paper is the expectation, among research articles, to present strong conclusions. You can see it right there: the authors make some statements and then immediately qualify them (the results are not statistically significant, they go in both directions, etc.), but then they can’t resist making strong conclusions and presenting mysterious numbers like that “50 percent” thing. And of course the generally positive reception on this paper would just seem to retroactively validate the strategy of taking weak or noisy data and making strong claims.
Andrew Gelman
Also in the comments on Gelman’s post, it was suggested that political correctness may be the reason the journal published the paper in 2000. Gelman countered that economics journals are known for being willing to publish results that undercut politically expected findings, at least more so than journals in other fields.
That may be so, but the question is how this study became a leading example of gender bias, cited 1500 times. To put it another way, political correctness might not explain the publication of the paper, but it almost certainly explains the frequency of citations.
Feminist Christina Hoff Sommers has had her say on this debacle, too. If you have a subscription, you can read what she had to say in the Wall Street Journal. We will leave the last words to her, in a video: