The Cult of Averages
Society is obsessed with statistics, and even more so with boiling statistics down to a single number—generally the arithmetic mean. We want to believe our judgments are based on vast amounts of data, but shy away from the complexity of real analysis, and so we turn to the mean as our savior. Most visibly, we cite differences in means between demographic groups, or between time periods, to justify our political narratives. Yet we never question what these means mean[0][1].
Ill-Founded Means
The mean of a data set is its sum divided by its size. There is no self-evident interpretation of this number for arbitrary kinds of data, although people seem to assume otherwise. Next time a mean of a data set comes up, ask yourself whether it makes sense to talk about the sum of that data set. If not, how does dividing it by the number of data points make any more sense?
Many commonly cited means lack any concrete interpretation. Consider mean GPAs, which are in fact means of means (and hence doubly silly). I'm sure you've read more than one piece citing differences in mean high-school GPAs between demographic groups (men and women, rich and poor, black and white[2]) as evidence of various structural failures. Ignore their conclusions for a minute[3], and try to explain how to interpret the mean GPA, without simply repeating the definition. "On average, how well students are doing" is a common attempt, but completely meaningless. "How well the average student is doing" is slightly better, but there is no such thing as a mean student; that would be like having a mean parent. In what sense is a C "half-way" between an A and an F, much less a C-average student "half-way" between a straight-A student and a straight-F student? What if someone re-numbered the GPA system so that a B was worth 2.5 points instead of 3? This would vastly change the mean GPA, but it has no real significance to anything we might want to measure. We're trying to use math to derive meaning from meaningless numbers. Garbage in, garbage out.
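To make the re-numbering point concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the two groups, their letter grades, and the alternative point scale are assumptions, not real data. It shows that changing only the value of a B from 3 to 2.5 can reverse which group has the higher mean GPA, even though no student's letter grades change.

```python
from statistics import mean

# Hypothetical letter grades for two made-up groups of students.
group_x = ["B", "B", "B", "B", "B"]
group_y = ["A", "A", "C", "C", "C"]

# Two point scales that agree on the ordering A > B > C > D > F
# and differ only in how many points a B is worth.
standard_scale = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
renumbered_scale = {**standard_scale, "B": 2.5}

def mean_gpa(grades, scale):
    """Mean of the grade points that the given scale assigns to each letter."""
    return mean(scale[g] for g in grades)

for name, scale in [("standard", standard_scale), ("renumbered", renumbered_scale)]:
    x, y = mean_gpa(group_x, scale), mean_gpa(group_y, scale)
    leader = "X" if x > y else "Y"
    print(f"{name}: X={x:.2f}, Y={y:.2f}, higher mean: group {leader}")
```

Under the standard scale group X comes out ahead; under the renumbered scale group Y does. Which group "does better on average" is therefore a property of the numbering, not of the students.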
Another good example is IQ. A bedrock of modern threads of racism is the assertion that certain races have higher average IQs. But this is a very strange number to pay attention to if you know the definition of IQ: for a given test (with numeric scores presumed to correlate with intelligence), the IQ defined by that test is the unique increasing function of the score such that it is normally distributed across the population with mean 100 and standard deviation 15. Whether that makes any sense to you or not, it should be clear that a person's IQ is the result of a complex mathematical operation. It is difficult to interpret as it is. But how do you interpret the sum of two results of this complex operation? Divided by two? Can you expect that any conclusions you can draw from IQ apply to means of IQs? Of course not.
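The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw scores are made up, the function name `scores_to_iq` is hypothetical, and real test norming relies on large reference samples and more careful handling of ties. Each raw score is replaced by its mid-rank percentile within the sample, and that percentile is mapped through the inverse CDF of a normal distribution with mean 100 and standard deviation 15.

```python
from statistics import NormalDist

IQ_SCALE = NormalDist(mu=100, sigma=15)

def scores_to_iq(raw_scores):
    """Map raw scores to IQ-style values by rank (ties are not treated specially)."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    iqs = [0.0] * n
    for rank, i in enumerate(order):
        percentile = (rank + 0.5) / n  # mid-rank percentile, strictly inside (0, 1)
        iqs[i] = IQ_SCALE.inv_cdf(percentile)
    return iqs

raw = [12, 47, 30, 5, 90, 31, 64]   # made-up raw test scores
stretched = [s ** 3 for s in raw]    # same ordering, very different spacing

iq_raw = scores_to_iq(raw)
iq_stretched = scores_to_iq(stretched)
print([round(v, 1) for v in iq_raw])
print(iq_raw == iq_stretched)
```

Both inputs produce the same IQ values, because only the ordering of the scores enters the calculation. The gaps between IQ values therefore say nothing about the gaps between the underlying scores, which is part of why sums and means of IQs are so hard to interpret.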
Abusing Means (and Medians)
Some means have a reasonable interpretation, but are used to draw broad conclusions they cannot actually support. Incomes are a good example. Mean incomes are a sensible concept, as two people can in fact pool their incomes and split the result. Of course, they generally don't do so, thus mean income is only a narrowly useful concept. For example, it is useful for evaluating how much a government would have to spend per citizen if it instituted a flat-rate income tax. Similarly, it is useful for estimating the total size of an economy. But outside these rare cases, the mean usually tells you little. Income taxes usually aren't flat-rate. Someone twice as rich doesn't, on average[4], buy twice as much rice; but on average they buy more than twice as many sports cars[5].
Comparisons again get us in trouble, since the mean is rarely what we actually want to compare. If the mean income of one group is lower than that of another, what does that mean exactly? Is there some sense in which the people in one group are uniformly paid less? Perhaps; this is certainly the narrative that is used when talking about disadvantaged groups in the United States: that they are reliably paid less for the same work. Differences in mean income are often used to support this intuition. But are the people of South Korea (per capita GDP, PPP: $37,900) uniformly paid so much less than the people of Qatar (per capita GDP, PPP: $129,700)? Suddenly our intuition disagrees with what we thought the mean was telling us.
The basic assumption in these kinds of comparisons is that the distributions being compared are similar in shape, and are just shifted up or down. Sometimes this is true, but clearly sometimes it is not. Note that in the preceding examples I haven't made any claims about the shapes of these distributions—I don't know what they look like, and chances are neither do you. Nobody seems to be asking, and if they are, the data isn't being widely published.
Things get worse when we start comparing across time, because even if we think we know the distribution of a data set, this gives us no reason to think we know the distribution of the changes of that data set over time. Even the mean's sophisticated cousin—the median—often misleads us here[6]. Real median income in the United States has been rising since the end of the Great Depression, with a few brief interruptions. But what story does this tell? The story we want to hear is that across shorter time spans, individual people are making more money than they used to; and across longer time spans, children make more money than their parents did at their age, or at least over their lives. In this case, we know the reality is different. If we instead group workers by date of birth, we see that median lifetime wages have been falling for decades. If we further segment by sex, we see that men's lifetime wages have been falling since the cohort born in 1942(!), with falling inequality in women's wages making up the difference for the median across sexes until women's wages began stagnating in the 1980's[7]. So how can median incomes still be rising, albeit slowly? We're getting older, and old people make more money[8].
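The aging effect is easy to reproduce with a toy example. The sketch below uses invented numbers (in thousands of dollars) and a deliberately crude model in which everyone in an age group earns the same amount; the helper `population` is hypothetical. In the later period both the young and the old earn less than their counterparts did earlier, yet the overall median rises because more of the population is old.

```python
from statistics import median

def population(n_young, young_income, n_old, old_income):
    """Build a toy list of incomes from head counts and one income per age group."""
    return [young_income] * n_young + [old_income] * n_old

# Invented numbers; not real data.
earlier = population(n_young=70, young_income=30, n_old=30, old_income=60)
later = population(n_young=40, young_income=28, n_old=60, old_income=56)

print("median earlier:", median(earlier))
print("median later:  ", median(later))
```

Here the median moves from the young group's income to the old group's income, so it climbs from 30 to 56 even though every age group is worse off than before.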
Nothing is Normal
This is a special case of a more general phenomenon. When people think about differences in means or medians, they often have a mental picture of a pair of bell curves, centered around different points, and base their conclusions on that picture. Implicitly, they are assuming that the distributions in question are so-called normal distributions.
Despite the unfortunate name, most distributions are not normal. Normal distributions are strikingly common by a statistician's standards, which is to say they resemble a non-trivial minority of the distributions one comes across. Specifically, any distribution which arises from the sum of a large number of independent, yet similarly distributed variables, will be approximately normal. Every one of these caveats is important. GPAs are a sum[9], but not of independent variables—of course students who do well in one class tend to do well in another. Individual income is generally not a sum of more than one source of income, and very rarely more than a few.
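The independence caveat can be checked with a small simulation. This is a sketch under assumptions of my own choosing: grades are modeled as uniform random draws, the dependent case is the extreme one where a student gets the same draw in every class, and kurtosis (about 3 for a normal distribution, 1.8 for a uniform one) is used as a crude measure of how normal the sums look.

```python
import random
from statistics import fmean

def kurtosis(xs):
    """Fourth standardized moment of the sample; about 3 for a normal distribution."""
    m = fmean(xs)
    var = fmean((x - m) ** 2 for x in xs)
    return fmean((x - m) ** 4 for x in xs) / var ** 2

rng = random.Random(0)  # fixed seed so the run is repeatable
n_students, n_classes = 20_000, 30

# Independent case: every class grade is a separate uniform draw.
independent = [sum(rng.random() for _ in range(n_classes)) for _ in range(n_students)]

# Fully dependent case: one draw per student, repeated in every class.
dependent = [n_classes * rng.random() for _ in range(n_students)]

print("independent sum kurtosis:", round(kurtosis(independent), 2))
print("dependent sum kurtosis:  ", round(kurtosis(dependent), 2))
```

The independent sums should land close to 3, while the fully dependent sums should stay near 1.8: adding up more classes does nothing to make the result normal when the classes all move together. Real grades sit somewhere between these two extremes.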
Among the hotly debated statistics discussed above, only IQ is normally distributed, since it is defined as such. But this is misleading. As previously argued, nobody cares about IQ qua IQ; they care about it as a proxy for something else they can't measure. But what are the chances that something else is normally distributed? Low. Furthermore, while IQ is defined to be normally distributed, measurements of it rely on transforming test scores into IQs based on a finite sample of scores. At either end of the score distribution, sample size falls and sampling error grows. So someone whose score on one IQ test translates to an IQ of 160 (by definition, 4 standard deviations above the mean) will likely have a very different score on another test, and it is unlikely that only 0.003% of people would score higher.
So even when something appears to be normally distributed, we should only trust that within a couple of standard deviations. And yet everyone wants to hear about outliers. Here in Silicon Valley, the vogue claim among male chauvinists is that, while perhaps on average women are as smart as men, men have higher variance in IQ, and therefore at the extreme high end of the IQ distribution (e.g., them and their friends), almost everyone is male (e.g., them and their friends)[10]. While it is true that the ratio of a higher-variance normal density to a lower-variance one approaches infinity in the tails, this is not guaranteed for distributions that are only approximately normal, so in light of the previous paragraph this claim amounts to statistical bunk.
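How much the tail argument depends on the assumed shape can be seen by swapping in a different bell curve. The sketch below is illustrative only: the standard deviations 15 and 16.5 are made up, and the logistic distribution is simply my choice of a symmetric, bell-shaped alternative that is hard to tell from a normal near the center. It compares how over-represented the higher-variance group is above a cutoff under each assumption.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_tail(x, mean, sd):
    """P(X > x) for a normal distribution."""
    return 1.0 - NormalDist(mean, sd).cdf(x)

def logistic_tail(x, mean, sd):
    """P(X > x) for a logistic distribution with the given mean and standard deviation."""
    scale = sd * sqrt(3) / pi  # logistic variance is scale^2 * pi^2 / 3
    return 1.0 / (1.0 + exp((x - mean) / scale))

# Hypothetical groups: same mean, standard deviations of 15 and 16.5.
for cutoff in (130, 145, 160):
    normal_ratio = normal_tail(cutoff, 100, 16.5) / normal_tail(cutoff, 100, 15)
    logistic_ratio = logistic_tail(cutoff, 100, 16.5) / logistic_tail(cutoff, 100, 15)
    print(f"cutoff {cutoff}: normal ratio {normal_ratio:.2f}, logistic ratio {logistic_ratio:.2f}")
```

In this particular alternative the ratio still grows with the cutoff, only much more slowly than under the normal assumption. The size of the claimed effect is thus set by the tail shape, which is exactly the part of the distribution the data cannot pin down.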
Doing it Right
The median is a simple and obvious improvement over the mean for most things. But why constrain ourselves to a single number? We live in an age of graphical media—why not show the distribution? If the conclusion is not obvious from the distribution, then either the effect is very small, or the statistical analysis required to reach it is complex and easy for non-experts to get wrong. In the first case it should only be of interest to domain experts, while in the second it can only be evaluated by them. Either way, this kind of statistical analysis is best left to peer-reviewed journal articles.
Of course, the reason we do things wrong isn't that doing things right is hard. It's that doing things wrong gives media companies plausible deniability to write "news" stories with shocking and yet clear-cut conclusions supporting a popular ideological position, ideal for social-media-driven advertising.
It is difficult to get a man to understand something, when his salary depends upon his not understanding it! — Upton Sinclair[11]
[0] In short: means are only meaningful inside vector spaces, and functions of means are only meaningful when the function is linear.
[1] I'm punny, sue me.
[2] Racial classifications are also ill-defined, and becoming increasingly divorced from reality as intermarriage increases. The computer scientist Les Earnest has written on this from a CS point of view.
[3] They're probably right that there are structural failures in our school systems that substantially hinder the educations of various groups. This is certainly what my intuition tells me. But just because a given interpretation of the data supports your intuition doesn't mean that interpretation is right. Have courage: cite your intuition instead of deeply flawed data analysis!
[4] Hopefully you're asking which average: mean or median? Here I mean mean.
[5] Formally, both of these are nonlinear functions of income.
[6] In fact, the mean of the changes over time is at least the same as the change in the means; since the median is nonlinear, we don't even have that guarantee for medians.
[7] This Washington Post article is unusual in that it exploits graphics to clearly explain these trends and avoids major statistical pitfalls.
[8] I suspect that the underlying data in this study also takes into account falling labor force participation[12] in a way that most median wage statistics ignore, namely by counting workers who stop working; unfortunately I do not have the underlying data to back this up.
[9] At least if you fix the number of classes.
[10] Not just in Silicon Valley, mind you. Social scientists as prominent as Steven Pinker have made similar claims appealing to normal distributions of aptitude.
[11] I, Candidate for Governor: And How I Got Licked (1935), ISBN 0-520-08198-6; repr. University of California Press, 1994, p. 109.
[12] Labor force participation is the fraction of all adults (16+) who are in the labor force, i.e. who count towards the "official" U-3 unemployment rate. Participation has been buoyed by women leaving the home, but despite this the participation rate has fallen 4 percentage points in the last 20 years (BLS, note that you have to change the time range to see 20 years). Part of this is the US's aging population, which wouldn't affect the study of lifetime earnings, but a large part is that 5.3 million more working-age adults have been added to disability rolls over the same time (Washington Post article on rising disability, another rare example of good statistical analysis in media, which avoids many pitfalls by looking at changes county-by-county).