False aggregates

David Albrecht
Published in rude mechanicals
Jan 26, 2018


The BLS reports the median annual wage for architects was $76,930 in 2016.

Look at that sentence. Doesn’t it sound impressive? Those numbers. Can’t you imagine someone on CNN saying it, dressed in a fancy suit? They’d put one of those little gray boxes on the bottom of the screen with the text, “Expert: Average salary $76,930 for architects in 2016”.

But isn’t that too broad a category, “architects”?

A lot of things bother me, but this one has a special place in my heart: the false aggregate, specifically, categorical statements about meaningless categories.

I was on a phone call earlier this year with my brother, and he said something about “averages”. I think we were talking about salaries; I got annoyed and said, “People need to stop obsessing so much over averages”. I tried to explain why I said it, but did a poor job. So this post is 50% explanation, 50% apology, which, as the linguistically inclined will realize, are 100% the same thing (apologia).

False aggregates are tricky; they look like knowledge, but aren’t. Like Cheez-Whiz, they’re almost what they purport to be (food), but not quite; they’re fake, artificial.

I’m here to convince you that false aggregates, like Cheez-Whiz, are best avoided. One leads to bad aftertastes; the other, to sloppy thinking and bad decision-making.

Elaborating on the problem

The discussion with my brother was about salaries. But the point is more general: looking at aggregate statistics has a way of “smoothing” things, making them appear more uniform than they actually are. The mere act of stating “the average…” implies a degree of usefulness, as if knowing the number will be helpful.

Two issues. And it’s one of those irritating cases where you can’t make general statements because sometimes the issues become problems, and other times they don’t.

The first problem: sometimes specific factors interfere with, or “bias”, measurements. Statisticians refer to these sources of bias as “confounding variables”, and their effects can be quite strong. In the case of pay for a particular job function, we might consider:

  • Personal network strength
  • Degree/educational quality
  • Relationship with colleagues and boss
  • Personal drive/motivation
  • Chaotic/turbulent home life

The second problem is worse: sometimes the population varies so much it’s just fundamentally dishonest to talk about an “average”; the things are just too different. An example: the average mass of a hydrogen molecule, a gorilla, and the sun: 6.63 * 10²⁹ kg.
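To make the arithmetic concrete, here is a minimal sketch in Python. The hydrogen-molecule and solar masses are approximate reference values; the gorilla's mass (160 kg) is an assumed round figure, not a measurement. The point is only that the mean lands near the figure quoted above while describing none of the three objects.

```python
from statistics import mean, median

# Approximate masses in kilograms. The gorilla mass is an assumed round
# figure chosen for illustration; the other two are approximate values.
masses_kg = {
    "hydrogen molecule": 3.35e-27,
    "gorilla": 160.0,
    "sun": 1.989e30,
}

values = list(masses_kg.values())
average = mean(values)   # dominated entirely by the sun
middle = median(values)  # happens to be the gorilla

print(f"mean:   {average:.3g} kg")
print(f"median: {middle:.3g} kg")

# How far is each object from the "average"?
for name, mass in masses_kg.items():
    print(f"{name}: {mass / average:.3g} times the mean")
```

The mean is simply a third of the sun's mass; the other two values contribute nothing visible to it. The median is no better, since it just picks out the gorilla.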

But is that useful information? I see these errors of measurement all the time.

The median SAT score of a student in this district…

What classes did the student take? Were the teachers good?

How much did the student study outside of class? Did they have a stable home environment that encouraged academic excellence?

Median house prices increased 9%…

Houston just got hit by Hurricane Harvey; Seattle is bursting at the seams with Amazon employees; West Virginia has a dying coal industry. In my hometown of Chicago, the war zone called “the south side” is a stone’s throw from the Gold Coast, some of the most expensive real estate in the upper Midwest.

Average starting salaries after graduation…

Who’s better off: 90k in Manhattan or 70k in Indianapolis?

What was the graduate’s GPA? How well did they network during undergrad? Do they have pre-graduation work experience?

Is the field growing? What’s the ten-year outlook?

How hard will the graduate work after school? What will their priorities be? Will they climb the corporate ladder, start a family, or do something else?

What if their family owns the company?

Dealing with it: some thoughts

Widespread, systematic oddities in our thinking make me curious.

Uncertainty is uncomfortable. I’m not the first to realize this; it’s the basis of most insurance ads. Unsure parents turn to U.S. News & World Report’s “College Rankings” for assurance they aren’t making a foolish decision spending $200,000 sending their sons and daughters to college. Financial advisers sell certainty made from whole cloth. I think it’s worth realizing how innate this preference is for certainty, even at the expense of accuracy.

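When someone quotes an average, ask for the variance or standard deviation. Ask whether there are confounding factors. Ask whether the population lives in what Taleb calls Mediocristan, where values cluster around a typical one, or Extremistan, where a few extreme values dominate. Many populations don’t have a meaningful central tendency.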
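As a rough illustration of why the spread matters, the sketch below compares two sets of salaries that share the same average. Both lists are hypothetical, invented for this example; `summarize` is just a helper name, and `pstdev` is the population standard deviation from Python’s standard library.

```python
from statistics import mean, median, pstdev

# Hypothetical salaries in dollars, invented purely for illustration.
tight = [74_000, 76_000, 77_000, 78_000, 80_000]
spread = [30_000, 35_000, 40_000, 45_000, 235_000]

def summarize(salaries):
    """Return the mean, median, and population standard deviation."""
    return mean(salaries), median(salaries), pstdev(salaries)

for label, salaries in (("tight", tight), ("spread", spread)):
    avg, mid, sd = summarize(salaries)
    print(f"{label:>6}: mean={avg:,.0f} median={mid:,.0f} stdev={sd:,.0f}")
```

Both groups report the same “average salary”, yet in the second group four of the five people earn far less than that figure. The standard deviation, and the gap between mean and median, are what give the difference away.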

Be skeptical of statistics, especially when used to persuade. It’s likely the persuader chose the most favorable statistic of many to present; the ones not presented might support the other side of the argument. Ask what evidence supports the opposite conclusion.

Beware of people and jobs where “sounding smart” is the job, especially if they have no skin in the game: news anchors, management consultants, most journalists, many academics. Even if they believe they’re right, evidence to the contrary rarely gets through (failed entrepreneurs don’t have this problem). Pay more heed to: owner-operators, fund managers who invest their own capital, doctors’ recommendations for their own families’ health, the parts auto mechanics buy for themselves.

Numerical false aggregates are only one type. I’m sure there are others. Be careful with your categories. Avoid “taxonomic gerrymandering”.

Related work

The topic of this essay is epistemology: the study of knowledge, including what’s knowable, and how we can be sure. It’s an old undertaking.

Much of this essay is inspired by Nassim Taleb’s writings on “modernity” and the risks of trusting those with too little skin in the game. I find his conservatism and “media diet” cultivation well-considered.

Much of Taleb’s thinking descends from Kahneman and Tversky, the fathers of behavioral economics. Thinking, Fast and Slow catalogs an academic career spent studying human folly; it should be required reading in high schools. False aggregates are related to Kahneman’s base-rate fallacy.

Tim Ferriss is too self-promotional for my taste, but has the right idea with skepticism of one-size-fits-all solutions. He wouldn’t say it this way, but the problem is variance: some things have it, others don’t.

One of the best movies about epistemology is “The Big Short”. I love the opening quote:

“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”
