The bestselling author/economist discusses reality filters, Viagra, and statistician Florence Nightingale.
Tim Harford is an economist, journalist, and broadcaster. He is the author of The Data Detective: Ten Easy Rules to Make Sense of Statistics, Messy, and the million-selling The Undercover Economist. He is also a senior columnist at the Financial Times and the presenter of BBC Radio’s More or Less, How to Vaccinate the World, and Fifty Things That Made the Modern Economy, as well as host of the podcast Cautionary Tales.
What inspired you to write The Data Detective?
Two things. The first was my unease that people had become so cynical about statistics. Of course it is possible to lie with statistics, but it is easier to lie without them. There are important truths about the world that we can’t hope to see without solid statistics — and people who would like to obscure those truths often find it is easy simply to undermine statistics. After all, people were saying, “Lies, damned lies, and statistics” long before Donald Trump ever used the phrase “fake news.” The tactic is just the same, though. So I decided that statistics needed a defender.
The second spur to my writing the book was a sense that people needed help understanding their own biases as well as any technical details about statistics. I wanted to help people understand the world better by first understanding the ways in which they might fool themselves.
The Data Detective sets out 10 rules to help people use statistics to “illuminate reality with clarity and honesty.” One of those rules is related to naive realism. How would you explain what that is?
Naive realism is our tendency to believe, instinctively, that we are seeing reality clearly, without filters or errors. George Carlin put his finger on the problem when he said, “Have you ever noticed that anybody driving slower than you is an idiot, and anyone going faster than you is a maniac?” That’s partly a joke about our intolerance of the failings of others, but it goes deeper. Of course, you are driving at what you perceive to be the appropriate speed — otherwise, you’d dab the accelerator or the brake. It can be hard to understand why anybody would ever see it differently. Naive realism leads us to trust our own experiences too much and be too dismissive of those who see things differently, for reasons we find hard to grasp.
You urge people to determine who is missing from data. What kinds of messes have you seen result from people being left out?
Oh, my favorite example is the discovery of sildenafil. Sildenafil was originally tested on an all-male subject group as a treatment for angina — chest pain — but it didn’t work very well. However, many of the men in the clinical trial reported the side effect of magnificent erections. The drug was branded “Viagra,” and the rest is history…except that more recent small trials with women have suggested that the drug might also be an effective treatment for menstrual cramps. That needs to be tested properly in a full clinical trial, but the point is that women never got the chance to discover accidental benefits of Viagra because they were not in the original trial.
A chapter of The Data Detective is devoted to big data sets. What do you suggest people do so they aren’t misled by big data?
Big data sets and the algorithms that use them aren’t intelligible to the likes of you and me. As a result, we often overestimate their power. I pull apart an infamous example in my book: the algorithm used by Target that, reportedly, figured out a teenage girl was pregnant, much to the embarrassment of her father, who had vehemently complained that Target was sending her inappropriate coupons for diapers and maternity wear.
In fact, we don’t know how accurate the algorithm was — perhaps it sent such vouchers to all women under the age of 40. And if you could see what the algorithm saw — for example, that the young woman was buying vitamin supplements marketed at pregnant women — you might not find it hard to reach conclusions that might seem awe-inspiring to those not in the know.
I argue that what distinguishes science from alchemy is openness: Science progresses because we insist that people share data and methods, allowing others to check and test their results. Those traditions of openness are not being upheld by people harvesting data and training algorithms, and that is a problem. Even if I can’t evaluate an algorithm myself, I feel much better knowing that the details are open and can be rigorously tested by independent experts, people I am happy to trust.
Most of the world knows Florence Nightingale as a trailblazer in nursing, but you focus on her role as a statistician. How are those two facets of her life connected?
Florence Nightingale was an astonishing woman. She went to Istanbul in the 1850s to supervise the nurses working in military hospitals there treating British casualties of the Crimean War. The death toll was truly apocalyptic. When she returned from the war, she and her geek allies assembled some remarkable statistics, demonstrating beyond doubt, first, that the deaths had been from communicable diseases and, second, that they were preventable by adopting more hygienic practices in hospitals and barracks.
This was a huge battle for her to fight — she was well connected, but also a woman in a man’s world, causing considerable embarrassment to both the military and medical establishment. What fascinates me is the way she very deliberately used some elegant graphics — and perhaps mischievously persuasive ones — in order to make her case. She was very calculating about the power of this visual rhetoric. After sending one report to Queen Victoria, she wryly noted, “She may look at it because it has pictures.”
If you could make three statistics-related recommendations to the Biden administration, what would they be?
First, preserve the independence of official statistical sources. They will occasionally produce news that the government finds embarrassing, but they are statistical bedrock. Second, invest to rapidly improve the public-health-data infrastructure. It proved to be woefully inadequate to deal with COVID-19. Some information was still being communicated by fax, while we lacked basic data such as the number of hospitals in the U.S. The better the data, the smarter the decisions we can make. Third, run regular horizon-scanning workshops where data geeks meet subject-matter experts to figure out what data they don’t have but might want to have to cope with future challenges. Right now, a surveillance dataset on viruses circulating in the population is front of mind. But there will be others!
Now that The Data Detective is out, what are you focusing on?
I’m pondering writing an edition of the book for younger readers. And I’m very excited about the new season of my Cautionary Tales podcast. Some of the stories are about data — and I’m having to pinch myself because Helena Bonham Carter is playing Florence Nightingale. That is not a sentence I imagined writing a few months ago.
[Editor’s note: Interviewer Andrea M. Pawley will appear at the all-virtual Gaithersburg Book Festival on Thurs., May 6th, at noon (EDT).]
Andrea M. Pawley lives and writes in Washington, DC, her favorite city in the whole world.