Researchers recently attempted to replicate 100 published psychology studies. They succeeded in reproducing only about a third of them. In other words, roughly two-thirds of the results that psychology journals saw fit to publish ended up being illusory.
There are a host of possible reasons for this discouraging result. One is simple fraud. Another is the practice of so-called p-hacking, or doing experiments until you find one that just barely meets the commonly accepted criterion for statistical significance. All in all, the finding seems to confirm the glum warning of John Ioannidis, who wrote a 2005 paper entitled “Why Most Published Research Findings Are False.”
But it’s important to realize that the implications of the finding for science and the implications for policy aren’t the same. Science and policy work in two very different ways.
Low replicability isn't a disaster for science, because science is a slow, ponderous exercise even under the best conditions. Science is about finding principles, phenomena and theories that work consistently and reliably. Think of the laws of physics, or our understanding of the circulatory system, or the properties of graphene. These things have been tested hundreds of times, or thousands, or millions. There is zero chance that these scientific discoveries don’t work.
Psychology is no different. So what if two-thirds of experiments are mirages? Before a psychological finding is accepted as truth, it will be subject to many, many replication attempts. And not just replication, but testing in slightly different contexts, with different subject populations and different experimental techniques. This rigorous process will inevitably toss out all of the phantom findings, and only the truth will remain.
But policy is a different matter. For example, suppose we see a study that finds that people have unconscious bias against redheads. If we believe that one study, we might immediately implement policies to cancel out anti-redhead discrimination in hiring and school admissions. This is where non-replicable results become very dangerous. If the experimental result was a false alarm -- like two-thirds of the results in the replication study -- then we’ll take action based on wrong information.
This doesn’t just apply to government policy, but to any public reaction to scientific studies. Think about nutrition studies. By now we have learned not to trust any headline along the lines of “One glass of wine a day will cut your cancer risk.” The sheer number of nutrition research results that were later reversed has damaged the field’s credibility in the eyes of a public that has changed its dietary habits one too many times.
Economics is the same way. Econ writers, like me, often report interesting results from empirical studies. We try to use hedging words such as “maybe” or “could indicate,” and cite contradictory evidence if we know of any. But in the end, you have to be very careful reading our articles. Just as in psychology or nutrition science, economics results often don’t replicate.
The most famous example of this, of course, is the Reinhart-Rogoff affair. Legendary international economists Carmen Reinhart and Kenneth Rogoff released a paper in 2010 claiming that if a country has a ratio of government debt to gross domestic product of more than 90 percent, its economic growth slows a lot. This finding, though it had not been replicated, was used as a warning by deficit hawks in many countries to discourage fiscal stimulus spending. But in 2013, a graduate student at the University of Massachusetts-Amherst tried to replicate the study and failed. He found that the authors had made a number of highly questionable decisions to drop various data points from the analysis, as well as a simple spreadsheet error. Another replication attempt by University of Michigan economist Miles Kimball turned out even worse for Reinhart and Rogoff’s thesis. In other words, people were recommending policy based on a result that turned out to be false, when they should have waited for replication.
Another example is a 2000 study on the effect of school choice on schooling outcomes by Harvard economist Caroline Hoxby. Hoxby examined cities that numerous rivers and streams had chopped up into a large number of districts. She found that having many districts contributed to improved schooling outcomes, and concluded that school choice is good. But Princeton economist Jesse Rothstein tried to replicate Hoxby’s study, and found a number of problems -- most importantly, some of the paper’s results disappeared when small changes were made to the way the data was divided. Though Hoxby fired back in defense, the debate shows that recommending a policy of school vouchers during the years from 2000 to 2005 -- after Hoxby’s paper but before Rothstein’s -- would have been premature. Again, it would have been better to wait.
So when you read about economics studies, your first reaction should be skepticism. Trust meta-analyses and review articles more than single studies. Try not to let your political beliefs lure you into believing one result or another simply because it yields the conclusion you want. In the end, it pays to get the truth, and the truth is something that accumulates slowly.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.
To contact the author of this story:
Noah Smith at nsmith150@bloomberg.net
To contact the editor responsible for this story:
James Greiff at jgreiff@bloomberg.net