When we talk about what new scientific research says, we want to say something understandable and digestible, especially when a finding is particularly shocking: people who don't believe in free will are more likely to cheat on their partners, for example.

But accurate science isn't usually that shocking. It's an iterative process that chips away at the truth through many studies over time. A single study can only show researchers getting a little closer to something true.
Researchers are sometimes biased (often without knowing it) and focus on data that shows what they were looking for: findings that hold up only if the information is viewed in a certain way.
This doesn't mean their results are necessarily wrong, but it almost certainly means that a "shocking new study" is not as it seems, especially if it's the first one of its kind.
Such problems are so rampant in psychology that scientists have started to warn of a potential "replicability crisis": when studies are repeated, they may produce different results.
A huge project led by University of Virginia psychologist Brian Nosek, working with 270 other psychologists, just put this to the test by repeating 100 studies published in three of the world's top psychology journals.
They published their findings in the journal Science on August 27, and what they found wasn't pretty.
Originally, 97% of the studies were statistically significant, meaning the researchers calculated less than a 5% chance of seeing results at least that strong if the effect they were testing for didn't actually exist.
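To make that 5% cutoff concrete, here is a minimal sketch in Python of one common way a study arrives at "p < .05," a two-sample t-test. The groups, effect, and sample sizes here are entirely made up and are not from any study in the project.

```python
# A minimal, made-up example of a significance test, not the
# Reproducibility Project's actual analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=30)    # hypothetical control group
treatment = rng.normal(loc=0.6, scale=1.0, size=30)  # hypothetical treatment group

# A two-sample t-test asks: if there were no real difference between
# the groups, how likely is a gap at least this large?
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p = {p_value:.4f}")
print("significant (p < .05)" if p_value < 0.05 else "not significant")
```

Run a test like this on two groups that genuinely differ and p will usually land below 0.05; run it on noise and, about 5% of the time, it will land below 0.05 anyway, which is one reason single studies can mislead.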
The chart below shows how the significance of the studies changed when they were repeated. Only 36% of the studies, just over a third, were still significant when repeated — these are the ones below the dashed line in the chart. The others were no longer significant.
[Chart: p-values of the original studies versus their replications. Source: Science/Tech Insider]
Since statistical significance is only one way to evaluate a study, and some argue that the 5% cutoff that defines "significance" isn't helpful, the researchers also looked at each study's effect size.
As The Atlantic's Ed Yong puts it, the effect values "measure the strength of a phenomenon; if your experiment shows that red lights make people angry, the effect size tells you how much angrier they get."
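As a rough illustration, again with made-up numbers rather than the project's data, here is how an effect size can be computed in Python. Cohen's d and the correlation coefficient r are two standard measures; the r scale is the one that runs from 0 toward 1.00 in the chart below.

```python
# A made-up example of computing effect sizes; the actual studies
# in the project used a variety of designs and measures.
import numpy as np

rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, 30)    # hypothetical control group
treatment = rng.normal(0.6, 1.0, 30)  # hypothetical treatment group

# Cohen's d: the difference between group means, scaled by the
# pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd

# Converted to a correlation coefficient r, which runs from 0 (no
# effect) toward 1 (the strongest possible effect), using the
# standard d-to-r formula for equal-sized groups.
r = d / np.sqrt(d**2 + 4)
print(f"Cohen's d = {d:.2f}, r = {r:.2f}")
```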
The chart below shows how the effect sizes changed when the studies were repeated. The closer the number is to 1.00, the bigger the effect. You can see that the median effect size, represented by the white dot, dropped far closer to zero when the studies were repeated.
[Chart: effect sizes of the original studies versus their replications. Source: Science/Tech Insider]
In some cases, the replications even found a negative effect, with results opposite to those of the original study.
As researchers who worked on the Reproducibility Project explain, this doesn't mean that you shouldn't believe psychology. It's just that science takes time, and confirming that a finding is real often requires repeating the work.
"This doesn’t mean the originals are wrong or false positives. There may be other reasons why they didn’t replicate, but this does mean that we don’t understand those reasons as well as we think we do. We can’t ignore that. We have data that says: We can do better," Nosek told The Atlantic.
Nosek's Center for Open Science will be examining other fields in the future, taking a look at everything from cancer biology to computer science.
In the meantime, maintain a healthy skepticism any time you hear "a study found" or "science says" — especially if there's only one study that supports that shocking new thing.