Stats Can Trip Up Scientific Data Fabricators

Instances of research data that are "too good to be true" may soon be routinely revealed using a statistical method developed by a business school professor.

The psychology research community is already reeling from a report that Dirk Smeesters, PhD, a professor of consumer behavior and society at Erasmus University in Rotterdam, the Netherlands, manipulated data in his 2011 paper, "The effect of color (red versus blue) on assimilation versus contrast in prime-to-behavior effects," published in the Journal of Experimental Social Psychology.

A statistical method developed by Uri Simonsohn, PhD, identified patterns in the raw data underlying Smeesters's work that strongly suggested the data were not real, according to a report from the Committee for Inquiry into Scientific Integrity at Erasmus University.

Simonsohn, who is at the Wharton School of the University of Pennsylvania in Philadelphia, told university officials of his suspicions. The university conducted its own investigation and determined that something was indeed amiss, concluding that the data were statistically highly unlikely.

Smeesters denied having made up the suspect data but was unable to provide the raw data behind the findings, asserting that the files were lost in a computer crash after he had sent a copy to Simonsohn.

He did, however, admit to selectively omitting data points that undercut the hypotheses he was promoting, insisting that such omission was common practice in psychology and marketing research.

Nevertheless, he agreed in late June to resign from the university, and the paper, along with another study, was retracted.

The university report gave few details on Simonsohn's method, but his previous publications do offer some clues as to how his system works. Simonsohn did not respond to requests for an interview with MedPage Today.

According to the Erasmus report, the method "is based on the assumption that there should be sufficient variation in reported averages of groups that are assumed to stem from the same population."

Too little variation among the averages of multiple groups drawn from the same population is therefore a sign of data manipulation, the report said. Sampling theory predicts how much such averages should scatter by chance (roughly the population standard deviation divided by the square root of the group size), so averages that agree far more closely than that are a red flag.

The basic idea is that data invented or manipulated to fit a predetermined pattern are, by their very nature, nonrandom and therefore can be distinguished from real data.
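The report does not spell out the computation, but the general idea is straightforward to simulate. The following sketch (in Python, with hypothetical numbers; it illustrates the principle and is not Simonsohn's actual procedure) estimates how often group averages drawn from a single common population would cluster as tightly as a set of reported averages:

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_means_this_similar(means, sds, n_per_group, n_sims=100_000):
        # How often would group averages drawn from ONE common population
        # cluster at least as tightly as the reported averages?
        # A very small answer flags "too little variation."
        k = len(means)
        observed_spread = np.std(means, ddof=1)
        pooled_mean = np.mean(means)
        pooled_sd = np.mean(sds)
        # The average of n subjects scatters with sd / sqrt(n).
        sim = rng.normal(pooled_mean, pooled_sd / np.sqrt(n_per_group),
                         size=(n_sims, k))
        sim_spreads = np.std(sim, axis=1, ddof=1)
        return np.mean(sim_spreads <= observed_spread)

    # Hypothetical numbers: six group averages that barely differ
    means = np.array([5.01, 5.02, 5.00, 5.01, 5.03, 5.02])
    sds = np.array([1.1, 1.0, 1.2, 1.1, 1.0, 1.1])
    print(prob_means_this_similar(means, sds, n_per_group=15))

With 15 subjects per group and a pooled standard deviation near 1.1, chance alone should make the six averages scatter by roughly 0.28 points, so averages agreeing to within a few hundredths, as in this made-up example, yield a probability of essentially zero.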

As a test, university investigators applied the analysis to data from four articles drawn randomly from the psychology literature. They found that, in each case, the group averages for study variables showed the expected amount of variability.

But when applied to Smeesters's papers, the analysis turned up several variables for which the averages were improbably similar.

The report noted that this alone was not sufficient evidence to prove that Smeesters had manipulated the data. When confronted with the results, Smeesters replied that he had omitted data from some subjects "to increase the strength of the effects, as a result of which the P-values become significant," according to the report.

It went on to say that Simonsohn was finishing a manuscript describing his method, tentatively titled "Finding Fake Data: Four True Stories, Some Stats, and a Call for Journals to Post All Data."

Simonsohn told Nature News that Smeesters was one of the "true stories" and that another involved a different psychologist who had previously been found to have falsified data. In another case, Simonsohn said he didn't have enough evidence to approach the university, and the study's co-authors refused to cooperate with him, Nature News reported.

Simonsohn's method is strictly a statistical analysis of data averages, so it could also be applied to biomedical publications. "The tool should be broadly applicable to other disciplines," he told Nature News.

Simonsohn, who teaches about decision processes at the Wharton School, has published several previous papers revealing subtle statistical patterns in human behavior and research practice.

Last year, he co-authored an article in Psychological Science suggesting that, when people are subjected to quantitative evaluation, "round numbers become implicit goals that strongly influence behavior."

In an effort that led him to his new role as research fraud detective, Simonsohn also helped write a 2011 paper in the same journal titled "False Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant."

That paper described computer simulations and experiments "that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis," by methods such as those that Smeesters had admitted to using.
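The flavor of those simulations is easy to reproduce. The sketch below (Python, with assumed sample sizes) tests one such undisclosed flexibility: peeking at the result and collecting more subjects when it falls short of significance, even though the two groups are drawn from identical distributions:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def flexible_study(n_initial=20, n_extra=10, alpha=0.05):
        # Two groups drawn from the SAME distribution, so any
        # "significant" difference is a false positive. The analyst
        # peeks once: if p >= alpha, add subjects and test again.
        a = rng.normal(size=n_initial)
        b = rng.normal(size=n_initial)
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True
        a = np.concatenate([a, rng.normal(size=n_extra)])
        b = np.concatenate([b, rng.normal(size=n_extra)])
        return stats.ttest_ind(a, b).pvalue < alpha

    trials = 10_000
    rate = sum(flexible_study() for _ in range(trials)) / trials
    print(f"False-positive rate with one peek: {rate:.1%}")

Even this single peek inflates the false-positive rate noticeably above the nominal 5 percent, and the paper reported that combining several such choices, such as extra dependent variables and optional covariates, pushed simulated false-positive rates above 60 percent.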

He and his co-authors recommended a series of requirements that journals and peer reviewers should set for manuscript authors, focusing on rigor in data collection and reporting of methods.

Regarding his method for identifying manipulated research data, Simonsohn told Nature News that it would be "worthwhile to find other ways ... we know people are really bad at emulating random data, so there should be all sorts of tests that could be developed."


John Gever

Senior Editor


