Determining whether varieties of habanero peppers truly differ requires assessing whether measured differences between them are statistically significant, Johnson says. In a recent Proceedings of the National Academy of Sciences article, Johnson reviews and discusses the criteria used to claim statistical significance, illustrated with stylized measurements of pepper varieties. (Credit: Pamela Johnson.)


Many scientific research papers are far less reliable than they claim because a standard of evidence that researchers have been using for decades is much weaker than commonly assumed, Texas A&M University statistician Valen Johnson argues in a newly published paper.

Because it's difficult in scientific disciplines to arrive at unequivocal conclusions, scientists instead set a bar for how confident they need to be before they announce findings. This bar is called a p-value: the probability of observing data as extreme as, or more extreme than, the data actually observed in an experiment, assuming the null hypothesis -- the status quo, or lack of an effect -- is true. So a p-value of .05 -- a widely used standard -- means that if the null hypothesis were true, there would be only a 5 percent chance of seeing data at least as extreme as the data that was observed.
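That definition can be made concrete with a short simulation (a sketch under assumed conditions: a two-sided z-test with known variance and a hypothetical sample size of 30). When the null hypothesis is true in every experiment, about 5 percent of experiments still cross the p < .05 bar purely by chance.

```python
# Simulate many experiments in which the null hypothesis is true (no real
# effect), run a two-sided z-test on each, and count how often p < 0.05.
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)
norm = NormalDist()           # standard normal, used for the z-test
n, trials = 30, 10_000        # hypothetical sample size and number of experiments
false_positives = 0

for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # null is true: mean = 0
    z = (sum(sample) / n) * sqrt(n)                  # z-statistic (sigma known = 1)
    p = 2 * (1 - norm.cdf(abs(z)))                   # two-sided p-value
    if p < 0.05:
        false_positives += 1

rate = false_positives / trials
print(rate)   # close to 0.05, by construction of the test
```

The point of the sketch is that p < .05 controls how often chance alone produces a "finding" -- it says nothing directly about how likely a published finding is to be true, which is the distinction Johnson's paper turns on.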

However, Johnson notes the probability of seeing more extreme data is not the same as the probability that the tested hypothesis is true. Using Bayesian statistical methods, he has shown that when using a p-value of .05, there's actually a 20 to 25 percent chance that the reported findings aren't true -- a bar he says shouldn't be acceptable to scientists. Johnson details his results in a paper titled "Revised Standards for Statistical Evidence," released Monday (Nov. 11) in the early edition of the journal Proceedings of the National Academy of Sciences.
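Johnson's own argument uses uniformly most powerful Bayesian tests, but a simpler, well-known calibration gives the same order of magnitude. The sketch below uses the Sellke-Bayarri-Berger minimum Bayes factor bound, -e·p·ln(p) -- an assumed stand-in here, not Johnson's exact method -- with 50/50 prior odds on the null.

```python
# Sketch of a related calibration (the Sellke-Bayarri-Berger minimum
# Bayes factor, not Johnson's exact method): a p-value p < 1/e implies a
# Bayes factor in favor of the null of at least -e * p * ln(p).  With
# 50/50 prior odds, p = .05 translates into a posterior probability for
# the null of roughly 0.29 -- the same order as the 20-25 percent figure
# quoted in the article.
from math import e, log

def min_null_probability(p, prior_null=0.5):
    """Lower bound on P(null | data) implied by a p-value p < 1/e."""
    bf_null = -e * p * log(p)                  # minimum Bayes factor for the null
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bf_null
    return posterior_odds / (1 + posterior_odds)

print(round(min_null_probability(0.05), 3))    # about 0.29
print(round(min_null_probability(0.005), 3))   # about 0.07
```

Note how sharply the bound drops when the threshold moves from .05 to .005 -- the shift Johnson proposes below.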

"The scientific community has been finding that a number of studies haven't been replicating," said Johnson, a professor in the Texas A&M Department of Statistics. "If the findings don't replicate, this notion that science is discovering truths erodes. If people are continually publishing articles that turn out not to be true, it undermines public confidence in science."

There has long been concern in the scientific community that the p-value of .05 may need another look, Johnson says, but the standard has become ingrained into the statistical culture since it was first proposed in the 1920s by English statistician Ronald Fisher. Any non-statistician who has taken a high school or undergraduate statistics course has likely done his or her homework problems primarily at the .05 significance level.

The problem is especially prevalent in biology, the social sciences and health fields, where studies about fish oil reducing the risk of heart attacks and Vitamin C curing the common cold often turn out not to be true, Johnson says. Physicists, by comparison, have been more sensitive to the problem and correcting it. The standard for the 2012 Higgs boson discovery, for instance, was a p-value of 0.0000003, Johnson notes.

"A lot of researchers in the social sciences are now trying to develop tests to find out why research papers aren't replicating," Johnson said. "Their premise has often been that scientists are cheating or not reporting all of their results, throwing out some data values or falsifying data perhaps, when in fact, it's just the way we're conducting tests that is going to naturally lead to a high rate of non-reproducibility."

One way to beef up accuracy in scientific papers is to use a lower p-value of 0.005, which would drop the false discovery rate by a factor of about 5 or 10, Johnson says. The tradeoff would be requiring scientists to double their sample size, which would lead to more expensive experiments.
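The sample-size tradeoff can be checked with a textbook power calculation (a two-sided z-test, assumed here for illustration, at an assumed 80 percent power). The required sample size scales as ((z_alpha/2 + z_power) / effect)^2, so the ratio of sizes at the two thresholds does not depend on the effect size itself.

```python
# Rough check of the sample-size tradeoff using a standard two-sided
# z-test power formula: how much larger must a study be at alpha = .005
# than at alpha = .05 to detect the same effect with the same power?
from statistics import NormalDist

def n_ratio(alpha_new, alpha_old, power=0.8):
    """Ratio of required sample sizes for two significance thresholds."""
    z = NormalDist().inv_cdf
    z_beta = z(power)
    new = (z(1 - alpha_new / 2) + z_beta) ** 2
    old = (z(1 - alpha_old / 2) + z_beta) ** 2
    return new / old

print(round(n_ratio(0.005, 0.05), 2))   # about 1.7 at 80% power
```

Under these assumptions the increase is roughly 1.7x rather than exactly double, but it is in the same ballpark as the article's "double their sample size."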

But beyond undermining confidence in science, there are also financial costs associated with not lowering the p-value -- consequences Johnson has experienced firsthand as a professor of biostatistics at MD Anderson Cancer Center in Houston. There, he designed tests that determined whether new drugs would be effective at treating cancer. The pharmaceutical industry loses money when drugs found to be effective in early stage clinical trials turn out to not be effective when they are graduated to large-scale, expensive clinical trials. Johnson asserts that's partly the product of lower statistical-evidence thresholds in the earlier-phase trials.

"I think most statisticians are aware of the problem and agree that something needs to be done," Johnson said. "Experimental scientists will object to the higher standard and say it will make it more difficult to conduct experiments and get published. But I think journal editors will probably recognize the need to raise the bar and also that by raising the bar, they'll have fewer studies published that aren't true. It will also lead to fewer researchers going down the wrong path and wasting their efforts conducting follow-up research on studies that were flawed to begin with."


For more about Johnson's findings and how he went about his research involving both classical and Bayesian statistical approaches, see this ABC Science-Australia feature.

To learn more about Johnson's research and his use of statistics to solve an eclectic range of issues, go to http://www.science.tamu.edu/articles/981/.

For more information about the Texas A&M Department of Statistics, visit http://www.stat.tamu.edu/.

# # # # # # # # # #

About Research at Texas A&M University: As one of the world's leading research institutions, Texas A&M is in the vanguard in making significant contributions to the storehouse of knowledge, including that of science and technology. Research conducted at Texas A&M represents annual expenditures of more than $776 million. That research creates new knowledge that provides basic, fundamental and applied contributions resulting in many cases in economic benefits to the state, nation and world. To learn more, visit http://vpr.tamu.edu.


Contact: Vimal Patel, (979) 845-7246 or vpatel@science.tamu.edu or Dr. Valen Johnson, (979) 845-3141 or vejohnson@tamu.edu


  • Raised Bar = Confidence

    Texas A&M statistician Valen Johnson is pushing for smaller p-values -- the accepted standard of significance needed to stake claim to a scientific discovery -- in hopes of ensuring more accurate, replicable research findings and, in turn, increased public confidence in science.

  • Valen Johnson

    Johnson, who joined the Texas A&M Statistics faculty in September, is a renowned expert in Bayesian statistics, which assigns probabilities to hypotheses and can be interpreted in terms of betting odds.

© Texas A&M University. To request use of any of our photographs for educational use or to view additional options from our archive, please contact the College of Science Communications Office.

College of Science
517 Blocker
TAMU 3257 | 979-845-7361