John Ioannidis published a very interesting paper in PLoS Medicine in 2005 entitled “Why most published research findings are false.” In it he argued that most affirmative results in biology papers that are based on a statistical significance test (e.g. p-value less than 0.05) are probably wrong. His argument was couched in traditional statistics language, but it is really a Bayesian argument. The paper is a wake-up call that we may need to look more closely at how we use statistics and even how we do research.
The question he asked was: given some hypothesis, what is the probability that the hypothesis is true given that an experiment confirms the result (up to some level of statistical significance)? Let $P(T|Y)$ be the probability that the hypothesis is true given a “yes” answer by an experiment, $P(T)$ be the prior probability that the hypothesis is true, $P(Y|T)$ be the probability of getting a yes answer if the hypothesis is true, and $P(Y)$ be the probability of getting a yes answer under all conditions. Then by Bayes' theorem $P(T|Y) = P(Y|T)P(T)/P(Y)$, where $P(Y) = P(Y|T)P(T) + P(Y|F)P(F)$, and $P(F) = 1 - P(T)$ is the prior probability that the hypothesis is false.
We can then compute $P(T|Y)$ if we have $P(T)$, $P(Y|T)$ and $P(Y|F)$. $P(T)$ is the prior probability that is based on everything you know or don’t know. Ioannidis writes it as $R/(1+R)$ where $R$ is the odds that the hypothesis is true versus it being false. In these terms, the likelihood $P(Y|T)$ is called the power of the experiment or study and is usually written as $1-\beta$, where $\beta$ is the false negative probability or the Type II error rate. $P(Y|F)$ is the false positive probability or the Type I error rate and denoted by $\alpha$. Putting this all together gives $P(T|Y) = (1-\beta)R/((1-\beta)R + \alpha)$.
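As a quick numerical check, here is a minimal Python sketch of the Bayes computation, with $R$ the prior odds, $\alpha$ the false positive rate and $\beta$ the false negative rate. The function name `posterior_probability` and the parameter values are my own illustrative assumptions, not numbers from Ioannidis's paper.

```python
def posterior_probability(R, alpha, beta):
    """P(T|Y): probability the hypothesis is true given a 'yes' result.

    R     -- prior odds that the hypothesis is true versus false
    alpha -- false positive probability (Type I error rate), P(Y|F)
    beta  -- false negative probability (Type II error rate), so P(Y|T) = 1 - beta
    """
    prior_true = R / (1.0 + R)       # P(T)
    prior_false = 1.0 - prior_true   # P(F)
    # P(Y) = P(Y|T) P(T) + P(Y|F) P(F)
    p_yes = (1.0 - beta) * prior_true + alpha * prior_false
    # Bayes' theorem: P(T|Y) = P(Y|T) P(T) / P(Y)
    return (1.0 - beta) * prior_true / p_yes


# Illustrative (assumed) values: prior odds 1:10, alpha = 0.05, power = 0.8
print(round(posterior_probability(R=0.1, alpha=0.05, beta=0.2), 3))  # 0.615
```

With these assumed numbers the result agrees with the closed form $(1-\beta)R/((1-\beta)R+\alpha) = 0.08/0.13$.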
Often it is more convenient to consider the odds of being true versus being false: $P(T|Y)/P(F|Y) = (1-\beta)R/\alpha$. So for the odds to favor the hypothesis being true given a “statistically significant” result, we need $(1-\beta)R > \alpha$, so increasing power and lowering false positives are always a good thing. But the interesting thing to me (which is obvious in retrospect) is that even if you have perfect power ($\beta = 0$), you can still get a wrong result if your false positive rate is higher than your prior odds of correctness ($\alpha > R$). This is made even worse if you have biases, and Ioannidis gives typical parameter values to argue that most published papers must be false.
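To make the perfect-power case concrete, here is a small sketch of the posterior odds $(1-\beta)R/\alpha$. The function name `posterior_odds` and the numbers are hypothetical choices for illustration only.

```python
def posterior_odds(R, alpha, beta):
    """Odds of true versus false given a 'yes' result: (1 - beta) * R / alpha."""
    return (1.0 - beta) * R / alpha


# A long-shot hypothesis (assumed prior odds 1:100) tested with perfect power
# (beta = 0) at the usual alpha = 0.05. Since alpha > R, the odds stay below 1.
odds = posterior_odds(R=0.01, alpha=0.05, beta=0.0)
print(f"odds = {odds:.2f}, probability true = {odds / (1.0 + odds):.2f}")

# A more plausible hypothesis (assumed prior odds 1:10) with power 0.8:
# here (1 - beta) * R > alpha, so a 'yes' result is more likely right than wrong.
print(posterior_odds(R=0.1, alpha=0.05, beta=0.2) > 1.0)
```

In the first case a significant result from a flawless experiment still leaves the hypothesis more likely false than true, because the false positive rate exceeds the prior odds.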
What was most illuminating to me is that having many independent labs working on the same topic actually makes a positive result less likely to be correct. The reason is that if $n$ labs are working independently, then $1-\beta \to 1-\beta^n$ (i.e. the false negative rate goes down) but also $\alpha \to 1-(1-\alpha)^n$ (i.e. the false positive rate goes up). If many labs work on the same thing and don’t cooperate, then the probability of getting a yes result goes up, since the probability that everyone gets a negative result goes down. (This is the same problem you have if you do an experiment and don’t control for the number of effective hypotheses tested; for example, see here.) Hence the odds in the multiple labs case are $R(1-\beta^n)/(1-(1-\alpha)^n)$, which goes to $R$ as $n$ goes to infinity. Thus, an infinite number of labs working on the same problem does not improve on the prior odds. So the next time you get rejected by a high impact journal because your work is not of sufficient interest, you can take consolation in the fact that your probability of being wrong just decreased.
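The multiple-labs limit is easy to check numerically. The sketch below assumes that a “yes” means at least one of $n$ independent labs reports a positive result; the function name `multi_lab_odds` and the parameter values are illustrative assumptions.

```python
def multi_lab_odds(R, alpha, beta, n):
    """Posterior odds when at least one of n independent labs gets a 'yes'."""
    power_n = 1.0 - beta ** n                    # P(at least one yes | true)
    false_positive_n = 1.0 - (1.0 - alpha) ** n  # P(at least one yes | false)
    return R * power_n / false_positive_n


# Assumed values: prior odds 1:10, alpha = 0.05, single-lab power 0.8
for n in (1, 2, 5, 10, 50, 1000):
    print(n, round(multi_lab_odds(R=0.1, alpha=0.05, beta=0.2, n=n), 4))
```

With these numbers the odds start at about 1.6 for a single lab and fall toward the prior odds of 0.1 as $n$ grows.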