The most recent dust-up over statistical significance and being wrong concerns the publication of a paper on ESP in a top psychology journal. Here’s a link to a paper criticizing the result using Bayesian analysis. Below is an excerpt from the journal Science.
Science: The decision by a top psychology journal to publish a paper on extrasensory perception (ESP) has sparked a lively discussion on blogs and in the mainstream media. The paper’s author, Daryl Bem, a respected social psychologist and professor emeritus at Cornell University, argues that the results of nine experiments he conducted with more than 1000 college students provide statistically significant evidence of an ability to predict future events. Not surprisingly, word that the paper will appear in an upcoming issue of the Journal of Personality and Social Psychology (JPSP) has provoked outrage from pseudoscience debunkers and counteraccusations of closed-mindedness from those willing to consider the possibility of psychic powers.
It has also rekindled a long-running debate about whether the statistical tools commonly used in psychology—and most other areas of science—too often lead researchers astray. “The real lesson to be learned from this is not that ESP exists, it’s that the methods we’re using aren’t protecting us against spurious results,” says David Krantz, a statistician at Columbia University…
…But statisticians have long argued that there are problems lurking in the weeds when it comes to standard statistical methods like the t test. What scientists generally want to know is the probability that a given hypothesis is true given the data they’ve observed. But that’s not what a p-value tells them, says Adrian Raftery, a statistician at the University of Washington, Seattle. In Bem’s erotic photo experiment, the p-value of less than .01 means, by definition, that there’s less than a 1% chance he would have observed these data—or data pointing to an even stronger ESP effect—if ESP does not exist. “Some people would turn that around and say there’s a 99% chance there’s something going on, but that’s wrong,” Raftery says.
Not only does this type of thinking reflect a misunderstanding of what a p-value is, but it also overestimates the probability that an effect is real, Raftery and other statisticians say. Work by Raftery, for example, suggests that p-values in the .001 to .01 range reflect a true effect only 86% to 92% of the time. The problem is more acute for larger samples, which can give rise to a small p-value even when the effect is negligible for practical purposes, Raftery says.
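Raftery's figures can be illustrated with a small simulation. The setup below is an assumed toy model, not his actual calculation: suppose half of all hypotheses tested are true, and true effects are modest. Among experiments that land in the .001 to .01 p-value band, we can then count how often the effect was real.

```python
import random
from math import erfc, sqrt

random.seed(42)

def one_sided_p(hits, n, p0=0.5):
    # Normal approximation to the binomial tail P(X >= hits) under the null.
    z = (hits - n * p0) / sqrt(n * p0 * (1 - p0))
    return 0.5 * erfc(z / sqrt(2))

# Assumed toy model (illustrative only): half of tested hypotheses are true,
# and a true effect means a 55% hit rate instead of the chance rate of 50%.
N_TRIALS, REAL_RATE, PRIOR_TRUE = 200, 0.55, 0.5

real_and_sig = sig = 0
for _ in range(50_000):
    effect_is_real = random.random() < PRIOR_TRUE
    rate = REAL_RATE if effect_is_real else 0.5
    hits = sum(random.random() < rate for _ in range(N_TRIALS))
    if 0.001 < one_sided_p(hits, N_TRIALS) < 0.01:  # the band Raftery cites
        sig += 1
        real_and_sig += effect_is_real

print(f"P(effect real | .001 < p < .01) ≈ {real_and_sig / sig:.2f}")
```

The exact fraction depends heavily on the assumed prior and effect size, which is precisely the point: a p-value in the .001 to .01 range does not translate into a fixed 99%+ probability that the effect is real.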
He and others champion a different approach based on so-called Bayesian statistics. Building on a theorem formulated by Thomas Bayes, an 18th-century English minister, these methods are designed to determine the probability that a hypothesis is true given the data a researcher has observed. It’s a more intuitive approach that’s conceptually more in line with the goals of scientists, say its advocates. Also, unlike the standard approach, which assumes that each new experiment takes place in a vacuum, Bayesian statistics takes prior knowledge into consideration.
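The role of prior knowledge can be shown with a single application of Bayes' theorem. The power and false-positive rate below are assumed round numbers for illustration, not figures from Bem's experiments; the sketch computes the probability a hypothesis is true given a significant result, under three different priors.

```python
def posterior(prior, power=0.8, alpha=0.01):
    """P(H true | significant result) by Bayes' theorem, assuming the test
    detects a real effect with probability `power` and false-alarms at rate
    `alpha` (both illustrative values)."""
    num = power * prior
    return num / (num + alpha * (1 - prior))

# The same significant result under three priors: a coin-flip prior for an
# ordinary hypothesis, a skeptical one, and a very skeptical one (as might
# be appropriate for ESP).
for prior in (0.5, 0.01, 0.000001):
    print(f"prior {prior:g} -> posterior {posterior(prior):.6f}")
```

With an even prior the posterior is high, but with a strongly skeptical prior the same "significant" result barely moves the needle. This is the sense in which Bayesian methods refuse to treat each experiment as taking place in a vacuum.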