The most recent dust-up over statistical significance and being wrong concerns the publication of a paper on ESP in a top psychology journal. Here’s a link to a paper criticizing the result using Bayesian analysis. Below is an excerpt from the journal Science.

Science: The decision by a top psychology journal to publish a paper on extrasensory perception (ESP) has sparked a lively discussion on blogs and in the mainstream media. The paper’s author, Daryl Bem, a respected social psychologist and professor emeritus at Cornell University, argues that the results of nine experiments he conducted with more than 1000 college students provide statistically significant evidence of an ability to predict future events. Not surprisingly, word that the paper will appear in an upcoming issue of the Journal of Personality and Social Psychology (JPSP) has provoked outrage from pseudoscience debunkers and counteraccusations of closed-mindedness from those willing to consider the possibility of psychic powers. It has also rekindled a long-running debate about whether the statistical tools commonly used in psychology—and most other areas of science—too often lead researchers astray. “The real lesson to be learned from this is not that ESP exists, it’s that the methods we’re using aren’t protecting us against spurious results,” says David Krantz, a statistician at Columbia University…

…But statisticians have long argued that there are problems lurking in the weeds when it comes to standard statistical methods like the t test. What scientists generally want to know is the probability that a given hypothesis is true given the data they’ve observed. But that’s not what a p-value tells them, says Adrian Raftery, a statistician at the University of Washington, Seattle. In Bem’s erotic photo experiment, the p-value of less than .01 means, by definition, that there’s less than a 1% chance he would have observed these data—or data pointing to an even stronger ESP effect—if ESP does not exist. “Some people would turn that around and say there’s a 99% chance there’s something going on, but that’s wrong,” Raftery says. Not only does this type of thinking reflect a misunderstanding of what a p-value is, but it also overestimates the probability that an effect is real, Raftery and other statisticians say. Work by Raftery, for example, suggests that p-values in the .001 to .01 range reflect a true effect only 86% to 92% of the time. The problem is more acute for larger samples, which can give rise to a small p-value even when the effect is negligible for practical purposes, Raftery says.
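Numbers like Raftery’s depend on the base rate: what fraction of tested hypotheses are actually true. Here’s a quick simulation of my own (not from the article) that makes the point concrete. The 50% prior on a real effect and the standardized effect size of 0.3 are arbitrary assumptions; change them and the answer changes, which is exactly the issue.

```python
import random
import math

def norm_sf(z):
    # survival function of the standard normal, P(Z > z)
    return 0.5 * math.erfc(z / math.sqrt(2))

random.seed(0)
n = 100            # sample size per experiment
effect = 0.3       # assumed standardized effect when a real effect exists
prior_true = 0.5   # assumed fraction of hypotheses that are true
trials = 20_000

true_and_sig = false_and_sig = 0
for _ in range(trials):
    is_true = random.random() < prior_true
    mu = effect if is_true else 0.0
    # sample mean of n standard-normal observations has sd 1/sqrt(n)
    xbar = random.gauss(mu, 1 / math.sqrt(n))
    z = xbar * math.sqrt(n)
    p = 2 * norm_sf(abs(z))          # two-sided p-value
    if 0.001 < p < 0.01:             # the range Raftery discusses
        if is_true:
            true_and_sig += 1
        else:
            false_and_sig += 1

frac = true_and_sig / (true_and_sig + false_and_sig)
print(f"P(true effect | .001 < p < .01) ≈ {frac:.2f}")
```

With these particular assumptions the fraction comes out well above 90%, but lower the prior or shrink the effect and it drops fast. The p-value alone can’t tell you where you are on that spectrum.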

He and others champion a different approach based on so-called Bayesian statistics. Based on a theory developed by Thomas Bayes, an 18th century English minister, these methods are designed to determine the probability that a hypothesis is true given the data a researcher has observed. It’s a more intuitive approach that’s conceptually more in line with the goals of scientists, say its advocates. Also, unlike the standard approach, which assumes that each new experiment takes place in a vacuum, Bayesian statistics takes prior knowledge into consideration.
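To see how the Bayesian machinery delivers the quantity scientists actually want, here’s a toy two-hypothesis coin example of my own (it is not Bem’s analysis): Bayes’ rule combines a prior probability with the likelihood of the observed data under each hypothesis to produce a posterior probability that the hypothesis is true.

```python
from math import comb

def posterior_h1(k, n, p1=0.6, prior_h1=0.5):
    """Posterior probability that the coin is biased toward p1 (H1)
    rather than fair (H0), after observing k heads in n flips."""
    lik_h0 = comb(n, k) * 0.5 ** n                       # P(data | H0)
    lik_h1 = comb(n, k) * p1 ** k * (1 - p1) ** (n - k)  # P(data | H1)
    # Bayes' rule: posterior odds = prior odds * likelihood ratio
    num = prior_h1 * lik_h1
    return num / (num + (1 - prior_h1) * lik_h0)

# 60 heads in 100 flips, under two different priors
print(posterior_h1(60, 100, prior_h1=0.5))   # agnostic prior: ≈ 0.88
print(posterior_h1(60, 100, prior_h1=0.01))  # skeptical prior: ≈ 0.07
```

The second line is the ESP-relevant point: the same data that look convincing under an agnostic prior leave a skeptic nearly unmoved, because an extraordinary claim starts with long prior odds against it.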

“Work by Raftery, for example, suggests that p-values in the .001 to .01 range reflect a true effect only 86% to 92% of the time. The problem is more acute for larger samples, which can give rise to a small p-value even when the effect is negligible for practical purposes, Raftery says.”

This thinking seems very muddled. Surely many effects are “negligible for practical purposes” but also “true.”


I think he means that you could have a variable that explains a minuscule amount of the variance (i.e., has a small effect), but because the sample size is so large, the result is significant. Hence, a small p-value doesn’t really tell you how important the effect is; it just says that the probability of data this extreme arising under the null hypothesis is less than p.
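This is easy to demonstrate with a one-sample z-test. In this sketch of mine, the standardized effect d = 0.02 is an arbitrary stand-in for “negligible for practical purposes” (it explains about 0.04% of the variance), yet the p-value collapses as n grows:

```python
import math

def z_p_value(d, n):
    # two-sided p-value for a one-sample z-test of standardized effect d
    z = d * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)

# the same tiny effect at growing sample sizes:
# p goes from ~0.84 to ~0.05 to astronomically small
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}: p = {z_p_value(0.02, n):.3g}")
```

So at n = 1,000,000 the very same effect is “significant” beyond any conventional threshold, while remaining practically irrelevant.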


[…] the same problems of many other clinical papers that I have posted on before (e.g. see here and here). The evolution of overconfidence paper does not rely on statistics but on a simple evolutionary […]
