More on why most results are wrong

Science writer Jonah Lehrer has a nice article in the New Yorker on the “Decline Effect”, whereby the significance of many published results, mostly but not exclusively in clinical and psychological fields, tends to decline with time or disappear entirely. The article cites the work of John Ioannidis, which I summarized here, on why most published results are false. The article posits several explanations, which basically boil down to selection bias and multiple comparisons.

I believe this problem stems from the fact that the reward structure of science is biased towards positive results: the more surprising, and hence unlikely, the result, the greater the impact and attention. Generally, a result is deemed statistically significant if the probability of that result arising by random chance (technically, if the null hypothesis were true) is less than 5%. By this criterion alone, one in twenty tests of a true null hypothesis will come out positive, so a baseline of roughly 5% of published positive results is basically noise. However, the actual fraction is much higher because of selection bias and multiple comparisons.
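
To see where that noise floor comes from, here is a quick simulation (a minimal sketch with purely illustrative numbers, not any particular study design): many experiments in which the null hypothesis is exactly true, each tested at the 5% level.

```python
# Minimal sketch: fraction of "significant" results when the null is true.
# Two equal groups are drawn from the same normal distribution; the sample
# size and 5% threshold are illustrative choices, not from any real study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
n_per_group = 30
alpha = 0.05

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(size=n_per_group)   # "treatment" group, no real effect
    b = rng.normal(size=n_per_group)   # "control" group, same distribution
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(f"Fraction significant under a true null: {false_positives / n_experiments:.3f}")
# Comes out near 0.05: the threshold itself guarantees this noise floor.
```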

In order to estimate significance, a sample set must be defined, and it is quite easy for conscious or unconscious selection bias to arise when deciding which data are to be included. For example, in clinical trials some subjects or data points will be excluded for various reasons, and this can bias the outcome towards a positive result. Multiple comparisons may be even harder to avoid. For every published result of an investigator, there are countless “failed” results. For example, suppose you want to show that pesticides cause cancer and you test different pesticides until one shows an effect. Most likely you will assess significance only for that particular pesticide, not for all the others that didn’t show an effect. In some sense, to be truly fair, one should include every experiment one has ever conducted when assessing significance, with the odd implication that the criterion for significance becomes more stringent as you age.
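
To put some illustrative numbers on the pesticide example (the sample sizes and the count of twenty pesticides are made up): if none of the pesticides has any effect, the chance that at least one of twenty independent tests crosses the 5% threshold is 1 − 0.95^20, or about 64%. A quick simulation bears this out.

```python
# Sketch of the multiple-comparisons problem in the pesticide example.
# Every "pesticide" here is pure noise; all numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_screens = 5_000      # repeat the whole screening exercise many times
n_pesticides = 20      # candidate pesticides tested, none with a real effect
n_per_group = 30
alpha = 0.05

at_least_one_hit = 0
for _ in range(n_screens):
    p_values = []
    for _ in range(n_pesticides):
        exposed = rng.normal(size=n_per_group)
        control = rng.normal(size=n_per_group)
        _, p = stats.ttest_ind(exposed, control)
        p_values.append(p)
    if min(p_values) < alpha:
        at_least_one_hit += 1

print(f"Screens with at least one 'significant' pesticide: {at_least_one_hit / n_screens:.2f}")
print(f"Analytic expectation: {1 - (1 - alpha) ** n_pesticides:.2f}")  # about 0.64
```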

This is a tricky thing. Suppose you do a study and measure ten things but blind yourself to the results. You then pick one of the items and test it for significance. In this case, have you done ten measurements or one? The criterion for significance will be much more stringent if it is the former. However, there are probably a hundred other things you could have measured but didn’t. Should significance be tested against a null hypothesis that includes all potential events, including those you didn’t measure?
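
One blunt but standard way to see how much the count matters is the Bonferroni correction, which divides the 5% threshold by however many comparisons you admit to (the Šidák version is slightly less conservative). The sketch below just mirrors the counts in the paragraph above: one test, the ten measured items, and the hundred or so things you could have measured but didn’t.

```python
# Sketch: how the per-test significance threshold shrinks with the number
# of comparisons counted. Bonferroni uses alpha/m; the Sidak correction
# uses 1 - (1 - alpha)**(1/m). The comparison counts are illustrative.
alpha = 0.05

for m in (1, 10, 110):  # one test, ten measured items, plus ~100 unmeasured ones
    bonferroni = alpha / m
    sidak = 1 - (1 - alpha) ** (1 / m)
    print(f"m = {m:3d}: Bonferroni threshold = {bonferroni:.5f}, Sidak = {sidak:.5f}")
```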

I think the only way this problem will be solved is with an overhaul of the way science is practiced. First of all, negative results must be taken as seriously as positive ones; in some sense, the results of all experiments need to be published, or at least made public in some database. Second, the concept of statistical significance needs to be abolished. There cannot be some artificial dividing line between significance and nonsignificance. Adopting a Bayesian approach would help: people would simply report the probability that the result is true given some prior and likelihood. In fact, the P values now used to assess significance could easily be converted into Bayesian probabilities. However, I doubt very much that these proposals will be adopted any time soon.
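
One simple recipe for that last conversion (just an illustration; there are other calibrations) uses the minimum Bayes factor bound −e·p·ln p of Sellke, Bayarri and Berger, which turns a P value and a prior probability into an upper bound on the posterior probability that the effect is real. The 50-50 prior below is an arbitrary choice for the sketch.

```python
# Sketch: converting a P value into an upper bound on the posterior
# probability that the effect is real, via the -e*p*ln(p) minimum Bayes
# factor bound of Sellke, Bayarri & Berger (2001). The even-odds prior
# is an illustrative assumption, not a recommendation.
import math

def max_posterior_prob(p, prior_prob_effect=0.5):
    """Upper bound on P(effect is real | data) implied by a P value."""
    if not 0 < p < 1 / math.e:
        raise ValueError("the bound only applies for 0 < p < 1/e")
    min_bf_null = -math.e * p * math.log(p)    # smallest possible Bayes factor for the null
    prior_odds = prior_prob_effect / (1 - prior_prob_effect)
    posterior_odds = prior_odds / min_bf_null  # most favorable odds for the effect
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: P(effect real) <= {max_posterior_prob(p):.2f}")
# p = 0.05 corresponds to at most about a 71% chance the effect is real
# under even prior odds, far from the certainty "significance" suggests.
```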

Arsenic and old life

The big science news last week was the announcement and publication in Science of the discovery of a strain of bacteria that lives on arsenic instead of phosphorus. Arsenic, which appears directly below phosphorus in the periodic table, is toxic to most life forms largely because it is chemically similar to phosphorus. It had thus been postulated that there could be life forms that utilize arsenic instead of phosphorus. In fact, astrophysicist Paul Davies had long suggested that a proof of principle of the possibility of alien life could be obtained by finding an alternative form of life on earth. The new bacterium comes from Mono Lake in California, which is very rich in arsenic. The authors put samples from the lake into a medium rich in arsenic but devoid of phosphorus to see what would grow and found a strain that grew robustly. They then found that arsenic was actually incorporated into the proteins and DNA within the cells. In a post from five years ago, I speculated that we might some day find a new organism living on toxic waste, although this cell is probably of ancient origin. However, there has been strong criticism of the paper since the announcement; for example, see here. Hence, the jury may still be out on arsenic-loving microbes.

Talk at NYU

I was in New York yesterday and gave a talk at NYU in a joint Center for Neural Science and Courant Institute seminar. My slides are here. The talk is an updated version of the talk I gave before and summarized here. The new parts include recent work on applying the model to Autism (see here) and some new work on resolving why mutual inhibition models of binocular rivalry do not reproduce Levelt’s fourth proposition, which states that as the contrast to both eyes is decreased, the dominance times of the percepts increase. I will summarize the results of that work in detail when we finish the paper.