UPDATE 2017-07-03: Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence (.pdf, 2016, McShane & Gal) + comments (Hacker News).
UPDATE 2015-03-13: an interesting article in PLOS Biology: The Extent and Consequences of P-Hacking in Science (2015, Head, Holman, Lanfear, Kahn & Jennions) + press release.
UPDATE 2011-08-24: here is a good post on “publication count vs making impact on society”. It is not exactly the same topic as this post, but it also makes the case for focusing on the “oomph” factor, the “qualitative size” of a result, rather than on a metric alone.
In The Cult of Statistical Significance, economists Stephen T. Ziliak and Deirdre N. McCloskey survey various empirical sciences and remind us that statistical significance, by itself, does NOT equal scientific significance (.pdf). In making their case, the authors criticize the ideas of R.A. Fisher and restore those of W.S. Gosset to a place of honor.
“X has at the .05 level a significant effect on Y, therefore X is important for explaining Y”. So what? HOW important is X for explaining Y? How does this finding help the world DECIDE AMONG POSSIBLE COURSES OF ACTION? What is the potential IMPACT of the claimed effect, e.g. measured in units of HEALTH, MONEY and OTHER ‘HUMAN’ VALUES? How LARGE is the impact in your field of science: the clinical significance, biological significance, psycho-pharmacological significance, …? These seem like obvious questions, but the authors convincingly demonstrate, using concrete examples, that they are often not answered (or even asked?) in real-world scientific practice.
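To make the gap between “significant” and “important” concrete, here is a small sketch with two made-up findings (the numbers are purely illustrative and not taken from the book). It uses a simple two-sample z-test under the simplifying assumptions of a known, equal standard deviation and equal group sizes; `two_sample_z` is a hypothetical helper written for this example. Finding A is a tiny effect measured on a huge sample, finding B a large effect measured on a small sample.

```python
import math

def two_sample_z(mean_diff, sd, n_per_group):
    """Hypothetical helper: z statistic and two-sided p-value for a
    difference in means, using the normal approximation and assuming
    a known, equal SD and equal group sizes (simplifying assumptions)."""
    se = sd * math.sqrt(2.0 / n_per_group)    # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # P(|Z| > |z|) for a standard normal
    return z, p

# Illustrative numbers only: (mean difference, SD, n per group)
findings = {
    "A (tiny effect, n=1,000,000 per group)": (0.01, 1.0, 1_000_000),
    "B (large effect, n=10 per group)": (0.50, 1.0, 10),
}

for label, (diff, sd, n) in findings.items():
    z, p = two_sample_z(diff, sd, n)
    verdict = "significant" if p < 0.05 else "not significant"
    # Report the size of the effect ("oomph") next to the p-value
    print(f"{label}: effect = {diff / sd:.2f} SD, z = {z:.2f}, "
          f"p = {p:.2g} -> {verdict} at .05")
```

Finding A clears the .05 bar by a wide margin while B does not, even though B's effect is fifty times larger. The p-value mostly reflects sample size and precision, not whether the effect matters; that question needs the effect size and its consequences in real-world units.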
The authors rightly state that a finding with LESS statistical significance may have MORE scientific significance, and they argue against using a rather arbitrary threshold of statistical significance, e.g. p<0.05 (why not p<0.06 or p<0.15?), as a fixed, non-negotiable demarcation of science or ‘scientific proof’. They assert that a minimax strategy or some other loss function needs to be employed in addition to the P-value, R², Student’s t, etc. Their point is summarized in this graph from their book (*):
Judging by the text, the book clearly scratches a STRONG personal itch of the authors; I’d almost speculate they wrote it as an exercise in anger management. I strongly recommend this book nonetheless 🙂
(*) Well, actually this version is a screen capture from http://www.deirdremccloskey.com/docs/jsm.pdf, but it’s the same graph by the same authors.