Statistical Significance != Scientific Significance

UPDATE 2019-03-20: Scientists rise up against statistical significance (Valentin Amrhein, Sander Greenland & Blake McShane; comment in Nature, 20 Mar 2019) + comments (Hacker News).

UPDATE 2018-01-31: Dispense with redundant P values (Joachim Goedhart; comment piece in Nature, 31 Jan 2018).

UPDATE 2017-07-03: Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence (.pdf, McShane & Gal, 2016) + comments (Hacker News).

UPDATE 2017-xx-xx: Response to the ASA’s Statement on p-Values: Context, Process, and Purpose (Edward L. Ionides, Alexander Giessing, Yaacov Ritov & Scott E. Page, in: The American Statistician 71:1, 2017; paywalled, but also available here).

UPDATE 2016-xx-xx: The [American Statistical Association’s] Statement on p-Values: Context, Process, and Purpose (Ronald L. Wasserstein & Nicole A. Lazar, in: The American Statistician 70:2, 2016; Open Access).

UPDATE 2015-03-13: The Extent and Consequences of P-Hacking in Science (Head, Holman, Lanfear, Kahn & Jennions; in: PLOS Biology, 2015; Open Access) + press release.

UPDATE 2011-08-24: When am I going to get my money back? <– a good post on “publication count vs making impact on society”. It is not exactly the same topic as my post below, but it also makes the case for focusing on the “oomph” factor, i.e., the “qualitative size” of results, rather than on a single metric.

In The Cult of Statistical Significance, economists Stephen T. Ziliak and Deirdre N. McCloskey consider various empirical sciences and remind us that statistical significance, by itself, does NOT equal scientific significance (.pdf). While making their point, the authors criticize the ideas of R.A. Fisher and restore the ideas of W.S. Gosset to honor.

“X has at the .05 level a significant effect on Y, therefore X is important for explaining Y”. So what? HOW important is X for explaining Y? How does this finding help the world DECIDE AMONG POSSIBLE COURSES OF ACTION? What is the potential IMPACT of the claimed effect, e.g. measured in units of HEALTH, MONEY and OTHER ‘HUMAN’ VALUES? How LARGE is the impact in your field of science: the clinical significance, biological significance, psycho-pharmacological significance, …? These questions seem obvious, but the authors convincingly demonstrate, using concrete examples, that they are often not answered (or even asked?) in real-world scientific practice.
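To make the gap between “significant” and “important” concrete, here is a minimal sketch in Python. The two studies and all of their numbers are invented purely for illustration (they are not taken from the book), and the helper `two_sided_p` is a hypothetical name that uses a simple normal (z) approximation rather than any particular test the authors discuss.

```python
import math

def two_sided_p(estimate, std_error):
    """Two-sided p-value for H0: effect = 0, via a normal (z) approximation."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical studies: (estimated effect, standard error), same units of Y.
# These numbers are made up for illustration only.
studies = {
    "A (small but precisely measured effect)": (0.5, 0.1),
    "B (large but noisily measured effect)": (10.0, 6.0),
}

for name, (estimate, std_error) in studies.items():
    p = two_sided_p(estimate, std_error)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: effect = {estimate}, p = {p:.3g} ({verdict} at .05)")
```

Under these assumed numbers, study A clears the .05 threshold easily while study B does not, even though B's estimated effect is twenty times larger. The p-value mostly reports how precisely the effect was measured; whether an effect of that size matters still has to be judged in units of Y.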

The authors correctly state that a finding with LESS statistical significance may have MORE scientific significance, and argue against using a rather arbitrary threshold of statistical significance, e.g. p<0.05 (why not p<0.06 or p<0.15?), as a fixed, non-negotiable demarcation of science or ‘scientific proof’. The authors assert that a minimax strategy or other loss function needs to be employed in addition to the P-value, R², Student’s t, etc. Their point is summarized in this graph (*):

Judging by the text, it is clear that the book scratches a serious personal itch of the authors – I’d almost speculate they wrote this book as an assignment in anger management. Either way, I strongly recommend this book.

(*) This version is a screen capture from http://www.deirdremccloskey.com/docs/jsm.pdf, but it’s the same graph from the same authors.
