As I am sure everyone already knows, there is a new discussion about p-values in psychology and statistics. While I do not feel qualified to have a public opinion about this, I do enjoy reading all the papers, the replies to papers, and the comments on papers. It feels like I am watching a new iteration of the "nature-nurture" debate unfold, and the nerd in me is very excited. So I thought it would be nice to compile a reading list for others who are also excited and want to read everything out there about what is going on. I might update this as more things get published. Also, go to Twitter, because it is extremely entertaining right now (sadly, I could not get myself to screenshot every entertaining thread, but if you look up E-J Wagenmakers, Daniel Lakens, Andrew Gelman, and the other authors mentioned below, it should be easy to find!).
The paper that started it all:
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., … & Cesarini, D. (2017). Redefine statistical significance. Nature Human Behaviour, 1.
The papers that followed:
Greenland, S., & Amrhein, V. (2017). Remove, rather than redefine, statistical significance. Nature Human Behaviour, 1.
Crane, H. (2017, November 19). Why “Redefining Statistical Significance” Will Not Improve Reproducibility and Could Make the Replication Crisis Worse. Retrieved from psyarxiv.com/bp2z4
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., … Zwaan, R. A. (2017, September 18). Justify Your Alpha: A Response to “Redefine Statistical Significance”. Retrieved from psyarxiv.com/9s3y6
McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2017). Abandon statistical significance. arXiv preprint arXiv:1709.07588.
Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C., Beh, E. J., Bilgiç, Y., … & Chaigneau, S. E. (2017). Manipulating the alpha level cannot cure significance testing – comments on “Redefine statistical significance”. PeerJ Preprints.
Wicherts, J. M. (2017). The Weak Spots in Contemporary Science (and How to Fix Them). Animals, 7(12), 90.
Zenker, F., & Witte, E. H. (2017). From Discovery to Justification: Outline of an Ideal Research Program for Empirical Psychology. Frontiers in psychology, 8, 1847.
Some more informal replies:
E.J. Wagenmakers (a co-author of the paper that started it all) responds to points made in the McShane et al. paper.
He also responds to Crane’s paper.
He also responds to Lakens et al.’s paper.
And he was on a panel at BITSS2017 with Daniel Lakens (see above) and Simine Vazire. If you want to watch the entire conference, you can do that here (the panel starts around the 30-minute mark on Day 1).
Basically, E.J. Wagenmakers has written a lot of responses to critiques of the paper that started it all. Here is the first, which mostly summarizes them. The second, third, fourth, fifth, sixth, and seventh go into more detail, using examples, about why the paper that started it all is right (I'm still not sure).
And of course, E.J. Wagenmakers responds to their response.
In the meantime, Daniel Lakens has written a post on common misconceptions about p-values.
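One misconception that comes up throughout this debate can be illustrated with a quick simulation (a sketch of my own, not taken from Lakens's post): when the null hypothesis is exactly true, p-values are uniformly distributed, so about 5% of tests come out "significant" at α = .05, and lowering the threshold to the proposed α = .005 cuts that false-positive rate to about 0.5%. A minimal stdlib-only Python version, using a one-sample z-test with known variance:

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def simulate_null_pvalues(n_sims=20000, n=30, seed=1):
    """Simulate p-values when the null hypothesis is exactly true."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_sims):
        # Data really do come from N(0, 1), so H0: mean = 0 is true
        sample = [rng.gauss(0, 1) for _ in range(n)]
        mean = sum(sample) / n
        z = mean / (1 / math.sqrt(n))  # known sigma = 1, so a z-test
        pvals.append(two_sided_p(z))
    return pvals

pvals = simulate_null_pvalues()
frac_05 = sum(p < 0.05 for p in pvals) / len(pvals)
frac_005 = sum(p < 0.005 for p in pvals) / len(pvals)
print(f"p < .05:  {frac_05:.3f}")   # roughly 0.05, by construction
print(f"p < .005: {frac_005:.3f}")  # roughly 0.005
```

The simulation only shows the mechanical effect of moving the threshold; whether that actually improves replicability is exactly what the papers above argue about.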
The protagonists (basically the big names in all the main publications, from all sides) also participated in a roundtable discussion hosted by the International Methods Colloquium. (thank you @lionbehrens)
Of course, this isn’t a new discussion:
Cohen, J. (1994). The Earth is Round (p < .05). American Psychologist, 49(12), 997–1003.
And its inevitable follow-up:
Amrhein, V., Korner-Nievergelt, F., & Roth, T. (2017). The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research. PeerJ Preprints.
If we want to get official about it:
Also, I really liked this paper targeting journalists:
Spotting Shady Statistics on The Open Notebook.
Just see this as a nice little reading list for the Winter Break. Oh? You already have 300 other things to work on during Winter Break? I have no idea how you feel. 😉