Do you understand the concepts of p-values and power?

Strangely enough, in almost all of my training at various institutions around the world these concepts were not clearly explained or discussed. Sure, p-values and NHST were used in reading articles and conducting statistical analyses, but their meanings, strengths, and weaknesses were rarely discussed, and the concept of power was rarely mentioned. It was only in the last couple of years, with the so-called “replication crisis”, that I decided to take it upon myself to try and understand them better. When I arrived in the southern Netherlands I had the golden opportunity to join Daniel Lakens in nearby Eindhoven, who helped me realize how badly I understood these concepts. I think his free online course “Improving your statistical inferences” is a must for any researcher.

I recently decided to try and explain these concepts to some of the students I work with. I started out with the simple Type I and Type II errors.

Here is an example from “The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results”:

Or an example that made it easier for me to remember which is Type I and which is Type II, following the sequence in the story about the boy who cried “wolf!” (below is the slide I made about that):

Daniel explains in great detail how these two types of errors relate to p-values and power, under the somewhat shocking headline “How can p = 0.05 lead to wrong conclusions 30% of the time with a 5% Type 1 error rate?”. Try telling researchers and students that p-values smaller than .05 can lead to wrong conclusions 30% (!) of the time and watch their shock and confusion. They’ll often dismiss it or ask you to explain, and the simple reply of “it depends on power” doesn’t help much. You need a visualization. Luckily, that blog post had a visualization taken from a related article in Perspectives on Psychological Science titled “Sailing From the Seas of Chaos Into the Corridor of Stability: Practical Recommendations to Increase the Informational Value of Studies”, which features the following:

These figures feature error rates of alpha = .05, and power of 80% on the left and 35% on the right.

I thought that was clear enough, but when I showed this to students and asked them to tackle it they were, again, confused.

I then realized that there’s no escaping putting these in clearer tables, which resulted in the following:

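Meaning that when power is 80%, we’ll draw wrong conclusions (false positives plus false negatives) 12.5% of the time, and when power is 35% we’ll draw wrong conclusions as much as 35% of the time. These numbers only add up if half of the tested hypotheses are actually true, which is what the figures assume.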
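The arithmetic behind those two numbers can be sketched in a few lines of Python. This is only an illustration, not something taken from Daniel's materials: the function names are made up for this post, and the `prior_h1=0.5` default is the assumption that half of the tested hypotheses are true.

```python
def outcome_rates(alpha, power, prior_h1=0.5):
    """Share of all studies landing in each cell of the 2x2 table.

    prior_h1 is an assumption: the proportion of studies in which
    the effect really exists (0.5 reproduces the numbers above).
    """
    return {
        "true_positive": prior_h1 * power,               # effect exists, p < alpha
        "false_negative": prior_h1 * (1 - power),        # effect exists, missed (Type II)
        "false_positive": (1 - prior_h1) * alpha,        # no effect, p < alpha (Type I)
        "true_negative": (1 - prior_h1) * (1 - alpha),   # no effect, correctly not significant
    }


def wrong_conclusion_rate(alpha, power, prior_h1=0.5):
    """False positives plus false negatives, as a share of all studies."""
    rates = outcome_rates(alpha, power, prior_h1)
    return rates["false_positive"] + rates["false_negative"]


for power in (0.80, 0.35):
    rate = wrong_conclusion_rate(alpha=0.05, power=power)
    print(f"power = {power:.0%}: wrong conclusions in {rate:.1%} of studies")
```

With power at 80% the false positives contribute 2.5% and the false negatives 10%; with power at 35% the false negatives alone grow to 32.5%. Changing `prior_h1` shows how much the result leans on that 50/50 assumption.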

If a 12.5% miss rate doesn’t satisfy you, as it shouldn’t, consider increasing power to 95% or even 99%. How would that affect your miss ratios?

So some people’s false intuitive assumption that the miss ratio with p < .05 is about 5% or less can, under some circumstances, be considered reasonable: when power is 95% or above.
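To check the 95% and 99% cases yourself, here is a minimal sketch. The helper names `miss_rate` and `power_needed` are invented for this example, and it again assumes that half of the tested hypotheses are true (`prior_h1=0.5`).

```python
def miss_rate(alpha, power, prior_h1=0.5):
    """False positives plus false negatives, as a share of all studies.

    prior_h1 (share of studies with a real effect) is an assumption.
    """
    return (1 - prior_h1) * alpha + prior_h1 * (1 - power)


def power_needed(target_miss_rate, alpha=0.05, prior_h1=0.5):
    """Power required to bring the overall miss rate down to a target.

    Obtained by solving the miss_rate formula for power.
    """
    return 1 - (target_miss_rate - (1 - prior_h1) * alpha) / prior_h1


for power in (0.80, 0.95, 0.99):
    print(f"power = {power:.0%}: miss rate = {miss_rate(0.05, power):.1%}")

print(f"power needed for a 5% miss rate: {power_needed(0.05):.0%}")
```

Under these assumptions the miss rate is 12.5% at 80% power, 5% at 95% power, and 3% at 99% power, so the "5%" intuition only starts to hold once power reaches 95%.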

I’m still struggling with these concepts myself, but I hope this helped make them clearer.

Still confused? Then take Daniel’s course, and practice these yourself with R simulations.
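If you want a feel for what such a simulation looks like, here is a rough sketch in Python rather than R, using only the standard library. It is not from Daniel's course. It assumes each study boils down to a two-sided z-test, that half of the studies test a real effect, and that the true effect is just large enough to give the requested power. The function name and its parameters are made up for this post.

```python
import random
from statistics import NormalDist


def simulate_wrong_rate(alpha=0.05, power=0.80, prior_h1=0.5,
                        n_studies=100_000, seed=1):
    """Simulate many studies and return the share with a wrong conclusion."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-score under H1 that yields (approximately) the requested
    # power; the tiny chance of significance in the wrong direction is ignored.
    shift = z_crit + std_normal.inv_cdf(power)

    wrong = 0
    for _ in range(n_studies):
        effect_is_real = rng.random() < prior_h1           # is H1 true here?
        z = rng.gauss(shift if effect_is_real else 0.0, 1.0)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))         # two-sided p-value
        significant = p_value < alpha
        # Wrong conclusion: false positive or false negative.
        if significant != effect_is_real:
            wrong += 1
    return wrong / n_studies


for power in (0.80, 0.35):
    rate = simulate_wrong_rate(power=power)
    print(f"power = {power:.0%}: simulated wrong-conclusion rate = {rate:.3f}")
```

The simulated rates should land close to the 12.5% and 35% from the tables, with a little random noise that shrinks as `n_studies` grows.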

Further readings:

- Demo Shinyapp – When does a significant p-value indicate a true effect? Understanding the Positive Predictive Value (PPV) of a p-value
- What’s the probability that a significant p-value indicates a true effect?
- p-checker The one-for-all p-value analyzer
- Why I’ve lost faith in p values