There is an excellent resource by Paul Ellis at:

where I just clicked on

This page talks about the problem with the p-value (p-level) being the be-all and end-all of way too many scientific studies. You might be aware that there is discussion about this in the scientific literature. (I think statisticians deserve prizes for still trying to get the rest of us to pay attention.) The problem is that the ONLY thing the p-level tells you is how probable a result at least as extreme as yours would be if the null hypothesis were true. That’s why small is good: you reject the null hypothesis only when the p-level falls below a cutoff, and that cutoff caps the probability of a false positive. And that is all it does. Nothing else. Take a moment, if you will, to consider all the other ways you could be wrong: false negative, wrong question, wrong control, etc. Most importantly, it does not tell you if your result is significant in any *scientifically meaningful* way.

This is what Paul Ellis is talking about when he uses the term “substantive significance”, and the link above goes right to the heart of the matter: researchers are confusing statistical significance with substantive significance, and journals are letting them get away with it. In my ideal world every result comes with, at least, the standard deviation and the four things noted below: alpha, beta, sample size, and effect size, the last accompanied by a few sentences describing why that effect size was chosen.

You may be lucky enough to have as big a sample size as you want, but you still must use your brain to decide what matters, to design an experiment that actually answers your question, and to do appropriate controls so that you can make interesting comparisons. A large sample size may allow you to find very small differences, but if the differences are that small, do they matter? They very well might, but you must think that through; no statistic can do that for you.
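To make the sample-size point concrete, here is a minimal sketch. It assumes a simple two-group comparison of means and uses a normal approximation. The function name `two_sided_p`, the difference of 0.01 standard deviations, and the sample sizes are all made up for illustration; none of them come from Paul Ellis's site.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-level for a two-group comparison of means.

    Assumes the observed standardized difference between the groups is
    exactly d and uses a normal approximation (a hypothetical setup).
    """
    # For a fixed difference d, the test statistic grows with sqrt(n).
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny difference, measured with ever larger samples.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(0.01, n))
```

The difference never changes, yet the p-level shrinks as the sample grows. Whether 0.01 standard deviations matters is still a question only you can answer.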

There is, however, a numerical representation of that “minimum important difference” called the effect size. The best way to plan an experiment is to decide *in advance* on the effect size that you, with your brain, think is important enough to be worth detecting, *and* decide in advance how low you want the probability of a false positive to be (alpha, the cutoff your p-level must fall below), *and* decide how low you want the probability of a false negative to be (beta), *and* decide on the most information-packed way to measure, which includes deciding on the appropriate statistical test to use, *THEN* calculate the sample size and stick to it.
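As a rough illustration of that last step, the sketch below turns an effect size, alpha, and beta into a sample size. It assumes one particular design that the text above does not specify: a two-sided comparison of the means of two equal-sized groups, with the effect size expressed as a standardized difference (Cohen's d). It uses the common normal-approximation formula, n per group = 2 * ((z_(1-alpha/2) + z_(1-beta)) / d)^2. The function name and the default values (alpha = 0.05, beta = 0.20) are illustrative assumptions, not recommendations.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, beta=0.20):
    """Approximate sample size per group for a two-sided, two-group
    comparison of means (normal approximation; a hypothetical helper).

    effect_size: smallest standardized difference worth detecting
    alpha:       acceptable probability of a false positive
    beta:        acceptable probability of a false negative
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must be between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided cutoff
    z_beta = std_normal.inv_cdf(1 - beta)        # corresponds to power 1 - beta

    # Round up: you cannot recruit a fraction of a participant.
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Smaller effects need far more data at the same alpha and beta.
for d in (0.8, 0.5, 0.2):
    print(d, sample_size_per_group(d))
```

Under these assumptions, an effect size of 0.5 works out to 63 per group and 0.2 to 393 per group. A calculation based on the t distribution, or on a different test, will give somewhat different numbers, which is one more reason to choose the test before you calculate the sample size.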

To learn more about effect size, go to effectsizefaq.com and read all about it.
