From ResearchGate Q&A

I found this Q&A thread on ResearchGate:

What prevents you from using a p-value other than 0.05 as your statistical significance cut-off?

Even though there were already 84 answers, I added my own answer:

… for me, choosing the critical p-value is not a statistical question. It belongs to the realm of the real-world cost of making the wrong decision. In research, it mainly relates to balancing “false positive” and “false negative” decisions. So, mostly informally, researchers sometimes set the critical value at 0.1 (10%) when replication is low. On the other hand, when we have many replicates, we will find statistically significant differences that are biologically irrelevant. [Added only here: The 5% level tends to work reasonably well for the number of replicates used by many of us.]
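
To make the replication point concrete, here is a minimal sketch that estimates the probability of detecting a true difference of `d` standard deviations between two groups. It assumes a two-sided, two-sample comparison with equal group sizes, a common standard deviation, and a normal (z) approximation rather than the t-test one would actually use with few replicates. The function name `approx_power` and the effect sizes are illustrative choices, not taken from the thread.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def approx_power(d: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test (illustrative).

    d     -- assumed true difference between means, in units of the common SD
    n     -- replicates per group
    alpha -- critical p-value (accepted false-positive rate)
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z statistic under the alternative: d / sqrt(2 / n)
    shift = d * (n / 2) ** 0.5
    # Probability that |Z| exceeds z_crit when Z ~ N(shift, 1)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Few replicates: even a large effect (d = 1) is often missed
    for alpha in (0.05, 0.10):
        print(f"n=4, d=1.0, alpha={alpha}: power ~ {approx_power(1.0, 4, alpha):.2f}")
    # Many replicates: a negligible effect (d = 0.05) is usually "significant"
    print(f"n=10000, d=0.05, alpha=0.05: power ~ {approx_power(0.05, 10000, 0.05):.2f}")
```

With only 4 replicates per group, relaxing the critical value from 0.05 to 0.10 noticeably raises the chance of detecting a large effect (at the price of more false positives), while with 10,000 replicates a difference of one twentieth of a standard deviation is flagged most of the time. At n = 4 the z approximation overstates power compared with a t-test, so the small-sample figures are optimistic.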

In my opinion, every scientific publication should report the actual p-values, whatever critical value we use for discussing and interpreting the results. Not doing so just discards valuable information. Of course, one historical reason for not reporting actual values was the laborious calculation involved in obtaining them by interpolation from printed tables.
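
With software, the exact value costs nothing to compute. The sketch below contrasts an exact two-sided p-value with the best statement a reader could make from a printed table that lists only a few significance levels. It assumes a z statistic with a standard normal reference (a t statistic would need the t distribution, which is not in Python's standard library), and the tabulated levels in the hypothetical `TABLE_LEVELS` are an illustrative choice.

```python
from statistics import NormalDist

# Significance levels as they might appear in a printed table (illustrative)
TABLE_LEVELS = (0.10, 0.05, 0.01, 0.001)


def two_sided_p(z: float) -> float:
    """Exact two-sided p-value of a z statistic against the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def table_bracket(p: float, levels=TABLE_LEVELS) -> str:
    """Sharpest statement a reader of the printed table could make."""
    for level in sorted(levels):  # check the strictest level first
        if p < level:
            return f"p < {level}"
    return f"p >= {max(levels)}"


if __name__ == "__main__":
    for z in (1.97, 2.31):
        p = two_sided_p(z)
        print(f"z = {z}: exact p = {p:.4f}; table only gives '{table_bracket(p)}'")
```

Both statistics end up reported as "p < 0.05" from the table, even though their exact p-values differ by more than a factor of two; that difference is the information lost when only the bracket is published.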

The situation has far-reaching consequences when dealing with studies of compliance with legal regulations, environmental impact assessment, or safety. I would not want to take a 1-in-20 risk of making the wrong decision concerning a possibly lethal side-effect of a new medicine, while it might be acceptable to take that risk when comparing the new medicine to a currently used medicine known to be highly effective [but maybe not if comparing against a placebo]. In such cases, rather than balancing the risks of making false positive and false negative decisions, we would want to minimize one of them. In other words, we would minimize the probability of the type of mistake that we need or want to avoid.
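
Pushing one error rate down without letting the other one grow has a price in replication. A minimal sketch of that price, assuming a two-sided, two-sample comparison, a common standard deviation, and the usual normal-approximation sample-size formula; `n_per_group` is a hypothetical helper and the effect size d = 0.5 is only an example.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float, power: float) -> int:
    """Approximate replicates per group for a two-sided, two-sample z-test.

    d     -- smallest difference worth detecting, in units of the common SD
    alpha -- accepted false-positive (Type I) rate
    power -- 1 minus the accepted false-negative (Type II) rate
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)


if __name__ == "__main__":
    # Same effect size and power; only the tolerated false-positive rate changes
    for alpha in (0.05, 0.001):
        print(f"alpha = {alpha}: about {n_per_group(0.5, alpha, 0.80)} per group")
```

Under these assumptions, tightening the false-positive rate from 1 in 20 to 1 in 1,000 while keeping 80% power roughly doubles the replicates needed (63 versus 137 per group for d = 0.5). With the number of replicates held fixed instead, the same tightening shows up as a higher false-negative rate.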

I have avoided statistical jargon to make this understandable to more readers. Statisticians call these Type I and Type II errors, and there is plenty of literature on them. In any case, I feel most comfortable with Tukey’s view on hypothesis testing, and his idea that we can NEVER ACCEPT the null hypothesis. We can get evidence either that A > B or that A < B; the alternative is that we do not have enough evidence to decide which one is bigger. Of course, in practice we can use power analysis to judge whether we could have detected a difference that would be relevant in practice. However, this is conceptually very different from accepting that there is no difference or no effect.
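
Tukey’s three-decision view can be sketched as a rule that simply has no “accept the null” branch. This is a minimal sketch assuming the difference between the means of A and B and its standard error have already been estimated, and that a normal approximation is adequate (with few replicates a t reference distribution would be more appropriate); `three_decision` is a hypothetical name.

```python
from statistics import NormalDist


def three_decision(diff: float, se: float, alpha: float = 0.05) -> str:
    """Three-outcome test of mean(A) - mean(B) using a normal approximation.

    The only possible conclusions are 'A > B', 'A < B', or that the
    evidence is insufficient; 'A = B' is never a possible outcome.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = diff / se
    if z > z_crit:
        return "A > B"
    if z < -z_crit:
        return "A < B"
    return "not enough evidence to decide which is bigger"


if __name__ == "__main__":
    print(three_decision(2.0, 0.5))   # clear positive difference
    print(three_decision(0.1, 0.5))   # inconclusive, not "no difference"
```

An inconclusive outcome says nothing about A and B being equal; whether a practically relevant difference could have been detected with the replicates used is a separate power question.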

[I would like to see students and teachers commenting on this problem, and on how this fits with their understanding of the use of statistics in real situations. Please, just comment below. I will respond to any comments, and write a follow-up post on the effect of using different numbers of replicates on inferences derived from data.]
