In the last posts (first, second), I outlined a number of common errors in the use and interpretation of P-values. Because of the base-rate fallacy and the multiple comparisons problem, the true false positive rate of a null-hypothesis significance test can easily be an order of magnitude higher than its significance level alpha suggests. These issues are among the primary causes of the replication crisis currently shaking psychology and medical science, where replication studies have found that the majority of significant results examined fail to replicate. Fisheries and marine policy share many of the same risk factors driving unreliable findings in those fields, but no one has yet attempted a replication study there.
In this post, I'll show more ways that P-values are commonly misapplied, including how a significant result under alpha = 0.05 can have more than a 50% chance of being wrong.
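To make the base-rate argument concrete, here is a minimal sketch of the arithmetic. The specific numbers (10% of tested hypotheses are actually true, 35% statistical power) are illustrative assumptions, not figures from the post; the point is only that with a low base rate and modest power, alpha = 0.05 can coexist with a false discovery rate above 50%.

```python
# Fraction of "significant" results that are false positives, given
# alpha, statistical power, and the prior fraction of true hypotheses.
# Parameter values below are illustrative assumptions.

def false_discovery_rate(alpha, power, prior_true):
    """Expected fraction of significant results that are false positives."""
    false_pos = alpha * (1 - prior_true)   # true nulls that pass the test
    true_pos = power * prior_true          # real effects that are detected
    return false_pos / (false_pos + true_pos)

fdr = false_discovery_rate(alpha=0.05, power=0.35, prior_true=0.10)
print(f"False discovery rate: {fdr:.1%}")  # about 56%
```

With these assumed inputs, more than half of the significant results are wrong, even though every individual test honestly used alpha = 0.05.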
Calling everything with p < 0.05 "significant" is just plain wrong
The practice of statistics in the sciences often amounts to drawing scientific conclusions from the cargo-cult application of inappropriate or outdated statistical methods, frequently to the exclusion of prior evidence or plausibility. This has serious consequences for the reproducibility and reliability of scientific results. Perhaps the number one issue is over-reliance on, and misunderstanding of, null-hypothesis significance testing, together with blind faith in the reliability of the P-values these tests provide.