Christian Hennig writes (see here for context):
Statistics is hard. Well-trained, experienced, and knowledgeable statisticians disagree about standard methods. . . . The 2021 [American Statistical Association] task force statement states: “Indeed, P-values and significance tests are among the most studied and best understood statistical procedures in the statistics literature.” I do not disagree with this. Probability models assign probabilities to sets, and considering the probability of a well chosen data-dependent set is a very elementary way to assess the compatibility of a model with the data. . . . Still, considering that P-values are “among the best understood”, it is remarkable how much controversy, lack of understanding, and misunderstanding exist regarding them. Indeed, there are issues with tests and P-values about which there is disagreement even among the most proficient experts, such as when and how exactly corrections for multiple testing should be used, or under what exact conditions a model can be taken as “valid”. Such decisions depend on the details of the individual situation, and there is no way around personal judgement. I do not think that this is a specific defect of P-values and tests. The task of quantifying evidence and reasoning under uncertainty is so hard that problems of these or other kinds arise with all alternative approaches as well.
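This is well put. Hennig’s description of a P-value as the probability of a well chosen data-dependent set can be made concrete in a few lines of code. Here is a minimal sketch (my own illustration, not from Hennig’s text; the null model, the test statistic, and all the numbers are assumptions): simulate replicated datasets under a null model and count how often the statistic is at least as extreme as the one observed. A stand-in for the multiple-testing corrections he mentions is tacked on at the end.

```python
# Minimal sketch of a P-value as the probability of a data-dependent set:
# simulate the test statistic under a null model and ask how often it is
# at least as extreme as the observed value. The null model, statistic,
# and all numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def p_value(observed, replications, statistic):
    """One-sided simulation P-value: Pr(T(y_rep) >= T(y)) under the null."""
    t_obs = statistic(observed)
    t_rep = np.array([statistic(rep) for rep in replications])
    return np.mean(t_rep >= t_obs)

# Hypothetical question: is the mean of y compatible with a Normal(0, 1) model?
y = rng.normal(loc=0.3, scale=1.0, size=50)     # "observed" data
replications = rng.normal(size=(10_000, 50))    # datasets drawn under the null
print(f"simulation P-value: {p_value(y, replications, lambda d: d.mean()):.3f}")

# The multiple-testing issue Hennig mentions: with m tests, the simple
# (and often conservative) Bonferroni correction compares each P-value
# to alpha/m rather than to alpha.
m, alpha = 20, 0.05
p_values = rng.uniform(size=m)                  # stand-ins for m test results
print(f"Bonferroni rejections: {(p_values < alpha / m).sum()}")
```

Note that everything making this look mechanical (the choice of null model, of statistic, and of what counts as “extreme”) is a personal judgement call, which is exactly Hennig’s point.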
Hennig continues:
A much bigger problem is the tension between the difficulty of statistics and the demand for it to be simple and readily available. Data analysis is essential for science, industry, and society as a whole. Not all data analysis can be done by highly qualified statisticians, and society cannot wait to analyse data until statisticians achieve perfect understanding and agreement. On top of this there are incentives for producing headline-grabbing results, and society tends to attribute authority to those who convey certainty rather than to those who emphasise uncertainty. . . . Another important tension exists between the requirement for individual judgement and decision-making depending on the specifics of a situation, and the demand for automated mechanical procedures that can be easily taught, easily transferred from one situation to another, justified by appealing to simple general rules . . . P-values are so elementary and apparently simple a tool that they are particularly suitable for mechanical use and misuse. To have the data’s verdict about a scientific hypothesis summarised in a single number is a very tempting prospect, even more so if it comes without the requirement to specify a prior first, which puts many practitioners off a Bayesian approach. As a bonus, there are apparently well-established cutoff values, so that the number can even be reduced to a binary “accept or reject” statement. Of course, all this belies the difficulty of statistics and a proper account of the specifics of the situation. As stated in the 2016 ASA Statement, the P-value is an expression of the compatibility of the data with the null model, in a certain respect that is formalised by the test statistic. As such, I have no issues with tests and P-values as long as they are not interpreted as something that they are not. . . . It seems more difficult to acknowledge how models can help us to handle reality without being true, and how finding an incompatibility between data and model can be the starting point of an investigation into how exactly reality is different and what that means. . . .
As statisticians we face the dilemma that we want statistics to be popular, authoritative, and in widespread use, but we also want it to be applied carefully and correctly, avoiding oversimplification and misinterpretation. That these aims are in conflict is, in my view, a major reason for the trouble with P-values, and if P-values were to be replaced by other approaches, I am convinced that we would see very similar trouble with them, and to some extent we already do. Ultimately I believe that as statisticians we should stand by the complexity and richness of our discipline, including the plurality of approaches. We should resist the temptation to give those who want a simple device to generate strong claims what they want, yet we also need to teach methods that can be widely applied, with a proper appreciation of pitfalls and limitations, because otherwise much data will be analysed with even less insight. Making reference to the second quote above, we need exactly to “contradict ourselves” in the sense of conveying what can be done, together with what the problems of any such approach are.
That’s what we try to do in Regression and Other Stories!
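P.S. To put a number on Hennig’s point about cutoff values: reducing a P-value to a binary “accept or reject” statement throws away information, because nearly identical test statistics can land on opposite sides of the conventional 0.05 line. A quick sketch (the z values below are invented for illustration; nothing here comes from Hennig or the ASA statements):

```python
# Two-sided P-values for standard-normal test statistics sitting just on
# either side of the conventional 0.05 cutoff (z values invented for
# illustration).
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided P-value for a standard-normal test statistic z."""
    return erfc(abs(z) / sqrt(2))

for z in (1.95, 1.97):
    p = two_sided_p(z)
    verdict = "reject" if p < 0.05 else "do not reject"
    print(f"z = {z:.2f}  ->  p = {p:.4f}  ({verdict})")
```

The two statistics express essentially the same degree of compatibility between data and null model, yet the binary summary calls them opposites.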