Beyond Binary: Why Null Hypothesis Significance Testing Should No Longer Be the Default for Statistical Analysis and Reporting
By Professor Eric T. Bradlow (@ebradlow) (GBK Collective and the Wharton School)
Null hypothesis significance testing (NHST) is the default approach to statistical analysis and reporting in marketing and, more broadly, in science. Despite this default role, NHST has long been criticized by both statisticians and applied researchers, including those within marketing.
The most prominent criticisms relate to NHST’s dichotomized categorization of results as “statistically significant” versus “statistically nonsignificant.” This binary treatment of results, using p-values or otherwise, loses information, can be misleading, and prevents meta-analyses, which is what science is really all about!
In a new article published in the Journal of Marketing, my colleagues Blakeley B. McShane, John G. Lynch, Jr., Robert Meyer, and I propose a fundamental shift in statistical analysis and reporting in marketing and beyond. In fact, we propose abandoning NHST as the default approach altogether: statistical (non)significance should never be used as a basis for drawing conclusions, or as a filter for deciding which data to prioritize when making decisions.
The Shortcomings of Null Hypothesis Significance Testing (NHST)
NHST, for all its widespread use, has very real limitations. For instance, using “statistical (non)significance” as a filter can create a biased view of what truly matters. Consider a brand’s evaluation of a marketing campaign. The campaign may be deemed successful solely because it achieved statistical significance in reach across different customer segments, while the analysis fails to consider how it affected specific customer behaviors or loyalty. This oversimplification can lead to missed opportunities for deeper engagement or improvement.
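The filtering bias is easy to demonstrate with a small simulation. The sketch below (illustrative only; the true effect, sample size, and number of studies are hypothetical values I chose, not figures from the article) simulates many studies of the same modest effect and keeps only those that clear the p &lt; 0.05 bar. The average "published" estimate ends up far larger than the truth, because the significance filter selectively retains the studies that happened to overestimate the effect.

```python
import math
import random

random.seed(0)
TRUE_EFFECT = 0.2   # hypothetical true lift, in SD units
SD = 1.0            # outcome standard deviation
N_PER_GROUP = 50    # hypothetical per-group sample size
se = SD * math.sqrt(2 / N_PER_GROUP)  # SE of the two-sample difference

published = []
for _ in range(10_000):
    est = random.gauss(TRUE_EFFECT, se)  # one study's effect estimate
    if abs(est) / se > 1.96:             # passes the "significance" filter
        published.append(est)

mean_published = sum(published) / len(published)
print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"mean 'published' effect: {mean_published:.2f}")
```

With these settings the surviving studies overstate the true effect by roughly a factor of two, which is exactly the biased view that filtering on significance creates.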
Lack of Real-World Relevance
Furthermore, NHST's foundational assumption of an exactly zero effect is essentially never true in real-world situations. As a consequence, with large enough sample sizes, small p-values and "statistical significance" are almost a certainty, making rejection of a null hypothesis that posits zero effect uninformative. For example, when analyzing customer purchasing patterns, NHST might highlight only the most dramatic changes, missing subtler but important shifts that are crucial for understanding evolving trends.
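The large-sample point can be made concrete with a few lines of arithmetic. This sketch (my own illustration, not from the article) computes the two-sided p-value for a two-sample comparison of means where the true difference is a practically negligible 0.02 standard deviations: at moderate sample sizes it is "nonsignificant," yet at large sample sizes the same trivial effect sails past the 0.05 threshold.

```python
import math

def z_stat(effect, sd, n_per_group):
    # z statistic for a two-sample comparison of group means
    return effect / (sd * math.sqrt(2 / n_per_group))

def two_sided_p(z):
    # two-sided p-value from the standard normal tail
    return math.erfc(abs(z) / math.sqrt(2))

# a practically negligible true effect: 0.02 SD units
for n in (1_000, 100_000, 1_000_000):
    p = two_sided_p(z_stat(0.02, 1.0, n))
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}  p = {p:.2e}  ({verdict})")
```

Nothing about the effect changed between the rows; only the sample size did. That is why "rejecting zero" carries so little information on its own.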
“NHST’s binary treatment of results, using p-values or otherwise, loses information, can be misleading, and prevents meta-analyses which is what science is really all about!” - GBK Co-Founder and Wharton Professor Eric Bradlow
Overreliance on p-values
Perhaps the most widespread abuse of statistics is to check where some statistical measure such as a p-value stands relative to a threshold like 0.05, declare “statistical (non)significance” on that basis, and then draw general and certain conclusions from a single study. Single studies are never definitive and thus can never demonstrate an effect or no effect. The aim of studies should be to report results in an unfiltered manner so that they can later be used to draw more general conclusions based on cumulative evidence from multiple studies.
In the realm of customer behavior analysis, this can be particularly misleading. For instance, a study might show a significant change in consumer behavior with a low p-value, but this doesn't always reflect the true complexity and variability of consumer actions. Real-world behaviors are influenced by a myriad of factors, and an overreliance on p-values can lead to oversimplified and potentially incorrect conclusions about customer trends.
Our Recommendations for a New Approach
In our article, we propose a major transition in statistical analysis and reporting. Beyond abandoning NHST and its binary lens of "statistical significance," we advocate recognizing the value of all research findings. This includes publishing studies irrespective of their p-values, promoting a more comprehensive view of research outcomes.
Our recommendations extend to a more nuanced reporting method that emphasizes point and interval estimates over binary categorizations, offering deeper insight into the data. Additionally, we suggest a holistic analysis that incorporates factors such as study design and prior evidence, not just p-values, as well as synthesizing conclusions from multiple studies to unlock better insights.
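To show what reporting point and interval estimates and synthesizing across studies can look like in practice, here is a minimal sketch of standard fixed-effect (inverse-variance) pooling. The three "campaign lift" studies and their standard errors are hypothetical numbers I invented for illustration; they are not data from the article.

```python
import math

def pooled_estimate(estimates, std_errors):
    # fixed-effect (inverse-variance) pooling of effect estimates
    weights = [1 / se**2 for se in std_errors]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    return est, se

# three hypothetical studies of the same campaign lift: (estimate, SE)
estimates = [0.8, 1.4, 0.3]
std_errors = [0.5, 0.6, 0.7]

est, se = pooled_estimate(estimates, std_errors)
lo, hi = est - 1.96 * se, est + 1.96 * se
print(f"pooled lift = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reported this way, each study contributes an estimate and its uncertainty rather than a significant/nonsignificant verdict, and the pooled interval conveys what the cumulative evidence supports even though the individual studies, judged one at a time against a 0.05 threshold, would point in conflicting directions.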
For marketers and brands, this shift is crucial. Embracing a broader approach to statistical analysis and constantly seeking out better data is key to gaining a richer understanding of market dynamics and customer behavior, leading to more informed decision-making.
For a deeper dive into our findings and recommendations, I encourage you to read the full article in the Journal of Marketing: “‘Statistical Significance’ and Statistical Reporting: Moving Beyond Binary.”