Hypothesis Testing

Hypothesis testing is a statistical method for evaluating an assumption about a population using data from a sample. It helps determine whether an observed pattern can be reasonably explained by chance or whether it provides evidence for an alternative explanation.

How it works

A researcher formulates two mutually exclusive statements:
  • Null hypothesis (H0): the default claim, assumed true unless the data give sufficient evidence against it (e.g., a population mean equals a specific value).
  • Alternative hypothesis (Ha): what the researcher suspects instead (e.g., the mean differs from that value).

Using a random sample from the population, the analyst applies an appropriate statistical test (t-test, z-test, chi-square test, etc.) to assess how consistent the observed data are with H0. The result is usually summarized by two numbers: a test statistic, which measures how far the data fall from what H0 predicts, and a p-value, which is the probability of an outcome at least as extreme as the one observed if H0 were true.
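As a minimal sketch of how a test statistic and a p-value are computed, the Python snippet below runs a two-sided one-sample z-test using only the standard library. It assumes the population standard deviation is known. The function name z_test and the sample values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming a known population sigma."""
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0, in standard errors.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # p-value: probability under H0 of a |Z| at least this large.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical measurements; H0 says the population mean is 10.
sample = [10.8, 11.4, 9.9, 12.1, 10.6, 11.7, 10.2, 11.3]
z, p = z_test(sample, mu0=10.0, sigma=1.0)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

When the population standard deviation is unknown and must be estimated from the sample, a t-test would usually be the more appropriate choice.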

Based on a preselected significance level α (commonly 0.05), the analyst either:
  • Rejects H0 when the p-value is at most α (concluding the data provide sufficient evidence for Ha), or
  • Fails to reject H0 when the p-value exceeds α (concluding the data do not provide sufficient evidence against H0).

Note: Standard phrasing is “fail to reject H0” rather than “accept H0,” because failing to reject does not prove the null hypothesis true.
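
The decision rule can be sketched in a few lines of Python. The function name decide is hypothetical, and the snippet assumes the common convention of rejecting when the p-value is less than or equal to α.

```python
def decide(p_value, alpha=0.05):
    """Return the conventional wording for a test decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Convention assumed here: reject when the p-value is at most alpha.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.003))  # reject H0
print(decide(0.40))   # fail to reject H0
```

The function deliberately has no "accept H0" outcome, mirroring the standard phrasing.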

4-step process

  1. State the hypotheses: define H0 and Ha clearly.
  2. Plan the analysis: choose the test, significance level (α), and sampling method.
  3. Analyze the data: compute the test statistic and p-value.
  4. Interpret the result: reject or fail to reject H0 and report the conclusion in context.

Example: testing a coin

Question: Is a penny fair (heads probability = 0.5)?

  • H0: P(heads) = 0.5
  • Ha: P(heads) ≠ 0.5

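If a sample of 100 flips yields 40 heads:
  • Calculate the probability of observing 40 or fewer heads, plus the symmetric upper tail of 60 or more heads, under H0. This two-sided probability is the p-value.
  • If the p-value is at most α, reject H0 and conclude the coin likely isn’t fair. Here the exact p-value is about 0.057, so at α = 0.05 the test narrowly fails to reject H0, while at α = 0.10 it would reject.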

If the sample shows 48 heads and 52 tails, the p-value is about 0.76, far above any commonly used α, so we fail to reject H0: a split this close to 50/50 is plausibly due to chance.
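
The coin example can be worked through with an exact binomial calculation. The following is a minimal Python sketch using only the standard library. The helper names binom_pmf and two_sided_p_value are hypothetical, and the two-sided p-value is defined here as the total probability of every head count at least as far from 50 as the observed one.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, flips):
    """Exact two-sided p-value for H0: P(heads) = 0.5.

    Sums the probability of every head count at least as far from
    flips / 2 as the observed count (both tails, by symmetry).
    """
    distance = abs(heads - flips / 2)
    total = sum(
        binom_pmf(k, flips)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return min(1.0, total)  # guard against floating-point overshoot


for heads in (40, 48):
    p = two_sided_p_value(heads, 100)
    print(f"{heads} heads in 100 flips: p-value = {p:.4f}")
```

An exact calculation is used because 100 flips is small enough to enumerate. A normal approximation would give similar but not identical values, which can matter when the p-value sits close to α.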

Simple explanation

Hypothesis testing is a structured way to compare explanations. You propose a default explanation (H0) and an alternative, collect data, and use statistics to judge whether the data would be surprising enough under H0 to favor the alternative.

Brief history

Early forms of hypothesis testing date back centuries; one early example is John Arbuthnot’s 1710 analysis of birth records, which used probability arguments to assess whether observed patterns could be due to chance.

Benefits

  • Provides a formal, repeatable framework for evaluating claims using data.
  • Reduces reliance on intuition or bias by grounding decisions in statistical evidence.
  • Helps quantify uncertainty and supports decision-making in science, business, and policy.

Limitations and common pitfalls

  • Results depend on data quality, sample size, and the appropriateness of the chosen test.
  • Misinterpreting p-values (for example, reading a p-value as the probability that H0 is true) and relying too heavily on arbitrary significance thresholds can lead to misleading conclusions.
  • Hypothesis testing can produce errors:
      • Type I error: rejecting a true null hypothesis (false positive).
      • Type II error: failing to reject a false null hypothesis (false negative).
  • Tests don’t prove hypotheses true; they only evaluate consistency between the data and H0.
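
The two error types can be made concrete for a 100-flip coin test. The Python sketch below computes both error rates exactly, under stated assumptions: an exact two-sided binomial test at α = 0.05, and a hypothetical biased coin with P(heads) = 0.6 as the case where H0 is false. All function names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_region(n, alpha):
    """Head counts k for which the exact two-sided test of p = 0.5 rejects."""
    region = []
    for k in range(n + 1):
        distance = abs(k - n / 2)
        p_value = sum(
            binom_pmf(j, n, 0.5)
            for j in range(n + 1)
            if abs(j - n / 2) >= distance
        )
        if p_value <= alpha:
            region.append(k)
    return region


def rejection_probability(n, alpha, true_p):
    """Probability that the test rejects H0 when P(heads) = true_p."""
    return sum(binom_pmf(k, n, true_p) for k in rejection_region(n, alpha))


n, alpha = 100, 0.05
type_1 = rejection_probability(n, alpha, true_p=0.5)      # H0 is true
type_2 = 1 - rejection_probability(n, alpha, true_p=0.6)  # H0 is false
print(f"Type I error rate:  {type_1:.3f}")
print(f"Type II error rate: {type_2:.3f} (if the true P(heads) is 0.6)")
```

Because head counts are discrete, the actual Type I error rate comes out at or below α rather than exactly equal to it. The Type II error rate depends on how far the true probability is from 0.5 and on the sample size.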

Conclusion

Hypothesis testing is a foundational statistical tool for assessing whether observed data support or contradict a specific assumption about a population. By following a clear four-step process and understanding the limitations and possible errors, researchers can draw informed, data-driven conclusions.