
P-Value Calculator

Calculate p-value for one-sample and two-sample t-tests and z-tests. Determine statistical significance of research results.

About the P-Value Calculator

A p-value calculator computes the probability of obtaining test results at least as extreme as those observed, assuming the null hypothesis is true. The p-value is among the most widely reported and most frequently misunderstood statistics in scientific research. A small p-value (typically p < 0.05) indicates that your observed results would be unlikely to occur by chance alone if the null hypothesis were true, which provides evidence against the null hypothesis. Our free p-value calculator handles the most common statistical tests: z-test (one-sample and two-sample, one-tailed and two-tailed), t-test (one-sample, independent samples, paired samples), chi-square test, and F-test. Enter your test statistic and degrees of freedom (or let the calculator compute the test statistic from your raw data), and receive the exact p-value with a clear explanation of what it means and how to interpret it in context.
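As an illustration of what "computing the test statistic from raw data" can involve, here is a minimal sketch for the one-sample t-test. The function name `one_sample_t` and the sample values are hypothetical, chosen only for this example; this is not necessarily how the calculator itself is implemented.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of the sample mean against mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("sample has zero variance; t is undefined")
    standard_error = sd / math.sqrt(n)
    t = (mean - mu0) / standard_error
    return t, n - 1


# Hypothetical measurements, tested against a null-hypothesis mean of 5.0
measurements = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]
t_stat, df = one_sample_t(measurements, mu0=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t statistic and degrees of freedom are the two inputs a t-test p-value lookup needs.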

Formula

z-test (two-tailed): p = 2 x P(Z >= |z|)
t-test (two-tailed): p = 2 x P(t_(df) >= |t|)
chi-square test: p = P(chi^2_(df) >= chi^2_obs)

How It Works

P-value calculation for a z-test: p = 2 x P(Z >= |z|) for a two-tailed test. For z = 2.10 (two-tailed): p = 2 x P(Z >= 2.10) = 2 x 0.0179 = 0.0357 (using standard normal tables). This means that if the null hypothesis were true, there would be a 3.57% probability of observing a test statistic at least this extreme. For a t-test with t = 2.10 and 20 degrees of freedom: p is approximately 0.049 (two-tailed), which is higher than the z-test p-value because the t-distribution has heavier tails than the standard normal distribution at finite degrees of freedom. For a chi-square test with chi-square = 7.5 and 3 df: p is approximately 0.058 (upper tail of the chi-square distribution with 3 df).
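The three worked examples can be reproduced with a short sketch that uses only the Python standard library. The function names are hypothetical, and the numerical methods (Simpson's rule for the t density, a power series for the incomplete gamma function) are choices made for this illustration, not a description of the calculator's internals. The series uses a fixed number of terms, so it is only suitable for moderate chi-square values.

```python
import math


def z_two_tailed_p(z):
    """Two-tailed p-value for a z statistic: 2 x P(Z >= |z|)."""
    upper_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * upper_tail


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for a t statistic, integrating the t density
    from 0 to |t| with Simpson's rule (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def chi2_upper_p(x, df, terms=200):
    """Upper-tail p-value P(chi^2_df >= x), using the series expansion of
    the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, half = df / 2, x / 2
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= half / (a + n)
        total += term
    lower = total * math.exp(a * math.log(half) - half - math.lgamma(a))
    return max(0.0, 1.0 - lower)


print(round(z_two_tailed_p(2.10), 4))      # about 0.0357
print(round(t_two_tailed_p(2.10, 20), 3))  # about 0.049
print(round(chi2_upper_p(7.5, 3), 3))      # about 0.058
```

In practice a statistics library would normally be used instead of hand-written numerical routines; the sketch only shows that each p-value is a tail area under the relevant distribution.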

Tips & Best Practices

  • The most critical misconception: p = 0.03 does NOT mean "there is a 3% probability that the null hypothesis is true" — it means "if the null were true, there is a 3% chance of results this extreme." These are fundamentally different statements.
  • The 0.05 threshold is arbitrary: Ronald Fisher suggested 0.05 as a convenient threshold in the 1920s. It has no special mathematical significance. Some fields use stricter thresholds, such as 0.01 (genetics) or 0.001, and particle physics requires the "five sigma" standard for discovery, which corresponds to a one-sided p of about 3 x 10^-7.
  • P-value inflation with multiple comparisons: testing 20 true null hypotheses at alpha = 0.05 will produce 1 false positive on average, and if the tests are independent the chance of at least one false positive is about 64%. Apply Bonferroni correction (divide alpha by the number of tests) or FDR correction for multiple comparisons.
  • Statistical significance versus practical significance: a study with n = 10,000 can find a statistically significant (p < 0.05) effect that is practically meaningless — a drug reducing blood pressure by 0.5 mmHg is statistically significant with a huge sample but clinically irrelevant. Always report effect sizes alongside p-values.
  • Publication bias: the scientific literature is biased toward statistically significant (p < 0.05) results because journals preferentially publish them. This creates the "file drawer problem" where many null results remain unpublished, inflating apparent effect sizes in meta-analyses.
  • Replication crisis: many published findings with p < 0.05 have failed to replicate in subsequent studies, partly due to p-hacking (selective analysis), underpowered studies, and overfitting. Effect size and confidence intervals provide more reliable evidence than p-values alone.
  • One-tailed versus two-tailed: one-tailed tests have more statistical power for detecting effects in the specified direction but are only appropriate when you can justify a directional prediction before seeing the data. Two-tailed tests are the default in most research contexts.
  • Exact p-values: modern statistical practice recommends reporting exact p-values (e.g., p = 0.031) rather than binary thresholds (p < 0.05 or p > 0.05), because exact values convey the strength of evidence rather than just pass/fail against an arbitrary threshold.
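The multiple-comparisons tip above can be made concrete with a small sketch. It assumes independent tests in which every null hypothesis is true; the function names and the example p-values are hypothetical and serve only as illustration.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / m


def bonferroni_reject(p_values, alpha=0.05):
    """Flag which p-values stay significant after Bonferroni correction."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


m = 20
print(f"Expected false positives: {m * 0.05:.1f}")
print(f"Chance of at least one: {family_wise_error_rate(0.05, m):.3f}")
print(f"Bonferroni threshold: {bonferroni_alpha(0.05, m):.4f}")

# Hypothetical p-values from three tests
print(bonferroni_reject([0.001, 0.02, 0.04]))
```

Bonferroni is the simplest correction and is conservative; FDR procedures trade some of that strictness for more power when many hypotheses are tested.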

Who Uses This Calculator

Statistics students learn p-value interpretation and computation across all major statistical tests as a core curriculum requirement. Researchers in medicine, psychology, biology, and social science compute p-values to assess the statistical significance of experimental and observational study results. Medical journal editors and peer reviewers use p-value interpretation guidelines to evaluate manuscript statistical analyses. Clinical trialists compute p-values for primary and secondary endpoints to submit efficacy data to regulatory agencies. Data scientists perform hypothesis tests on A/B experiment results to determine whether observed differences exceed chance variation. Quality engineers test whether process changes produce statistically significant improvements in defect rates or output characteristics. Students preparing for statistics exams use the calculator to verify their hand calculations and check their interpretation of test statistics.

Calculations run in your browser · No data stored

Frequently Asked Questions

What p-value is statistically significant?

Typically p < 0.05 is considered statistically significant. For stronger evidence, researchers use p < 0.01 or p < 0.001.