Statistical Power and Sample Size Calculator

Plan your studies effectively. Calculate the statistical power, required sample size, or detectable effect size for common tests like t-tests and proportion tests.

Statistical Power Calculator

Calculate statistical power for hypothesis tests

About Statistical Power

Statistical power is the probability of correctly rejecting the null hypothesis when it is false (detecting a true effect). It equals 1 - β, where β is the Type II error rate.

A power of 80% or higher is typically recommended. Power depends on effect size, sample size, significance level (α), and whether the test is one or two-tailed.

How the Statistical Power Calculator Works

Select what you want to calculate: power, sample size, effect size, or significance level. Choose your test type (one-sample t-test, two-sample t-test, proportion test, etc.) and enter the known parameters.

For sample size calculations, input desired power (typically 0.80), significance level (usually 0.05), and expected effect size. For power calculations, enter your sample size, effect size, and alpha level. The calculator uses non-central t-distributions or normal approximations.
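As a sketch of the normal-approximation route (not the calculator's exact implementation, which may use the noncentral t and return slightly larger values for small samples), the per-group sample size for a two-sided two-sample t-test follows directly from the critical values:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample t-test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_power) / d)^2.
    A noncentral-t solution gives slightly larger n for small samples.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # two-tailed critical value
    z_beta = norm.ppf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # medium effect, 80% power -> 63 per group
```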

Results include the calculated value plus interpretation. Power curves show how power changes with sample size or effect size. This helps you understand trade-offs in study design and make informed decisions about resource allocation.
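A power curve like the one described can be generated by sweeping sample size. The function below uses the normal approximation (it slightly overstates power at small n, and ignores the negligible lower rejection tail):

```python
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample t-test
    via the normal approximation."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality under H1
    return norm.cdf(ncp - z_crit)       # upper-tail rejection only

# sweep n to trace the power curve for a medium effect (d = 0.5)
for n in (20, 40, 64, 100):
    print(n, round(power_two_sample(0.5, n), 3))
```

With d = 0.5, power crosses the conventional 0.80 threshold at roughly 64 per group, matching the sample-size result above.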

When You'd Actually Use This

Planning clinical trials

Determine how many patients you need to detect a meaningful treatment effect. Ensure your trial has adequate power to find differences if they exist.

Designing A/B tests

Calculate sample size needed to detect a 5% conversion lift. Balance statistical rigor with practical constraints like traffic volume and test duration.
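The same normal approximation applies to two-proportion A/B tests. The baseline conversion rate and lift below are hypothetical, chosen to show why small relative lifts demand large samples:

```python
from math import ceil
from scipy.stats import norm

def ab_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size to detect a conversion change from p1 to p2
    (two-sided z-test for proportions, normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)  # sum of Bernoulli variances
    return ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)

# hypothetical: 10% baseline, 5% relative lift (10% -> 10.5%)
print(ab_sample_size(0.10, 0.105))  # tens of thousands per arm
```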

Grant proposal preparation

Justify your requested sample size to reviewers. Show that your study is adequately powered to detect hypothesized effects.

Evaluating completed studies

Assess whether a non-significant study was adequately powered to detect a meaningful effect, using the hypothesized effect size rather than the observed one. Low power suggests the study couldn't detect effects, not that effects don't exist.

Psychology experiment design

Plan participant numbers for detecting expected effect sizes. Account for anticipated dropout rates by inflating initial sample size.

Quality improvement projects

Determine how many measurements you need to detect process improvements. Avoid wasting resources on oversized samples or risking missed improvements with undersized ones.

What to Know Before Using

Power is the probability of detecting an effect. Power = 1 - β, where β is the Type II error rate. Conventionally, 0.80 power means an 80% chance of finding an effect if it exists.

Effect size must be specified. Use Cohen's d for means (0.2 small, 0.5 medium, 0.8 large) or proportions for rates. Base estimates on prior research or pilot data.
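If you have pilot data, Cohen's d is simply the mean difference divided by the pooled standard deviation. The measurements below are made up for illustration (the two groups barely overlap, hence the large d):

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference over the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# hypothetical pilot measurements
control = [4.1, 3.8, 4.5, 4.0, 3.9]
treated = [4.6, 4.9, 4.4, 5.1, 4.7]
print(round(cohens_d(treated, control), 2))
```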

Alpha level controls false positives. Standard alpha is 0.05. Lower alpha reduces false positives but requires larger samples. Consider your tolerance for Type I vs Type II errors.

One-tailed tests have more power. If you're certain about the effect's direction, one-tailed tests need smaller samples. But you can't detect effects in the opposite direction.
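The one- vs two-tailed difference is easy to see numerically. The sketch below uses the normal approximation for a two-sample t-test, with illustrative parameters (d = 0.5, α = 0.05, 80% power):

```python
from math import ceil
from scipy.stats import norm

def n_needed(d, z_crit, power=0.80):
    """Per-group n for a two-sample t-test given a critical z value."""
    return ceil(2 * ((z_crit + norm.ppf(power)) / d) ** 2)

two_tailed = n_needed(0.5, norm.ppf(1 - 0.05 / 2))  # alpha split over two tails
one_tailed = n_needed(0.5, norm.ppf(1 - 0.05))      # all alpha in one tail
print(two_tailed, one_tailed)  # the one-tailed test needs fewer per group
```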

Pro tip: Always calculate power before collecting data, not after. Post-hoc power calculations using observed effect sizes are controversial and often misleading. Use them only for planning future studies.

Common Questions

What's a good power level?

0.80 (80%) is standard in most fields. Some areas like clinical trials use 0.90. Higher power requires larger samples but reduces risk of missing real effects.

How do I estimate effect size?

Use previous studies, pilot data, or subject-matter expertise. Consider the minimum effect that would be practically meaningful, not just statistically detectable.

Why is my required sample size so large?

Small effects need large samples to detect. High power requirements also increase sample size. Consider whether you're targeting an unrealistically small effect.

What's the difference between one and two-sample?

One-sample compares a mean to a known value. Two-sample compares means between two groups. Two-sample tests typically need more total participants.

Can I adjust for dropout?

Yes, inflate your calculated sample size. If you expect 20% dropout, divide by 0.80. For 100 needed participants, recruit 100/0.80 = 125.
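That dropout adjustment is a one-liner:

```python
from math import ceil

def inflate_for_dropout(n_required, dropout_rate):
    """Recruit enough so that n_required remain after expected dropout."""
    return ceil(n_required / (1 - dropout_rate))

print(inflate_for_dropout(100, 0.20))  # -> 125
```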

What if I can't get enough participants?

Accept lower power, increase effect size target (study only large effects), or use more sensitive measures. Consider collaborative multi-site studies.

Does power matter for significant results?

A significant result means the study had enough data to detect the observed effect. But low-powered studies that reach significance tend to overestimate the true effect size.