A/B testing is how data-driven companies make product decisions. Large tech companies run experiments continuously, often thousands per year. This guide walks through the full statistical framework, from sample size calculation to the final decision.

The A/B Testing Process

Define the hypothesis: what change do you want to test?
Choose your primary metric (conversion rate, revenue, CTR)
Calculate required sample size before starting
Randomly split users into control (A) and treatment (B)
Run until sample size is reached — do not stop early
Analyse with a statistical test and make a decision
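
The random split in the steps above is often implemented by hashing a user identifier, so the same user always sees the same variant. The sketch below is one possible way to do it, not a prescribed method; the function name `assign_variant`, the experiment name, and the user ids are all hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'A' (control) or 'B' (treatment)."""
    # Salt the hash with the experiment name so separate experiments split users differently.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Read the first 8 hex digits as an integer in [0, 2**32) and scale to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "B" if bucket < treatment_share else "A"

# Hypothetical user ids, for illustration only.
users = [f"user_{i}" for i in range(10_000)]
groups = [assign_variant(u, "checkout_button_test") for u in users]
share_b = groups.count("B") / len(groups)
print(f"Share in treatment: {share_b:.3f}")
```

Because the assignment depends only on the user id and the experiment name, it needs no lookup table and stays stable across sessions.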

Sample Size Calculation

from scipy import stats
import numpy as np

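def required_sample_size(baseline, mde, alpha=0.05, power=0.8):
    """Users needed per group to detect an absolute lift of `mde` over `baseline`."""
    p1, p2 = baseline, baseline + mde
    z_a = stats.norm.ppf(1 - alpha/2)   # critical value for a two-sided test
    z_b = stats.norm.ppf(power)         # quantile for the desired power
    p_pool = (p1 + p2) / 2              # average rate under the null hypothesis
    n = ((z_a * np.sqrt(2*p_pool*(1-p_pool)) + z_b * np.sqrt(p1*(1-p1)+p2*(1-p2)))**2
         / (p2-p1)**2)
    return int(np.ceil(n))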

n = required_sample_size(baseline=0.10, mde=0.02)
print(f'Need {n:,} users per group ({n*2:,} total)')
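
The required sample size is very sensitive to the minimum detectable effect: halving the MDE roughly quadruples the number of users needed. The sketch below restates the same two-proportion formula using only the standard library and tabulates a few MDE values. The name `sample_size_per_group` and the chosen MDE values are assumptions for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(baseline, mde, alpha=0.05, power=0.8):
    """Two-proportion sample size per group (same formula, standard library only)."""
    p1, p2 = baseline, baseline + mde
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_pool = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_pool * (1 - p_pool))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Illustrative MDE values on a 10% baseline conversion rate.
for mde in (0.04, 0.02, 0.01, 0.005):
    size = sample_size_per_group(baseline=0.10, mde=mde)
    print(f"MDE {mde:.3f} -> {size:,} users per group")
```

This is why the MDE should be fixed before the test starts: a smaller target effect can turn a one-week test into a multi-month one.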

Simulate and Run the Test

np.random.seed(42)
n = required_sample_size(baseline=0.10, mde=0.02)  # per-group size from the calculation above
control   = np.random.binomial(1, 0.10, n)
treatment = np.random.binomial(1, 0.12, n)

print(f'Control:   {control.mean()*100:.2f}% ({control.sum()}/{n})')
print(f'Treatment: {treatment.mean()*100:.2f}% ({treatment.sum()}/{n})')

Statistical Test

# proportions_ztest is provided by statsmodels (pip install statsmodels)
from statsmodels.stats.proportion import proportions_ztest

counts = [treatment.sum(), control.sum()]
nobs   = [n, n]
z_stat, p_value = proportions_ztest(counts, nobs)

print(f'p-value: {p_value:.4f}')
if p_value < 0.05:
    lift = (treatment.mean() - control.mean()) / control.mean() * 100
    # The test is two-sided, so print the sign: a significant result can also be a drop
    print(f'Significant! Relative lift: {lift:+.1f}%')
else:
    print('Not significant — cannot conclude B is better')
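
A p-value says whether a difference is detectable, not how large it plausibly is. One common complement is a confidence interval for the difference in conversion rates. The sketch below uses a simple Wald interval with an unpooled standard error; the function name `diff_confidence_interval` and the counts are made up for illustration and are not the simulated results above.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for p_b - p_a using the unpooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical counts: 200/2000 conversions in control, 250/2000 in treatment.
low, high = diff_confidence_interval(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"95% CI for the absolute lift: [{low*100:.2f}, {high*100:.2f}] percentage points")
```

If the whole interval lies above the smallest lift worth shipping, the decision is easy; if it straddles that threshold, the result is statistically significant but practically uncertain.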

Common Mistakes

Peeking: Stopping early when you see significance inflates false positives
Multiple metrics: Testing many metrics at once raises the chance that at least one looks significant by luck. Apply a correction such as Bonferroni
Novelty effect: Run long enough for initial curiosity to wear off
Ignoring practical significance: A tiny lift may be statistically significant but not worth shipping
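
As a minimal sketch of the Bonferroni correction mentioned above: divide the significance level by the number of metrics tested and compare each p-value against that stricter threshold. The metric names and p-values below are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each metric whose p-value clears the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)
    decisions = {metric: p < threshold for metric, p in p_values.items()}
    return decisions, threshold

# Hypothetical p-values for three metrics from one experiment.
p_values = {"conversion": 0.012, "revenue_per_user": 0.030, "ctr": 0.200}
decisions, threshold = bonferroni_significant(p_values)
print(f"Adjusted threshold: {threshold:.4f}")
for metric, significant in decisions.items():
    print(f"{metric}: {'significant' if significant else 'not significant'}")
```

Bonferroni is conservative, so with many metrics it costs power. Choosing a single primary metric in advance, as the process above recommends, avoids most of the problem.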

FAQ

How long should I run an A/B test?

Until your pre-calculated sample size is reached — never stop early because results look good. At minimum, run for one full business cycle (7 days) to account for weekday/weekend effects.
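
To see why stopping early is risky, a small A/A simulation can help: both groups share the same true conversion rate, so every "significant" result is a false positive. The sketch below compares checking at ten interim looks against testing once at the end. All names and parameter values are assumptions chosen for illustration, and the exact rates depend on the seed.

```python
import random
from math import sqrt
from statistics import NormalDist

def two_sided_p(conv_a, conv_b, n):
    """Pooled two-proportion z-test p-value for two groups of equal size n."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return 1.0
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate(n_experiments=300, n_per_group=1000, looks=10, rate=0.10, alpha=0.05, seed=42):
    rng = random.Random(seed)
    step = n_per_group // looks
    peeking_hits = final_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        for look in range(1, looks + 1):
            # Same true rate in both groups, so any rejection is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            p = two_sided_p(conv_a, conv_b, look * step)
            if p < alpha:
                flagged_at_any_look = True
        peeking_hits += flagged_at_any_look
        final_hits += p < alpha  # p now holds the value from the last look only
    return peeking_hits / n_experiments, final_hits / n_experiments

peek_rate, fixed_rate = simulate()
print(f"False positive rate with peeking:      {peek_rate:.1%}")
print(f"False positive rate, single final test: {fixed_rate:.1%}")
```

The single final test should land near the nominal 5%, while the peeking rate is typically several times higher. Teams that need interim looks usually switch to a sequential testing method designed for that purpose.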

What if my test is inconclusive?

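Inconclusive is a valid result. It means the data are consistent with no effect, not that an effect is absent: at 80% power, a true effect as large as your minimum detectable effect is still missed about one time in five, and smaller effects are missed more often. You have failed to reject the null hypothesis, which is not the same as proving it. Either keep the control, collect more data, or revise your hypothesis.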
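
When a test comes back inconclusive, it helps to check how much power the experiment actually had. The sketch below approximates the power of a two-sided two-proportion z-test for a given per-group sample size, using the same normal approximation as the sample size formula. The function name `approx_power` and the example sizes are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(baseline, effect, n_per_group, alpha=0.05):
    """Approximate power to detect an absolute `effect` over `baseline`."""
    p1, p2 = baseline, baseline + effect
    p_pool = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = sqrt(2 * p_pool * (1 - p_pool) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    # Chance the observed difference clears the critical value in the direction
    # of the true effect; rejections in the wrong direction are ignored.
    return 1 - NormalDist().cdf((z_a * se_null - abs(effect)) / se_alt)

# Illustrative per-group sizes for a 10% baseline and a 2-point absolute lift.
for size in (1000, 2000, 3841):
    print(f"n = {size:>5,} per group -> power ~ {approx_power(0.10, 0.02, size):.0%}")
```

A low figure here suggests the test was too small to be informative, which points toward collecting more data before concluding there is no effect.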

A/B Testing for Data Scientists: Complete Statistical Guide (2026)
