The Bernoulli distribution is one of the most fundamental and widely used probability distributions in statistics and data science. Named after the Swiss mathematician Jacob Bernoulli, it is a discrete probability distribution that models experiments with exactly two possible outcomes. These outcomes are typically labeled as “success” and “failure,” “1” and “0,” or “yes” and “no.”
What is a Bernoulli Distribution?
The Bernoulli distribution describes the probability of success in a single trial of a binary experiment. The experiment must meet three criteria:
- There are only two outcomes.
- If the experiment is repeated, each trial is independent of the others.
- The probability of success remains constant.
For example, flipping a coin is a classic Bernoulli trial. If we define getting heads as a success, then the Bernoulli distribution can represent the probability of getting heads (success) or tails (failure). If the coin is fair, the probability of success (p) is 0.5. The probability mass function (PMF) of a Bernoulli distribution is:

P(X = x) = p^x (1 − p)^{1 − x}, for x ∈ {0, 1}

Here, X is a random variable that takes the value 1 with probability p and the value 0 with probability 1 − p.
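The PMF above can be written out directly in a few lines of Python (a minimal sketch; the helper name `bernoulli_pmf` is ours, not a library function):

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """PMF of a Bernoulli(p) variable: p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

print(bernoulli_pmf(1, 0.5))  # fair coin, probability of heads: 0.5
print(bernoulli_pmf(0, 0.7))  # probability of failure when p = 0.7: about 0.3
```

In practice you would more likely reach for `scipy.stats.bernoulli`, which exposes the same PMF along with sampling and moment methods.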
Key Properties of the Bernoulli Distribution
Understanding the properties of the Bernoulli distribution helps in applying it effectively in real-world scenarios. Some important characteristics include:
- Mean (Expected Value): The mean of a Bernoulli distribution is E[X] = p. This represents the expected outcome of the trial.
- Variance: The variance is Var(X) = p(1 − p), measuring the spread of the distribution. It is largest when p = 0.5.
- Skewness and Kurtosis: For values of p not equal to 0.5, the distribution is skewed, with skewness (1 − 2p) / √(p(1 − p)). If p = 0.5, the distribution is symmetric.
These simple properties make the Bernoulli distribution a building block for more complex statistical models and methods.
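A quick simulation, sketched below with Python's standard `random` module, shows the sample mean and variance converging to the theoretical values E[X] = p and Var(X) = p(1 − p):

```python
import random

def bernoulli_stats(p: float, n: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Estimate E[X] and Var(X) of a Bernoulli(p) variable by simulation."""
    rng = random.Random(seed)
    draws = [1 if rng.random() < p else 0 for _ in range(n)]
    mean = sum(draws) / n
    var = sum((x - mean) ** 2 for x in draws) / n
    return mean, var

mean, var = bernoulli_stats(0.3)
print(mean, var)       # close to the theoretical values:
print(0.3, 0.3 * 0.7)  # E[X] = p = 0.3, Var(X) = p(1 - p) = 0.21
```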
Applications of the Bernoulli Distribution
The Bernoulli distribution is widely applicable across various fields such as data science, machine learning, finance, and engineering. Some common applications include:
- Modeling Binary Outcomes: In medical studies, it can represent whether a patient has a disease (1) or not (0).
- A/B Testing: Online businesses often use Bernoulli trials to compare two versions of a webpage or product feature by modeling user actions as success/failure.
- Machine Learning Classification: In binary classification tasks, the target variable often follows a Bernoulli distribution.
- Quality Control: Manufacturing processes may use Bernoulli distributions to monitor defect presence or absence.
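The A/B testing case above can be sketched as follows, modeling each user's action as a Bernoulli trial. The conversion rates and user counts here are purely illustrative assumptions, not real data:

```python
import random

def simulate_conversions(p: float, n_users: int, rng: random.Random) -> int:
    """Each user's action is a Bernoulli trial: 1 = converted, 0 = did not."""
    return sum(1 for _ in range(n_users) if rng.random() < p)

rng = random.Random(42)
# Hypothetical conversion rates for page variants A and B
conv_a = simulate_conversions(0.10, 5_000, rng)
conv_b = simulate_conversions(0.12, 5_000, rng)
print(conv_a / 5_000, conv_b / 5_000)  # observed success rate per variant
```

A real A/B test would follow this with a significance test (e.g. a two-proportion z-test) rather than comparing raw rates.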
Relationship with Other Distributions
The Bernoulli distribution is closely related to several other probability distributions:
- Binomial Distribution: A binomial distribution is essentially the sum of multiple independent Bernoulli trials. If we conduct n Bernoulli trials with the same probability of success p, the total number of successes follows a binomial distribution.
- Geometric Distribution: This describes the number of Bernoulli trials needed to get the first success.
- Beta Distribution: Often used as the prior distribution for the probability of success in a Bernoulli trial in Bayesian statistics.
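The Bernoulli-to-binomial relationship is easy to verify empirically. The sketch below builds a Binomial(n, p) draw by summing n Bernoulli trials and checks that the average draw approaches the binomial mean n·p:

```python
import random

def binomial_via_bernoulli(n: int, p: float, rng: random.Random) -> int:
    """One Binomial(n, p) draw, built as the sum of n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)
draws = [binomial_via_bernoulli(10, 0.5, rng) for _ in range(50_000)]
print(sum(draws) / len(draws))  # close to n * p = 5.0
```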
Conclusion
The Bernoulli distribution may appear simple, but it serves as a critical foundation for understanding probability and statistics. From its clear-cut binary outcomes to its role in forming more advanced models, the Bernoulli distribution is indispensable in both theoretical and applied statistics. Whether you’re analyzing user behavior, testing hypotheses, or designing machine learning algorithms, grasping the concept of the Bernoulli distribution equips you with a powerful tool for binary decision-making and probabilistic modeling.