How many customers will visit a website in the next hour? How many emails will arrive in your inbox today? How many mutations will occur in a DNA sequence? How many server failures will happen this month?

These questions all involve counting random events that occur over time or space. The Poisson distribution, named after French mathematician Siméon Denis Poisson (1781–1840), provides the mathematical framework for modeling such counts.

The Poisson distribution captures a specific type of randomness: events that occur independently, at a constant average rate, with no two events occurring at exactly the same instant. These conditions describe an astonishing variety of real-world phenomena, from radioactive decay to website traffic to typos in manuscripts.

In machine learning, the Poisson appears in diverse contexts: modeling count data, text analysis (word frequencies), rate prediction, anomaly detection, and as the foundation for Poisson regression. Understanding this distribution deepens your ability to model discrete, non-negative quantities that arise naturally in many domains.
By the end of this page, you will master the Poisson distribution: its derivation from the Binomial limit, mathematical properties, relationship to the Exponential distribution (waiting times), parameter estimation, and its applications in machine learning and real-world modeling.
A discrete random variable X follows a Poisson distribution with rate parameter λ > 0, written X ~ Poisson(λ), if its probability mass function (PMF) is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, 3, \ldots$$

where:
- λ (lambda) is the rate parameter — the expected number of events in the given interval
- k is the number of events (non-negative integer)
- e is Euler's number (≈ 2.71828...)
- k! is the factorial of k

The support of the Poisson is all non-negative integers: {0, 1, 2, 3, ...}. Unlike the Binomial, there is no upper bound on k.
The Poisson PMF has an elegant structure: λᵏ/k! comes from the Taylor series of e^λ, while e^{-λ} normalizes the distribution. Together, Σₖ (λᵏ/k!) · e^{-λ} = e^{-λ} · e^λ = 1, confirming it's a valid probability distribution.
| Property | Formula | Interpretation |
|---|---|---|
| Support | {0, 1, 2, ...} | All non-negative integers (no upper bound) |
| Parameter | λ > 0 | Rate = expected count in the interval |
| Mean | E[X] = λ | Average number of events |
| Variance | Var(X) = λ | Variance equals mean (key diagnostic) |
| Skewness | 1/√λ | Always right-skewed; approaches 0 as λ → ∞ |
| Mode | ⌊λ⌋ (both λ and λ−1 when λ is an integer) | Most likely count(s) |
| MGF | exp(λ(e^t - 1)) | Moment generating function |
| Entropy | λ(1 − ln λ) + e^{-λ} Σ_{k=0}^∞ λᵏ ln(k!)/k! | No simple closed form |
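To make the table concrete, here is a quick verification sketch using `scipy.stats.poisson`; the rate λ = 4.2 is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.stats import poisson

lam = 4.2  # arbitrary illustrative rate
dist = poisson(lam)

# Truncated support; P(X > 100) is negligible for λ = 4.2
k = np.arange(0, 101)
pmf = dist.pmf(k)

print(f"Sum of PMF over k = 0..100: {pmf.sum():.10f}")  # ≈ 1 (normalization)
print(f"Mean:     {dist.mean():.4f}  (theory: λ = {lam})")
print(f"Variance: {dist.var():.4f}  (theory: λ = {lam})")
print(f"Mode:     {k[np.argmax(pmf)]}  (theory: ⌊λ⌋ = {int(np.floor(lam))})")
print(f"Skewness: {float(dist.stats(moments='s')):.4f}  "
      f"(theory: 1/√λ = {1 / np.sqrt(lam):.4f})")
```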
The Poisson distribution arises naturally as the limit of the Binomial distribution when events are rare. This deep connection is known as the Poisson Limit Theorem (or Law of Rare Events).

### Setting the Stage

Consider a Binomial(n, p) distribution where:
- n is large (many opportunities for events)
- p is small (each event is individually rare)
- np = λ remains constant (the expected count is fixed)

As n → ∞ and p → 0 with np → λ, the Binomial approaches the Poisson:

$$\lim_{n \to \infty} \binom{n}{k} p^k (1-p)^{n-k} = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{where } p = \lambda/n$$
The Poisson approximation to the Binomial is accurate when n ≥ 20 and p ≤ 0.05, or when n ≥ 100 and np ≤ 10. The approximation improves as n increases and p decreases with np held constant.
```python
import numpy as np
from scipy.stats import binom, poisson

def demonstrate_poisson_limit():
    """
    Demonstrate how Binomial converges to Poisson as n→∞, p→0, np=λ.
    """
    lambda_fixed = 5  # Fixed expected value

    # Different values of n (with p = λ/n)
    n_values = [10, 20, 50, 100, 500, 1000]

    # Range of k values to compare
    k_max = 15
    k_values = np.arange(0, k_max + 1)

    # True Poisson probabilities
    poisson_probs = poisson.pmf(k_values, lambda_fixed)

    print(f"Poisson Limit Theorem Demonstration (λ = {lambda_fixed})")
    print("=" * 60)

    for n in n_values:
        p = lambda_fixed / n

        # Binomial probabilities
        binom_probs = binom.pmf(k_values, n, p)

        # Maximum absolute difference
        max_diff = np.max(np.abs(binom_probs - poisson_probs))

        # Total variation distance
        tv_distance = 0.5 * np.sum(np.abs(binom_probs - poisson_probs))

        print(f"n = {n:4d}, p = {p:.4f}: Max diff = {max_diff:.6f}, "
              f"TV distance = {tv_distance:.6f}")

    # Theoretical bound (Le Cam): TV ≤ np² = λ²/n when p = λ/n
    print("\nTheoretical bound: Total Variation ≤ λ²/n")
    for n in n_values:
        bound = lambda_fixed**2 / n
        print(f"n = {n:4d}: Bound = {bound:.6f}")

demonstrate_poisson_limit()

# Practical example: Rare event modeling
print("\n" + "=" * 60)
print("Example: Defective Items in Manufacturing")
print("=" * 60)

defect_rate = 0.001  # 0.1% defect rate
batch_size = 10000   # 10,000 items per batch
expected_defects = defect_rate * batch_size  # λ = 10

# Exact Binomial
binom_dist = binom(batch_size, defect_rate)

# Poisson approximation
poisson_dist = poisson(expected_defects)

# Compare P(X ≤ 15)
k = 15
binom_cdf = binom_dist.cdf(k)
poisson_cdf = poisson_dist.cdf(k)

print(f"Defect rate: {defect_rate:.4f}, Batch size: {batch_size}")
print(f"Expected defects: λ = {expected_defects}")
print(f"\nP(defects ≤ {k}):")
print(f"  Binomial:   {binom_cdf:.6f}")
print(f"  Poisson:    {poisson_cdf:.6f}")
print(f"  Difference: {abs(binom_cdf - poisson_cdf):.6f}")
```

The Poisson distribution naturally arises from the Poisson process—a stochastic process that models events occurring randomly over time (or space). Understanding this connection provides deep intuition for when the Poisson is appropriate.

### Defining the Poisson Process

A Poisson process with rate λ > 0 is characterized by three assumptions:

1. Stationarity: The probability of an event in any small interval depends only on the length of the interval, not when it occurs.

$$P(\text{1 event in } (t, t+h)) = \lambda h + o(h)$$

2. Independent Increments: The numbers of events in non-overlapping intervals are independent.

3. No Simultaneous Events: At most one event can occur at any instant.

$$P(\text{≥ 2 events in } (t, t+h)) = o(h)$$

where o(h) means terms that go to zero faster than h as h → 0.

### Main Results

From these assumptions, it follows that the number of events N(t) in an interval of length t follows:

$$N(t) \sim \text{Poisson}(\lambda t)$$

The waiting time T until the first event follows:

$$T \sim \text{Exponential}(\lambda)$$

with PDF f(t) = λe^{-λt} for t ≥ 0.
The Poisson and Exponential distributions are two sides of the same coin. Poisson counts events in a fixed interval; Exponential measures time between events. If inter-arrival times are Exponential(λ), then counts are Poisson(λt). This duality is fundamental to queuing theory and event modeling.
```python
import numpy as np
from scipy.stats import poisson

class PoissonProcess:
    """
    Simulate and analyze a Poisson process.
    """

    def __init__(self, rate: float):
        """
        Args:
            rate: λ, the average number of events per unit time
        """
        self.rate = rate

    def simulate_events(self, duration: float) -> np.ndarray:
        """
        Simulate event arrival times up to the given duration.

        Uses the inter-arrival time approach (Exponential waiting times).

        Returns:
            Array of event arrival times
        """
        arrival_times = []
        current_time = 0
        while current_time < duration:
            # Wait time until next event ~ Exponential(λ)
            wait_time = np.random.exponential(1 / self.rate)
            current_time += wait_time
            if current_time < duration:
                arrival_times.append(current_time)
        return np.array(arrival_times)

    def count_events(self, events: np.ndarray, interval_end: float) -> int:
        """Count events in [0, interval_end]."""
        return np.sum(events <= interval_end)

    def theoretical_distribution(self, t: float):
        """Return the Poisson distribution for the count in [0, t]."""
        return poisson(self.rate * t)

# Demonstration
np.random.seed(42)

# Website traffic: average 10 visitors per minute
rate = 10  # λ = 10 per minute
process = PoissonProcess(rate)

# Simulate 10,000 one-minute intervals and check the count distribution
print("Poisson Process Simulation: Website Traffic")
print(f"Rate λ = {rate} visitors/minute")
print("=" * 50)

n_simulations = 10000
counts = []

for _ in range(n_simulations):
    events = process.simulate_events(duration=1.0)  # 1 minute
    counts.append(len(events))

counts = np.array(counts)

# Compare empirical distribution to theoretical Poisson(λ)
theoretical = poisson(rate)

print(f"\nEmpirical vs Theoretical (Poisson({rate})):")
print(f"Mean:     Empirical = {counts.mean():.3f}, Theoretical = {rate}")
print(f"Variance: Empirical = {counts.var():.3f}, Theoretical = {rate}")
print(f"Std Dev:  Empirical = {counts.std():.3f}, Theoretical = {np.sqrt(rate):.3f}")

# Check some specific probabilities
print("\nProbability comparisons:")
for k in [5, 10, 15]:
    emp_prob = np.mean(counts == k)
    theo_prob = theoretical.pmf(k)
    print(f"P(X = {k:2d}): Empirical = {emp_prob:.4f}, Theoretical = {theo_prob:.4f}")

# Inter-arrival times should be Exponential(λ)
print("\n" + "=" * 50)
print("Inter-arrival Time Analysis")
print("=" * 50)

# Longer simulation to get many inter-arrival times
events = process.simulate_events(duration=100)  # 100 minutes
inter_arrivals = np.diff(events)  # Time between consecutive events

print(f"Number of events: {len(events)}")
print(f"Number of inter-arrival times: {len(inter_arrivals)}")
print(f"\nInter-arrival times (should be Exponential(λ={rate})):")
print(f"Mean:    Empirical = {inter_arrivals.mean():.4f}, Theoretical = {1/rate:.4f}")
print(f"Std Dev: Empirical = {inter_arrivals.std():.4f}, Theoretical = {1/rate:.4f}")
```

Estimating the Poisson rate parameter λ from observed count data is straightforward via Maximum Likelihood Estimation.

### MLE for λ

Given n independent observations x₁, x₂, ..., xₙ from a Poisson(λ) distribution, the likelihood is:

$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod_{i=1}^n x_i!}$$

The log-likelihood is:

$$\ell(\lambda) = \left(\sum_{i=1}^n x_i\right) \ln \lambda - n\lambda - \sum_{i=1}^n \ln(x_i!)$$

Taking the derivative and setting it to zero:

$$\frac{d\ell}{d\lambda} = \frac{\sum_{i=1}^n x_i}{\lambda} - n = 0$$

$$\hat{\lambda}_{MLE} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}$$

The MLE for λ is simply the sample mean—the average count.
```python
import numpy as np
from scipy import stats

def poisson_mle(counts: np.ndarray) -> dict:
    """
    Maximum Likelihood Estimation for Poisson parameter λ.
    Returns point estimate and confidence intervals.
    """
    n = len(counts)
    total = np.sum(counts)

    # MLE estimate (sample mean)
    lambda_hat = np.mean(counts)

    # Standard error
    se = np.sqrt(lambda_hat / n)

    # 95% CI using normal approximation
    z = 1.96
    ci_normal = (lambda_hat - z * se, lambda_hat + z * se)

    # 95% CI using the exact chi-squared method:
    # for the sum S = Σx_i ~ Poisson(nλ), an exact interval for nλ is
    # [χ²_{0.025}(2S)/2, χ²_{0.975}(2S+2)/2]; dividing by n gives a CI for λ
    chi2_lower = stats.chi2.ppf(0.025, 2 * total) / (2 * n)
    chi2_upper = stats.chi2.ppf(0.975, 2 * (total + 1)) / (2 * n)
    ci_exact = (chi2_lower, chi2_upper)

    return {
        'n': n,
        'total_count': total,
        'lambda_hat': lambda_hat,
        'standard_error': se,
        'ci_95_normal': ci_normal,
        'ci_95_exact': ci_exact
    }

def test_overdispersion(counts: np.ndarray) -> dict:
    """
    Test if data is overdispersed relative to Poisson (variance > mean).
    """
    n = len(counts)
    mean = np.mean(counts)
    var = np.var(counts, ddof=1)

    # Dispersion statistic
    dispersion = var / mean

    # Chi-squared test for overdispersion
    # Under Poisson: Σ(x_i - x̄)²/x̄ ~ χ²(n-1)
    chi2_stat = (n - 1) * dispersion
    p_value = 1 - stats.chi2.cdf(chi2_stat, df=n-1)

    return {
        'mean': mean,
        'variance': var,
        'dispersion': dispersion,
        'chi2_statistic': chi2_stat,
        'p_value': p_value,
        'overdispersed': p_value < 0.05
    }

# Example: Customer arrivals per hour
np.random.seed(42)
true_lambda = 7.5
n_hours = 50
arrivals = np.random.poisson(true_lambda, n_hours)

print("Poisson Parameter Estimation: Customer Arrivals")
print("=" * 50)
print(f"True λ = {true_lambda}")
print(f"Observed data: {arrivals[:10]}... (first 10 of {n_hours})")

results = poisson_mle(arrivals)
print(f"\nMLE Results:")
print(f"  λ̂ = {results['lambda_hat']:.4f}")
print(f"  Standard Error = {results['standard_error']:.4f}")
print(f"  95% CI (Normal): [{results['ci_95_normal'][0]:.4f}, {results['ci_95_normal'][1]:.4f}]")
print(f"  95% CI (Exact):  [{results['ci_95_exact'][0]:.4f}, {results['ci_95_exact'][1]:.4f}]")

# Test for overdispersion
dispersion_test = test_overdispersion(arrivals)
print(f"\nOverdispersion Test:")
print(f"  Mean = {dispersion_test['mean']:.4f}")
print(f"  Variance = {dispersion_test['variance']:.4f}")
print(f"  Dispersion ratio = {dispersion_test['dispersion']:.4f}")
print(f"  p-value = {dispersion_test['p_value']:.4f}")
print(f"  Conclusion: {'Overdispersed' if dispersion_test['overdispersed'] else 'Consistent with Poisson'}")
```

Real count data often exhibits overdispersion (variance > mean), violating the Poisson assumption. Common causes include unobserved heterogeneity and clustering. Alternatives include the Negative Binomial distribution or quasi-Poisson models that explicitly model overdispersion.
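To see what overdispersion looks like in practice, here is a minimal sketch that injects unobserved heterogeneity by drawing each hour's rate from a Gamma distribution (a Gamma-mixed Poisson is exactly a Negative Binomial). The shape parameter and sample size are arbitrary illustrative choices, and the dispersion test is the same one used in `test_overdispersion` above:

```python
import numpy as np
from scipy import stats

np.random.seed(0)

# Unobserved heterogeneity: each hour has its own rate drawn from a Gamma
# distribution. Marginally, the counts are Negative Binomial, not Poisson.
n_hours = 500
mean_rate = 7.5
shape = 2.0  # arbitrary illustrative choice; smaller shape = more heterogeneity
rates = np.random.gamma(shape, scale=mean_rate / shape, size=n_hours)  # E[rate] = 7.5
mixed_counts = np.random.poisson(rates)

mean = mixed_counts.mean()
var = mixed_counts.var(ddof=1)
print(f"Mean:     {mean:.3f}")
print(f"Variance: {var:.3f}  (far above the mean)")

# Same dispersion test as test_overdispersion above:
# under Poisson, (n-1) * var/mean ~ χ²(n-1)
chi2_stat = (n_hours - 1) * var / mean
p_value = 1 - stats.chi2.cdf(chi2_stat, df=n_hours - 1)
print(f"Dispersion ratio: {var / mean:.3f}")
print(f"p-value: {p_value:.4f} -> "
      f"{'Overdispersed' if p_value < 0.05 else 'Consistent with Poisson'}")
```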
The Poisson distribution has several elegant properties that make it mathematically tractable.

### Additivity: Sum of Independent Poissons

If X₁ ~ Poisson(λ₁) and X₂ ~ Poisson(λ₂) are independent, then:

$$X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$$

More generally, for independent Xᵢ ~ Poisson(λᵢ):

$$\sum_{i=1}^n X_i \sim \text{Poisson}\left(\sum_{i=1}^n \lambda_i\right)$$

Proof via MGF:

$$M_{X+Y}(t) = M_X(t) M_Y(t) = e^{\lambda_1(e^t-1)} e^{\lambda_2(e^t-1)} = e^{(\lambda_1+\lambda_2)(e^t-1)}$$

This is the MGF of Poisson(λ₁ + λ₂). ∎

Application: Total website traffic = traffic from channel A + traffic from channel B follows Poisson if each channel is Poisson.
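As a quick sanity check, here is a minimal simulation sketch (the rates λ₁ = 3.0 and λ₂ = 4.5 are arbitrary illustrative choices) confirming that the sum of two independent Poisson samples matches the PMF of Poisson(λ₁ + λ₂):

```python
import numpy as np
from scipy.stats import poisson

np.random.seed(1)
lam1, lam2 = 3.0, 4.5  # arbitrary illustrative rates
n = 200_000

# Sum of two independent Poisson samples
total = np.random.poisson(lam1, n) + np.random.poisson(lam2, n)

# Compare the empirical PMF with Poisson(λ1 + λ2)
combined = poisson(lam1 + lam2)
print(f"{'k':>3} {'empirical':>10} {'theoretical':>12}")
for k in range(3, 13):
    print(f"{k:>3} {np.mean(total == k):>10.4f} {combined.pmf(k):>12.4f}")

print(f"\nMean:     {total.mean():.4f}  (theory: {lam1 + lam2})")
print(f"Variance: {total.var():.4f}  (theory: {lam1 + lam2})")
```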
| Property | Statement | Application |
|---|---|---|
| Additivity | Sum of indep. Poissons is Poisson | Aggregating event sources |
| Thinning/Splitting | Random classification gives independent Poissons | Customer segmentation |
| Rate scaling | X(t) ~ Poisson(λt) | Adjusting observation window |
| Mean = Variance | E[X] = Var(X) = λ | Overdispersion diagnostic |
| Asymptotic normality | Poisson(λ) → N(λ, λ) for large λ | Approximate inference |
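The thinning property from the table can also be checked by simulation. Below is a minimal sketch, assuming each event is independently labeled type A with probability p (the rate, split probability, and variable names are illustrative choices):

```python
import numpy as np

np.random.seed(2)
lam, p = 10.0, 0.3  # arbitrary illustrative rate and split probability
n = 100_000

# Total events per interval, then independent labeling ("thinning"):
# each event is type A with probability p, otherwise type B
totals = np.random.poisson(lam, n)
type_a = np.random.binomial(totals, p)
type_b = totals - type_a

print(f"Type A: mean = {type_a.mean():.3f}, var = {type_a.var():.3f}  "
      f"(theory: pλ = {p * lam})")
print(f"Type B: mean = {type_b.mean():.3f}, var = {type_b.var():.3f}  "
      f"(theory: (1-p)λ = {(1 - p) * lam})")

# The thinned streams are independent: correlation ≈ 0
corr = np.corrcoef(type_a, type_b)[0, 1]
print(f"Correlation(A, B) = {corr:.4f}  (theory: 0)")
```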
The Poisson distribution appears throughout machine learning whenever we model counts or rates.

### Poisson Regression

Poisson regression models count data as a function of features:

$$Y_i \mid \mathbf{x}_i \sim \text{Poisson}(\lambda_i) \quad \text{where} \quad \log \lambda_i = \mathbf{w}^T \mathbf{x}_i + b$$

The log link function ensures λᵢ > 0. The negative log-likelihood loss is:

$$\mathcal{L} = \sum_{i=1}^n \left[ \lambda_i - y_i \log \lambda_i + \log(y_i!) \right]$$

Use cases:
- Predicting number of hospital admissions
- Expected sales count
- Number of defects in manufacturing
- Citation counts for papers
```python
import numpy as np
from scipy.optimize import minimize

def poisson_nll(params, X, y):
    """
    Negative log-likelihood for Poisson regression.

    Model: y ~ Poisson(λ), where log(λ) = X @ w + b
    NLL = Σ [λ_i - y_i * log(λ_i) + log(y_i!)]
    """
    w, b = params[:-1], params[-1]
    log_lambda = X @ w + b
    lambda_ = np.exp(log_lambda)

    # Negative log-likelihood (the constant log(y!) term is dropped
    # because it does not affect optimization)
    nll = np.sum(lambda_ - y * log_lambda)
    return nll

def fit_poisson_regression(X, y):
    """
    Fit Poisson regression via MLE.
    """
    n_features = X.shape[1]
    initial_params = np.zeros(n_features + 1)

    result = minimize(poisson_nll, initial_params, args=(X, y),
                      method='L-BFGS-B')

    w = result.x[:-1]
    b = result.x[-1]
    return w, b

def predict_poisson(X, w, b):
    """Predict expected counts."""
    log_lambda = X @ w + b
    return np.exp(log_lambda)

# Example: Predicting customer purchases based on features
np.random.seed(42)
n = 500

# Features: age (normalized), income (normalized), website visits
X = np.random.randn(n, 3)
true_w = np.array([0.5, 0.8, 0.3])
true_b = 1.0

# True λ = exp(w'x + b)
true_lambda = np.exp(X @ true_w + true_b)

# Generate counts
y = np.random.poisson(true_lambda)

# Fit model
w_hat, b_hat = fit_poisson_regression(X, y)

print("Poisson Regression Results")
print("=" * 50)
print(f"True weights:      {true_w}")
print(f"Estimated weights: {w_hat.round(4)}")
print(f"True bias:      {true_b}")
print(f"Estimated bias: {b_hat:.4f}")

# Predictions
y_pred = predict_poisson(X, w_hat, b_hat)
print(f"\nMean actual count: {y.mean():.2f}")
print(f"Mean predicted λ:  {y_pred.mean():.2f}")

# Deviance (measure of fit)
deviance = 2 * np.sum(y * np.log(np.maximum(y, 1) / y_pred) - (y - y_pred))
print(f"Deviance: {deviance:.2f}")
```

The Poisson distribution is the fundamental model for counting random events. Let's consolidate the key concepts:

- PMF: P(X = k) = λᵏe^{-λ}/k!, with support {0, 1, 2, ...} and E[X] = Var(X) = λ.
- The Poisson is the limit of Binomial(n, p) as n → ∞ and p → 0 with np = λ fixed (Law of Rare Events).
- Counts in a Poisson process follow Poisson(λt); inter-arrival times follow Exponential(λ).
- The MLE for λ is the sample mean; compare variance to mean to diagnose overdispersion.
- Sums of independent Poissons are Poisson; thinning a Poisson stream yields independent Poissons.
- Poisson regression models log λ as a linear function of features, trained by minimizing the Poisson NLL.
What's Next:

We've covered the fundamental discrete (Bernoulli/Binomial, Poisson) and continuous (Gaussian) distributions. Next, we explore the Exponential family—a unifying framework that encompasses these and many other distributions. Understanding exponential families reveals deep connections between distributions and provides powerful tools for generalized linear models and Bayesian inference.
You now have a comprehensive understanding of the Poisson distribution—from its derivation as the Binomial limit to its role in the Poisson process and its applications in machine learning. The Poisson is the natural choice whenever you need to model counts of events occurring randomly in time or space.