How many customers will visit a website in the next hour? How many emails will arrive in your inbox today? How many mutations will occur in a DNA sequence? How many server failures will happen this month?
These questions all involve counting random events that occur over time or space. The Poisson distribution, named after French mathematician Siméon Denis Poisson (1781–1840), provides the mathematical framework for modeling such counts.
The Poisson distribution captures a specific type of randomness: events that occur independently, at a constant average rate, with no two events occurring at exactly the same instant. These conditions describe an astonishing variety of real-world phenomena, from radioactive decay to website traffic to typos in manuscripts.
In machine learning, the Poisson appears in diverse contexts: modeling count data, text analysis (word frequencies), rate prediction, anomaly detection, and as the foundation for Poisson regression. Understanding this distribution deepens your ability to model discrete, non-negative quantities that arise naturally in many domains.
By the end of this page, you will master the Poisson distribution: its derivation from the Binomial limit, mathematical properties, relationship to the Exponential distribution (waiting times), parameter estimation, and its applications in machine learning and real-world modeling.
A discrete random variable X follows a Poisson distribution with rate parameter λ > 0, written X ~ Poisson(λ), if its probability mass function (PMF) is:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, 3, \ldots$$
where λ is the average number of events in the interval (the rate), k is the observed number of events, and e ≈ 2.71828 is Euler's number.
The support of the Poisson is all non-negative integers: {0, 1, 2, 3, ...}. Unlike the Binomial, there is no upper bound on k.
The Poisson PMF has an elegant structure: λᵏ/k! comes from the Taylor series of e^λ, while e^{-λ} normalizes the distribution. Together, Σₖ (λᵏ/k!) · e^{-λ} = e^{-λ} · e^λ = 1, confirming it's a valid probability distribution.
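The normalization argument can be checked numerically; a minimal sketch (λ = 4.2 is an arbitrary choice, and 100 terms suffice because the tail decays factorially):

```python
import math

lam = 4.2  # arbitrary rate for illustration

# Sum λ^k e^{-λ} / k! over enough terms that the remaining tail is negligible
total = sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(100))
print(total)  # ≈ 1.0
```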
The Poisson has remarkably elegant moment properties.
Expected Value:
$$E[X] = \sum_{k=0}^{\infty} k \cdot \frac{\lambda^k e^{-\lambda}}{k!} = \lambda$$
Variance:
$$Var(X) = E[X^2] - (E[X])^2 = \lambda$$
Crucial insight: For the Poisson, the mean equals the variance (both equal λ).
This property is often used as a diagnostic: if count data has variance significantly different from its mean, the Poisson may not be appropriate (overdispersion or underdispersion).
Standard Deviation: $$\sigma = \sqrt{\lambda}$$
Coefficient of Variation: $$CV = \frac{\sigma}{\mu} = \frac{\sqrt{\lambda}}{\lambda} = \frac{1}{\sqrt{\lambda}}$$
As λ increases, the relative variability decreases.
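A quick sampling check of the mean-equals-variance property (λ = 6 and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 6.0

samples = rng.poisson(lam, size=200_000)
print(samples.mean())  # ≈ 6.0
print(samples.var())   # ≈ 6.0 — mean and variance agree, as the theory predicts
```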
| Property | Formula | Interpretation |
|---|---|---|
| Support | {0, 1, 2, ...} | All non-negative integers (no upper bound) |
| Parameter | λ > 0 | Rate = expected count in the interval |
| Mean | E[X] = λ | Average number of events |
| Variance | Var(X) = λ | Variance equals mean (key diagnostic) |
| Skewness | 1/√λ | Always right-skewed; approaches 0 as λ → ∞ |
| Mode | ⌊λ⌋; both λ and λ−1 if λ is a positive integer | Most likely count(s) |
| MGF | exp(λ(e^t - 1)) | Moment generating function |
| Entropy | λ(1 − ln λ) + e^{−λ} Σₖ λᵏ ln(k!)/k! | No elementary closed form |
The MGF of the Poisson is:
$$M_X(t) = E[e^{tX}] = \sum_{k=0}^{\infty} e^{tk} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$$
This elegant form enables derivation of all moments and plays a key role in proving the sum of independent Poissons is Poisson.
The PGF (useful for count distributions) is:
$$G_X(s) = E[s^X] = \sum_{k=0}^{\infty} s^k \frac{\lambda^k e^{-\lambda}}{k!} = e^{\lambda(s-1)}$$
The Poisson distribution arises naturally as the limit of the Binomial distribution when events are rare. This deep connection is known as the Poisson Limit Theorem (or Law of Rare Events).
Consider a Binomial(n, p) distribution where:
As n → ∞ and p → 0 with np → λ, the Binomial approaches the Poisson:
$$\lim_{n \to \infty} \binom{n}{k} p^k (1-p)^{n-k} = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{where } p = \lambda/n$$
Let p = λ/n. The Binomial PMF is:
$$P(X=k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$$
Step 1: Expand the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}$$
Step 2: Rewrite
$$P(X=k) = \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{\lambda^k}{n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^{n-k}$$
Step 3: Take the limit as n → ∞
$$\frac{n(n-1)\cdots(n-k+1)}{n^k} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \to 1 \cdot 1 \cdots 1 = 1$$
$$\left(1 - \frac{\lambda}{n}\right)^{n} \to e^{-\lambda} \quad \text{(definition of } e \text{)}$$
$$\left(1 - \frac{\lambda}{n}\right)^{-k} \to 1$$
Result:
$$\lim_{n \to \infty} P(X=k) = \frac{\lambda^k}{k!} \cdot e^{-\lambda} \quad \square$$
The Poisson approximation to the Binomial is accurate when n ≥ 20 and p ≤ 0.05, or when n ≥ 100 and np ≤ 10. The approximation improves as n increases and p decreases with np held constant.
```python
import numpy as np
from scipy.stats import binom, poisson

def demonstrate_poisson_limit():
    """
    Demonstrate how Binomial converges to Poisson as n→∞, p→0, np=λ.
    """
    lambda_fixed = 5  # Fixed expected value

    # Different values of n (with p = λ/n)
    n_values = [10, 20, 50, 100, 500, 1000]

    # Range of k values to compare
    k_max = 15
    k_values = np.arange(0, k_max + 1)

    # True Poisson probabilities
    poisson_probs = poisson.pmf(k_values, lambda_fixed)

    print(f"Poisson Limit Theorem Demonstration (λ = {lambda_fixed})")
    print("=" * 60)

    for n in n_values:
        p = lambda_fixed / n

        # Binomial probabilities
        binom_probs = binom.pmf(k_values, n, p)

        # Maximum absolute difference
        max_diff = np.max(np.abs(binom_probs - poisson_probs))

        # Total variation distance
        tv_distance = 0.5 * np.sum(np.abs(binom_probs - poisson_probs))

        print(f"n = {n:4d}, p = {p:.4f}: Max diff = {max_diff:.6f}, "
              f"TV distance = {tv_distance:.6f}")

    # Theoretical bound: TV ≤ min(np², λ²/n) ≈ λ²/n for small p
    print("Theoretical bound: Total Variation ≤ λ²/n")
    for n in n_values:
        bound = lambda_fixed**2 / n
        print(f"n = {n:4d}: Bound = {bound:.6f}")

demonstrate_poisson_limit()

# Practical example: Rare event modeling
print("\n" + "=" * 60)
print("Example: Defective Items in Manufacturing")
print("=" * 60)

defect_rate = 0.001  # 0.1% defect rate
batch_size = 10000   # 10,000 items per batch
expected_defects = defect_rate * batch_size  # λ = 10

# Exact Binomial
binom_dist = binom(batch_size, defect_rate)

# Poisson approximation
poisson_dist = poisson(expected_defects)

# Compare P(X ≤ 15)
k = 15
binom_cdf = binom_dist.cdf(k)
poisson_cdf = poisson_dist.cdf(k)

print(f"Defect rate: {defect_rate:.4f}, Batch size: {batch_size}")
print(f"Expected defects: λ = {expected_defects}")
print(f"P(defects ≤ {k}):")
print(f"  Binomial:   {binom_cdf:.6f}")
print(f"  Poisson:    {poisson_cdf:.6f}")
print(f"  Difference: {abs(binom_cdf - poisson_cdf):.6f}")
```

The Poisson distribution naturally arises from the Poisson process—a stochastic process that models events occurring randomly over time (or space). Understanding this connection provides deep intuition for when the Poisson is appropriate.
A Poisson process with rate λ > 0 is characterized by three assumptions:
1. Stationarity: The probability of an event in any small interval depends only on the length of the interval, not when it occurs.
$$P(\text{1 event in } (t, t+h)) = \lambda h + o(h)$$
2. Independent Increments: The numbers of events in non-overlapping intervals are independent.
3. No Simultaneous Events: At most one event can occur at any instant.
$$P(\text{≥ 2 events in } (t, t+h)) = o(h)$$
where o(h) means terms that go to zero faster than h as h → 0.
From these assumptions, it follows that the number of events N(t) in an interval of length t follows:
$$N(t) \sim \text{Poisson}(\lambda t)$$
The waiting time T until the first event follows:
$$T \sim \text{Exponential}(\lambda)$$
with PDF f(t) = λe^{-λt} for t ≥ 0.
The Poisson and Exponential distributions are two sides of the same coin. Poisson counts events in a fixed interval; Exponential measures time between events. If inter-arrival times are Exponential(λ), then counts are Poisson(λt). This duality is fundamental to queuing theory and event modeling.
Let N(t) be the number of events in [0, t]. We can derive the Poisson distribution by considering what happens as we discretize time.
Divide [0, t] into n small subintervals of length h = t/n. For very small h:
$$P(\text{event in subinterval}) \approx \lambda h = \frac{\lambda t}{n}$$
With n subintervals and independent events, the total count is approximately:
$$N(t) \approx \text{Binomial}\left(n, \frac{\lambda t}{n}\right)$$
As n → ∞ (infinitesimal subintervals), by the Poisson Limit Theorem:
$$N(t) \sim \text{Poisson}(\lambda t)$$
The Exponential distribution (waiting time) has the memoryless property:
$$P(T > s + t | T > s) = P(T > t)$$
The probability of waiting at least t more units, given you've already waited s units, is the same as the probability of waiting t units from the start. This reflects the "fresh start" nature of Poisson processes—past events don't affect future waiting times.
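A simulation sketch of the memoryless property, using arbitrary values λ = 2, s = 0.5, t = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
waits = rng.exponential(1 / lam, size=1_000_000)

s, t = 0.5, 1.0
cond = np.mean(waits[waits > s] > s + t)  # P(T > s+t | T > s)
uncond = np.mean(waits > t)               # P(T > t)
print(cond, uncond)  # both ≈ e^{-λt} ≈ 0.135
```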
```python
import numpy as np
from scipy.stats import poisson

class PoissonProcess:
    """
    Simulate and analyze a Poisson process.
    """

    def __init__(self, rate: float):
        """
        Args:
            rate: λ, the average number of events per unit time
        """
        self.rate = rate

    def simulate_events(self, duration: float) -> np.ndarray:
        """
        Simulate event arrival times up to the given duration.
        Uses the inter-arrival time approach (Exponential waiting times).

        Returns:
            Array of event arrival times
        """
        arrival_times = []
        current_time = 0.0
        while current_time < duration:
            # Wait time until next event ~ Exponential(λ)
            wait_time = np.random.exponential(1 / self.rate)
            current_time += wait_time
            if current_time < duration:
                arrival_times.append(current_time)
        return np.array(arrival_times)

    def count_events(self, events: np.ndarray, interval_end: float) -> int:
        """Count events in [0, interval_end]."""
        return np.sum(events <= interval_end)

    def theoretical_distribution(self, t: float):
        """Return the Poisson distribution for the count in [0, t]."""
        return poisson(self.rate * t)

# Demonstration
np.random.seed(42)

# Website traffic: average 10 visitors per minute
rate = 10  # λ = 10 per minute
process = PoissonProcess(rate)

# Simulate 10,000 one-minute intervals and check the count distribution
print("Poisson Process Simulation: Website Traffic")
print(f"Rate λ = {rate} visitors/minute")
print("=" * 50)

n_simulations = 10000
counts = []
for _ in range(n_simulations):
    events = process.simulate_events(duration=1.0)  # 1 minute
    counts.append(len(events))
counts = np.array(counts)

# Compare empirical distribution to theoretical Poisson(λ)
theoretical = poisson(rate)

print(f"Empirical vs Theoretical (Poisson({rate})):")
print(f"Mean:     Empirical = {counts.mean():.3f}, Theoretical = {rate}")
print(f"Variance: Empirical = {counts.var():.3f}, Theoretical = {rate}")
print(f"Std Dev:  Empirical = {counts.std():.3f}, Theoretical = {np.sqrt(rate):.3f}")

# Check some specific probabilities
print("\nProbability comparisons:")
for k in [5, 10, 15]:
    emp_prob = np.mean(counts == k)
    theo_prob = theoretical.pmf(k)
    print(f"P(X = {k:2d}): Empirical = {emp_prob:.4f}, Theoretical = {theo_prob:.4f}")

# Inter-arrival times should be Exponential(λ)
print("\n" + "=" * 50)
print("Inter-arrival Time Analysis")
print("=" * 50)

# Longer simulation to get many inter-arrival times
events = process.simulate_events(duration=100)  # 100 minutes
inter_arrivals = np.diff(events)  # Time between consecutive events

print(f"Number of events: {len(events)}")
print(f"Number of inter-arrival times: {len(inter_arrivals)}")
print(f"Inter-arrival times (should be Exponential(λ={rate})):")
print(f"Mean:    Empirical = {inter_arrivals.mean():.4f}, Theoretical = {1/rate:.4f}")
print(f"Std Dev: Empirical = {inter_arrivals.std():.4f}, Theoretical = {1/rate:.4f}")
```

Estimating the Poisson rate parameter λ from observed count data is straightforward via Maximum Likelihood Estimation.
Given n independent observations x₁, x₂, ..., xₙ from a Poisson(λ) distribution, the likelihood is:
$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod_{i=1}^n x_i!}$$
The log-likelihood is:
$$\ell(\lambda) = \left(\sum_{i=1}^n x_i\right) \ln \lambda - n\lambda - \sum_{i=1}^n \ln(x_i!)$$
Taking the derivative and setting to zero:
$$\frac{d\ell}{d\lambda} = \frac{\sum_{i=1}^n x_i}{\lambda} - n = 0$$
$$\hat{\lambda}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}$$
The MLE for λ is simply the sample mean—the average count.
Unbiasedness: $$E[\hat{\lambda}] = E[\bar{X}] = \lambda$$
Variance: $$Var(\hat{\lambda}) = Var(\bar{X}) = \frac{\lambda}{n}$$
Standard Error: $$SE(\hat{\lambda}) = \sqrt{\frac{\lambda}{n}} \approx \sqrt{\frac{\hat{\lambda}}{n}}$$
Consistency: As n → ∞, λ̂ → λ in probability.
Asymptotic Normality: $$\sqrt{n}(\hat{\lambda} - \lambda) \stackrel{d}{\rightarrow} N(0, \lambda)$$
For large n: $$\hat{\lambda} \approx N\left(\lambda, \frac{\lambda}{n}\right)$$
1. Normal Approximation (large n): $$\hat{\lambda} \pm z_{\alpha/2} \sqrt{\frac{\hat{\lambda}}{n}}$$
2. Exact (via the Chi-squared relationship): With S = Σxᵢ ~ Poisson(nλ), an exact 95% interval for nλ is [½χ²₀.₀₂₅(2S), ½χ²₀.₉₇₅(2S + 2)]; dividing by n gives the interval for λ.
```python
import numpy as np
from scipy import stats

def poisson_mle(counts: np.ndarray) -> dict:
    """
    Maximum Likelihood Estimation for the Poisson parameter λ.
    Returns point estimate and confidence intervals.
    """
    n = len(counts)
    total = np.sum(counts)

    # MLE estimate (sample mean)
    lambda_hat = np.mean(counts)

    # Standard error
    se = np.sqrt(lambda_hat / n)

    # 95% CI using the normal approximation
    z = 1.96
    ci_normal = (lambda_hat - z * se, lambda_hat + z * se)

    # 95% CI using the exact chi-squared method:
    # for S = Σx_i ~ Poisson(nλ), a CI for nλ yields a CI for λ
    chi2_lower = stats.chi2.ppf(0.025, 2 * total) / (2 * n)
    chi2_upper = stats.chi2.ppf(0.975, 2 * (total + 1)) / (2 * n)
    ci_exact = (chi2_lower, chi2_upper)

    return {
        'n': n,
        'total_count': total,
        'lambda_hat': lambda_hat,
        'standard_error': se,
        'ci_95_normal': ci_normal,
        'ci_95_exact': ci_exact,
    }

def test_overdispersion(counts: np.ndarray) -> dict:
    """
    Test if data is overdispersed relative to Poisson (variance > mean).
    """
    n = len(counts)
    mean = np.mean(counts)
    var = np.var(counts, ddof=1)

    # Dispersion statistic
    dispersion = var / mean

    # Chi-squared test for overdispersion
    # Under Poisson: Σ(x_i - x̄)² / x̄ ~ χ²(n-1)
    chi2_stat = (n - 1) * dispersion
    p_value = 1 - stats.chi2.cdf(chi2_stat, df=n - 1)

    return {
        'mean': mean,
        'variance': var,
        'dispersion': dispersion,
        'chi2_statistic': chi2_stat,
        'p_value': p_value,
        'overdispersed': p_value < 0.05,
    }

# Example: Customer arrivals per hour
np.random.seed(42)
true_lambda = 7.5
n_hours = 50
arrivals = np.random.poisson(true_lambda, n_hours)

print("Poisson Parameter Estimation: Customer Arrivals")
print("=" * 50)
print(f"True λ = {true_lambda}")
print(f"Observed data: {arrivals[:10]}... (first 10 of {n_hours})")

results = poisson_mle(arrivals)
print("\nMLE Results:")
print(f"  λ̂ = {results['lambda_hat']:.4f}")
print(f"  Standard Error = {results['standard_error']:.4f}")
print(f"  95% CI (Normal): [{results['ci_95_normal'][0]:.4f}, {results['ci_95_normal'][1]:.4f}]")
print(f"  95% CI (Exact):  [{results['ci_95_exact'][0]:.4f}, {results['ci_95_exact'][1]:.4f}]")

# Test for overdispersion
dispersion_test = test_overdispersion(arrivals)
print("\nOverdispersion Test:")
print(f"  Mean = {dispersion_test['mean']:.4f}")
print(f"  Variance = {dispersion_test['variance']:.4f}")
print(f"  Dispersion ratio = {dispersion_test['dispersion']:.4f}")
print(f"  p-value = {dispersion_test['p_value']:.4f}")
print(f"  Conclusion: {'Overdispersed' if dispersion_test['overdispersed'] else 'Consistent with Poisson'}")
```

Real count data often exhibits overdispersion (variance > mean), violating the Poisson assumption. Common causes include unobserved heterogeneity and clustering. Alternatives include the Negative Binomial distribution or quasi-Poisson models that explicitly model overdispersion.
The Poisson distribution has several elegant properties that make it mathematically tractable.
If X₁ ~ Poisson(λ₁) and X₂ ~ Poisson(λ₂) are independent, then:
$$X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$$
More generally, for independent Xᵢ ~ Poisson(λᵢ):
$$\sum_{i=1}^n X_i \sim \text{Poisson}\left(\sum_{i=1}^n \lambda_i\right)$$
Proof via MGF: $$M_{X+Y}(t) = M_X(t) M_Y(t) = e^{\lambda_1(e^t-1)} e^{\lambda_2(e^t-1)} = e^{(\lambda_1+\lambda_2)(e^t-1)}$$
This is the MGF of Poisson(λ₁ + λ₂). ∎
Application: Total website traffic = traffic from channel A + traffic from channel B follows Poisson if each channel is Poisson.
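The additivity property can be checked by simulation; a sketch with arbitrary rates λ₁ = 3 and λ₂ = 4:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
lam1, lam2 = 3.0, 4.0
n = 200_000

sums = rng.poisson(lam1, n) + rng.poisson(lam2, n)

# Empirical PMF of the sum against Poisson(λ1 + λ2) = Poisson(7)
for k in (5, 7, 9):
    print(k, np.mean(sums == k), poisson.pmf(k, lam1 + lam2))
```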
If X ~ Poisson(λ) represents total events, and each event is independently classified as type A with probability p or type B with probability (1-p), then the type counts satisfy X_A ~ Poisson(λp) and X_B ~ Poisson(λ(1-p)), and remarkably, X_A and X_B are independent.
Application: Customer arrivals (Poisson) with some fraction p making purchases → purchases and non-purchases are independent Poisson processes.
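A simulation sketch of thinning, with arbitrary λ = 10 and p = 0.3:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p = 10.0, 0.3
n = 200_000

totals = rng.poisson(lam, n)
type_a = rng.binomial(totals, p)  # each event independently type A with prob p
type_b = totals - type_a

print(type_a.mean(), type_a.var())        # both ≈ λp = 3
print(type_b.mean(), type_b.var())        # both ≈ λ(1-p) = 7
print(np.corrcoef(type_a, type_b)[0, 1])  # ≈ 0: the two streams are independent
```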
If N ~ Poisson(λ) and Y₁, Y₂, ... are i.i.d. with PGF G_Y(s), then:
$$S = \sum_{i=1}^N Y_i$$
has PGF:
$$G_S(s) = e^{\lambda(G_Y(s) - 1)} = \exp(\lambda(G_Y(s) - 1))$$
This is the compound Poisson distribution, important in insurance and finance.
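A simulation sketch of a compound Poisson sum; the Normal(2, 0.5) severity distribution is a hypothetical choice, and the moment identities E[S] = λE[Y] and Var(S) = λE[Y²] are the standard compound Poisson formulas:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 5.0              # event (claim) rate
mu_y, sd_y = 2.0, 0.5  # hypothetical per-event severity distribution

n_sim = 100_000
counts = rng.poisson(lam, n_sim)
# S = Y_1 + ... + Y_N with N ~ Poisson(λ) and i.i.d. severities Y_i
totals = np.array([rng.normal(mu_y, sd_y, c).sum() for c in counts])

print(totals.mean())  # ≈ λ·E[Y] = 10
print(totals.var())   # ≈ λ·E[Y²] = 5·(4 + 0.25) = 21.25
```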
| Property | Statement | Application |
|---|---|---|
| Additivity | Sum of indep. Poissons is Poisson | Aggregating event sources |
| Thinning/Splitting | Random classification gives independent Poissons | Customer segmentation |
| Rate scaling | X(t) ~ Poisson(λt) | Adjusting observation window |
| Mean = Variance | E[X] = Var(X) = λ | Overdispersion diagnostic |
| Asymptotic normality | Poisson(λ) → N(λ, λ) for large λ | Approximate inference |
For large λ (typically λ ≥ 10), the Poisson is well-approximated by a Gaussian:
$$\text{Poisson}(\lambda) \approx N(\lambda, \lambda)$$
Or after standardization:
$$\frac{X - \lambda}{\sqrt{\lambda}} \stackrel{d}{\rightarrow} N(0, 1)$$
This enables using z-tests and z-intervals for Poisson data with large counts.
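A quick check of the approximation (λ = 50 and k = 60 are arbitrary; the 0.5 is a continuity correction):

```python
import numpy as np
from scipy.stats import norm, poisson

lam = 50.0
k = 60

exact = poisson.cdf(k, lam)
# Normal approximation with a continuity correction
approx = norm.cdf((k + 0.5 - lam) / np.sqrt(lam))
print(exact, approx)  # close for λ this large
```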
The Poisson distribution appears throughout machine learning whenever we model counts or rates.
Poisson regression models count data as a function of features:
$$Y_i | \mathbf{x}_i \sim \text{Poisson}(\lambda_i) \quad \text{where} \quad \log \lambda_i = \mathbf{w}^T \mathbf{x}_i + b$$
The log link function ensures λᵢ > 0. The negative log-likelihood loss is:
$$\mathcal{L} = \sum_{i=1}^n \left[ \lambda_i - y_i \log \lambda_i + \log(y_i!) \right]$$
Use cases: predicting insurance claim counts, purchases or visits per customer, defect and failure counts, and disease incidence rates.
```python
import numpy as np
from scipy.optimize import minimize

def poisson_nll(params, X, y):
    """
    Negative log-likelihood for Poisson regression.
    Model: y ~ Poisson(λ), where log(λ) = X @ w + b
    NLL = Σ [λ_i - y_i * log(λ_i) + log(y_i!)]
    """
    w, b = params[:-1], params[-1]
    log_lambda = X @ w + b
    lambda_ = np.exp(log_lambda)
    # Negative log-likelihood (ignoring the constant log(y!),
    # which does not affect optimization)
    nll = np.sum(lambda_ - y * log_lambda)
    return nll

def fit_poisson_regression(X, y):
    """
    Fit Poisson regression via MLE.
    """
    n_features = X.shape[1]
    initial_params = np.zeros(n_features + 1)
    result = minimize(poisson_nll, initial_params, args=(X, y),
                      method='L-BFGS-B')
    w = result.x[:-1]
    b = result.x[-1]
    return w, b

def predict_poisson(X, w, b):
    """Predict expected counts."""
    log_lambda = X @ w + b
    return np.exp(log_lambda)

# Example: Predicting customer purchases based on features
np.random.seed(42)
n = 500

# Features: age (normalized), income (normalized), website visits
X = np.random.randn(n, 3)
true_w = np.array([0.5, 0.8, 0.3])
true_b = 1.0

# True λ = exp(w'x + b)
true_lambda = np.exp(X @ true_w + true_b)

# Generate counts
y = np.random.poisson(true_lambda)

# Fit model
w_hat, b_hat = fit_poisson_regression(X, y)

print("Poisson Regression Results")
print("=" * 50)
print(f"True weights:      {true_w}")
print(f"Estimated weights: {w_hat.round(4)}")
print(f"True bias:      {true_b}")
print(f"Estimated bias: {b_hat:.4f}")

# Predictions
y_pred = predict_poisson(X, w_hat, b_hat)
print(f"Mean actual count: {y.mean():.2f}")
print(f"Mean predicted λ:  {y_pred.mean():.2f}")

# Deviance (measure of fit)
deviance = 2 * np.sum(y * np.log(np.maximum(y, 1) / y_pred) - (y - y_pred))
print(f"Deviance: {deviance:.2f}")
```

In text analysis, word counts often follow Poisson-like distributions:
Poisson assumption in LDA: Latent Dirichlet Allocation can be derived assuming Poisson word counts.
Sparse Additive Generative Models (SAGE): Model word counts as Poisson with log-linear rates.
If baseline event counts are Poisson(λ), we can detect anomalies:
Example: Server normally has 100 errors/hour (λ = 100). If we observe 150 errors:
$$P(X \geq 150) = 1 - F_{\text{Poisson}(100)}(149) \approx 2 \times 10^{-6}$$
This is highly anomalous!
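This tail probability can be computed directly with the survival function `scipy.stats.poisson.sf`:

```python
from scipy.stats import poisson

baseline = 100   # λ: normal errors per hour
observed = 150

# P(X >= 150) = 1 - F(149); sf is the survival function
p_tail = poisson.sf(observed - 1, baseline)
print(p_tail)
```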
When predicting non-negative integers, Poisson loss is sometimes used:
$$\mathcal{L}_{\text{Poisson}} = \hat{y} - y \log(\hat{y})$$
where ŷ is the predicted rate (must be positive, often from softplus or exp activation).
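A minimal NumPy sketch of this loss with a softplus link to keep the rate positive (the inputs below are illustrative, and the constant log(y!) term is dropped as in the regression loss above):

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + e^z); output is always positive."""
    return np.logaddexp(0.0, z)

def poisson_loss(z, y):
    """Mean Poisson loss over a batch, dropping the constant log(y!) term."""
    y_hat = softplus(z)  # predicted rate λ̂ > 0
    return np.mean(y_hat - y * np.log(y_hat))

z = np.array([0.5, 1.2, -0.3])  # raw model outputs (illustrative)
y = np.array([1.0, 2.0, 0.0])
print(poisson_loss(z, y))
```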
The Poisson distribution is the fundamental model for counting random events. Let's consolidate the key concepts:

- PMF: P(X = k) = λᵏe^{−λ}/k!, with mean and variance both equal to λ.
- The Poisson arises as the limit of Binomial(n, p) as n → ∞, p → 0, with np → λ.
- In a Poisson process with rate λ, counts over an interval of length t are Poisson(λt) and inter-arrival times are Exponential(λ) (memoryless).
- The MLE of λ is the sample mean, with standard error √(λ/n).
- Sums of independent Poissons are Poisson; random thinning yields independent Poissons.
- In machine learning, the Poisson underlies Poisson regression, count-based text models, anomaly detection, and the Poisson loss.
What's Next:
We've covered the fundamental discrete (Bernoulli/Binomial, Poisson) and continuous (Gaussian) distributions. Next, we explore the Exponential family—a unifying framework that encompasses these and many other distributions. Understanding exponential families reveals deep connections between distributions and provides powerful tools for generalized linear models and Bayesian inference.
You now have a comprehensive understanding of the Poisson distribution—from its derivation as the Binomial limit to its role in the Poisson process and its applications in machine learning. The Poisson is the natural choice whenever you need to model counts of events occurring randomly in time or space.