Every probability distribution can be characterized by functions that answer different but related questions about where probability resides. In the previous pages, we introduced the Probability Mass Function (PMF) for discrete random variables and the Probability Density Function (PDF) for continuous ones. Now we introduce the Cumulative Distribution Function (CDF)—a universal tool that works for all random variables.
These three functions form a complete toolkit for probabilistic reasoning.
Understanding when to use each, how they relate, and how to convert between them is essential for working with probabilistic machine learning models.
By the end of this page, you will understand the formal definitions and properties of PMFs, PDFs, and CDFs, master their interrelationships, and know how to use each effectively in ML contexts including sampling, probability computation, and model evaluation.
The CDF is the most fundamental way to describe a probability distribution—it works for discrete, continuous, and even mixed random variables.
Definition (Cumulative Distribution Function):
For any random variable $X$, the CDF is the function $F_X: \mathbb{R} \rightarrow [0, 1]$ defined by:
$$F_X(x) = P(X \leq x)$$
The CDF tells us the probability that $X$ takes a value less than or equal to $x$.
Properties of Any Valid CDF:
Every CDF must satisfy these properties (and any function satisfying them is a valid CDF):
1. Right-Continuity: $$\lim_{h \to 0^+} F(x + h) = F(x)$$
2. Monotonically Non-Decreasing: If $x_1 < x_2$, then $F(x_1) \leq F(x_2)$
3. Boundary Conditions: $$\lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to +\infty} F(x) = 1$$
These properties follow directly from the probability axioms. Property 2 reflects that $P(X \leq x_1) \leq P(X \leq x_2)$ when $x_1 < x_2$ (adding more values can only increase or maintain probability). Property 3 reflects that $P(X \leq -\infty) = 0$ (no probability below everything) and $P(X \leq \infty) = 1$ (all probability is below infinity).
| Distribution Type | CDF Behavior | Visualization |
|---|---|---|
| Discrete | Step function with jumps at each support point | Staircase pattern, flat between jumps |
| Continuous | Smooth, continuous curve | Smooth S-curve (for bounded support) |
| Mixed | Continuous with jumps at point masses | Smooth with occasional steps |
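The mixed case deserves a concrete sketch, since it gets no scipy demo of its own. The zero-inflated exponential below is a hypothetical construction (not a built-in scipy distribution): a random variable that equals 0 with probability 0.3 and is otherwise Exponential, so its CDF has a jump at 0 followed by a smooth curve.

```python
import numpy as np
from scipy import stats

# Hypothetical mixed random variable: P(X = 0) = 0.3 (point mass),
# otherwise X ~ Exponential(scale=2).
p_zero = 0.3

def mixed_cdf(x):
    """F(x) = P(X <= x) for the zero-inflated exponential sketch above."""
    x = np.asarray(x, dtype=float)
    continuous_part = (1 - p_zero) * stats.expon(scale=2).cdf(x)
    point_mass_part = p_zero * (x >= 0)  # jump of size p_zero at x = 0
    return point_mass_part + continuous_part

print(mixed_cdf(-1.0))  # 0.0  (below all the mass)
print(mixed_cdf(0.0))   # 0.3  (right-continuity: F(0) includes the jump)
print(mixed_cdf(2.0))   # 0.3 + 0.7 * (1 - e^(-1)) ≈ 0.742
```

Note that this CDF satisfies all three properties above (right-continuous, non-decreasing, with limits 0 and 1) even though it is neither a pure step function nor a smooth curve.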
```python
import numpy as np
from scipy import stats


def demonstrate_cdf_properties():
    """Demonstrate the fundamental properties of CDFs."""
    print("CDF Properties Demonstration")
    print("=" * 60)

    # Example 1: Discrete CDF (Binomial)
    print("1. Discrete CDF: Binomial(n=10, p=0.5)")
    print("-" * 60)
    binom = stats.binom(n=10, p=0.5)

    # Show step function nature
    print("Step function behavior:")
    for x in [2.0, 2.5, 2.9, 3.0, 3.1, 3.5, 4.0]:
        print(f"  F({x:.1f}) = P(X ≤ {x:.1f}) = {binom.cdf(x):.6f}")
    print("Note: F is constant between integers (steps only at support)")

    # Example 2: Continuous CDF (Normal)
    print("2. Continuous CDF: Normal(μ=0, σ=1)")
    print("-" * 60)
    normal = stats.norm(0, 1)

    # Smooth behavior (no jumps)
    print("Smooth behavior (no jumps):")
    for x in np.linspace(-2, 2, 9):
        print(f"  F({x:+5.2f}) = {normal.cdf(x):.6f}")

    # Example 3: Verify monotonicity
    print("3. Monotonicity Check")
    print("-" * 60)
    x_values = np.linspace(-3, 3, 100)
    cdf_values = normal.cdf(x_values)
    # Check that each value >= previous
    is_monotonic = np.all(np.diff(cdf_values) >= 0)
    print(f"CDF is monotonically non-decreasing: {is_monotonic}")

    # Example 4: Boundary conditions
    print("4. Boundary Conditions")
    print("-" * 60)
    print(f"lim(x → -∞) F(x) = {normal.cdf(-100):.10f} ≈ 0")
    print(f"lim(x → +∞) F(x) = {normal.cdf(100):.10f} ≈ 1")


def cdf_for_probability_computations():
    """Show how CDFs simplify probability computations."""
    print("Using CDFs for Probability Computation")
    print("=" * 60)
    normal = stats.norm(0, 1)

    print("Key formulas:")
    print("  P(X ≤ a) = F(a)")
    print("  P(X > a) = 1 - F(a)")
    print("  P(a < X ≤ b) = F(b) - F(a)")
    print("  P(a ≤ X ≤ b) = F(b) - F(a)  [for continuous X]")

    a, b = -1, 2
    print(f"Example with N(0,1), a={a}, b={b}:")
    print(f"  P(X ≤ {a}) = F({a}) = {normal.cdf(a):.6f}")
    print(f"  P(X > {b}) = 1 - F({b}) = {1 - normal.cdf(b):.6f}")
    print(f"  P({a} < X ≤ {b}) = F({b}) - F({a}) = {normal.cdf(b) - normal.cdf(a):.6f}")

    # For discrete RVs: P(X = k) = F(k) - F(k-1)
    print("For discrete RVs:")
    print("  P(X = k) = F(k) - F(k-1)")
    binom = stats.binom(10, 0.5)
    k = 5
    p_k_via_pmf = binom.pmf(k)
    p_k_via_cdf = binom.cdf(k) - binom.cdf(k - 1)
    print(f"Binomial(10, 0.5): P(X = {k})")
    print(f"  Via PMF: {p_k_via_pmf:.6f}")
    print(f"  Via CDF diff: F({k}) - F({k-1}) = {p_k_via_cdf:.6f}")


demonstrate_cdf_properties()
cdf_for_probability_computations()
```

The PMF, PDF, and CDF are intimately related. Given any one of them, you can derive the others (within a distribution type).
For Discrete Random Variables:
| From | To | Relationship |
|---|---|---|
| PMF to CDF | $F(x) = \sum_{k \leq x} p(k)$ | Sum all PMF values up to x |
| CDF to PMF | $p(k) = F(k) - F(k^-)$ | Difference at jump points |
Here $F(k^-) = \lim_{x \to k^-} F(x)$ is the left limit (value just before the jump).
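When the support consists of non-integer points, the left limit can be approximated by evaluating the CDF just below each jump. A minimal hand-rolled sketch (the two-point distribution here is hypothetical):

```python
import numpy as np

# Hypothetical discrete RV: P(X = 0.5) = 0.4, P(X = 1.5) = 0.6
support = np.array([0.5, 1.5])
pmf = np.array([0.4, 0.6])

def cdf(x):
    """F(x) = sum of PMF values over support points <= x."""
    return pmf[support <= x].sum()

# Recover the PMF as the jump size p(k) = F(k) - F(k^-),
# approximating the left limit F(k^-) by evaluating just below k.
eps = 1e-9
for k, p in zip(support, pmf):
    jump = cdf(k) - cdf(k - eps)
    print(f"p({k}) = F({k}) - F({k}^-) = {jump:.1f}")  # 0.4, then 0.6
```

For integer-supported distributions like the Binomial or Poisson, `F(k) - F(k-1)` suffices, since the nearest support point below `k` is exactly `k - 1`.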
For Continuous Random Variables:
| From | To | Relationship |
|---|---|---|
| PDF to CDF | $F(x) = \int_{-\infty}^{x} f(t)\, dt$ | Integrate PDF from −∞ |
| CDF to PDF | $f(x) = \frac{d}{dx} F(x)$ | Differentiate CDF |
The CDF is the integral of the PDF; the PDF is the derivative of the CDF.
The relationship F(x) = ∫f(t)dt and f(x) = dF/dx is exactly the Fundamental Theorem of Calculus applied to probability. This isn't a coincidence—probability theory was designed so that measure-theoretic integrals behave like familiar calculus integrals.
```python
import numpy as np
from scipy import stats, integrate


def discrete_relationships():
    """Demonstrate PMF ↔ CDF relationships for discrete RVs."""
    print("Discrete RV: PMF ↔ CDF Relationships")
    print("=" * 60)

    # Poisson distribution
    lam = 3
    poisson = stats.poisson(lam)
    print(f"Poisson(λ={lam})")

    print("Computing CDF from PMF by summation:")
    print(f"{'k':>4} {'p(k)':>10} {'F(k) from sum':>15} {'F(k) from scipy':>18}")
    print("-" * 50)
    cumulative = 0
    for k in range(10):
        pmf_k = poisson.pmf(k)
        cumulative += pmf_k
        cdf_k = poisson.cdf(k)
        print(f"{k:>4} {pmf_k:>10.6f} {cumulative:>15.6f} {cdf_k:>18.6f}")

    print("Recovering PMF from CDF:")
    print("  p(k) = F(k) - F(k-1)")
    for k in range(1, 6):
        recovered_pmf = poisson.cdf(k) - poisson.cdf(k - 1)
        actual_pmf = poisson.pmf(k)
        print(f"  p({k}) = F({k}) - F({k-1}) = {recovered_pmf:.6f} (actual: {actual_pmf:.6f})")


def continuous_relationships():
    """Demonstrate PDF ↔ CDF relationships for continuous RVs."""
    print("Continuous RV: PDF ↔ CDF Relationships")
    print("=" * 60)

    # Standard Normal
    normal = stats.norm(0, 1)
    print("Normal(0, 1)")

    print("Computing CDF from PDF by integration:")
    print(f"{'x':>6} {'f(x)':>10} {'∫f(t)dt from -∞':>20} {'F(x) from scipy':>18}")
    print("-" * 56)
    for x in [-2, -1, 0, 1, 2]:
        pdf_x = normal.pdf(x)
        # Numerical integration of the PDF up to x
        cdf_integral, _ = integrate.quad(normal.pdf, -np.inf, x)
        cdf_scipy = normal.cdf(x)
        print(f"{x:>6.1f} {pdf_x:>10.6f} {cdf_integral:>20.10f} {cdf_scipy:>18.10f}")

    print("Recovering PDF from CDF by numerical differentiation:")
    print("  f(x) ≈ [F(x+h) - F(x-h)] / (2h)")
    h = 0.0001
    for x in [-1.0, 0.0, 1.0, 2.0]:
        numerical_derivative = (normal.cdf(x + h) - normal.cdf(x - h)) / (2 * h)
        actual_pdf = normal.pdf(x)
        print(f"  f({x:+.1f}) ≈ {numerical_derivative:.6f} (actual: {actual_pdf:.6f})")


def survival_function():
    """Introduce the survival function (complement of the CDF)."""
    print("The Survival Function: S(x) = 1 - F(x) = P(X > x)")
    print("=" * 60)

    print("The survival function is critical in:")
    print("  - Reliability engineering: P(component survives past time t)")
    print("  - Medical studies: P(patient survives past time t)")
    print("  - ML: p-values, tail probabilities, outlier detection")

    normal = stats.norm(0, 1)
    print("For N(0, 1):")
    for x in [1, 1.645, 1.96, 2.576, 3]:
        survival = normal.sf(x)          # sf = survival function
        complement = 1 - normal.cdf(x)   # same value, computed the long way
        print(f"  S({x:.3f}) = P(X > {x:.3f}) = {survival:.6f} (1 - F: {complement:.6f})")

    print("Note: scipy provides both cdf(x) and sf(x) = 1 - cdf(x)")
    print("      sf(x) is more numerically stable for small tail probs")


discrete_relationships()
continuous_relationships()
survival_function()
```

The CDF answers "Given a value $x$, what's the probability of being at most $x$?" The inverse CDF (also called the quantile function) reverses this: "Given a probability $p$, what value has that probability below it?"
Definition (Quantile Function / Inverse CDF):
For a random variable $X$ with CDF $F_X$, the quantile function $Q_X: (0, 1) \rightarrow \mathbb{R}$ is:
$$Q_X(p) = \inf\{x \in \mathbb{R} : F_X(x) \geq p\}$$
The infimum handles discrete distributions where the CDF jumps past $p$ without hitting it exactly. For continuous distributions with strictly increasing CDF, this simplifies to $Q_X = F_X^{-1}$.
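To see the infimum at work, consider a Bernoulli(0.5): its CDF jumps from 0 straight to 0.5 at $x = 0$, skipping every probability level in between. A quick check with scipy's `ppf` method (its name for the quantile function):

```python
from scipy import stats

bern = stats.bernoulli(0.5)  # F(x) = 0 for x < 0; 0.5 for 0 <= x < 1; 1 for x >= 1

# No x satisfies F(x) = 0.3 exactly; the CDF jumps over it.
# Q(0.3) = inf{x : F(x) >= 0.3} = 0, which is what ppf returns.
print(bern.ppf(0.3))  # 0.0
print(bern.ppf(0.5))  # 0.0  (F(0) = 0.5 >= 0.5, so the infimum is 0)
print(bern.ppf(0.7))  # 1.0  (F(x) >= 0.7 is first reached at x = 1)
```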
Key Quantiles:
| Quantile | Probability | Name | Usage |
|---|---|---|---|
| Q(0.5) | 50% | Median | Robust central tendency |
| Q(0.25), Q(0.75) | 25%, 75% | Quartiles | Spread via IQR |
| Q(0.01), ..., Q(0.99) | 1%, ..., 99% | Percentiles | Distribution description |
| Q(0.025), Q(0.975) | 2.5%, 97.5% | — | 95% confidence intervals |
The Quantile-Probability Duality:
$$F(Q(p)) = p \quad \text{and} \quad Q(F(x)) = x$$
(exactly for continuous, strictly increasing $F$; for discrete distributions these weaken to $F(Q(p)) \geq p$ and $Q(F(x)) \leq x$)
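A quick sketch of that discrete technicality: when $p$ falls inside a jump of a Binomial CDF, $F(Q(p))$ lands strictly above $p$ rather than hitting it exactly.

```python
from scipy import stats

binom = stats.binom(10, 0.5)

# p = 0.3 falls inside a jump of the CDF: F(3) ≈ 0.172 and F(4) ≈ 0.377.
p = 0.3
q = binom.ppf(p)     # Q(0.3) = smallest k with F(k) >= 0.3
print(q)             # 4.0
print(binom.cdf(q))  # ≈ 0.377: F(Q(0.3)) > 0.3, not equal
```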
```python
import numpy as np
from scipy import stats


def demonstrate_quantiles():
    """Demonstrate quantile functions and inverse transform sampling."""
    print("Quantile Functions (Inverse CDF)")
    print("=" * 60)

    # Standard Normal quantiles
    normal = stats.norm(0, 1)
    print("Standard Normal N(0, 1) Quantiles:")
    print("-" * 40)
    quantiles = [0.001, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5,
                 0.75, 0.9, 0.95, 0.975, 0.99]
    for p in quantiles:
        q = normal.ppf(p)  # ppf = percent point function = inverse CDF
        print(f"  Q({p:.3f}) = {q:+.4f}")

    # Verify: F(Q(p)) = p
    print("Verification: F(Q(p)) = p")
    for p in [0.25, 0.5, 0.75]:
        q = normal.ppf(p)
        recovered_p = normal.cdf(q)
        print(f"  F(Q({p})) = F({q:.4f}) = {recovered_p:.6f}")


def inverse_transform_sampling():
    """Demonstrate inverse transform sampling."""
    print("Inverse Transform Sampling")
    print("=" * 60)
    print("Method: To sample from distribution F,")
    print("  1. Sample U ~ Uniform(0, 1)")
    print("  2. Return Q(U) = F⁻¹(U)")

    np.random.seed(42)
    n_samples = 100000

    # Target: Exponential(λ=2)
    true_lambda = 2.0
    exponential = stats.expon(scale=1 / true_lambda)

    # Method 1: Direct sampling (scipy)
    direct_samples = exponential.rvs(n_samples)

    # Method 2: Inverse transform
    uniform_samples = np.random.uniform(0, 1, n_samples)
    inverse_transform_samples = exponential.ppf(uniform_samples)

    print(f"Target: Exponential(λ={true_lambda})")
    print(f"True mean = 1/λ = {1 / true_lambda}")
    print(f"Direct sampling mean:   {direct_samples.mean():.4f}")
    print(f"Inverse transform mean: {inverse_transform_samples.mean():.4f}")
    print(f"True variance = 1/λ² = {1 / true_lambda**2}")
    print(f"Direct sampling var:    {direct_samples.var():.4f}")
    print(f"Inverse transform var:  {inverse_transform_samples.var():.4f}")

    # For the Exponential, we can derive the inverse CDF analytically
    print("Analytical Inverse Transform for Exponential:")
    print("  CDF: F(x) = 1 - exp(-λx)")
    print("  Solving F(x) = u for x:")
    print("    u = 1 - exp(-λx)")
    print("    exp(-λx) = 1 - u")
    print("    x = -log(1-u)/λ = Q(u)")

    # Verify
    analytical_samples = -np.log(1 - uniform_samples) / true_lambda
    print(f"Analytical inverse transform mean: {analytical_samples.mean():.4f}")


def quantile_regression_intuition():
    """Explain why quantile regression matters."""
    print("Quantile Regression: Beyond the Mean")
    print("=" * 60)

    print("Why predict quantiles instead of the mean?")
    print("-" * 40)
    print("""
    Scenario: Predicting delivery times
      • Mean prediction: 5 days
      • Median prediction (Q(0.5)): 4 days
      • 90th percentile (Q(0.9)): 12 days

    A customer asking "when will it arrive?" needs different info:
      - "Most likely":         use the median (robust to outliers)
      - "Plan for worst case": use a high quantile (Q(0.9))
      - "Optimistic estimate": use a low quantile (Q(0.1))

    Mean-only models can't provide this richness!
    """)

    # Simulate heteroscedastic data
    np.random.seed(42)
    x = np.linspace(0, 10, 1000)
    # Variance increases with x (heteroscedasticity)
    y = 2 * x + 1 + np.random.normal(0, 1 + 0.5 * x, 1000)

    # Quantiles at different x values
    print("Simulated delivery time quantiles:")
    print(f"{'x (distance)':<15} {'Q(0.1)':<10} {'Q(0.5)':<10} {'Q(0.9)':<10}")
    print("-" * 45)
    for x_val in [1, 5, 10]:
        mask = np.abs(x - x_val) < 0.5
        local_y = y[mask]
        q10 = np.percentile(local_y, 10)
        q50 = np.percentile(local_y, 50)
        q90 = np.percentile(local_y, 90)
        print(f"{x_val:<15.0f} {q10:<10.2f} {q50:<10.2f} {q90:<10.2f}")

    print("Note: the spread (Q90 - Q10) increases with distance!")


demonstrate_quantiles()
inverse_transform_sampling()
quantile_regression_intuition()
```

CDFs provide a natural framework for comparing distributions—whether comparing a model's predictions to observed data, or comparing two different models.
Key Comparison Methods:
```python
import numpy as np
from scipy import stats


def ks_test_demonstration():
    """Demonstrate the Kolmogorov-Smirnov test for distribution comparison."""
    print("Kolmogorov-Smirnov Test")
    print("=" * 60)
    np.random.seed(42)

    # Test 1: Data from N(0,1) vs N(0,1) hypothesis (should match)
    print("Test 1: N(0,1) data vs N(0,1) hypothesis")
    data_1 = np.random.normal(0, 1, 500)
    stat_1, pval_1 = stats.kstest(data_1, 'norm', args=(0, 1))
    print(f"  KS statistic: {stat_1:.4f}")
    print(f"  p-value:      {pval_1:.4f}")
    print(f"  Conclusion: {'Match (p > 0.05)' if pval_1 > 0.05 else 'Mismatch'}")

    # Test 2: Data from N(0.5, 1) vs N(0,1) hypothesis (should differ)
    print("Test 2: N(0.5, 1) data vs N(0,1) hypothesis")
    data_2 = np.random.normal(0.5, 1, 500)
    stat_2, pval_2 = stats.kstest(data_2, 'norm', args=(0, 1))
    print(f"  KS statistic: {stat_2:.4f}")
    print(f"  p-value:      {pval_2:.4f}")
    print(f"  Conclusion: {'Match (p > 0.05)' if pval_2 > 0.05 else 'Mismatch detected!'}")

    # Test 3: Two-sample KS test
    print("Test 3: Two-sample KS test")
    sample_a = np.random.normal(0, 1, 300)
    sample_b = np.random.normal(0, 1.2, 300)  # Slightly different std dev
    stat_3, pval_3 = stats.ks_2samp(sample_a, sample_b)
    print("  Sample A: N(0, 1), n=300")
    print("  Sample B: N(0, 1.44), n=300  (σ = 1.2, so variance 1.44)")
    print(f"  KS statistic: {stat_3:.4f}")
    print(f"  p-value:      {pval_3:.4f}")
    print(f"  Conclusion: {'Same distribution' if pval_3 > 0.05 else 'Different distributions!'}")


def empirical_cdf():
    """Demonstrate empirical CDF construction."""
    print("Empirical CDF (ECDF)")
    print("=" * 60)

    print("The ECDF is the maximum likelihood estimator of the true CDF:")
    print("  F_n(x) = (1/n) * #{i : X_i ≤ x}")
    print("         = fraction of samples at or below x")

    np.random.seed(42)
    samples = np.random.exponential(scale=2, size=20)
    samples_sorted = np.sort(samples)
    print(f"Samples (sorted): {samples_sorted[:8].round(2)}...")

    print("ECDF evaluation:")
    for x in [0.5, 1.0, 2.0, 5.0]:
        ecdf_val = np.mean(samples <= x)
        true_cdf = stats.expon(scale=2).cdf(x)
        print(f"  F̂({x:.1f}) = {ecdf_val:.3f}, True F({x:.1f}) = {true_cdf:.3f}")

    print("Glivenko-Cantelli Theorem:")
    print("  As n → ∞, sup|F_n(x) - F(x)| → 0 almost surely")
    print("  The ECDF converges uniformly to the true CDF!")

    # Show convergence
    print("Convergence with sample size:")
    for n in [10, 100, 1000, 10000]:
        samples_n = np.random.exponential(scale=2, size=n)
        # The KS statistic measures sup|ECDF - CDF|
        ks_stat, _ = stats.kstest(samples_n, 'expon', args=(0, 2))
        print(f"  n = {n:>5}: max|F̂ - F| ≈ {ks_stat:.4f}")


def calibration_check():
    """Show how CDFs/quantiles are used for calibration checking."""
    print("Model Calibration via CDF")
    print("=" * 60)

    print("A well-calibrated probabilistic model satisfies:")
    print("  For predicted CDF F, observed y should have F(y) ~ Uniform(0,1)")
    print("This is the PIT (Probability Integral Transform)")

    np.random.seed(42)
    n = 1000

    # Scenario 1: Well-calibrated model
    # True: Y ~ N(μ, 1); model predicts N(μ, 1) — correctly specified
    true_mu = np.random.uniform(-1, 1, n)
    y_true = true_mu + np.random.normal(0, 1, n)
    pit_values = stats.norm.cdf(y_true, loc=true_mu, scale=1)

    # The PIT values should be uniform
    ks_stat, p_val = stats.kstest(pit_values, 'uniform')
    print("Well-calibrated model:")
    print(f"  PIT ~ Uniform test: KS = {ks_stat:.4f}, p = {p_val:.4f}")
    print(f"  Calibrated: {'Yes' if p_val > 0.05 else 'No'}")

    # Scenario 2: Miscalibrated (underestimated variance)
    # Model thinks σ = 0.5, but the true σ is 1
    pit_miscal = stats.norm.cdf(y_true, loc=true_mu, scale=0.5)
    ks_stat2, p_val2 = stats.kstest(pit_miscal, 'uniform')
    print("Miscalibrated model (underestimated σ):")
    print(f"  PIT ~ Uniform test: KS = {ks_stat2:.4f}, p = {p_val2:.4f}")
    print(f"  Calibrated: {'Yes' if p_val2 > 0.05 else 'No'}")


ks_test_demonstration()
empirical_cdf()
calibration_check()
```

Working with PMFs, PDFs, and CDFs in practice requires attention to numerical issues. Values can span many orders of magnitude, leading to overflow, underflow, and precision loss.
```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def log_space_computations():
    """Demonstrate log-space probability computations."""
    print("Log-Space Probability Computations")
    print("=" * 60)

    # Problem: Compute the product of many small probabilities
    print("Problem: Compute ∏ P(xᵢ) for many observations")

    # Example: 1000 samples from N(0,1); compute the joint density
    np.random.seed(42)
    samples = np.random.normal(0, 1, 1000)
    normal = stats.norm(0, 1)

    # BAD: Direct product (will underflow)
    probs = normal.pdf(samples)
    direct_product = np.prod(probs)
    print(f"  Direct product: {direct_product}")  # Will be 0.0

    # GOOD: Log-space computation
    log_probs = normal.logpdf(samples)
    log_product = np.sum(log_probs)
    print(f"  Log of product (sum of logs): {log_product:.4f}")

    # The true product is around 10^(-600), far below the
    # smallest positive double (~10^(-308))
    print(f"  This is approximately 10^{log_product / np.log(10):.0f}")
    print("  Double precision limit: ~10^(-308)")


def logsumexp_trick():
    """Demonstrate the log-sum-exp trick for stable probability addition."""
    print("Log-Sum-Exp Trick")
    print("=" * 60)

    print("Problem: Compute log(p₁ + p₂) given log(p₁), log(p₂)")
    print("Naive: log(exp(log_p1) + exp(log_p2)) → overflow/underflow")

    # Example with very small probabilities
    log_p1 = -500  # p1 ≈ 10^(-218)
    log_p2 = -501  # p2 ≈ 10^(-218)
    print(f"  log(p₁) = {log_p1}, log(p₂) = {log_p2}")

    # The naive approach fails: np.exp(-500) underflows to 0
    print(f"  np.exp({log_p1}) = {np.exp(log_p1)}")  # 0.0
    print("  So the naive approach gives log(0 + 0) = -inf")

    # Log-sum-exp trick:
    # log(exp(a) + exp(b)) = a + log(1 + exp(b - a))  [for b <= a]
    stable_result = logsumexp([log_p1, log_p2])
    print(f"  Log-sum-exp result: {stable_result:.6f}")

    # Verify analytically: log(p1 + p2) = log_p1 + log(1 + exp(log_p2 - log_p1))
    expected = log_p1 + np.log(1 + np.exp(log_p2 - log_p1))
    print(f"  Expected (analytical): {expected:.6f}")


def extreme_tail_probabilities():
    """Demonstrate handling of extreme tail probabilities."""
    print("Extreme Tail Probabilities")
    print("=" * 60)

    normal = stats.norm(0, 1)
    print("P(X > x) for large x in N(0,1):")
    print(f"{'x':>8} {'1-cdf(x)':>20} {'sf(x)':>20} {'logsf(x)':>15}")
    print("-" * 65)
    for x in [3, 5, 8, 10, 30, 37]:
        one_minus_cdf = 1 - normal.cdf(x)
        survival = normal.sf(x)
        log_survival = normal.logsf(x)
        print(f"{x:>8} {one_minus_cdf:>20.10e} {survival:>20.10e} {log_survival:>15.4f}")

    print("Note: 1 - cdf(x) loses precision for large x")
    print("      sf(x) and logsf(x) remain accurate")

    # Critical for p-value computations
    print("This matters for p-values:")
    z_score = 8.0
    print(f"  z-score = {z_score}")
    print(f"  Two-tailed p = 2 * P(|X| > {z_score})")
    print(f"  Using 1 - cdf: {2 * (1 - normal.cdf(z_score)):.2e}")
    print(f"  Using sf:      {2 * normal.sf(z_score):.2e}")
    print(f"  Using logsf:   exp(log(2) + logsf) = {np.exp(np.log(2) + normal.logsf(z_score)):.2e}")


log_space_computations()
logsumexp_trick()
extreme_tail_probabilities()
```

In ML, always work with log-likelihoods, not likelihoods. Use logpdf/logpmf instead of pdf/pmf, logsumexp for mixtures, and logsf for p-values. This isn't optional—it's required for numerical correctness in virtually all real applications.
We've established how PMFs, PDFs, and CDFs form a complete toolkit for describing and computing with probability distributions.
| Aspect | PMF p(x) | PDF f(x) | CDF F(x) |
|---|---|---|---|
| Applies to | Discrete only | Continuous only | All distributions |
| Value meaning | P(X = x) | Density (not prob) | P(X ≤ x) |
| Can exceed 1? | No | Yes | No (bounded [0,1]) |
| Sums/integrates to | 1 | 1 | Approaches 1 |
| Get probability | Sum p(x) values | Integrate f(x) | Difference F(b)-F(a) |
| Shape | Bar heights | Smooth curve | Non-decreasing curve |
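As a compact recap of the table, the three functions map directly onto scipy's `pmf`, `pdf`, and `cdf` methods:

```python
from scipy import stats

binom = stats.binom(n=10, p=0.5)   # discrete
normal = stats.norm(0, 1)          # continuous

print(binom.pmf(5))        # P(X = 5), a true probability
print(normal.pdf(0))       # density at 0 ≈ 0.3989, not a probability
print(binom.cdf(5))        # P(X <= 5): CDF works for discrete...
print(normal.cdf(1.96) - normal.cdf(-1.96))  # ...and continuous: ≈ 0.95

# A density can exceed 1: a narrow Normal(0, σ=0.1) has pdf(0) ≈ 3.99
print(stats.norm(0, 0.1).pdf(0))
```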
What's Next:
Having mastered the functions that describe distributions, we now turn to summarizing distributions with numbers: expectation and variance. These summary statistics capture the center and spread of a distribution, form the foundation of loss functions in ML, and lead to bias-variance analysis and beyond.
You now command the trinity of distribution functions: PMF, PDF, and CDF. You understand their relationships, can convert between them, and know how to handle the computational challenges they present. This foundation enables all probabilistic reasoning in machine learning.