Every probability distribution can be characterized by functions that answer different but related questions about where probability resides. In the previous pages, we introduced the Probability Mass Function (PMF) for discrete random variables and the Probability Density Function (PDF) for continuous ones. Now we introduce the Cumulative Distribution Function (CDF)—a universal tool that works for all random variables.
These three functions (PMF, PDF, and CDF) form a complete toolkit for probabilistic reasoning.
Understanding when to use each, how they relate, and how to convert between them is essential for working with probabilistic machine learning models.
By the end of this page, you will understand the formal definitions and properties of PMFs, PDFs, and CDFs, master their interrelationships, and know how to use each effectively in ML contexts including sampling, probability computation, and model evaluation.
The CDF is the most fundamental way to describe a probability distribution—it works for discrete, continuous, and even mixed random variables.
Definition (Cumulative Distribution Function):
For any random variable $X$, the CDF is the function $F_X: \mathbb{R} \rightarrow [0, 1]$ defined by:
$$F_X(x) = P(X \leq x)$$
The CDF tells us the probability that $X$ takes a value less than or equal to $x$.
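For a concrete feel before the formal properties, here is a minimal sketch (not from the original text) evaluating the CDF of a fair six-sided die at a few points; it assumes scipy is available and uses `stats.randint(1, 7)`, the discrete uniform on {1, ..., 6}:

```python
import numpy as np
from scipy import stats

# A fair six-sided die: discrete uniform on {1, ..., 6}
die = stats.randint(1, 7)  # low inclusive, high exclusive

# F(x) = P(X <= x) jumps by 1/6 at each face and is flat in between
for x in [0.5, 1, 2.5, 3, 3.7, 6, 10]:
    print(f"F({x}) = P(X <= {x}) = {die.cdf(x):.4f}")
# F(0.5) = 0, F(3) = F(3.7) = 0.5, F(6) = F(10) = 1
```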
Properties of Any Valid CDF:
Every CDF must satisfy these properties (and any function satisfying them is a valid CDF):
1. Right-Continuity: $$\lim_{h \to 0^+} F(x + h) = F(x)$$
2. Monotonically Non-Decreasing: If $x_1 < x_2$, then $F(x_1) \leq F(x_2)$
3. Boundary Conditions: $$\lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to +\infty} F(x) = 1$$
These properties follow directly from the probability axioms. Property 2 reflects that $P(X \leq x_1) \leq P(X \leq x_2)$ when $x_1 < x_2$ (adding more values can only increase or maintain probability). Property 3 reflects that $P(X \leq -\infty) = 0$ (no probability below everything) and $P(X \leq \infty) = 1$ (all probability is below infinity).
| Distribution Type | CDF Behavior | Visualization |
|---|---|---|
| Discrete | Step function with jumps at each support point | Staircase pattern, flat between jumps |
| Continuous | Smooth, continuous curve | Smooth S-curve (for bounded support) |
| Mixed | Continuous with jumps at point masses | Smooth with occasional steps |
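The mixed case in the table above is not covered by the demonstration that follows, so here is a minimal, hedged sketch of one: the rainfall-style setup (a point mass at zero plus an exponential tail) is an illustrative assumption, not from the original text. Its CDF jumps by 0.3 at $x = 0$ and rises smoothly afterwards.

```python
from scipy import stats

p_zero = 0.3                    # point mass at 0 (e.g., a dry day)
expo = stats.expon(scale=2.0)   # continuous part, used when X > 0

def mixed_cdf(x):
    """F(x) = P(X <= x) for X = 0 with prob. p_zero, else Exponential(scale=2)."""
    if x < 0:
        return 0.0
    # The jump of size p_zero at x = 0 plus the scaled continuous CDF
    return p_zero + (1 - p_zero) * expo.cdf(x)

for x in [-1.0, -0.001, 0.0, 0.5, 2.0, 10.0]:
    print(f"F({x:+.3f}) = {mixed_cdf(x):.4f}")
# Note the jump: F(-0.001) = 0 but F(0) = 0.3
```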
```python
import numpy as np
from scipy import stats


def demonstrate_cdf_properties():
    """
    Demonstrate the fundamental properties of CDFs.
    """
    print("CDF Properties Demonstration")
    print("=" * 60)

    # Example 1: Discrete CDF (Binomial)
    print("\n1. Discrete CDF: Binomial(n=10, p=0.5)")
    print("-" * 60)
    binom = stats.binom(n=10, p=0.5)

    # Show step function nature
    print("\nStep function behavior:")
    for x in [2.0, 2.5, 2.9, 3.0, 3.1, 3.5, 4.0]:
        print(f"  F({x:.1f}) = P(X ≤ {x:.1f}) = {binom.cdf(x):.6f}")
    print("\nNote: F is constant between integers (steps only at support)")

    # Example 2: Continuous CDF (Normal)
    print("\n2. Continuous CDF: Normal(μ=0, σ=1)")
    print("-" * 60)
    normal = stats.norm(0, 1)

    # Smooth behavior
    print("\nSmooth behavior (no jumps):")
    for x in np.linspace(-2, 2, 9):
        print(f"  F({x:+5.2f}) = {normal.cdf(x):.6f}")

    # Example 3: Verify monotonicity
    print("\n3. Monotonicity Check")
    print("-" * 60)
    x_values = np.linspace(-3, 3, 100)
    cdf_values = normal.cdf(x_values)
    # Check that each value >= previous
    is_monotonic = np.all(np.diff(cdf_values) >= 0)
    print(f"CDF is monotonically non-decreasing: {is_monotonic}")

    # Example 4: Boundary conditions
    print("\n4. Boundary Conditions")
    print("-" * 60)
    print(f"lim(x → -∞) F(x) = {normal.cdf(-100):.10f} ≈ 0")
    print(f"lim(x → +∞) F(x) = {normal.cdf(100):.10f} ≈ 1")


def cdf_for_probability_computations():
    """
    Show how CDFs simplify probability computations.
    """
    print("\n\nUsing CDFs for Probability Computation")
    print("=" * 60)

    normal = stats.norm(0, 1)

    print("\nKey formulas:")
    print("  P(X ≤ a) = F(a)")
    print("  P(X > a) = 1 - F(a)")
    print("  P(a < X ≤ b) = F(b) - F(a)")
    print("  P(a ≤ X ≤ b) = F(b) - F(a)   [for continuous X]")

    a, b = -1, 2
    print(f"\nExample with N(0,1), a={a}, b={b}:")
    print(f"  P(X ≤ {a}) = F({a}) = {normal.cdf(a):.6f}")
    print(f"  P(X > {b}) = 1 - F({b}) = {1 - normal.cdf(b):.6f}")
    print(f"  P({a} < X ≤ {b}) = F({b}) - F({a}) = {normal.cdf(b) - normal.cdf(a):.6f}")

    # For discrete: P(X = k) = F(k) - F(k-1)
    print("\nFor discrete RVs:")
    print("  P(X = k) = F(k) - F(k-1)")
    binom = stats.binom(10, 0.5)
    k = 5
    p_k_via_pmf = binom.pmf(k)
    p_k_via_cdf = binom.cdf(k) - binom.cdf(k - 1)
    print(f"\nBinomial(10, 0.5): P(X = {k})")
    print(f"  Via PMF:      {p_k_via_pmf:.6f}")
    print(f"  Via CDF diff: F({k}) - F({k-1}) = {p_k_via_cdf:.6f}")


demonstrate_cdf_properties()
cdf_for_probability_computations()
```

The PMF, PDF, and CDF are intimately related. Given one, you can derive the others (within a distribution type).
For Discrete Random Variables:
| Conversion | Formula | Interpretation |
|---|---|---|
| PMF to CDF | $F(x) = \sum_{k \leq x} p(k)$ | Sum all PMF values up to x |
| CDF to PMF | $p(k) = F(k) - F(k^-)$ | Difference at jump points |
Here $F(k^-) = \lim_{x \to k^-} F(x)$ is the left limit (value just before the jump).
For Continuous Random Variables:
| Conversion | Formula | Interpretation |
|---|---|---|
| PDF to CDF | $F(x) = \int_{-\infty}^{x} f(t) \, dt$ | Integrate PDF from $-\infty$ |
| CDF to PDF | $f(x) = \frac{d}{dx} F(x)$ | Differentiate CDF |
The CDF is the integral of the PDF; the PDF is the derivative of the CDF.
The relationships $F(x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = dF/dx$ are exactly the Fundamental Theorem of Calculus applied to probability. This isn't a coincidence—probability theory was designed so that measure-theoretic integrals behave like familiar calculus integrals.
```python
import numpy as np
from scipy import stats, integrate


def discrete_relationships():
    """
    Demonstrate PMF ↔ CDF relationships for discrete RVs.
    """
    print("Discrete RV: PMF ↔ CDF Relationships")
    print("=" * 60)

    # Poisson distribution
    lam = 3
    poisson = stats.poisson(lam)

    print(f"\nPoisson(λ={lam})")
    print("\nComputing CDF from PMF by summation:")
    print(f"{'k':>4} {'p(k)':>10} {'F(k) from sum':>15} {'F(k) from scipy':>18}")
    print("-" * 50)

    cumulative = 0
    for k in range(10):
        pmf_k = poisson.pmf(k)
        cumulative += pmf_k
        cdf_k = poisson.cdf(k)
        print(f"{k:>4} {pmf_k:>10.6f} {cumulative:>15.6f} {cdf_k:>18.6f}")

    print("\nRecovering PMF from CDF:")
    print("  p(k) = F(k) - F(k-1)")
    for k in range(1, 6):
        recovered_pmf = poisson.cdf(k) - poisson.cdf(k - 1)
        actual_pmf = poisson.pmf(k)
        print(f"  p({k}) = F({k}) - F({k-1}) = {recovered_pmf:.6f} (actual: {actual_pmf:.6f})")


def continuous_relationships():
    """
    Demonstrate PDF ↔ CDF relationships for continuous RVs.
    """
    print("\n\nContinuous RV: PDF ↔ CDF Relationships")
    print("=" * 60)

    # Standard Normal
    normal = stats.norm(0, 1)

    print("\nNormal(0, 1)")
    print("\nComputing CDF from PDF by integration:")
    print(f"{'x':>6} {'f(x)':>10} {'∫f(t)dt from -∞':>20} {'F(x) from scipy':>18}")
    print("-" * 56)

    for x in [-2, -1, 0, 1, 2]:
        pdf_x = normal.pdf(x)
        # Numerical integration
        cdf_integral, _ = integrate.quad(normal.pdf, -np.inf, x)
        cdf_scipy = normal.cdf(x)
        print(f"{x:>6.1f} {pdf_x:>10.6f} {cdf_integral:>20.10f} {cdf_scipy:>18.10f}")

    print("\nRecovering PDF from CDF by numerical differentiation:")
    print("  f(x) ≈ [F(x+h) - F(x-h)] / (2h)")
    h = 0.0001
    for x in [-1.0, 0.0, 1.0, 2.0]:
        numerical_derivative = (normal.cdf(x + h) - normal.cdf(x - h)) / (2 * h)
        actual_pdf = normal.pdf(x)
        print(f"  f({x:+.1f}) ≈ {numerical_derivative:.6f} (actual: {actual_pdf:.6f})")


def survival_function():
    """
    Introduce the survival function (complement of CDF).
    """
    print("\n\nThe Survival Function: S(x) = 1 - F(x) = P(X > x)")
    print("=" * 60)

    print("\nThe survival function is critical in:")
    print("  - Reliability engineering: P(component survives past time t)")
    print("  - Medical studies: P(patient survives past time t)")
    print("  - ML: p-values, tail probabilities, outlier detection")

    normal = stats.norm(0, 1)
    print("\nFor N(0, 1):")
    for x in [1, 1.645, 1.96, 2.576, 3]:
        survival = normal.sf(x)  # sf = survival function
        complement = 1 - normal.cdf(x)
        print(f"  S({x:.3f}) = P(X > {x:.3f}) = {survival:.6f}")

    print("\nNote: scipy provides both cdf(x) and sf(x) = 1 - cdf(x)")
    print("      sf(x) is more numerically stable for small tail probs")


discrete_relationships()
continuous_relationships()
survival_function()
```

The CDF answers 'Given a value $x$, what's the probability of being at most $x$?' The inverse CDF (also called the quantile function) reverses this: 'Given a probability $p$, what value has that probability below it?'
Definition (Quantile Function / Inverse CDF):
For a random variable $X$ with CDF $F_X$, the quantile function $Q_X: (0, 1) \rightarrow \mathbb{R}$ is:
$$Q_X(p) = \inf\{x \in \mathbb{R} : F_X(x) \geq p\}$$
The infimum handles discrete distributions where the CDF jumps past $p$ without hitting it exactly. For continuous distributions with strictly increasing CDF, this simplifies to $Q_X = F_X^{-1}$.
Key Quantiles:
| Quantile | Probability | Name | Usage |
|---|---|---|---|
| Q(0.5) | 50% | Median | Robust central tendency |
| Q(0.25), Q(0.75) | 25%, 75% | Quartiles | Spread via IQR |
| Q(0.01), ..., Q(0.99) | 1%, ..., 99% | Percentiles | Distribution description |
| Q(0.025), Q(0.975) | 2.5%, 97.5% | — | 95% confidence intervals |
The Quantile-Probability Duality:
$$F(Q(p)) = p \quad \text{and} \quad Q(F(x)) = x$$
(with some technicalities for discrete cases)
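To see what those technicalities look like, here is a minimal sketch (the Poisson(3) choice is illustrative, not from the original text): scipy's `ppf` returns the smallest support point whose CDF reaches $p$, so for discrete distributions $F(Q(p)) \geq p$ rather than exactly $p$.

```python
from scipy import stats

poisson = stats.poisson(3)

# F(Q(p)) overshoots p because the CDF jumps past it
for p in [0.1, 0.25, 0.5, 0.9]:
    q = poisson.ppf(p)  # smallest integer k with F(k) >= p
    print(f"Q({p}) = {q:.0f}, F(Q({p})) = {poisson.cdf(q):.4f}  (>= {p})")

# In the other direction, Q(F(k)) should recover k at support points
# (up to floating-point rounding)
for k in [1, 3, 5]:
    print(f"Q(F({k})) = {poisson.ppf(poisson.cdf(k)):.0f}")
```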
```python
import numpy as np
from scipy import stats


def demonstrate_quantiles():
    """
    Demonstrate quantile functions and inverse transform sampling.
    """
    print("Quantile Functions (Inverse CDF)")
    print("=" * 60)

    # Standard Normal quantiles
    normal = stats.norm(0, 1)

    print("\nStandard Normal N(0, 1) Quantiles:")
    print("-" * 40)
    quantiles = [0.001, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975, 0.99]
    for p in quantiles:
        q = normal.ppf(p)  # ppf = percent point function = inverse CDF
        print(f"  Q({p:.3f}) = {q:+.4f}")

    # Verify: F(Q(p)) = p
    print("\nVerification: F(Q(p)) = p")
    for p in [0.25, 0.5, 0.75]:
        q = normal.ppf(p)
        recovered_p = normal.cdf(q)
        print(f"  F(Q({p})) = F({q:.4f}) = {recovered_p:.6f}")


def inverse_transform_sampling():
    """
    Demonstrate inverse transform sampling.
    """
    print("\n\nInverse Transform Sampling")
    print("=" * 60)

    print("\nMethod: To sample from distribution F,")
    print("  1. Sample U ~ Uniform(0, 1)")
    print("  2. Return Q(U) = F⁻¹(U)")

    np.random.seed(42)
    n_samples = 100000

    # Target: Exponential(λ=2)
    true_lambda = 2.0
    exponential = stats.expon(scale=1/true_lambda)

    # Method 1: Direct sampling (scipy)
    direct_samples = exponential.rvs(n_samples)

    # Method 2: Inverse transform
    uniform_samples = np.random.uniform(0, 1, n_samples)
    inverse_transform_samples = exponential.ppf(uniform_samples)

    print(f"\nTarget: Exponential(λ={true_lambda})")
    print(f"True mean = 1/λ = {1/true_lambda}")
    print(f"\nDirect sampling mean:   {direct_samples.mean():.4f}")
    print(f"Inverse transform mean: {inverse_transform_samples.mean():.4f}")
    print(f"\nTrue variance = 1/λ² = {1/true_lambda**2}")
    print(f"Direct sampling var:    {direct_samples.var():.4f}")
    print(f"Inverse transform var:  {inverse_transform_samples.var():.4f}")

    # For Exponential, we can derive the inverse CDF analytically
    print("\n\nAnalytical Inverse Transform for Exponential:")
    print("  CDF: F(x) = 1 - exp(-λx)")
    print("  Solving F(x) = u for x:")
    print("    u = 1 - exp(-λx)")
    print("    exp(-λx) = 1 - u")
    print("    x = -log(1-u)/λ = Q(u)")

    # Verify
    analytical_samples = -np.log(1 - uniform_samples) / true_lambda
    print(f"\nAnalytical inverse transform mean: {analytical_samples.mean():.4f}")


def quantile_regression_intuition():
    """
    Explain why quantile regression matters.
    """
    print("\n\nQuantile Regression: Beyond the Mean")
    print("=" * 60)

    print("\nWhy predict quantiles instead of mean?")
    print("-" * 40)
    print("""
    Scenario: Predicting delivery times
      • Mean prediction: 5 days
      • Median prediction (Q(0.5)): 4 days
      • 90th percentile (Q(0.9)): 12 days

    A customer asking "when will it arrive?" needs different info:
      - "Most likely": Use median (robust to outliers)
      - "Plan for worst case": Use high quantile (Q(0.9))
      - "Optimistic estimate": Use low quantile (Q(0.1))

    Mean-only models can't provide this richness!
    """)

    # Simulate heteroscedastic data
    np.random.seed(42)
    x = np.linspace(0, 10, 1000)
    # Variance increases with x (heteroscedasticity)
    y = 2*x + 1 + np.random.normal(0, 1 + 0.5*x, 1000)

    # Quantiles at different x values
    print("Simulated delivery time quantiles:")
    print(f"{'x (distance)':<15} {'Q(0.1)':<10} {'Q(0.5)':<10} {'Q(0.9)':<10}")
    print("-" * 45)
    for x_val in [1, 5, 10]:
        mask = np.abs(x - x_val) < 0.5
        local_y = y[mask]
        q10 = np.percentile(local_y, 10)
        q50 = np.percentile(local_y, 50)
        q90 = np.percentile(local_y, 90)
        print(f"{x_val:<15.0f} {q10:<10.2f} {q50:<10.2f} {q90:<10.2f}")

    print("\nNote: Spread (Q90 - Q10) increases with distance!")


demonstrate_quantiles()
inverse_transform_sampling()
quantile_regression_intuition()
```

CDFs provide a natural framework for comparing distributions—whether comparing a model's predictions to observed data, or comparing two different models.
Key comparison methods include the Kolmogorov-Smirnov (KS) test, the empirical CDF (ECDF), and calibration checks via the probability integral transform (PIT), all demonstrated below.
```python
import numpy as np
from scipy import stats


def ks_test_demonstration():
    """
    Demonstrate the Kolmogorov-Smirnov test for distribution comparison.
    """
    print("Kolmogorov-Smirnov Test")
    print("=" * 60)

    np.random.seed(42)

    # Test 1: Data from N(0,1) vs N(0,1) hypothesis (should match)
    print("\nTest 1: N(0,1) data vs N(0,1) hypothesis")
    data_1 = np.random.normal(0, 1, 500)
    stat_1, pval_1 = stats.kstest(data_1, 'norm', args=(0, 1))
    print(f"  KS statistic: {stat_1:.4f}")
    print(f"  p-value:      {pval_1:.4f}")
    print(f"  Conclusion: {'Match (p > 0.05)' if pval_1 > 0.05 else 'Mismatch'}")

    # Test 2: Data from N(0.5, 1) vs N(0,1) hypothesis (should differ)
    print("\nTest 2: N(0.5, 1) data vs N(0,1) hypothesis")
    data_2 = np.random.normal(0.5, 1, 500)
    stat_2, pval_2 = stats.kstest(data_2, 'norm', args=(0, 1))
    print(f"  KS statistic: {stat_2:.4f}")
    print(f"  p-value:      {pval_2:.4f}")
    print(f"  Conclusion: {'Match (p > 0.05)' if pval_2 > 0.05 else 'Mismatch detected!'}")

    # Test 3: Two-sample KS test
    print("\nTest 3: Two-sample KS test")
    sample_a = np.random.normal(0, 1, 300)
    sample_b = np.random.normal(0, 1.2, 300)  # Slightly larger spread (σ = 1.2)
    stat_3, pval_3 = stats.ks_2samp(sample_a, sample_b)
    print("  Sample A: N(0, 1),    n=300")
    print("  Sample B: N(0, 1.2²), n=300")
    print(f"  KS statistic: {stat_3:.4f}")
    print(f"  p-value:      {pval_3:.4f}")
    print(f"  Conclusion: {'Same distribution' if pval_3 > 0.05 else 'Different distributions!'}")


def empirical_cdf():
    """
    Demonstrate empirical CDF construction.
    """
    print("\n\nEmpirical CDF (ECDF)")
    print("=" * 60)

    print("\nThe ECDF is the nonparametric maximum likelihood estimator of the true CDF:")
    print("  F_n(x) = (1/n) * #{i : X_i ≤ x}")
    print("         = Fraction of samples at or below x")

    np.random.seed(42)
    samples = np.random.exponential(scale=2, size=20)
    samples_sorted = np.sort(samples)

    print(f"\nSamples (sorted): {samples_sorted[:8].round(2)}...")

    print("\nECDF evaluation:")
    for x in [0.5, 1.0, 2.0, 5.0]:
        ecdf_val = np.mean(samples <= x)
        true_cdf = stats.expon(scale=2).cdf(x)
        print(f"  F̂({x:.1f}) = {ecdf_val:.3f}, True F({x:.1f}) = {true_cdf:.3f}")

    print("\nGlivenko-Cantelli Theorem:")
    print("  As n → ∞, sup|F_n(x) - F(x)| → 0 almost surely")
    print("  The ECDF converges uniformly to the true CDF!")

    # Show convergence
    print("\nConvergence with sample size:")
    for n in [10, 100, 1000, 10000]:
        samples_n = np.random.exponential(scale=2, size=n)
        # KS statistic measures sup|ECDF - CDF|
        ks_stat, _ = stats.kstest(samples_n, 'expon', args=(0, 2))
        print(f"  n = {n:>5}: max|F̂ - F| ≈ {ks_stat:.4f}")


def calibration_check():
    """
    Show how CDF/quantiles are used for calibration checking.
    """
    print("\n\nModel Calibration via CDF")
    print("=" * 60)

    print("\nA well-calibrated probabilistic model satisfies:")
    print("  For predicted CDF F, observed y should have F(y) ~ Uniform(0,1)")
    print("\nThis is the 'PIT' (Probability Integral Transform)")

    np.random.seed(42)
    n = 1000

    # Scenario 1: Well-calibrated model
    # True: Y ~ N(μ, 1), Model predicts N(μ, 1)
    true_mu = np.random.uniform(-1, 1, n)
    y_true = true_mu + np.random.normal(0, 1, n)

    # Model predicts N(true_mu, 1) - correctly specified
    pit_values = stats.norm.cdf(y_true, loc=true_mu, scale=1)

    # PIT should be uniform
    ks_stat, p_val = stats.kstest(pit_values, 'uniform')
    print("\nWell-calibrated model:")
    print(f"  PIT ~ Uniform test: KS = {ks_stat:.4f}, p = {p_val:.4f}")
    print(f"  Calibrated: {'Yes' if p_val > 0.05 else 'No'}")

    # Scenario 2: Miscalibrated (underestimated variance)
    # Model thinks σ is 0.5, but the true σ is 1
    pit_miscal = stats.norm.cdf(y_true, loc=true_mu, scale=0.5)
    ks_stat2, p_val2 = stats.kstest(pit_miscal, 'uniform')
    print("\nMiscalibrated model (underestimated σ):")
    print(f"  PIT ~ Uniform test: KS = {ks_stat2:.4f}, p = {p_val2:.4f}")
    print(f"  Calibrated: {'Yes' if p_val2 > 0.05 else 'No'}")


ks_test_demonstration()
empirical_cdf()
calibration_check()
```

Working with PMFs, PDFs, and CDFs in practice requires attention to numerical issues. Values can span many orders of magnitude, leading to overflow, underflow, and precision loss.
```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def log_space_computations():
    """
    Demonstrate log-space probability computations.
    """
    print("Log-Space Probability Computations")
    print("=" * 60)

    # Problem: Compute product of many small probabilities
    print("\nProblem: Compute ∏ P(xᵢ) for many observations")

    # Example: 1000 samples from N(0,1), compute joint probability
    np.random.seed(42)
    samples = np.random.normal(0, 1, 1000)
    normal = stats.norm(0, 1)

    # BAD: Direct product (will underflow)
    probs = normal.pdf(samples)
    direct_product = np.prod(probs)
    print(f"\n  Direct product: {direct_product}")  # Will be 0.0

    # GOOD: Log-space computation
    log_probs = normal.logpdf(samples)
    log_product = np.sum(log_probs)
    print(f"  Log of product (sum of logs): {log_product:.4f}")

    # The actual product is exp(-1416) ≈ 10^(-615), way smaller than 10^(-308)
    print(f"  This is approximately 10^{log_product/np.log(10):.0f}")
    print("  Double precision limit: ~10^(-308)")


def logsumexp_trick():
    """
    Demonstrate the log-sum-exp trick for stable probability addition.
    """
    print("\n\nLog-Sum-Exp Trick")
    print("=" * 60)

    print("\nProblem: Compute log(p₁ + p₂) given log(p₁), log(p₂)")
    print("Naive: log(exp(log_p1) + exp(log_p2)) → overflow/underflow")

    # Example with very small probabilities
    log_p1 = -500  # p1 ≈ 10^(-217)
    log_p2 = -501  # p2 ≈ 10^(-218)

    print(f"\n  log(p₁) = {log_p1}, log(p₂) = {log_p2}")

    # Naive approach fails: np.exp(-500) underflows to 0, so log(0 + 0) = -inf
    with np.errstate(divide='ignore'):
        naive_result = np.log(np.exp(log_p1) + np.exp(log_p2))
    print(f"  Naive result: {naive_result}")
    print(f"  np.exp({log_p1}) = {np.exp(log_p1)}")  # 0.0 due to underflow
    print("  So naive gives: log(0 + 0) = -inf")

    # Log-sum-exp trick
    # log(exp(a) + exp(b)) = a + log(1 + exp(b-a))   [if b < a]
    stable_result = logsumexp([log_p1, log_p2])
    print(f"\n  Log-sum-exp result: {stable_result:.6f}")

    # Verify: log(p1 + p2) = log(p1 * (1 + p2/p1)) = log_p1 + log(1 + e^(-1))
    expected = log_p1 + np.log(1 + np.exp(log_p2 - log_p1))
    print(f"  Expected (analytical): {expected:.6f}")


def extreme_tail_probabilities():
    """
    Demonstrate handling of extreme tail probabilities.
    """
    print("\n\nExtreme Tail Probabilities")
    print("=" * 60)

    normal = stats.norm(0, 1)

    print("\nP(X > x) for large x in N(0,1):")
    print(f"{'x':>8} {'1-cdf(x)':>20} {'sf(x)':>20} {'logsf(x)':>15}")
    print("-" * 65)
    for x in [3, 5, 8, 10, 30, 37]:
        one_minus_cdf = 1 - normal.cdf(x)
        survival = normal.sf(x)
        log_survival = normal.logsf(x)
        print(f"{x:>8} {one_minus_cdf:>20.10e} {survival:>20.10e} {log_survival:>15.4f}")

    print("\nNote: 1 - cdf(x) loses precision for large x")
    print("      sf(x) and logsf(x) remain accurate")

    # Critical for p-value computations
    print("\n\nThis matters for p-values:")
    z_score = 8.0
    print(f"  z-score = {z_score}")
    print(f"  Two-tailed p = 2 * P(|X| > {z_score})")
    print(f"  Using 1 - cdf: {2 * (1 - normal.cdf(z_score)):.2e}")
    print(f"  Using sf:      {2 * normal.sf(z_score):.2e}")
    print(f"  Using logsf:   exp(log(2) + logsf) = {np.exp(np.log(2) + normal.logsf(z_score)):.2e}")


log_space_computations()
logsumexp_trick()
extreme_tail_probabilities()
```

In ML, always work with log-likelihoods, not likelihoods. Use `logpdf`/`logpmf` instead of `pdf`/`pmf`. Use `logsumexp` for mixtures. Use `logsf` for p-values. This isn't optional—it's required for numerical correctness in virtually all real applications.
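As one more illustration of the "use `logsumexp` for mixtures" advice, here is a minimal sketch of a numerically stable mixture log-likelihood; the two-component Gaussian mixture and its parameters are illustrative assumptions, not from the original text.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Illustrative 2-component Gaussian mixture: 0.3*N(-2, 1) + 0.7*N(3, 0.5)
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 0.5])

rng = np.random.default_rng(0)
x = rng.normal(3.0, 0.5, size=1000)  # data, roughly matching component 2

# log p(x_i) = logsumexp_k [ log w_k + log N(x_i | mu_k, sigma_k) ]
# Broadcasting gives an (n, K) matrix of per-component log densities
log_comp = stats.norm.logpdf(x[:, None], loc=means, scale=stds) + np.log(weights)
log_px = logsumexp(log_comp, axis=1)  # stable log of the mixture density
total_loglik = log_px.sum()

print(f"Total log-likelihood: {total_loglik:.2f}")
# A naive version would exponentiate each term and risk underflow:
# naive = np.log((weights * stats.norm.pdf(x[:, None], means, stds)).sum(axis=1))
```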
We've established how PMFs, PDFs, and CDFs form a complete toolkit for describing and computing with probability distributions.
| Aspect | PMF p(x) | PDF f(x) | CDF F(x) |
|---|---|---|---|
| Applies to | Discrete only | Continuous only | All distributions |
| Value meaning | P(X = x) | Density (not prob) | P(X ≤ x) |
| Can exceed 1? | No | Yes | No (bounded [0,1]) |
| Sums/integrates to | 1 | 1 | Approaches 1 as x → ∞ |
| Get probability | Sum p(x) values | Integrate f(x) | Difference F(b)-F(a) |
| Shape | Bar heights | Smooth curve | Non-decreasing curve |
What's Next:
Having mastered the functions that describe distributions, we now turn to summarizing distributions with numbers: expectation and variance. These summary statistics capture the center and spread of a distribution, form the foundation of loss functions in ML, and lead to bias-variance analysis and beyond.
You now command the trinity of distribution functions: PMF, PDF, and CDF. You understand their relationships, can convert between them, and know how to handle the computational challenges they present. This foundation enables all probabilistic reasoning in machine learning.