Every probability distribution can be characterized by functions that answer different but related questions about where probability resides. In the previous pages, we introduced the Probability Mass Function (PMF) for discrete random variables and the Probability Density Function (PDF) for continuous ones. Now we introduce the Cumulative Distribution Function (CDF)—a universal tool that works for all random variables.
These three functions (PMF, PDF, and CDF) form a complete toolkit for probabilistic reasoning.
Understanding when to use each, how they relate, and how to convert between them is essential for working with probabilistic machine learning models.
By the end of this page, you will understand the formal definitions and properties of PMFs, PDFs, and CDFs, master their interrelationships, and know how to use each effectively in ML contexts including sampling, probability computation, and model evaluation.
The CDF is the most fundamental way to describe a probability distribution—it works for discrete, continuous, and even mixed random variables.
Definition (Cumulative Distribution Function):
For any random variable $X$, the CDF is the function $F_X: \mathbb{R} \rightarrow [0, 1]$ defined by:
$$F_X(x) = P(X \leq x)$$
The CDF tells us the probability that $X$ takes a value less than or equal to $x$.
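For a concrete feel before the formal properties, here is a minimal sketch (not from the original text) evaluating the CDF of a fair six-sided die at a few points; it assumes scipy is available and uses `stats.randint(1, 7)`, the discrete uniform on {1, ..., 6}:

```python
import numpy as np
from scipy import stats

# A fair six-sided die: discrete uniform on {1, ..., 6}
die = stats.randint(1, 7)  # low inclusive, high exclusive

# F(x) = P(X <= x) jumps by 1/6 at each face and is flat in between
for x in [0.5, 1, 2.5, 3, 3.7, 6, 10]:
    print(f"F({x}) = P(X <= {x}) = {die.cdf(x):.4f}")
# F(0.5) = 0, F(3) = F(3.7) = 0.5, F(6) = F(10) = 1
```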
Properties of Any Valid CDF:
Every CDF must satisfy these properties (and any function satisfying them is a valid CDF):
1. Right-Continuity: $$\lim_{h \to 0^+} F(x + h) = F(x)$$
2. Monotonically Non-Decreasing: If $x_1 < x_2$, then $F(x_1) \leq F(x_2)$
3. Boundary Conditions: $$\lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to +\infty} F(x) = 1$$
These properties follow directly from the probability axioms. Property 2 reflects that $P(X \leq x_1) \leq P(X \leq x_2)$ when $x_1 < x_2$ (adding more values can only increase or maintain probability). Property 3 reflects that $P(X \leq -\infty) = 0$ (no probability below everything) and $P(X \leq \infty) = 1$ (all probability is below infinity).
| Distribution Type | CDF Behavior | Visualization |
|---|---|---|
| Discrete | Step function with jumps at each support point | Staircase pattern, flat between jumps |
| Continuous | Smooth, continuous curve | Smooth S-curve (for bounded support) |
| Mixed | Continuous with jumps at point masses | Smooth with occasional steps |
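The mixed case in the table above is not covered by the demonstration that follows, so here is a minimal, hedged sketch of one: the rainfall-style setup (a point mass at zero plus an exponential tail) is an illustrative assumption, not from the original text. Its CDF jumps by 0.3 at $x = 0$ and rises smoothly afterwards.

```python
from scipy import stats

p_zero = 0.3                    # point mass at 0 (e.g., a dry day)
expo = stats.expon(scale=2.0)   # continuous part, used when X > 0

def mixed_cdf(x):
    """F(x) = P(X <= x) for X = 0 with prob. p_zero, else Exponential(scale=2)."""
    if x < 0:
        return 0.0
    # The jump of size p_zero at x = 0 plus the scaled continuous CDF
    return p_zero + (1 - p_zero) * expo.cdf(x)

for x in [-1.0, -0.001, 0.0, 0.5, 2.0, 10.0]:
    print(f"F({x:+.3f}) = {mixed_cdf(x):.4f}")
# Note the jump: F(-0.001) = 0 but F(0) = 0.3
```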
```python
import numpy as np
from scipy import stats


def demonstrate_cdf_properties():
    """
    Demonstrate the fundamental properties of CDFs.
    """
    print("CDF Properties Demonstration")
    print("=" * 60)

    # Example 1: Discrete CDF (Binomial)
    print("\n1. Discrete CDF: Binomial(n=10, p=0.5)")
    print("-" * 60)
    binom = stats.binom(n=10, p=0.5)

    # Show step function nature
    print("\nStep function behavior:")
    for x in [2.0, 2.5, 2.9, 3.0, 3.1, 3.5, 4.0]:
        print(f"  F({x:.1f}) = P(X ≤ {x:.1f}) = {binom.cdf(x):.6f}")
    print("\nNote: F is constant between integers (steps only at support)")

    # Example 2: Continuous CDF (Normal)
    print("\n2. Continuous CDF: Normal(μ=0, σ=1)")
    print("-" * 60)
    normal = stats.norm(0, 1)

    # Smooth behavior
    print("\nSmooth behavior (no jumps):")
    for x in np.linspace(-2, 2, 9):
        print(f"  F({x:+5.2f}) = {normal.cdf(x):.6f}")

    # Example 3: Verify monotonicity
    print("\n3. Monotonicity Check")
    print("-" * 60)
    x_values = np.linspace(-3, 3, 100)
    cdf_values = normal.cdf(x_values)
    # Check that each value >= previous
    is_monotonic = np.all(np.diff(cdf_values) >= 0)
    print(f"CDF is monotonically non-decreasing: {is_monotonic}")

    # Example 4: Boundary conditions
    print("\n4. Boundary Conditions")
    print("-" * 60)
    print(f"lim(x → -∞) F(x) = {normal.cdf(-100):.10f} ≈ 0")
    print(f"lim(x → +∞) F(x) = {normal.cdf(100):.10f} ≈ 1")


def cdf_for_probability_computations():
    """
    Show how CDFs simplify probability computations.
    """
    print("\n\nUsing CDFs for Probability Computation")
    print("=" * 60)

    normal = stats.norm(0, 1)

    print("\nKey formulas:")
    print("  P(X ≤ a) = F(a)")
    print("  P(X > a) = 1 - F(a)")
    print("  P(a < X ≤ b) = F(b) - F(a)")
    print("  P(a ≤ X ≤ b) = F(b) - F(a)   [for continuous X]")

    a, b = -1, 2
    print(f"\nExample with N(0,1), a={a}, b={b}:")
    print(f"  P(X ≤ {a}) = F({a}) = {normal.cdf(a):.6f}")
    print(f"  P(X > {b}) = 1 - F({b}) = {1 - normal.cdf(b):.6f}")
    print(f"  P({a} < X ≤ {b}) = F({b}) - F({a}) = {normal.cdf(b) - normal.cdf(a):.6f}")

    # For discrete: P(X = k) = F(k) - F(k-1)
    print("\nFor discrete RVs:")
    print("  P(X = k) = F(k) - F(k-1)")
    binom = stats.binom(10, 0.5)
    k = 5
    p_k_via_pmf = binom.pmf(k)
    p_k_via_cdf = binom.cdf(k) - binom.cdf(k - 1)
    print(f"\nBinomial(10, 0.5): P(X = {k})")
    print(f"  Via PMF:      {p_k_via_pmf:.6f}")
    print(f"  Via CDF diff: F({k}) - F({k-1}) = {p_k_via_cdf:.6f}")


demonstrate_cdf_properties()
cdf_for_probability_computations()
```

The PMF, PDF, and CDF are intimately related. Given one, you can derive the others (within a distribution type).
For Discrete Random Variables:
| Conversion | Formula | Interpretation |
|---|---|---|
| PMF to CDF | $F(x) = \sum_{k \leq x} p(k)$ | Sum all PMF values up to x |
| CDF to PMF | $p(k) = F(k) - F(k^-)$ | Difference at jump points |
Here $F(k^-) = \lim_{x \to k^-} F(x)$ is the left limit (value just before the jump).
For Continuous Random Variables:
| Conversion | Formula | Interpretation |
|---|---|---|
| PDF to CDF | $F(x) = \int_{-\infty}^{x} f(t) \, dt$ | Integrate PDF from $-\infty$ |
| CDF to PDF | $f(x) = \frac{d}{dx} F(x)$ | Differentiate CDF |
The CDF is the integral of the PDF; the PDF is the derivative of the CDF.
The relationships $F(x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = dF/dx$ are exactly the Fundamental Theorem of Calculus applied to probability. This isn't a coincidence—probability theory was designed so that measure-theoretic integrals behave like familiar calculus integrals.
```python
import numpy as np
from scipy import stats, integrate


def discrete_relationships():
    """
    Demonstrate PMF ↔ CDF relationships for discrete RVs.
    """
    print("Discrete RV: PMF ↔ CDF Relationships")
    print("=" * 60)

    # Poisson distribution
    lam = 3
    poisson = stats.poisson(lam)

    print(f"\nPoisson(λ={lam})")
    print("\nComputing CDF from PMF by summation:")
    print(f"{'k':>4} {'p(k)':>10} {'F(k) from sum':>15} {'F(k) from scipy':>18}")
    print("-" * 50)

    cumulative = 0
    for k in range(10):
        pmf_k = poisson.pmf(k)
        cumulative += pmf_k
        cdf_k = poisson.cdf(k)
        print(f"{k:>4} {pmf_k:>10.6f} {cumulative:>15.6f} {cdf_k:>18.6f}")

    print("\nRecovering PMF from CDF:")
    print("  p(k) = F(k) - F(k-1)")
    for k in range(1, 6):
        recovered_pmf = poisson.cdf(k) - poisson.cdf(k - 1)
        actual_pmf = poisson.pmf(k)
        print(f"  p({k}) = F({k}) - F({k-1}) = {recovered_pmf:.6f} (actual: {actual_pmf:.6f})")


def continuous_relationships():
    """
    Demonstrate PDF ↔ CDF relationships for continuous RVs.
    """
    print("\n\nContinuous RV: PDF ↔ CDF Relationships")
    print("=" * 60)

    # Standard Normal
    normal = stats.norm(0, 1)

    print("\nNormal(0, 1)")
    print("\nComputing CDF from PDF by integration:")
    print(f"{'x':>6} {'f(x)':>10} {'∫f(t)dt from -∞':>20} {'F(x) from scipy':>18}")
    print("-" * 56)

    for x in [-2, -1, 0, 1, 2]:
        pdf_x = normal.pdf(x)
        # Numerical integration
        cdf_integral, _ = integrate.quad(normal.pdf, -np.inf, x)
        cdf_scipy = normal.cdf(x)
        print(f"{x:>6.1f} {pdf_x:>10.6f} {cdf_integral:>20.10f} {cdf_scipy:>18.10f}")

    print("\nRecovering PDF from CDF by numerical differentiation:")
    print("  f(x) ≈ [F(x+h) - F(x-h)] / (2h)")
    h = 0.0001
    for x in [-1.0, 0.0, 1.0, 2.0]:
        numerical_derivative = (normal.cdf(x + h) - normal.cdf(x - h)) / (2 * h)
        actual_pdf = normal.pdf(x)
        print(f"  f({x:+.1f}) ≈ {numerical_derivative:.6f} (actual: {actual_pdf:.6f})")


def survival_function():
    """
    Introduce the survival function (complement of CDF).
    """
    print("\n\nThe Survival Function: S(x) = 1 - F(x) = P(X > x)")
    print("=" * 60)

    print("\nThe survival function is critical in:")
    print("  - Reliability engineering: P(component survives past time t)")
    print("  - Medical studies: P(patient survives past time t)")
    print("  - ML: p-values, tail probabilities, outlier detection")

    normal = stats.norm(0, 1)
    print("\nFor N(0, 1):")
    for x in [1, 1.645, 1.96, 2.576, 3]:
        survival = normal.sf(x)  # sf = survival function
        complement = 1 - normal.cdf(x)
        print(f"  S({x:.3f}) = P(X > {x:.3f}) = {survival:.6f}")

    print("\nNote: scipy provides both cdf(x) and sf(x) = 1 - cdf(x)")
    print("      sf(x) is more numerically stable for small tail probs")


discrete_relationships()
continuous_relationships()
survival_function()
```

The CDF answers 'Given a value $x$, what's the probability of being at most $x$?' The inverse CDF (also called the quantile function) reverses this: 'Given a probability $p$, what value has that probability below it?'
Definition (Quantile Function / Inverse CDF):
For a random variable $X$ with CDF $F_X$, the quantile function $Q_X: (0, 1) \rightarrow \mathbb{R}$ is:
$$Q_X(p) = \inf\{x \in \mathbb{R} : F_X(x) \geq p\}$$
The infimum handles discrete distributions where the CDF jumps past $p$ without hitting it exactly. For continuous distributions with strictly increasing CDF, this simplifies to $Q_X = F_X^{-1}$.
Key Quantiles:
| Quantile | Probability | Name | Usage |
|---|---|---|---|
| Q(0.5) | 50% | Median | Robust central tendency |
| Q(0.25), Q(0.75) | 25%, 75% | Quartiles | Spread via IQR |
| Q(0.01), ..., Q(0.99) | 1%, ..., 99% | Percentiles | Distribution description |
| Q(0.025), Q(0.975) | 2.5%, 97.5% | — | 95% confidence intervals |
The Quantile-Probability Duality:
$$F(Q(p)) = p \quad \text{and} \quad Q(F(x)) = x$$
(with some technicalities for discrete cases)
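To see what those technicalities look like, here is a minimal sketch (the Poisson(3) choice is illustrative, not from the original text): scipy's `ppf` returns the smallest support point whose CDF reaches $p$, so for discrete distributions $F(Q(p)) \geq p$ rather than exactly $p$.

```python
from scipy import stats

poisson = stats.poisson(3)

# F(Q(p)) overshoots p because the CDF jumps past it
for p in [0.1, 0.25, 0.5, 0.9]:
    q = poisson.ppf(p)  # smallest integer k with F(k) >= p
    print(f"Q({p}) = {q:.0f}, F(Q({p})) = {poisson.cdf(q):.4f}  (>= {p})")

# In the other direction, Q(F(k)) should recover k at support points
# (up to floating-point rounding)
for k in [1, 3, 5]:
    print(f"Q(F({k})) = {poisson.ppf(poisson.cdf(k)):.0f}")
```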
```python
import numpy as np
from scipy import stats


def demonstrate_quantiles():
    """
    Demonstrate quantile functions and inverse transform sampling.
    """
    print("Quantile Functions (Inverse CDF)")
    print("=" * 60)

    # Standard Normal quantiles
    normal = stats.norm(0, 1)

    print("\nStandard Normal N(0, 1) Quantiles:")
    print("-" * 40)
    quantiles = [0.001, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975, 0.99]
    for p in quantiles:
        q = normal.ppf(p)  # ppf = percent point function = inverse CDF
        print(f"  Q({p:.3f}) = {q:+.4f}")

    # Verify: F(Q(p)) = p
    print("\nVerification: F(Q(p)) = p")
    for p in [0.25, 0.5, 0.75]:
        q = normal.ppf(p)
        recovered_p = normal.cdf(q)
        print(f"  F(Q({p})) = F({q:.4f}) = {recovered_p:.6f}")


def inverse_transform_sampling():
    """
    Demonstrate inverse transform sampling.
    """
    print("\n\nInverse Transform Sampling")
    print("=" * 60)

    print("\nMethod: To sample from distribution F,")
    print("  1. Sample U ~ Uniform(0, 1)")
    print("  2. Return Q(U) = F⁻¹(U)")

    np.random.seed(42)
    n_samples = 100000

    # Target: Exponential(λ=2)
    true_lambda = 2.0
    exponential = stats.expon(scale=1/true_lambda)

    # Method 1: Direct sampling (scipy)
    direct_samples = exponential.rvs(n_samples)

    # Method 2: Inverse transform
    uniform_samples = np.random.uniform(0, 1, n_samples)
    inverse_transform_samples = exponential.ppf(uniform_samples)

    print(f"\nTarget: Exponential(λ={true_lambda})")
    print(f"True mean = 1/λ = {1/true_lambda}")
    print(f"\nDirect sampling mean:   {direct_samples.mean():.4f}")
    print(f"Inverse transform mean: {inverse_transform_samples.mean():.4f}")
    print(f"\nTrue variance = 1/λ² = {1/true_lambda**2}")
    print(f"Direct sampling var:    {direct_samples.var():.4f}")
    print(f"Inverse transform var:  {inverse_transform_samples.var():.4f}")

    # For Exponential, we can derive the inverse CDF analytically
    print("\n\nAnalytical Inverse Transform for Exponential:")
    print("  CDF: F(x) = 1 - exp(-λx)")
    print("  Solving F(x) = u for x:")
    print("    u = 1 - exp(-λx)")
    print("    exp(-λx) = 1 - u")
    print("    x = -log(1-u)/λ = Q(u)")

    # Verify
    analytical_samples = -np.log(1 - uniform_samples) / true_lambda
    print(f"\nAnalytical inverse transform mean: {analytical_samples.mean():.4f}")


def quantile_regression_intuition():
    """
    Explain why quantile regression matters.
    """
    print("\n\nQuantile Regression: Beyond the Mean")
    print("=" * 60)

    print("\nWhy predict quantiles instead of mean?")
    print("-" * 40)
    print("""
    Scenario: Predicting delivery times
      • Mean prediction: 5 days
      • Median prediction (Q(0.5)): 4 days
      • 90th percentile (Q(0.9)): 12 days

    A customer asking "when will it arrive?" needs different info:
      - "Most likely": Use median (robust to outliers)
      - "Plan for worst case": Use high quantile (Q(0.9))
      - "Optimistic estimate": Use low quantile (Q(0.1))

    Mean-only models can't provide this richness!
    """)

    # Simulate heteroscedastic data
    np.random.seed(42)
    x = np.linspace(0, 10, 1000)
    # Variance increases with x (heteroscedasticity)
    y = 2*x + 1 + np.random.normal(0, 1 + 0.5*x, 1000)

    # Quantiles at different x values
    print("Simulated delivery time quantiles:")
    print(f"{'x (distance)':<15} {'Q(0.1)':<10} {'Q(0.5)':<10} {'Q(0.9)':<10}")
    print("-" * 45)
    for x_val in [1, 5, 10]:
        mask = np.abs(x - x_val) < 0.5
        local_y = y[mask]
        q10 = np.percentile(local_y, 10)
        q50 = np.percentile(local_y, 50)
        q90 = np.percentile(local_y, 90)
        print(f"{x_val:<15.0f} {q10:<10.2f} {q50:<10.2f} {q90:<10.2f}")

    print("\nNote: Spread (Q90 - Q10) increases with distance!")


demonstrate_quantiles()
inverse_transform_sampling()
quantile_regression_intuition()
```

CDFs provide a natural framework for comparing distributions—whether comparing a model's predictions to observed data, or comparing two different models.
Key comparison methods include the Kolmogorov-Smirnov (KS) test, the empirical CDF (ECDF), and calibration checks via the probability integral transform (PIT), all demonstrated below.
```python
import numpy as np
from scipy import stats


def ks_test_demonstration():
    """
    Demonstrate the Kolmogorov-Smirnov test for distribution comparison.
    """
    print("Kolmogorov-Smirnov Test")
    print("=" * 60)

    np.random.seed(42)

    # Test 1: Data from N(0,1) vs N(0,1) hypothesis (should match)
    print("\nTest 1: N(0,1) data vs N(0,1) hypothesis")
    data_1 = np.random.normal(0, 1, 500)
    stat_1, pval_1 = stats.kstest(data_1, 'norm', args=(0, 1))
    print(f"  KS statistic: {stat_1:.4f}")
    print(f"  p-value:      {pval_1:.4f}")
    print(f"  Conclusion: {'Match (p > 0.05)' if pval_1 > 0.05 else 'Mismatch'}")

    # Test 2: Data from N(0.5, 1) vs N(0,1) hypothesis (should differ)
    print("\nTest 2: N(0.5, 1) data vs N(0,1) hypothesis")
    data_2 = np.random.normal(0.5, 1, 500)
    stat_2, pval_2 = stats.kstest(data_2, 'norm', args=(0, 1))
    print(f"  KS statistic: {stat_2:.4f}")
    print(f"  p-value:      {pval_2:.4f}")
    print(f"  Conclusion: {'Match (p > 0.05)' if pval_2 > 0.05 else 'Mismatch detected!'}")

    # Test 3: Two-sample KS test
    print("\nTest 3: Two-sample KS test")
    sample_a = np.random.normal(0, 1, 300)
    sample_b = np.random.normal(0, 1.2, 300)  # Slightly larger spread (σ = 1.2)
    stat_3, pval_3 = stats.ks_2samp(sample_a, sample_b)
    print("  Sample A: N(0, 1),    n=300")
    print("  Sample B: N(0, 1.2²), n=300")
    print(f"  KS statistic: {stat_3:.4f}")
    print(f"  p-value:      {pval_3:.4f}")
    print(f"  Conclusion: {'Same distribution' if pval_3 > 0.05 else 'Different distributions!'}")


def empirical_cdf():
    """
    Demonstrate empirical CDF construction.
    """
    print("\n\nEmpirical CDF (ECDF)")
    print("=" * 60)

    print("\nThe ECDF is the nonparametric maximum likelihood estimator of the true CDF:")
    print("  F_n(x) = (1/n) * #{i : X_i ≤ x}")
    print("         = Fraction of samples at or below x")

    np.random.seed(42)
    samples = np.random.exponential(scale=2, size=20)
    samples_sorted = np.sort(samples)

    print(f"\nSamples (sorted): {samples_sorted[:8].round(2)}...")

    print("\nECDF evaluation:")
    for x in [0.5, 1.0, 2.0, 5.0]:
        ecdf_val = np.mean(samples <= x)
        true_cdf = stats.expon(scale=2).cdf(x)
        print(f"  F̂({x:.1f}) = {ecdf_val:.3f}, True F({x:.1f}) = {true_cdf:.3f}")

    print("\nGlivenko-Cantelli Theorem:")
    print("  As n → ∞, sup|F_n(x) - F(x)| → 0 almost surely")
    print("  The ECDF converges uniformly to the true CDF!")

    # Show convergence
    print("\nConvergence with sample size:")
    for n in [10, 100, 1000, 10000]:
        samples_n = np.random.exponential(scale=2, size=n)
        # KS statistic measures sup|ECDF - CDF|
        ks_stat, _ = stats.kstest(samples_n, 'expon', args=(0, 2))
        print(f"  n = {n:>5}: max|F̂ - F| ≈ {ks_stat:.4f}")


def calibration_check():
    """
    Show how CDF/quantiles are used for calibration checking.
    """
    print("\n\nModel Calibration via CDF")
    print("=" * 60)

    print("\nA well-calibrated probabilistic model satisfies:")
    print("  For predicted CDF F, observed y should have F(y) ~ Uniform(0,1)")
    print("\nThis is the 'PIT' (Probability Integral Transform)")

    np.random.seed(42)
    n = 1000

    # Scenario 1: Well-calibrated model
    # True: Y ~ N(μ, 1), Model predicts N(μ, 1)
    true_mu = np.random.uniform(-1, 1, n)
    y_true = true_mu + np.random.normal(0, 1, n)

    # Model predicts N(true_mu, 1) - correctly specified
    pit_values = stats.norm.cdf(y_true, loc=true_mu, scale=1)

    # PIT should be uniform
    ks_stat, p_val = stats.kstest(pit_values, 'uniform')
    print("\nWell-calibrated model:")
    print(f"  PIT ~ Uniform test: KS = {ks_stat:.4f}, p = {p_val:.4f}")
    print(f"  Calibrated: {'Yes' if p_val > 0.05 else 'No'}")

    # Scenario 2: Miscalibrated (underestimated variance)
    # Model thinks σ is 0.5, but the true σ is 1
    pit_miscal = stats.norm.cdf(y_true, loc=true_mu, scale=0.5)
    ks_stat2, p_val2 = stats.kstest(pit_miscal, 'uniform')
    print("\nMiscalibrated model (underestimated σ):")
    print(f"  PIT ~ Uniform test: KS = {ks_stat2:.4f}, p = {p_val2:.4f}")
    print(f"  Calibrated: {'Yes' if p_val2 > 0.05 else 'No'}")


ks_test_demonstration()
empirical_cdf()
calibration_check()
```

Working with PMFs, PDFs, and CDFs in practice requires attention to numerical issues. Values can span many orders of magnitude, leading to overflow, underflow, and precision loss.
```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp


def log_space_computations():
    """
    Demonstrate log-space probability computations.
    """
    print("Log-Space Probability Computations")
    print("=" * 60)

    # Problem: Compute product of many small probabilities
    print("\nProblem: Compute ∏ P(xᵢ) for many observations")

    # Example: 1000 samples from N(0,1), compute joint probability
    np.random.seed(42)
    samples = np.random.normal(0, 1, 1000)
    normal = stats.norm(0, 1)

    # BAD: Direct product (will underflow)
    probs = normal.pdf(samples)
    direct_product = np.prod(probs)
    print(f"\n  Direct product: {direct_product}")  # Will be 0.0

    # GOOD: Log-space computation
    log_probs = normal.logpdf(samples)
    log_product = np.sum(log_probs)
    print(f"  Log of product (sum of logs): {log_product:.4f}")

    # The actual product is exp(-1416) ≈ 10^(-615), way smaller than 10^(-308)
    print(f"  This is approximately 10^{log_product/np.log(10):.0f}")
    print("  Double precision limit: ~10^(-308)")


def logsumexp_trick():
    """
    Demonstrate the log-sum-exp trick for stable probability addition.
    """
    print("\n\nLog-Sum-Exp Trick")
    print("=" * 60)

    print("\nProblem: Compute log(p₁ + p₂) given log(p₁), log(p₂)")
    print("Naive: log(exp(log_p1) + exp(log_p2)) → overflow/underflow")

    # Example with very small probabilities
    log_p1 = -500  # p1 ≈ 10^(-217)
    log_p2 = -501  # p2 ≈ 10^(-218)

    print(f"\n  log(p₁) = {log_p1}, log(p₂) = {log_p2}")

    # Naive approach fails: np.exp(-500) underflows to 0, so log(0 + 0) = -inf
    with np.errstate(divide='ignore'):
        naive_result = np.log(np.exp(log_p1) + np.exp(log_p2))
    print(f"  Naive result: {naive_result}")
    print(f"  np.exp({log_p1}) = {np.exp(log_p1)}")  # 0.0 due to underflow
    print("  So naive gives: log(0 + 0) = -inf")

    # Log-sum-exp trick
    # log(exp(a) + exp(b)) = a + log(1 + exp(b-a))   [if b < a]
    stable_result = logsumexp([log_p1, log_p2])
    print(f"\n  Log-sum-exp result: {stable_result:.6f}")

    # Verify: log(p1 + p2) = log(p1 * (1 + p2/p1)) = log_p1 + log(1 + e^(-1))
    expected = log_p1 + np.log(1 + np.exp(log_p2 - log_p1))
    print(f"  Expected (analytical): {expected:.6f}")


def extreme_tail_probabilities():
    """
    Demonstrate handling of extreme tail probabilities.
    """
    print("\n\nExtreme Tail Probabilities")
    print("=" * 60)

    normal = stats.norm(0, 1)

    print("\nP(X > x) for large x in N(0,1):")
    print(f"{'x':>8} {'1-cdf(x)':>20} {'sf(x)':>20} {'logsf(x)':>15}")
    print("-" * 65)
    for x in [3, 5, 8, 10, 30, 37]:
        one_minus_cdf = 1 - normal.cdf(x)
        survival = normal.sf(x)
        log_survival = normal.logsf(x)
        print(f"{x:>8} {one_minus_cdf:>20.10e} {survival:>20.10e} {log_survival:>15.4f}")

    print("\nNote: 1 - cdf(x) loses precision for large x")
    print("      sf(x) and logsf(x) remain accurate")

    # Critical for p-value computations
    print("\n\nThis matters for p-values:")
    z_score = 8.0
    print(f"  z-score = {z_score}")
    print(f"  Two-tailed p = 2 * P(|X| > {z_score})")
    print(f"  Using 1 - cdf: {2 * (1 - normal.cdf(z_score)):.2e}")
    print(f"  Using sf:      {2 * normal.sf(z_score):.2e}")
    print(f"  Using logsf:   exp(log(2) + logsf) = {np.exp(np.log(2) + normal.logsf(z_score)):.2e}")


log_space_computations()
logsumexp_trick()
extreme_tail_probabilities()
```

In ML, always work with log-likelihoods, not likelihoods. Use `logpdf`/`logpmf` instead of `pdf`/`pmf`. Use `logsumexp` for mixtures. Use `logsf` for p-values. This isn't optional—it's required for numerical correctness in virtually all real applications.
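As one more illustration of the "use `logsumexp` for mixtures" advice, here is a minimal sketch of a numerically stable mixture log-likelihood; the two-component Gaussian mixture and its parameters are illustrative assumptions, not from the original text.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Illustrative 2-component Gaussian mixture: 0.3*N(-2, 1) + 0.7*N(3, 0.5)
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 0.5])

rng = np.random.default_rng(0)
x = rng.normal(3.0, 0.5, size=1000)  # data, roughly matching component 2

# log p(x_i) = logsumexp_k [ log w_k + log N(x_i | mu_k, sigma_k) ]
# Broadcasting gives an (n, K) matrix of per-component log densities
log_comp = stats.norm.logpdf(x[:, None], loc=means, scale=stds) + np.log(weights)
log_px = logsumexp(log_comp, axis=1)  # stable log of the mixture density
total_loglik = log_px.sum()

print(f"Total log-likelihood: {total_loglik:.2f}")
# A naive version would exponentiate each term and risk underflow:
# naive = np.log((weights * stats.norm.pdf(x[:, None], means, stds)).sum(axis=1))
```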
We've established how PMFs, PDFs, and CDFs form a complete toolkit for describing and computing with probability distributions.
| Aspect | PMF p(x) | PDF f(x) | CDF F(x) |
|---|---|---|---|
| Applies to | Discrete only | Continuous only | All distributions |
| Value meaning | P(X = x) | Density (not prob) | P(X ≤ x) |
| Can exceed 1? | No | Yes | No (bounded [0,1]) |
| Sums/integrates to | 1 | 1 | Approaches 1 as x → ∞ |
| Get probability | Sum p(x) values | Integrate f(x) | Difference F(b)-F(a) |
| Shape | Bar heights | Smooth curve | Non-decreasing curve |
What's Next:
Having mastered the functions that describe distributions, we now turn to summarizing distributions with numbers: expectation and variance. These summary statistics capture the center and spread of a distribution, form the foundation of loss functions in ML, and lead to bias-variance analysis and beyond.
You now command the trinity of distribution functions: PMF, PDF, and CDF. You understand their relationships, can convert between them, and know how to handle the computational challenges they present. This foundation enables all probabilistic reasoning in machine learning.