On the previous page, we mastered discrete random variables—quantities that take countably many values like class labels, word tokens, or click counts. Now we confront a profound extension: continuous random variables, which can take any value in an uncountable set such as an interval $[a, b]$ or the entire real line $\mathbb{R}$.
This isn't merely a generalization—it requires a fundamental conceptual shift. For discrete random variables, we could ask 'What is the probability that $X$ equals exactly 3?' and get a meaningful nonzero answer. For continuous random variables, this question becomes problematic. The probability that a truly continuous quantity equals exactly any specific value is zero.
Yet continuous random variables are everywhere in machine learning: regression targets and residuals, neural network weights and activations, latent variables in generative models, predicted probabilities and proportions, and waiting times between events all live in continuous spaces.
By the end of this page, you will understand why continuous random variables require probability density functions (PDFs) instead of PMFs, master the mathematical properties of PDFs, and see how these concepts enable regression models, generative learning, and probabilistic inference in continuous spaces.
Before defining continuous random variables formally, we must confront a seeming paradox that illuminates the conceptual leap required.
The Problem:
Consider a random variable $X$ uniformly distributed on $[0, 1]$—every value in the interval is 'equally likely.' There are uncountably many values in $[0, 1]$ (as many as the entire real line). If we tried to assign equal positive probability to each, what would happen?
Suppose each point has probability $p > 0$. The sum over all points would be: $$\sum_{x \in [0,1]} p = \text{uncountable sum of positive values} = \infty$$
This violates the probability axiom that total probability equals 1. We're forced to conclude that for any individual point $x$: $$P(X = x) = 0$$
Yet the event 'X takes some value in [0, 1]' has probability 1. How can a union of probability-zero events have probability 1?
The resolution lies in recognizing that we cannot meaningfully sum uncountably many numbers—even zeros. Countable additivity (a probability axiom) says P(∪ Aᵢ) = Σ P(Aᵢ) for countable unions, but [0,1] is uncountable. We need a different approach: instead of point probabilities, we work with probability densities and recover probabilities through integration over intervals.
The Conceptual Shift:
| Discrete World | Continuous World |
|---|---|
| Individual values have probability | Individual values have zero probability |
| PMF: $p(x) = P(X = x)$ | PDF: $f(x) \neq P(X = x)$ |
| Sum to get probabilities | Integrate to get probabilities |
| $P(X \in A) = \sum_{x \in A} p(x)$ | $P(X \in A) = \int_A f(x) \, dx$ |
| PMF values are probabilities | PDF values are densities (can exceed 1) |
Thinking in terms of probability density rather than point probability is the key insight. Density represents 'probability per unit length'—a rate at which probability accumulates as you sweep across the real line.
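To make "probability per unit length" concrete, here is a minimal sketch (using the standard normal as an example) showing that $P(x \leq X \leq x + \Delta)/\Delta$ converges to the density $f(x)$ as $\Delta$ shrinks:

```python
from scipy import stats

# Density as "probability per unit length": for small Δ,
# P(x ≤ X ≤ x + Δ) ≈ f(x) · Δ, so the ratio approaches f(x).
normal = stats.norm(0, 1)
x = 0.5
for delta in [1.0, 0.1, 0.01, 0.001]:
    ratio = (normal.cdf(x + delta) - normal.cdf(x)) / delta
    print(f"Δ = {delta:6.3f}: P(x ≤ X ≤ x+Δ)/Δ = {ratio:.6f}")
print(f"PDF value f({x}) = {normal.pdf(x):.6f}")
```

The ratio is exactly the average density over $[x, x + \Delta]$, which tends to the density at $x$ as the interval shrinks.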
With the conceptual groundwork laid, we can now define continuous random variables precisely.
Definition (Continuous Random Variable):
A random variable $X$ is called continuous if there exists a non-negative function $f_X: \mathbb{R} \rightarrow [0, \infty)$ such that for all intervals $[a, b]$:
$$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
The function $f_X$ is called the probability density function (PDF) of $X$.
Key Properties:

- Non-negativity: $f_X(x) \geq 0$ for all $x \in \mathbb{R}$
- Normalization: $\int_{-\infty}^{\infty} f_X(x) \, dx = 1$ (total probability is 1)
- Zero point probabilities: $P(X = x) = \int_x^x f_X(t) \, dt = 0$ for every single value $x$
Critical Distinction: PDF vs PMF
A common source of confusion: the PDF value $f_X(x)$ is not a probability. It's a density—probability per unit interval. This has immediate consequences: $f_X(x)$ can exceed 1 (only the integral over the support must equal 1), and for small $\Delta x$, the product $f_X(x) \, \Delta x$ approximates the probability $P(x \leq X \leq x + \Delta x)$.
Intuition via Histogram:
Imagine sampling $X$ many times and plotting a histogram with very narrow bins of width $\Delta x$. The height of each bar, divided by $\Delta x$, approaches $f(x)$ as sample size increases and bin width shrinks. The PDF is the limiting shape of a normalized histogram.
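A quick sketch of this limiting-histogram claim, assuming standard-normal samples:

```python
import numpy as np
from scipy import stats

# A histogram of many samples, normalized by (sample size × bin width),
# approximates the PDF: height/Δx → f(x).
rng = np.random.default_rng(0)
samples = rng.standard_normal(200_000)

counts, edges = np.histogram(samples, bins=200, range=(-4, 4))
bin_width = edges[1] - edges[0]
density = counts / (len(samples) * bin_width)   # normalized height
centers = (edges[:-1] + edges[1:]) / 2

# Compare the empirical density against the true N(0, 1) PDF
max_err = np.max(np.abs(density - stats.norm.pdf(centers)))
print(f"Max |histogram density - true PDF| ≈ {max_err:.4f}")
```

With more samples and narrower bins, the maximum deviation shrinks toward zero.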
```python
from scipy import stats
from scipy.integrate import quad

def demonstrate_pdf_properties():
    """Demonstrate key properties of probability density functions."""
    print("Probability Density Function Properties")
    print("=" * 55)

    # Example 1: Uniform distribution on [0, 0.5]
    # PDF = 2.0 on [0, 0.5], zero elsewhere
    a, b = 0, 0.5
    uniform_pdf = 1 / (b - a)  # = 2.0
    print("Example 1: Uniform[0, 0.5]")
    print(f"  PDF value: f(x) = {uniform_pdf}")
    print("  Note: f(x) = 2.0 > 1, but this is NOT a probability!")

    # Verify normalization
    integral, _ = quad(lambda x: uniform_pdf, a, b)
    print(f"  ∫ f(x)dx from 0 to 0.5 = {integral:.4f} (normalized correctly)")

    # Example 2: Standard Normal
    print("Example 2: Standard Normal N(0, 1)")
    normal = stats.norm(0, 1)
    points = [0, 1, 2, -1]
    print("  PDF values (not probabilities!):")
    for x in points:
        print(f"    f({x:2d}) = {normal.pdf(x):.6f}")
    # At x=0, f(0) ≈ 0.399, the peak for the standard normal.
    # That value still has meaning: probability mass is concentrated near 0.

    # Computing actual probabilities via the CDF
    print("  Actual probabilities (via integration):")
    print(f"    P(-1 ≤ X ≤ 1) = {normal.cdf(1) - normal.cdf(-1):.6f}")
    print(f"    P(-2 ≤ X ≤ 2) = {normal.cdf(2) - normal.cdf(-2):.6f}")
    print("    P(X = 0)      = 0.000000 (point probability is zero)")

    # Example 3: Exponential distribution
    print("Example 3: Exponential(λ=3)")
    exp_dist = stats.expon(scale=1/3)  # scipy uses scale = 1/λ
    pdf_at_0 = exp_dist.pdf(0)
    print(f"  f(0) = λ = {pdf_at_0:.4f}")
    print("  Again, f(0) = 3 > 1, perfectly valid for a density!")

    # Key insight: integrate to get probabilities
    print(f"  P(X ≤ 1) = {exp_dist.cdf(1):.6f}")
    print(f"  P(X ≤ 2) = {exp_dist.cdf(2):.6f}")

demonstrate_pdf_properties()
```

Never interpret $f(x)$ as $P(X = x)$.
If you see f(0.3) = 2.5 for some distribution and think 'the probability of 0.3 is 2.5,' you've made the classic error. PDF values are densities; they must be integrated over an interval to yield probabilities. P(X = 0.3) = 0 always for continuous X.
Just as with discrete random variables, the support of a continuous random variable is the set of values it can actually take—where the PDF is positive.
Definition (Support of Continuous RV):
$$\text{supp}(X) = \overline{\{x \in \mathbb{R} : f_X(x) > 0\}}$$
(The overline denotes closure—including limit points. For practical purposes, it's where $f(x) > 0$.)
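As a small illustration, scipy's continuous distributions return a density of zero outside their support and also expose the support directly:

```python
from scipy import stats

# The PDF is zero outside the support; evaluating it there is valid
# but contributes no probability.
beta = stats.beta(2, 5)        # support [0, 1]
expon = stats.expon(scale=1)   # support [0, ∞)

print(beta.pdf(-0.5), beta.pdf(1.5))   # 0.0 outside [0, 1]
print(expon.pdf(-1.0))                 # 0.0 for negative x
print(beta.pdf(0.25) > 0)              # positive inside the support

# Frozen scipy distributions report their support as a pair:
print(beta.support())
print(expon.support())
```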
Common Support Types:
| Distribution | Parameters | Support | ML Use Case |
|---|---|---|---|
| Uniform | a, b | [a, b] | Random initialization, priors |
| Normal (Gaussian) | μ, σ² | ℝ = (-∞, +∞) | Regression outputs, latent variables |
| Exponential | λ > 0 | [0, +∞) | Time between events, positive quantities |
| Beta | α, β > 0 | [0, 1] | Probabilities, proportions |
| Gamma | α, β > 0 | [0, +∞) | Positive continuous quantities |
| Log-Normal | μ, σ² | (0, +∞) | Multiplicative processes, sizes |
| Truncated Normal | μ, σ, a, b | [a, b] | Bounded continuous quantities |
Why Support Matters in ML:
1. Model-Data Compatibility: If your target variable is strictly positive (e.g., prices, durations), using a Normal distribution (support = ℝ) allows your model to predict impossible negative values. Better choices: Exponential, Gamma, or Log-Normal, whose supports exclude negative values.
2. Numerical Stability: Evaluating log f(x) when x is outside the support gives log(0) = -∞. This breaks gradient descent. Either transform the data into the support (e.g., model log-prices instead of prices), clip observations to the support with a small epsilon, or choose a distribution whose support matches the data.
3. Generative Models: When generating samples, the support constrains what values are possible. A VAE with Gaussian latent space can produce any real vector; one with Beta-distributed components produces values in [0, 1]ⁿ.
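A small illustration of the log(0) issue from point 2, with clipping shown as one quick workaround (an assumption-laden fix; matching the distribution's support to the data is usually better):

```python
import numpy as np
from scipy import stats

# Evaluating logpdf outside the support yields -inf, which poisons
# any loss or gradient computed from it.
gamma = stats.gamma(2, scale=1.0)    # support [0, ∞)
data = np.array([0.5, 2.0, -0.1])    # -0.1 lies outside the support

log_liks = gamma.logpdf(data)
print(log_liks)           # last entry is -inf
print(np.sum(log_liks))   # total log-likelihood becomes -inf

# Quick fix: clip data into the support with a small epsilon
eps = 1e-9
safe = np.clip(data, eps, None)
print(np.isfinite(gamma.logpdf(safe)).all())
```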
```python
import numpy as np
from scipy import stats

def demonstrate_support_issues():
    """Demonstrate problems that arise from support mismatch."""
    print("Support Mismatch Issues in ML")
    print("=" * 55)

    # Problem: modeling strictly positive data (e.g., house prices)
    # with a Gaussian distribution.
    # Simulated house prices (log-normally distributed, hence positive)
    np.random.seed(42)
    true_prices = np.random.lognormal(mean=12, sigma=0.5, size=1000)
    print("Data: House prices (strictly positive)")
    print(f"  Min:  ${true_prices.min():,.0f}")
    print(f"  Max:  ${true_prices.max():,.0f}")
    print(f"  Mean: ${true_prices.mean():,.0f}")

    # BAD: fit a Gaussian (support = ℝ)
    gaussian_fit = stats.norm.fit(true_prices)
    gaussian_dist = stats.norm(*gaussian_fit)

    # What's the probability of negative prices?
    p_negative = gaussian_dist.cdf(0)
    print("BAD: Gaussian model")
    print(f"  Fitted μ = ${gaussian_fit[0]:,.0f}, σ = ${gaussian_fit[1]:,.0f}")
    print(f"  P(price < 0) = {p_negative:.6f}")
    print(f"  This model assigns {p_negative*100:.4f}% probability to impossible negative prices!")

    # If we sample from this model:
    gaussian_samples = gaussian_dist.rvs(10000)
    n_negative = np.sum(gaussian_samples < 0)
    print(f"  In 10,000 samples: {n_negative} are negative (invalid)")

    # GOOD: fit a Log-Normal (support = (0, ∞))
    # If Y ~ Normal(μ, σ²), then X = exp(Y) ~ LogNormal(μ, σ²)
    log_prices = np.log(true_prices)
    lognormal_params = stats.norm.fit(log_prices)  # fit normal to log-data
    print("GOOD: Log-Normal model")
    print("  Support: (0, ∞) — matches data constraint")
    print("  P(price < 0) = 0 (impossible by construction)")

    # Sample from log-normal: all samples are positive
    lognormal_samples = np.exp(stats.norm(*lognormal_params).rvs(10000))
    n_negative_ln = np.sum(lognormal_samples < 0)
    print(f"  In 10,000 samples: {n_negative_ln} are negative (as expected: 0)")

    # Log-likelihood comparison
    ll_gaussian = np.mean(gaussian_dist.logpdf(true_prices))
    ll_lognormal = np.mean(
        stats.lognorm.logpdf(true_prices,
                             s=lognormal_params[1],
                             scale=np.exp(lognormal_params[0])))
    print("Log-likelihood comparison:")
    print(f"  Gaussian:   {ll_gaussian:.4f}")
    print(f"  Log-Normal: {ll_lognormal:.4f}")
    print("  Log-Normal fits better (higher is better)")

demonstrate_support_issues()
```

The fundamental operation with continuous random variables is computing probabilities of events, which requires integration.
Probability of an Interval: $$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
Key Observations:
Endpoint Flexibility: For continuous RVs, $P(a < X < b) = P(a \leq X < b) = P(a < X \leq b) = P(a \leq X \leq b)$ because $P(X = a) = 0$. Boundaries don't matter.
Complement Rule: $P(X > a) = 1 - P(X \leq a) = 1 - F_X(a)$ where $F_X$ is the CDF.
Union of Disjoint Intervals: For disjoint intervals $A$ and $B$: $$P(X \in A \cup B) = \int_A f(x) dx + \int_B f(x) dx$$
```python
from scipy import stats, integrate

class ContinuousPDF:
    """
    Wrapper class demonstrating probability computations
    with continuous distributions.
    """

    def __init__(self, distribution):
        """
        Args:
            distribution: A scipy.stats frozen distribution object
        """
        self.dist = distribution

    def prob_interval(self, a: float, b: float) -> float:
        """
        Compute P(a ≤ X ≤ b) via CDF difference.

        This is more accurate than numerical integration
        when an analytical CDF is available.
        """
        return self.dist.cdf(b) - self.dist.cdf(a)

    def prob_interval_integrate(self, a: float, b: float) -> float:
        """
        Compute P(a ≤ X ≤ b) via numerical integration.

        Useful when only the PDF is available (no closed-form CDF).
        """
        result, _ = integrate.quad(self.dist.pdf, a, b)
        return result

    def prob_greater(self, a: float) -> float:
        """Compute P(X > a) = 1 - P(X ≤ a)."""
        return 1 - self.dist.cdf(a)

    def prob_less(self, a: float) -> float:
        """Compute P(X < a) = P(X ≤ a) (for continuous RV)."""
        return self.dist.cdf(a)

    def prob_outside(self, a: float, b: float) -> float:
        """Compute P(X < a or X > b)."""
        return self.dist.cdf(a) + (1 - self.dist.cdf(b))

    def quantile(self, p: float) -> float:
        """Compute x such that P(X ≤ x) = p."""
        return self.dist.ppf(p)

# Demonstration with Standard Normal
print("Probability Computations with N(0, 1)")
print("=" * 55)
normal = ContinuousPDF(stats.norm(0, 1))

print("Basic Probabilities:")
print(f"  P(-1 ≤ X ≤ 1) = {normal.prob_interval(-1, 1):.6f}")
print(f"  P(-2 ≤ X ≤ 2) = {normal.prob_interval(-2, 2):.6f}")
print(f"  P(-3 ≤ X ≤ 3) = {normal.prob_interval(-3, 3):.6f}")

print("The 68-95-99.7 Rule:")
print(f"  Within 1σ: {normal.prob_interval(-1, 1)*100:.1f}%")
print(f"  Within 2σ: {normal.prob_interval(-2, 2)*100:.1f}%")
print(f"  Within 3σ: {normal.prob_interval(-3, 3)*100:.1f}%")

print("Tail Probabilities:")
print(f"  P(X > 1.645) = {normal.prob_greater(1.645):.6f} (≈ 5%)")
print(f"  P(X > 1.96)  = {normal.prob_greater(1.96):.6f} (≈ 2.5%)")
print(f"  P(X > 2.576) = {normal.prob_greater(2.576):.6f} (≈ 0.5%)")

print("Quantiles (inverse CDF):")
print(f"  95th percentile: x = {normal.quantile(0.95):.4f}")
print(f"  99th percentile: x = {normal.quantile(0.99):.4f}")
print(f"  Median (50th):   x = {normal.quantile(0.50):.4f}")

print("Verification via Numerical Integration:")
p_via_cdf = normal.prob_interval(-1.5, 0.5)
p_via_int = normal.prob_interval_integrate(-1.5, 0.5)
print(f"  P(-1.5 ≤ X ≤ 0.5) via CDF: {p_via_cdf:.10f}")
print(f"  P(-1.5 ≤ X ≤ 0.5) via ∫:   {p_via_int:.10f}")
print(f"  Difference: {abs(p_via_cdf - p_via_int):.2e}")
```

If your distribution has a closed-form CDF, use $P(a \leq X \leq b) = F(b) - F(a)$ instead of numerically integrating the PDF. It's faster, more accurate, and avoids numerical integration pitfalls like handling infinite domains or near-singular functions.
When we apply a function to a continuous random variable, we get a new random variable with a different distribution. Unlike the discrete case (where we just tracked which values mapped together), the continuous case requires careful calculus.
The Change of Variables Formula:
Let $X$ be a continuous RV with PDF $f_X$, and let $Y = g(X)$ where $g$ is a strictly monotonic (strictly increasing or decreasing) differentiable function. Then $Y$ has PDF:
$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|$$
The term $|\frac{d}{dy} g^{-1}(y)|$ is called the Jacobian factor—it accounts for how the transformation stretches or compresses probability density.
Intuition:
Consider $Y = 2X$ where $X \sim \text{Uniform}(0, 1)$. Then $Y$ ranges over $[0, 2]$. But what's the PDF of $Y$?
Using the formula: $g(x) = 2x$, so $g^{-1}(y) = y/2$ and $\frac{d}{dy} g^{-1}(y) = \frac{1}{2}$. Therefore
$$f_Y(y) = f_X(y/2) \cdot \frac{1}{2} = \frac{1}{2} \quad \text{for } y \in [0, 2].$$
The interval doubled in length, so the density halved to keep total probability equal to 1.
The Jacobian compensates for the stretching.
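A quick numerical check of this $Y = 2X$ case, comparing a histogram of transformed samples against the predicted density of $1/2$:

```python
import numpy as np

# Y = 2X with X ~ Uniform(0, 1): change of variables gives
# f_Y(y) = f_X(y/2) · |d(y/2)/dy| = 1 · 1/2 = 1/2 on [0, 2].
rng = np.random.default_rng(0)
y = 2 * rng.uniform(0, 1, size=100_000)

counts, edges = np.histogram(y, bins=20, range=(0, 2))
density = counts / (len(y) * (edges[1] - edges[0]))
print(f"Empirical density over [0, 2]: min={density.min():.3f}, max={density.max():.3f}")
# Every bin should sit near 0.5 — the Jacobian halves the uniform density.
```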
```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def transform_continuous_rv():
    """Demonstrate transformations of continuous random variables."""
    print("Continuous RV Transformations")
    print("=" * 55)

    # Example 1: Linear transformation Y = aX + b
    # If X ~ N(μ, σ²), then Y = aX + b ~ N(aμ + b, a²σ²)
    print("Example 1: Linear Transform Y = 2X + 3")
    print("  X ~ N(0, 1)")
    X = stats.norm(0, 1)

    # Analytical result
    a, b = 2, 3
    mu_Y = a * 0 + b      # = 3
    sigma_Y = abs(a) * 1  # = 2

    # Verify via sampling
    samples_X = X.rvs(100_000)
    samples_Y = a * samples_X + b
    print(f"  Analytical: Y ~ N({mu_Y}, {sigma_Y**2})")
    print(f"  Empirical:  mean = {samples_Y.mean():.4f}, std = {samples_Y.std():.4f}")

    # Example 2: Squaring Y = X²
    # If X ~ N(0, 1), then X² ~ χ²(1) (chi-squared with 1 df)
    print("Example 2: Y = X² where X ~ N(0, 1)")
    print("  Result: Y ~ χ²(1)")
    samples_X2 = samples_X ** 2
    chi2_1 = stats.chi2(df=1)
    print(f"  Empirical mean: {samples_X2.mean():.4f}")
    print(f"  χ²(1) mean:     {chi2_1.mean():.4f}")
    print(f"  Empirical var:  {samples_X2.var():.4f}")
    print(f"  χ²(1) variance: {chi2_1.var():.4f}")

    # Example 3: Exponential transformation Y = exp(X) (Log-Normal)
    # If X ~ N(μ, σ²), then Y = exp(X) ~ LogNormal(μ, σ²)
    print("Example 3: Y = exp(X) where X ~ N(0, 1)")
    print("  Result: Y ~ LogNormal(0, 1)")
    samples_exp = np.exp(samples_X)
    lognorm_01 = stats.lognorm(s=1, scale=1)  # s = σ, scale = exp(μ)
    print(f"  Empirical mean: {samples_exp.mean():.4f}")
    print(f"  LogNormal mean: {lognorm_01.mean():.4f}")

    # Example 4: The log transform (inverse of exp)
    # If Y ~ LogNormal(μ, σ²), then log(Y) ~ N(μ, σ²)
    print("Example 4: Z = log(Y) where Y ~ LogNormal")
    print("  This recovers the original Normal distribution")
    samples_log = np.log(samples_exp)
    print(f"  Original X ~ N(0,1), Empirical: mean={samples_log.mean():.4f}, std={samples_log.std():.4f}")

def jacobian_example():
    """Demonstrate the Jacobian correction in detail."""
    print("=" * 55)
    print("The Jacobian Factor: Why It Matters")
    print("=" * 55)

    # X ~ Uniform(0, 1), f_X(x) = 1
    # Y = X² ∈ [0, 1]
    # g(x) = x², g⁻¹(y) = √y, dg⁻¹/dy = 1/(2√y)
    # Correct PDF: f_Y(y) = f_X(√y) · |1/(2√y)| = 1/(2√y)
    print("X ~ Uniform(0, 1), Y = X²")
    print("Without Jacobian: f_Y(y) = f_X(√y) = 1 (WRONG)")
    print("With Jacobian:    f_Y(y) = 1/(2√y)  (CORRECT)")

    # Note: both candidate densities happen to integrate to 1 on [0, 1],
    # so a normalization check alone cannot expose the error.
    # Instead, check a probability: P(Y ≤ 0.25) = P(X ≤ 0.5) = 0.5
    wrong_pdf = lambda y: 1.0
    correct_pdf = lambda y: 1 / (2 * np.sqrt(y))
    p_wrong, _ = quad(wrong_pdf, 0, 0.25)
    p_correct, _ = quad(correct_pdf, 1e-12, 0.25)
    print("Check P(Y ≤ 0.25), which must equal P(X ≤ 0.5) = 0.5:")
    print(f"  Without Jacobian: {p_wrong:.4f} (wrong!)")
    print(f"  With Jacobian:    {p_correct:.4f} (correct)")

transform_continuous_rv()
jacobian_example()
```

The Jacobian factor appears in normalizing flows, a class of generative models that transform simple distributions (like Gaussians) into complex ones through a chain of invertible functions. The log-likelihood requires computing |det J| at each step—a computational bottleneck that drives architectural choices in flow-based models.
Certain continuous distributions appear repeatedly in machine learning. Mastering their properties is essential for both understanding existing models and designing new ones.
```python
import numpy as np
from scipy import stats

def compare_key_distributions():
    """Compare properties of key continuous distributions."""
    print("Key Continuous Distributions for ML")
    print("=" * 60)

    distributions = [
        ("Uniform(0, 1)", stats.uniform(0, 1),      "[0, 1]"),
        ("Normal(0, 1)",  stats.norm(0, 1),         "ℝ"),
        ("Exp(λ=1)",      stats.expon(scale=1),     "[0, ∞)"),
        ("Beta(2, 5)",    stats.beta(2, 5),         "[0, 1]"),
        ("Gamma(2, 1)",   stats.gamma(2, scale=1),  "[0, ∞)"),
    ]

    print(f"{'Distribution':<20} {'Support':<12} {'Mean':>8} {'Var':>8} {'Median':>8}")
    print("-" * 60)
    for name, dist, support in distributions:
        print(f"{name:<20} {support:<12} {dist.mean():>8.4f} "
              f"{dist.var():>8.4f} {dist.median():>8.4f}")

    # ML Application: Regression residuals
    print("\nML Application: Modeling Regression Residuals")
    print("=" * 60)

    # Generate synthetic residuals (should be normal if homoscedastic)
    np.random.seed(42)
    true_residuals = np.random.normal(0, 2, 500)

    # Fit normal distribution
    mu, sigma = stats.norm.fit(true_residuals)
    print(f"Fitted Normal: μ = {mu:.4f}, σ = {sigma:.4f}")

    # Check fit quality with Shapiro-Wilk test
    stat, p_value = stats.shapiro(true_residuals[:50])  # use subset for speed
    print(f"Shapiro-Wilk test: p = {p_value:.4f}")
    print(f"Normality assumption: {'Supported' if p_value > 0.05 else 'Rejected'}")

    # ML Application: Bayesian priors
    print("\nML Application: Beta Distribution as Prior")
    print("=" * 60)

    priors = [
        ("Uninformative",        1, 1),   # Uniform
        ("Weak belief p≈0.5",    2, 2),   # Centered, low confidence
        ("Strong belief p≈0.5", 10, 10),  # Centered, high confidence
        ("Belief p≈0.2",         2, 8),   # Skewed toward small p
    ]

    print(f"{'Prior Type':<25} {'α':>4} {'β':>4} {'Mean':>8} {'Mode':>8} {'Std':>8}")
    print("-" * 60)
    for name, a, b in priors:
        dist = stats.beta(a, b)
        mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else 'N/A'
        mode_str = f"{mode:.4f}" if isinstance(mode, float) else mode
        print(f"{name:<25} {a:>4} {b:>4} {dist.mean():>8.4f} {mode_str:>8} {dist.std():>8.4f}")

compare_key_distributions()
```

Not all random variables are purely discrete or purely continuous. Real ML systems often involve mixed distributions that have both continuous components and discrete point masses.
Definition (Mixed Distribution):
A random variable $X$ has a mixed distribution if its CDF can be written as: $$F_X(x) = p \cdot F_d(x) + (1-p) \cdot F_c(x)$$
where $F_d$ is a discrete CDF, $F_c$ is a continuous CDF, and $p \in (0, 1)$.
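As a concrete sketch (an illustrative example, not from the definition above): $X = \max(0, Z)$ with $Z \sim N(0, 1)$, i.e. a ReLU applied to Gaussian noise, is mixed with $p = 1/2$, a point mass $F_d$ concentrated at zero, and a continuous part $F_c$ on $(0, \infty)$:

```python
import numpy as np
from scipy import stats

# X = max(0, Z), Z ~ N(0, 1): a mixed distribution with
# P(X = 0) = P(Z ≤ 0) = 0.5 (discrete point mass at zero)
# plus a continuous Gaussian tail on (0, ∞).
rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)
x = np.maximum(0, z)

print(f"Empirical P(X = 0) = {np.mean(x == 0):.4f} (theory: 0.5)")
print(f"Empirical P(X > 1) = {np.mean(x > 1):.4f} (theory: {1 - stats.norm.cdf(1):.4f})")
```

A nonzero point mass like this is exactly what a pure PDF cannot represent, which is why the CDF decomposition above is needed.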
```python
import numpy as np

def zero_inflated_poisson_demo():
    """
    Demonstrate a Zero-Inflated Poisson (ZIP) model,
    a mixed distribution common in ML applications.
    """
    print("Zero-Inflated Poisson Distribution")
    print("=" * 55)
    print("Model: With probability π, X=0 (structural zero)")
    print("       With probability 1-π, X ~ Poisson(λ)")

    # Parameters
    pi = 0.3       # Probability of structural zero
    lambda_ = 3.0  # Poisson parameter
    print(f"Parameters: π = {pi}, λ = {lambda_}")

    # Generate samples
    np.random.seed(42)
    n_samples = 10_000

    # Two-step generation
    structural_zero = np.random.binomial(1, pi, n_samples)  # 1 if structural zero
    poisson_samples = np.random.poisson(lambda_, n_samples)
    zip_samples = np.where(structural_zero == 1, 0, poisson_samples)

    # Compare to regular Poisson
    poisson_only = np.random.poisson(lambda_, n_samples)

    print("Empirical comparison:")
    print(f"  ZIP P(X=0)     = {np.mean(zip_samples == 0):.4f}")
    print(f"  Poisson P(X=0) = {np.mean(poisson_only == 0):.4f}")
    print(f"  Theoretical ZIP P(X=0)     = π + (1-π)exp(-λ) = {pi + (1-pi)*np.exp(-lambda_):.4f}")
    print(f"  Theoretical Poisson P(X=0) = exp(-λ) = {np.exp(-lambda_):.4f}")
    print(f"  ZIP mean     = {np.mean(zip_samples):.4f}")
    print(f"  Poisson mean = {np.mean(poisson_only):.4f}")
    print(f"  Theoretical ZIP mean = (1-π)λ = {(1-pi)*lambda_:.4f}")

    # This matters for ML because naive Poisson regression on
    # zero-inflated data will give biased parameter estimates
    print("⚠️ Using standard Poisson on ZIP data biases all estimates!")
    print("   Always check for excess zeros in count data.")

def censored_data_demo():
    """Demonstrate censored data, another mixed distribution."""
    print("\nCensored Data (Right Censoring)")
    print("=" * 55)
    print("Model: True X ~ Exponential(λ), but we only observe min(X, C)")

    lambda_ = 0.5  # True rate
    C = 3.0        # Censoring threshold
    n_samples = 10_000

    np.random.seed(42)
    true_values = np.random.exponential(1 / lambda_, n_samples)
    observed = np.minimum(true_values, C)
    is_censored = true_values >= C

    print(f"True λ = {lambda_}, Censoring at C = {C}")
    print(f"Censoring rate: {np.mean(is_censored)*100:.1f}%")

    # Naive mean estimate (biased)
    naive_mean = np.mean(observed)
    true_mean = 1 / lambda_
    print("Mean estimation:")
    print(f"  True mean (1/λ): {true_mean:.4f}")
    print(f"  Naive mean:      {naive_mean:.4f} (biased downward!)")
    print(f"  Bias:            {(naive_mean - true_mean):.4f}")
    print("⚠️ Ignoring censoring leads to biased estimates!")
    print("   Use survival analysis methods (Kaplan-Meier, Cox PH)")

zero_inflated_poisson_demo()
censored_data_demo()
```

When data comes from a mixed distribution but you fit a pure continuous or pure discrete model, your parameter estimates will be biased, often severely. Always examine your data for excess point masses (like zeros) or censoring before choosing a model family.
We've established the mathematical framework for continuous random variables—the foundation for modeling quantities that vary smoothly across uncountable domains.
What's Next:
The next page unifies our treatment of discrete and continuous random variables through the Cumulative Distribution Function (CDF) and provides a detailed treatment of PMFs and PDFs together. We'll see how the CDF provides a universal interface to probability distributions, regardless of whether they're discrete, continuous, or mixed.
You now understand continuous random variables and probability density functions. The conceptual shift from 'probability of a point' to 'density at a point' is fundamental—it unlocks the mathematics of regression, generative modeling, and all continuous inference in machine learning.