In the previous page, we established that Monte Carlo estimation converts integration problems into sampling problems. But a critical challenge remains: how do we generate samples from a target distribution when we can't directly sample from it?
For simple distributions—uniform, Gaussian, exponential—most programming languages provide built-in samplers. But Bayesian posteriors rarely take these convenient forms. Instead, we face distributions defined implicitly through the product of likelihoods and priors, often with complex shapes, multiple modes, or irregular boundaries.
Rejection sampling is our first technique for bridging this gap. It's conceptually elegant: use a simpler proposal distribution we can sample from, and systematically accept or reject samples to recover the target distribution. When it works, it produces exact i.i.d. samples—as good as if we had a direct sampler for the target.
This page develops rejection sampling from first principles. You will understand the geometric intuition behind the algorithm, derive the acceptance probability, analyze efficiency, and recognize when rejection sampling is appropriate versus when more sophisticated methods are needed.
Rejection sampling rests on a beautifully geometric insight. Consider sampling uniformly from the region under a probability density curve. If we project these points onto the x-axis, the resulting values follow the target density.
Why this works:
Think of a probability density $p(x)$ as a curve in two dimensions. The probability that $X$ falls in an interval $[a, b]$ equals the area under the curve between $a$ and $b$:
$$P(a \leq X \leq b) = \int_a^b p(x) \, dx$$
If we sample points uniformly from the entire region under the curve (the area between the curve and the x-axis), more points fall where the curve is high. Projecting to the x-axis gives exactly the distribution we want.
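This projection argument is easy to verify numerically. The sketch below (an illustration of the idea, not code from this page) throws points uniformly into a bounding box over the triangular density $p(x) = 2x$ on $[0, 1]$, keeps those that land under the curve, and checks that the projected x-values have the target's mean of $2/3$; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# Target: triangular density p(x) = 2x on [0, 1]; its maximum height is 2.
n = 200_000
x = rng.uniform(0, 1, n)   # x-coordinates, uniform across the box
y = rng.uniform(0, 2, n)   # heights, uniform up to the box top

under_curve = y < 2 * x    # keep only points below the density curve
accepted = x[under_curve]

# Projected x-values should follow p(x) = 2x, whose mean is 2/3
print(f"Accepted fraction: {under_curve.mean():.3f} (area under curve / box area = 0.5)")
print(f"Sample mean: {accepted.mean():.4f} (target mean = {2/3:.4f})")
```

The accepted fraction equals the area under the curve divided by the box area, a preview of the acceptance-rate analysis later on this page.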
The practical problem:
Directly sampling uniformly from the region under an arbitrary curve is as hard as the original problem. We need a workable approximation.
The envelope solution:
Suppose we have a proposal distribution $q(x)$ that we can sample from, and a constant $M$ such that:
$$M \cdot q(x) \geq p(x) \quad \text{for all } x$$
The function $M \cdot q(x)$ forms an envelope that completely covers the target density. We can sample uniformly from under this envelope (which is easy), then discard points that fall above the target curve.
The requirement Mq(x) ≥ p(x) must hold for ALL x in the support. If the envelope dips below the target anywhere, samples from that region will be under-represented, producing biased results. Always verify dominance across the entire domain, paying special attention to tails.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def visualize_envelope(target_pdf, proposal_pdf, M, x_range, title):
    """
    Visualize the envelope relationship for rejection sampling.
    """
    x = np.linspace(x_range[0], x_range[1], 1000)
    p_x = target_pdf(x)          # Target density
    Mq_x = M * proposal_pdf(x)   # Envelope

    plt.figure(figsize=(10, 6))

    # Fill the region under the envelope
    plt.fill_between(x, 0, Mq_x, alpha=0.3, color='blue',
                     label=f'Envelope Mq(x), M={M:.2f}')
    # Fill the region under the target
    plt.fill_between(x, 0, p_x, alpha=0.5, color='green',
                     label='Target p(x)')
    plt.plot(x, Mq_x, 'b-', linewidth=2)
    plt.plot(x, p_x, 'g-', linewidth=2)
    plt.xlabel('x')
    plt.ylabel('Density')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Check if envelope dominates
    if np.any(Mq_x < p_x - 1e-10):
        print("WARNING: Envelope does NOT dominate target everywhere!")
    else:
        print("✓ Envelope properly dominates target")

    return plt

# Example: Sample from Beta(2.7, 6.3) using Uniform proposal
alpha, beta_param = 2.7, 6.3

def target_pdf(x):
    return stats.beta.pdf(x, alpha, beta_param)

def proposal_pdf(x):
    return np.ones_like(x)  # Uniform(0, 1)

# Find M: max of target / proposal = max of target (since proposal = 1)
x_opt = (alpha - 1) / (alpha + beta_param - 2)  # Mode of Beta
M = target_pdf(x_opt)

print(f"Target: Beta({alpha}, {beta_param})")
print(f"Proposal: Uniform(0, 1)")
print(f"Optimal M = {M:.4f}")
print(f"Expected acceptance rate = 1/M = {1/M:.4f} = {100/M:.1f}%")

visualize_envelope(target_pdf, proposal_pdf, M, (0, 1),
                   f'Rejection Sampling: Beta({alpha}, {beta_param}) with Uniform Envelope')
plt.show()
```

With the geometric intuition established, we can formalize the rejection sampling algorithm.
Problem setup:

- A target density $p(x)$ we want samples from.
- A proposal density $q(x)$ we can sample from directly.
- A constant $M$ such that $M \cdot q(x) \geq p(x)$ for all $x$.
The algorithm:
```
Repeat:
    1. Draw x from proposal:  x ~ q(x)
    2. Draw u uniformly:      u ~ Uniform(0, 1)
    3. If u ≤ p(x) / (M · q(x)):
           Accept x as a sample from p(x)
       Else:
           Reject x and return to step 1
```
Why this produces samples from $p(x)$:
We're sampling uniformly from the region under the envelope $M \cdot q(x)$, which we do by:

- drawing the horizontal coordinate $x \sim q(x)$, and
- drawing a height uniformly between $0$ and the envelope: $u \cdot M \cdot q(x)$ with $u \sim \text{Uniform}(0, 1)$.
We accept if this point $(x, u \cdot M \cdot q(x))$ falls under the target curve $p(x)$. The probability of acceptance at point $x$ is exactly $\frac{p(x)}{M \cdot q(x)}$.
The joint density of accepted samples is:
$$p(\text{accepted at } x) = p(\text{proposed } x) \times p(\text{accept} \mid x) = q(x) \cdot \frac{p(x)}{M \cdot q(x)} = \frac{p(x)}{M}$$
Normalizing by the total acceptance probability (1/M), we recover p(x). This proves samples are exactly distributed according to the target.
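The exactness claim can also be tested statistically. This sketch (an illustration, not this page's own code) draws rejection samples from Beta(2.7, 6.3) with a uniform proposal, vectorized for speed, and runs a Kolmogorov–Smirnov test against the true distribution; exact i.i.d. samples should produce a small KS statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seeded for reproducibility
alpha, beta_param = 2.7, 6.3

# Envelope constant for a Uniform(0, 1) proposal: the target's peak density
mode = (alpha - 1) / (alpha + beta_param - 2)
M = stats.beta.pdf(mode, alpha, beta_param)

# Vectorized rejection sampling
x = rng.uniform(0, 1, 100_000)
u = rng.uniform(0, 1, 100_000)
samples = x[u <= stats.beta.pdf(x, alpha, beta_param) / M]

# KS test against the exact Beta(2.7, 6.3) CDF
stat, pvalue = stats.kstest(samples, 'beta', args=(alpha, beta_param))
print(f"Accepted {len(samples)} samples, KS statistic = {stat:.4f}, p = {pvalue:.3f}")
```

A biased sampler (for instance, one whose envelope dips below the target) would show up here as a large KS statistic and a vanishing p-value.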
```python
import numpy as np
from scipy import stats

def rejection_sampling(target_pdf, proposal_sampler, proposal_pdf, M, n_samples):
    """
    Generate samples from target_pdf using rejection sampling.

    Parameters
    ----------
    target_pdf : callable
        The target probability density function p(x)
    proposal_sampler : callable
        Function that returns a sample from the proposal q(x)
    proposal_pdf : callable
        The proposal probability density function q(x)
    M : float
        Bound constant such that M * q(x) >= p(x) for all x
    n_samples : int
        Number of samples to generate

    Returns
    -------
    samples : ndarray
        Array of samples from the target distribution
    acceptance_rate : float
        Fraction of proposals that were accepted
    """
    samples = []
    n_proposed = 0

    while len(samples) < n_samples:
        # Step 1: Draw from proposal
        x = proposal_sampler()
        n_proposed += 1

        # Step 2: Draw uniform for acceptance
        u = np.random.uniform(0, 1)

        # Step 3: Accept or reject
        acceptance_prob = target_pdf(x) / (M * proposal_pdf(x))
        if u <= acceptance_prob:
            samples.append(x)

    acceptance_rate = n_samples / n_proposed
    return np.array(samples), acceptance_rate

# Example: Sample from Beta(2.7, 6.3) using Uniform(0,1) proposal
alpha, beta_param = 2.7, 6.3

def target_pdf(x):
    return stats.beta.pdf(x, alpha, beta_param)

def proposal_sampler():
    return np.random.uniform(0, 1)

def proposal_pdf(x):
    return 1.0  # Uniform(0, 1)

# Compute M as max of target (since proposal is uniform with density 1)
x_mode = (alpha - 1) / (alpha + beta_param - 2)
M = target_pdf(x_mode)

print(f"=== Rejection Sampling: Beta({alpha}, {beta_param}) ===")
print(f"Proposal: Uniform(0, 1)")
print(f"M = {M:.4f}")
print(f"Theoretical acceptance rate: {1/M:.4f}")
print()

# Generate samples
n_samples = 10000
samples, acc_rate = rejection_sampling(
    target_pdf, proposal_sampler, proposal_pdf, M, n_samples)

print(f"Generated {n_samples} samples")
print(f"Empirical acceptance rate: {acc_rate:.4f}")
print()

# Verify samples match target distribution
print("Sample statistics vs theoretical:")
print(f"  Sample mean: {np.mean(samples):.4f}")
print(f"  True mean:   {alpha / (alpha + beta_param):.4f}")
print(f"  Sample var:  {np.var(samples):.4f}")
true_var = (alpha * beta_param) / ((alpha + beta_param)**2 * (alpha + beta_param + 1))
print(f"  True var:    {true_var:.4f}")
```

The efficiency of rejection sampling is entirely determined by the acceptance probability—the fraction of proposals we keep.
Overall acceptance probability:
$$P(\text{accept}) = \int P(\text{accept} \mid x) \, q(x) \, dx = \int \frac{p(x)}{M \cdot q(x)} \cdot q(x) \, dx = \frac{1}{M} \int p(x) \, dx = \frac{1}{M}$$
The acceptance rate equals exactly $\frac{1}{M}$—the ratio of the area under the target to the area under the envelope.
Optimal choice of $M$:
To maximize efficiency, we want the smallest $M$ that still satisfies the dominance condition:
$$M^* = \max_x \frac{p(x)}{q(x)}$$
This is the supremum ratio of target to proposal. The optimal $M$ makes the envelope as tight as possible around the target.
The efficiency bottleneck:
Rejection sampling efficiency degrades when:

- the proposal's shape is a poor match for the target, inflating $M$;
- the target is sharply peaked relative to the proposal;
- the dimension grows, because small per-dimension mismatches compound multiplicatively.

As dimension increases, even a well-matched proposal leads to astronomical $M$ values and essentially zero acceptance.
Consider matching a $d$-dimensional standard Gaussian to itself with a slightly inflated envelope. If the proposal has variance 1.1 instead of 1.0, the ratio of densities at the mode scales as $(1.1)^{d/2}$. In $d = 100$ dimensions, $M \approx 150$, so roughly 150 proposals per accepted sample; by $d = 1000$, $M \approx 10^{21}$—utterly impractical. This exponential degradation makes rejection sampling unusable for high-dimensional posteriors.
| Dimension | Optimal M (Gaussian) | Acceptance Rate | Proposals per Sample |
|---|---|---|---|
| 1 | 1.05 | 95% | ~1 |
| 5 | 1.28 | 78% | ~1.3 |
| 10 | 1.65 | 61% | ~1.6 |
| 20 | 2.71 | 37% | ~2.7 |
| 50 | 12.2 | 8.2% | ~12 |
| 100 | 148 | 0.68% | ~150 |
| 500 | ~10^11 | ~10^-11 | ~10^11 |
```python
import numpy as np

def analyze_gaussian_rejection_efficiency():
    """
    Analyze how rejection sampling efficiency degrades with dimension.

    Scenario: Target is N(0, I_d), proposal is N(0, 1.1*I_d).
    This is nearly optimal - same shape, just slightly inflated.
    """
    print("=== Rejection Sampling Efficiency: Gaussian in d Dimensions ===")
    print("Target: N(0, I_d)")
    print("Proposal: N(0, 1.1 * I_d)")
    print()
    print(f"{'Dimension':>10} | {'M':>12} | {'Acceptance':>12} | {'Proposals/Sample':>16}")
    print("-" * 60)

    for d in [1, 2, 5, 10, 20, 50, 100, 200, 500]:
        # For Gaussians, M = (proposal_var / target_var)^(d/2)
        sigma_ratio = 1.1
        M = sigma_ratio ** (d / 2)
        acceptance = 1 / M

        if M < 1e15:
            print(f"{d:>10} | {M:>12.2f} | {acceptance:>11.2e} | {M:>16.1f}")
        else:
            print(f"{d:>10} | {M:>12.2e} | {acceptance:>11.2e} | {'impractical':>16}")

    print()
    print("Key insight: Even with nearly optimal proposal (only 10% inflated),")
    print("efficiency becomes impractical beyond d ≈ 50-100 dimensions.")

def optimize_proposal_constant(target_pdf, proposal_pdf, x_range, n_grid=10000):
    """
    Find optimal M by grid search over the domain.

    M* = max_x p(x) / q(x)
    """
    if isinstance(x_range[0], (int, float)):
        # 1D case
        x = np.linspace(x_range[0], x_range[1], n_grid)
        ratios = target_pdf(x) / proposal_pdf(x)
        M_opt = np.max(ratios)
        x_max = x[np.argmax(ratios)]
        return M_opt, x_max
    else:
        raise NotImplementedError("Use MCMC for high-dimensional cases")

analyze_gaussian_rejection_efficiency()

# 1D Example: Finding optimal M
from scipy import stats

print("\n=== Finding Optimal M for 1D Beta-Gaussian ===")
print("Target: Beta(5, 2) - right-skewed on [0,1]")
print("Proposal: Truncated Normal(0.7, 0.3) on [0,1]")

def target(x):
    return stats.beta.pdf(x, 5, 2)

def proposal(x):
    # Truncated normal on [0, 1]
    a, b = (0 - 0.7) / 0.3, (1 - 0.7) / 0.3  # Standardize bounds
    return stats.truncnorm.pdf(x, a, b, loc=0.7, scale=0.3)

M_opt, x_max = optimize_proposal_constant(target, proposal, (0.001, 0.999))
print(f"Optimal M = {M_opt:.4f} (achieved at x = {x_max:.4f})")
print(f"Expected acceptance rate = {1/M_opt:.4f} = {100/M_opt:.1f}%")
```

The art of rejection sampling lies in selecting a good proposal distribution. The ideal proposal closely matches the target's shape while remaining easy to sample from.
Desiderata for proposals:

- Easy to sample from directly.
- Shape close to the target's, so the optimal $M$ stays small.
- Tails at least as heavy as the target's, so the ratio $p(x)/q(x)$ remains bounded.
Common proposal choices:
| Target Type | Good Proposals | Notes |
|---|---|---|
| Bounded support | Uniform | Simple, but inefficient for peaked targets |
| Unimodal | Gaussian, t-distribution | t-distribution for heavy tails |
| Log-concave | Laplace, Exponential | Adaptive rejection sampling |
| Mixture | Match mixture structure | Use proposal mixture |
| Heavy-tailed | Cauchy, t with low df | Never use Gaussian for Cauchy target |
The tail matching requirement:
If the target has heavier tails than the proposal, the ratio $p(x)/q(x) \to \infty$ as $|x| \to \infty$, making $M$ infinite. No valid bound exists.
$$\text{If } \frac{p(x)}{q(x)} \to \infty \text{ as } x \to \pm\infty, \text{ then rejection sampling fails}$$
Never use a Gaussian proposal for a Cauchy target, or any light-tailed proposal for a heavy-tailed target. The Gaussian density decays as $e^{-x^2/2}$ while the Cauchy decays as $1/x^2$, so the ratio Cauchy/Gaussian → ∞ and no finite $M$ exists. This is not an efficiency problem—rejection sampling is mathematically impossible in this case.
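A quick numerical check makes the explosion concrete. This sketch (an illustration using SciPy's standard Cauchy and normal densities) evaluates the ratio at a few points; any candidate bound $M$ is eventually overtaken.

```python
from scipy import stats

# Target: standard Cauchy. Proposal: standard Gaussian.
# The density ratio p(x)/q(x) grows without bound, so no finite M dominates.
for x in [2.0, 4.0, 6.0, 8.0, 10.0]:
    ratio = stats.cauchy.pdf(x) / stats.norm.pdf(x)
    print(f"x = {x:4.1f}   cauchy/gauss ratio = {ratio:.3e}")
```

By $x = 10$ the ratio already exceeds $10^{15}$; no constant chosen in advance survives the tails.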
```python
import numpy as np
from scipy import stats

def compare_proposals_for_target():
    """
    Compare different proposals for sampling from a mixture distribution.

    Target: 0.3 * N(-2, 0.5) + 0.7 * N(2, 1)
    """
    # Define target
    def target_pdf(x):
        return (0.3 * stats.norm.pdf(x, -2, 0.5) +
                0.7 * stats.norm.pdf(x, 2, 1))

    # Proposal 1: Single Gaussian centered at mixture mean
    def proposal1_pdf(x):
        return stats.norm.pdf(x, 0.8, 2)  # Mean at 0.3*(-2) + 0.7*2 = 0.8

    def proposal1_sampler():
        return np.random.normal(0.8, 2)

    # Proposal 2: Wider Gaussian
    def proposal2_pdf(x):
        return stats.norm.pdf(x, 0.5, 3)

    def proposal2_sampler():
        return np.random.normal(0.5, 3)

    # Proposal 3: t-distribution (heavier tails)
    df = 5
    def proposal3_pdf(x):
        return stats.t.pdf(x, df, loc=0.5, scale=2)

    def proposal3_sampler():
        return stats.t.rvs(df, loc=0.5, scale=2)

    # Find M for each proposal (grid search)
    x = np.linspace(-6, 8, 10000)

    proposals = [
        ("N(0.8, 2)", proposal1_pdf, proposal1_sampler),
        ("N(0.5, 3)", proposal2_pdf, proposal2_sampler),
        (f"t(df={df}, mu=0.5, s=2)", proposal3_pdf, proposal3_sampler),
    ]

    print("=== Proposal Comparison for Mixture Target ===")
    print("Target: 0.3*N(-2, 0.5) + 0.7*N(2, 1)")
    print()

    for name, pdf, sampler in proposals:
        target_vals = target_pdf(x)
        proposal_vals = pdf(x)

        # Avoid division by zero
        valid = proposal_vals > 1e-10
        ratios = np.zeros_like(x)
        ratios[valid] = target_vals[valid] / proposal_vals[valid]

        M = np.max(ratios)
        acceptance = 1 / M

        print(f"Proposal: {name}")
        print(f"  M = {M:.4f}")
        print(f"  Acceptance rate = {acceptance:.4f} = {100*acceptance:.1f}%")
        print()

compare_proposals_for_target()
```

In Bayesian inference, we typically know the posterior only up to a normalizing constant:
$$p(\theta | \mathcal{D}) = \frac{\tilde{p}(\theta | \mathcal{D})}{Z}$$
where $\tilde{p}(\theta | \mathcal{D}) = p(\mathcal{D} | \theta) \cdot p(\theta)$ is the unnormalized posterior and $Z = p(\mathcal{D})$ is unknown.
Can rejection sampling work here?
Yes! We need a bound on the unnormalized density:
$$\tilde{M} \cdot q(\theta) \geq \tilde{p}(\theta | \mathcal{D}) \quad \text{for all } \theta$$
The algorithm remains identical, with acceptance probability:
$$P(\text{accept} | \theta) = \frac{\tilde{p}(\theta | \mathcal{D})}{\tilde{M} \cdot q(\theta)}$$
Why this still works:
The overall acceptance rate becomes $\frac{Z}{\tilde{M}}$ instead of $\frac{1}{M}$. But since both the target and accepted samples are scaled by the same constant $Z$, the accepted samples still follow the normalized posterior.
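The identity acceptance rate $= Z/\tilde{M}$ can be checked on a conjugate example where $Z$ is known in closed form. This sketch (my own illustration, assuming the Beta(15, 7) kernel $\theta^{14}(1-\theta)^6$ with a Uniform(0, 1) proposal; the seed and sample size are arbitrary) compares the empirical acceptance rate against $Z/\tilde{M}$.

```python
import numpy as np
from scipy import special

rng = np.random.default_rng(1)  # seeded for reproducibility

# Unnormalized target: the Beta(15, 7) kernel theta^14 * (1 - theta)^6,
# so Z = B(15, 7) is known in closed form for checking.
def kernel(t):
    return t**14 * (1 - t)**6

M_tilde = kernel(14 / 20)   # kernel maximum, attained at the mode 0.7
Z = special.beta(15, 7)     # normalizing constant

# Vectorized rejection sampling with a Uniform(0, 1) proposal (q = 1)
n_proposed = 200_000
theta = rng.uniform(0, 1, n_proposed)
u = rng.uniform(0, 1, n_proposed)
accepted = u <= kernel(theta) / M_tilde

print(f"Empirical acceptance:    {accepted.mean():.4f}")
print(f"Theoretical Z / M_tilde: {Z / M_tilde:.4f}")
```

The two numbers agree to sampling error, confirming that the unknown constant $Z$ affects only the efficiency of the sampler, never the correctness of the accepted samples.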
The challenge:
Finding the bound $\tilde{M} = \max_\theta \frac{\tilde{p}(\theta | \mathcal{D})}{q(\theta)}$ requires optimizing the ratio—which can be as hard as the original inference problem. For complex posteriors, we may not know where the posterior peaks.
For rejection sampling to work with unnormalized densities, we must find a valid bound M̃. If we underestimate M̃, samples near the maximum are rejected too rarely, biasing results. If we overestimate M̃, we waste computation on low-acceptance sampling. In high dimensions, finding the maximum of p̃(θ)/q(θ) is often intractable, limiting rejection sampling's applicability.
```python
import numpy as np
from scipy import stats, optimize

def rejection_sampling_unnormalized(
        unnormalized_pdf, proposal_sampler, proposal_pdf, M_tilde, n_samples):
    """
    Rejection sampling for unnormalized target densities.

    Parameters
    ----------
    unnormalized_pdf : callable
        Unnormalized target density p̃(x) (proportional to true density)
    M_tilde : float
        Bound such that M_tilde * q(x) >= p̃(x) for all x

    Returns samples from the normalized version of unnormalized_pdf.
    """
    samples = []
    n_proposed = 0

    while len(samples) < n_samples:
        x = proposal_sampler()
        n_proposed += 1
        u = np.random.uniform(0, 1)

        # Use unnormalized density in acceptance ratio
        acceptance_prob = unnormalized_pdf(x) / (M_tilde * proposal_pdf(x))
        if u <= acceptance_prob:
            samples.append(x)

    acceptance_rate = n_samples / n_proposed
    return np.array(samples), acceptance_rate

# Example: Unnormalized posterior from Bayesian inference
# Likelihood: Binomial(n=20, p=theta)
# Prior: Beta(1, 1) = Uniform(0, 1)
# Data: observed 14 successes

n_trials = 20
n_success = 14

def unnormalized_posterior(theta):
    """Unnormalized: likelihood * prior"""
    if theta <= 0 or theta >= 1:
        return 0.0
    likelihood = stats.binom.pmf(n_success, n_trials, theta)
    prior = stats.beta.pdf(theta, 1, 1)  # Uniform
    return likelihood * prior

def proposal_sampler():
    return np.random.uniform(0, 1)

def proposal_pdf(theta):
    return 1.0

# Find M_tilde by optimization
result = optimize.minimize_scalar(
    lambda x: -unnormalized_posterior(x),
    bounds=(0.01, 0.99), method='bounded')
M_tilde = -result.fun * 1.01  # Slightly inflate for safety

print("=== Bayesian Inference via Rejection Sampling ===")
print(f"Data: {n_success} successes in {n_trials} trials")
print(f"Prior: Beta(1, 1) = Uniform")
print(f"Proposal: Uniform(0, 1)")
print(f"M̃ = {M_tilde:.6f}")
print()

samples, acc_rate = rejection_sampling_unnormalized(
    unnormalized_posterior, proposal_sampler, proposal_pdf, M_tilde, 10000)

# True posterior is Beta(1 + 14, 1 + 6) = Beta(15, 7)
true_mean = 15 / (15 + 7)
true_var = (15 * 7) / ((15 + 7)**2 * (15 + 7 + 1))

print(f"Acceptance rate: {acc_rate:.4f}")
print(f"Sample mean: {np.mean(samples):.4f} (true: {true_mean:.4f})")
print(f"Sample var:  {np.var(samples):.4f} (true: {true_var:.4f})")
print(f"95% CI: [{np.percentile(samples, 2.5):.4f}, {np.percentile(samples, 97.5):.4f}]")
```

Adaptive Rejection Sampling (ARS) improves efficiency by automatically constructing and refining the envelope during sampling. This eliminates the need to pre-specify the proposal.
Key insight: Log-concave targets
A density $p(x)$ is log-concave if $\log p(x)$ is a concave function. This includes:

- the Gaussian, exponential, and Laplace distributions;
- the Gamma distribution with shape parameter $\geq 1$;
- the Beta distribution with both parameters $\geq 1$;
- many full conditionals arising in hierarchical Bayesian models.
For log-concave densities, tangent lines to $\log p(x)$ lie entirely above the curve, making them perfect envelopes.
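The tangent property is easy to confirm numerically. For the standard Gaussian, $\log p(x) = -x^2/2$ up to an additive constant; the sketch below (an illustration, with arbitrarily chosen tangent points) checks that tangents at several points dominate the curve over a wide grid.

```python
import numpy as np

# log p(x) for a standard Gaussian, up to an additive constant
def f(x):
    return -0.5 * x**2

def fprime(x):
    return -x

x = np.linspace(-6, 6, 2001)
for x0 in [-2.0, 0.5, 3.0]:
    tangent = f(x0) + fprime(x0) * (x - x0)  # tangent line at x0
    # Concavity guarantees the tangent lies on or above the curve:
    # tangent - f(x) = 0.5 * (x - x0)^2 >= 0
    assert np.all(tangent >= f(x) - 1e-12)
print("Every tangent dominates log p(x): a valid envelope on the log scale.")
```

Exponentiating a piecewise-linear upper bound on $\log p(x)$ yields a piecewise-exponential envelope, which is straightforward to sample from.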
The ARS algorithm:

1. Evaluate $\log p(x)$ and its derivative at a small set of starting points.
2. Form the upper hull from the tangent lines at those points; by concavity, it lies above $\log p(x)$ everywhere.
3. Sample a proposal from the envelope obtained by exponentiating the hull (a piecewise-exponential density) and accept or reject as usual.
4. When a point is rejected, add it to the hull, tightening the envelope exactly where it was loose.
Efficiency gains: the envelope adapts to the target, so the acceptance rate climbs toward 100% as sampling proceeds, and every density evaluation—accepted or not—is reused to improve the envelope.
ARS is particularly valuable for Gibbs sampling, where we need to sample from conditional distributions. Many conditionals in hierarchical models are log-concave, making ARS an automatic, efficient choice. Libraries like R's 'ars' package and Python's 'adaptive_rejection_sampling' package provide implementations.
```python
import numpy as np

def simple_ars_demo(log_pdf, log_pdf_derivative, x_range, n_samples):
    """
    Simplified ARS demonstration for 1D log-concave densities.

    For production use, rely on specialized ARS libraries. This version is
    pedagogical: it proposes uniformly on [a, b] and rejects against a
    constant bound derived from the tangent hull. (Full ARS samples from
    the piecewise-exponential envelope directly.)
    """
    a, b = x_range
    samples = []

    # Initialize the hull with a few points
    hull_points = [a + 0.1 * (b - a), (a + b) / 2, b - 0.1 * (b - a)]
    hull_logp = [log_pdf(x) for x in hull_points]
    hull_deriv = [log_pdf_derivative(x) for x in hull_points]

    def hull_bound():
        # The hull is the pointwise minimum over tangent lines; for a
        # log-concave density every tangent lies above log p(x), so the
        # hull's maximum over [a, b] bounds log p(x) everywhere.
        grid = np.linspace(a, b, 500)
        hull = np.min([lp + d * (grid - xp)
                       for xp, lp, d in zip(hull_points, hull_logp, hull_deriv)],
                      axis=0)
        return np.max(hull) + 1e-2  # small safety margin for grid resolution

    log_bound = hull_bound()
    n_density_evals = len(hull_points)
    n_accepted = 0

    while len(samples) < n_samples:
        x = np.random.uniform(a, b)
        log_px = log_pdf(x)
        n_density_evals += 1

        u = np.random.uniform(0, 1)
        if np.log(u) <= log_px - log_bound:
            samples.append(x)
            n_accepted += 1
        elif len(hull_points) < 20:  # Limit refinement
            # Rejected points refine the hull, tightening the bound
            hull_points.append(x)
            hull_logp.append(log_px)
            hull_deriv.append(log_pdf_derivative(x))
            log_bound = hull_bound()

    print(f"ARS Statistics:")
    print(f"  Samples generated: {len(samples)}")
    print(f"  Density evaluations: {n_density_evals}")
    print(f"  Refinement points: {len(hull_points)}")

    return np.array(samples)

# Example: Sample from Gamma(shape=3, rate=2) using ARS
shape, rate = 3, 2

def log_gamma_pdf(x):
    if x <= 0:
        return -np.inf
    return (shape - 1) * np.log(x) - rate * x

def log_gamma_derivative(x):
    return (shape - 1) / x - rate

print("=== Adaptive Rejection Sampling: Gamma(3, 2) ===")
print("(Log-concave since shape >= 1)")
print()

# True mean = shape/rate = 1.5, var = shape/rate^2 = 0.75
samples = simple_ars_demo(log_gamma_pdf, log_gamma_derivative, (0.01, 8), 5000)

print(f"\nSample mean: {np.mean(samples):.4f} (true: {shape/rate:.4f})")
print(f"Sample var:  {np.var(samples):.4f} (true: {shape/rate**2:.4f})")
```

Rejection sampling is elegant but has significant limitations that restrict its practical applicability.
When rejection sampling works well:

- Low-dimensional targets (roughly $d \leq 5$) with a proposal that matches the target's shape.
- Log-concave densities, where ARS builds a near-tight envelope automatically.
- Bounded targets where a simple uniform envelope suffices.
- Situations that demand exact i.i.d. samples, such as validating other samplers.

When rejection sampling fails:

- High-dimensional posteriors, where $M$ grows exponentially and acceptance collapses.
- Heavy-tailed targets paired with light-tailed proposals, where no finite $M$ exists.
- Multimodal targets, where a single envelope spanning all modes forces a huge $M$.
- Problems where the maximum of $p(x)/q(x)$ cannot be found, so no valid bound is available.
The fundamental trade-off:
Rejection sampling produces exact i.i.d. samples—the gold standard. But this exactness comes at a cost: the requirement for an envelope that dominates everywhere becomes impossible to satisfy efficiently in high dimensions.
For high-dimensional Bayesian posteriors, we trade exactness for practicality and move to Markov Chain Monte Carlo (MCMC), which produces correlated samples but scales to any dimension.
| Scenario | Verdict | Reason |
|---|---|---|
| 1D posterior, log-concave | ✓ Excellent | Use ARS, nearly 100% acceptance |
| 1D posterior, complex | ✓ Good | Standard RS, moderate efficiency |
| 2-5D posterior, simple shape | △ Viable | Careful proposal design needed |
| 5-20D posterior | ✗ Marginal | Efficiency drops significantly |
| >20D posterior | ✗ Impractical | Use MCMC instead |
| Multimodal posterior | ✗ Difficult | Envelope spans all modes, huge M |
| Unknown normalizing constant | ✓ Works | Same algorithm, find bound on unnormalized |
Despite limitations, rejection sampling remains important:
- Exact sampling in low dimensions (e.g., prior predictive checks)
- Component in Gibbs samplers (sampling conditionals)
- Theoretical foundation for understanding more advanced methods
- Benchmark for validating MCMC implementations
Knowing when to use it—and when to move beyond it—is part of computational Bayesian expertise.
Rejection sampling provides our first practical technique for sampling from complex distributions. Its geometric elegance—sampling uniformly from under a curve—translates into a simple accept-reject algorithm.
What's Next:
Rejection sampling's dimensional limitations motivate our next topic: Importance Sampling. Instead of rejecting samples that miss the target, importance sampling reweights samples from the proposal to correct for distributional mismatch. This avoids the hard dominance requirement and opens new possibilities—but introduces its own challenges with variance inflation.
You now understand rejection sampling: its elegant geometric foundation, practical algorithm, efficiency analysis, and fundamental limitations. This sets the stage for importance sampling and ultimately MCMC, which overcome these limitations for high-dimensional Bayesian inference.