In the previous page, we established that Monte Carlo estimation converts integration problems into sampling problems. But a critical challenge remains: how do we generate samples from a target distribution when we can't directly sample from it?
For simple distributions—uniform, Gaussian, exponential—most programming languages provide built-in samplers. But Bayesian posteriors rarely take these convenient forms. Instead, we face distributions defined implicitly through the product of likelihoods and priors, often with complex shapes, multiple modes, or irregular boundaries.
Rejection sampling is our first technique for bridging this gap. It's conceptually elegant: use a simpler proposal distribution we can sample from, and systematically accept or reject samples to recover the target distribution. When it works, it produces exact i.i.d. samples—as good as if we had a direct sampler for the target.
This page develops rejection sampling from first principles. You will understand the geometric intuition behind the algorithm, derive the acceptance probability, analyze efficiency, and recognize when rejection sampling is appropriate versus when more sophisticated methods are needed.
Rejection sampling rests on a beautifully geometric insight. Consider sampling uniformly from the region under a probability density curve. If we project these points onto the x-axis, the resulting values follow the target density.
Why this works:
Think of a probability density $p(x)$ as a curve in two dimensions. The probability that $X$ falls in an interval $[a, b]$ equals the area under the curve between $a$ and $b$:
$$P(a \leq X \leq b) = \int_a^b p(x) \, dx$$
If we sample points uniformly from the entire region under the curve (the area between the curve and the x-axis), more points fall where the curve is high. Projecting to the x-axis gives exactly the distribution we want.
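This projection argument is easy to verify numerically. The sketch below (an illustration of the idea, not code from this page) throws points uniformly into a bounding box over the triangular density $p(x) = 2x$ on $[0, 1]$, keeps those that land under the curve, and checks that the projected x-values have the target's mean of $2/3$; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# Target: triangular density p(x) = 2x on [0, 1]; its maximum height is 2.
n = 200_000
x = rng.uniform(0, 1, n)   # x-coordinates, uniform across the box
y = rng.uniform(0, 2, n)   # heights, uniform up to the box top

under_curve = y < 2 * x    # keep only points below the density curve
accepted = x[under_curve]

# Projected x-values should follow p(x) = 2x, whose mean is 2/3
print(f"Accepted fraction: {under_curve.mean():.3f} (area under curve / box area = 0.5)")
print(f"Sample mean: {accepted.mean():.4f} (target mean = {2/3:.4f})")
```

The accepted fraction equals the area under the curve divided by the box area, a preview of the acceptance-rate analysis later on this page.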
The practical problem:
Directly sampling uniformly from the region under an arbitrary curve is as hard as the original problem. We need a workable approximation.
The envelope solution:
Suppose we have a proposal distribution $q(x)$ that we can sample from, and a constant $M$ such that:
$$M \cdot q(x) \geq p(x) \quad \text{for all } x$$
The function $M \cdot q(x)$ forms an envelope that completely covers the target density. We can sample uniformly from under this envelope (which is easy), then discard points that fall above the target curve.
The requirement Mq(x) ≥ p(x) must hold for ALL x in the support. If the envelope dips below the target anywhere, samples from that region will be under-represented, producing biased results. Always verify dominance across the entire domain, paying special attention to tails.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def visualize_envelope(target_pdf, proposal_pdf, M, x_range, title):
    """
    Visualize the envelope relationship for rejection sampling.
    """
    x = np.linspace(x_range[0], x_range[1], 1000)
    p_x = target_pdf(x)          # Target density
    Mq_x = M * proposal_pdf(x)   # Envelope

    plt.figure(figsize=(10, 6))

    # Fill the region under the envelope
    plt.fill_between(x, 0, Mq_x, alpha=0.3, color='blue',
                     label=f'Envelope Mq(x), M={M:.2f}')
    # Fill the region under the target
    plt.fill_between(x, 0, p_x, alpha=0.5, color='green',
                     label='Target p(x)')
    plt.plot(x, Mq_x, 'b-', linewidth=2)
    plt.plot(x, p_x, 'g-', linewidth=2)
    plt.xlabel('x')
    plt.ylabel('Density')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Check if envelope dominates
    if np.any(Mq_x < p_x - 1e-10):
        print("WARNING: Envelope does NOT dominate target everywhere!")
    else:
        print("✓ Envelope properly dominates target")

    return plt

# Example: Sample from Beta(2.7, 6.3) using Uniform proposal
alpha, beta_param = 2.7, 6.3

def target_pdf(x):
    return stats.beta.pdf(x, alpha, beta_param)

def proposal_pdf(x):
    return np.ones_like(x)  # Uniform(0, 1)

# Find M: max of target / proposal = max of target (since proposal = 1)
x_opt = (alpha - 1) / (alpha + beta_param - 2)  # Mode of Beta
M = target_pdf(x_opt)

print(f"Target: Beta({alpha}, {beta_param})")
print(f"Proposal: Uniform(0, 1)")
print(f"Optimal M = {M:.4f}")
print(f"Expected acceptance rate = 1/M = {1/M:.4f} = {100/M:.1f}%")

visualize_envelope(target_pdf, proposal_pdf, M, (0, 1),
                   f'Rejection Sampling: Beta({alpha}, {beta_param}) with Uniform Envelope')
plt.show()
```

With the geometric intuition established, we can formalize the rejection sampling algorithm.
Problem setup:

- A target density $p(x)$ we want samples from.
- A proposal density $q(x)$ we can sample from directly.
- A constant $M$ such that $M \cdot q(x) \geq p(x)$ for all $x$.
The algorithm:
```
Repeat:
    1. Draw x from proposal:  x ~ q(x)
    2. Draw u uniformly:      u ~ Uniform(0, 1)
    3. If u ≤ p(x) / (M · q(x)):
           Accept x as a sample from p(x)
       Else:
           Reject x and return to step 1
```
Why this produces samples from $p(x)$:
We're sampling uniformly from the region under the envelope $M \cdot q(x)$, which we do by:

- drawing the horizontal coordinate $x \sim q(x)$, and
- drawing a height uniformly between $0$ and the envelope: $u \cdot M \cdot q(x)$ with $u \sim \text{Uniform}(0, 1)$.
We accept if this point $(x, u \cdot M \cdot q(x))$ falls under the target curve $p(x)$. The probability of acceptance at point $x$ is exactly $\frac{p(x)}{M \cdot q(x)}$.
The joint density of accepted samples is:
$$p(\text{accepted at } x) = p(\text{proposed } x) \times p(\text{accept} \mid x) = q(x) \cdot \frac{p(x)}{M \cdot q(x)} = \frac{p(x)}{M}$$
Normalizing by the total acceptance probability (1/M), we recover p(x). This proves samples are exactly distributed according to the target.
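The exactness claim can also be tested statistically. This sketch (an illustration, not this page's own code) draws rejection samples from Beta(2.7, 6.3) with a uniform proposal, vectorized for speed, and runs a Kolmogorov–Smirnov test against the true distribution; exact i.i.d. samples should produce a small KS statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seeded for reproducibility
alpha, beta_param = 2.7, 6.3

# Envelope constant for a Uniform(0, 1) proposal: the target's peak density
mode = (alpha - 1) / (alpha + beta_param - 2)
M = stats.beta.pdf(mode, alpha, beta_param)

# Vectorized rejection sampling
x = rng.uniform(0, 1, 100_000)
u = rng.uniform(0, 1, 100_000)
samples = x[u <= stats.beta.pdf(x, alpha, beta_param) / M]

# KS test against the exact Beta(2.7, 6.3) CDF
stat, pvalue = stats.kstest(samples, 'beta', args=(alpha, beta_param))
print(f"Accepted {len(samples)} samples, KS statistic = {stat:.4f}, p = {pvalue:.3f}")
```

A biased sampler (for instance, one whose envelope dips below the target) would show up here as a large KS statistic and a vanishing p-value.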
```python
import numpy as np
from scipy import stats

def rejection_sampling(target_pdf, proposal_sampler, proposal_pdf, M, n_samples):
    """
    Generate samples from target_pdf using rejection sampling.

    Parameters
    ----------
    target_pdf : callable
        The target probability density function p(x)
    proposal_sampler : callable
        Function that returns a sample from the proposal q(x)
    proposal_pdf : callable
        The proposal probability density function q(x)
    M : float
        Bound constant such that M * q(x) >= p(x) for all x
    n_samples : int
        Number of samples to generate

    Returns
    -------
    samples : ndarray
        Array of samples from the target distribution
    acceptance_rate : float
        Fraction of proposals that were accepted
    """
    samples = []
    n_proposed = 0

    while len(samples) < n_samples:
        # Step 1: Draw from proposal
        x = proposal_sampler()
        n_proposed += 1

        # Step 2: Draw uniform for acceptance
        u = np.random.uniform(0, 1)

        # Step 3: Accept or reject
        acceptance_prob = target_pdf(x) / (M * proposal_pdf(x))
        if u <= acceptance_prob:
            samples.append(x)

    acceptance_rate = n_samples / n_proposed
    return np.array(samples), acceptance_rate

# Example: Sample from Beta(2.7, 6.3) using Uniform(0,1) proposal
alpha, beta_param = 2.7, 6.3

def target_pdf(x):
    return stats.beta.pdf(x, alpha, beta_param)

def proposal_sampler():
    return np.random.uniform(0, 1)

def proposal_pdf(x):
    return 1.0  # Uniform(0, 1)

# Compute M as max of target (since proposal is uniform with density 1)
x_mode = (alpha - 1) / (alpha + beta_param - 2)
M = target_pdf(x_mode)

print(f"=== Rejection Sampling: Beta({alpha}, {beta_param}) ===")
print(f"Proposal: Uniform(0, 1)")
print(f"M = {M:.4f}")
print(f"Theoretical acceptance rate: {1/M:.4f}")
print()

# Generate samples
n_samples = 10000
samples, acc_rate = rejection_sampling(
    target_pdf, proposal_sampler, proposal_pdf, M, n_samples)

print(f"Generated {n_samples} samples")
print(f"Empirical acceptance rate: {acc_rate:.4f}")
print()

# Verify samples match target distribution
print("Sample statistics vs theoretical:")
print(f"  Sample mean: {np.mean(samples):.4f}")
print(f"  True mean:   {alpha / (alpha + beta_param):.4f}")
print(f"  Sample var:  {np.var(samples):.4f}")
true_var = (alpha * beta_param) / ((alpha + beta_param)**2 * (alpha + beta_param + 1))
print(f"  True var:    {true_var:.4f}")
```

The efficiency of rejection sampling is entirely determined by the acceptance probability—the fraction of proposals we keep.
Overall acceptance probability:
$$P(\text{accept}) = \int P(\text{accept} \mid x) \, q(x) \, dx = \int \frac{p(x)}{M \cdot q(x)} \cdot q(x) \, dx = \frac{1}{M} \int p(x) \, dx = \frac{1}{M}$$
The acceptance rate equals exactly $\frac{1}{M}$—the ratio of the area under the target to the area under the envelope.
Optimal choice of $M$:
To maximize efficiency, we want the smallest $M$ that still satisfies the dominance condition:
$$M^* = \max_x \frac{p(x)}{q(x)}$$
This is the supremum ratio of target to proposal. The optimal $M$ makes the envelope as tight as possible around the target.
The efficiency bottleneck:
Rejection sampling efficiency degrades when:

- the proposal's shape is a poor match for the target, inflating $M$;
- the target is sharply peaked relative to the proposal;
- the dimension grows, because small per-dimension mismatches compound multiplicatively.

As dimension increases, even a well-matched proposal leads to astronomical $M$ values and essentially zero acceptance.
Consider matching a $d$-dimensional standard Gaussian to itself with a slightly inflated envelope. If the proposal has variance 1.1 instead of 1.0, the ratio of densities at the mode scales as $(1.1)^{d/2}$. In $d = 100$ dimensions, $M \approx 150$, so roughly 150 proposals per accepted sample; by $d = 1000$, $M \approx 10^{21}$—utterly impractical. This exponential degradation makes rejection sampling unusable for high-dimensional posteriors.
| Dimension | Optimal M (Gaussian) | Acceptance Rate | Proposals per Sample |
|---|---|---|---|
| 1 | 1.05 | 95% | ~1 |
| 5 | 1.28 | 78% | ~1.3 |
| 10 | 1.65 | 61% | ~1.6 |
| 20 | 2.71 | 37% | ~2.7 |
| 50 | 12.2 | 8.2% | ~12 |
| 100 | 148 | 0.68% | ~150 |
| 500 | ~10^11 | ~10^-11 | ~10^11 |
```python
import numpy as np

def analyze_gaussian_rejection_efficiency():
    """
    Analyze how rejection sampling efficiency degrades with dimension.

    Scenario: Target is N(0, I_d), proposal is N(0, 1.1*I_d).
    This is nearly optimal - same shape, just slightly inflated.
    """
    print("=== Rejection Sampling Efficiency: Gaussian in d Dimensions ===")
    print("Target: N(0, I_d)")
    print("Proposal: N(0, 1.1 * I_d)")
    print()
    print(f"{'Dimension':>10} | {'M':>12} | {'Acceptance':>12} | {'Proposals/Sample':>16}")
    print("-" * 60)

    for d in [1, 2, 5, 10, 20, 50, 100, 200, 500]:
        # For Gaussians, M = (proposal_var / target_var)^(d/2)
        sigma_ratio = 1.1
        M = sigma_ratio ** (d / 2)
        acceptance = 1 / M

        if M < 1e15:
            print(f"{d:>10} | {M:>12.2f} | {acceptance:>11.2e} | {M:>16.1f}")
        else:
            print(f"{d:>10} | {M:>12.2e} | {acceptance:>11.2e} | {'impractical':>16}")

    print()
    print("Key insight: Even with nearly optimal proposal (only 10% inflated),")
    print("efficiency becomes impractical beyond d ≈ 50-100 dimensions.")

def optimize_proposal_constant(target_pdf, proposal_pdf, x_range, n_grid=10000):
    """
    Find optimal M by grid search over the domain.

    M* = max_x p(x) / q(x)
    """
    if isinstance(x_range[0], (int, float)):
        # 1D case
        x = np.linspace(x_range[0], x_range[1], n_grid)
        ratios = target_pdf(x) / proposal_pdf(x)
        M_opt = np.max(ratios)
        x_max = x[np.argmax(ratios)]
        return M_opt, x_max
    else:
        raise NotImplementedError("Use MCMC for high-dimensional cases")

analyze_gaussian_rejection_efficiency()

# 1D Example: Finding optimal M
from scipy import stats

print("\n=== Finding Optimal M for 1D Beta-Gaussian ===")
print("Target: Beta(5, 2) - right-skewed on [0,1]")
print("Proposal: Truncated Normal(0.7, 0.3) on [0,1]")

def target(x):
    return stats.beta.pdf(x, 5, 2)

def proposal(x):
    # Truncated normal on [0, 1]
    a, b = (0 - 0.7) / 0.3, (1 - 0.7) / 0.3  # Standardize bounds
    return stats.truncnorm.pdf(x, a, b, loc=0.7, scale=0.3)

M_opt, x_max = optimize_proposal_constant(target, proposal, (0.001, 0.999))
print(f"Optimal M = {M_opt:.4f} (achieved at x = {x_max:.4f})")
print(f"Expected acceptance rate = {1/M_opt:.4f} = {100/M_opt:.1f}%")
```

The art of rejection sampling lies in selecting a good proposal distribution. The ideal proposal closely matches the target's shape while remaining easy to sample from.
Desiderata for proposals:

- Easy to sample from directly.
- Shape close to the target's, so the optimal $M$ stays small.
- Tails at least as heavy as the target's, so the ratio $p(x)/q(x)$ remains bounded.
Common proposal choices:
| Target Type | Good Proposals | Notes |
|---|---|---|
| Bounded support | Uniform | Simple, but inefficient for peaked targets |
| Unimodal | Gaussian, t-distribution | t-distribution for heavy tails |
| Log-concave | Laplace, Exponential | Adaptive rejection sampling |
| Mixture | Match mixture structure | Use proposal mixture |
| Heavy-tailed | Cauchy, t with low df | Never use Gaussian for Cauchy target |
The tail matching requirement:
If the target has heavier tails than the proposal, the ratio $p(x)/q(x) \to \infty$ as $|x| \to \infty$, making $M$ infinite. No valid bound exists.
$$\text{If } \frac{p(x)}{q(x)} \to \infty \text{ as } x \to \pm\infty, \text{ then rejection sampling fails}$$
Never use a Gaussian proposal for a Cauchy target, or any light-tailed proposal for a heavy-tailed target. The Gaussian density decays as $e^{-x^2/2}$ while the Cauchy decays as $1/x^2$, so the ratio Cauchy/Gaussian → ∞ and no finite $M$ exists. This is not an efficiency problem—rejection sampling is mathematically impossible in this case.
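A quick numerical check makes the explosion concrete. This sketch (an illustration using SciPy's standard Cauchy and normal densities) evaluates the ratio at a few points; any candidate bound $M$ is eventually overtaken.

```python
from scipy import stats

# Target: standard Cauchy. Proposal: standard Gaussian.
# The density ratio p(x)/q(x) grows without bound, so no finite M dominates.
for x in [2.0, 4.0, 6.0, 8.0, 10.0]:
    ratio = stats.cauchy.pdf(x) / stats.norm.pdf(x)
    print(f"x = {x:4.1f}   cauchy/gauss ratio = {ratio:.3e}")
```

By $x = 10$ the ratio already exceeds $10^{15}$; no constant chosen in advance survives the tails.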
```python
import numpy as np
from scipy import stats

def compare_proposals_for_target():
    """
    Compare different proposals for sampling from a mixture distribution.

    Target: 0.3 * N(-2, 0.5) + 0.7 * N(2, 1)
    """
    # Define target
    def target_pdf(x):
        return (0.3 * stats.norm.pdf(x, -2, 0.5) +
                0.7 * stats.norm.pdf(x, 2, 1))

    # Proposal 1: Single Gaussian centered at mixture mean
    def proposal1_pdf(x):
        return stats.norm.pdf(x, 0.8, 2)  # Mean at 0.3*(-2) + 0.7*2 = 0.8

    def proposal1_sampler():
        return np.random.normal(0.8, 2)

    # Proposal 2: Wider Gaussian
    def proposal2_pdf(x):
        return stats.norm.pdf(x, 0.5, 3)

    def proposal2_sampler():
        return np.random.normal(0.5, 3)

    # Proposal 3: t-distribution (heavier tails)
    df = 5
    def proposal3_pdf(x):
        return stats.t.pdf(x, df, loc=0.5, scale=2)

    def proposal3_sampler():
        return stats.t.rvs(df, loc=0.5, scale=2)

    # Find M for each proposal (grid search)
    x = np.linspace(-6, 8, 10000)

    proposals = [
        ("N(0.8, 2)", proposal1_pdf, proposal1_sampler),
        ("N(0.5, 3)", proposal2_pdf, proposal2_sampler),
        (f"t(df={df}, mu=0.5, s=2)", proposal3_pdf, proposal3_sampler),
    ]

    print("=== Proposal Comparison for Mixture Target ===")
    print("Target: 0.3*N(-2, 0.5) + 0.7*N(2, 1)")
    print()

    for name, pdf, sampler in proposals:
        target_vals = target_pdf(x)
        proposal_vals = pdf(x)

        # Avoid division by zero
        valid = proposal_vals > 1e-10
        ratios = np.zeros_like(x)
        ratios[valid] = target_vals[valid] / proposal_vals[valid]

        M = np.max(ratios)
        acceptance = 1 / M

        print(f"Proposal: {name}")
        print(f"  M = {M:.4f}")
        print(f"  Acceptance rate = {acceptance:.4f} = {100*acceptance:.1f}%")
        print()

compare_proposals_for_target()
```

In Bayesian inference, we typically know the posterior only up to a normalizing constant:
$$p(\theta | \mathcal{D}) = \frac{\tilde{p}(\theta | \mathcal{D})}{Z}$$
where $\tilde{p}(\theta | \mathcal{D}) = p(\mathcal{D} | \theta) \cdot p(\theta)$ is the unnormalized posterior and $Z = p(\mathcal{D})$ is unknown.
Can rejection sampling work here?
Yes! We need a bound on the unnormalized density:
$$\tilde{M} \cdot q(\theta) \geq \tilde{p}(\theta | \mathcal{D}) \quad \text{for all } \theta$$
The algorithm remains identical, with acceptance probability:
$$P(\text{accept} | \theta) = \frac{\tilde{p}(\theta | \mathcal{D})}{\tilde{M} \cdot q(\theta)}$$
Why this still works:
The overall acceptance rate becomes $\frac{Z}{\tilde{M}}$ instead of $\frac{1}{M}$. But since both the target and accepted samples are scaled by the same constant $Z$, the accepted samples still follow the normalized posterior.
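The identity acceptance rate $= Z/\tilde{M}$ can be checked on a conjugate example where $Z$ is known in closed form. This sketch (my own illustration, assuming the Beta(15, 7) kernel $\theta^{14}(1-\theta)^6$ with a Uniform(0, 1) proposal; the seed and sample size are arbitrary) compares the empirical acceptance rate against $Z/\tilde{M}$.

```python
import numpy as np
from scipy import special

rng = np.random.default_rng(1)  # seeded for reproducibility

# Unnormalized target: the Beta(15, 7) kernel theta^14 * (1 - theta)^6,
# so Z = B(15, 7) is known in closed form for checking.
def kernel(t):
    return t**14 * (1 - t)**6

M_tilde = kernel(14 / 20)   # kernel maximum, attained at the mode 0.7
Z = special.beta(15, 7)     # normalizing constant

# Vectorized rejection sampling with a Uniform(0, 1) proposal (q = 1)
n_proposed = 200_000
theta = rng.uniform(0, 1, n_proposed)
u = rng.uniform(0, 1, n_proposed)
accepted = u <= kernel(theta) / M_tilde

print(f"Empirical acceptance:    {accepted.mean():.4f}")
print(f"Theoretical Z / M_tilde: {Z / M_tilde:.4f}")
```

The two numbers agree to sampling error, confirming that the unknown constant $Z$ affects only the efficiency of the sampler, never the correctness of the accepted samples.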
The challenge:
Finding the bound $\tilde{M} = \max_\theta \frac{\tilde{p}(\theta | \mathcal{D})}{q(\theta)}$ requires optimizing the ratio—which can be as hard as the original inference problem. For complex posteriors, we may not know where the posterior peaks.
For rejection sampling to work with unnormalized densities, we must find a valid bound M̃. If we underestimate M̃, samples near the maximum are rejected too rarely, biasing results. If we overestimate M̃, we waste computation on low-acceptance sampling. In high dimensions, finding the maximum of p̃(θ)/q(θ) is often intractable, limiting rejection sampling's applicability.
```python
import numpy as np
from scipy import stats, optimize

def rejection_sampling_unnormalized(
        unnormalized_pdf, proposal_sampler, proposal_pdf, M_tilde, n_samples):
    """
    Rejection sampling for unnormalized target densities.

    Parameters
    ----------
    unnormalized_pdf : callable
        Unnormalized target density p̃(x) (proportional to true density)
    M_tilde : float
        Bound such that M_tilde * q(x) >= p̃(x) for all x

    Returns samples from the normalized version of unnormalized_pdf.
    """
    samples = []
    n_proposed = 0

    while len(samples) < n_samples:
        x = proposal_sampler()
        n_proposed += 1
        u = np.random.uniform(0, 1)

        # Use unnormalized density in acceptance ratio
        acceptance_prob = unnormalized_pdf(x) / (M_tilde * proposal_pdf(x))
        if u <= acceptance_prob:
            samples.append(x)

    acceptance_rate = n_samples / n_proposed
    return np.array(samples), acceptance_rate

# Example: Unnormalized posterior from Bayesian inference
# Likelihood: Binomial(n=20, p=theta)
# Prior: Beta(1, 1) = Uniform(0, 1)
# Data: observed 14 successes

n_trials = 20
n_success = 14

def unnormalized_posterior(theta):
    """Unnormalized: likelihood * prior"""
    if theta <= 0 or theta >= 1:
        return 0.0
    likelihood = stats.binom.pmf(n_success, n_trials, theta)
    prior = stats.beta.pdf(theta, 1, 1)  # Uniform
    return likelihood * prior

def proposal_sampler():
    return np.random.uniform(0, 1)

def proposal_pdf(theta):
    return 1.0

# Find M_tilde by optimization
result = optimize.minimize_scalar(
    lambda x: -unnormalized_posterior(x),
    bounds=(0.01, 0.99), method='bounded')
M_tilde = -result.fun * 1.01  # Slightly inflate for safety

print("=== Bayesian Inference via Rejection Sampling ===")
print(f"Data: {n_success} successes in {n_trials} trials")
print(f"Prior: Beta(1, 1) = Uniform")
print(f"Proposal: Uniform(0, 1)")
print(f"M̃ = {M_tilde:.6f}")
print()

samples, acc_rate = rejection_sampling_unnormalized(
    unnormalized_posterior, proposal_sampler, proposal_pdf, M_tilde, 10000)

# True posterior is Beta(1 + 14, 1 + 6) = Beta(15, 7)
true_mean = 15 / (15 + 7)
true_var = (15 * 7) / ((15 + 7)**2 * (15 + 7 + 1))

print(f"Acceptance rate: {acc_rate:.4f}")
print(f"Sample mean: {np.mean(samples):.4f} (true: {true_mean:.4f})")
print(f"Sample var:  {np.var(samples):.4f} (true: {true_var:.4f})")
print(f"95% CI: [{np.percentile(samples, 2.5):.4f}, {np.percentile(samples, 97.5):.4f}]")
```

Adaptive Rejection Sampling (ARS) improves efficiency by automatically constructing and refining the envelope during sampling. This eliminates the need to pre-specify the proposal.
Key insight: Log-concave targets
A density $p(x)$ is log-concave if $\log p(x)$ is a concave function. This includes:

- the Gaussian, exponential, and Laplace distributions;
- the Gamma distribution with shape parameter $\geq 1$;
- the Beta distribution with both parameters $\geq 1$;
- many full conditionals arising in hierarchical Bayesian models.
For log-concave densities, tangent lines to $\log p(x)$ lie entirely above the curve, making them perfect envelopes.
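The tangent property is easy to confirm numerically. For the standard Gaussian, $\log p(x) = -x^2/2$ up to an additive constant; the sketch below (an illustration, with arbitrarily chosen tangent points) checks that tangents at several points dominate the curve over a wide grid.

```python
import numpy as np

# log p(x) for a standard Gaussian, up to an additive constant
def f(x):
    return -0.5 * x**2

def fprime(x):
    return -x

x = np.linspace(-6, 6, 2001)
for x0 in [-2.0, 0.5, 3.0]:
    tangent = f(x0) + fprime(x0) * (x - x0)  # tangent line at x0
    # Concavity guarantees the tangent lies on or above the curve:
    # tangent - f(x) = 0.5 * (x - x0)^2 >= 0
    assert np.all(tangent >= f(x) - 1e-12)
print("Every tangent dominates log p(x): a valid envelope on the log scale.")
```

Exponentiating a piecewise-linear upper bound on $\log p(x)$ yields a piecewise-exponential envelope, which is straightforward to sample from.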
The ARS algorithm:

1. Evaluate $\log p(x)$ and its derivative at a small set of starting points.
2. Form the upper hull from the tangent lines at those points; by concavity, it lies above $\log p(x)$ everywhere.
3. Sample a proposal from the envelope obtained by exponentiating the hull (a piecewise-exponential density) and accept or reject as usual.
4. When a point is rejected, add it to the hull, tightening the envelope exactly where it was loose.
Efficiency gains: the envelope adapts to the target, so the acceptance rate climbs toward 100% as sampling proceeds, and every density evaluation—accepted or not—is reused to improve the envelope.
ARS is particularly valuable for Gibbs sampling, where we need to sample from conditional distributions. Many conditionals in hierarchical models are log-concave, making ARS an automatic, efficient choice. Libraries like R's 'ars' package and Python's 'adaptive_rejection_sampling' package provide implementations.
```python
import numpy as np

def simple_ars_demo(log_pdf, log_pdf_derivative, x_range, n_samples):
    """
    Simplified ARS demonstration for 1D log-concave densities.

    For production use, rely on specialized ARS libraries. This version is
    pedagogical: it proposes uniformly on [a, b] and rejects against a
    constant bound derived from the tangent hull. (Full ARS samples from
    the piecewise-exponential envelope directly.)
    """
    a, b = x_range
    samples = []

    # Initialize the hull with a few points
    hull_points = [a + 0.1 * (b - a), (a + b) / 2, b - 0.1 * (b - a)]
    hull_logp = [log_pdf(x) for x in hull_points]
    hull_deriv = [log_pdf_derivative(x) for x in hull_points]

    def hull_bound():
        # The hull is the pointwise minimum over tangent lines; for a
        # log-concave density every tangent lies above log p(x), so the
        # hull's maximum over [a, b] bounds log p(x) everywhere.
        grid = np.linspace(a, b, 500)
        hull = np.min([lp + d * (grid - xp)
                       for xp, lp, d in zip(hull_points, hull_logp, hull_deriv)],
                      axis=0)
        return np.max(hull) + 1e-2  # small safety margin for grid resolution

    log_bound = hull_bound()
    n_density_evals = len(hull_points)
    n_accepted = 0

    while len(samples) < n_samples:
        x = np.random.uniform(a, b)
        log_px = log_pdf(x)
        n_density_evals += 1

        u = np.random.uniform(0, 1)
        if np.log(u) <= log_px - log_bound:
            samples.append(x)
            n_accepted += 1
        elif len(hull_points) < 20:  # Limit refinement
            # Rejected points refine the hull, tightening the bound
            hull_points.append(x)
            hull_logp.append(log_px)
            hull_deriv.append(log_pdf_derivative(x))
            log_bound = hull_bound()

    print(f"ARS Statistics:")
    print(f"  Samples generated: {len(samples)}")
    print(f"  Density evaluations: {n_density_evals}")
    print(f"  Refinement points: {len(hull_points)}")

    return np.array(samples)

# Example: Sample from Gamma(shape=3, rate=2) using ARS
shape, rate = 3, 2

def log_gamma_pdf(x):
    if x <= 0:
        return -np.inf
    return (shape - 1) * np.log(x) - rate * x

def log_gamma_derivative(x):
    return (shape - 1) / x - rate

print("=== Adaptive Rejection Sampling: Gamma(3, 2) ===")
print("(Log-concave since shape >= 1)")
print()

# True mean = shape/rate = 1.5, var = shape/rate^2 = 0.75
samples = simple_ars_demo(log_gamma_pdf, log_gamma_derivative, (0.01, 8), 5000)

print(f"\nSample mean: {np.mean(samples):.4f} (true: {shape/rate:.4f})")
print(f"Sample var:  {np.var(samples):.4f} (true: {shape/rate**2:.4f})")
```

Rejection sampling is elegant but has significant limitations that restrict its practical applicability.
When rejection sampling works well:

- Low-dimensional targets (roughly $d \leq 5$) with a proposal that matches the target's shape.
- Log-concave densities, where ARS builds a near-tight envelope automatically.
- Bounded targets where a simple uniform envelope suffices.
- Situations that demand exact i.i.d. samples, such as validating other samplers.

When rejection sampling fails:

- High-dimensional posteriors, where $M$ grows exponentially and acceptance collapses.
- Heavy-tailed targets paired with light-tailed proposals, where no finite $M$ exists.
- Multimodal targets, where a single envelope spanning all modes forces a huge $M$.
- Problems where the maximum of $p(x)/q(x)$ cannot be found, so no valid bound is available.
The fundamental trade-off:
Rejection sampling produces exact i.i.d. samples—the gold standard. But this exactness comes at a cost: the requirement for an envelope that dominates everywhere becomes impossible to satisfy efficiently in high dimensions.
For high-dimensional Bayesian posteriors, we trade exactness for practicality and move to Markov Chain Monte Carlo (MCMC), which produces correlated samples but scales to any dimension.
| Scenario | Verdict | Reason |
|---|---|---|
| 1D posterior, log-concave | ✓ Excellent | Use ARS, nearly 100% acceptance |
| 1D posterior, complex | ✓ Good | Standard RS, moderate efficiency |
| 2-5D posterior, simple shape | △ Viable | Careful proposal design needed |
| 5-20D posterior | ✗ Marginal | Efficiency drops significantly |
| >20D posterior | ✗ Impractical | Use MCMC instead |
| Multimodal posterior | ✗ Difficult | Envelope spans all modes, huge M |
| Unknown normalizing constant | ✓ Works | Same algorithm, find bound on unnormalized |
Despite limitations, rejection sampling remains important:
- Exact sampling in low dimensions (e.g., prior predictive checks)
- Component in Gibbs samplers (sampling conditionals)
- Theoretical foundation for understanding more advanced methods
- Benchmark for validating MCMC implementations
Knowing when to use it—and when to move beyond it—is part of computational Bayesian expertise.
Rejection sampling provides our first practical technique for sampling from complex distributions. Its geometric elegance—sampling uniformly from under a curve—translates into a simple accept-reject algorithm.
What's Next:
Rejection sampling's dimensional limitations motivate our next topic: Importance Sampling. Instead of rejecting samples that miss the target, importance sampling reweights samples from the proposal to correct for distributional mismatch. This avoids the hard dominance requirement and opens new possibilities—but introduces its own challenges with variance inflation.
You now understand rejection sampling: its elegant geometric foundation, practical algorithm, efficiency analysis, and fundamental limitations. This sets the stage for importance sampling and ultimately MCMC, which overcome these limitations for high-dimensional Bayesian inference.