A common objection to random search is its apparent unpredictability: "How do I know I've sampled enough?" Unlike grid search, which deterministically covers the space, random search is probabilistic. But this randomness is not unquantifiable—coverage probability theory provides precise answers.
Coverage probability tells us: given that good configurations occupy some fraction of the search space, how many random samples do we need to find one with high confidence? This transforms random search from a "hope and pray" strategy into one with rigorous guarantees.
This page develops the complete mathematical framework for coverage probability, enabling principled budget allocation for random search.
By the end of this page, you will derive coverage probability formulas, calculate required sample sizes for target success rates, understand the relationship between "good region" size and budget needs, and apply these formulas to real hyperparameter optimization scenarios.
Let us formalize the probability of finding good hyperparameter configurations.
Setup:
Define the hyperparameter space $\mathcal{H}$ with a probability measure (from our sampling distribution). Let $\mathcal{G} \subset \mathcal{H}$ be the "good" region—configurations with performance within acceptable bounds of optimal. Define:
$$\alpha = P(\mathbf{h} \in \mathcal{G}) = \int_{\mathcal{G}} p(\mathbf{h}) \, d\mathbf{h}$$
This is the probability that a single random sample lands in the good region.
Coverage Probability:
With $n$ independent random samples, the probability that at least one lands in $\mathcal{G}$ is:
$$P(\text{success}) = 1 - P(\text{all fail}) = 1 - (1-\alpha)^n$$
Inverse Formula (Required Samples):
To achieve success probability $p$, we need:
$$n = \left\lceil \frac{\log(1-p)}{\log(1-\alpha)} \right\rceil$$
For small $\alpha$, using $\log(1-\alpha) \approx -\alpha$:
$$n \approx \frac{-\log(1-p)}{\alpha} = \frac{\log\frac{1}{1-p}}{\alpha}$$
| Success Prob. | α = 10% | α = 5% | α = 1% | α = 0.1% |
|---|---|---|---|---|
| 50% | 7 | 14 | 69 | 693 |
| 75% | 14 | 28 | 138 | 1,386 |
| 90% | 22 | 45 | 230 | 2,302 |
| 95% | 29 | 59 | 299 | 2,995 |
| 99% | 44 | 90 | 459 | 4,603 |
| 99.9% | 66 | 135 | 688 | 6,905 |
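The table entries can be reproduced directly from the exact formula, and compared against the small-$\alpha$ approximation; a minimal sketch (function names are my own):

```python
import numpy as np

def n_required(alpha: float, p: float) -> int:
    """Exact: smallest n with 1 - (1 - alpha)^n >= p."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - alpha)))

def n_approx(alpha: float, p: float) -> int:
    """Small-alpha approximation: log(1/(1-p)) / alpha."""
    return int(np.ceil(np.log(1 / (1 - p)) / alpha))

print(n_required(0.05, 0.95))  # matches the table's 59
print(n_approx(0.05, 0.95))    # approximation gives 60, slightly conservative
```

The approximation overshoots by a sample or two for small $\alpha$, which is harmless for budget planning.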
A practical heuristic: 60 random samples give ~95% probability of finding a top-5% configuration. This "rule of 60" is a useful starting point for budget planning. For top-1% configurations, you need ~300 samples.
The coverage formula requires knowing $\alpha$—the fraction of good configurations. In practice, we estimate this empirically or use reasonable assumptions.
Approach 1: Define Good Relative to Observed Best
Define "good" as within $\epsilon$ of the best score seen so far: $$\mathcal{G} = \{\mathbf{h} : f(\mathbf{h}) \geq f^* - \epsilon\}$$
Run a pilot study with $n_0$ samples, then estimate: $$\hat{\alpha} = \frac{\#\{i : f(\mathbf{h}^{(i)}) \geq f^* - \epsilon\}}{n_0}$$
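A quick sketch of this pilot estimate with made-up scores (the distribution, tolerance, and sample size here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
scores = rng.normal(0.80, 0.05, size=40)  # hypothetical pilot: 40 random configs

# "Good" = within 0.02 of the best pilot score.
best = scores.max()
good = scores >= best - 0.02

alpha_hat = good.mean()
se = np.sqrt(alpha_hat * (1 - alpha_hat) / len(scores))
print(f"alpha_hat = {alpha_hat:.3f} ± {se:.3f}")
```

Note that the best score always counts itself as good, so $\hat{\alpha} \geq 1/n_0$; a conservative plan subtracts a couple of standard errors before computing the budget.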
Approach 2: Assume Unimodal Objective
For many well-behaved objectives, the good region forms a ball around the optimum. If the objective is roughly quadratic near optima, the volume of the $\epsilon$-good region scales as: $$\alpha \propto \epsilon^{d/2}$$
where $d$ is the effective dimensionality.
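To see what this scaling implies for budgets, here is a toy calculation assuming $\alpha = \epsilon^{d/2}$ with $\epsilon = 0.1$ (the constant of proportionality is set to 1 purely for illustration):

```python
import numpy as np

eps, target_p = 0.1, 0.95
for d in [1, 2, 4, 6]:
    alpha = eps ** (d / 2)  # assumed scaling: alpha proportional to eps^(d/2)
    n = int(np.ceil(np.log(1 - target_p) / np.log(1 - alpha)))
    print(f"d={d}: alpha={alpha:.4f}, samples for 95% success = {n}")
```

Under this toy model, each additional pair of effective dimensions shrinks $\alpha$ by a factor of $\epsilon$, multiplying the required budget by roughly $1/\epsilon$.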
Approach 3: Use Pessimistic Default
When unsure, assume $\alpha$ is between $1\%$ and $5\%$ (good configurations make up 1-5% of the space). This is conservative for most problems and leads to reasonable budgets (60-300 samples).
```python
import numpy as np
from typing import Tuple
from dataclasses import dataclass


@dataclass
class CoverageBudget:
    """Budget calculation result for random search."""
    alpha: float  # Good region fraction
    target_probability: float
    required_samples: int
    actual_probability: float


def required_samples(alpha: float, target_prob: float) -> int:
    """
    Calculate samples needed to achieve target success probability.

    Parameters
    ----------
    alpha: Fraction of search space that is "good"
    target_prob: Desired probability of finding a good config

    Returns
    -------
    Minimum number of samples needed
    """
    if alpha <= 0 or alpha >= 1:
        raise ValueError("alpha must be in (0, 1)")
    if target_prob <= 0 or target_prob >= 1:
        raise ValueError("target_prob must be in (0, 1)")
    n = np.log(1 - target_prob) / np.log(1 - alpha)
    return int(np.ceil(n))


def success_probability(n: int, alpha: float) -> float:
    """
    Calculate probability of success with n samples.

    P(at least one good) = 1 - (1 - alpha)^n
    """
    return 1 - (1 - alpha) ** n


def estimate_alpha_from_pilot(
    scores: np.ndarray, epsilon: float = 0.01
) -> Tuple[float, float]:
    """
    Estimate alpha from pilot study scores.

    Parameters
    ----------
    scores: Array of scores from pilot random search
    epsilon: Tolerance from best (relative if < 1, else absolute)

    Returns
    -------
    (alpha_estimate, standard_error)
    """
    best = scores.max()
    threshold = best - epsilon * abs(best) if epsilon < 1 else best - epsilon
    n_good = np.sum(scores >= threshold)
    n_total = len(scores)
    alpha_hat = n_good / n_total
    se = np.sqrt(alpha_hat * (1 - alpha_hat) / n_total)
    return alpha_hat, se


def plan_random_search_budget(
    pilot_scores: np.ndarray = None,
    assumed_alpha: float = 0.05,
    target_prob: float = 0.95,
    epsilon: float = 0.01,
) -> CoverageBudget:
    """Plan random search budget based on coverage probability."""
    if pilot_scores is not None:
        alpha, se = estimate_alpha_from_pilot(pilot_scores, epsilon)
        # Use conservative estimate
        alpha = max(alpha - 2 * se, 0.001)
    else:
        alpha = assumed_alpha

    n_required = required_samples(alpha, target_prob)
    actual_prob = success_probability(n_required, alpha)

    return CoverageBudget(
        alpha=alpha,
        target_probability=target_prob,
        required_samples=n_required,
        actual_probability=actual_prob,
    )


# Example usage
def demonstrate_coverage_planning():
    """Show coverage-based budget planning."""
    print("=== Coverage Probability Budget Planning ===")

    # Scenario 1: Conservative (assume 5% good)
    budget1 = plan_random_search_budget(assumed_alpha=0.05, target_prob=0.95)
    print(f"Conservative (α=5%, p=95%): n={budget1.required_samples}")

    # Scenario 2: Optimistic (assume 10% good)
    budget2 = plan_random_search_budget(assumed_alpha=0.10, target_prob=0.95)
    print(f"Optimistic (α=10%, p=95%): n={budget2.required_samples}")

    # Scenario 3: High confidence (99%)
    budget3 = plan_random_search_budget(assumed_alpha=0.05, target_prob=0.99)
    print(f"High confidence (α=5%, p=99%): n={budget3.required_samples}")

    # Scenario 4: From pilot study
    np.random.seed(42)
    pilot_scores = np.random.randn(50) + np.random.rand(50)
    budget4 = plan_random_search_budget(pilot_scores=pilot_scores,
                                        target_prob=0.95)
    print(f"From pilot (estimated α={budget4.alpha:.2%}): "
          f"n={budget4.required_samples}")


demonstrate_coverage_planning()
```

Real hyperparameter optimization often involves multiple objectives: high accuracy, low inference time, small model size. Coverage probability extends to this setting.
Per-Objective Good Regions:
For $m$ objectives, define good regions $\mathcal{G}_1, \ldots, \mathcal{G}_m$ with individual probabilities $\alpha_1, \ldots, \alpha_m$.
Independent Objectives:
If objectives are independent (rare in practice), the joint good region has probability: $$\alpha_{\text{joint}} = \prod_{j=1}^{m} \alpha_j$$
For $m=3$ objectives with $\alpha_j = 0.1$ each, $\alpha_{\text{joint}} = 0.001$, requiring ~3,000 samples for 95% success.
Correlated Objectives:
In practice, objectives are often positively correlated (simple models are often both fast and less accurate). If correlation is $\rho > 0$: $$\alpha_{\text{joint}} > \prod_{j=1}^{m} \alpha_j$$
The practical implication: covering multiple objectives is harder than single-objective, but not as hard as assuming independence suggests.
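A small Monte Carlo sketch makes the correlation effect concrete. The equicorrelated-normal model and $\rho = 0.6$ below are illustrative assumptions, not a claim about real objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 200_000  # 3 objectives, 200k simulated configurations
rho = 0.6

# Correlated standard normals via a shared latent factor:
# z_j = sqrt(rho) * u + sqrt(1 - rho) * e_j gives corr(z_i, z_j) = rho.
u = rng.standard_normal(n)
z = np.sqrt(rho) * u[:, None] + np.sqrt(1 - rho) * rng.standard_normal((n, m))

# "Good" on each objective = top 10% (z above its 90th percentile).
thresholds = np.quantile(z, 0.9, axis=0)
good_each = z > thresholds
alpha_joint = np.mean(good_each.all(axis=1))

print(f"independence prediction: {0.1 ** m:.4f}")
print(f"simulated joint alpha (rho={rho}): {alpha_joint:.4f}")
```

With positive correlation the simulated joint $\alpha$ lands well above the independence prediction of $0.1^3 = 0.001$, though still far below the per-objective 10%.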
Multi-objective coverage drops quickly. For three objectives with modest individual targets (α = 10% each), even with positive correlation, expect to need 5-10× more samples than for single-objective optimization.
Rather than fixing the budget upfront, adaptive strategies adjust based on observed results.
Strategy 1: Early Termination
Set a minimum acceptable performance $\tau$. Stop when a configuration exceeds $\tau$: $$\text{Stop if } \max_{i \leq n} f(\mathbf{h}^{(i)}) \geq \tau$$
This is efficient when a "good enough" solution exists and is found early.
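If a random configuration clears the bar $\tau$ with probability $\alpha_\tau$, the stopping time under this rule is geometric with mean $1/\alpha_\tau$. A quick simulation, with an assumed $\alpha_\tau = 5\%$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_tau = 0.05   # assumed chance a random config clears the bar tau
trials = 10_000

# Samples until the first success: geometric with mean 1/alpha_tau = 20.
stops = rng.geometric(alpha_tau, size=trials)
print(f"mean stopping time: {stops.mean():.1f} (theory: {1 / alpha_tau:.0f})")
```

The heavy right tail of the geometric distribution is why early termination is usually paired with a hard maximum budget.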
Strategy 2: Diminishing Returns Criterion
Track the improvement rate. Stop when improvements plateau: $$\text{Stop if } \frac{f^*_n - f^*_{n-k}}{k} < \delta$$ where $f^*_n$ denotes the best score after $n$ samples,
for some window $k$ and threshold $\delta$.
Strategy 3: Confidence-Based Stopping
Estimate $\alpha$ online and stop when confidence in having found a good configuration is high: $$\text{Stop if } 1 - (1 - \hat{\alpha})^n > p_{\text{target}}$$
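A minimal sketch of this criterion (the function name and defaults are my own; note that $\hat{\alpha}$ estimated from the running scores is slightly biased upward, since the best score always counts itself as good):

```python
import numpy as np

def confidence_stop(scores, epsilon, target_prob=0.95, min_samples=20):
    """Stop once 1 - (1 - alpha_hat)^n exceeds target_prob, where alpha_hat
    is the running estimate of the good-region fraction (within epsilon of
    the best score seen so far)."""
    scores = np.asarray(scores)
    n = len(scores)
    if n < min_samples:
        return False  # too few samples for a meaningful alpha estimate
    threshold = scores.max() - epsilon
    alpha_hat = np.mean(scores >= threshold)
    return 1 - (1 - alpha_hat) ** n > target_prob
```

In practice this check would run after each evaluation, alongside a hard budget cap.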
```python
import numpy as np
from typing import Callable, Optional
from dataclasses import dataclass


@dataclass
class AdaptiveSearchResult:
    """Result from adaptive random search."""
    best_params: dict
    best_score: float
    n_iterations: int
    stopping_reason: str


def adaptive_random_search(
    sample_fn: Callable[[], dict],
    evaluate_fn: Callable[[dict], float],
    max_iter: int = 1000,
    min_iter: int = 20,
    target_score: Optional[float] = None,
    patience: int = 50,
    improvement_threshold: float = 0.001,
) -> AdaptiveSearchResult:
    """
    Random search with adaptive stopping.

    Stopping criteria (checked after min_iter):
    1. Target score reached
    2. No improvement of at least improvement_threshold in 'patience' iterations
    3. Max iterations reached
    """
    best_score = float('-inf')
    best_params = None
    iterations_since_improvement = 0

    for i in range(max_iter):
        # Sample and evaluate
        params = sample_fn()
        score = evaluate_fn(params)

        # Update best
        if score > best_score:
            improvement = score - best_score
            best_score = score
            best_params = params
            # Only meaningful improvements reset the patience counter
            if improvement >= improvement_threshold:
                iterations_since_improvement = 0
            else:
                iterations_since_improvement += 1
        else:
            iterations_since_improvement += 1

        # Check stopping criteria (after minimum iterations)
        if i >= min_iter:
            # Criterion 1: Target reached
            if target_score is not None and best_score >= target_score:
                return AdaptiveSearchResult(
                    best_params=best_params,
                    best_score=best_score,
                    n_iterations=i + 1,
                    stopping_reason=f"Target score {target_score} reached",
                )
            # Criterion 2: Patience exhausted
            if iterations_since_improvement >= patience:
                return AdaptiveSearchResult(
                    best_params=best_params,
                    best_score=best_score,
                    n_iterations=i + 1,
                    stopping_reason=f"No improvement for {patience} iterations",
                )

    return AdaptiveSearchResult(
        best_params=best_params,
        best_score=best_score,
        n_iterations=max_iter,
        stopping_reason="Max iterations reached",
    )
```

Random search results are stochastic—running again may find different optima. Quantifying this uncertainty is important for reproducibility.
Confidence Interval for Best Score:
Let $X_1, \ldots, X_n$ be scores from $n$ random configurations. The best observed score $X_{(n)} = \max_i X_i$ follows the order statistic distribution.
For the true $p$-th percentile of the score distribution, a $(1-\gamma)$ confidence interval (using $\gamma$ for the confidence level here, since $\alpha$ already denotes the good-region fraction) is: $$\left[X_{(\lceil n \cdot q_1 \rceil)}, X_{(\lfloor n \cdot q_2 \rfloor)}\right]$$
where $q_1, q_2$ are appropriate quantiles derived from the beta distribution of order statistics.
Practical Approach: Multiple Runs
Run random search $R$ times with different seeds. Report the mean, standard deviation, and range of the best score found in each run.
This gives an empirical distribution of what random search achieves.
For publications or production deployments, run random search 3-5 times with different seeds and report the mean ± std of the best score found per run. This gives readers and users a realistic picture of run-to-run variability.
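This reporting pattern can be sketched with a stand-in objective (the uniform scores below are purely illustrative; in practice each run would execute the full search):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_search_best(n_samples: int, rng) -> float:
    # Stand-in objective: score of a random config is Uniform(0, 1).
    return rng.random(n_samples).max()

R, n = 5, 100  # 5 repeated runs of 100 samples each
bests = [random_search_best(n, rng) for _ in range(R)]
print(f"best score over {R} runs: {np.mean(bests):.3f} ± {np.std(bests):.3f}")
```

The spread across runs is exactly the uncertainty that a single-run report hides.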
What's Next:
The next page explores the practical advantages of random search: simplicity, anytime behavior, robustness to search space specification, and debugging benefits.
You can now calculate exactly how many random samples you need to find good hyperparameters with any desired confidence level. Coverage probability transforms random search from guesswork into principled optimization.