A common objection to random search is its apparent unpredictability: "How do I know I've sampled enough?" Unlike grid search, which deterministically covers the space, random search is probabilistic. But this randomness is not unquantifiable—coverage probability theory provides precise answers.
Coverage probability tells us: given that good configurations occupy some fraction of the search space, how many random samples do we need to find one with high confidence? This transforms random search from a "hope and pray" strategy into one with rigorous guarantees.
This page develops the complete mathematical framework for coverage probability, enabling principled budget allocation for random search.
By the end of this page, you will derive coverage probability formulas, calculate required sample sizes for target success rates, understand the relationship between "good region" size and budget needs, and apply these formulas to real hyperparameter optimization scenarios.
Let us formalize the probability of finding good hyperparameter configurations.
Setup:
Define the hyperparameter space $\mathcal{H}$ with a probability measure (from our sampling distribution). Let $\mathcal{G} \subset \mathcal{H}$ be the "good" region—configurations with performance within acceptable bounds of optimal. Define:
$$\alpha = P(\mathbf{h} \in \mathcal{G}) = \int_{\mathcal{G}} p(\mathbf{h}) \, d\mathbf{h}$$
This is the probability that a single random sample lands in the good region.
Coverage Probability:
With $n$ independent random samples, the probability that at least one lands in $\mathcal{G}$ is:
$$P(\text{success}) = 1 - P(\text{all fail}) = 1 - (1-\alpha)^n$$
Inverse Formula (Required Samples):
To achieve success probability $p$, we need:
$$n = \left\lceil \frac{\log(1-p)}{\log(1-\alpha)} \right\rceil$$
For small $\alpha$, using $\log(1-\alpha) \approx -\alpha$:
$$n \approx \frac{-\log(1-p)}{\alpha} = \frac{\log\frac{1}{1-p}}{\alpha}$$
| Success Prob. | α = 10% | α = 5% | α = 1% | α = 0.1% |
|---|---|---|---|---|
| 50% | 7 | 14 | 69 | 693 |
| 75% | 14 | 28 | 138 | 1,386 |
| 90% | 22 | 45 | 230 | 2,302 |
| 95% | 29 | 59 | 299 | 2,995 |
| 99% | 44 | 90 | 459 | 4,603 |
| 99.9% | 66 | 135 | 688 | 6,905 |
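The table entries can be reproduced directly from the exact formula, and compared against the small-$\alpha$ approximation; a minimal sketch (function names are my own):

```python
import numpy as np

def n_required(alpha: float, p: float) -> int:
    """Exact: smallest n with 1 - (1 - alpha)^n >= p."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - alpha)))

def n_approx(alpha: float, p: float) -> int:
    """Small-alpha approximation: log(1/(1-p)) / alpha."""
    return int(np.ceil(np.log(1 / (1 - p)) / alpha))

print(n_required(0.05, 0.95))  # matches the table's 59
print(n_approx(0.05, 0.95))    # approximation gives 60, slightly conservative
```

The approximation overshoots by a sample or two for small $\alpha$, which is harmless for budget planning.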
A practical heuristic: 60 random samples give ~95% probability of finding a top-5% configuration. This "rule of 60" is a useful starting point for budget planning. For top-1% configurations, you need ~300 samples.
The coverage formula requires knowing $\alpha$—the fraction of good configurations. In practice, we estimate this empirically or use reasonable assumptions.
Approach 1: Define Good Relative to Observed Best
Define "good" as within $\epsilon$ of the best score seen so far: $$\mathcal{G} = \{\mathbf{h} : f(\mathbf{h}) \geq f^* - \epsilon\}$$
Run a pilot study with $n_0$ samples, then estimate: $$\hat{\alpha} = \frac{\#\{i : f(\mathbf{h}^{(i)}) \geq f^* - \epsilon\}}{n_0}$$
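A quick sketch of this pilot estimate with made-up scores (the distribution, tolerance, and sample size here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
scores = rng.normal(0.80, 0.05, size=40)  # hypothetical pilot: 40 random configs

# "Good" = within 0.02 of the best pilot score.
best = scores.max()
good = scores >= best - 0.02

alpha_hat = good.mean()
se = np.sqrt(alpha_hat * (1 - alpha_hat) / len(scores))
print(f"alpha_hat = {alpha_hat:.3f} ± {se:.3f}")
```

Note that the best score always counts itself as good, so $\hat{\alpha} \geq 1/n_0$; a conservative plan subtracts a couple of standard errors before computing the budget.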
Approach 2: Assume Unimodal Objective
For many well-behaved objectives, the good region forms a ball around the optimum. If the objective is roughly quadratic near optima, the volume of the $\epsilon$-good region scales as: $$\alpha \propto \epsilon^{d/2}$$
where $d$ is the effective dimensionality.
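To see what this scaling implies for budgets, here is a toy calculation assuming $\alpha = \epsilon^{d/2}$ with $\epsilon = 0.1$ (the constant of proportionality is set to 1 purely for illustration):

```python
import numpy as np

eps, target_p = 0.1, 0.95
for d in [1, 2, 4, 6]:
    alpha = eps ** (d / 2)  # assumed scaling: alpha proportional to eps^(d/2)
    n = int(np.ceil(np.log(1 - target_p) / np.log(1 - alpha)))
    print(f"d={d}: alpha={alpha:.4f}, samples for 95% success = {n}")
```

Under this toy model, each additional pair of effective dimensions shrinks $\alpha$ by a factor of $\epsilon$, multiplying the required budget by roughly $1/\epsilon$.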
Approach 3: Use Pessimistic Default
When unsure, assume $\alpha$ is between $1\%$ and $5\%$ (good configurations make up 1-5% of the space). This is conservative for most problems and leads to reasonable budgets (60-300 samples).
```python
import numpy as np
from typing import Tuple
from dataclasses import dataclass


@dataclass
class CoverageBudget:
    """Budget calculation result for random search."""
    alpha: float  # Good region fraction
    target_probability: float
    required_samples: int
    actual_probability: float


def required_samples(alpha: float, target_prob: float) -> int:
    """
    Calculate samples needed to achieve target success probability.

    Parameters
    ----------
    alpha: Fraction of search space that is "good"
    target_prob: Desired probability of finding a good config

    Returns
    -------
    Minimum number of samples needed
    """
    if alpha <= 0 or alpha >= 1:
        raise ValueError("alpha must be in (0, 1)")
    if target_prob <= 0 or target_prob >= 1:
        raise ValueError("target_prob must be in (0, 1)")
    n = np.log(1 - target_prob) / np.log(1 - alpha)
    return int(np.ceil(n))


def success_probability(n: int, alpha: float) -> float:
    """
    Calculate probability of success with n samples.

    P(at least one good) = 1 - (1 - alpha)^n
    """
    return 1 - (1 - alpha) ** n


def estimate_alpha_from_pilot(
    scores: np.ndarray, epsilon: float = 0.01
) -> Tuple[float, float]:
    """
    Estimate alpha from pilot study scores.

    Parameters
    ----------
    scores: Array of scores from pilot random search
    epsilon: Tolerance from best (relative if < 1, else absolute)

    Returns
    -------
    (alpha_estimate, standard_error)
    """
    best = scores.max()
    threshold = best - epsilon * abs(best) if epsilon < 1 else best - epsilon
    n_good = np.sum(scores >= threshold)
    n_total = len(scores)
    alpha_hat = n_good / n_total
    se = np.sqrt(alpha_hat * (1 - alpha_hat) / n_total)
    return alpha_hat, se


def plan_random_search_budget(
    pilot_scores: np.ndarray = None,
    assumed_alpha: float = 0.05,
    target_prob: float = 0.95,
    epsilon: float = 0.01,
) -> CoverageBudget:
    """Plan random search budget based on coverage probability."""
    if pilot_scores is not None:
        alpha, se = estimate_alpha_from_pilot(pilot_scores, epsilon)
        # Use conservative estimate
        alpha = max(alpha - 2 * se, 0.001)
    else:
        alpha = assumed_alpha

    n_required = required_samples(alpha, target_prob)
    actual_prob = success_probability(n_required, alpha)

    return CoverageBudget(
        alpha=alpha,
        target_probability=target_prob,
        required_samples=n_required,
        actual_probability=actual_prob,
    )


# Example usage
def demonstrate_coverage_planning():
    """Show coverage-based budget planning."""
    print("=== Coverage Probability Budget Planning ===")

    # Scenario 1: Conservative (assume 5% good)
    budget1 = plan_random_search_budget(assumed_alpha=0.05, target_prob=0.95)
    print(f"Conservative (α=5%, p=95%): n={budget1.required_samples}")

    # Scenario 2: Optimistic (assume 10% good)
    budget2 = plan_random_search_budget(assumed_alpha=0.10, target_prob=0.95)
    print(f"Optimistic (α=10%, p=95%): n={budget2.required_samples}")

    # Scenario 3: High confidence (99%)
    budget3 = plan_random_search_budget(assumed_alpha=0.05, target_prob=0.99)
    print(f"High confidence (α=5%, p=99%): n={budget3.required_samples}")

    # Scenario 4: From pilot study
    np.random.seed(42)
    pilot_scores = np.random.randn(50) + np.random.rand(50)
    budget4 = plan_random_search_budget(pilot_scores=pilot_scores,
                                        target_prob=0.95)
    print(f"From pilot (estimated α={budget4.alpha:.2%}): "
          f"n={budget4.required_samples}")


demonstrate_coverage_planning()
```

Real hyperparameter optimization often involves multiple objectives: high accuracy, low inference time, small model size. Coverage probability extends to this setting.
Per-Objective Good Regions:
For $m$ objectives, define good regions $\mathcal{G}_1, \ldots, \mathcal{G}_m$ with individual probabilities $\alpha_1, \ldots, \alpha_m$.
Independent Objectives:
If objectives are independent (rare in practice), the joint good region has probability: $$\alpha_{\text{joint}} = \prod_{j=1}^{m} \alpha_j$$
For $m=3$ objectives with $\alpha_j = 0.1$ each, $\alpha_{\text{joint}} = 0.001$, requiring ~3,000 samples for 95% success.
Correlated Objectives:
In practice, objectives are often positively correlated (simple models are often both fast and less accurate). If correlation is $\rho > 0$: $$\alpha_{\text{joint}} > \prod_{j=1}^{m} \alpha_j$$
The practical implication: covering multiple objectives is harder than single-objective, but not as hard as assuming independence suggests.
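A small Monte Carlo sketch makes the correlation effect concrete. The equicorrelated-normal model and $\rho = 0.6$ below are illustrative assumptions, not a claim about real objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 200_000  # 3 objectives, 200k simulated configurations
rho = 0.6

# Correlated standard normals via a shared latent factor:
# z_j = sqrt(rho) * u + sqrt(1 - rho) * e_j gives corr(z_i, z_j) = rho.
u = rng.standard_normal(n)
z = np.sqrt(rho) * u[:, None] + np.sqrt(1 - rho) * rng.standard_normal((n, m))

# "Good" on each objective = top 10% (z above its 90th percentile).
thresholds = np.quantile(z, 0.9, axis=0)
good_each = z > thresholds
alpha_joint = np.mean(good_each.all(axis=1))

print(f"independence prediction: {0.1 ** m:.4f}")
print(f"simulated joint alpha (rho={rho}): {alpha_joint:.4f}")
```

With positive correlation the simulated joint $\alpha$ lands well above the independence prediction of $0.1^3 = 0.001$, though still far below the per-objective 10%.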
Multi-objective coverage drops quickly. For three objectives with modest individual targets (α = 10% each), even with positive correlation, expect to need 5-10× more samples than for single-objective optimization.
Rather than fixing the budget upfront, adaptive strategies adjust based on observed results.
Strategy 1: Early Termination
Set a minimum acceptable performance $\tau$. Stop when a configuration exceeds $\tau$: $$\text{Stop if } \max_{i \leq n} f(\mathbf{h}^{(i)}) \geq \tau$$
This is efficient when a "good enough" solution exists and is found early.
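If a random configuration clears the bar $\tau$ with probability $\alpha_\tau$, the stopping time under this rule is geometric with mean $1/\alpha_\tau$. A quick simulation, with an assumed $\alpha_\tau = 5\%$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_tau = 0.05   # assumed chance a random config clears the bar tau
trials = 10_000

# Samples until the first success: geometric with mean 1/alpha_tau = 20.
stops = rng.geometric(alpha_tau, size=trials)
print(f"mean stopping time: {stops.mean():.1f} (theory: {1 / alpha_tau:.0f})")
```

The heavy right tail of the geometric distribution is why early termination is usually paired with a hard maximum budget.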
Strategy 2: Diminishing Returns Criterion
Track the improvement rate. Stop when improvements plateau: $$\text{Stop if } \frac{f^*_n - f^*_{n-k}}{k} < \delta$$ where $f^*_n$ denotes the best score after $n$ samples,
for some window $k$ and threshold $\delta$.
Strategy 3: Confidence-Based Stopping
Estimate $\alpha$ online and stop when confidence in having found a good configuration is high: $$\text{Stop if } 1 - (1 - \hat{\alpha})^n > p_{\text{target}}$$
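A minimal sketch of this criterion (the function name and defaults are my own; note that $\hat{\alpha}$ estimated from the running scores is slightly biased upward, since the best score always counts itself as good):

```python
import numpy as np

def confidence_stop(scores, epsilon, target_prob=0.95, min_samples=20):
    """Stop once 1 - (1 - alpha_hat)^n exceeds target_prob, where alpha_hat
    is the running estimate of the good-region fraction (within epsilon of
    the best score seen so far)."""
    scores = np.asarray(scores)
    n = len(scores)
    if n < min_samples:
        return False  # too few samples for a meaningful alpha estimate
    threshold = scores.max() - epsilon
    alpha_hat = np.mean(scores >= threshold)
    return 1 - (1 - alpha_hat) ** n > target_prob
```

In practice this check would run after each evaluation, alongside a hard budget cap.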
```python
import numpy as np
from typing import Callable, Optional
from dataclasses import dataclass


@dataclass
class AdaptiveSearchResult:
    """Result from adaptive random search."""
    best_params: dict
    best_score: float
    n_iterations: int
    stopping_reason: str


def adaptive_random_search(
    sample_fn: Callable[[], dict],
    evaluate_fn: Callable[[dict], float],
    max_iter: int = 1000,
    min_iter: int = 20,
    target_score: Optional[float] = None,
    patience: int = 50,
    improvement_threshold: float = 0.001,
) -> AdaptiveSearchResult:
    """
    Random search with adaptive stopping.

    Stopping criteria (checked after min_iter):
    1. Target score reached
    2. No improvement of at least improvement_threshold in 'patience' iterations
    3. Max iterations reached
    """
    best_score = float('-inf')
    best_params = None
    iterations_since_improvement = 0

    for i in range(max_iter):
        # Sample and evaluate
        params = sample_fn()
        score = evaluate_fn(params)

        # Update best
        if score > best_score:
            improvement = score - best_score
            best_score = score
            best_params = params
            # Only meaningful improvements reset the patience counter
            if improvement >= improvement_threshold:
                iterations_since_improvement = 0
            else:
                iterations_since_improvement += 1
        else:
            iterations_since_improvement += 1

        # Check stopping criteria (after minimum iterations)
        if i >= min_iter:
            # Criterion 1: Target reached
            if target_score is not None and best_score >= target_score:
                return AdaptiveSearchResult(
                    best_params=best_params,
                    best_score=best_score,
                    n_iterations=i + 1,
                    stopping_reason=f"Target score {target_score} reached",
                )
            # Criterion 2: Patience exhausted
            if iterations_since_improvement >= patience:
                return AdaptiveSearchResult(
                    best_params=best_params,
                    best_score=best_score,
                    n_iterations=i + 1,
                    stopping_reason=f"No improvement for {patience} iterations",
                )

    return AdaptiveSearchResult(
        best_params=best_params,
        best_score=best_score,
        n_iterations=max_iter,
        stopping_reason="Max iterations reached",
    )
```

Random search results are stochastic—running again may find different optima. Quantifying this uncertainty is important for reproducibility.
Confidence Interval for Best Score:
Let $X_1, \ldots, X_n$ be scores from $n$ random configurations. The best observed score $X_{(n)} = \max_i X_i$ follows the order statistic distribution.
For the true $p$-th percentile of the score distribution, a $(1-\gamma)$ confidence interval (using $\gamma$ for the confidence level here, since $\alpha$ already denotes the good-region fraction) is: $$\left[X_{(\lceil n \cdot q_1 \rceil)}, X_{(\lfloor n \cdot q_2 \rfloor)}\right]$$
where $q_1, q_2$ are appropriate quantiles derived from the beta distribution of order statistics.
Practical Approach: Multiple Runs
Run random search $R$ times with different seeds. Report the mean, standard deviation, and range of the best score found in each run.
This gives an empirical distribution of what random search achieves.
For publications or production deployments, run random search 3-5 times with different seeds and report the mean ± std of the best score found per run. This gives readers and users a realistic picture of run-to-run variability.
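This reporting pattern can be sketched with a stand-in objective (the uniform scores below are purely illustrative; in practice each run would execute the full search):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_search_best(n_samples: int, rng) -> float:
    # Stand-in objective: score of a random config is Uniform(0, 1).
    return rng.random(n_samples).max()

R, n = 5, 100  # 5 repeated runs of 100 samples each
bests = [random_search_best(n, rng) for _ in range(R)]
print(f"best score over {R} runs: {np.mean(bests):.3f} ± {np.std(bests):.3f}")
```

The spread across runs is exactly the uncertainty that a single-run report hides.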
What's Next:
The next page explores the practical advantages of random search: simplicity, anytime behavior, robustness to search space specification, and debugging benefits.
You can now calculate exactly how many random samples you need to find good hyperparameters with any desired confidence level. Coverage probability transforms random search from guesswork into principled optimization.