Accuracy is necessary but rarely sufficient for production ML systems. Real-world deployments face a constellation of constraints that determine whether a model can actually be used: latency budgets that dictate response times, memory limits that constrain deployment environments, model size restrictions for edge or mobile deployment, fairness requirements mandated by regulation or ethics, and custom business constraints unique to each application.
AutoML systems that optimize purely for predictive performance often produce models that cannot be deployed. A 99.5% accurate model that requires 500ms inference latency is useless for a 50ms real-time requirement. A highly accurate ensemble that consumes 16GB of RAM cannot run on a 4GB edge device. Understanding constraint handling transforms AutoML from an academic exercise into a production-ready tool.
By the end of this page, you will understand how to incorporate latency, memory, size, fairness, and custom constraints into AutoML pipelines. You'll learn constraint formulation strategies, multi-objective optimization approaches, and practical techniques for balancing accuracy against operational requirements.
Production ML systems face diverse constraints that can be categorized along multiple dimensions. Understanding this taxonomy is essential for systematic constraint handling.
| Category | Constraint Type | Example | Typical Origin |
|---|---|---|---|
| Performance | Inference Latency | < 10ms p99 latency | Real-time systems, user experience |
| Performance | Throughput | 10,000 predictions/second | High-volume batch processing |
| Resource | Memory (RAM) | < 4GB peak memory | Edge devices, containerized deployments |
| Resource | Model Size (Disk) | < 100MB serialized model | Mobile apps, bandwidth-limited deployment |
| Resource | Compute (CPU/GPU) | CPU-only inference, no GPU | Cost constraints, deployment environment |
| Fairness | Demographic Parity | Equal positive prediction rates across groups | Regulatory compliance, ethical AI |
| Fairness | Equalized Odds | Equal TPR and FPR across protected groups | Legal requirements, fairness audits |
| Regulatory | Model Complexity | Linear models only, decision trees with depth < 5 | Interpretability mandates (GDPR, credit scoring) |
| Business | Feature Restrictions | Cannot use certain features | Privacy, data availability, legal restrictions |
| Business | Update Frequency | Model must retrain within 1 hour | Data freshness requirements |
Hard vs. Soft Constraints:
Constraints differ in their strictness:
Hard Constraints — Must be satisfied; violating models are rejected regardless of accuracy.
Soft Constraints — Preferred but negotiable; violations incur penalties rather than rejection.
AutoML systems must handle both types differently: hard constraints filter the search space, while soft constraints are incorporated into the optimization objective.
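As a minimal sketch of how the two types enter a model-selection loop (the candidate dictionaries and constraint functions below are hypothetical, not part of any specific framework), hard constraints act as filters while soft constraints become score penalties:

```python
from typing import Callable, Dict, List

def apply_constraints(
    candidates: List[Dict[str, float]],
    hard: Dict[str, Callable[[Dict[str, float]], bool]],
    soft_penalties: Dict[str, Callable[[Dict[str, float]], float]],
) -> List[Dict[str, float]]:
    """Filter on hard constraints, then penalize soft-constraint violations."""
    scored = []
    for cand in candidates:
        # Hard constraints: reject outright on any violation
        if not all(check(cand) for check in hard.values()):
            continue
        # Soft constraints: subtract penalties from the objective
        penalty = sum(p(cand) for p in soft_penalties.values())
        scored.append({**cand, 'adjusted_score': cand['accuracy'] - penalty})
    return sorted(scored, key=lambda c: c['adjusted_score'], reverse=True)

# Hypothetical candidates: latency is a hard limit, model size is a preference
candidates = [
    {'name': 'gbm', 'accuracy': 0.93, 'latency_ms': 8, 'size_mb': 120},
    {'name': 'ensemble', 'accuracy': 0.95, 'latency_ms': 60, 'size_mb': 900},
    {'name': 'linear', 'accuracy': 0.90, 'latency_ms': 1, 'size_mb': 2},
]
ranked = apply_constraints(
    candidates,
    hard={'latency': lambda c: c['latency_ms'] <= 10},
    soft_penalties={'size': lambda c: 0.01 * (c['size_mb'] / 100)},
)
print([c['name'] for c in ranked])  # 'ensemble' is filtered out; the rest are ranked by adjusted score
```

In practice the hard-constraint checks would wrap real latency or memory measurements like those shown later on this page.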
Many AutoML failures occur because constraints were not identified before the search began. Invest time upfront to document all hard constraints (dealbreakers) and soft constraints (preferences) with stakeholders from engineering, product, legal, and ethics teams. Discovering constraints after a week-long AutoML run is extremely costly.
Inference latency is often the most critical constraint for user-facing ML systems. A model that takes too long to respond is functionally useless, regardless of accuracy. AutoML must select and configure models that meet latency requirements while maximizing performance.
Latency Factors in ML Models:
Inference latency depends on multiple factors that AutoML must consider:
| Factor | Impact | AutoML Control |
|---|---|---|
| Model Type | 100-1000x variation (linear vs. deep ensemble) | Algorithm selection |
| Model Size | Proportional (more parameters = slower) | Hyperparameter constraints |
| Feature Count | Linear to quadratic relationship | Feature selection |
| Data Preprocessing | Can dominate for complex transformations | Pipeline optimization |
| Batch Size | Amortization benefits for batched inference | Deployment configuration |
| Hardware | 10-100x variation (CPU vs. GPU) | Deployment targeting |
```python
import time
import numpy as np
from sklearn.base import BaseEstimator
from typing import List, Callable, Tuple


class LatencyConstrainedAutoML:
    """
    AutoML with latency constraints.

    Filters candidate models based on inference latency requirements
    before evaluating accuracy, ensuring all finalists meet production needs.
    """

    def __init__(
        self,
        max_latency_ms: float,
        target_percentile: float = 99.0,
        warmup_iterations: int = 5,
        benchmark_iterations: int = 100,
    ):
        """
        Args:
            max_latency_ms: Maximum acceptable inference latency in milliseconds
            target_percentile: Percentile to measure (default p99)
            warmup_iterations: Iterations for JIT warmup before benchmarking
            benchmark_iterations: Number of iterations for latency measurement
        """
        self.max_latency_ms = max_latency_ms
        self.target_percentile = target_percentile
        self.warmup_iterations = warmup_iterations
        self.benchmark_iterations = benchmark_iterations
        self.latency_results = {}

    def measure_latency(
        self,
        model: BaseEstimator,
        X_sample: np.ndarray,
    ) -> Tuple[float, float, float]:
        """
        Measure inference latency for a trained model.

        Returns:
            Tuple of (median_ms, p95_ms, p99_ms)
        """
        # Warmup (allows JIT compilation, cache warming)
        for _ in range(self.warmup_iterations):
            _ = model.predict(X_sample)

        # Benchmark
        latencies = []
        for _ in range(self.benchmark_iterations):
            start = time.perf_counter()
            _ = model.predict(X_sample)
            end = time.perf_counter()
            latencies.append((end - start) * 1000)  # Convert to ms

        latencies = np.array(latencies)
        return (
            np.median(latencies),
            np.percentile(latencies, 95),
            np.percentile(latencies, 99),
        )

    def check_latency_constraint(
        self,
        model: BaseEstimator,
        X_sample: np.ndarray,
        model_name: str = "",
    ) -> bool:
        """
        Check if model meets latency constraint.

        Returns True if model passes, False otherwise.
        """
        median, p95, p99 = self.measure_latency(model, X_sample)

        # Select target percentile
        if self.target_percentile == 99:
            target_latency = p99
        elif self.target_percentile == 95:
            target_latency = p95
        else:
            target_latency = median

        passes = target_latency <= self.max_latency_ms

        # Store results
        self.latency_results[model_name] = {
            'median_ms': round(median, 3),
            'p95_ms': round(p95, 3),
            'p99_ms': round(p99, 3),
            'target_latency_ms': round(target_latency, 3),
            'constraint_ms': self.max_latency_ms,
            'passes': passes,
        }

        return passes

    def filter_by_latency(
        self,
        models: List[Tuple[str, BaseEstimator]],
        X_sample: np.ndarray,
    ) -> List[Tuple[str, BaseEstimator]]:
        """
        Filter models that violate latency constraint.

        Args:
            models: List of (name, trained_model) tuples
            X_sample: Sample input for latency measurement

        Returns:
            List of models that pass latency constraint
        """
        passing_models = []

        print(f"Latency constraint: {self.max_latency_ms}ms (p{self.target_percentile:.0f})")

        for name, model in models:
            passes = self.check_latency_constraint(model, X_sample, name)
            result = self.latency_results[name]

            status = "✓ PASS" if passes else "✗ FAIL"
            print(f"  {name}: {result['target_latency_ms']:.2f}ms {status}")

            if passes:
                passing_models.append((name, model))

        print(f"\n{len(passing_models)}/{len(models)} models pass latency constraint")
        return passing_models


# Example: Algorithm-specific latency estimates
ALGORITHM_LATENCY_PROFILES = {
    # Per-sample inference time estimates (microseconds) for 100 features
    'LogisticRegression': {'base': 5, 'per_feature': 0.02},
    'DecisionTree': {'base': 2, 'per_depth': 0.5},
    'RandomForest': {'base': 10, 'per_tree': 2},
    'XGBoost': {'base': 5, 'per_tree': 0.5},
    'LightGBM': {'base': 3, 'per_tree': 0.3},
    'MLP_small': {'base': 50, 'per_layer': 10},
    'MLP_large': {'base': 200, 'per_layer': 50},
    'Ensemble_stacked': {'base': 100, 'per_model': 30},
}


def estimate_latency(
    algorithm: str,
    num_features: int = 100,
    num_trees: int = 100,
    num_layers: int = 3,
    depth: int = 6,
) -> float:
    """
    Estimate inference latency in milliseconds.

    These are rough estimates for CPU inference; actual latency depends
    on hardware, implementation, and data characteristics.
    """
    profile = ALGORITHM_LATENCY_PROFILES.get(algorithm, {'base': 100})

    latency_us = profile['base']

    if 'per_feature' in profile:
        latency_us += profile['per_feature'] * num_features
    if 'per_tree' in profile:
        latency_us += profile['per_tree'] * num_trees
    if 'per_depth' in profile:
        latency_us += profile['per_depth'] * depth
    if 'per_layer' in profile:
        latency_us += profile['per_layer'] * num_layers
    if 'per_model' in profile:
        latency_us += profile['per_model'] * 5  # Assume 5 models in ensemble

    return latency_us / 1000  # Convert to milliseconds


# Usage
for algo in ALGORITHM_LATENCY_PROFILES:
    latency = estimate_latency(algo)
    print(f"{algo}: ~{latency:.2f}ms per prediction")
```

The most efficient approach to latency constraints is search space pruning: exclude algorithm families that cannot possibly meet requirements. If you need < 5ms latency, exclude neural networks and large ensembles from the search space entirely rather than training and then rejecting them.
Memory constraints limit the RAM available during inference, while size constraints limit the serialized model size on disk or during transmission. These constraints are critical for edge deployment, mobile applications, containerized microservices, and bandwidth-limited environments.
Memory Contributors in ML Models:
| Component | Typical Size | Optimization Approaches |
|---|---|---|
| Model Parameters | 4 bytes per float32 weight | Quantization, pruning |
| Feature Preprocessing | Vocabulary dicts, scalers | Sparse representations |
| Tree Structures | ~100 bytes per tree node | Depth limits, tree count |
| Ensemble Members | Sum of individual models | Ensemble pruning, distillation |
| Runtime Overhead | Framework, buffers | Framework selection |
| Input/Output Buffers | Batch size × features | Batch size limits |
```python
import sys
import pickle
import tempfile
from typing import Any, Dict, List, Tuple
import numpy as np


def get_model_memory_mb(model: Any) -> Dict[str, Any]:
    """
    Estimate model memory usage in MB.

    Uses multiple methods: serialized size, sys.getsizeof (limited),
    and model-specific estimators.
    """
    # Method 1: Pickle size (good proxy for many models)
    with tempfile.NamedTemporaryFile() as f:
        pickle.dump(model, f)
        f.flush()
        pickle_size_mb = f.tell() / (1024 * 1024)

    # Method 2: Try to get more accurate estimates for known model types
    model_type = type(model).__name__

    if hasattr(model, 'get_params'):
        params = model.get_params()
    else:
        params = {}

    # Model-specific size estimation
    extra_runtime_mb = 0

    if 'RandomForest' in model_type:
        # Trees expand in memory due to runtime structures
        n_estimators = getattr(model, 'n_estimators', 100)
        extra_runtime_mb = n_estimators * 0.1  # ~100KB per tree runtime overhead
    elif 'XGB' in model_type or 'LGBM' in model_type or 'LightGBM' in model_type:
        # Gradient boosting models are relatively efficient
        extra_runtime_mb = pickle_size_mb * 0.2
    elif 'Keras' in model_type or 'Sequential' in model_type or 'torch' in str(type(model)):
        # Neural networks: weights + optimizer state + buffers
        extra_runtime_mb = pickle_size_mb * 1.5  # Conservative estimate

    estimated_runtime_mb = pickle_size_mb + extra_runtime_mb

    return {
        'pickle_size_mb': round(pickle_size_mb, 2),
        'estimated_runtime_mb': round(estimated_runtime_mb, 2),
        'model_type': model_type,
    }


class MemoryConstrainedModelSelector:
    """
    Select models that fit within memory constraints.

    Balances accuracy against memory requirements, supporting both
    hard limits and soft preferences.
    """

    def __init__(
        self,
        max_memory_mb: float,
        prefer_smaller: bool = True,
        memory_penalty_weight: float = 0.1,
    ):
        """
        Args:
            max_memory_mb: Hard memory limit in MB
            prefer_smaller: If True, add penalty for memory usage
            memory_penalty_weight: Weight for memory penalty (0-1)
        """
        self.max_memory_mb = max_memory_mb
        self.prefer_smaller = prefer_smaller
        self.memory_penalty_weight = memory_penalty_weight

    def score_with_memory_penalty(
        self,
        accuracy_score: float,
        memory_mb: float,
    ) -> float:
        """
        Compute penalized score accounting for memory usage.

        penalty = memory_penalty_weight * (memory_mb / max_memory_mb)
        adjusted_score = accuracy_score * (1 - penalty)
        """
        if memory_mb > self.max_memory_mb:
            return -1.0  # Invalid: exceeds hard constraint

        if not self.prefer_smaller:
            return accuracy_score

        # Soft penalty proportional to memory usage
        memory_fraction = memory_mb / self.max_memory_mb
        penalty = self.memory_penalty_weight * memory_fraction

        return accuracy_score * (1 - penalty)

    def select_best(
        self,
        candidates: List[Dict],
    ) -> Dict:
        """
        Select best model from candidates.

        Args:
            candidates: List of dicts with 'model', 'accuracy', 'memory_mb' keys

        Returns:
            Best candidate that meets constraints
        """
        valid_candidates = []

        for candidate in candidates:
            memory_mb = candidate['memory_mb']
            accuracy = candidate['accuracy']

            if memory_mb > self.max_memory_mb:
                print(f"  Rejecting {candidate.get('name', 'model')}: "
                      f"{memory_mb:.1f}MB > {self.max_memory_mb}MB limit")
                continue

            penalized_score = self.score_with_memory_penalty(accuracy, memory_mb)
            valid_candidates.append({
                **candidate,
                'penalized_score': penalized_score,
            })

        if not valid_candidates:
            raise ValueError(f"No models fit within {self.max_memory_mb}MB constraint")

        # Sort by penalized score
        valid_candidates.sort(key=lambda x: x['penalized_score'], reverse=True)

        best = valid_candidates[0]
        print(f"\nSelected: {best.get('name', 'model')}")
        print(f"  Accuracy: {best['accuracy']:.4f}")
        print(f"  Memory: {best['memory_mb']:.1f}MB")
        print(f"  Penalized Score: {best['penalized_score']:.4f}")

        return best


# Model compression techniques for memory reduction
def quantize_model_weights(model, precision='int8'):
    """
    Quantize model weights to reduce memory footprint.

    Note: This is a simplified example. Real quantization requires
    framework-specific implementations (TensorFlow Lite, ONNX, etc.)
    """
    quantization_savings = {
        'float32': 1.0,    # Baseline
        'float16': 0.5,    # 50% reduction
        'int8': 0.25,      # 75% reduction
        'int4': 0.125,     # 87.5% reduction (accuracy tradeoff)
    }

    return {
        'original_precision': 'float32',
        'target_precision': precision,
        'expected_size_reduction': 1 - quantization_savings[precision],
    }


def prune_ensemble(ensemble_models, target_count=5, X_val=None, y_val=None):
    """
    Prune ensemble to top-N models by validation performance.

    Reduces memory by removing redundant or weak ensemble members.
    """
    if len(ensemble_models) <= target_count:
        return ensemble_models

    # Score each model
    scores = []
    for i, model in enumerate(ensemble_models):
        if X_val is not None and y_val is not None:
            from sklearn.metrics import accuracy_score
            pred = model.predict(X_val)
            score = accuracy_score(y_val, pred)
        else:
            score = 0  # No validation data; keep original order
        scores.append((score, i, model))

    # Keep top performers
    scores.sort(key=lambda t: (t[0], t[1]), reverse=True)
    pruned = [model for _, _, model in scores[:target_count]]

    print(f"Pruned ensemble: {len(ensemble_models)} → {len(pruned)} models")
    return pruned
```

When the best AutoML model exceeds memory constraints, knowledge distillation offers a solution: train a smaller 'student' model to mimic the larger 'teacher' model's predictions. This often achieves 90-95% of the teacher's accuracy at a fraction of the memory cost.
Fairness constraints ensure ML models do not discriminate against protected groups. These constraints are increasingly mandated by regulations (GDPR, Equal Credit Opportunity Act) and organizational ethics policies. AutoML must incorporate fairness as a first-class objective.
It's mathematically proven that certain fairness metrics cannot be simultaneously satisfied except in special cases (Chouldechova, 2017; Kleinberg et al., 2016). For example, you cannot achieve both calibration and equalized odds when base rates differ between groups. Choose fairness metrics carefully based on the application context and legal requirements.
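To see why such results hold, it helps to write down one identity that follows directly from the definitions of the rates involved (a sketch in the spirit of Chouldechova, 2017, not a quotation from it). For a group with base rate $p$, any binary classifier satisfies

$$
\mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \left(1 - \mathrm{FNR}\right).
$$

If the model is equally calibrated across groups (equal PPV) but the base rates $p$ differ, this identity means FPR and FNR cannot both be equal across the groups, so equalized odds cannot hold at the same time.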
```python
import numpy as np
from typing import Dict, List, Callable
from sklearn.base import BaseEstimator


class FairnessMetrics:
    """
    Compute fairness metrics for binary classification.
    """

    @staticmethod
    def demographic_parity_difference(
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """
        Compute demographic parity difference.

        Returns:
            |P(Ŷ=1|A=0) - P(Ŷ=1|A=1)|
            Lower is fairer (0 = perfect parity).
        """
        group_0_mask = protected_attribute == 0
        group_1_mask = protected_attribute == 1

        rate_0 = y_pred[group_0_mask].mean()
        rate_1 = y_pred[group_1_mask].mean()

        return abs(rate_0 - rate_1)

    @staticmethod
    def equalized_odds_difference(
        y_true: np.ndarray,
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """
        Compute equalized odds difference.

        Returns:
            max(|TPR_0 - TPR_1|, |FPR_0 - FPR_1|)
            Lower is fairer (0 = perfect equalized odds).
        """
        group_0_mask = protected_attribute == 0
        group_1_mask = protected_attribute == 1

        # True Positive Rates
        tpr_0 = y_pred[(group_0_mask) & (y_true == 1)].mean()
        tpr_1 = y_pred[(group_1_mask) & (y_true == 1)].mean()

        # False Positive Rates
        fpr_0 = y_pred[(group_0_mask) & (y_true == 0)].mean()
        fpr_1 = y_pred[(group_1_mask) & (y_true == 0)].mean()

        return max(abs(tpr_0 - tpr_1), abs(fpr_0 - fpr_1))

    @staticmethod
    def equal_opportunity_difference(
        y_true: np.ndarray,
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """
        Compute equal opportunity difference (TPR difference).

        Returns:
            |TPR_0 - TPR_1|
        """
        group_0_pos = (protected_attribute == 0) & (y_true == 1)
        group_1_pos = (protected_attribute == 1) & (y_true == 1)

        tpr_0 = y_pred[group_0_pos].mean() if group_0_pos.sum() > 0 else 0
        tpr_1 = y_pred[group_1_pos].mean() if group_1_pos.sum() > 0 else 0

        return abs(tpr_0 - tpr_1)


class FairnessConstrainedAutoML:
    """
    AutoML with fairness constraints.

    Supports both hard constraints (reject models exceeding threshold)
    and soft constraints (penalize unfair models in ranking).
    """

    def __init__(
        self,
        fairness_metric: str = 'demographic_parity',
        max_disparity: float = 0.1,
        is_hard_constraint: bool = True,
        fairness_weight: float = 0.3,
    ):
        """
        Args:
            fairness_metric: One of 'demographic_parity', 'equalized_odds',
                             'equal_opportunity'
            max_disparity: Maximum allowed disparity (for hard constraints)
            is_hard_constraint: If True, reject models exceeding max_disparity
            fairness_weight: Weight for fairness in combined objective
                             (for soft constraints)
        """
        self.fairness_metric = fairness_metric
        self.max_disparity = max_disparity
        self.is_hard_constraint = is_hard_constraint
        self.fairness_weight = fairness_weight
        self.metrics = FairnessMetrics()

    def compute_fairness(
        self,
        y_true: np.ndarray,
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """Compute the configured fairness metric."""
        if self.fairness_metric == 'demographic_parity':
            return self.metrics.demographic_parity_difference(y_pred, protected_attribute)
        elif self.fairness_metric == 'equalized_odds':
            return self.metrics.equalized_odds_difference(y_true, y_pred, protected_attribute)
        elif self.fairness_metric == 'equal_opportunity':
            return self.metrics.equal_opportunity_difference(y_true, y_pred, protected_attribute)
        else:
            raise ValueError(f"Unknown fairness metric: {self.fairness_metric}")

    def evaluate_model(
        self,
        model: BaseEstimator,
        X: np.ndarray,
        y_true: np.ndarray,
        protected_attribute: np.ndarray,
        accuracy_metric: Callable = None,
    ) -> Dict:
        """
        Evaluate model for both accuracy and fairness.
        """
        y_pred = model.predict(X)

        # Accuracy
        if accuracy_metric:
            accuracy = accuracy_metric(y_true, y_pred)
        else:
            accuracy = (y_true == y_pred).mean()

        # Fairness
        disparity = self.compute_fairness(y_true, y_pred, protected_attribute)

        # Combined score (higher is better)
        if self.is_hard_constraint:
            if disparity > self.max_disparity:
                combined_score = -1  # Rejected
            else:
                combined_score = accuracy
        else:
            # Soft constraint: penalize based on disparity
            fairness_score = max(0, 1 - disparity / self.max_disparity)
            combined_score = (
                (1 - self.fairness_weight) * accuracy
                + self.fairness_weight * fairness_score
            )

        return {
            'accuracy': accuracy,
            'disparity': disparity,
            'fairness_metric': self.fairness_metric,
            'passes_constraint': disparity <= self.max_disparity,
            'combined_score': combined_score,
        }

    def select_fair_model(
        self,
        models: List[BaseEstimator],
        X: np.ndarray,
        y_true: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> BaseEstimator:
        """
        Select best model that satisfies fairness constraints.
        """
        results = []

        for i, model in enumerate(models):
            result = self.evaluate_model(model, X, y_true, protected_attribute)
            result['model_idx'] = i
            result['model'] = model
            results.append(result)

            status = "✓" if result['passes_constraint'] else "✗"
            print(f"Model {i}: Acc={result['accuracy']:.3f}, "
                  f"{self.fairness_metric}={result['disparity']:.3f} {status}")

        # Filter and sort
        valid = [r for r in results if r['combined_score'] >= 0]

        if not valid:
            raise ValueError(f"No models satisfy fairness constraint "
                             f"({self.fairness_metric} <= {self.max_disparity})")

        valid.sort(key=lambda x: x['combined_score'], reverse=True)
        best = valid[0]

        print(f"\nSelected Model {best['model_idx']}: "
              f"Accuracy={best['accuracy']:.3f}, "
              f"Disparity={best['disparity']:.3f}")

        return best['model']
```

Fairness-Accuracy Tradeoffs:
Enforcing fairness constraints typically reduces accuracy. This tradeoff is fundamental and must be communicated to stakeholders:
| Fairness Threshold | Typical Accuracy Impact | Recommendation |
|---|---|---|
| Very Strict (< 0.01 disparity) | 5-15% accuracy reduction | Only for critical applications |
| Strict (< 0.05 disparity) | 2-8% accuracy reduction | Standard for regulated industries |
| Moderate (< 0.10 disparity) | 1-3% accuracy reduction | Good balance for most applications |
| Loose (< 0.20 disparity) | < 1% accuracy reduction | Minimal impact, basic compliance |
The actual tradeoff depends on the dataset's inherent bias patterns.
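To measure that tradeoff on your own data rather than rely on the rough figures above, you can sweep the disparity threshold with the `FairnessConstrainedAutoML` class defined earlier. This is a sketch, not a complete script: `trained_models`, `X_val`, `y_val`, and `protected` are assumed to come from a prior AutoML run and validation split.

```python
thresholds = [0.01, 0.05, 0.10, 0.20]
tradeoff = []

for max_disparity in thresholds:
    selector = FairnessConstrainedAutoML(
        fairness_metric='demographic_parity',
        max_disparity=max_disparity,
        is_hard_constraint=True,
    )
    try:
        best = selector.select_fair_model(trained_models, X_val, y_val, protected)
        result = selector.evaluate_model(best, X_val, y_val, protected)
        tradeoff.append((max_disparity, result['accuracy'], result['disparity']))
    except ValueError:
        # No candidate satisfies this threshold
        tradeoff.append((max_disparity, None, None))

for threshold, acc, disp in tradeoff:
    print(f"max_disparity={threshold:.2f}: accuracy={acc}, achieved disparity={disp}")
```

Plotting accuracy against the threshold gives stakeholders a concrete picture of what a stricter fairness requirement costs on their dataset.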
Real-world AutoML often involves multiple competing objectives: maximize accuracy, minimize latency, satisfy fairness constraints, limit memory usage. Multi-objective optimization (MOO) provides principled frameworks for navigating these tradeoffs.
```python
import numpy as np
from typing import List, Dict, Tuple, Callable
from dataclasses import dataclass


@dataclass
class Objective:
    """Definition of an optimization objective."""
    name: str
    direction: str            # 'maximize' or 'minimize'
    weight: float = 1.0
    threshold: float = None   # Hard constraint threshold
    is_constraint: bool = False


class MultiObjectiveAutoML:
    """
    Multi-objective AutoML framework supporting various MOO strategies.
    """

    def __init__(self, objectives: List[Objective]):
        """
        Args:
            objectives: List of Objective definitions
        """
        self.objectives = objectives
        self.constraint_objectives = [o for o in objectives if o.is_constraint]
        self.optimization_objectives = [o for o in objectives if not o.is_constraint]

    def evaluate_model(
        self,
        model_scores: Dict[str, float],
    ) -> Dict:
        """
        Evaluate a model against all objectives.

        Args:
            model_scores: Dict mapping objective name to score

        Returns:
            Evaluation result with constraint checks and combined score
        """
        # Check constraints first
        constraint_violations = []
        for obj in self.constraint_objectives:
            score = model_scores[obj.name]

            if obj.direction == 'maximize':
                passes = score >= obj.threshold
            else:
                passes = score <= obj.threshold

            if not passes:
                constraint_violations.append(obj.name)

        if constraint_violations:
            return {
                'feasible': False,
                'violations': constraint_violations,
                'combined_score': float('-inf'),
            }

        # Compute weighted combined score for optimization objectives
        combined_score = 0
        for obj in self.optimization_objectives:
            score = model_scores[obj.name]

            # Normalize direction (always maximize internally)
            if obj.direction == 'minimize':
                score = -score

            combined_score += obj.weight * score

        return {
            'feasible': True,
            'violations': [],
            'combined_score': combined_score,
            'individual_scores': model_scores,
        }

    def pareto_frontier(
        self,
        candidates: List[Dict[str, float]],
    ) -> List[int]:
        """
        Find Pareto-optimal solutions from candidate set.

        A solution is Pareto-optimal if no other solution is better
        in all objectives.

        Returns:
            Indices of Pareto-optimal candidates
        """
        n = len(candidates)
        is_dominated = [False] * n

        for i in range(n):
            if is_dominated[i]:
                continue
            for j in range(n):
                if i == j or is_dominated[j]:
                    continue

                # Check if j dominates i
                j_better_in_all = True
                j_strictly_better_in_any = False

                for obj in self.optimization_objectives:
                    score_i = candidates[i][obj.name]
                    score_j = candidates[j][obj.name]

                    if obj.direction == 'maximize':
                        if score_j < score_i:
                            j_better_in_all = False
                        if score_j > score_i:
                            j_strictly_better_in_any = True
                    else:  # minimize
                        if score_j > score_i:
                            j_better_in_all = False
                        if score_j < score_i:
                            j_strictly_better_in_any = True

                if j_better_in_all and j_strictly_better_in_any:
                    is_dominated[i] = True
                    break

        return [i for i in range(n) if not is_dominated[i]]

    def select_best(
        self,
        candidates: List[Dict[str, float]],
        strategy: str = 'weighted_sum',
    ) -> Tuple[int, Dict]:
        """
        Select best candidate according to strategy.

        Args:
            candidates: List of score dictionaries for each candidate
            strategy: 'weighted_sum', 'pareto_knee', 'constraint_first'

        Returns:
            Tuple of (best_index, evaluation_result)
        """
        if strategy == 'weighted_sum':
            # Simple: use combined weighted score
            evaluations = [self.evaluate_model(c) for c in candidates]
            feasible = [(i, e) for i, e in enumerate(evaluations) if e['feasible']]

            if not feasible:
                raise ValueError("No feasible solutions found")

            best_idx, best_eval = max(feasible, key=lambda x: x[1]['combined_score'])
            return best_idx, best_eval

        elif strategy == 'pareto_knee':
            # Find Pareto frontier, then select "knee" point
            pareto_indices = self.pareto_frontier(candidates)

            if not pareto_indices:
                raise ValueError("No Pareto-optimal solutions found")

            # Knee = point with best average normalized objective value
            pareto_candidates = [candidates[i] for i in pareto_indices]

            # Normalize each objective to [0, 1], flipping minimized objectives
            # so that higher is always better
            normalized_scores = []
            for obj in self.optimization_objectives:
                values = [c[obj.name] for c in pareto_candidates]
                min_val, max_val = min(values), max(values)
                range_val = max_val - min_val if max_val != min_val else 1

                for i, c in enumerate(pareto_candidates):
                    if len(normalized_scores) <= i:
                        normalized_scores.append({})
                    norm = (c[obj.name] - min_val) / range_val
                    if obj.direction == 'minimize':
                        norm = 1 - norm
                    normalized_scores[i][obj.name] = norm

            # Knee = best average normalized score
            avg_scores = [np.mean(list(ns.values())) for ns in normalized_scores]
            local_best = np.argmax(avg_scores)
            best_idx = pareto_indices[local_best]

            return best_idx, self.evaluate_model(candidates[best_idx])

        else:
            raise ValueError(f"Unknown strategy: {strategy}")


# Example usage
objectives = [
    Objective('accuracy', 'maximize', weight=1.0),
    Objective('latency_ms', 'minimize', weight=0.3),
    Objective('disparity', 'minimize', threshold=0.1, is_constraint=True),
    Objective('memory_mb', 'minimize', threshold=500, is_constraint=True),
]

moo = MultiObjectiveAutoML(objectives)

# Sample candidates
candidates = [
    {'accuracy': 0.95, 'latency_ms': 50, 'disparity': 0.08, 'memory_mb': 300},
    {'accuracy': 0.92, 'latency_ms': 10, 'disparity': 0.05, 'memory_mb': 100},
    {'accuracy': 0.97, 'latency_ms': 200, 'disparity': 0.12, 'memory_mb': 800},  # Violates constraints
    {'accuracy': 0.90, 'latency_ms': 5, 'disparity': 0.03, 'memory_mb': 50},
]

best_idx, result = moo.select_best(candidates, strategy='weighted_sum')
print(f"Best model: {best_idx}")
print(f"Result: {result}")
```

When stakeholders must choose from multiple valid tradeoffs, present the Pareto frontier visually. A scatter plot of accuracy vs. latency (or any two key objectives) showing only non-dominated points helps stakeholders understand the tradeoff landscape and make informed choices.
Specifying constraints correctly requires understanding both the constraint semantics and how AutoML systems interpret them. This section covers practical constraint specification across popular AutoML frameworks.
"""Constraint Specification Examples Across AutoML Frameworks""" # ============================================# AutoGluon: Constraint-Aware Training# ============================================from autogluon.tabular import TabularPredictor # Time and resource constraintspredictor = TabularPredictor(label='target').fit( train_data, time_limit=3600, # Total time constraint presets='best_quality', # Exclude slow models (implicit latency constraint) excluded_model_types=['NN_TORCH', 'FASTAI'], # Memory constraint via hyperparameter limits hyperparameters={ 'GBM': { 'num_boost_round': 500, # Limit tree count → memory 'max_depth': 6, # Limit depth → size }, 'RF': { 'n_estimators': 100, # Limit forest size }, }, # Inference speed constraint ag_args_fit={ 'max_memory_usage_ratio': 0.8, # Memory ceiling },) # Inference optimization after trainingpredictor.persist_models() # Optimize for inference latency # ============================================# Auto-sklearn: Algorithm Constraints# ============================================from autosklearn.classification import AutoSklearnClassifier # Constrain search space to interpretable modelsclf = AutoSklearnClassifier( time_left_for_this_task=1800, per_run_time_limit=180, # Include only interpretable classifiers include={ 'classifier': [ 'decision_tree', 'extra_trees', # Still interpretable via feature importance 'gradient_boosting', # Reasonably interpretable ], }, # Exclude black-box models exclude={ 'classifier': [ 'mlp', # Neural network 'adaboost', # Less interpretable ensemble ], }, # Limit ensemble complexity ensemble_size=5, ensemble_nbest=5, # Memory limit memory_limit=4096, # 4GB) # ============================================# H2O AutoML: Constraints and Monotonicity# ============================================import h2ofrom h2o.automl import H2OAutoML h2o.init() # With constraints on model types and monotonicityaml = H2OAutoML( max_runtime_secs=3600, max_models=20, # Exclude complex models exclude_algos=['DeepLearning'], # Monotonicity constraints (domain knowledge) monotone_constraints={ 'credit_score': 1, # Higher score → lower risk (positive) 'debt_to_income': -1, # Higher DTI → higher risk (negative) }, # Stopping criteria stopping_metric='AUC', stopping_rounds=5, stopping_tolerance=0.001,) # ============================================# Custom Constraint Implementation# ============================================class ConstraintValidator: """ Validate models against production constraints before deployment. 
""" def __init__(self, constraints: dict): """ Args: constraints: Dict of constraint_name -> (check_fn, threshold, direction) """ self.constraints = constraints self.validation_history = [] def add_latency_constraint( self, max_p99_ms: float, X_sample, batch_size: int = 1, ): """Add inference latency constraint.""" def check_latency(model): import time latencies = [] for _ in range(100): start = time.perf_counter() model.predict(X_sample[:batch_size]) latencies.append((time.perf_counter() - start) * 1000) return np.percentile(latencies, 99) self.constraints['latency_p99_ms'] = { 'check_fn': check_latency, 'threshold': max_p99_ms, 'direction': 'max', # Value must be <= threshold } def add_memory_constraint(self, max_mb: float): """Add model memory constraint.""" def check_memory(model): import pickle import sys serialized = pickle.dumps(model) return len(serialized) / (1024 * 1024) self.constraints['memory_mb'] = { 'check_fn': check_memory, 'threshold': max_mb, 'direction': 'max', } def add_fairness_constraint( self, metric: str, max_disparity: float, X_val, y_val, protected_attr, ): """Add fairness constraint.""" def check_fairness(model): y_pred = model.predict(X_val) if metric == 'demographic_parity': g0 = y_pred[protected_attr == 0].mean() g1 = y_pred[protected_attr == 1].mean() return abs(g0 - g1) # Add other metrics as needed self.constraints[f'fairness_{metric}'] = { 'check_fn': check_fairness, 'threshold': max_disparity, 'direction': 'max', } def validate(self, model, model_name: str = None) -> dict: """ Validate model against all constraints. Returns: Dict with validation results and pass/fail status """ results = { 'model_name': model_name, 'passes_all': True, 'constraint_results': {}, } for name, constraint in self.constraints.items(): value = constraint['check_fn'](model) threshold = constraint['threshold'] if constraint['direction'] == 'max': passes = value <= threshold else: # min passes = value >= threshold results['constraint_results'][name] = { 'value': value, 'threshold': threshold, 'passes': passes, } if not passes: results['passes_all'] = False self.validation_history.append(results) return results def generate_report(self) -> str: """Generate validation report.""" lines = ["Constraint Validation Report", "=" * 40] for result in self.validation_history: lines.append(f"\nModel: {result['model_name']}") lines.append(f"Overall: {'PASS' if result['passes_all'] else 'FAIL'}") for name, cr in result['constraint_results'].items(): status = '✓' if cr['passes'] else '✗' lines.append(f" {status} {name}: {cr['value']:.3f} (limit: {cr['threshold']})") return "\n".join(lines)Always validate constraints on the final model using the exact deployment conditions: actual hardware, representative input samples, and full preprocessing pipeline. Development and production environments often differ in ways that affect constraint satisfaction.
We've covered the landscape of constraint handling in AutoML. The essential principles:

- Identify hard constraints (dealbreakers) and soft constraints (preferences) with stakeholders before the search begins; hard constraints filter the search space, while soft constraints become penalties in the objective.
- Measure latency and memory under deployment-like conditions, and prune algorithm families that cannot possibly meet the budget rather than training and rejecting them.
- Treat fairness as a first-class objective, choosing metrics appropriate to the legal and ethical context and communicating the accuracy tradeoff to stakeholders.
- Use multi-objective optimization, via weighted sums or Pareto analysis, to navigate competing accuracy and operational requirements.
- Validate the final model against every constraint on production hardware before deployment.
What's Next:
With constraint handling mastered, we turn to a critical concern for many AutoML deployments: Explainability. The next page examines how to interpret AutoML-produced models, satisfy regulatory requirements for model transparency, and communicate model behavior to stakeholders.
You now have a comprehensive framework for incorporating real-world constraints into AutoML pipelines. This knowledge ensures that AutoML produces not just accurate models but deployable ones—models that meet latency, memory, fairness, and custom business requirements.