Accuracy is necessary but rarely sufficient for production ML systems. Real-world deployments face a constellation of constraints that determine whether a model can actually be used: latency budgets that dictate response times, memory limits that constrain deployment environments, model size restrictions for edge or mobile deployment, fairness requirements mandated by regulation or ethics, and custom business constraints unique to each application.
AutoML systems that optimize purely for predictive performance often produce models that cannot be deployed. A 99.5% accurate model that requires 500ms inference latency is useless for a 50ms real-time requirement. A highly accurate ensemble that consumes 16GB of RAM cannot run on a 4GB edge device. Understanding constraint handling transforms AutoML from an academic exercise into a production-ready tool.
By the end of this page, you will understand how to incorporate latency, memory, size, fairness, and custom constraints into AutoML pipelines. You'll learn constraint formulation strategies, multi-objective optimization approaches, and practical techniques for balancing accuracy against operational requirements.
Production ML systems face diverse constraints that can be categorized along multiple dimensions. Understanding this taxonomy is essential for systematic constraint handling.
| Category | Constraint Type | Example | Typical Origin |
|---|---|---|---|
| Performance | Inference Latency | < 10ms p99 latency | Real-time systems, user experience |
| Performance | Throughput | 10,000 predictions/second | High-volume batch processing |
| Resource | Memory (RAM) | < 4GB peak memory | Edge devices, containerized deployments |
| Resource | Model Size (Disk) | < 100MB serialized model | Mobile apps, bandwidth-limited deployment |
| Resource | Compute (CPU/GPU) | CPU-only inference, no GPU | Cost constraints, deployment environment |
| Fairness | Demographic Parity | Equal positive prediction rates across groups | Regulatory compliance, ethical AI |
| Fairness | Equalized Odds | Equal TPR and FPR across protected groups | Legal requirements, fairness audits |
| Regulatory | Model Complexity | Linear models only, decision trees with depth < 5 | Interpretability mandates (GDPR, credit scoring) |
| Business | Feature Restrictions | Cannot use certain features | Privacy, data availability, legal restrictions |
| Business | Update Frequency | Model must retrain within 1 hour | Data freshness requirements |
Hard vs. Soft Constraints:
Constraints differ in their strictness:
Hard Constraints — Must be satisfied; violating models are rejected regardless of accuracy.
Soft Constraints — Preferred but negotiable; violations incur penalties rather than rejection.
AutoML systems must handle both types differently: hard constraints filter the search space, while soft constraints are incorporated into the optimization objective.
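As a minimal sketch of how the two types enter a model-selection loop (the candidate dictionaries and constraint functions below are hypothetical, not part of any specific framework), hard constraints act as filters while soft constraints become score penalties:

```python
from typing import Callable, Dict, List

def apply_constraints(
    candidates: List[Dict[str, float]],
    hard: Dict[str, Callable[[Dict[str, float]], bool]],
    soft_penalties: Dict[str, Callable[[Dict[str, float]], float]],
) -> List[Dict[str, float]]:
    """Filter on hard constraints, then penalize soft-constraint violations."""
    scored = []
    for cand in candidates:
        # Hard constraints: reject outright on any violation
        if not all(check(cand) for check in hard.values()):
            continue
        # Soft constraints: subtract penalties from the objective
        penalty = sum(p(cand) for p in soft_penalties.values())
        scored.append({**cand, 'adjusted_score': cand['accuracy'] - penalty})
    return sorted(scored, key=lambda c: c['adjusted_score'], reverse=True)

# Hypothetical candidates: latency is a hard limit, model size is a preference
candidates = [
    {'name': 'gbm', 'accuracy': 0.93, 'latency_ms': 8, 'size_mb': 120},
    {'name': 'ensemble', 'accuracy': 0.95, 'latency_ms': 60, 'size_mb': 900},
    {'name': 'linear', 'accuracy': 0.90, 'latency_ms': 1, 'size_mb': 2},
]
ranked = apply_constraints(
    candidates,
    hard={'latency': lambda c: c['latency_ms'] <= 10},
    soft_penalties={'size': lambda c: 0.01 * (c['size_mb'] / 100)},
)
print([c['name'] for c in ranked])  # 'ensemble' is filtered out; the rest are ranked by adjusted score
```

In practice the hard-constraint checks would wrap real latency or memory measurements like those shown later on this page.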
Many AutoML failures occur because constraints were not identified before the search began. Invest time upfront to document all hard constraints (dealbreakers) and soft constraints (preferences) with stakeholders from engineering, product, legal, and ethics teams. Discovering constraints after a week-long AutoML run is extremely costly.
Inference latency is often the most critical constraint for user-facing ML systems. A model that takes too long to respond is functionally useless, regardless of accuracy. AutoML must select and configure models that meet latency requirements while maximizing performance.
Latency Factors in ML Models:
Inference latency depends on multiple factors that AutoML must consider:
| Factor | Impact | AutoML Control |
|---|---|---|
| Model Type | 100-1000x variation (linear vs. deep ensemble) | Algorithm selection |
| Model Size | Proportional (more parameters = slower) | Hyperparameter constraints |
| Feature Count | Linear to quadratic relationship | Feature selection |
| Data Preprocessing | Can dominate for complex transformations | Pipeline optimization |
| Batch Size | Amortization benefits for batched inference | Deployment configuration |
| Hardware | 10-100x variation (CPU vs. GPU) | Deployment targeting |
```python
import time
import numpy as np
from sklearn.base import BaseEstimator
from typing import List, Callable, Tuple


class LatencyConstrainedAutoML:
    """
    AutoML with latency constraints.

    Filters candidate models based on inference latency requirements
    before evaluating accuracy, ensuring all finalists meet production needs.
    """

    def __init__(
        self,
        max_latency_ms: float,
        target_percentile: float = 99.0,
        warmup_iterations: int = 5,
        benchmark_iterations: int = 100,
    ):
        """
        Args:
            max_latency_ms: Maximum acceptable inference latency in milliseconds
            target_percentile: Percentile to measure (default p99)
            warmup_iterations: Iterations for JIT warmup before benchmarking
            benchmark_iterations: Number of iterations for latency measurement
        """
        self.max_latency_ms = max_latency_ms
        self.target_percentile = target_percentile
        self.warmup_iterations = warmup_iterations
        self.benchmark_iterations = benchmark_iterations
        self.latency_results = {}

    def measure_latency(
        self,
        model: BaseEstimator,
        X_sample: np.ndarray,
    ) -> Tuple[float, float, float]:
        """
        Measure inference latency for a trained model.

        Returns:
            Tuple of (median_ms, p95_ms, p99_ms)
        """
        # Warmup (allows JIT compilation, cache warming)
        for _ in range(self.warmup_iterations):
            _ = model.predict(X_sample)

        # Benchmark
        latencies = []
        for _ in range(self.benchmark_iterations):
            start = time.perf_counter()
            _ = model.predict(X_sample)
            end = time.perf_counter()
            latencies.append((end - start) * 1000)  # Convert to ms

        latencies = np.array(latencies)
        return (
            np.median(latencies),
            np.percentile(latencies, 95),
            np.percentile(latencies, 99),
        )

    def check_latency_constraint(
        self,
        model: BaseEstimator,
        X_sample: np.ndarray,
        model_name: str = "",
    ) -> bool:
        """
        Check if model meets latency constraint.

        Returns True if model passes, False otherwise.
        """
        median, p95, p99 = self.measure_latency(model, X_sample)

        # Select target percentile
        if self.target_percentile == 99:
            target_latency = p99
        elif self.target_percentile == 95:
            target_latency = p95
        else:
            target_latency = median

        passes = target_latency <= self.max_latency_ms

        # Store results
        self.latency_results[model_name] = {
            'median_ms': round(median, 3),
            'p95_ms': round(p95, 3),
            'p99_ms': round(p99, 3),
            'target_latency_ms': round(target_latency, 3),
            'constraint_ms': self.max_latency_ms,
            'passes': passes,
        }

        return passes

    def filter_by_latency(
        self,
        models: List[Tuple[str, BaseEstimator]],
        X_sample: np.ndarray,
    ) -> List[Tuple[str, BaseEstimator]]:
        """
        Filter models that violate latency constraint.

        Args:
            models: List of (name, trained_model) tuples
            X_sample: Sample input for latency measurement

        Returns:
            List of models that pass latency constraint
        """
        passing_models = []

        print(f"Latency constraint: {self.max_latency_ms}ms (p{self.target_percentile:.0f})")

        for name, model in models:
            passes = self.check_latency_constraint(model, X_sample, name)
            result = self.latency_results[name]

            status = "✓ PASS" if passes else "✗ FAIL"
            print(f"  {name}: {result['target_latency_ms']:.2f}ms {status}")

            if passes:
                passing_models.append((name, model))

        print(f"\n{len(passing_models)}/{len(models)} models pass latency constraint")
        return passing_models


# Example: Algorithm-specific latency estimates
ALGORITHM_LATENCY_PROFILES = {
    # Per-sample inference time estimates (microseconds) for 100 features
    'LogisticRegression': {'base': 5, 'per_feature': 0.02},
    'DecisionTree': {'base': 2, 'per_depth': 0.5},
    'RandomForest': {'base': 10, 'per_tree': 2},
    'XGBoost': {'base': 5, 'per_tree': 0.5},
    'LightGBM': {'base': 3, 'per_tree': 0.3},
    'MLP_small': {'base': 50, 'per_layer': 10},
    'MLP_large': {'base': 200, 'per_layer': 50},
    'Ensemble_stacked': {'base': 100, 'per_model': 30},
}


def estimate_latency(
    algorithm: str,
    num_features: int = 100,
    num_trees: int = 100,
    num_layers: int = 3,
    depth: int = 6,
) -> float:
    """
    Estimate inference latency in milliseconds.

    These are rough estimates for CPU inference; actual latency depends
    on hardware, implementation, and data characteristics.
    """
    profile = ALGORITHM_LATENCY_PROFILES.get(algorithm, {'base': 100})

    latency_us = profile['base']

    if 'per_feature' in profile:
        latency_us += profile['per_feature'] * num_features
    if 'per_tree' in profile:
        latency_us += profile['per_tree'] * num_trees
    if 'per_depth' in profile:
        latency_us += profile['per_depth'] * depth
    if 'per_layer' in profile:
        latency_us += profile['per_layer'] * num_layers
    if 'per_model' in profile:
        latency_us += profile['per_model'] * 5  # Assume 5 models in ensemble

    return latency_us / 1000  # Convert to milliseconds


# Usage
for algo in ALGORITHM_LATENCY_PROFILES:
    latency = estimate_latency(algo)
    print(f"{algo}: ~{latency:.2f}ms per prediction")
```

The most efficient approach to latency constraints is search space pruning: exclude algorithm families that cannot possibly meet requirements. If you need < 5ms latency, exclude neural networks and large ensembles from the search space entirely rather than training and then rejecting them.
Memory constraints limit the RAM available during inference, while size constraints limit the serialized model size on disk or during transmission. These constraints are critical for edge deployment, mobile applications, containerized microservices, and bandwidth-limited environments.
Memory Contributors in ML Models:
| Component | Typical Size | Optimization Approaches |
|---|---|---|
| Model Parameters | 4 bytes per float32 weight | Quantization, pruning |
| Feature Preprocessing | Vocabulary dicts, scalers | Sparse representations |
| Tree Structures | ~100 bytes per tree node | Depth limits, tree count |
| Ensemble Members | Sum of individual models | Ensemble pruning, distillation |
| Runtime Overhead | Framework, buffers | Framework selection |
| Input/Output Buffers | Batch size × features | Batch size limits |
```python
import sys
import pickle
import tempfile
from typing import Any, Dict, List, Tuple
import numpy as np


def get_model_memory_mb(model: Any) -> Dict[str, Any]:
    """
    Estimate model memory usage in MB.

    Uses multiple methods: serialized size, sys.getsizeof (limited),
    and model-specific estimators.
    """
    # Method 1: Pickle size (good proxy for many models)
    with tempfile.NamedTemporaryFile() as f:
        pickle.dump(model, f)
        f.flush()
        pickle_size_mb = f.tell() / (1024 * 1024)

    # Method 2: Try to get more accurate estimates for known model types
    model_type = type(model).__name__

    if hasattr(model, 'get_params'):
        params = model.get_params()
    else:
        params = {}

    # Model-specific size estimation
    extra_runtime_mb = 0

    if 'RandomForest' in model_type:
        # Trees expand in memory due to runtime structures
        n_estimators = getattr(model, 'n_estimators', 100)
        extra_runtime_mb = n_estimators * 0.1  # ~100KB per tree runtime overhead
    elif 'XGB' in model_type or 'LGBM' in model_type or 'LightGBM' in model_type:
        # Gradient boosting models are relatively efficient
        extra_runtime_mb = pickle_size_mb * 0.2
    elif 'Keras' in model_type or 'Sequential' in model_type or 'torch' in str(type(model)):
        # Neural networks: weights + optimizer state + buffers
        extra_runtime_mb = pickle_size_mb * 1.5  # Conservative estimate

    estimated_runtime_mb = pickle_size_mb + extra_runtime_mb

    return {
        'pickle_size_mb': round(pickle_size_mb, 2),
        'estimated_runtime_mb': round(estimated_runtime_mb, 2),
        'model_type': model_type,
    }


class MemoryConstrainedModelSelector:
    """
    Select models that fit within memory constraints.

    Balances accuracy against memory requirements, supporting both
    hard limits and soft preferences.
    """

    def __init__(
        self,
        max_memory_mb: float,
        prefer_smaller: bool = True,
        memory_penalty_weight: float = 0.1,
    ):
        """
        Args:
            max_memory_mb: Hard memory limit in MB
            prefer_smaller: If True, add penalty for memory usage
            memory_penalty_weight: Weight for memory penalty (0-1)
        """
        self.max_memory_mb = max_memory_mb
        self.prefer_smaller = prefer_smaller
        self.memory_penalty_weight = memory_penalty_weight

    def score_with_memory_penalty(
        self,
        accuracy_score: float,
        memory_mb: float,
    ) -> float:
        """
        Compute penalized score accounting for memory usage.

        penalty = memory_penalty_weight * (memory_mb / max_memory_mb)
        adjusted_score = accuracy_score * (1 - penalty)
        """
        if memory_mb > self.max_memory_mb:
            return -1.0  # Invalid: exceeds hard constraint

        if not self.prefer_smaller:
            return accuracy_score

        # Soft penalty proportional to memory usage
        memory_fraction = memory_mb / self.max_memory_mb
        penalty = self.memory_penalty_weight * memory_fraction

        return accuracy_score * (1 - penalty)

    def select_best(
        self,
        candidates: List[Dict],
    ) -> Dict:
        """
        Select best model from candidates.

        Args:
            candidates: List of dicts with 'model', 'accuracy', 'memory_mb' keys

        Returns:
            Best candidate that meets constraints
        """
        valid_candidates = []

        for candidate in candidates:
            memory_mb = candidate['memory_mb']
            accuracy = candidate['accuracy']

            if memory_mb > self.max_memory_mb:
                print(f"  Rejecting {candidate.get('name', 'model')}: "
                      f"{memory_mb:.1f}MB > {self.max_memory_mb}MB limit")
                continue

            penalized_score = self.score_with_memory_penalty(accuracy, memory_mb)
            valid_candidates.append({
                **candidate,
                'penalized_score': penalized_score,
            })

        if not valid_candidates:
            raise ValueError(f"No models fit within {self.max_memory_mb}MB constraint")

        # Sort by penalized score
        valid_candidates.sort(key=lambda x: x['penalized_score'], reverse=True)

        best = valid_candidates[0]
        print(f"\nSelected: {best.get('name', 'model')}")
        print(f"  Accuracy: {best['accuracy']:.4f}")
        print(f"  Memory: {best['memory_mb']:.1f}MB")
        print(f"  Penalized Score: {best['penalized_score']:.4f}")

        return best


# Model compression techniques for memory reduction
def quantize_model_weights(model, precision='int8'):
    """
    Quantize model weights to reduce memory footprint.

    Note: This is a simplified example. Real quantization requires
    framework-specific implementations (TensorFlow Lite, ONNX, etc.)
    """
    quantization_savings = {
        'float32': 1.0,    # Baseline
        'float16': 0.5,    # 50% reduction
        'int8': 0.25,      # 75% reduction
        'int4': 0.125,     # 87.5% reduction (accuracy tradeoff)
    }

    return {
        'original_precision': 'float32',
        'target_precision': precision,
        'expected_size_reduction': 1 - quantization_savings[precision],
    }


def prune_ensemble(ensemble_models, target_count=5, X_val=None, y_val=None):
    """
    Prune ensemble to top-N models by validation performance.

    Reduces memory by removing redundant or weak ensemble members.
    """
    if len(ensemble_models) <= target_count:
        return ensemble_models

    # Score each model
    scores = []
    for i, model in enumerate(ensemble_models):
        if X_val is not None and y_val is not None:
            from sklearn.metrics import accuracy_score
            pred = model.predict(X_val)
            score = accuracy_score(y_val, pred)
        else:
            score = 0  # No validation data; keep original order
        scores.append((score, i, model))

    # Keep top performers
    scores.sort(key=lambda t: (t[0], t[1]), reverse=True)
    pruned = [model for _, _, model in scores[:target_count]]

    print(f"Pruned ensemble: {len(ensemble_models)} → {len(pruned)} models")
    return pruned
```

When the best AutoML model exceeds memory constraints, knowledge distillation offers a solution: train a smaller 'student' model to mimic the larger 'teacher' model's predictions. This often achieves 90-95% of the teacher's accuracy at a fraction of the memory cost.
Fairness constraints ensure ML models do not discriminate against protected groups. These constraints are increasingly mandated by regulations (GDPR, Equal Credit Opportunity Act) and organizational ethics policies. AutoML must incorporate fairness as a first-class objective.
It's mathematically proven that certain fairness metrics cannot be simultaneously satisfied except in special cases (Chouldechova, 2017; Kleinberg et al., 2016). For example, you cannot achieve both calibration and equalized odds when base rates differ between groups. Choose fairness metrics carefully based on the application context and legal requirements.
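To see why such results hold, it helps to write down one identity that follows directly from the definitions of the rates involved (a sketch in the spirit of Chouldechova, 2017, not a quotation from it). For a group with base rate $p$, any binary classifier satisfies

$$
\mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \left(1 - \mathrm{FNR}\right).
$$

If the model is equally calibrated across groups (equal PPV) but the base rates $p$ differ, this identity means FPR and FNR cannot both be equal across the groups, so equalized odds cannot hold at the same time.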
```python
import numpy as np
from typing import Dict, List, Callable
from sklearn.base import BaseEstimator


class FairnessMetrics:
    """
    Compute fairness metrics for binary classification.
    """

    @staticmethod
    def demographic_parity_difference(
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """
        Compute demographic parity difference.

        Returns:
            |P(Ŷ=1|A=0) - P(Ŷ=1|A=1)|
            Lower is fairer (0 = perfect parity).
        """
        group_0_mask = protected_attribute == 0
        group_1_mask = protected_attribute == 1

        rate_0 = y_pred[group_0_mask].mean()
        rate_1 = y_pred[group_1_mask].mean()

        return abs(rate_0 - rate_1)

    @staticmethod
    def equalized_odds_difference(
        y_true: np.ndarray,
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """
        Compute equalized odds difference.

        Returns:
            max(|TPR_0 - TPR_1|, |FPR_0 - FPR_1|)
            Lower is fairer (0 = perfect equalized odds).
        """
        group_0_mask = protected_attribute == 0
        group_1_mask = protected_attribute == 1

        # True Positive Rates
        tpr_0 = y_pred[(group_0_mask) & (y_true == 1)].mean()
        tpr_1 = y_pred[(group_1_mask) & (y_true == 1)].mean()

        # False Positive Rates
        fpr_0 = y_pred[(group_0_mask) & (y_true == 0)].mean()
        fpr_1 = y_pred[(group_1_mask) & (y_true == 0)].mean()

        return max(abs(tpr_0 - tpr_1), abs(fpr_0 - fpr_1))

    @staticmethod
    def equal_opportunity_difference(
        y_true: np.ndarray,
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """
        Compute equal opportunity difference (TPR difference).

        Returns:
            |TPR_0 - TPR_1|
        """
        group_0_pos = (protected_attribute == 0) & (y_true == 1)
        group_1_pos = (protected_attribute == 1) & (y_true == 1)

        tpr_0 = y_pred[group_0_pos].mean() if group_0_pos.sum() > 0 else 0
        tpr_1 = y_pred[group_1_pos].mean() if group_1_pos.sum() > 0 else 0

        return abs(tpr_0 - tpr_1)


class FairnessConstrainedAutoML:
    """
    AutoML with fairness constraints.

    Supports both hard constraints (reject models exceeding threshold)
    and soft constraints (penalize unfair models in ranking).
    """

    def __init__(
        self,
        fairness_metric: str = 'demographic_parity',
        max_disparity: float = 0.1,
        is_hard_constraint: bool = True,
        fairness_weight: float = 0.3,
    ):
        """
        Args:
            fairness_metric: One of 'demographic_parity', 'equalized_odds',
                             'equal_opportunity'
            max_disparity: Maximum allowed disparity (for hard constraints)
            is_hard_constraint: If True, reject models exceeding max_disparity
            fairness_weight: Weight for fairness in combined objective
                             (for soft constraints)
        """
        self.fairness_metric = fairness_metric
        self.max_disparity = max_disparity
        self.is_hard_constraint = is_hard_constraint
        self.fairness_weight = fairness_weight
        self.metrics = FairnessMetrics()

    def compute_fairness(
        self,
        y_true: np.ndarray,
        y_pred: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> float:
        """Compute the configured fairness metric."""
        if self.fairness_metric == 'demographic_parity':
            return self.metrics.demographic_parity_difference(y_pred, protected_attribute)
        elif self.fairness_metric == 'equalized_odds':
            return self.metrics.equalized_odds_difference(y_true, y_pred, protected_attribute)
        elif self.fairness_metric == 'equal_opportunity':
            return self.metrics.equal_opportunity_difference(y_true, y_pred, protected_attribute)
        else:
            raise ValueError(f"Unknown fairness metric: {self.fairness_metric}")

    def evaluate_model(
        self,
        model: BaseEstimator,
        X: np.ndarray,
        y_true: np.ndarray,
        protected_attribute: np.ndarray,
        accuracy_metric: Callable = None,
    ) -> Dict:
        """
        Evaluate model for both accuracy and fairness.
        """
        y_pred = model.predict(X)

        # Accuracy
        if accuracy_metric:
            accuracy = accuracy_metric(y_true, y_pred)
        else:
            accuracy = (y_true == y_pred).mean()

        # Fairness
        disparity = self.compute_fairness(y_true, y_pred, protected_attribute)

        # Combined score (higher is better)
        if self.is_hard_constraint:
            if disparity > self.max_disparity:
                combined_score = -1  # Rejected
            else:
                combined_score = accuracy
        else:
            # Soft constraint: penalize based on disparity
            fairness_score = max(0, 1 - disparity / self.max_disparity)
            combined_score = (
                (1 - self.fairness_weight) * accuracy
                + self.fairness_weight * fairness_score
            )

        return {
            'accuracy': accuracy,
            'disparity': disparity,
            'fairness_metric': self.fairness_metric,
            'passes_constraint': disparity <= self.max_disparity,
            'combined_score': combined_score,
        }

    def select_fair_model(
        self,
        models: List[BaseEstimator],
        X: np.ndarray,
        y_true: np.ndarray,
        protected_attribute: np.ndarray,
    ) -> BaseEstimator:
        """
        Select best model that satisfies fairness constraints.
        """
        results = []

        for i, model in enumerate(models):
            result = self.evaluate_model(model, X, y_true, protected_attribute)
            result['model_idx'] = i
            result['model'] = model
            results.append(result)

            status = "✓" if result['passes_constraint'] else "✗"
            print(f"Model {i}: Acc={result['accuracy']:.3f}, "
                  f"{self.fairness_metric}={result['disparity']:.3f} {status}")

        # Filter and sort
        valid = [r for r in results if r['combined_score'] >= 0]

        if not valid:
            raise ValueError(f"No models satisfy fairness constraint "
                             f"({self.fairness_metric} <= {self.max_disparity})")

        valid.sort(key=lambda x: x['combined_score'], reverse=True)
        best = valid[0]

        print(f"\nSelected Model {best['model_idx']}: "
              f"Accuracy={best['accuracy']:.3f}, "
              f"Disparity={best['disparity']:.3f}")

        return best['model']
```

Fairness-Accuracy Tradeoffs:
Enforcing fairness constraints typically reduces accuracy. This tradeoff is fundamental and must be communicated to stakeholders:
| Fairness Threshold | Typical Accuracy Impact | Recommendation |
|---|---|---|
| Very Strict (< 0.01 disparity) | 5-15% accuracy reduction | Only for critical applications |
| Strict (< 0.05 disparity) | 2-8% accuracy reduction | Standard for regulated industries |
| Moderate (< 0.10 disparity) | 1-3% accuracy reduction | Good balance for most applications |
| Loose (< 0.20 disparity) | < 1% accuracy reduction | Minimal impact, basic compliance |
The actual tradeoff depends on the dataset's inherent bias patterns.
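To measure that tradeoff on your own data rather than rely on the rough figures above, you can sweep the disparity threshold with the `FairnessConstrainedAutoML` class defined earlier. This is a sketch, not a complete script: `trained_models`, `X_val`, `y_val`, and `protected` are assumed to come from a prior AutoML run and validation split.

```python
thresholds = [0.01, 0.05, 0.10, 0.20]
tradeoff = []

for max_disparity in thresholds:
    selector = FairnessConstrainedAutoML(
        fairness_metric='demographic_parity',
        max_disparity=max_disparity,
        is_hard_constraint=True,
    )
    try:
        best = selector.select_fair_model(trained_models, X_val, y_val, protected)
        result = selector.evaluate_model(best, X_val, y_val, protected)
        tradeoff.append((max_disparity, result['accuracy'], result['disparity']))
    except ValueError:
        # No candidate satisfies this threshold
        tradeoff.append((max_disparity, None, None))

for threshold, acc, disp in tradeoff:
    print(f"max_disparity={threshold:.2f}: accuracy={acc}, achieved disparity={disp}")
```

Plotting accuracy against the threshold gives stakeholders a concrete picture of what a stricter fairness requirement costs on their dataset.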
Real-world AutoML often involves multiple competing objectives: maximize accuracy, minimize latency, satisfy fairness constraints, limit memory usage. Multi-objective optimization (MOO) provides principled frameworks for navigating these tradeoffs.
```python
import numpy as np
from typing import List, Dict, Tuple, Callable
from dataclasses import dataclass


@dataclass
class Objective:
    """Definition of an optimization objective."""
    name: str
    direction: str            # 'maximize' or 'minimize'
    weight: float = 1.0
    threshold: float = None   # Hard constraint threshold
    is_constraint: bool = False


class MultiObjectiveAutoML:
    """
    Multi-objective AutoML framework supporting various MOO strategies.
    """

    def __init__(self, objectives: List[Objective]):
        """
        Args:
            objectives: List of Objective definitions
        """
        self.objectives = objectives
        self.constraint_objectives = [o for o in objectives if o.is_constraint]
        self.optimization_objectives = [o for o in objectives if not o.is_constraint]

    def evaluate_model(
        self,
        model_scores: Dict[str, float],
    ) -> Dict:
        """
        Evaluate a model against all objectives.

        Args:
            model_scores: Dict mapping objective name to score

        Returns:
            Evaluation result with constraint checks and combined score
        """
        # Check constraints first
        constraint_violations = []
        for obj in self.constraint_objectives:
            score = model_scores[obj.name]

            if obj.direction == 'maximize':
                passes = score >= obj.threshold
            else:
                passes = score <= obj.threshold

            if not passes:
                constraint_violations.append(obj.name)

        if constraint_violations:
            return {
                'feasible': False,
                'violations': constraint_violations,
                'combined_score': float('-inf'),
            }

        # Compute weighted combined score for optimization objectives
        combined_score = 0
        for obj in self.optimization_objectives:
            score = model_scores[obj.name]

            # Normalize direction (always maximize internally)
            if obj.direction == 'minimize':
                score = -score

            combined_score += obj.weight * score

        return {
            'feasible': True,
            'violations': [],
            'combined_score': combined_score,
            'individual_scores': model_scores,
        }

    def pareto_frontier(
        self,
        candidates: List[Dict[str, float]],
    ) -> List[int]:
        """
        Find Pareto-optimal solutions from candidate set.

        A solution is Pareto-optimal if no other solution is better
        in all objectives.

        Returns:
            Indices of Pareto-optimal candidates
        """
        n = len(candidates)
        is_dominated = [False] * n

        for i in range(n):
            if is_dominated[i]:
                continue
            for j in range(n):
                if i == j or is_dominated[j]:
                    continue

                # Check if j dominates i
                j_better_in_all = True
                j_strictly_better_in_any = False

                for obj in self.optimization_objectives:
                    score_i = candidates[i][obj.name]
                    score_j = candidates[j][obj.name]

                    if obj.direction == 'maximize':
                        if score_j < score_i:
                            j_better_in_all = False
                        if score_j > score_i:
                            j_strictly_better_in_any = True
                    else:  # minimize
                        if score_j > score_i:
                            j_better_in_all = False
                        if score_j < score_i:
                            j_strictly_better_in_any = True

                if j_better_in_all and j_strictly_better_in_any:
                    is_dominated[i] = True
                    break

        return [i for i in range(n) if not is_dominated[i]]

    def select_best(
        self,
        candidates: List[Dict[str, float]],
        strategy: str = 'weighted_sum',
    ) -> Tuple[int, Dict]:
        """
        Select best candidate according to strategy.

        Args:
            candidates: List of score dictionaries for each candidate
            strategy: 'weighted_sum', 'pareto_knee', 'constraint_first'

        Returns:
            Tuple of (best_index, evaluation_result)
        """
        if strategy == 'weighted_sum':
            # Simple: use combined weighted score
            evaluations = [self.evaluate_model(c) for c in candidates]
            feasible = [(i, e) for i, e in enumerate(evaluations) if e['feasible']]

            if not feasible:
                raise ValueError("No feasible solutions found")

            best_idx, best_eval = max(feasible, key=lambda x: x[1]['combined_score'])
            return best_idx, best_eval

        elif strategy == 'pareto_knee':
            # Find Pareto frontier, then select "knee" point
            pareto_indices = self.pareto_frontier(candidates)

            if not pareto_indices:
                raise ValueError("No Pareto-optimal solutions found")

            # Knee = point with best average normalized objective value
            pareto_candidates = [candidates[i] for i in pareto_indices]

            # Normalize each objective to [0, 1], flipping minimized objectives
            # so that higher is always better
            normalized_scores = []
            for obj in self.optimization_objectives:
                values = [c[obj.name] for c in pareto_candidates]
                min_val, max_val = min(values), max(values)
                range_val = max_val - min_val if max_val != min_val else 1

                for i, c in enumerate(pareto_candidates):
                    if len(normalized_scores) <= i:
                        normalized_scores.append({})
                    norm = (c[obj.name] - min_val) / range_val
                    if obj.direction == 'minimize':
                        norm = 1 - norm
                    normalized_scores[i][obj.name] = norm

            # Knee = best average normalized score
            avg_scores = [np.mean(list(ns.values())) for ns in normalized_scores]
            local_best = np.argmax(avg_scores)
            best_idx = pareto_indices[local_best]

            return best_idx, self.evaluate_model(candidates[best_idx])

        else:
            raise ValueError(f"Unknown strategy: {strategy}")


# Example usage
objectives = [
    Objective('accuracy', 'maximize', weight=1.0),
    Objective('latency_ms', 'minimize', weight=0.3),
    Objective('disparity', 'minimize', threshold=0.1, is_constraint=True),
    Objective('memory_mb', 'minimize', threshold=500, is_constraint=True),
]

moo = MultiObjectiveAutoML(objectives)

# Sample candidates
candidates = [
    {'accuracy': 0.95, 'latency_ms': 50, 'disparity': 0.08, 'memory_mb': 300},
    {'accuracy': 0.92, 'latency_ms': 10, 'disparity': 0.05, 'memory_mb': 100},
    {'accuracy': 0.97, 'latency_ms': 200, 'disparity': 0.12, 'memory_mb': 800},  # Violates constraints
    {'accuracy': 0.90, 'latency_ms': 5, 'disparity': 0.03, 'memory_mb': 50},
]

best_idx, result = moo.select_best(candidates, strategy='weighted_sum')
print(f"Best model: {best_idx}")
print(f"Result: {result}")
```

When stakeholders must choose from multiple valid tradeoffs, present the Pareto frontier visually. A scatter plot of accuracy vs. latency (or any two key objectives) showing only non-dominated points helps stakeholders understand the tradeoff landscape and make informed choices.
Specifying constraints correctly requires understanding both the constraint semantics and how AutoML systems interpret them. This section covers practical constraint specification across popular AutoML frameworks.
"""Constraint Specification Examples Across AutoML Frameworks""" # ============================================# AutoGluon: Constraint-Aware Training# ============================================from autogluon.tabular import TabularPredictor # Time and resource constraintspredictor = TabularPredictor(label='target').fit( train_data, time_limit=3600, # Total time constraint presets='best_quality', # Exclude slow models (implicit latency constraint) excluded_model_types=['NN_TORCH', 'FASTAI'], # Memory constraint via hyperparameter limits hyperparameters={ 'GBM': { 'num_boost_round': 500, # Limit tree count → memory 'max_depth': 6, # Limit depth → size }, 'RF': { 'n_estimators': 100, # Limit forest size }, }, # Inference speed constraint ag_args_fit={ 'max_memory_usage_ratio': 0.8, # Memory ceiling },) # Inference optimization after trainingpredictor.persist_models() # Optimize for inference latency # ============================================# Auto-sklearn: Algorithm Constraints# ============================================from autosklearn.classification import AutoSklearnClassifier # Constrain search space to interpretable modelsclf = AutoSklearnClassifier( time_left_for_this_task=1800, per_run_time_limit=180, # Include only interpretable classifiers include={ 'classifier': [ 'decision_tree', 'extra_trees', # Still interpretable via feature importance 'gradient_boosting', # Reasonably interpretable ], }, # Exclude black-box models exclude={ 'classifier': [ 'mlp', # Neural network 'adaboost', # Less interpretable ensemble ], }, # Limit ensemble complexity ensemble_size=5, ensemble_nbest=5, # Memory limit memory_limit=4096, # 4GB) # ============================================# H2O AutoML: Constraints and Monotonicity# ============================================import h2ofrom h2o.automl import H2OAutoML h2o.init() # With constraints on model types and monotonicityaml = H2OAutoML( max_runtime_secs=3600, max_models=20, # Exclude complex models exclude_algos=['DeepLearning'], # Monotonicity constraints (domain knowledge) monotone_constraints={ 'credit_score': 1, # Higher score → lower risk (positive) 'debt_to_income': -1, # Higher DTI → higher risk (negative) }, # Stopping criteria stopping_metric='AUC', stopping_rounds=5, stopping_tolerance=0.001,) # ============================================# Custom Constraint Implementation# ============================================class ConstraintValidator: """ Validate models against production constraints before deployment. 
""" def __init__(self, constraints: dict): """ Args: constraints: Dict of constraint_name -> (check_fn, threshold, direction) """ self.constraints = constraints self.validation_history = [] def add_latency_constraint( self, max_p99_ms: float, X_sample, batch_size: int = 1, ): """Add inference latency constraint.""" def check_latency(model): import time latencies = [] for _ in range(100): start = time.perf_counter() model.predict(X_sample[:batch_size]) latencies.append((time.perf_counter() - start) * 1000) return np.percentile(latencies, 99) self.constraints['latency_p99_ms'] = { 'check_fn': check_latency, 'threshold': max_p99_ms, 'direction': 'max', # Value must be <= threshold } def add_memory_constraint(self, max_mb: float): """Add model memory constraint.""" def check_memory(model): import pickle import sys serialized = pickle.dumps(model) return len(serialized) / (1024 * 1024) self.constraints['memory_mb'] = { 'check_fn': check_memory, 'threshold': max_mb, 'direction': 'max', } def add_fairness_constraint( self, metric: str, max_disparity: float, X_val, y_val, protected_attr, ): """Add fairness constraint.""" def check_fairness(model): y_pred = model.predict(X_val) if metric == 'demographic_parity': g0 = y_pred[protected_attr == 0].mean() g1 = y_pred[protected_attr == 1].mean() return abs(g0 - g1) # Add other metrics as needed self.constraints[f'fairness_{metric}'] = { 'check_fn': check_fairness, 'threshold': max_disparity, 'direction': 'max', } def validate(self, model, model_name: str = None) -> dict: """ Validate model against all constraints. Returns: Dict with validation results and pass/fail status """ results = { 'model_name': model_name, 'passes_all': True, 'constraint_results': {}, } for name, constraint in self.constraints.items(): value = constraint['check_fn'](model) threshold = constraint['threshold'] if constraint['direction'] == 'max': passes = value <= threshold else: # min passes = value >= threshold results['constraint_results'][name] = { 'value': value, 'threshold': threshold, 'passes': passes, } if not passes: results['passes_all'] = False self.validation_history.append(results) return results def generate_report(self) -> str: """Generate validation report.""" lines = ["Constraint Validation Report", "=" * 40] for result in self.validation_history: lines.append(f"\nModel: {result['model_name']}") lines.append(f"Overall: {'PASS' if result['passes_all'] else 'FAIL'}") for name, cr in result['constraint_results'].items(): status = '✓' if cr['passes'] else '✗' lines.append(f" {status} {name}: {cr['value']:.3f} (limit: {cr['threshold']})") return "\n".join(lines)Always validate constraints on the final model using the exact deployment conditions: actual hardware, representative input samples, and full preprocessing pipeline. Development and production environments often differ in ways that affect constraint satisfaction.
We've covered the landscape of constraint handling in AutoML. The essential principles:

- Identify hard constraints (dealbreakers) and soft constraints (preferences) with stakeholders before the search begins; hard constraints filter the search space, while soft constraints become penalties in the objective.
- Measure latency and memory under deployment-like conditions, and prune algorithm families that cannot possibly meet the budget rather than training and rejecting them.
- Treat fairness as a first-class objective, choosing metrics appropriate to the legal and ethical context and communicating the accuracy tradeoff to stakeholders.
- Use multi-objective optimization, via weighted sums or Pareto analysis, to navigate competing accuracy and operational requirements.
- Validate the final model against every constraint on production hardware before deployment.
What's Next:
With constraint handling mastered, we turn to a critical concern for many AutoML deployments: Explainability. The next page examines how to interpret AutoML-produced models, satisfy regulatory requirements for model transparency, and communicate model behavior to stakeholders.
You now have a comprehensive framework for incorporating real-world constraints into AutoML pipelines. This knowledge ensures that AutoML produces not just accurate models but deployable ones—models that meet latency, memory, fairness, and custom business requirements.