The journey from a working Jupyter notebook to a production ML system is often described as crossing a chasm. Your pipeline works beautifully on your laptop with a sample dataset, but production demands something fundamentally different: it must tolerate malformed inputs, meet latency targets under real traffic, and stay observable when the data silently shifts.
This page covers the patterns, practices, and pitfalls of deploying feature transformation pipelines to production. We'll progress from simple deployments suitable for proofs-of-concept to robust architectures capable of serving millions of predictions daily.
By the end of this page, you will understand production deployment patterns for sklearn pipelines, including REST APIs, batch processing, real-time vs offline feature computation, monitoring strategies, and operational best practices. You'll learn to avoid common production pitfalls and build systems that operate reliably at scale.
Before diving into implementation, let's understand the landscape of deployment patterns. The right choice depends on your latency requirements, traffic patterns, and operational constraints:
| Pattern | Latency | Throughput | Best For |
|---|---|---|---|
| REST API | 10-100ms | 100-10K RPS | Online predictions, user-facing apps |
| Batch Processing | Minutes-hours | Millions/job | Offline scoring, analytics, reports |
| Streaming | Sub-second | 10K-100K RPS | Real-time features, fraud detection |
| Embedded | Microseconds | N/A | Mobile apps, edge devices |
| Serverless | 100ms-seconds | Auto-scaling | Variable traffic, cost optimization |
Real-Time vs Batch Feature Transformation:
A critical architectural decision is where feature transformation happens:
- Online (real-time) transformation: raw request fields are transformed inside the serving path, usually by the same fitted pipeline used in training. A single code path keeps training and serving consistent, but every transformation adds request latency.
- Offline (pre-computed) transformation: a batch job computes features ahead of time and writes them to a store; serving only looks them up and runs the model. Latency is low, but features can go stale and a separate materialization job must be maintained.
- Hybrid: expensive or slowly changing features (aggregates, embeddings) are precomputed, while cheap request-time fields are transformed online. A minimal sketch of the first two strategies follows below.
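To make the trade-off concrete, here is a minimal sketch, assuming a fitted sklearn `Pipeline` whose last step is the model and a dense feature matrix; `feature_store` is a stand-in for whatever lookup system (Redis, a database, a feature store) would hold precomputed rows:

```python
# Minimal sketch of online vs offline transformation (illustrative, not from this page).
import joblib
import pandas as pd

pipeline = joblib.load("model.pkl")      # preprocessing steps + final estimator
preprocessor = pipeline[:-1]             # every step except the model
model = pipeline[-1]                     # the fitted estimator itself

feature_store = {}                       # hypothetical precomputed-feature lookup

# --- Online: transform raw request fields inside the serving path ---
def predict_online(raw_request: dict) -> float:
    df = pd.DataFrame([raw_request])
    return float(pipeline.predict_proba(df)[0, 1])

# --- Offline: a batch job materializes transformed features ahead of time ---
def materialize_features(ids, raw_df: pd.DataFrame) -> None:
    X = preprocessor.transform(raw_df)
    for row_id, row in zip(ids, X):
        feature_store[row_id] = row      # in practice: write to Redis / a feature store

# Serving then does a lookup plus the model forward pass only
def predict_offline(row_id: str) -> float:
    return float(model.predict_proba([feature_store[row_id]])[0, 1])
```

In practice the offline path runs on a schedule (much like the batch job shown later on this page), while the online path lives inside the REST API.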
Begin with the simplest pattern that meets requirements. A Flask REST API handles thousands of requests per second and deploys in minutes. Kubernetes orchestration and streaming architectures can come later when you've validated the model works and traffic justifies complexity.
The most common deployment pattern is wrapping your pipeline in a REST API. This provides a language-agnostic interface that any client can call. Let's build a production-ready API:
```python
# app.py - FastAPI prediction service

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, validator
from typing import List, Optional
import joblib
import numpy as np
import pandas as pd
import logging
from datetime import datetime
import time
import os

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI
app = FastAPI(
    title="ML Prediction Service",
    description="Feature transformation and prediction API",
    version="1.0.0"
)

# ===== Load model at startup =====

MODEL_PATH = os.getenv("MODEL_PATH", "./model.pkl")
pipeline = None


@app.on_event("startup")
async def load_model():
    global pipeline
    logger.info(f"Loading model from {MODEL_PATH}")
    try:
        pipeline = joblib.load(MODEL_PATH)
        logger.info("Model loaded successfully")
    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise RuntimeError(f"Cannot start without model: {e}")


# ===== Request/Response schemas =====

class PredictionRequest(BaseModel):
    """Input data for prediction."""
    age: float = Field(..., ge=0, le=150, description="Customer age")
    income: float = Field(..., ge=0, description="Annual income")
    tenure_months: int = Field(..., ge=0, description="Months as customer")
    gender: str = Field(..., description="Gender (M/F)")
    region: str = Field(..., description="Geographic region")

    @validator('gender')
    def validate_gender(cls, v):
        if v not in ['M', 'F']:
            raise ValueError("gender must be 'M' or 'F'")
        return v


class PredictionResponse(BaseModel):
    """Prediction result."""
    probability: float = Field(..., description="Churn probability")
    prediction: str = Field(..., description="Predicted class")
    latency_ms: float = Field(..., description="Processing time")
    model_version: str = Field(..., description="Model version used")


class BatchRequest(BaseModel):
    """Batch of prediction requests."""
    instances: List[PredictionRequest]


class BatchResponse(BaseModel):
    """Batch prediction results."""
    predictions: List[PredictionResponse]
    total_latency_ms: float


# ===== Prediction endpoints =====

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    """Single prediction endpoint."""
    start_time = time.time()

    try:
        # Convert to DataFrame (matching training format)
        df = pd.DataFrame([request.dict()])

        # Get prediction
        probability = pipeline.predict_proba(df)[0, 1]
        prediction = "churn" if probability > 0.5 else "no_churn"

        latency_ms = (time.time() - start_time) * 1000

        return PredictionResponse(
            probability=float(probability),
            prediction=prediction,
            latency_ms=latency_ms,
            model_version=os.getenv("MODEL_VERSION", "1.0.0")
        )
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/predict/batch", response_model=BatchResponse)
async def predict_batch(request: BatchRequest):
    """Batch prediction endpoint for higher throughput."""
    start_time = time.time()

    try:
        # Convert all instances to DataFrame
        df = pd.DataFrame([inst.dict() for inst in request.instances])

        # Batch prediction (more efficient than individual)
        probabilities = pipeline.predict_proba(df)[:, 1]

        # Build responses
        predictions = []
        for prob in probabilities:
            predictions.append(PredictionResponse(
                probability=float(prob),
                prediction="churn" if prob > 0.5 else "no_churn",
                latency_ms=0,  # Individual latency not meaningful for batch
                model_version=os.getenv("MODEL_VERSION", "1.0.0")
            ))

        total_latency_ms = (time.time() - start_time) * 1000

        return BatchResponse(
            predictions=predictions,
            total_latency_ms=total_latency_ms
        )
    except Exception as e:
        logger.error(f"Batch prediction failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))


# ===== Health check endpoints =====

@app.get("/health")
async def health():
    """Kubernetes liveness probe."""
    return {"status": "healthy"}


@app.get("/ready")
async def ready():
    """Kubernetes readiness probe."""
    if pipeline is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    return {"status": "ready"}


# Run with: uvicorn app:app --host 0.0.0.0 --port 8080
```

Production Considerations:
- Input Validation: Pydantic validates inputs before they reach the model. Invalid requests return a 422 with details.
- Error Handling: Catch exceptions and return informative error messages. Never expose internal errors to clients.
- Health Checks: `/health` for liveness (is the process running?), `/ready` for readiness (can it serve traffic?).
- Batch Endpoint: Processing multiple instances in one call is more efficient than multiple single calls (see the client example after this list).
- Logging: Log predictions, latencies, and errors for debugging and monitoring.
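For example, a client could exercise the batch endpoint defined in `app.py` above with a short script; the payload values are illustrative:

```python
# Illustrative client call to the /predict/batch endpoint from app.py
import requests

instances = [
    {"age": 35, "income": 75000, "tenure_months": 24, "gender": "M", "region": "West"},
    {"age": 52, "income": 88000, "tenure_months": 6, "gender": "F", "region": "East"},
]
resp = requests.post(
    "http://localhost:8080/predict/batch",
    json={"instances": instances},
    timeout=5,
)
resp.raise_for_status()
for p in resp.json()["predictions"]:
    print(p["prediction"], round(p["probability"], 3))
```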
FastAPI offers automatic OpenAPI docs, async support, and Pydantic validation. Flask is simpler but requires manual validation. Django adds overhead but integrates with larger systems. For ML APIs, FastAPI is the modern default.
Containers provide a reproducible, portable environment for your model. Docker encapsulates the Python runtime, dependencies, and model artifact into a single deployable image:
```dockerfile
# Production Dockerfile for sklearn prediction service

# ===== Stage 1: Build environment =====
FROM python:3.10-slim as builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy and install requirements
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip \
    && pip wheel --no-cache-dir --wheel-dir=/app/wheels -r requirements.txt

# ===== Stage 2: Runtime environment =====
FROM python:3.10-slim

# curl is needed by the HEALTHCHECK below and is not included in slim images
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Security: Run as non-root user
RUN useradd --create-home --shell /bin/bash mluser
WORKDIR /home/mluser/app

# Install wheels from builder stage
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/* \
    && rm -rf /wheels

# Copy application code
COPY --chown=mluser:mluser app.py .
COPY --chown=mluser:mluser model.pkl .

# Environment configuration
ENV MODEL_PATH=/home/mluser/app/model.pkl
ENV MODEL_VERSION=1.0.0
ENV PYTHONUNBUFFERED=1
ENV PORT=8080

# Switch to non-root user
USER mluser

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:${PORT}/health || exit 1

# Expose port
EXPOSE ${PORT}

# Run with gunicorn for production
CMD ["gunicorn", "app:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8080", "--timeout", "30", "--graceful-timeout", "10"]
```
```text
# requirements.txt - Pin versions for reproducibility
scikit-learn==1.3.0
joblib==1.3.1
numpy==1.24.3
pandas==2.0.3
fastapi==0.100.0
uvicorn[standard]==0.23.1
gunicorn==21.2.0
pydantic==2.0.3
```
```bash
# Build image
docker build -t ml-prediction-service:1.0.0 .

# Run locally
docker run -p 8080:8080 \
  -e MODEL_VERSION=1.0.0 \
  ml-prediction-service:1.0.0

# Run with model mounted from host (for testing different models)
docker run -p 8080:8080 \
  -v $(pwd)/models/latest.pkl:/home/mluser/app/model.pkl:ro \
  ml-prediction-service:1.0.0

# Test the API
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"age": 35, "income": 75000, "tenure_months": 24, "gender": "M", "region": "West"}'
```

Smaller images deploy faster and reduce attack surface. Use python:3.10-slim over python:3.10 (roughly 500 MB smaller). Avoid installing unnecessary packages. Consider distroless images for even smaller, more secure deployments.
Not all predictions need to be real-time. Batch processing is often more efficient for offline analytics, periodic scoring, and large-scale transformations:
```python
# batch_predict.py - Scalable batch prediction pipeline

import joblib
import pandas as pd
import numpy as np
from pathlib import Path
import logging
from datetime import datetime
import argparse
from concurrent.futures import ProcessPoolExecutor
import pyarrow.parquet as pq

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def load_pipeline(model_path: str):
    """Load the trained pipeline."""
    logger.info(f"Loading model from {model_path}")
    return joblib.load(model_path)


def process_chunk(args):
    """Process a single chunk of data. Worker function for multiprocessing."""
    chunk_path, model_path, output_path = args

    # Each worker loads its own copy of the model (fork-safe)
    pipeline = joblib.load(model_path)

    # Load chunk
    df = pd.read_parquet(chunk_path)

    # Predict
    probabilities = pipeline.predict_proba(df)[:, 1]
    predictions = (probabilities > 0.5).astype(int)

    # Add predictions to dataframe
    df['probability'] = probabilities
    df['prediction'] = predictions
    df['scored_at'] = datetime.utcnow().isoformat()

    # Save results
    df.to_parquet(output_path)

    return len(df)


class BatchPredictor:
    """Scalable batch prediction with chunking and parallelization."""

    def __init__(self, model_path: str, chunk_size: int = 100000):
        self.model_path = model_path
        self.chunk_size = chunk_size
        self.pipeline = load_pipeline(model_path)

    def predict_file(
        self,
        input_path: str,
        output_path: str,
        n_workers: int = 4
    ):
        """Process a single file with optional parallelization."""
        logger.info(f"Processing {input_path}")
        start_time = datetime.now()

        # For smaller files, process in memory
        df = pd.read_parquet(input_path)

        if len(df) <= self.chunk_size:
            # Single-threaded for small files
            probabilities = self.pipeline.predict_proba(df)[:, 1]
            df['probability'] = probabilities
            df['prediction'] = (probabilities > 0.5).astype(int)
            df['scored_at'] = datetime.utcnow().isoformat()
            df.to_parquet(output_path)
        else:
            # Parallel processing for large files
            self._parallel_predict(df, output_path, n_workers)

        elapsed = (datetime.now() - start_time).total_seconds()
        throughput = len(df) / elapsed
        logger.info(
            f"Completed: {len(df):,} rows in {elapsed:.1f}s "
            f"({throughput:,.0f} rows/sec)"
        )

    def _parallel_predict(self, df: pd.DataFrame, output_path: str, n_workers: int):
        """Split dataframe and process in parallel."""
        # Create temp directory for chunks
        temp_dir = Path(output_path).parent / '.temp_chunks'
        temp_dir.mkdir(exist_ok=True)

        # Split into chunks
        chunks = np.array_split(df, n_workers * 4)  # Oversplit for better load balancing

        chunk_args = []
        for i, chunk in enumerate(chunks):
            chunk_path = temp_dir / f'chunk_{i}.parquet'
            result_path = temp_dir / f'result_{i}.parquet'
            chunk.to_parquet(chunk_path)
            chunk_args.append((str(chunk_path), self.model_path, str(result_path)))

        # Process in parallel
        with ProcessPoolExecutor(max_workers=n_workers) as executor:
            results = list(executor.map(process_chunk, chunk_args))

        # Merge results
        result_files = sorted(temp_dir.glob('result_*.parquet'))
        result_dfs = [pd.read_parquet(f) for f in result_files]
        combined = pd.concat(result_dfs, ignore_index=True)
        combined.to_parquet(output_path)

        # Cleanup
        import shutil
        shutil.rmtree(temp_dir)

        logger.info(f"Processed {sum(results):,} total rows")


def main():
    parser = argparse.ArgumentParser(description='Batch prediction')
    parser.add_argument('--model', required=True, help='Path to model file')
    parser.add_argument('--input', required=True, help='Input parquet file/directory')
    parser.add_argument('--output', required=True, help='Output path')
    parser.add_argument('--workers', type=int, default=4, help='Number of workers')
    args = parser.parse_args()

    predictor = BatchPredictor(args.model)
    predictor.predict_file(args.input, args.output, args.workers)


if __name__ == '__main__':
    main()

# Run with:
# python batch_predict.py --model model.pkl --input data.parquet --output scored.parquet
```

Batch Processing at Scale:
For truly large datasets, consider distributed frameworks:
| Framework | Best For | Notes |
|---|---|---|
| Spark + spark-sklearn | 100GB+ datasets | Distributed across cluster |
| Dask | Medium-large datasets | Python-native, easier than Spark |
| Ray | Parallel Python workloads | Good for ML, supports distributed sklearn |
| AWS Batch / Airflow | Orchestrated jobs | For scheduled batch pipelines |
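As an illustration, the chunked scorer above maps fairly directly onto Dask; this is a minimal sketch in which the parquet paths and the `score_partition` helper are placeholders, assuming the same fitted pipeline:

```python
# Illustrative Dask version of the chunked batch scorer (paths are placeholders).
import dask.dataframe as dd
import joblib

pipeline = joblib.load("model.pkl")

def score_partition(pdf):
    """Score one pandas partition with the fitted sklearn pipeline."""
    pdf = pdf.copy()
    pdf["probability"] = pipeline.predict_proba(pdf)[:, 1]
    pdf["prediction"] = (pdf["probability"] > 0.5).astype(int)
    return pdf

ddf = dd.read_parquet("features/*.parquet")            # lazy, reads many files
meta = score_partition(ddf.head(1)).iloc[:0]           # output schema for Dask
scored = ddf.map_partitions(score_partition, meta=meta)
scored.to_parquet("scored/")                           # triggers the computation
```

`map_partitions` applies the same pandas-level function to each partition, so the scoring logic stays identical whether Dask runs on a laptop or a cluster.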
For near-real-time use cases, consider micro-batch processing: aggregate requests over a short window (100ms-1s), process as a batch, return results. This achieves sub-second latency while maintaining batch efficiency. Kafka + Faust or Spark Structured Streaming can implement this pattern.
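A minimal sketch of that micro-batching idea for an asyncio service is shown below; `MicroBatcher`, `window_ms`, and `max_batch` are illustrative names rather than a library API, and in a real service the blocking `predict_proba` call would be pushed to a worker thread or process:

```python
# Illustrative micro-batching sketch for an asyncio service (not a library API).
import asyncio
import time
import pandas as pd

class MicroBatcher:
    def __init__(self, pipeline, window_ms: int = 100, max_batch: int = 256):
        self.pipeline = pipeline
        self.window_s = window_ms / 1000
        self.max_batch = max_batch
        self.queue: asyncio.Queue = asyncio.Queue()

    async def predict(self, features: dict) -> float:
        """Called by request handlers: enqueue one request and await its score."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((features, future))
        return await future

    async def run(self):
        """Background task: gather requests for one window, then score them together."""
        while True:
            batch = [await self.queue.get()]            # block until work arrives
            deadline = time.monotonic() + self.window_s
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout=remaining))
                except asyncio.TimeoutError:
                    break
            df = pd.DataFrame([features for features, _ in batch])
            probs = self.pipeline.predict_proba(df)[:, 1]   # one vectorized call
            for (_, future), prob in zip(batch, probs):
                future.set_result(float(prob))
```

Request handlers simply `await batcher.predict(features)`, while a single background task created with `asyncio.create_task(batcher.run())` drains the queue once per window.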
Production ML systems need comprehensive monitoring. Unlike traditional software, ML systems can fail silently, producing outputs that look valid but are subtly wrong. The standard three pillars of observability (metrics, logs, and traces) still apply, with ML-specific signals such as prediction and feature distributions layered on top:
```python
# monitoring.py - ML-specific observability

from prometheus_client import Counter, Histogram, Gauge, generate_latest
from fastapi import FastAPI, Response
import numpy as np
from datetime import datetime
import logging

# ===== Prometheus Metrics =====

# Request metrics
PREDICTION_COUNT = Counter(
    'predictions_total',
    'Total number of predictions',
    ['model_version', 'prediction_class']
)

PREDICTION_LATENCY = Histogram(
    'prediction_latency_seconds',
    'Prediction latency in seconds',
    ['model_version'],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0]
)

PREDICTION_ERRORS = Counter(
    'prediction_errors_total',
    'Total prediction errors',
    ['model_version', 'error_type']
)

# Model-specific metrics
PROBABILITY_DISTRIBUTION = Histogram(
    'prediction_probability',
    'Distribution of predicted probabilities',
    ['model_version'],
    buckets=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
)

FEATURE_VALUES = Histogram(
    'feature_value',
    'Distribution of input feature values',
    ['feature_name'],
    buckets=[-3, -2, -1, 0, 1, 2, 3]  # For normalized features
)

# Data quality metrics
MISSING_VALUES = Counter(
    'missing_values_total',
    'Count of missing values in inputs',
    ['feature_name']
)


class PredictionMonitor:
    """Monitor predictions for drift and anomalies."""

    def __init__(self, model_version: str):
        self.model_version = model_version
        self.logger = logging.getLogger(__name__)

        # Rolling statistics for drift detection
        self.recent_probabilities = []
        self.window_size = 1000

    def record_prediction(self, features: dict, probability: float, prediction: str):
        """Record metrics for a single prediction."""
        # Count predictions by class
        PREDICTION_COUNT.labels(
            model_version=self.model_version,
            prediction_class=prediction
        ).inc()

        # Record probability distribution
        PROBABILITY_DISTRIBUTION.labels(
            model_version=self.model_version
        ).observe(probability)

        # Track feature distributions for drift detection
        for feature_name, value in features.items():
            if value is None or (isinstance(value, float) and np.isnan(value)):
                MISSING_VALUES.labels(feature_name=feature_name).inc()
                continue
            try:
                # Only numeric features fit the histogram buckets; others are skipped
                FEATURE_VALUES.labels(feature_name=feature_name).observe(float(value))
            except (TypeError, ValueError):
                pass

        # Update rolling statistics
        self.recent_probabilities.append(probability)
        if len(self.recent_probabilities) > self.window_size:
            self.recent_probabilities.pop(0)

        # Check for drift
        self._check_prediction_drift()

    def record_latency(self, latency_seconds: float):
        """Record prediction latency."""
        PREDICTION_LATENCY.labels(model_version=self.model_version).observe(latency_seconds)

    def record_error(self, error_type: str):
        """Record prediction errors."""
        PREDICTION_ERRORS.labels(
            model_version=self.model_version,
            error_type=error_type
        ).inc()

    def _check_prediction_drift(self):
        """Simple drift detection based on prediction distribution shift."""
        if len(self.recent_probabilities) < self.window_size:
            return

        # Compare recent mean to expected baseline
        recent_mean = np.mean(self.recent_probabilities)
        expected_mean = 0.3  # Baseline from training

        # Alert if mean shifts significantly
        if abs(recent_mean - expected_mean) > 0.1:
            self.logger.warning(
                f"Prediction drift detected! "
                f"Recent mean: {recent_mean:.3f}, Expected: {expected_mean:.3f}"
            )


# Metrics endpoint
app = FastAPI()
monitor = PredictionMonitor(model_version="1.0.0")


@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint."""
    return Response(
        content=generate_latest(),
        media_type="text/plain"
    )
```

For many ML systems, true labels arrive hours or days later, or never. You can't compute accuracy in real time. Instead, monitor proxy metrics: prediction confidence, feature distributions, and prediction rates. When these shift unexpectedly, investigate even without ground truth (a simple drift check is sketched below).
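One common proxy check is the Population Stability Index (PSI) between a training-time snapshot of a feature and its recent production values. The sketch below uses conventional equal-width bins and the widely used 0.2 alert threshold, neither of which comes from this page:

```python
# Minimal PSI sketch for label-free drift monitoring (illustrative).
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI = sum over bins of (p_actual - p_expected) * ln(p_actual / p_expected)."""
    # Equal-width bins from the training distribution (assumes a non-constant feature)
    edges = np.linspace(expected.min(), expected.max(), bins + 1)
    edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range production values
    p_exp = np.histogram(expected, bins=edges)[0] / len(expected)
    p_act = np.histogram(actual, bins=edges)[0] / len(actual)
    p_exp = np.clip(p_exp, 1e-6, None)             # avoid division by / log of zero
    p_act = np.clip(p_act, 1e-6, None)
    return float(np.sum((p_act - p_exp) * np.log(p_act / p_exp)))

# Hypothetical usage: compare a stored training snapshot to a recent window
# train_income = np.load("train_income_sample.npy")
# if psi(train_income, np.array(recent_income_values)) > 0.2:
#     logger.warning("Drift detected in 'income'")
```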
Feature transformation in production introduces unique challenges. The Pipeline abstraction helps, but production environments stress the system in ways development never does:
```python
# production_transformers.py - Production-hardened transformations

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from typing import Dict, List, Optional
import logging

logger = logging.getLogger(__name__)


class RobustColumnValidator(BaseEstimator, TransformerMixin):
    """
    Validates input data matches expected schema before transformation.

    Production systems receive malformed data. This transformer
    catches issues early with informative errors.
    """

    def __init__(self, expected_columns: List[str],
                 expected_dtypes: Dict[str, str] = None):
        self.expected_columns = expected_columns
        self.expected_dtypes = expected_dtypes or {}

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            raise TypeError(f"Expected DataFrame, got {type(X).__name__}")

        # Check for missing columns
        missing = set(self.expected_columns) - set(X.columns)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")

        # Check for unexpected columns
        extra = set(X.columns) - set(self.expected_columns)
        if extra:
            logger.warning(f"Unexpected columns will be ignored: {extra}")
            X = X[self.expected_columns].copy()

        # Validate dtypes
        for col, expected_dtype in self.expected_dtypes.items():
            actual_dtype = str(X[col].dtype)
            if expected_dtype not in actual_dtype:
                logger.warning(
                    f"Column {col} has dtype {actual_dtype}, expected {expected_dtype}"
                )

        return X


class SafeImputer(BaseEstimator, TransformerMixin):
    """
    Imputer with production safety features:
    - Logs high missing rates
    - Handles unexpected nulls in columns that didn't have nulls during training
    - Tracks imputation statistics
    """

    def __init__(self, strategy: str = 'median', threshold_warning: float = 0.1):
        self.strategy = strategy
        self.threshold_warning = threshold_warning

    def fit(self, X, y=None):
        X = np.asarray(X)

        if self.strategy == 'median':
            self.fill_values_ = np.nanmedian(X, axis=0)
        elif self.strategy == 'mean':
            self.fill_values_ = np.nanmean(X, axis=0)

        # Track training missing rates
        self.training_missing_rate_ = np.mean(np.isnan(X), axis=0)

        return self

    def transform(self, X):
        X = np.asarray(X).copy()

        # Check missing rates
        missing_rates = np.mean(np.isnan(X), axis=0)
        for i, (train_rate, current_rate) in enumerate(
            zip(self.training_missing_rate_, missing_rates)
        ):
            if current_rate > train_rate + self.threshold_warning:
                logger.warning(
                    f"Column {i}: missing rate {current_rate:.1%} "
                    f"exceeds training rate {train_rate:.1%} by >{self.threshold_warning:.0%}"
                )

        # Track imputation count for monitoring
        self.n_imputed_ = int(np.sum(np.isnan(X)))

        # Perform imputation
        for i in range(X.shape[1]):
            mask = np.isnan(X[:, i])
            X[mask, i] = self.fill_values_[i]

        return X


class GracefulCategoryEncoder(BaseEstimator, TransformerMixin):
    """
    Category encoder for a single column that handles unknown categories
    gracefully with configurable fallback behavior.
    """

    def __init__(self, unknown_handling: str = 'zero'):
        """
        unknown_handling:
            'zero' (all zeros),
            'most_common' (map to most frequent),
            'error' (raise exception)
        """
        self.unknown_handling = unknown_handling

    def fit(self, X, y=None):
        X = np.asarray(X).ravel()
        self.categories_ = sorted(set(X))
        self.category_to_idx_ = {cat: i for i, cat in enumerate(self.categories_)}

        # Track most common for fallback
        unique, counts = np.unique(X, return_counts=True)
        self.most_common_ = unique[np.argmax(counts)]

        return self

    def transform(self, X):
        X = np.asarray(X).ravel()
        n_categories = len(self.categories_)
        result = np.zeros((len(X), n_categories))

        unknown_count = 0
        for i, val in enumerate(X):
            if val in self.category_to_idx_:
                result[i, self.category_to_idx_[val]] = 1.0
            else:
                unknown_count += 1
                if self.unknown_handling == 'zero':
                    pass  # Already zeros
                elif self.unknown_handling == 'most_common':
                    result[i, self.category_to_idx_[self.most_common_]] = 1.0
                elif self.unknown_handling == 'error':
                    raise ValueError(f"Unknown category: {val}")

        if unknown_count > 0:
            logger.info(f"Encoded {unknown_count} unknown categories as '{self.unknown_handling}'")

        self.n_unknown_ = unknown_count
        return result


# Full production pipeline with validation
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler  # used by the numeric branch below


def create_production_pipeline(model, feature_schema: Dict):
    """
    Creates a production-ready pipeline with validation and monitoring.
    """
    numeric_features = feature_schema['numeric']
    categorical_features = feature_schema['categorical']

    # Input validation step
    validator = RobustColumnValidator(
        expected_columns=numeric_features + categorical_features,
        expected_dtypes={col: 'float' for col in numeric_features}
    )

    # Safe transformers
    numeric_transformer = Pipeline([
        ('impute', SafeImputer(strategy='median')),
        ('scale', StandardScaler())
    ])

    # GracefulCategoryEncoder encodes one column at a time, so build
    # one encoder per categorical column rather than a single shared encoder
    transformers = [('num', numeric_transformer, numeric_features)]
    for col in categorical_features:
        transformers.append(
            (f'cat_{col}', GracefulCategoryEncoder(unknown_handling='zero'), [col])
        )

    preprocessor = ColumnTransformer(transformers)

    # Full pipeline
    return Pipeline([
        ('validate', validator),
        ('preprocess', preprocessor),
        ('model', model)
    ])
```

Production transformers should validate inputs early and fail with clear errors. A cryptic numpy error ten layers deep is nearly impossible to debug. Validation at the entry point with human-readable messages saves hours of debugging.
Understanding common failure modes helps you build more robust systems. These issues rarely appear in development but strike reliably in production:
```python
# Defensive prediction wrapper

import numpy as np
from typing import Dict, Any, Optional
import traceback
import logging

logger = logging.getLogger(__name__)


class SafePredictor:
    """
    Wrapper that handles errors gracefully and provides fallback behavior.
    """

    def __init__(
        self,
        pipeline,
        fallback_prediction: float = 0.5,
        timeout_seconds: float = 5.0
    ):
        self.pipeline = pipeline
        self.fallback_prediction = fallback_prediction
        self.timeout_seconds = timeout_seconds

    def predict(self, X) -> Dict[str, Any]:
        """
        Make prediction with error handling and fallback.
        """
        result = {
            'success': True,
            'prediction': None,
            'probability': None,
            'used_fallback': False,
            'error': None
        }

        try:
            # Validate input
            self._validate_input(X)

            # Make prediction
            probability = self.pipeline.predict_proba(X)[0, 1]

            # Validate output
            if not self._validate_output(probability):
                raise ValueError(f"Invalid probability: {probability}")

            result['probability'] = float(probability)
            result['prediction'] = 'positive' if probability > 0.5 else 'negative'

        except Exception as e:
            logger.error(f"Prediction failed: {e}\n{traceback.format_exc()}")
            result['success'] = False
            result['error'] = str(e)
            result['used_fallback'] = True
            result['probability'] = self.fallback_prediction
            result['prediction'] = 'unknown'

        return result

    def _validate_input(self, X):
        """Validate input data."""
        if X is None or len(X) == 0:
            raise ValueError("Empty input")

        # Check for excessive missing values
        if hasattr(X, 'isnull'):
            missing_rate = X.isnull().mean().mean()
            if missing_rate > 0.5:
                raise ValueError(f"Too many missing values: {missing_rate:.1%}")

    def _validate_output(self, probability):
        """Validate prediction is sensible."""
        if probability is None:
            return False
        if np.isnan(probability) or np.isinf(probability):
            return False
        if probability < 0 or probability > 1:
            return False
        return True


# Circuit breaker pattern for cascading failures
from datetime import datetime, timedelta
from threading import Lock


class CircuitBreaker:
    """
    Prevents cascading failures by short-circuiting
    when errors exceed threshold.
    """

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: timedelta = timedelta(seconds=60)
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.last_failure_time = None
        self.state = 'closed'  # closed, open, half-open
        self.lock = Lock()

    def can_execute(self) -> bool:
        with self.lock:
            if self.state == 'closed':
                return True

            if self.state == 'open':
                # Check if recovery timeout has passed
                if datetime.now() - self.last_failure_time > self.recovery_timeout:
                    self.state = 'half-open'
                    return True
                return False

            if self.state == 'half-open':
                return True  # Allow one request through

            return False

    def record_success(self):
        with self.lock:
            self.failures = 0
            self.state = 'closed'

    def record_failure(self):
        with self.lock:
            self.failures += 1
            self.last_failure_time = datetime.now()

            if self.failures >= self.failure_threshold:
                self.state = 'open'
                logger.warning("Circuit breaker opened!")
```

When prediction fails, returning a sensible fallback (like the population average) is often better than returning an error. The user experience continues, and you can investigate the failure asynchronously. Document fallback behavior clearly so downstream systems know when they're receiving fallback values.
Deploying feature transformation pipelines requires more than wrapping a model in an API: it means validating inputs at the boundary, packaging reproducible environments, choosing the right serving pattern, monitoring for silent failures and drift, and degrading gracefully when predictions fail. Those are the key insights of this page, and they build on everything else in this module.
Module Complete!
You now have comprehensive knowledge of building, serializing, and deploying feature transformation pipelines. From the fundamental Pipeline abstraction through ColumnTransformer composition, custom transformer implementation, serialization strategies, and production deployment patterns—you're equipped to build robust ML preprocessing workflows that operate reliably at scale.
The skills in this module are foundational for ML engineering. Every production ML system requires thoughtful preprocessing that's reproducible, maintainable, and operationally sound. What you've learned here applies whether you're building a simple Flask API or a distributed feature platform serving millions of predictions daily.