In 2021, a major ride-sharing company experienced a perplexing incident: their pricing model suddenly started generating abnormally high surge prices in areas with low demand. The root cause? A timezone bug in an upstream data pipeline had shifted all location timestamps by 8 hours. Features based on 'current traffic conditions' were actually using data from hours in the past—or future.
The model itself was unchanged. The inference pipeline was working correctly. But inconsistent data flowing through the feature store silently corrupted every prediction. It took 3 days to identify the issue, costing millions in refunds and customer trust.
Data consistency isn't just a nice-to-have—it's the foundation upon which all ML predictions rest. Bad data is worse than no data, because it produces confident but wrong predictions.
This page provides a comprehensive exploration of data consistency in feature stores. You'll understand consistency models, learn to detect and prevent data quality issues, implement validation pipelines, and build monitoring systems that catch problems before they affect predictions.
In the context of feature stores, data consistency encompasses multiple dimensions of correctness and reliability. Understanding these dimensions is essential for designing robust feature systems.
| Consistency Type | Example Failure | Impact on Models |
|---|---|---|
| Temporal | Using future data in training features (data leakage) | Model appears accurate in training, fails in production |
| Cross-Store | Online store lags offline by days instead of hours | Model trained on fresh data serves stale features |
| Schema | Feature changed from float to int, truncating decimals | Loss of precision silently degrades predictions |
| Semantic | Feature calculation quietly changed without versioning | Model behavior changes without retraining |
| Referential | Entity key collision between old and new ID systems | Features assigned to wrong entities |
| Completeness | Batch job failed, leaving features missing | Models serve with nulls or stale fallback values |
Data consistency failures are particularly dangerous because they often don't cause errors—they cause wrong predictions. The model confidently returns an answer, but it's the wrong answer. Without explicit monitoring, these failures can persist for weeks or months.
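To make the temporal row in the table above concrete, here is a minimal sketch of a point-in-time-correct training query. It assumes a Feast repository with a `user_features` view containing `total_purchases_30d` (names borrowed from later examples on this page, otherwise hypothetical). Because each entity row carries the timestamp at which its label was observed, `get_historical_features` joins only feature values known at or before that moment, which is what prevents future data from leaking into training features.

```python
import pandas as pd
from datetime import datetime
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Training labels with the time each label was observed. The offline store
# performs a point-in-time join: for each row, only feature values with
# timestamps at or before event_timestamp are returned.
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003],  # hypothetical entity keys
    "event_timestamp": [
        datetime(2024, 1, 5), datetime(2024, 1, 7), datetime(2024, 1, 9)
    ],
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:total_purchases_30d"],  # hypothetical feature view
).to_df()
```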
The dual-store architecture (online/offline) introduces consistency challenges. Ensuring that both stores return equivalent values is critical for eliminating training-serving skew.
Cross-store validation compares samples from online and offline stores to detect drift. This catches cases where materialization or synchronization has failed.
```python
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from feast import FeatureStore

def validate_cross_store_consistency(store: FeatureStore,
                                     feature_view: str,
                                     sample_size: int = 1000) -> dict:
    """
    Validate that online and offline stores contain consistent values.
    Samples entities and compares feature values from both stores.
    """
    fv = store.get_feature_view(feature_view)
    features = [f"{feature_view}:{f.name}" for f in fv.schema]

    # Get sample of entity keys (would come from entity registry in production)
    sample_entities = get_sample_entity_keys(fv.entities[0].name, sample_size)

    # Query online store (current values)
    online_result = store.get_online_features(
        features=features,
        entity_rows=[{fv.entities[0].join_keys[0]: eid} for eid in sample_entities]
    ).to_df()

    # Query offline store for same entities at current time
    entity_df = pd.DataFrame({
        fv.entities[0].join_keys[0]: sample_entities,
        "event_timestamp": [datetime.now()] * len(sample_entities)
    })
    offline_result = store.get_historical_features(
        entity_df=entity_df,
        features=features
    ).to_df()

    # Compare values
    discrepancies = []
    for feature in features:
        feature_name = feature.split(":")[1]
        online_values = online_result[feature_name].values
        offline_values = offline_result[feature_name].values

        # For numeric features, check approximate equality
        if np.issubdtype(online_values.dtype, np.number):
            matches = np.allclose(
                online_values, offline_values, rtol=1e-5, equal_nan=True
            )
            mismatch_count = (~np.isclose(
                online_values, offline_values, rtol=1e-5, equal_nan=True
            )).sum()
        else:
            matches = all(online_values == offline_values)
            mismatch_count = (online_values != offline_values).sum()

        if not matches:
            discrepancies.append({
                'feature': feature,
                'mismatch_count': int(mismatch_count),
                'mismatch_rate': mismatch_count / len(sample_entities),
            })

    return {
        'feature_view': feature_view,
        'sample_size': sample_size,
        'validation_time': datetime.now().isoformat(),
        'consistent': len(discrepancies) == 0,
        'discrepancies': discrepancies,
    }
```

Beyond store consistency, feature data must meet quality expectations. Data quality validation catches issues in the data itself—nulls, outliers, incorrect values, and schema violations.
```python
from dataclasses import dataclass
from typing import List, Optional, Callable
import pandas as pd
import numpy as np

@dataclass
class QualityCheck:
    name: str
    check_fn: Callable[[pd.Series], bool]
    severity: str  # 'error', 'warning', 'info'
    description: str

class FeatureQualityValidator:
    def __init__(self):
        self.checks: dict[str, List[QualityCheck]] = {}

    def add_check(self, feature: str, check: QualityCheck):
        if feature not in self.checks:
            self.checks[feature] = []
        self.checks[feature].append(check)

    def validate(self, df: pd.DataFrame, feature: str) -> List[dict]:
        """Run all checks for a feature and return results"""
        if feature not in self.checks:
            return []

        results = []
        series = df[feature]

        for check in self.checks[feature]:
            try:
                passed = check.check_fn(series)
                results.append({
                    'check': check.name,
                    'passed': passed,
                    'severity': check.severity,
                    'description': check.description,
                })
            except Exception as e:
                results.append({
                    'check': check.name,
                    'passed': False,
                    'severity': 'error',
                    'description': f"Check failed with error: {str(e)}",
                })

        return results

# Define quality checks for user features
validator = FeatureQualityValidator()

# Completeness checks
validator.add_check("total_purchases_30d", QualityCheck(
    name="null_rate_below_threshold",
    check_fn=lambda s: s.isna().mean() < 0.05,  # <5% nulls
    severity="error",
    description="Null rate should be below 5%"
))

# Range checks
validator.add_check("total_purchases_30d", QualityCheck(
    name="non_negative",
    check_fn=lambda s: (s.dropna() >= 0).all(),
    severity="error",
    description="Purchase count should never be negative"
))

validator.add_check("total_purchases_30d", QualityCheck(
    name="reasonable_maximum",
    check_fn=lambda s: (s.dropna() <= 10000).all(),
    severity="warning",
    description="Purchase count above 10,000 is suspicious"
))

# Distribution checks
validator.add_check("avg_purchase_amount_30d", QualityCheck(
    name="mean_in_expected_range",
    check_fn=lambda s: 10 <= s.dropna().mean() <= 500,
    severity="warning",
    description="Average purchase amount should be $10-$500"
))

# Run validation
def run_feature_validation(store, feature_view: str) -> dict:
    """Run all quality checks for a feature view"""
    # Get recent feature data
    fv = store.get_feature_view(feature_view)
    df = get_recent_feature_data(feature_view)  # Implementation depends on store

    all_results = {}
    for feature in [f.name for f in fv.schema]:
        results = validator.validate(df, feature)
        all_results[feature] = results

        # Alert on failures
        errors = [r for r in results if not r['passed'] and r['severity'] == 'error']
        if errors:
            alert_data_quality_error(feature_view, feature, errors)

    return all_results
```

For comprehensive data quality validation, consider integrating tools like Great Expectations. It provides a rich library of validators, automated documentation, and integration with data pipelines. Define expectations once and run them at each pipeline stage.
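As a rough illustration of that integration, the hand-written checks above could be expressed declaratively. This sketch uses the classic pandas-dataset API; newer Great Expectations releases organize the same ideas around Validators and Expectation Suites, so treat it as the shape of the approach rather than a version-exact recipe. `features_df` is assumed to be the output of your feature pipeline.

```python
import great_expectations as ge

# Wrap the feature DataFrame in a Great Expectations dataset (classic API)
ge_df = ge.from_pandas(features_df)

# Mirror the hand-written checks as declarative expectations
ge_df.expect_column_values_to_not_be_null("total_purchases_30d", mostly=0.95)
ge_df.expect_column_values_to_be_between("total_purchases_30d", min_value=0, max_value=10000)
ge_df.expect_column_mean_to_be_between("avg_purchase_amount_30d", min_value=10, max_value=500)

# Run the whole suite and gate the pipeline on the result
results = ge_df.validate()
if not results["success"]:
    raise ValueError("Feature batch failed Great Expectations validation")
```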
Feature schemas evolve over time—new features are added, types are refined, deprecated features are removed. Managing schema evolution without breaking downstream consumers is critical for production stability.
| Change Type | Examples | Breaking? | Mitigation |
|---|---|---|---|
| Additive | Adding new feature columns | No | Safe, just add |
| Type Widening | int32 → int64, float → double | Usually No | Verify downstream compatibility |
| Type Narrowing | float → int, string → enum | Yes | Create new version |
| Rename | user_id → customer_id | Yes | Add alias, deprecate old |
| Remove | Dropping columns | Yes | Deprecation period required |
| Semantic | Changing calculation logic | Yes | New version required |
```python
from feast import FeatureView, Field
from feast.types import Float64, Int64, String
from datetime import timedelta

# Original schema (v1)
user_features_v1 = FeatureView(
    name="user_features_v1",
    entities=[user],
    schema=[
        Field(name="purchase_count", dtype=Int64),
        Field(name="total_amount", dtype=Float64),
    ],
    source=user_source,
    online=True,
)

# Safe evolution: Add new fields (non-breaking)
user_features_v1_extended = FeatureView(
    name="user_features_v1",  # Same name - this is additive
    entities=[user],
    schema=[
        Field(name="purchase_count", dtype=Int64),
        Field(name="total_amount", dtype=Float64),
        # NEW FIELDS - safe to add
        Field(name="avg_amount", dtype=Float64),
        Field(name="last_purchase_date", dtype=String),
    ],
    source=user_source,
    online=True,
)

# Breaking change: Create new version
user_features_v2 = FeatureView(
    name="user_features_v2",  # NEW NAME for breaking changes
    description="""
    BREAKING CHANGES FROM V1:
    - purchase_count renamed to order_count (semantic clarity)
    - total_amount now includes tax (was excluding)
    - added new fields from v1 extension

    MIGRATION: See migration/v1_to_v2.md
    DEPRECATION: v1 will be removed on 2024-06-01
    """,
    entities=[user],
    schema=[
        Field(name="order_count", dtype=Int64),  # Renamed
        Field(name="total_amount_with_tax", dtype=Float64),  # Semantic change
        Field(name="avg_amount", dtype=Float64),
        Field(name="last_purchase_date", dtype=String),
    ],
    source=user_source_v2,
    online=True,
    tags={"version": "2.0", "deprecates": "user_features_v1"},
)

# Schema validation in CI/CD
def validate_schema_compatibility(old_fv, new_fv) -> dict:
    """Check if schema changes are backward compatible"""
    old_fields = {f.name: f.dtype for f in old_fv.schema}
    new_fields = {f.name: f.dtype for f in new_fv.schema}

    issues = []

    # Check for removed fields (breaking)
    removed = set(old_fields.keys()) - set(new_fields.keys())
    if removed:
        issues.append(f"BREAKING: Removed fields: {removed}")

    # Check for type changes (potentially breaking)
    for name in old_fields:
        if name in new_fields and old_fields[name] != new_fields[name]:
            issues.append(f"BREAKING: Type changed for {name}: {old_fields[name]} → {new_fields[name]}")

    # Check for additions (safe)
    added = set(new_fields.keys()) - set(old_fields.keys())

    return {
        'compatible': len(issues) == 0,
        'issues': issues,
        'added_fields': list(added),
        'removed_fields': list(removed),
    }
```

Never remove features without a deprecation period. Announce deprecation, provide migration guidance, monitor for continued usage, and only remove after confirmed zero usage. A minimum 30-day deprecation window is recommended.
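One way to operationalize the "confirm zero usage before removal" rule is a small registry sweep. The sketch below assumes a `deprecated_on` tag convention on feature views and a `usage_counts` mapping sourced from your serving logs or metrics (both hypothetical); the only Feast calls it relies on are `list_feature_views` and the `tags` attribute.

```python
from datetime import datetime, timedelta

def features_safe_to_remove(store, usage_counts: dict, min_deprecation_days: int = 30) -> list:
    """Return deprecated feature views with zero recorded reads for the full
    deprecation window. usage_counts maps feature view name -> read count
    over the window (assumed to come from your own metrics system)."""
    removable = []
    for fv in store.list_feature_views():
        deprecated_on = fv.tags.get("deprecated_on")  # hypothetical tag convention
        if deprecated_on is None:
            continue
        age = datetime.now() - datetime.fromisoformat(deprecated_on)
        if age >= timedelta(days=min_deprecation_days) and usage_counts.get(fv.name, 0) == 0:
            removable.append(fv.name)
    return removable
```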
Feature drift occurs when the statistical properties of features change over time. This can degrade model performance even when the model itself is unchanged. Detecting drift early allows for proactive intervention.
```python
import numpy as np
from scipy import stats
from typing import Tuple
import pandas as pd

class FeatureDriftDetector:
    """
    Detect statistical drift in feature distributions.
    Compares current distribution to a reference (training) distribution.
    """

    def __init__(self, reference_data: pd.DataFrame):
        self.reference = reference_data
        self.reference_stats = self._compute_stats(reference_data)

    def _compute_stats(self, df: pd.DataFrame) -> dict:
        """Compute summary statistics for each column"""
        summary = {}
        for col in df.columns:
            if df[col].dtype in ['int64', 'float64']:
                summary[col] = {
                    'mean': df[col].mean(),
                    'std': df[col].std(),
                    'min': df[col].min(),
                    'max': df[col].max(),
                    'percentiles': df[col].quantile([0.25, 0.5, 0.75]).tolist(),
                    'histogram': np.histogram(df[col].dropna(), bins=20),
                }
        return summary

    def detect_drift_ks_test(self, current_data: pd.DataFrame,
                             feature: str,
                             threshold: float = 0.05) -> dict:
        """
        Use Kolmogorov-Smirnov test to detect distribution shift.
        Returns test statistic and whether drift is detected.
        """
        ref = self.reference[feature].dropna()
        curr = current_data[feature].dropna()

        statistic, p_value = stats.ks_2samp(ref, curr)

        return {
            'feature': feature,
            'test': 'kolmogorov_smirnov',
            'statistic': statistic,
            'p_value': p_value,
            'drift_detected': p_value < threshold,
            'threshold': threshold,
        }

    def detect_drift_psi(self, current_data: pd.DataFrame,
                         feature: str,
                         threshold: float = 0.2) -> dict:
        """
        Population Stability Index (PSI) for drift detection.
        PSI < 0.1: No significant drift
        PSI 0.1-0.2: Moderate drift
        PSI > 0.2: Significant drift
        """
        ref = self.reference[feature].dropna()
        curr = current_data[feature].dropna()

        # Create bins from reference distribution
        bins = np.histogram_bin_edges(ref, bins=10)
        ref_hist, _ = np.histogram(ref, bins=bins)
        curr_hist, _ = np.histogram(curr, bins=bins)

        # Normalize to proportions
        ref_pct = ref_hist / len(ref) + 0.0001  # Avoid division by zero
        curr_pct = curr_hist / len(curr) + 0.0001

        # Calculate PSI
        psi = np.sum((curr_pct - ref_pct) * np.log(curr_pct / ref_pct))

        return {
            'feature': feature,
            'test': 'psi',
            'psi_value': psi,
            'drift_detected': psi > threshold,
            'threshold': threshold,
            'severity': 'high' if psi > 0.2 else 'moderate' if psi > 0.1 else 'low',
        }

    def detect_drift_mean_std(self, current_data: pd.DataFrame,
                              feature: str,
                              std_threshold: float = 3.0) -> dict:
        """
        Simple drift detection based on mean shift in standard deviations.
        Alerts if current mean is more than N standard deviations from reference.
        """
        ref_mean = self.reference_stats[feature]['mean']
        ref_std = self.reference_stats[feature]['std']
        curr_mean = current_data[feature].mean()

        z_score = abs(curr_mean - ref_mean) / (ref_std + 0.0001)

        return {
            'feature': feature,
            'test': 'mean_shift',
            'reference_mean': ref_mean,
            'current_mean': curr_mean,
            'z_score': z_score,
            'drift_detected': z_score > std_threshold,
            'threshold': std_threshold,
        }

# Usage in production monitoring
def monitor_feature_drift(store, feature_view: str, reference_df: pd.DataFrame):
    """Run drift detection as part of regular monitoring"""
    detector = FeatureDriftDetector(reference_df)
    current_df = get_recent_feature_data(store, feature_view)

    drift_results = []
    for feature in current_df.columns:
        if feature in reference_df.columns:
            result = detector.detect_drift_psi(current_df, feature)
            drift_results.append(result)

            if result['drift_detected']:
                alert_feature_drift(feature_view, result)

    return drift_results
```

When drift is detected, the response depends on severity: (1) Monitor and log for low drift, (2) Alert and investigate for moderate drift, (3) Trigger model retraining for high drift, (4) Fall back to simpler models if drift is extreme.
Data quality problems often originate in feature computation pipelines. Pipeline validation catches issues before bad data reaches the feature store, preventing downstream contamination.
```python
from dataclasses import dataclass
from typing import List, Callable, Tuple
import pandas as pd
from enum import Enum

class ValidationResult(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"

@dataclass
class PipelineValidation:
    name: str
    validate_fn: Callable[[pd.DataFrame], Tuple[ValidationResult, str]]
    on_fail: str  # 'block', 'warn', 'quarantine'

class FeaturePipelineValidator:
    """
    Validate feature data before writing to feature store.
    Blocks bad data from contaminating the store.
    """

    def __init__(self):
        self.validations: List[PipelineValidation] = []

    def add_validation(self, validation: PipelineValidation):
        self.validations.append(validation)

    def validate(self, df: pd.DataFrame) -> Tuple[bool, List[dict]]:
        """
        Run all validations. Returns (should_proceed, results).
        """
        results = []
        should_proceed = True

        for validation in self.validations:
            try:
                result, message = validation.validate_fn(df)
                results.append({
                    'validation': validation.name,
                    'result': result.value,
                    'message': message,
                    'on_fail': validation.on_fail,
                })

                if result == ValidationResult.FAIL:
                    if validation.on_fail == 'block':
                        should_proceed = False
                    elif validation.on_fail == 'quarantine':
                        # Write to quarantine for investigation
                        quarantine_data(df, validation.name)
                        should_proceed = False
            except Exception as e:
                results.append({
                    'validation': validation.name,
                    'result': 'error',
                    'message': str(e),
                    'on_fail': validation.on_fail,
                })
                should_proceed = False

        return should_proceed, results

# Define pipeline validations
validator = FeaturePipelineValidator()

# Row count validation - detect empty or truncated batches
validator.add_validation(PipelineValidation(
    name="minimum_row_count",
    validate_fn=lambda df: (
        (ValidationResult.PASS, f"Row count: {len(df)}")
        if len(df) >= 1000
        else (ValidationResult.FAIL, f"Only {len(df)} rows - expected >= 1000")
    ),
    on_fail="block",
))

# Schema validation - ensure expected columns exist
expected_columns = ['user_id', 'event_timestamp', 'purchase_count', 'total_amount']
validator.add_validation(PipelineValidation(
    name="schema_check",
    validate_fn=lambda df: (
        (ValidationResult.PASS, "Schema valid")
        if all(col in df.columns for col in expected_columns)
        else (ValidationResult.FAIL, f"Missing columns: {set(expected_columns) - set(df.columns)}")
    ),
    on_fail="block",
))

# Null rate validation
validator.add_validation(PipelineValidation(
    name="null_rate_check",
    validate_fn=lambda df: (
        (ValidationResult.PASS, f"Max null rate: {df.isnull().mean().max():.2%}")
        if df.isnull().mean().max() < 0.10
        else (ValidationResult.WARN, f"High null rate: {df.isnull().mean().max():.2%}")
    ),
    on_fail="warn",
))

# Value range validation
validator.add_validation(PipelineValidation(
    name="value_sanity_check",
    validate_fn=lambda df: (
        (ValidationResult.PASS, "Values in expected range")
        if (df['total_amount'] >= 0).all() and (df['purchase_count'] >= 0).all()
        else (ValidationResult.FAIL, "Negative values detected")
    ),
    on_fail="quarantine",
))

# Integration into feature pipeline
def run_feature_pipeline(source_df: pd.DataFrame):
    """Run feature transformation with validation gate"""
    # Transform features
    features_df = transform_features(source_df)

    # Validate before writing
    should_proceed, validation_results = validator.validate(features_df)

    if not should_proceed:
        print("Pipeline blocked due to validation failures:")
        for r in validation_results:
            if r['result'] in ['fail', 'error']:
                print(f"  - {r['validation']}: {r['message']}")
        raise Exception("Pipeline validation failed")

    # Write to feature store
    store.write_to_online_store(feature_view_name, features_df)

    return validation_results
```

Comprehensive monitoring is the last line of defense against data consistency issues. Well-designed monitoring surfaces problems quickly while avoiding alert fatigue.
| Dimension | Metrics | Alert Threshold | Response |
|---|---|---|---|
| Freshness | Time since last update | > 2x expected update interval | Check pipeline, run reconciliation |
| Completeness | Null rate per feature | > 10% for non-nullable features | Investigate source data |
| Volume | Row count per materialization | < 50% of expected | Check source data, filters |
| Latency | Online query p99 | > 50 ms | Scale online store, optimize |
| Drift | PSI score | > 0.2 | Notify model owners, investigate |
| Consistency | Online-offline mismatch rate | > 1% | Run reconciliation |
| Errors | Pipeline failure rate | > 0 | Investigate and fix |
```python
from prometheus_client import Counter, Gauge, Histogram, Summary
from datetime import datetime
import logging
import pandas as pd

# Define comprehensive metrics
class FeatureStoreMetrics:
    def __init__(self):
        # Freshness metrics
        self.last_materialization = Gauge(
            'feature_last_materialization_timestamp',
            'Timestamp of last successful materialization',
            ['feature_view']
        )
        self.materialization_lag = Gauge(
            'feature_materialization_lag_seconds',
            'Seconds since last materialization',
            ['feature_view']
        )

        # Quality metrics
        self.null_rate = Gauge(
            'feature_null_rate',
            'Percentage of null values',
            ['feature_view', 'feature_name']
        )
        self.drift_score = Gauge(
            'feature_drift_psi',
            'Population Stability Index',
            ['feature_view', 'feature_name']
        )

        # Performance metrics
        self.query_latency = Histogram(
            'feature_online_query_latency_seconds',
            'Online store query latency',
            ['feature_view'],
            buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0]
        )

        # Error metrics
        self.pipeline_errors = Counter(
            'feature_pipeline_errors_total',
            'Total pipeline errors',
            ['feature_view', 'error_type']
        )
        self.validation_failures = Counter(
            'feature_validation_failures_total',
            'Total validation failures',
            ['feature_view', 'validation_name']
        )

metrics = FeatureStoreMetrics()

# Alert configuration
class AlertConfig:
    FRESHNESS_THRESHOLD_HOURS = 2
    NULL_RATE_THRESHOLD = 0.10
    DRIFT_PSI_THRESHOLD = 0.20
    LATENCY_P99_THRESHOLD_MS = 50
    MISMATCH_RATE_THRESHOLD = 0.01

def run_monitoring_job(store, feature_view: str, reference_df: pd.DataFrame):
    """Comprehensive monitoring job for a feature view"""
    fv = store.get_feature_view(feature_view)
    current_df = get_recent_feature_data(store, feature_view)

    alerts = []

    # Freshness check
    last_mat = get_last_materialization_time(store, feature_view)
    lag_hours = (datetime.now() - last_mat).total_seconds() / 3600
    metrics.materialization_lag.labels(feature_view=feature_view).set(lag_hours * 3600)

    if lag_hours > AlertConfig.FRESHNESS_THRESHOLD_HOURS:
        alerts.append({
            'type': 'freshness',
            'severity': 'high',
            'message': f'{feature_view} materialization lag: {lag_hours:.1f} hours',
        })

    # Quality checks per feature
    for feature in [f.name for f in fv.schema]:
        # Null rate
        null_rate = current_df[feature].isna().mean()
        metrics.null_rate.labels(feature_view=feature_view, feature_name=feature).set(null_rate)

        if null_rate > AlertConfig.NULL_RATE_THRESHOLD:
            alerts.append({
                'type': 'quality',
                'severity': 'medium',
                'message': f'{feature_view}.{feature} null rate: {null_rate:.1%}',
            })

        # Drift detection
        if feature in reference_df.columns:
            drift = calculate_psi(reference_df[feature], current_df[feature])
            metrics.drift_score.labels(feature_view=feature_view, feature_name=feature).set(drift)

            if drift > AlertConfig.DRIFT_PSI_THRESHOLD:
                alerts.append({
                    'type': 'drift',
                    'severity': 'high',
                    'message': f'{feature_view}.{feature} PSI: {drift:.3f}',
                })

    # Send alerts
    for alert in alerts:
        send_alert(alert)

    return alerts
```

Too many alerts lead to ignored alerts. Use tiered severity (info/warn/critical), alert only on actionable issues, implement alert grouping, and review alert frequency weekly. If an alert fires daily without action, remove or tune it.
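One way to act on the grouping and tiering advice is a thin routing layer in front of `send_alert`. The sketch below assumes the alert dictionaries produced by `run_monitoring_job` above, a hypothetical `send_grouped_alert` notification hook, and an arbitrary four-hour cooldown per alert type.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed severity ordering for escalating a group to its worst member
SEVERITY_ORDER = {"info": 0, "low": 0, "medium": 1, "warn": 1, "high": 2, "critical": 2}

class AlertRouter:
    """Group alerts by type and suppress repeats within a cooldown window (sketch)."""

    def __init__(self, cooldown: timedelta = timedelta(hours=4)):
        self.cooldown = cooldown
        self.last_sent: dict = {}  # alert type -> time of last notification

    def route(self, alerts: list) -> list:
        grouped = defaultdict(list)
        for alert in alerts:
            grouped[alert["type"]].append(alert)

        sent = []
        now = datetime.now()
        for alert_type, group in grouped.items():
            # Skip if we already notified on this alert type within the cooldown
            if now - self.last_sent.get(alert_type, datetime.min) < self.cooldown:
                continue
            worst = max(group, key=lambda a: SEVERITY_ORDER.get(a["severity"], 0))
            summary = {
                "type": alert_type,
                "severity": worst["severity"],   # escalate to worst severity in the group
                "count": len(group),
                "messages": [a["message"] for a in group],
            }
            send_grouped_alert(summary)          # placeholder notification hook
            self.last_sent[alert_type] = now
            sent.append(summary)
        return sent
```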
When consistency issues occur—and they will—having clear recovery procedures minimizes impact. Recovery and remediation strategies should be planned and practiced before incidents occur.
```python
from datetime import datetime, timedelta
import logging

logger = logging.getLogger(__name__)

class FeatureStoreRecovery:
    def __init__(self, store):
        self.store = store

    def rollback_materialization(self, feature_view: str, to_timestamp: datetime):
        """
        Roll back online store to a previous point in time.
        Useful when bad data was ingested.
        """
        logger.warning(f"Rolling back {feature_view} to {to_timestamp}")

        # Clear current online store data
        self._clear_online_store(feature_view)

        # Re-materialize from offline store up to the safe timestamp
        self.store.materialize(
            start_date=datetime(2020, 1, 1),  # Or earliest data date
            end_date=to_timestamp,
            feature_views=[feature_view],
        )

        logger.info(f"Rollback complete for {feature_view}")

    def backfill_time_range(self, feature_view: str, start: datetime, end: datetime):
        """
        Re-compute features for a specific time range.
        Used after fixing pipeline bugs or source data issues.
        """
        logger.info(f"Backfilling {feature_view} from {start} to {end}")

        # First, remove any existing data in this range
        self._delete_range(feature_view, start, end)

        # Re-materialize
        self.store.materialize(
            start_date=start,
            end_date=end,
            feature_views=[feature_view],
        )

        # Validate the backfill
        validation = validate_cross_store_consistency(self.store, feature_view)
        if not validation['consistent']:
            logger.error(f"Backfill validation failed: {validation}")
            raise Exception("Backfill did not resolve consistency issues")

        logger.info(f"Backfill complete and validated for {feature_view}")

    def emergency_fallback(self, feature_view: str):
        """
        Enable emergency fallback mode when feature data is unreliable.
        Serves default values or stale data with warnings.
        """
        logger.critical(f"Enabling emergency fallback for {feature_view}")

        # Mark feature view as degraded in registry
        self._set_degraded_mode(feature_view, True)

        # Notify consumers
        notify_consumers(
            feature_view,
            message=f"DEGRADED: {feature_view} serving fallback values. "
                    f"Do not rely on these features for critical decisions."
        )

        # Return fallback configuration
        return {
            'mode': 'fallback',
            'feature_view': feature_view,
            'behavior': 'serve_stale_with_warning',
            'timestamp': datetime.now().isoformat(),
        }

    def restore_from_backup(self, feature_view: str, backup_id: str):
        """
        Restore feature data from a point-in-time backup.
        Last resort when re-computation isn't possible.
        """
        logger.warning(f"Restoring {feature_view} from backup {backup_id}")

        # Load backup
        backup_data = load_backup(backup_id)

        # Validate backup integrity
        if not validate_backup_integrity(backup_data):
            raise Exception(f"Backup {backup_id} failed integrity check")

        # Restore to online store
        self._restore_online_store(feature_view, backup_data)

        logger.info(f"Restore complete for {feature_view} from {backup_id}")
```

Recovery procedures that haven't been tested will fail in production. Run quarterly 'game day' exercises where you simulate failures and practice recovery. Time your procedures and identify gaps before real incidents occur.
We've explored data consistency in feature stores in depth: the dimensions of consistency, cross-store validation, data quality checks, schema evolution, drift detection, pipeline validation, monitoring, and recovery.
Module Complete:
Congratulations! You've completed the Feature Stores module, covering feature stores from core concepts through production operations.
This knowledge equips you to design, build, and operate production-grade feature infrastructure that serves ML models reliably at scale. Feature stores are the bridge between data engineering and ML engineering, and you now understand how to build that bridge effectively.