When Gojek, Southeast Asia's leading super-app, faced the challenge of managing features across hundreds of ML models, they built a feature store. That feature store became Feast (Feature Store), and after being open-sourced in 2019, it has become the de facto standard for open-source feature management.
Feast's adoption by Google Cloud, its integration into major MLOps platforms, and its active community have established it as the reference implementation for feature store concepts. Understanding Feast's architecture provides both practical skills and a template for understanding other feature stores.
This page provides a comprehensive deep-dive into Feast's architecture. You'll understand its core abstractions (entities, features, feature views), its dual-store pattern, deployment options, and how to build production-ready feature pipelines. By the end, you'll be able to design and implement Feast-based feature infrastructure.
Feast is an open-source feature store that helps organizations manage and serve ML features to production models. Unlike monolithic platforms, Feast follows a minimalist, composable philosophy—it focuses on the core feature store responsibilities while integrating with existing infrastructure.
Feast has evolved significantly. Feast 0.10+ introduced a simpler, file-based architecture replacing the earlier Kubernetes-native approach. Modern Feast is lightweight enough to run in a Jupyter notebook while scaling to production workloads.
What Feast Is and Isn't:
| Feast IS | Feast IS NOT |
|---|---|
| A feature registry and serving layer | A complete ML platform |
| A bridge between offline and online stores | A data warehouse or database |
| A Python SDK for feature retrieval | A feature transformation engine (primarily) |
| Infrastructure for consistent serving | A model training framework |
| An integration layer with existing tools | A replacement for data engineering |
Feast organizes features using a clear hierarchy of abstractions. Understanding these abstractions is essential for effective feature store design.
An Entity represents the real-world object for which features are computed. It defines the join key that links features to the business domain.
Key Concepts:
- Typical join keys: `user_id`, `product_id`, `merchant_id`, `session_id`

```python
from feast import Entity

# Simple entity - single key
user = Entity(
    name="user",
    description="A registered user of the platform",
    join_keys=["user_id"],
    # Optional: specify value type for validation
    # value_type=ValueType.INT64
)

# Another common entity
product = Entity(
    name="product",
    description="A product in the catalog",
    join_keys=["product_id"],
)

# Composite entity - multiple keys
user_product = Entity(
    name="user_product",
    description="User-product interaction entity",
    join_keys=["user_id", "product_id"],
)

# Session-based entity for real-time features
session = Entity(
    name="session",
    description="A user session",
    join_keys=["session_id"],
)
```

Feast's architecture is designed for flexibility and simplicity. Understanding its components helps in optimizing deployments and troubleshooting issues.
The Registry is Feast's metadata backbone. It stores all information about features, making discovery, governance, and consistency possible. Understanding the registry is key to operating Feast effectively.
| Backend | Best For | Pros | Cons |
|---|---|---|---|
| Local File (SQLite) | Development, testing | Zero setup, portable | Single user, no sharing |
| S3/GCS File | Small teams, simple deployments | Easy sharing, versioned | No concurrent writes |
| SQL (PostgreSQL) | Production, multi-team | ACID, concurrent access | Requires database management |
| AWS Registry (DynamoDB) | AWS-native deployments | Serverless, scalable | AWS lock-in |
| Snowflake Registry | Snowflake-centric orgs | Unified with data warehouse | Snowflake lock-in |
```yaml
# Development configuration - local file registry
project: my_ml_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file

---
# Production configuration - distributed registries and stores
project: my_ml_project
registry:
  registry_type: sql
  path: postgresql://user:pass@host:5432/feast_registry
  cache_ttl_seconds: 60  # Cache registry for performance

provider: gcp  # or aws, azure

online_store:
  type: redis
  connection_string: redis://redis-cluster:6379
  # Alternative: DynamoDB
  # type: dynamodb
  # region: us-west-2

offline_store:
  type: bigquery
  project: my-gcp-project
  dataset: feast_features
  # Alternative: Snowflake
  # type: snowflake
  # account: myaccount
  # database: ANALYTICS

entity_key_serialization_version: 2
flags:
  alpha_features: true  # Enable experimental features
```

Registry Operations:
The registry supports several key operations:
- `feast apply` — Registers or updates feature definitions from your feature repository.
- `feast registry-dump` — Exports the current registry state for backup or analysis.
- `feast teardown` — Removes all registered objects (use with caution!).

The registry also caches locally for performance. In production, configure appropriate cache TTLs to balance freshness and performance.
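The same registry contents can also be inspected programmatically through the Python SDK, which is useful for discovery tooling or notebooks. A minimal sketch (the `user_statistics` feature view is the example used elsewhere on this page):

```python
from feast import FeatureStore

store = FeatureStore(repo_path="./feature_repo")

# List what is currently registered
print([entity.name for entity in store.list_entities()])
print([fv.name for fv in store.list_feature_views()])

# Inspect a single feature view's schema
fv = store.get_feature_view("user_statistics")
print([field.name for field in fv.features])
```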
Use SQL-based registries for production deployments with multiple users. Implement GitOps workflows where CI/CD pipelines run 'feast apply' on merge, ensuring the registry always reflects the repository state. Version your feature definitions alongside your model code.
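As a rough sketch of what that CI step might look like when driven from Python (the repository path and the use of `feast plan` as a preview step are assumptions to adapt to your pipeline):

```python
import subprocess

# Hypothetical CI step: keep the registry in sync with the feature repository
REPO_PATH = "feature_repo"  # directory containing feature_store.yaml

# Preview the registry and infrastructure changes the definitions would cause
subprocess.run(["feast", "--chdir", REPO_PATH, "plan"], check=True)

# Apply the definitions; typically run only on merge to the main branch
subprocess.run(["feast", "--chdir", REPO_PATH, "apply"], check=True)
```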
The Offline Store provides large-scale historical feature retrieval for model training. Feast supports multiple backends, each with distinct performance characteristics and cost profiles.
File-based offline stores use Parquet files on local filesystems or cloud storage. Ideal for development and small-scale production.
```python
# feature_store.yaml for a file-based offline store
"""
project: my_project
registry: s3://my-bucket/registry.db
provider: local
offline_store:
  type: file
"""

# Usage - feature retrieval from file sources
from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="./feature_repo")

# Define entities with timestamps for point-in-time joins
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03"
    ]),
})

# Retrieve historical features (triggers Parquet file reads)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_statistics:total_purchases_30d",
        "user_statistics:avg_purchase_amount_30d",
    ],
).to_df()

# For larger datasets, use to_arrow() for memory efficiency
training_arrow = store.get_historical_features(
    entity_df=entity_df,
    features=["user_statistics:total_purchases_30d"],
).to_arrow()
```

Choose your offline store based on where your data already lives. Data movement is expensive: if your feature source data is in BigQuery, use BigQuery as your offline store. The same applies to Snowflake, Redshift, and other warehouses.
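To make that concrete, here is a sketch of a feature view declared directly against a BigQuery table, so Feast reads the warehouse rather than a copy. The table name, columns, and TTL are illustrative; the `user_statistics` names mirror the examples used throughout this page.

```python
from datetime import timedelta

from feast import BigQuerySource, Entity, FeatureView, Field
from feast.types import Float32

# Entity as defined earlier on this page
user = Entity(name="user", join_keys=["user_id"])

# Point Feast at data that already lives in the warehouse - no copies needed
purchase_stats_source = BigQuerySource(
    table="my-gcp-project.feast_features.user_purchase_stats",  # illustrative table
    timestamp_field="event_timestamp",
)

user_statistics = FeatureView(
    name="user_statistics",
    entities=[user],
    ttl=timedelta(days=2),  # how far back a value is considered valid
    schema=[
        Field(name="total_purchases_30d", dtype=Float32),
        Field(name="avg_purchase_amount_30d", dtype=Float32),
    ],
    source=purchase_stats_source,
)
```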
The Online Store provides low-latency feature serving for real-time inference. The choice of online store backend significantly impacts serving performance. Understanding the tradeoffs is critical for production deployments.
| Backend | Latency (p99) | Scalability | Cost Model | Best For |
|---|---|---|---|---|
| SQLite | ~10-50ms | Single machine | Free | Development, testing |
| Redis | ~1-5ms | Cluster scaling | Memory-based | Low-latency production |
| DynamoDB | ~5-15ms | Auto-scaling | Request-based | AWS, serverless |
| Bigtable | ~5-10ms | Massive scale | Row/storage | GCP, very high throughput |
| PostgreSQL | ~10-30ms | Moderate | Compute-based | Simple production, SQL familiarity |
| Cassandra | ~5-15ms | Linear scaling | Node-based | Multi-region, high availability |
```yaml
# SQLite - Development/Testing
online_store:
  type: sqlite
  path: data/online_store.db

---
# Redis - Low-latency production
online_store:
  type: redis
  connection_string: redis://redis-cluster:6379
  # Redis cluster configuration
  # connection_string: redis://host1:6379,host2:6379,host3:6379
  # With authentication
  # connection_string: redis://:password@host:6379

---
# Redis with TLS and connection pooling
online_store:
  type: redis
  connection_string: rediss://redis-cluster:6379  # 'rediss' for TLS
  key_ttl_seconds: 86400  # Optional: Key expiration
  redis_type: redis_cluster  # or 'redis' for single node

---
# DynamoDB - AWS serverless
online_store:
  type: dynamodb
  region: us-west-2
  table_name_template: "{project}_{table_name}"
  # Optional: Use on-demand capacity for variable workloads
  # billing_mode: PAY_PER_REQUEST

---
# Bigtable - GCP high-scale
online_store:
  type: bigtable
  project: my-gcp-project
  instance: feast-instance
  # Table naming
  table_name_template: "{project}_{table_name}"
```

Online Store Performance Tuning:
Achieving single-digit-millisecond p99 latencies requires attention to factors beyond backend choice, such as network proximity between the serving layer and the online store, request batching, and connection pooling.
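One concrete lever is batching: retrieving features for many entities in a single call amortizes round trips to the online store. A minimal sketch using the Python SDK, reusing the feature names from earlier examples:

```python
from feast import FeatureStore

store = FeatureStore(repo_path="./feature_repo")

# One batched lookup instead of three separate calls
feature_vector = store.get_online_features(
    features=[
        "user_statistics:total_purchases_30d",
        "user_statistics:avg_purchase_amount_30d",
    ],
    entity_rows=[
        {"user_id": 1001},
        {"user_id": 1002},
        {"user_id": 1003},
    ],
).to_dict()
```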
For latency-critical applications (< 5ms p99), Redis remains the top choice. Redis Cluster provides horizontal scaling, and Redis Sentinel provides high availability. Consider AWS ElastiCache or GCP Memorystore for managed Redis deployments.
Materialization is the process of computing feature values from sources and populating the online store. Understanding materialization is essential for maintaining fresh features and optimizing costs.
```python
from feast import FeatureStore
from datetime import datetime, timedelta

store = FeatureStore(repo_path="./feature_repo")

# Basic materialization - specify time range
store.materialize(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 1, 31),
    feature_views=["user_statistics", "product_features"],
)

# Incremental materialization - from last materialized point to now
store.materialize_incremental(
    end_date=datetime.now(),
    feature_views=["user_statistics"],
)

# CLI-based materialization (often used in pipelines)
# feast materialize 2024-01-01T00:00:00 2024-01-31T00:00:00
# feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

# Production pattern: Scheduled incremental materialization
# using Airflow, Prefect, or similar orchestrators
from airflow.decorators import dag, task

@dag(schedule_interval="@hourly", start_date=datetime(2024, 1, 1))
def materialize_features():
    @task
    def run_materialization():
        from feast import FeatureStore
        store = FeatureStore(repo_path="./feature_repo")
        store.materialize_incremental(
            end_date=datetime.now(),
            feature_views=["user_statistics", "realtime_features"],
        )

    run_materialization()

# Instantiate the DAG so Airflow registers it
materialize_features()
```

Use `materialize_incremental()` for regular updates; full materialization is expensive and should be rare.

Materialization incurs compute and storage costs. Over-frequent materialization wastes resources; under-frequent materialization serves stale features. Analyze your use case to determine the right balance between freshness and cost.
The Feature Server is an optional component that provides HTTP/gRPC APIs for feature retrieval. It's essential for non-Python services and high-performance serving scenarios.
```bash
# Start feature server locally (development)
feast serve --port 6566 --host 0.0.0.0

# Start with specific feature repository
feast serve --repo-path /path/to/feature_repo

# Docker deployment
docker run -d \
  -p 6566:6566 \
  -v $(pwd)/feature_repo:/feature_repo \
  -e FEAST_REPO_PATH=/feature_repo \
  feastdev/feature-server:latest

# Kubernetes deployment (Helm)
helm repo add feast-helm-charts https://feast-helm-charts.storage.googleapis.com
helm install feast-server feast-helm-charts/feast-feature-server \
  --set feast_repo_path=/feature_repo \
  --set replicaCount=3 \
  --set resources.requests.memory=1Gi
```

```python
import requests

# HTTP REST API - Feature retrieval
response = requests.post(
    "http://feature-server:6566/get-online-features",
    json={
        "features": [
            "user_statistics:total_purchases_30d",
            "user_statistics:avg_purchase_amount_30d",
        ],
        "entities": {
            "user_id": [1001, 1002, 1003]
        },
    },
)
features = response.json()

# Feature Service retrieval
response = requests.post(
    "http://feature-server:6566/get-online-features",
    json={
        "feature_service": "fraud_detection_v2",
        "entities": {"user_id": [1001]},
    },
)

# gRPC client (higher performance)
import grpc
from feast.protos.feast.serving.ServingService_pb2 import GetOnlineFeaturesRequest
from feast.protos.feast.serving.ServingService_pb2_grpc import ServingServiceStub

channel = grpc.insecure_channel("feature-server:6567")
stub = ServingServiceStub(channel)

request = GetOnlineFeaturesRequest(
    feature_service="fraud_detection_v2",
    # ... entity configuration
)
response = stub.GetOnlineFeatures(request)
```

For production deployments, run multiple feature server replicas behind a load balancer. Use health checks and readiness probes. Consider using the Go-based feature server for lower latency and memory footprint.
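As one way to put that advice into practice, the sketch below hardens an HTTP client against a load-balanced feature server with short timeouts and bounded retries. The endpoint and payload mirror the REST example above; the specific timeout and retry values are assumptions to tune for your latency budget.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Bounded retries with a small backoff keep tail latency under control.
# Feature retrieval is idempotent, so retrying POST requests is safe here.
session = requests.Session()
session.mount(
    "http://",
    HTTPAdapter(max_retries=Retry(total=2, backoff_factor=0.05, allowed_methods=["POST"])),
)

response = session.post(
    "http://feature-server:6566/get-online-features",
    json={
        "features": ["user_statistics:total_purchases_30d"],
        "entities": {"user_id": [1001]},
    },
    timeout=0.2,  # fail fast so the caller can fall back to default feature values
)
response.raise_for_status()
print(response.json())
```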
We've explored Feast's architecture in depth: its core abstractions (entities, features, and feature views), the registry, the offline and online stores, materialization, and the feature server.
What's Next:
Now that we understand Feast's architecture, we'll explore the critical distinction between online and offline feature stores. You'll learn when to use each, how to optimize for their distinct requirements, and patterns for keeping them synchronized.
You now have a comprehensive understanding of Feast's architecture—from core abstractions through deployment patterns. This knowledge provides the foundation for building production-ready feature infrastructure with Feast.