Feature Stores
A feature store is a central repository for ML features that eliminates training-serving skew by guaranteeing the same feature computation is used during both model training and online inference.
Key Facts
- Training-serving skew is the core problem: if training uses one SQL query and serving uses another, model performance silently degrades in production
- Two storage tiers: offline store (historical, batch, used for training dataset construction) and online store (current values only, low-latency, used at inference time)
- Point-in-time correctness prevents data leakage — when building a training dataset, features are looked up as of the label timestamp, not "now"
- Online stores target sub-10ms feature lookup; offline stores trade latency for completeness and historical depth
- Feast is the leading open-source option (Linux Foundation); Tecton is the leading managed enterprise option; Hopsworks leads on governance and regulated industries
- For LLM applications, pre-computed embeddings and user preference vectors stored in feature stores let recommendation and retrieval systems meet <200ms end-to-end latency budgets
- Feature stores are overkill for small teams or single-model systems; the complexity pays off when multiple models share the same features
Detail
The problem feature stores solve
Without a feature store, every team that needs "user's average purchase value over the last 30 days" writes its own version — one for training, one for the API, one for the dashboard. They diverge. The training pipeline uses a LEFT JOIN; the serving code uses an INNER JOIN. The model is trained on one distribution and evaluated on another. This is training-serving skew, and it causes model degradation that is extremely hard to debug because nothing crashes — predictions just get worse.
A feature store solves this by making feature logic the single source of truth. Features are defined once (as a transformation + storage spec), computed on a schedule, and read from the same store for both training data construction and online inference.
Point-in-time correctness
When constructing a training dataset from historical labels, you cannot use feature values that were computed after the label timestamp — that is data leakage. A feature store's offline retrieval performs a point-in-time join: for each (entity, timestamp) in your label set, it looks up the feature value that was current at that timestamp, not the latest value.
Example: a fraud label for a transaction at 14:03:22 should be joined with the user's "number of transactions in last hour" as of 14:03:22, not as of whenever you ran the training job.
This is the most operationally difficult part of building a feature store from scratch — it requires storing feature history with timestamps and doing an efficient as-of join across potentially billions of rows.
Online vs offline store
| Property | Offline store | Online store |
|---|---|---|
| Storage backend | S3, GCS, Delta Lake, BigQuery | Redis, DynamoDB, Cassandra, Bigtable |
| Read latency | Seconds to minutes (batch) | 1–10ms |
| Data volume | Full history | Latest value per entity only |
| Primary use | Training dataset construction | Real-time inference |
| Write pattern | Batch materialization jobs | Continuous or scheduled materialization from offline |
The materialization job moves data from offline to online on a schedule (hourly, daily). Freshness of online features is bounded by this schedule — a relevant operational tradeoff.
Tool comparison: Feast vs Tecton vs Hopsworks
Feast (open source, Linux Foundation)
- Most flexible: bring your own storage backends (S3 + Redis, BigQuery + Firestore, etc.)
- Feature definitions in Python, registered to a central registry
- Requires engineering capacity to operate and integrate
- Best for: teams that want no vendor lock-in and already have infrastructure opinions
Tecton (managed SaaS, originally from Uber Michelangelo team)
- Opinionated, end-to-end managed platform
- Strong real-time streaming support (features computed from Kafka, not just batch)
- Enterprise pricing; targets business-critical real-time ML
- Best for: enterprises that need production SLAs and can pay for a managed service
Hopsworks (open source + managed, on-prem or cloud)
- Tightest integration between feature store, model registry, and training pipelines
- Strong data lineage, governance, and metadata management
- Default choice for regulated industries (healthcare, finance, manufacturing)
- Best for: organizations requiring on-premises deployment or regulatory compliance
All three support point-in-time correct training dataset generation and an online/offline split. The differences are operational model, streaming support depth, and governance tooling.
LLM-specific use cases
Feature stores are seeing renewed relevance in LLM applications in two ways:
1. Pre-computed embeddings as features For retrieval-augmented systems or recommendation layers sitting in front of an LLM, embedding a document or user profile at query time is too slow. Pre-computing embeddings on a schedule and storing them in the online store (or a vector index backed by the feature store) brings retrieval latency into the <10ms range. The feature store manages freshness — when a document is updated, the embedding is recomputed and re-materialized.
2. User preference and behavior vectors User-LLM personalization patterns encode user history as a dense vector (a "user embedding") computed from behavioral signals. These vectors are expensive to compute at query time. Storing them as features — recomputed on a rolling schedule, served from the online store — lets LLM inference pipelines fetch rich user context in milliseconds via a single key lookup.
Note: for pure embedding retrieval (ANN search over a corpus), a vector store (Qdrant, pgvector, Pinecone) is more natural than a feature store. The feature store pattern applies when the embedding or vector is an input feature to a model, not the retrieval index itself.
When a feature store is worth adding
Add a feature store when:
- Multiple models share the same derived features (avoids redundant computation and inconsistency)
- Online inference has strict latency requirements (<50ms) and features require non-trivial computation
- You need auditable, reproducible training datasets from historical feature values
- The team is large enough that feature definitions will otherwise drift between data scientists and engineers
Skip it when:
- You have a single model and a small data team
- Features are simple column selects from a database — a view or dbt model is sufficient
- The overhead of operating a dual-store system exceeds the engineering time lost to the problem it solves
- A lightweight SQL implementation (see sql/sql-for-ai feature store pattern) handles the use case
Connections
- data/data-engineering-hub — first mention of feature stores in the vault; positioned in the AI data stack between storage and serving layers
- sql/sql-for-ai — the feature store pattern section shows a minimal PostgreSQL implementation using
feature_values+ TTL checks, useful when a full feature store is overkill - rag/embeddings — pre-computed embeddings stored in a feature store are the primary LLM-era use case; freshness and materialization schedules apply directly
- infra/vector-stores — alternative and complement: vector stores handle ANN search over embedding corpora; feature stores handle entity-keyed feature lookup; the two are often used together
Open Questions
- What is the practical freshness floor for online store materialization in real-time recommendation use cases — is hourly good enough, or do streaming feature stores (Tecton's approach) become necessary?
- How do feature stores interact with LLM prompt caching — is there a pattern where the feature store keys into a prompt cache by user/context ID?
- Does the Hopsworks vector store offering (added 2024) meaningfully compete with dedicated vector stores for RAG, or is it primarily for feature-adjacent embedding use cases?
- What is the migration path from a lightweight SQL feature store (as in sql-for-ai) to a full Feast deployment — is there a natural breaking point (data volume, team size)?
Sources
- Aerospike — Feature Store 101: Build, Serve, and Scale ML Features: https://aerospike.com/blog/feature-store/
- Databricks — What Is a Feature Store: https://www.databricks.com/blog/what-is-a-feature-store
- Kanerika — Feast vs Tecton vs Hopsworks: Which Feature Store Fits: https://kanerika.com/blogs/feast-vs-tecton-vs-hopsworks/
- GoCodeo — Top 5 Feature Stores in 2025: https://www.gocodeo.com/post/top-5-feature-stores-in-2025-tecton-feast-and-beyond
- Taylor Amarel — Comprehensive Comparison: Feast vs. Tecton vs. Hopsworks (2024/2025): https://taylor-amarel.com/2025/04/comprehensive-comparison-feast-vs-tecton-vs-hopsworks-for-cloud-based-feature-stores-2024/
- Uplatz — A Comparative Analysis of Modern Feature Stores: https://uplatz.com/blog/a-comparative-analysis-of-modern-feature-stores-feast-vs-tecton-vs-hopsworks/
- ACM Web Conference 2025 — User-LLM: Efficient LLM Contextualization with User Embeddings: https://dl.acm.org/doi/10.1145/3701716.3715463
- Embedding Stack — Embedding Stack for AI-Powered Applications: https://embeddingstack.com/embedding-stack-for-ai-applications
- Made With ML (Anyscale) — Feature Store (MLOps course): https://madewithml.com/courses/mlops/feature-store/
Related reading