ML feature store and low-latency inference serving
A point-in-time-correct feature store with an online/offline split feeding sub-40 ms inference, built end to end on AWS managed services.
01
Raw event sources
flowchart LR
Apps([App and service events])
Txns([Transactions])
Logs([Behaviour logs])
Apps --> Raw[(Raw event lake S3)]
Txns --> Raw
Logs --> Raw
Application events, transactions and behavioural logs land in the S3 raw event lake as the raw material from which every feature is computed. Nothing is a feature yet; these events are the immutable source of truth.
02
Batch feature pipeline
flowchart LR
Apps([App and service events])
Txns([Transactions])
Apps --> Raw[(Raw event lake S3)]
Txns --> Raw
Raw --> Glue[AWS Glue Spark]
Glue --> Offline[(S3 offline store\nParquet)]
AWS Glue Spark jobs read raw events, compute heavy aggregates and embeddings, and write them as partitioned Parquet to the S3 offline store, the columnar layer against which point-in-time as-of joins are run.
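To make the as-of join concrete, here is a minimal sketch in plain Python rather than Spark. It assumes each offline feature row is an `(entity_id, event_ts, value)` tuple and each training label is an `(entity_id, label_ts)` pair; those shapes, the function name `as_of_join` and the sample data are illustrative assumptions, not part of the platform. For every label it picks the latest feature value observed at or before the label time, so no future information leaks into training.

```python
from bisect import bisect_right


def as_of_join(labels, feature_rows):
    """For each (entity_id, label_ts), return the latest feature value whose
    event time is <= label_ts, or None if the entity has no such row."""
    # Group feature rows per entity, ordered by event time.
    by_entity = {}
    for entity_id, event_ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = by_entity.setdefault(entity_id, ([], []))
        times.append(event_ts)
        values.append(value)

    joined = []
    for entity_id, label_ts in labels:
        times, values = by_entity.get(entity_id, ([], []))
        # bisect_right returns how many feature rows have event_ts <= label_ts.
        idx = bisect_right(times, label_ts)
        joined.append((entity_id, label_ts, values[idx - 1] if idx else None))
    return joined


# Hypothetical sample data: (entity_id, event_ts, feature_value).
feature_rows = [
    ("u1", 100, 3),
    ("u1", 200, 5),
    ("u2", 150, 7),
]
labels = [("u1", 150), ("u1", 200), ("u2", 100)]
result = as_of_join(labels, feature_rows)
```

In this sketch the label for `u2` at time 100 gets `None`, because the only feature row for `u2` was written later, at time 150.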
03
Streaming feature pipeline
flowchart LR
Apps([App and service events])
Apps --> Raw[(Raw event lake S3)]
Raw --> Glue[AWS Glue Spark]
Glue --> Offline[(S3 offline store)]
Apps --> Kinesis[Kinesis Data Streams]
Kinesis --> Flink[Managed Flink]
Flink --> FS[(SageMaker Feature Store online)]
Fresh features, such as counts over the last few minutes, flow through Kinesis Data Streams (on-demand) into Managed Service for Apache Flink for stateful windowed aggregation, then reach the online store within seconds. Same feature definition, faster path.
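The idea behind a stateful windowed count can be sketched without Flink. The class below is a plain-Python illustration, not Flink code: it keeps per-key state (a queue of event timestamps) and reports how many events fall inside the trailing window. The class name, the 300-second window and the assumption that events arrive in timestamp order are all hypothetical simplifications.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Per-key event count over the trailing `window_seconds`.

    Assumes events for a key arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps still in the window

    def add(self, key, event_ts):
        """Record one event and return the updated count for `key`."""
        window = self._events[key]
        window.append(event_ts)
        return self._evict_and_count(window, event_ts)

    def count(self, key, now_ts):
        """Return the count for `key` as of `now_ts` without adding an event."""
        window = self._events.get(key)
        if window is None:
            return 0
        return self._evict_and_count(window, now_ts)

    def _evict_and_count(self, window, now_ts):
        # Drop timestamps that have aged out of the trailing window.
        cutoff = now_ts - self.window_seconds
        while window and window[0] <= cutoff:
            window.popleft()
        return len(window)


counter = SlidingWindowCounter(window_seconds=300)
counts = [counter.add("u1", ts) for ts in (0, 120, 290, 310)]
```

Each returned count is the value that would be written to the online store for that key; a real streaming job would also have to handle late and out-of-order events, which this sketch ignores.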
SageMaker Feature Store is the durable online system of record and is written first; ElastiCache for Redis is populated asynchronously as a read cache, so a ranking request can MGET hundreds of features in one sub-millisecond round trip and fall back to Feature Store on a miss.
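The write order and the read fallback can be sketched with in-memory stand-ins. Nothing below calls SageMaker or Redis: the two dictionaries, the class name `OnlineFeatures`, the `flush_cache` step that simulates asynchronous population, and the backfill of the cache on a miss are all assumptions made for illustration.

```python
class OnlineFeatures:
    """In-memory stand-in for a durable store fronted by a read cache."""

    def __init__(self):
        self.store = {}     # stand-in for the durable online system of record
        self.cache = {}     # stand-in for the Redis read cache
        self._pending = []  # cache writes that have not been applied yet

    def write(self, key, value):
        # The system of record is written first; the cache update is deferred.
        self.store[key] = value
        self._pending.append((key, value))

    def flush_cache(self):
        # Simulates the asynchronous job that populates the cache.
        for key, value in self._pending:
            self.cache[key] = value
        self._pending.clear()

    def read_many(self, keys):
        """Batch read from the cache, falling back to the store on a miss.

        Returns (values found, keys that missed the cache).
        """
        found = {k: self.cache[k] for k in keys if k in self.cache}
        misses = [k for k in keys if k not in found]
        for k in misses:
            if k in self.store:
                found[k] = self.store[k]
                self.cache[k] = found[k]  # assumption: backfill the cache on a miss
        return found, misses


features = OnlineFeatures()
features.write("u1:clicks_5m", 4)
features.flush_cache()               # this key is now cached
features.write("u1:spend_1h", 20.0)  # written, but not yet in the cache
values, misses = features.read_many(["u1:clicks_5m", "u1:spend_1h", "u1:unknown"])
```

Here `u1:spend_1h` misses the cache but is still served from the store, while `u1:unknown` is absent from both and is left out of `values`.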
A SageMaker real-time endpoint hosts the model. For each request it batches feature reads from Redis, falls back to Feature Store on a miss, and then runs the forward pass, all inside a 40 ms p99 budget.
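One way to check a p99 budget is to take per-request latency samples and compute the 99th percentile by the nearest-rank method. The sketch below assumes each sample is a `(feature_read_ms, forward_pass_ms)` pair; the sample values are made up for illustration and are not measurements from this platform.

```python
import math

BUDGET_MS = 40.0  # the p99 budget stated above


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical per-request timings: (feature_read_ms, forward_pass_ms).
timings = [(2.0, 18.0)] * 98 + [(8.0, 30.0), (15.0, 40.0)]

# End-to-end latency per request is the feature read plus the forward pass.
totals = [read_ms + forward_ms for read_ms, forward_ms in timings]

p99 = percentile_nearest_rank(totals, 99)
within_budget = p99 <= BUDGET_MS
```

With these made-up samples the slowest request (55 ms) exceeds 40 ms, yet the p99 stays inside the budget, which is the difference between a p99 target and a hard per-request deadline.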
The complete platform: batch and streaming feature paths, dual online and offline stores, the inference fleet, and the control plane. The control plane combines Feature Store metadata and lineage with the Glue Data Catalog, so that each feature has a single definition, and adds Model Monitor for drift and training-serving skew alarms.