Architecture

ML feature store and low-latency inference serving

A point-in-time-correct feature store with an online/offline split feeding sub-40 ms inference, built end to end on AWS managed services.

Raw event sources

flowchart LR
    Apps([App and service events])
    Txns([Transactions])
    Logs([Behaviour logs])
    Apps --> Raw[(Raw event lake S3)]
    Txns --> Raw
    Logs --> Raw

Application events, transactions and behavioural logs land in the S3 raw event lake as the raw material from which every feature is computed. Nothing is a feature yet - these events are the immutable source of truth.

Batch feature pipeline

flowchart LR
    Apps([App and service events])
    Txns([Transactions])
    Apps --> Raw[(Raw event lake S3)]
    Txns --> Raw
    Raw --> Glue[AWS Glue Spark]
    Glue --> Offline[(S3 offline store\nParquet)]

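AWS Glue Spark jobs read raw events and compute heavy aggregates and embeddings, writing them as partitioned Parquet to the S3 offline store. This columnar history is what makes point-in-time as-of joins possible: provided each row carries its event timestamp, a training example sees only the feature values computed at or before its label time.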
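To make the as-of join concrete, here is a minimal sketch in plain Python. It is not the Glue job itself: the tuple layout, the names `build_index` and `as_of_join`, and the inclusive "at or before" cutoff are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(feature_rows):
    """Group (entity_id, event_time, value) rows by entity, sorted by event time."""
    by_entity = defaultdict(list)
    for entity_id, event_time, value in feature_rows:
        by_entity[entity_id].append((event_time, value))
    index = {}
    for entity_id, rows in by_entity.items():
        rows.sort(key=lambda row: row[0])
        index[entity_id] = ([t for t, _ in rows], [v for _, v in rows])
    return index


def as_of_join(labels, index):
    """For each (entity_id, label_time), take the latest value with event_time <= label_time."""
    joined = []
    for entity_id, label_time in labels:
        times, values = index.get(entity_id, ([], []))
        pos = bisect_right(times, label_time)  # number of rows at or before label_time
        joined.append((entity_id, label_time, values[pos - 1] if pos else None))
    return joined


# Hypothetical feature history and training labels (times are plain integers here).
feature_rows = [("u1", 100, 3), ("u1", 200, 5), ("u2", 150, 1)]
labels = [("u1", 150), ("u1", 200), ("u2", 100), ("u3", 300)]
joined = as_of_join(labels, build_index(feature_rows))
```

A label at time 150 for `u1` receives the value written at 100, not the later one written at 200, which is the leakage guard the offline store exists to provide. Entities with no history at or before the label time get `None`.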

Streaming feature pipeline

flowchart LR
    Apps([App and service events])
    Apps --> Raw[(Raw event lake S3)]
    Raw --> Glue[AWS Glue Spark]
    Glue --> Offline[(S3 offline store)]
    Apps --> Kinesis[Kinesis Data Streams]
    Kinesis --> Flink[Managed Flink]
    Flink --> FS[(SageMaker Feature Store online)]

Fresh features, such as counts over the last few minutes, flow through Kinesis Data Streams (on-demand) into Managed Service for Apache Flink for stateful windowed aggregation, and reach the online store within seconds. Same feature definition, faster path.
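The kind of state a windowed count needs can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Flink code: the class name `SlidingCount`, the 300-second window, and the assumption that events arrive in timestamp order per entity are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingCount:
    """Per-entity count of events in the window (now - window_seconds, now]."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # entity_id -> event times inside the window

    def add(self, entity_id, event_time):
        """Record one event and return the entity's current windowed count."""
        times = self._events[entity_id]
        times.append(event_time)
        cutoff = event_time - self.window_seconds
        while times and times[0] <= cutoff:  # drop events that fell out of the window
            times.popleft()
        return len(times)


counter = SlidingCount(window_seconds=300)
counts = [counter.add("u1", t) for t in (0, 100, 299, 300, 700)]
```

Each returned count is the value that would be written to the online store for that entity. A real streaming job also has to handle late and out-of-order events, which this sketch ignores.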

Online store and read cache

flowchart TD
    subgraph ingest[Ingestion]
        Glue[AWS Glue Spark]
        Flink[Managed Flink]
    end
    subgraph online[Online serving tier]
        Redis[(ElastiCache Redis\nread cache)]
        FS[(SageMaker Feature Store)]
    end
    Offline[(S3 offline store)]
    Glue --> Offline
    Glue --> FS
    Flink --> FS
    FS -.async populate.-> Redis
    Redis -->|miss fallback| FS

SageMaker Feature Store is the durable online system of record and is written first. ElastiCache for Redis is populated asynchronously as a read cache, so a ranking request can MGET hundreds of features in a single round trip, typically sub-millisecond, and fall back to Feature Store on a miss.
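The write and read paths can be sketched with in-memory stand-ins. Everything here is hypothetical: `OnlineTier`, its dictionaries and the `pending` queue stand in for Feature Store, Redis and whatever mechanism copies records across; no AWS or Redis client calls are shown.

```python
from collections import deque


class OnlineTier:
    """Store-first writes, deferred cache population, cache-first batched reads."""

    def __init__(self):
        self.store = {}         # stand-in for the durable online store
        self.cache = {}         # stand-in for the Redis read cache
        self.pending = deque()  # keys waiting to be copied into the cache

    def write(self, key, value):
        self.store[key] = value   # the system of record is written first
        self.pending.append(key)  # cache population happens later

    def drain(self):
        """Copy pending keys into the cache (run by a background worker in practice)."""
        while self.pending:
            key = self.pending.popleft()
            self.cache[key] = self.store[key]

    def read_many(self, keys):
        """Return (values_by_key, missed_keys); misses are served from the store."""
        result, misses = {}, []
        for key in keys:
            if key in self.cache:
                result[key] = self.cache[key]
            else:
                misses.append(key)
        for key in misses:
            result[key] = self.store.get(key)
        return result, misses


tier = OnlineTier()
tier.write("u1:clicks_5m", 7)
before, misses_before = tier.read_many(["u1:clicks_5m"])  # cache still cold
tier.drain()
after, misses_after = tier.read_many(["u1:clicks_5m"])    # now a cache hit
```

One consequence of asynchronous population is visible in the sketch: between `write` and `drain`, a key that is already cached can briefly serve its previous value.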

Inference serving layer

flowchart TD
    Client([Inference request\nentity IDs])
    subgraph online[Online serving tier]
        Redis[(ElastiCache Redis)]
        FS[(SageMaker Feature Store)]
    end
    Endpoint[SageMaker Endpoint in VPC]
    Default[Default snapshot in container]
    Client --> Endpoint
    Endpoint -->|MGET 5ms timeout| Redis
    Redis -->|miss fallback| FS
    Endpoint -->|timeout degraded| Default
    Endpoint --> Pred([Prediction])

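A SageMaker real-time Endpoint, deployed in the VPC, hosts the model. Per request it batches feature reads from Redis with a 5 ms timeout, falls back to Feature Store on a miss, and degrades to a default feature snapshot held in the container if the read times out. It then runs the forward pass, all inside a 40 ms p99 budget.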
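The read path with its two fallbacks might look like the sketch below. The function `fetch_features`, the `DEFAULTS` snapshot and the injected `cache_mget` / `store_get` callables are hypothetical. The sketch assumes the cache callable raises the built-in `TimeoutError`; a real Redis client raises its own timeout exception type, which would need to be caught instead.

```python
DEFAULTS = {"clicks_5m": 0, "spend_30d": 0.0}  # hypothetical default snapshot


def fetch_features(entity_ids, cache_mget, store_get, defaults=DEFAULTS):
    """Return (features_by_entity, status) for one inference request."""
    try:
        cached = cache_mget(entity_ids)  # one batched read for all entities
    except TimeoutError:
        # Cache exceeded its budget: serve every entity from the default snapshot.
        return {eid: dict(defaults) for eid in entity_ids}, "degraded"

    features = {}
    for eid, value in zip(entity_ids, cached):
        if value is None:
            value = store_get(eid)  # per-key fallback to the durable store
        features[eid] = value if value is not None else dict(defaults)
    return features, "ok"


# Stand-ins for the two backends.
def healthy_cache(ids):
    return [{"clicks_5m": 4, "spend_30d": 12.5} if eid == "u1" else None for eid in ids]


def slow_cache(ids):
    raise TimeoutError("cache read exceeded its budget")


def store(eid):
    return {"clicks_5m": 1, "spend_30d": 3.0} if eid == "u2" else None


ok_features, ok_status = fetch_features(["u1", "u2", "u3"], healthy_cache, store)
degraded_features, degraded_status = fetch_features(["u1"], slow_cache, store)
```

Returning a status alongside the features lets the endpoint emit a metric for degraded responses, so a slow cache shows up in monitoring rather than only as a quiet drop in prediction quality.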

Full system with control plane

flowchart TD
    Apps([Raw events])
    Client([Inference request])
    subgraph ingest[Feature pipelines]
        Kinesis[Kinesis Data Streams\n7 day retention]
        Flink[Managed Flink]
        Glue[AWS Glue Spark]
    end
    subgraph data[Stores]
        Redis[(ElastiCache Redis\nper-tenant ACL)]
        FS[(SageMaker Feature Store)]
        Offline[(S3 offline store)]
    end
    subgraph serve[Serving]
        Endpoint[SageMaker Endpoint in VPC]
    end
    subgraph control[Control plane]
        Registry[Feature lineage\nand Glue Data Catalog]
        Monitor[Model Monitor\ndrift and skew]
    end
    Apps --> Kinesis --> Flink
    Apps --> Glue
    Glue --> Offline
    Glue --> FS
    Flink --> FS
    FS -.async.-> Redis
    Client --> Endpoint
    Endpoint -->|MGET| Redis
    Redis -->|fallback| FS
    Endpoint --> Pred([Prediction])
    Registry -.governs.-> Glue
    Registry -.governs.-> Flink
    Endpoint -.metrics.-> Monitor
    Offline -.baseline.-> Monitor

The complete platform: batch and streaming feature paths, paired online and offline stores, and the inference fleet, plus the control plane. Feature Store metadata and lineage, together with the Glue Data Catalog, keep a single definition per feature across both pipelines, and Model Monitor raises alarms on drift and training-serving skew.
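As an illustration of what a drift check between an offline baseline and live traffic can look like, the sketch below computes a population stability index (PSI) over fixed bins. This is a swapped-in, generic technique and not a description of what Model Monitor computes internally; the function name `psi`, the bin edges and the sample values are all assumptions.

```python
import math


def psi(baseline, live, edges, eps=1e-6):
    """Population stability index of `live` against `baseline` over bins split at `edges`.

    Both samples must be non-empty. Empty bins are floored at `eps` to keep the log finite.
    """

    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1  # index of the bin holding v
        return [max(c / len(values), eps) for c in counts]

    base_frac = fractions(baseline)
    live_frac = fractions(live)
    return sum((l - b) * math.log(l / b) for b, l in zip(base_frac, live_frac))


# Hypothetical feature values: training baseline vs. a shifted serving sample.
baseline_values = [1, 2, 3, 4, 5, 6, 7, 8]
live_values = [5, 6, 7, 8, 9, 10, 11, 12]
edges = [3, 6, 9]

same_score = psi(baseline_values, baseline_values, edges)
shifted_score = psi(baseline_values, live_values, edges)
```

Identical distributions score zero, and the score grows as the live distribution moves away from the baseline. The threshold at which a score should raise an alarm depends on the feature and is not specified here.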