ml-serving
Point-in-time-correct feature retrieval
- When
- Building training sets from time-varying features, where using a feature value recorded after the label event leaks the future and inflates offline metrics.
- AWS
- SageMaker Feature Store offline store: partitioned Parquet in S3 with event-time and ingestion timestamps, queried through Athena with an as-of join that takes the latest feature value strictly before each label timestamp.
- Trade-off
- The as-of join is an O(N log N) sort-merge scan, not a cheap lookup, so training-set assembly is a batch job measured in minutes, not an interactive query.
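The as-of semantics above can be sketched in plain Python. This is a minimal illustration, not the Feature Store query path: the function name `point_in_time_join` and the tuple layouts are assumptions made for the example. It sorts each entity's feature rows once, then binary-searches for the latest row strictly before each label timestamp.

```python
from bisect import bisect_left
from collections import defaultdict


def point_in_time_join(labels, features):
    """Attach to each label the latest feature value strictly before it.

    labels:   iterable of (entity_id, label_ts, label)
    features: iterable of (entity_id, event_ts, value)
    Returns a list of (entity_id, label_ts, label, value_or_None).
    """
    rows_by_entity = defaultdict(list)
    for entity_id, event_ts, value in features:
        rows_by_entity[entity_id].append((event_ts, value))

    # Sort once per entity; this is the O(N log N) part of the join.
    times_by_entity = {}
    for entity_id, rows in rows_by_entity.items():
        rows.sort(key=lambda row: row[0])
        times_by_entity[entity_id] = [ts for ts, _ in rows]

    joined = []
    for entity_id, label_ts, label in labels:
        times = times_by_entity.get(entity_id, [])
        # bisect_left counts rows with event_ts < label_ts, so a feature
        # stamped exactly at the label time is excluded (no leakage on ties).
        idx = bisect_left(times, label_ts)
        value = rows_by_entity[entity_id][idx - 1][1] if idx else None
        joined.append((entity_id, label_ts, label, value))
    return joined
```

A label with no earlier feature row gets `None` rather than a later value, which is the behaviour that keeps offline metrics honest.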
caching
Write-through online feature cache
- When
- A single inference request must read hundreds to thousands of features within a budget of a few milliseconds, which the durable online store cannot meet with per-record reads.
- AWS
- ElastiCache for Redis in front of SageMaker Feature Store. Writes go to Feature Store first (durable) and then populate Redis asynchronously. Reads issue a scatter-gather MGET, one batch per shard because keys are not colocated with hash tags, and a miss falls back to GetRecord. ElastiCache Serverless removes manual shard sizing.
- Trade-off
- Async population means a consistency window between the system of record and the cache; you accept bounded staleness on the hot path rather than risk silent divergence from a non-atomic double-write.
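The read path can be sketched as follows, with plain dicts standing in for per-shard Redis connections and a callable standing in for GetRecord. The names `shard_for` and `read_features` are hypothetical, and the equal-range slot-to-shard mapping is an assumption for illustration; a real cluster client discovers the slot map from the cluster. The slot function itself follows the Redis Cluster rule of CRC16 (XMODEM) modulo 16384.

```python
import binascii
from collections import defaultdict

NUM_SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    # CRC16/XMODEM mod 16384; crc_hqx with initial value 0 computes that CRC.
    # Hash tags ({...}) are ignored here because the pattern does not use them.
    return binascii.crc_hqx(key.encode(), 0) % NUM_SLOTS


def shard_for(key: str, num_shards: int) -> int:
    # Assumption: slots are split into equal contiguous ranges per shard.
    return key_slot(key) * num_shards // NUM_SLOTS


def read_features(keys, shards, get_record):
    """Group keys by shard, batch-read each shard, fall back on misses.

    shards:     list of dict-like stand-ins for per-shard Redis connections
    get_record: callable key -> value or None (stand-in for GetRecord)
    """
    keys_by_shard = defaultdict(list)
    for key in keys:
        keys_by_shard[shard_for(key, len(shards))].append(key)

    result = {}
    # Sequential here for clarity; a real client would issue these in parallel.
    for shard_idx, shard_keys in keys_by_shard.items():
        shard = shards[shard_idx]
        values = [shard.get(k) for k in shard_keys]  # one MGET-like batch
        for key, value in zip(shard_keys, values):
            if value is None:
                value = get_record(key)  # miss: go to the system of record
                if value is not None:
                    shard[key] = value  # repopulate the cache
            result[key] = value
    return result
```

Grouping before reading is what makes the multi-key read valid without hash tags: each batch only ever touches keys that live on one shard.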
analytics-olap
Lambda architecture for batch plus streaming features
- When
- Some features need heavy historical aggregation (correct but slow) while others need second-level freshness (recent but light), under one feature definition.
- AWS
- AWS Glue Spark for the batch path to the S3 offline store; Kinesis Data Streams plus Managed Service for Apache Flink for the stateful windowed streaming path. Flink (not AWS Lambda) owns aggregation because window state must be durable and writes must be micro-batched to stay under the Feature Store PutRecord limit.
- Trade-off
- Two code paths for the same logical feature mean two places skew can creep in; the registry must enforce a shared transformation to keep batch and stream identical.
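One way to picture the shared transformation is a single pure function that both paths call. The sketch below is an assumption-laden illustration, not Glue or Flink code: `window_sum`, `batch_feature`, and `StreamingFeature` are hypothetical names, and the streaming side assumes events arrive in timestamp order.

```python
from collections import deque


def window_sum(events, window_end, window_seconds):
    """Shared definition: sum of amounts with
    window_end - window_seconds < ts <= window_end."""
    start = window_end - window_seconds
    return sum(amount for ts, amount in events if start < ts <= window_end)


def batch_feature(history, as_of, window_seconds):
    # Batch path: scan the full history with the shared definition.
    return window_sum(history, as_of, window_seconds)


class StreamingFeature:
    """Streaming path: keeps only in-window events, same definition."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.buffer = deque()

    def on_event(self, ts, amount):
        self.buffer.append((ts, amount))
        # Evict events that have fallen out of the window (in-order input assumed).
        while self.buffer and self.buffer[0][0] <= ts - self.window_seconds:
            self.buffer.popleft()
        return window_sum(self.buffer, ts, self.window_seconds)
```

Because the window boundaries live in one function, a change such as making the lower bound inclusive lands in both paths at once instead of drifting apart.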
storage
Feature group versioning and lineage
- When
- Multiple teams and models share features and you must reproduce, audit, or GDPR-erase exactly which feature value a given prediction consumed.
- AWS
- SageMaker Feature Store feature groups as the versioned unit, with built-in feature metadata and lineage; Glue Data Catalog for the offline schema; per-group IAM policies and KMS keys; ElastiCache Redis ACLs for hot-path tenant isolation; and CloudTrail for a tamper-evident audit trail (management events plus explicitly enabled Feature Store data events, with log-file validation and S3 Object Lock).
- Trade-off
- GDPR erasure must reach every copy (DeleteRecord, S3 tombstone, Redis DEL, and a Kinesis replay boundary at the deletion timestamp); otherwise deleted data resurfaces on a cache rewarm.
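The erasure fan-out and the replay boundary can be sketched with in-memory stand-ins. Everything here is illustrative: the dicts and list stand in for the online store, offline store, and Redis, and `erase_entity` and `replay` are hypothetical names. The point is that the replay consumer must consult the deletion timestamp, or a rewarm rewrites what the delete removed.

```python
def erase_entity(entity_id, deleted_at, online_store, offline_rows, cache, tombstones):
    """Fan an erasure out to every copy and record the replay boundary."""
    online_store.pop(entity_id, None)  # stand-in for DeleteRecord
    offline_rows.append(               # stand-in for the S3 tombstone row
        {"entity_id": entity_id, "ts": deleted_at, "value": None, "is_deleted": True}
    )
    cache.pop(entity_id, None)         # stand-in for Redis DEL
    tombstones[entity_id] = deleted_at # boundary checked during stream replay


def replay(events, cache, tombstones):
    """Rewarm the cache from a stream replay, skipping pre-deletion events."""
    for event in events:
        boundary = tombstones.get(event["entity_id"])
        if boundary is not None and event["ts"] <= boundary:
            continue  # event predates the erasure; writing it would resurface data
        cache[event["entity_id"]] = event["value"]
```

Events stamped after the boundary are still applied, which covers an entity that legitimately produces new data after its earlier records were erased.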
caching
Probabilistic early expiration to prevent thundering herd
- When
- Millions of cached feature keys share a TTL or refresh boundary, so synchronized expiry causes every request to miss at once and stampede the backing store.
- AWS
- ElastiCache Redis with probabilistic early recomputation (XFetch-style jitter on TTL) plus a single-flight lock per key, so one request refreshes while others serve the slightly-stale value.
- Trade-off
- A small fraction of reads intentionally serve a near-expiry value to dampen the stampede; you trade marginal freshness for a flat p99 instead of a 100x spike.
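A minimal sketch of the XFetch-style check plus a per-key single-flight lock follows. A dict stands in for Redis and `threading.Lock` stands in for a distributed lock; `should_refresh_early` and `fetch` are hypothetical names, and `beta` and the cache-entry layout are assumptions. The check refreshes early with a probability that rises as expiry approaches and scales with how long the last recompute took.

```python
import math
import random
import threading
import time


def should_refresh_early(now, expiry, recompute_seconds, beta=1.0, rng=random.random):
    # 1 - rng() lies in (0, 1], so log() never receives 0.
    u = 1.0 - rng()
    # log(u) <= 0, so this adds a random non-negative head start to `now`.
    return now - recompute_seconds * beta * math.log(u) >= expiry


def fetch(key, cache, locks, recompute, now, ttl, beta=1.0, rng=random.random):
    """cache maps key -> (value, expiry, recompute_seconds)."""
    entry = cache.get(key)
    if entry is not None:
        value, expiry, delta = entry
        if not should_refresh_early(now, expiry, delta, beta, rng):
            return value

    lock = locks.setdefault(key, threading.Lock())
    if lock.acquire(blocking=False):  # single flight: only one refresher per key
        try:
            started = time.monotonic()
            value = recompute(key)
            delta = time.monotonic() - started
            cache[key] = (value, now + ttl, delta)
            return value
        finally:
            lock.release()

    # Someone else is refreshing: serve the slightly stale value if there is one.
    # With no cached value at all there is nothing to serve, so recompute.
    return entry[0] if entry is not None else recompute(key)
```

Raising `beta` above 1.0 makes early refresh more eager; lowering it toward 0 approaches plain TTL expiry.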
ml-serving
Fail-soft degraded inference
- When
- The online feature tier is unavailable or slow and the inference path must still return rather than error or silently serve zeros.
- AWS
- A hard 5 ms per-call timeout on the Redis read: a steady-state single-key miss falls back to Feature Store GetRecord, but a timeout during a mass failover trips straight to a default-feature snapshot baked into the serving container (no network hop), with the response tagged degraded and an alarm on degraded-response rate.
- Trade-off
- Waiting on GetRecord for 6,000 features during a Redis failover of 15 s to 1 min would brown out every request; the timeout sacrifices feature richness for a bounded-latency degraded answer that is observable and discountable downstream.
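The timeout-then-default behaviour can be sketched with a thread pool enforcing the deadline. This is an illustration under assumptions: `get_features`, `DEFAULT_SNAPSHOT`, and its feature names are hypothetical, `cache_read` and `get_record` are stand-in callables, and a production client would more likely rely on its own socket timeout than on a wrapper thread.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_POOL = ThreadPoolExecutor(max_workers=8)

# Hypothetical defaults baked into the serving container. They are meant to be
# sensible population-level values, not zeros, so degraded answers stay usable.
DEFAULT_SNAPSHOT = {"avg_spend_7d": 42.5, "sessions_24h": 3}


def get_features(keys, cache_read, get_record, timeout_s=0.005):
    """Return (features, degraded).

    cache_read: callable keys -> dict of cached values (stand-in for the Redis read)
    get_record: callable key -> value (stand-in for Feature Store GetRecord)
    """
    future = _POOL.submit(cache_read, keys)
    try:
        cached = future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a read already running is not interrupted
        # Timeout: skip GetRecord entirely and answer from the local snapshot.
        return {k: DEFAULT_SNAPSHOT.get(k) for k in keys}, True

    features = {}
    for key in keys:
        value = cached.get(key)
        # Steady-state single-key miss: fall back to the durable store.
        features[key] = value if value is not None else get_record(key)
    return features, False
```

The second element of the return value is the degraded tag; the caller would attach it to the response and emit a metric so the degraded-response rate can be alarmed on.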