Patterns — System Design Studio

ingestion

Redundant active-active ingest with seamless input failover

When: A single live contribution feed over bare TCP RTMP stalls the entire downstream audience on a single packet-loss event, and a single ingest endpoint is a hard single point of failure for the whole channel - the part of the path you control least (the broadcaster's last mile) is exactly where loss happens.
AWS: Carry premium contribution over AWS Elemental MediaConnect (SRT/Zixi-grade transport with ARQ packet recovery) with two source flows, feeding two MediaLive input endpoints. MediaLive automatic input failover holds the secondary hot and cuts over on loss-of-signal or black-frame/silence detection while preserving a continuous output timeline, so the manifest never gaps and viewers never notice. Commodity streams can push dual RTMP directly to MediaLive inputs.
Trade-off: You pay to ingest and stand by a second pipeline that is idle nearly all the time, and the premium path gives up RTMP's universality - the encoder must speak SRT or push twice. For a live event the standby cost is a rounding error against egress, but for low-stakes streams it may not be worth it.

from: Adaptive bitrate live streaming pipeline

Batch at an aggregation tier, buffer through a stream, fan out from a consumer

When: A huge, spiky fleet of writers (millions of mobile devices) would overwhelm a stateful store that can't hold millions of connections or absorb sudden bursts, and you need durable replay if a downstream consumer falls over. Watch the per-request edge cost: a managed API at millions of req/s and one-record-per-call stream PUTs can each be a seven- to eight-figure monthly bill before any compute runs.
AWS: Front the stream with an NLB + stateless connection-server fleet (ECS Fargate) that holds the persistent device connections and batches 50-100 records per Kinesis PutRecords call, cutting PUT units and per-request charges ~100x versus a direct API Gateway POST. Land writes in Kinesis Data Streams (1 MB/s of writes per shard) and run a Lambda consumer fleet (enhanced fan-out for the latency-critical consumer, standard polling for lag-tolerant ones; explicit BatchSize and reserved concurrency) that drains micro-batches and fans out to ElastiCache and an S3 firehose. Report per-sink batch-item failures independently and buffer failed cache writes to a short-TTL SQS retry queue so one sink's failure never stalls the shard iterator.
Trade-off: You add up to a couple hundred ms of buffering latency to the write path and inherit shard-count capacity planning (and resharding), in exchange for protecting the store from connection storms, collapsing edge cost by two orders of magnitude, and getting durable replay for free. Stream replay is at-least-once, so every sink must be idempotent or dedup by per-shard sequence number.
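Sketch: A minimal Python illustration of the two mechanics above, assuming nothing about the real connection server: chunking buffered records into batches of at most 100 before a stream write, and a consumer-side dedup check keyed on (shard, sequence number) to make at-least-once replay safe. The names batch_records and dedup_by_sequence are hypothetical, and the in-memory set stands in for whatever durable dedup store a sink would use.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def batch_records(records: Iterable[bytes], max_batch: int = 100) -> Iterator[List[bytes]]:
    """Yield lists of at most max_batch records, preserving arrival order."""
    batch: List[bytes] = []
    for record in records:
        batch.append(record)
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def dedup_by_sequence(seen: Set[Tuple[str, str]], shard_id: str, sequence_number: str) -> bool:
    """Return True if this (shard, sequence) pair is new and should be applied."""
    key = (shard_id, sequence_number)
    if key in seen:
        return False  # replayed record: skip it so the sink stays idempotent
    seen.add(key)
    return True
```

With 250 buffered records and max_batch=100 this produces three stream calls instead of 250, which is where the per-request saving comes from.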

from: Geospatial proximity search at scale

storage

Write-sharded hot account (virtual sub-accounts summed at read)

When: A single popular account — a marketplace platform account, a viral seller — receives thousands of writes per second, far past the ~1,000 writes/s a single partition key sustains. Every journal entry for that account contends on one row or one partition and the hot key throttles.
AWS: Split the hot account into N=256 virtual shards (account_id#000 .. account_id#255). At a ~1,000 writes/s per-key ceiling, 200,000 writes/s needs 200 shards — 256 is the next power of two with ~20% headroom, so no overflow tier is needed. Writes round-robin or hash across shards, so 256 partition keys absorb the load instead of one. The true balance is SUM over all shards, materialised in ElastiCache so the read is O(1) and unchanged by N. A control table flags which accounts are hot and stores N per account; known-hot accounts are pre-sharded at creation or early detection, not reactively mid-storm.
Trade-off: Re-sharding later is a migration, not a config flip — which is why hot accounts are pre-sharded. Reads fan-in across the shards only on a cache rebuild; the hot path trusts the materialised sum. You give up the simplicity of one row per account balance for the ability to absorb a write storm.
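Sketch: A small Python model of the shard arithmetic and the sum-at-read balance, under the assumption that a dict of counters stands in for one row per partition key. ShardedLedger, shards_needed, and shard_key are illustrative names, and random shard choice is one of the two routing options the pattern mentions.

```python
import math
import random
from collections import defaultdict


def shards_needed(target_writes_per_s: int, per_key_ceiling: int = 1000) -> int:
    """Smallest power of two whose combined ceiling covers the target write rate."""
    minimum = math.ceil(target_writes_per_s / per_key_ceiling)
    return 1 << (minimum - 1).bit_length()


def shard_key(account_id: str, shard: int) -> str:
    """Build the virtual sub-account key, e.g. acct#007."""
    return f"{account_id}#{shard:03d}"


class ShardedLedger:
    def __init__(self, n_shards: int):
        self.n_shards = n_shards
        self.rows = defaultdict(int)  # stands in for one row per partition key

    def write(self, account_id: str, amount: int) -> None:
        # Spread writes across the virtual shards so no single key takes the full rate.
        shard = random.randrange(self.n_shards)
        self.rows[shard_key(account_id, shard)] += amount

    def balance(self, account_id: str) -> int:
        # Full fan-in read: in the pattern this runs only on a cache rebuild.
        return sum(self.rows[shard_key(account_id, s)] for s in range(self.n_shards))
```

shards_needed(200_000) returns 256, matching the sizing in the AWS line; the hot read path would serve the materialised sum rather than call balance() on every request.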

from: Financial ledger and double-entry accounting at scale

Read the idempotency store from the primary, never a replica

When: An idempotency check that reads a stale replica can return not-found for a key already committed to the primary, and the system then re-executes a payment it already ran.
AWS: Issue idempotency reads with DynamoDB ConsistentRead true (or against the Aurora writer endpoint), never an eventually-consistent read or a read replica; treat replica lag on this path as a correctness defect, not a performance one.
Trade-off: Strongly consistent reads cost twice the read-capacity of eventually-consistent ones and forgo the latency win of a nearby replica, in exchange for eliminating the replica-lag duplicate-charge that Airbnb traced to reading idempotency keys from a MySQL read replica.

from: Distributed payment ledger with idempotent settlement

Feature group versioning and lineage

When: Multiple teams and models share features and you must reproduce, audit, or GDPR-erase exactly which feature value a given prediction consumed.
AWS: SageMaker Feature Store feature groups as the versioned unit with built-in feature metadata and lineage, Glue Data Catalog for offline schema, per-group IAM and KMS keys, ElastiCache Redis ACLs for hot-path tenant isolation, and CloudTrail (management plane plus explicitly-enabled Feature Store data events, with log-file validation and S3 Object Lock) for a tamper-evident audit trail.
Trade-off: GDPR erasure must reach every copy - DeleteRecord, S3 tombstone, Redis DEL, and a Kinesis replay boundary at the deletion timestamp - or deleted data resurfaces on a cache rewarm.

from: ML feature store and low-latency inference serving

Shard by spatial cell, not by region name

When: Geographically clustered load (downtown SF, downtown NYC at rush hour) concentrates every write and read for a city onto one shard when you shard by city/region — Lyft hit exactly this ceiling at ~100k ops/s per region shard.
AWS: Use a fixed-area spatial cell (S2 level 13 ≈ 1 km², or an equivalent H3 resolution) as the shard key and hash-tag the Redis key by that cell, so the ElastiCache Cluster's CRC16 slotting spreads a dense city across dozens of shards while keeping nearby data co-located for radius queries.
Trade-off: Cell population is not uniform — a stadium at game time still hot-spots its own cell — and changing the cell-to-shard mapping is a reshard, so you must pick a resolution and online-reshard during low-traffic windows rather than re-key on the fly.
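Sketch: A Python illustration of how a hash-tagged key maps to a Redis Cluster slot, assuming the standard cluster rule (CRC16 of the hash tag, modulo 16384). The cell ID and key prefixes are made up for the example; binascii.crc_hqx with an initial value of 0 is used as the CRC16 implementation.

```python
import binascii


def hash_tag(key: str) -> str:
    """Return the part of the key Redis Cluster hashes: the first non-empty {...} tag, else the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots."""
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant Redis Cluster specifies.
    return binascii.crc_hqx(hash_tag(key).encode(), 0) % 16384


def cell_key(prefix: str, cell_id: str) -> str:
    """Build a key whose slot depends only on the cell, e.g. geo:{89c25}."""
    return f"{prefix}:{{{cell_id}}}"
```

Because only the tag is hashed, cell_key("geo", c) and cell_key("avail", c) land on the same slot for any cell c, while different cells scatter across slots.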

from: Geospatial proximity search at scale

Let ephemeral presence expire instead of deleting it

When: You track 'who is here right now' (online drivers, live cursors, active sessions) and a client that vanishes without a goodbye — a dead phone, a dropped socket — would otherwise linger in the index forever and poison results.
AWS: Write each presence record to ElastiCache with a short TTL (30 s) refreshed on every heartbeat; absence of a refresh expires the record automatically, so you never need a separate reaper job or an explicit 'I'm leaving' message you might never get.
Trade-off: A network partition that drops heartbeats expires healthy clients into false absence, and the TTL is a direct freshness/load knob — shorter means fewer ghosts but more write pressure to keep everyone alive.
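Sketch: An in-memory Python model of heartbeat-refreshed expiry, with the clock passed in explicitly so the behaviour is easy to follow. PresenceIndex is a hypothetical stand-in; in the pattern itself the cache's own per-key TTL does this work and no such class exists.

```python
from typing import Dict, List


class PresenceIndex:
    """In-memory stand-in for per-record TTLs refreshed on every heartbeat."""

    def __init__(self, ttl_s: float = 30.0):
        self.ttl_s = ttl_s
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, client_id: str, now: float) -> None:
        # Each heartbeat pushes the expiry forward by one TTL.
        self._expires_at[client_id] = now + self.ttl_s

    def online(self, now: float) -> List[str]:
        # A client with no recent heartbeat simply stops appearing; nothing deletes it explicitly.
        return sorted(c for c, exp in self._expires_at.items() if exp > now)
```

A client that heartbeats at t=0 and then vanishes drops out of online() after t=30 without any reaper job or goodbye message.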

from: Geospatial proximity search at scale

caching

Cache-key allowlist and normalization

When: Any CDN where clients shape responses via query params (size, format, locale) and junk params (utm, session) would otherwise mint infinite cache objects.
AWS: CloudFront cache policy allowlisting only the params that change the bytes, plus a CloudFront Function on viewer-request that sorts, lowercases, and clamps them to a fixed breakpoint set in ~1 ms.
Trade-off: You quantize the request space - arbitrary widths snap to the nearest breakpoint - giving up pixel-exact requests for a bounded, high-hit-ratio cache.
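Sketch: The normalization logic expressed in Python for readability. A real CloudFront Function would be written in JavaScript, and the allowlist and breakpoint values below are assumptions chosen for the example, not values from the source design.

```python
from urllib.parse import parse_qsl, urlencode

ALLOWED = ("format", "locale", "width")    # hypothetical allowlist
BREAKPOINTS = (320, 640, 960, 1280, 1920)  # hypothetical breakpoint set


def clamp_width(raw: str) -> str:
    """Snap a requested width to the nearest breakpoint (ties go to the smaller one)."""
    try:
        width = int(raw)
    except ValueError:
        return str(BREAKPOINTS[0])  # unparseable widths fall back to the smallest size
    return str(min(BREAKPOINTS, key=lambda b: (abs(b - width), b)))


def normalize_query(query: str) -> str:
    """Drop non-allowlisted params, lowercase, clamp widths, and sort for a stable cache key."""
    params = {}
    for name, value in parse_qsl(query):
        name = name.lower()
        if name not in ALLOWED:
            continue  # junk params such as utm_* or session never reach the cache key
        value = value.lower()
        params[name] = clamp_width(value) if name == "width" else value
    return urlencode(sorted(params.items()))
```

Two requests that differ only in parameter order, case, junk params, or a few pixels of width normalize to the same string and so share one cache object.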

from: Global image delivery with CloudFront and edge-side transformation

Conditional same-key request coalescing

When: Same-key cold spikes (a hot hero image) cause thundering-herd misses AND the per-miss origin work is expensive (transform, re-encode). Not worth it for cheap static origins, and it does nothing for launches of many distinct keys.
AWS: CloudFront Origin Shield, one per CRR region, as the regional collapse point so concurrent edge misses for one key become one origin fetch; each POP routes to its nearest Shield.
Trade-off: Adds a cache hop and per-request fee; splits the coalescing cache across regions; and is not HA - when a Shield degrades CloudFront bypasses it and floods origin, so pre-compute, not Shield, is the launch-shock and SPOF protection.

from: Global image delivery with CloudFront and edge-side transformation

Versioned URLs over invalidation

When: Content changes you control (re-uploads, catalogue refreshes) need fresh bytes without racing the cache or burning CloudFront's invalidation quota (3,000 file paths and 15 wildcard paths in progress at one time).
AWS: Embed a version in the path (img/v3/id.jpg) backed by a DynamoDB version map; bump the version to mint a guaranteed-fresh key. Reserve wildcard invalidation plus short TTL for legal takedowns only.
Trade-off: URL generators must know the current version (a lookup), coupling the app to a version table instead of treating URLs as static.

from: Global image delivery with CloudFront and edge-side transformation

Write-through dedup cache fronting a transactional store

When: A correctness-critical lookup (have I seen this idempotency key, and what was the answer) must run on every request at a rate the transactional source of truth cannot serve without becoming the bottleneck, but the lock that decides genuine first-execution must stay in the transactional store.
AWS: DynamoDB (no DAX) sits in front of sharded Aurora as a read-through, write-through cache and doubles as the in-process saga's checkpoint store. A cache hit on a terminal status replays in single-digit milliseconds without touching Aurora or running the saga; a miss falls through to the Aurora lock, runs the saga on first execution, then writes the result back through DynamoDB. Reads filter on expires_at in application code rather than trusting best-effort TTL deletion, and Global Tables replicate the store cross-region. DAX is omitted because it does not join Global Tables, does not help the first-execution path, and is redundant once Fargate holds a warm connection pool.
Trade-off: Key state now lives in two stores with a narrow lag window, so you must enforce that only a request which took the Aurora lock and completed may write a terminal cache entry - a cache miss is always safe (fall through to the authority), only a hit short-circuits. The write-back is non-transactional, so a rising cache-miss rate must be alarmed: correctness survives a failed write-back but the cost and latency value silently degrades. You give up single-store simplicity for a hot-path read budget Aurora alone cannot meet.

from: Idempotent payment gateway

Write-through online feature cache

When: A single inference request must read hundreds-to-thousands of features within a few-millisecond budget that the durable online store cannot meet per-record.
AWS: ElastiCache for Redis in front of SageMaker Feature Store: writes go to Feature Store first (durable), then async-populate Redis; reads use a scatter-gather MGET across shards (no hash-tag colocation) and a miss falls back to GetRecord. ElastiCache Serverless removes manual shard sizing.
Trade-off: Async population means a consistency window between the system of record and the cache; you accept bounded staleness on the hot path rather than risk silent divergence from a non-atomic double-write.

from: ML feature store and low-latency inference serving

Probabilistic early expiration to prevent thundering herd

When: Millions of cached feature keys share a TTL or refresh boundary, so synchronized expiry causes every request to miss at once and stampede the backing store.
AWS: ElastiCache Redis with probabilistic early recomputation (XFetch-style jitter on TTL) plus a single-flight lock per key, so one request refreshes while others serve the slightly-stale value.
Trade-off: A small fraction of reads intentionally serve a near-expiry value to dampen the stampede; you trade marginal freshness for a flat p99 instead of a 100x spike.
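Sketch: The XFetch-style early-refresh test as a Python function, assuming the usual formulation in which a request refreshes when now - delta * beta * ln(u) reaches the expiry, with u drawn uniformly from (0, 1]. The parameter names are illustrative; recompute_cost_s plays the role of delta, and the rand argument exists only to make the behaviour reproducible.

```python
import math
import random
from typing import Optional


def should_refresh_early(now: float, expiry: float, recompute_cost_s: float,
                         beta: float = 1.0, rand: Optional[float] = None) -> bool:
    """Decide whether this request should recompute the value before it expires."""
    # Draw u from (0, 1]; ln(u) <= 0, so the subtracted term pushes "now" forward by a random amount.
    u = rand if rand is not None else 1.0 - random.random()
    return now - recompute_cost_s * beta * math.log(u) >= expiry
```

The closer a key is to expiry, and the more expensive it is to recompute, the more likely a given request is to refresh it, so refreshes spread out instead of landing on the TTL boundary together. The single-flight lock per key would wrap the recomputation that follows a True result.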

from: ML feature store and low-latency inference serving

Event-driven preference cache invalidation with TTL backstop

When: Hot per-user preferences (quiet hours, opt-outs, consent) are read on every delivery and cannot be fetched from the database each time, but stale preferences cause consent violations and timezone bugs.
AWS: Write-through ElastiCache in front of a DynamoDB preferences table with per-user IANA timezone and push_consent. Change detection is source-of-truth-driven: DynamoDB Streams on the preferences table feeds an EventBridge Pipe to the cache-invalidation Lambda — no dedicated Kinesis stream, since pref changes are low-volume and the table already emits change records. A 5-minute TTL is only a backstop for missed records. Every push send checks push_consent (default-deny when absent, GDPR Art. 7); quiet-hours and consent both fail closed when the preference cannot be resolved.
Trade-off: Worst-case staleness equals the TTL if a stream record is dropped before the Pipe delivers. Synchronous source-of-truth re-checks for consent withdrawal add one read per consent-bearing delivery, and default-deny means a missing consent record suppresses sends until it is written.

from: Push notification fan-out at scale

Split cache policy by mutability - pin immutable segments, revalidate the live manifest

When: A live stream mixes two object classes with opposite cache requirements: segments that are immutable once written (a stale copy is still correct) and a manifest that must always reflect the live edge (a stale copy desyncs everyone). One blanket TTL is wrong for at least one of them.
AWS: Set Cache-Control on segments to duration plus a ~10s buffer so a million viewers fetching the same segment hit warm CloudFront cache, and set the manifest to a ~1s TTL so it effectively revalidates per poll while still benefiting from request collapsing. Origin-group failover on CloudFront covers a stalled primary MediaPackage endpoint. Bound staleness with a manifest-age canary rather than trusting HTTP 200.
Trade-off: Two cache policies mean two places to misconfigure, and the segment TTL must be sized against the DVR/time-shift window - too short and time-shifted viewers miss the origin warm path, too long and storage and stale-edge risk grow. The 1s manifest TTL caps how fresh the live edge can be.

from: Adaptive bitrate live streaming pipeline

messaging

Fence the external gateway call with a Redis NX lock

When: Step Functions can retry a task, or two saga executions can target the same payment, and an at-least-once external call to a card network is a real double-charge — the gateway leg is the one step you cannot blindly retry.
AWS: Before calling the gateway, SET NX payment_id with a 5 s TTL on ElastiCache Serverless (sub-ms failover, no shard topology to manage); only the lock holder calls Stripe/Adyen, passing the internal payment ID as the gateway's own idempotency-key. Release on success; let the TTL expire on crash. Split the failure modes: lock HELD by another saga returns 409 (back off); a lock INFRASTRUCTURE error bypasses Redis and calls the gateway anyway, emitting a redis_bypass metric.
Trade-off: A single SET NX lock is not a perfect distributed mutex under partition or failover, so it is a first fence, not the last line of defence - you lean on the gateway's server-side idempotency key as the authoritative de-dup. Critically, fail OPEN on Redis infrastructure failure, not closed: failing closed turns a single ElastiCache failover into a 100% payment outage, whereas bypassing the lock converts a platform-wide outage into a metered pass-through that the gateway fence contains.
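Sketch: The three-way decision (lock acquired, lock held elsewhere, lock infrastructure down) in Python, using in-memory fakes instead of a real Redis or gateway client. Every class and function name here is hypothetical; the fake gateway models only the server-side dedup on the idempotency key.

```python
class LockUnavailable(Exception):
    """Raised when the lock infrastructure itself fails (hypothetical error type)."""


class InMemoryLock:
    """Fake lock client; healthy=False simulates an infrastructure failure."""

    def __init__(self, healthy: bool = True):
        self.healthy = healthy
        self.held = set()

    def acquire(self, key: str, ttl_s: int) -> bool:
        if not self.healthy:
            raise LockUnavailable(key)
        if key in self.held:
            return False
        self.held.add(key)
        return True

    def release(self, key: str) -> None:
        self.held.discard(key)


class FakeGateway:
    """Fake card gateway that dedups on the idempotency key, as the real fence is assumed to."""

    def __init__(self):
        self.charges = {}

    def charge(self, payment_id: str, idempotency_key: str) -> None:
        self.charges.setdefault(idempotency_key, payment_id)


def charge_with_fence(lock, gateway, metrics: list, payment_id: str) -> str:
    try:
        acquired = lock.acquire(payment_id, ttl_s=5)
    except LockUnavailable:
        # Fail open: skip the lock, record the bypass, and rely on the gateway's idempotency key.
        metrics.append("redis_bypass")
        gateway.charge(payment_id, idempotency_key=payment_id)
        return "charged_without_lock"
    if not acquired:
        return "conflict_409"  # another saga holds the lock: back off
    gateway.charge(payment_id, idempotency_key=payment_id)
    lock.release(payment_id)  # released on success only; after a crash the TTL would expire it
    return "charged"
```

The point of the split is that "someone else holds the lock" and "the lock service is down" produce different outcomes, and only the first one blocks the payment.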

from: Real-time payment processing with distributed sagas

In-process saga with DynamoDB-checkpointed compensation

When: A business operation spans multiple steps where a middle step has an irreversible external side effect (money moves), and a crash after that side effect but before the downstream write would otherwise be misread as a failure and retried - charging twice. The classic 86K-dollar duplicate-payout bug. Throughput is high enough (tens of thousands/s) that a managed orchestrator's per-start quota and per-execution price turn against you.
AWS: The saga runs in-process inside an ECS Fargate task and persists each step's state to DynamoDB via a conditional write before proceeding - explicit per-step checkpointing, so a re-drive resumes from the last persisted step rather than replaying the side effect. A conditional PutItem gates saga launch so concurrent duplicates cannot both start. The idempotency key is threaded into the external call (the PSP's own idempotency key) so even an in-step retry deduplicates at the side-effect boundary; full-jitter backoff and a circuit breaker protect a recovering dependency, and a charge-status query precedes any re-drive that risks the PSP's dedup window. A failed post-side-effect step routes to a compensating transaction (issue a void, mark the ledger VOIDED).
Trade-off: You own the saga loop instead of leaning on a managed state machine, so step orchestration, retries, and the reconciliation sweeper for orphaned leases are your code. The win is no per-start quota ceiling, near-zero orchestration cost, and checkpointing stronger than an at-least-once managed Express workflow. At low throughput or for multi-day workflows needing year-long execution history, a managed Step Functions Standard workflow is the better trade.

from: Idempotent payment gateway

Publish events by writing them inside the same transaction

When: Downstream services must learn about a committed ledger write, but a publish-after-commit can lose the event on a crash and a publish-before-commit can emit a phantom event on rollback.
AWS: Insert the event row into an outbox table in the same Aurora transaction as the ledger entries; a single Lambda (reserved concurrency 1) polls WHERE published_at IS NULL ... FOR UPDATE SKIP LOCKED every 100 ms with a tunable batch size, publishes to SQS with exponential backoff, and stamps published_at; a publish_attempts counter routes poison rows to a DLQ and advances the head so one bad row cannot block the stream; consumers dedupe on the idempotency key in their own inbox.
Trade-off: Delivery is at-least-once (consumers must be idempotent) and events lag the commit by up to a poll interval plus cold start; the Lambda poller is defensible to ~1,000 events/s, above which you graduate to DMS to Kinesis CDC (no MSK required) rather than relax the never-lost, never-phantom guarantee.
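Sketch: The same-transaction outbox write, shown in Python with the standard-library sqlite3 module so it runs anywhere. This is an assumption-heavy stand-in for Aurora: the table and column layout is invented for the example, and SQLite has no FOR UPDATE SKIP LOCKED, so the poller here is a plain SELECT suitable only for a single poller.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ledger_entries (id INTEGER PRIMARY KEY, account TEXT NOT NULL, amount INTEGER NOT NULL);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT NOT NULL, published_at TEXT);
    """)
    return conn


def post_entry(conn: sqlite3.Connection, account: str, amount: int, fail: bool = False) -> None:
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the ledger row and the event row are committed or discarded together.
    with conn:
        conn.execute("INSERT INTO ledger_entries (account, amount) VALUES (?, ?)", (account, amount))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (f"posted:{account}:{amount}",))
        if fail:
            raise RuntimeError("simulated crash before commit")


def poll_unpublished(conn: sqlite3.Connection, batch_size: int = 10) -> list:
    return conn.execute(
        "SELECT id, payload FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()


def mark_published(conn: sqlite3.Connection, row_id: int) -> None:
    # Stamp the row only after the downstream publish has succeeded.
    with conn:
        conn.execute("UPDATE outbox SET published_at = datetime('now') WHERE id = ?", (row_id,))
```

A rolled-back ledger write leaves no outbox row (no phantom event), and a committed one always has its event waiting for the poller (no lost event). If the poller crashes between publishing and mark_published, the row is published again, which is why consumers dedupe.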

from: Distributed payment ledger with idempotent settlement

Per-channel SQS Standard buffer with dead-letter redrive

When: Delivery targets (APNs, FCM, email, SMS) have independent rate limits and failure modes, and a burst (10M jobs from one celebrity event) can exceed any single provider's — or FIFO's ~3,000 TPS per-group — sustainable throughput by orders of magnitude.
AWS: One SQS Standard queue per channel (mobile-push, email, in-app, SMS), each carrying a mandatory tenant_id message attribute and drained by a channel-specific consumer Lambda at a controlled rate. Standard is chosen over FIFO because the dedup store (not the queue) enforces idempotency and the ULID (not arrival order) carries display order, so FIFO's throughput ceiling buys nothing here. Cap Lambda reserved concurrency as the back-pressure valve. After 3 failed receives, transient failures redrive to source; malformed payloads route to a separate poison DLQ. A CloudWatch alarm fires on DLQ depth over 100.
Trade-off: SQS Standard is at-least-once (and unlike FIFO offers no ordering or built-in dedup), so a separate 30-day idempotency layer is mandatory. Per-channel queues also multiply operational surface (more queues, more DLQs, more alarms).

from: Push notification fan-out at scale

realtime

Optimize the path that says no, not the path that says yes

When: A request handler accepts a small fraction of traffic (a DSP bids on 2-5 percent, no-bids 95-97 percent) but a naive design does equal work on both paths, sizing the fleet for the rejected majority.
AWS: Drop non-qualifying requests at the network layer with the AWS RTB Fabric inline OpenRTB filter and per-partner rate limiter before they reach the app; short-circuit cheap eligibility checks in the bidder before invoking the in-process model so the expensive scoring runs only on the ~3 percent that can win.
Trade-off: More logic at the edge and a coarse filter that may drop a marginally-winnable request, in exchange for sizing compute for the ~3 percent of traffic that survives filtering rather than for a raw bidstream roughly 33x larger.

from: Real-time bidding engine at scale

Treat the manifest as a heartbeat with a seconds-scale TTL, not as uncacheable

When: Live HLS/DASH players re-poll the manifest every 1-2s and freeze - all at once - if it stops advancing for ~10s. The manifest is simultaneously the highest-QPS object and the one that must be freshest, so getting its cache and freshness model wrong desyncs the whole audience together.
AWS: Serve the manifest from MediaPackage through CloudFront with a short ~1s Cache-Control TTL (not no-store) so request collapsing lets the origin see roughly one fetch per second per POP instead of one per viewer, while the TTL still bounds live-edge staleness. Monitor manifest age (now minus newest-segment PTS) as a CloudWatch metric and alarm on it - a stale 200 is more dangerous than a 500. Drop segment size to ~200ms LL-HLS parts with blocking playlist reload when sub-5s latency is required.
Trade-off: A 1s TTL means viewers can be up to a second behind the absolute edge, and marking the manifest truly non-cacheable to chase that second defeats request collapsing and dumps the full poll storm (millions of req/s) onto the origin. LL-HLS parts cut latency but multiply request volume ~30x and hold connections open via blocking reload - at large scale that can hit a single MediaPackage endpoint's hard quota (~1,000 manifest req/s, ~500 segment req/s), forcing multiple endpoints behind path-based CloudFront routing or a fan-out tier that absorbs the blocking long-polls before they reach the origin.
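Sketch: One way to compute a manifest-age signal in Python, assuming the playlist carries EXT-X-PROGRAM-DATE-TIME tags and that the last tag in the file belongs to the newest segment. This approximates "now minus newest-segment PTS" with the tag's wall-clock time; the 10-second threshold mirrors the player freeze point mentioned above, and treating a manifest with no timestamp as stale is an assumption of this sketch.

```python
from datetime import datetime, timezone
from typing import Optional

PDT_TAG = "#EXT-X-PROGRAM-DATE-TIME:"


def newest_program_date_time(manifest: str) -> Optional[datetime]:
    """Return the timestamp of the last PROGRAM-DATE-TIME tag, or None if there is none."""
    newest = None
    for line in manifest.splitlines():
        if line.startswith(PDT_TAG):
            # Rewrite a trailing Z as an explicit UTC offset so fromisoformat accepts it.
            stamp = line[len(PDT_TAG):].strip().replace("Z", "+00:00")
            newest = datetime.fromisoformat(stamp)
    return newest


def manifest_age_s(manifest: str, now: datetime) -> Optional[float]:
    newest = newest_program_date_time(manifest)
    if newest is None:
        return None
    return (now - newest).total_seconds()


def is_stale(age_s: Optional[float], threshold_s: float = 10.0) -> bool:
    # Assumption: a manifest with no usable timestamp is treated as stale.
    return age_s is None or age_s > threshold_s
```

The value returned by manifest_age_s is what would be published as the metric; a manifest that still returns HTTP 200 but whose age keeps growing is exactly the stale-200 case the alarm is meant to catch.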

from: Adaptive bitrate live streaming pipeline

Close the loop - wire the staleness signal to an actuator, not a dashboard

When: A manifest-age canary that only alarms is a dashboard: it detects the frozen heartbeat but nothing acts on it. For channel-level recovery (a stalled MediaPackage origin or a wedged MediaLive channel) you need detection to mechanically trigger promotion of a standby channel and a CloudFront origin-group repoint, fast and without a human at 3am.
AWS: Compute manifest age at the edge via a CloudFront Functions response handler reading EXT-X-PROGRAM-DATE-TIME and push it as a per-channel CloudWatch embedded metric (no external Synthetics poller, so no O(channels) polling load and no CDN cache masking). A CloudWatch alarm fires an EventBridge rule that invokes a Lambda which promotes the standby MediaLive channel in a second region via API and updates the CloudFront origin group to the secondary MediaPackage endpoint. Viewer session state on DynamoDB on-demand with Global Tables gives cross-region continuity; the slate-fallback circuit breaker runs in CloudFront Functions, not Lambda at Edge, because it is a pure header/redirect rewrite with no outbound call.
Trade-off: A cold standby promoted in another region takes 10-30s and forces a player timeline re-sync, so viewers rebuffer once - it is the recovery of last resort below player backoff and dual-pipeline, not the first reflex. Global Tables replication is sub-second but not zero, so an in-flight session write can be lost at the instant of region failure; players must degrade to a live-edge restart rather than block on a session lookup.

from: Adaptive bitrate live streaming pipeline

geo

Index points in a geo set and answer radius queries directly

When: You need 'all entities within R of a point' under tight latency and you don't want to roll your own spatial index, scan-and-filter the whole table, or stand up a search cluster for what is fundamentally a sorted-set lookup. Reach here for the live 'now' index, not for cold analytics.
AWS: Store points with GEOADD on ElastiCache for Redis and answer with GEOSEARCH ... BYRADIUS (the successor to GEORADIUS). Redis encodes lon/lat as a 52-bit geohash inside a sorted set, so a single-cell radius query is range scans over the set — sub-millisecond and built in. Rejected alternatives: Amazon Location Service Trackers (no radius-query API, no per-asset TTL, no p99 SLA at high write rates) and Amazon OpenSearch (segment-refresh lag fights sub-10 s freshness, no sub-second per-document TTL, segment merges wreck p99 under heavy writes).
Trade-off: GEOSEARCH returns everything in a circle/box but can't express arbitrary polygons or 'inside this delivery zone' — anything beyond a radius needs a point-in-polygon post-filter in your service, and the set holds only point geometry, not the rich attributes you must join back from another store. GEOSEARCH is also a single-key command: a query whose radius spans many sharded cells needs the scatter-gather pattern, not one call.

from: Geospatial proximity search at scale

Scatter-gather across sharded geo cells, merge in-process

When: You shard a geo index by fixed-area cell (to spread dense-city write/read load) but a query radius covers many cells across many shards. A single-key radius command can't cross shard slots, so the 'one round trip' latency story is false the moment the radius exceeds one cell.
AWS: In the query service, compute the cell covering of the query circle (S2/H3), fire one GEOSEARCH per covering cell in parallel — pipelined per shard node — then merge and re-sort the candidate sets by true distance and apply attribute filters in-process. Co-locate any per-cell attribute lookup (e.g. an availability HMGET) on the same shard via a matching hash tag so each shard answers in one pipelined round trip.
Trade-off: A single query becomes dozens of parallel commands (read amplification), and p99 is bounded by the slowest shard in the fan-out, not the average — so the cell size must be chosen against the radius to keep the covering in the tens, not hundreds. Coarsen the shard cell if the covering blows up, trading write-load distribution for a tighter fan-out.
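Sketch: The fan-out and merge step in Python, with the per-cell lookup passed in as a callable so no Redis client is needed. fetch_cell is a hypothetical stand-in for one GEOSEARCH per covering cell, candidates are assumed to be (entity_id, lat, lon) tuples, and computing the cell covering itself (S2/H3) is left out.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, Tuple

Candidate = Tuple[str, float, float]  # (entity_id, lat, lon)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def scatter_gather(cells: Sequence[str],
                   fetch_cell: Callable[[str], Iterable[Candidate]],
                   center: Tuple[float, float],
                   radius_m: float,
                   limit: int) -> List[Tuple[str, float]]:
    # Scatter: one lookup per covering cell, issued in parallel.
    with ThreadPoolExecutor(max_workers=max(1, min(len(cells), 16))) as pool:
        per_cell = list(pool.map(fetch_cell, cells))
    # Gather: dedup by entity, filter by true distance, then re-sort globally.
    best = {}
    for candidates in per_cell:
        for entity_id, lat, lon in candidates:
            d = haversine_m(center[0], center[1], lat, lon)
            if d <= radius_m:
                best[entity_id] = d
    return sorted(best.items(), key=lambda kv: kv[1])[:limit]
```

Total latency here is set by the slowest fetch_cell call, which is the p99 concern the trade-off describes; attribute filters would be applied in the gather loop alongside the distance check.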

from: Geospatial proximity search at scale

Always query the cell and its 8 neighbors

When: You shard or index by geohash/grid cell and answer proximity with a prefix or single-cell lookup — two entities 10 m apart but across a cell boundary get different prefixes, so a single-cell query silently drops half of them right where density matters most.
AWS: Expand the target cell to its 3x3 (N+8) neighbor set before querying, or let Redis GEOSEARCH do the expansion for you — it computes the covering neighbor cells internally so the boundary bug never reaches your code.
Trade-off: Correctness costs you up to a 9x read amplification per query (nine cell ranges instead of one), and at fine resolutions the radius may still spill past the first neighbor ring, so the cell size must be chosen against the query radius.
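Sketch: The boundary problem and the 3x3 expansion on a simplified planar grid in Python. Real geohash or S2 cells need their own neighbor functions (and handle wrap-around at the antimeridian and poles); the metre-based grid below is an assumption made purely to keep the example short.

```python
import math
from typing import List, Tuple

Cell = Tuple[int, int]


def cell_of(x_m: float, y_m: float, cell_size_m: float) -> Cell:
    """Map a planar coordinate in metres to its grid cell."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def cell_and_neighbors(cell: Cell) -> List[Cell]:
    """Return the cell plus its 8 surrounding cells (the 3x3 block)."""
    cx, cy = cell
    return [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
```

Two points at x=995 m and x=1005 m are 10 m apart but fall in different 1 km cells; a query that reads cell_and_neighbors() of either point's cell still finds the other.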

from: Geospatial proximity search at scale

rate-limiting

Read soft caps from replicas, write on the primary

When: A per-user frequency cap must be checked hundreds of thousands of times per second, more than a single counter primary can serve, but the cap protects user experience rather than dollars.
AWS: Store freq:{user}:{campaign}:{day} integer counters on ElastiCache for Valkey 8 in cluster mode (shard count derived from target ops/s divided by ~100k ops/s per shard); read from per-shard replicas at bid time (1-5 ms lag) and INCR + EXPIRE 86400 on the primary at billing time; use Count-Min Sketch for extreme-scale soft caps and a Bloom filter for has-seen-at-all checks.
Trade-off: Accept 5-10 percent over-cap from replication lag and probabilistic over-count, in exchange for linear read scalability - a looseness acceptable for impressions but never reused for money.

from: Real-time bidding engine at scale

Push rate limiting and request filtering below the app tier

When: A sub-10 ms latency budget and a 10x thundering-herd surge (header-bidding inflation sends one impression as 1,500 requests) make it impossible to filter and rate-limit junk inside the app without melting the fleet.
AWS: Use AWS RTB Fabric for ingress - single-digit-ms private networking with an inline Rate Limiter (per-partner QPS caps and bid-stuffing defense), an inline OpenRTB Filter that drops wrong-geo and wrong-format requests before the app, and Error Masking; keep supply-chain checks (sellers.json, ads.txt) in the bidder app, not the Fabric.
Trade-off: Coupling to a young, six-region managed service billed per message hop (two hops per auction - request in and response/204 out). The ~80 percent networking saving holds only when exchange partners are RTB Fabric participants (internal rate vs ~7x external); assume 70 percent-plus internal volume and verify per partner. Accepted because the combined sub-10 ms latency and internal-traffic saving have no equivalent in composed AWS primitives.

from: Real-time bidding engine at scale

Make the limiter decision atomic with one Lua script

When: A rate-limit decision reads a counter, branches on it, increments, and conditionally sets a TTL — and concurrent requests or a mid-sequence crash can leave a key with no expiry, silently disabling the limit.
AWS: Run the whole read-branch-increment-expire as a single EVAL on ElastiCache for Redis, using redis.call('TIME') as the one authoritative clock.
Trade-off: All decision logic lives in a Lua script you must test and version, in exchange for true atomicity that MULTI/EXEC and WATCH can't give you without retry contention.

from: Distributed rate limiting at API gateway scale

Approximate the window with two counters, not a log

When: You need a smooth rolling limit without the 2x boundary burst of a fixed window, but a per-request timestamp log would OOM the store under attack (tens of GB at the limit).
AWS: Store two integer keys per tenant (current + previous window) on ElastiCache and compute a weighted estimate; hash-tag the keys so the pair lands on one cluster slot; guard EXPIRE to fire only on the first write of the window (cur == 1), not every request.
Trade-off: A bounded approximation error (~half the previous window's rate at the transition) in exchange for ~11,700x less memory than a sliding-window log — not suitable for billing-grade exactness; treat slot-migration TRYAGAIN/ASK/MOVED as fail-open rather than retrying.
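Sketch: The weighted two-counter estimate in Python, assuming the common sliding-window-counter formula: the previous window's count is weighted by the fraction of it that still overlaps the rolling window, then added to the current count. Function and parameter names are illustrative.

```python
def estimated_count(prev_count: int, cur_count: int,
                    elapsed_in_window_s: float, window_s: float) -> float:
    """Estimate requests in the rolling window from two fixed-window counters."""
    # The further into the current window we are, the less the previous window counts.
    weight = 1.0 - (elapsed_in_window_s / window_s)
    return prev_count * weight + cur_count


def allow(prev_count: int, cur_count: int,
          elapsed_in_window_s: float, window_s: float, limit: int) -> bool:
    """Admit the request only while the estimate is below the limit."""
    return estimated_count(prev_count, cur_count, elapsed_in_window_s, window_s) < limit
```

For example, 15 s into a 60 s window with 100 requests in the previous window and 20 so far in the current one, the estimate is 100 * 0.75 + 20 = 95. The approximation comes from assuming the previous window's requests were evenly spread.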

from: Distributed rate limiting at API gateway scale

Reward well-behaved bursts with a token bucket

When: Tenants have legitimately bursty traffic (batch jobs, expensive queries) that a flat per-second window would unfairly punish.
AWS: A Redis hash per tenant (tokens, last_refill) refilled lazily in Lua at rate r up to cap b; bill expensive operations (e.g. GraphQL query cost) as more tokens.
Trade-off: You allow bursts up to b above the sustained rate, and must use integer/server-clock arithmetic to avoid float drift and clock-skew bugs, versus the perfectly smooth output of a leaky bucket.
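Sketch: A lazily refilled token bucket in Python using integer milli-tokens and a caller-supplied clock, to show the integer/server-clock arithmetic the trade-off calls for. In the pattern this logic would live in a Lua script against a Redis hash; the Bucket dataclass and try_consume function are hypothetical stand-ins.

```python
from dataclasses import dataclass

SCALE = 1000  # store milli-tokens so refill needs no floating point


@dataclass
class Bucket:
    milli_tokens: int
    last_refill_ms: int


def try_consume(bucket: Bucket, now_ms: int, rate_per_s: int, capacity: int, cost: int = 1) -> bool:
    """Refill lazily from elapsed time, then take `cost` tokens if available."""
    # Ignore a clock that appears to run backwards instead of refilling negatively.
    elapsed_ms = max(0, now_ms - bucket.last_refill_ms)
    # rate_per_s tokens per second equals rate_per_s milli-tokens per millisecond.
    refilled = min(capacity * SCALE, bucket.milli_tokens + elapsed_ms * rate_per_s)
    bucket.last_refill_ms = max(bucket.last_refill_ms, now_ms)
    bucket.milli_tokens = refilled
    needed = cost * SCALE
    if refilled < needed:
        return False
    bucket.milli_tokens = refilled - needed
    return True
```

A full bucket with capacity 10 admits a burst of 10 at once and then sustains 5 requests/s at rate 5; an expensive operation passes a larger cost and drains the bucket faster.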

from: Distributed rate limiting at API gateway scale

Counter-shard the whale tenants within a cell

When: Hash-tagging pins a tenant's keys to one slot for atomicity, but a single very-high-rate tenant then drives all its EVALs to one shard and saturates it while others sit idle.
AWS: For tenants above an ops threshold, split the key into M sub-keys ({rl:t:s0}..{rl:t:sM-1}) on ElastiCache, route each request to a random sub-shard, and sum the approximate sub-counts.
Trade-off: An extra over-count on whale tenants (the same approximation accepted at the cell level, one level down) in exchange for spreading their load off a single hot slot; the common tenant stays exact.

from: Distributed rate limiting at API gateway scale
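
A small sketch of the sub-key routing and the summed read follows. The key format mirrors the `{rl:t:s0}` naming in the pattern; the helper names and the plain dict standing in for Redis are assumptions.

```python
import random


def all_sub_keys(tenant: str, shards: int) -> list[str]:
    """Every sub-key for a tenant. Each key carries a different hash tag,
    so the sub-counters spread across slots instead of sharing one."""
    return [f"{{rl:{tenant}:s{i}}}" for i in range(shards)]


def pick_sub_key(tenant: str, shards: int) -> str:
    """Route one request to a uniformly random sub-shard."""
    return f"{{rl:{tenant}:s{random.randrange(shards)}}}"


def total_count(counts: dict[str, int], tenant: str, shards: int) -> int:
    """Sum the sub-counts; `counts` stands in for the per-key values in Redis."""
    return sum(counts.get(key, 0) for key in all_sub_keys(tenant, shards))
```

Because the sub-counters are read separately rather than atomically, the total is an approximation, which is the over-count the trade-off accepts.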

id-generation

k-sortable IDs with a worker slot claimed at cold-start

When: You need a high-throughput, collision-resistant payment ID that also range-scans well on the ledger — UUIDv4 is random, so it fragments B-tree inserts and makes time-range queries a full scan — but Lambda has no stable worker identity and two cold-starts in the same millisecond would collide a naive Snowflake.
AWS: Generate a 63-bit Snowflake variant: 41-bit ms timestamp (custom epoch) + 14-bit worker ID + 8-bit sequence (16,384 workers, 256 IDs/ms/worker). Each Lambda instance claims a unique worker ID at cold-start via a DynamoDB conditional put on the slot, so no two live workers share a number. Use ULIDs for audit/event IDs (monotonic-in-ms, base32, URL-safe). Expose only an HMAC-derived ext_ref to clients, never the predictable numeric ID, to prevent BOLA enumeration.
Trade-off: You cap concurrent workers at 16,384 (14 bits — ample over the ~6,000 concurrent Lambdas at peak, alarmed at 80% utilisation) and depend on roughly-synced clocks (NTP) plus a clock-rollback guard, in exchange for coordinator-free, time-ordered IDs after boot. UUID v7 is the coordination-free alternative that deletes the worker-slot table entirely, but at 128 bits it doubles every B-tree index width versus the 63-bit integer — half the index size is load-bearing at billions of ledger rows, so we keep Snowflake.

from: Real-time payment processing with distributed sagas
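
The 41 + 14 + 8 bit layout can be sketched as below. The code assumes the worker ID has already been claimed (the DynamoDB conditional put is not shown), treats one Lambda instance as single-threaded, and uses a hypothetical custom epoch of 2020-01-01 UTC; class and constant names are illustrative.

```python
import time

EPOCH_MS = 1_577_836_800_000          # 2020-01-01T00:00:00Z, hypothetical custom epoch
WORKER_BITS, SEQ_BITS = 14, 8
MAX_WORKER = (1 << WORKER_BITS) - 1   # 16,383
MAX_SEQ = (1 << SEQ_BITS) - 1         # 255


class SnowflakeGenerator:
    def __init__(self, worker_id: int, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id does not fit in 14 bits")
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        now = self._clock()
        if now < self._last_ms:
            # Clock-rollback guard: refuse to mint rather than risk a duplicate.
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            self._seq += 1
            if self._seq > MAX_SEQ:
                # Sequence exhausted for this ms: wait for the next millisecond.
                while now <= self._last_ms:
                    now = self._clock()
                self._seq = 0
        else:
            self._seq = 0
        self._last_ms = now
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS)) \
            | (self.worker_id << SEQ_BITS) | self._seq


def unpack(snowflake_id: int) -> tuple[int, int, int]:
    """Return (timestamp_ms, worker_id, sequence)."""
    seq = snowflake_id & MAX_SEQ
    worker = (snowflake_id >> SEQ_BITS) & MAX_WORKER
    ts_ms = (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
    return ts_ms, worker, seq
```

Because the timestamp occupies the high bits, sorting by the integer sorts by creation time first, which is what makes ledger range scans cheap.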

Generate sortable IDs without coordination at extreme QPS

When: You must mint hundreds of millions of IDs per second for ledger and log entries, where OS-entropy UUID generation becomes a contention hotspot and random UUIDs throw away the ordering reconciliation needs.
AWS: Use Snowflake-style 64-bit IDs (timestamp + worker + sequence, worker id from ECS task metadata) for spend-ledger and bid-log entries - 4M+ monotonic sortable IDs per worker per second with no coordination; use per-thread PRNG-seeded generators for within-response Bid.id where ordering is not needed.
Trade-off: Snowflake leaks approximate creation time and demands worker-id assignment and backward-clock-skew handling, in exchange for never touching a central sequence or shared entropy pool in the hot path.

from: Real-time bidding engine at scale

ULID as time-ordered inbox sort key

When: You need a globally unique ID for high-volume time-series rows that are read newest-first, without a central coordinator, and you want pagination for free.
AWS: Generate a ULID (48-bit ms timestamp + 80-bit random) in the worker and store it as the DynamoDB sort key under a user partition key. Lexical order equals time order, so newest-first is a descending range query and the sort key doubles as the pagination cursor.
Trade-off: Ordering is only monotonic to the millisecond - two IDs minted in the same ms on different workers have arbitrary relative order. You give up the strict per-worker sequencing and collision-free uniqueness that Snowflake gets from coordinated machine IDs, in exchange for zero coordination.

from: Push notification fan-out at scale

coordination

Row-level lease for concurrent duplicate serialisation

When: Two identical requests carrying the same operation identity arrive milliseconds apart on different stateless instances, both miss any cache, and a plain check-then-act would let both proceed (a TOCTOU race) and execute the side effect twice - the double-click on Pay.
AWS: The first transaction wins the key row via a conditional insert and sets status PROCESSING; the second transaction's insert conflicts, so it takes a SELECT ... FOR UPDATE row-level lock on that key in Aurora PostgreSQL, blocks until the first commits, then reads the now-terminal status and replays the cached response. The database row lock is the coordinator - one request gets the lease to proceed, the other waits and replays - with no application-level locking service.
Trade-off: The lock makes the key row a serialisation point, so a pathologically hot key serialises its duplicates and a long-running first execution makes its duplicates wait. You depend on the relational primitive (row lock plus conditional insert in one transaction), which is why this lives in Aurora and not in a pure key-value cache.

from: Idempotent payment gateway

Limit per cell, accept bounded over-counting

When: A globally exact counter would need a cross-region synchronization point whose round-trip exceeds your entire latency budget.
AWS: Run an independent ElastiCache cluster per cell; a single control-plane authority computes per_cell_limit = global_limit / active_cell_count from a DynamoDB cell-registry and republishes on cell join/leave; pin clients to scaleReads:'master' and assert ROLE=primary on startup (never read replicas — stale = silent bypass).
Trade-off: A tenant spread across C cells can consume up to C times its global limit, but only within the recompute window (alarm at 1.5x, page at 3x), in exchange for local sub-millisecond hops and a cell-scoped blast radius.

from: Distributed rate limiting at API gateway scale

Fail open with a local backstop and a circuit breaker

When: The shared rate-limit store can fail or slow down, and a limiter bug must never reject legitimate paying traffic — but going fully open invites abuse during the outage.
AWS: On ElastiCache error/timeout, allow and emit a metric; keep WAF/usage-plan throttles as the volumetric floor that survives a Redis outage; size the in-process backstop bucket as tenant_limit / current_healthy_instances (from EDS) with a hard ceiling; trip a circuit breaker on sustained errors and recover half-open with U(0, 0.2*cooldown) jitter and a single probe.
Trade-off: Brief over-allowance during failover and coarser enforcement in degraded mode, bounded by the edge floor so induced-Redis-failure can't become unbounded pass-through, in exchange for never blocking real customers when the limiter is sick.

from: Distributed rate limiting at API gateway scale

Distribute quota config without a single-coordinator freeze

When: Live config (per-tenant limits) is distributed from a store to many stateless deciders; a single Streams->Lambda->push path can stall (iterator lag, poison record, fan-out storm) and freeze all tenants on stale limits.
AWS: DynamoDB Streams triggers a reserved-concurrency Lambda (DLQ + BisectBatchOnFunctionError, IteratorAge alarm) that publishes one event to SNS; each cell consumes via its own SQS queue; deciders snapshot config from DynamoDB on startup and serve last-known-good on staleness; AppConfig gates staged rollout/rollback.
Trade-off: More moving parts than a direct push, in exchange for bounded fan-out (O(M) publishes + O(N) batched consumers), no forward-progress single point, and a deterministic cold-start config.

from: Distributed rate limiting at API gateway scale

Jitter the reset to defuse the retry stampede

When: Throttled clients all read the same reset timestamp and retry on the same millisecond, turning a transient limit into a self-inflicted synchronized DDoS.
AWS: Return 429 with a jittered Retry-After and prefer no-hard-reset algorithms (sliding window / token bucket); have clients use full-jitter exponential backoff: sleep = min(cap, base*2^n) * U(0,1).
Trade-off: Slightly longer worst-case client wait in exchange for spreading retries across a window instead of concentrating them at a clock boundary.

from: Distributed rate limiting at API gateway scale
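
The client-side formula above translates directly into code. This sketch assumes delays in seconds and illustrative defaults for `base_s` and `cap_s`; the `rng` parameter exists only so the function can be tested deterministically.

```python
import random


def full_jitter_delay(attempt: int, base_s: float = 0.1, cap_s: float = 20.0,
                      rng=random.random) -> float:
    """sleep = min(cap, base * 2^n) * U(0, 1), where n is the retry attempt."""
    return min(cap_s, base_s * 2 ** attempt) * rng()
```

A retry loop would call `time.sleep(full_jitter_delay(n))` before attempt n + 1, so clients that were throttled together wake up spread across the whole interval rather than on its boundary.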

media-cdn

Two-tier edge compute split

When: Edge logic mixes cheap per-request string work (key rewrite, auth check) with expensive per-miss byte work (image transform).
AWS: CloudFront Functions (1 ms, no cold start) on viewer-request for normalization and signed-URL validation; Lambda@Edge on origin-request for the transform, running on misses only.
Trade-off: Two runtimes to test and deploy, with Lambda@Edge replication lag on every change - in exchange for a roughly 7x cheaper viewer layer.

from: Global image delivery with CloudFront and edge-side transformation

Pre-compute as the primary path

When: Derivatives are small and predictable and re-requested over a lifetime, so V x P_render is less than F_avg x P_transform - making pre-compute cheaper than on-the-fly and keeping Lambda off the critical path (concurrency, failover, SWR cold-miss gaps).
AWS: S3 upload event triggers Step Functions to validate (magic bytes, dimension caps, Rekognition moderation), render all standard variants to S3, and warm-prefetch them through CloudFront before go-live; on-the-fly Lambda@Edge transform is the long-tail fallback only.
Trade-off: You store variants that may never be requested and re-render on a schema or transform-version change - only wins when the variant set is small and the crossover inequality holds (fails for high-cardinality, rarely-requested UGC).

from: Global image delivery with CloudFront and edge-side transformation

Three-layer per-tenant isolation

When: Multi-tenant delivery where a valid signature must not be enough to read another tenant's content - isolation has to bind signing, path, and IAM, not just trust a key.
AWS: CloudFront Trusted Key Groups per tenant validated in the viewer Function, which also scopes the signed Resource to the tenant prefix, asserts the requested prefix matches the validating key group, and rejects path traversal; the transform role assumes a session tagged with the authenticated tenant ID so an s3:prefix IAM condition bounds its read per request; S3 keys derived server-side to eliminate SSRF.
Trade-off: Config and CMKs grow with tenant count against the behaviors-per-distribution cap, forcing sharding (dedicated behaviors/distributions for large tenants, shared prefix-routed behavior for the long tail) - and sharding fixes signing isolation only, so the session-tag prefix condition is the separate control for origin/IAM isolation.

from: Global image delivery with CloudFront and edge-side transformation

Keyframe-align every rendition of the ABR ladder

When: You transcode one source into multiple bitrate renditions and want players to switch between them mid-stream as bandwidth changes. Seamless switching is only possible if segment N of every rendition covers the identical wall-clock interval and starts on an aligned keyframe - otherwise the player glitches at the splice or stalls waiting for the next GOP boundary.
AWS: Run the ladder as a single AWS Elemental MediaLive channel with a fixed GOP length that is an exact multiple of the segment duration and keyframes forced at segment boundaries across all outputs, so every rendition's segments share identical presentation timestamps. MediaLive enforces this when GOP and segment length are set consistently; a hand-rolled FFmpeg ladder with a mismatched -g per rendition silently breaks switching.
Trade-off: Forcing keyframes at fixed boundaries spends bitrate - you can't let the encoder place IDR frames purely where the content wants them, so you pay slightly more bits for the same quality in exchange for switchability. The GOP-to-segment coupling also constrains how short you can make segments before keyframe overhead dominates.

from: Adaptive bitrate live streaming pipeline

Origin Shield to make live-edge origin load independent of audience size

When: The newest segment at the live edge has never been requested, so it is uncached by definition, and millions of players want it the instant it appears - a thundering herd straight at the origin. Edge request collapsing alone still leaves one fetch per POP (hundreds) per new segment, so origin load scales with the POP count instead of staying constant.
AWS: Enable CloudFront Origin Shield in the region nearest MediaPackage so all ~600 edge POPs route through one regional caching tier before reaching origin. The funnel becomes viewers to POPs to one Origin Shield to origin, collapsing the herd into roughly one origin fetch per segment - making origin load flat and audience-independent (a 10M-viewer event costs the origin the same as a 10k-viewer one).
Trade-off: Adds one cache hop of latency on a true edge miss and a per-request Origin Shield charge, and concentrates origin-facing traffic through a single regional tier - so that region's health becomes a dependency you must monitor. For tiny audiences the extra hop is pure overhead.

from: Adaptive bitrate live streaming pipeline

analytics-olap

CQRS balance projection — immutable journal, materialised read model

When: An account's balance is the sum of all its journal entries. Computing it by scanning every entry is O(entries) and a high-velocity account accumulates millions of rows — a balance read that re-sums history is unusable on the hot path.
AWS: The write side is the immutable Aurora journal. A CQRS projector — Lambda triggered by a Kinesis On-Demand stream of journal entries (fed by DMS / logical replication, partitioned by logical account_id so deltas apply in LSN order) — incrementally maintains a materialised balance per account in ElastiCache via atomic INCRBY (read model) and analytics snapshots in DynamoDB. ElastiCache Valkey is chosen over DAX because the hot path is an atomic in-memory increment, which DAX (a DynamoDB read-cache) does not provide. The API reads the cache; on a miss it falls back to a bounded Aurora aggregate over the latest snapshot plus recent entries, never the full history.
Trade-off: The read model is eventually consistent — there is a projection lag (typically sub-second) between a posted journal entry and the visible balance. You accept showing a slightly stale balance for the ability to read in O(1), and you must reconcile the projection against the journal of record to catch drift.

from: Financial ledger and double-entry accounting at scale

WORM audit trail with nightly reconciliation sweep

When: Compliance requires an immutable, queryable record of every journal entry retained for years, and the materialised balances must be provably equal to the sum of the journal of record. The reconciliation job itself is the only independent drift detector, so its silent failure must also be caught.
AWS: DMS streams journal entries to S3 as Parquet under S3 Object Lock (WORM, compliance mode) — write-once, immutable, queried by compliance via Athena. The same export doubles as the journal archive so Aurora keeps only a 90-day hot window. A nightly Step Functions + Athena job re-sums every account (GROUP BY account_id over Parquet, ~$17/run) and compares to the materialised balance cache, writing discrepancies to S3 and alerting when drift exceeds $0.01; Glue is reserved for the bulk genesis scan. A CloudWatch Alarm fires if the reconciliation has not SUCCEEDED within 26 hours — a dead-man's switch, because absence of a success is itself a page-worthy event. Audit is layered: CloudTrail covers the control plane; the Parquet export is the data-plane record.
Trade-off: Object Lock means data genuinely cannot be deleted before its retention expires — a misclassified or PII-bearing record is stuck for the retention term, so the schema must guarantee no PII enters the journal. Reconciliation is a batch backstop, not a real-time guarantee: drift is detected within a day, not within a second.

from: Financial ledger and double-entry accounting at scale

Lambda architecture for batch plus streaming features

When: Some features need heavy historical aggregation (correct but slow) while others need second-level freshness (recent but light), under one feature definition.
AWS: AWS Glue Spark for the batch path to the S3 offline store; Kinesis Data Streams plus Managed Service for Apache Flink for the stateful windowed streaming path. Flink (not Lambda) owns aggregation because window state must be durable and writes must be micro-batched under the Feature Store PutRecord limit.
Trade-off: Two code paths for the same logical feature mean two places skew can creep in; the registry must enforce a shared transformation to keep batch and stream identical.

from: ML feature store and low-latency inference serving

ml-serving

Point-in-time-correct feature retrieval

When: Building training sets from time-varying features, where using a feature value recorded after the label event leaks the future and inflates offline metrics.
AWS: SageMaker Feature Store offline store: partitioned Parquet in S3 with event-time and ingestion timestamps, queried with an as-of (ASOF) join that takes the latest feature value strictly before each label timestamp.
Trade-off: The as-of join is an O(N log N) sort-merge scan, not a cheap lookup, so training-set assembly is a batch job measured in minutes, not an interactive query.

from: ML feature store and low-latency inference serving
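
The as-of rule, latest value strictly before the label timestamp, can be shown on an in-memory list. This is a toy stand-in for the Athena or Spark join over the offline store; `as_of` and the `(event_ts, value)` tuple shape are assumptions.

```python
from bisect import bisect_left


def as_of(feature_history: list[tuple[int, object]], label_ts: int):
    """feature_history is sorted by event_ts. Return the latest value whose
    event_ts is strictly before label_ts, or None if there is none."""
    timestamps = [ts for ts, _ in feature_history]
    i = bisect_left(timestamps, label_ts)   # first index with event_ts >= label_ts
    return feature_history[i - 1][1] if i > 0 else None
```

With history `[(10, "a"), (20, "b"), (30, "c")]`, a label at 25 sees `"b"` and a label at exactly 20 sees `"a"`, because a value recorded at the label time is not yet known to be in the past.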

Fail-soft degraded inference

When: The online feature tier is unavailable or slow and the inference path must still return rather than error or silently serve zeros.
AWS: A hard 5 ms per-call timeout on the Redis read: a steady-state single-key miss falls back to Feature Store GetRecord, but a timeout during a mass failover trips straight to a default-feature snapshot baked into the serving container (no network hop), with the response tagged degraded and an alarm on degraded-response rate.
Trade-off: Waiting on GetRecord for 6,000 features during a 15 s-1 min Redis failover would brown out every request; the timeout sacrifices feature richness for a bounded-latency degraded answer, observable and discountable downstream.

from: ML feature store and low-latency inference serving

payments-exactly-once

Orchestrate the saga; compensate, never two-phase commit

When: A money movement spans three internal services plus an external gateway, and any step can fail independently — but a classic 2PC coordinator that dies after phase-1 commit (Robinhood's 72-hour ghost transfer) leaves funds in limbo with no recovery path.
AWS: Model the flow as a Step Functions Express Workflow: each forward step is an idempotent Lambda task, and each has an explicit compensating task (ReleaseBalance, VoidAuth) wired through a Catch into a reverse path. Every compensation is a guarded state transition (UPDATE ... WHERE status = :expected_status AND version = :v), not a relative delta, so an at-least-once replay matches zero rows and is a safe no-op; VoidAuth reuses the forward call's gateway idempotency-key. Sync execution returns the result in under the 800 ms checkout budget; the 5-minute Express ceiling is ample for synchronous auth.
Trade-off: You give up the illusion of a single atomic transaction and accept windows where the system is in a known-intermediate state (reserved-but-not-captured) that compensation must unwind; you must make every step AND every compensation idempotent — guarded transitions, never relative deltas — and reason about compensations that themselves fail, which a DLQ for human review backstops. Express does not persist execution history to the service (only CloudWatch Logs), so the authoritative state is the DynamoDB idempotency slot, not the workflow console.

from: Real-time payment processing with distributed sagas
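
The guarded transition is easiest to see by replaying it. The sketch below uses SQLite from the standard library as a stand-in for the real store; the `payments` table, its columns, and `compensate_release` are hypothetical names.

```python
import sqlite3


def compensate_release(conn: sqlite3.Connection, payment_id: str,
                       expected_status: str, expected_version: int) -> bool:
    """Apply the compensation only if the row is still in the expected state.
    Returns True if this call performed the transition, False if it was a no-op."""
    cur = conn.execute(
        "UPDATE payments SET status = 'RELEASED', version = version + 1 "
        "WHERE id = ? AND status = ? AND version = ?",
        (payment_id, expected_status, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO payments VALUES ('p1', 'RESERVED', 1)")

first = compensate_release(conn, "p1", "RESERVED", 1)    # performs the transition
replay = compensate_release(conn, "p1", "RESERVED", 1)   # at-least-once redelivery
```

The replay matches zero rows because both the status and the version have moved on, so it changes nothing. A relative delta such as `balance = balance + :amount` would have been applied twice.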

Gate every payment with a DynamoDB conditional put

When: A client retry races the original request to completion and both reach the gateway — the Stripe-2013 double-charge — so the very first thing on the path must collapse all copies of one logical request onto one outcome.
AWS: On entry, PutItem to a DynamoDB idempotency table with ConditionExpression attribute_not_exists(pk), pk = sha256(merchant_id + idempotency_key). First writer wins the slot (status PROCESSING, 24 h TTL); a ConditionalCheckFailed means the request is already in flight or done, so return the cached response, never re-execute.
Trade-off: You add ~5 ms and one strongly-consistent write to every payment, and you inherit a stuck-PROCESSING edge case (caller crashed mid-saga) that needs a lease timeout to reclaim — in exchange for a hard exactly-once boundary that volatile caches like Redis cannot guarantee.

from: Real-time payment processing with distributed sagas

Reconcile against the gateway before settlement, on a heartbeat

When: Even with idempotency, sagas, and locks, a dropped async event or a partition can leave the internal ledger and the gateway's view divergent - a captured charge with no ledger row, or a Monzo-style card reserve the merchant never captures and the bank silently releases at T+7d.
AWS: Push, not pull: subscribe to the gateway's real-time settlement webhook feed and land it via a second Kinesis Firehose stream into S3 Parquet, alongside ledger_entries snapshots — pulling 36M records/hour through a paginated API would not fit a Lambda timeout. An Athena join flags divergence to SNS. An EventBridge Scheduler heartbeat re-checks only pending reservations at T+7d, T+14d, T+30d (small volume) so a dropped release event cannot drift the ledger past settlement.
Trade-off: Reconciliation is eventually consistent (hourly, not per-transaction) and is a detective control, not a preventive one - it catches and surfaces divergence rather than stopping it. That is acceptable because settlement is T+1, which leaves a window to resolve divergence before money actually moves.

from: Real-time payment processing with distributed sagas

Idempotency-key dedup gate with atomic conditional insert

When: A payment command can be retried — by a flaky client, an API Gateway retry, an SQS redrive — and two retries can arrive concurrently before the first has committed. A naive read-then-write check ('does this key exist yet?') lets both retries pass and double-charges the account.
AWS: A DynamoDB idempotency table keyed on tenant_id#sha256(client_id + request_id + amount + currency + timestamp_bucket) — the tenant_id prefix lets dynamodb:LeadingKeys clamp each tenant's credential to its own partition space. The first thing every command does is a conditional PutItem with attribute_not_exists — an atomic compare-and-set that claims the key with status=PENDING. Exactly one writer wins the partition; the loser catches ConditionalCheckFailedException and returns the stored prior result. The Aurora journal commits as a SEPARATE phase (DynamoDB and Aurora are distinct transactional domains — there is no 2PC between them), then the claim flips to status=COMPLETE. A sweeper Lambda (EventBridge every 5 minutes) resolves any claim stranded in PENDING against Aurora — completing it or releasing it for retry. A 30-day TTL reaps the keyspace.
Trade-off: This is safe eventual consistency, not a single atomic unit: there is an observable window between the claim and the journal commit, bounded to ~5 minutes by the sweeper rather than to the 30-day TTL. The conditional write adds one synchronous round-trip on the hot path, and the timestamp_bucket bounds the dedup window. The idempotency store is a correctness-critical dependency, not a cache: if it is unavailable you must fail closed.

from: Financial ledger and double-entry accounting at scale

Deferred double-entry invariant (SUM of entries = 0 per transaction)

When: Every money movement must conserve value: for each transaction the debits and credits must net to exactly zero, or money was created or destroyed. But the debit row and the credit row cannot both be inserted in the same instant, so a row-level check would reject the half-written transaction.
AWS: Aurora PostgreSQL holds immutable journal entries (journal_entry_id, transaction_id, account_id, amount_cents BIGINT, direction). A DEFERRABLE INITIALLY DEFERRED constraint trigger (PostgreSQL CHECK constraints cannot be deferred or aggregate across rows, so the check has to live in a trigger) evaluates SUM(signed_amount) GROUP BY transaction_id = 0 at COMMIT, not per row - so a transaction can write its debit and credit legs and is only validated when complete. Amounts are integer cents (BIGINT), never floats, so the sum is exact. Entries are append-only: no UPDATE, no DELETE; a reversal is a new compensating transaction.
Trade-off: Deferred constraints defer the failure to commit time, so the application must handle a late rollback of an otherwise-accepted transaction. Integer cents means currency precision is fixed at the minor unit — sub-cent intermediate math (FX, interest accrual) must round explicitly and book the rounding remainder somewhere.

from: Financial ledger and double-entry accounting at scale
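
The invariant itself is a grouped sum over integer cents. This sketch checks it in application code over a completed batch of entries, which mirrors what the commit-time check evaluates; the sign convention (DEBIT positive, CREDIT negative) and the tuple shape are assumptions.

```python
from collections import defaultdict

SIGN = {"DEBIT": 1, "CREDIT": -1}  # assumed sign convention


def unbalanced_transactions(entries) -> dict[str, int]:
    """entries: iterable of (transaction_id, account_id, amount_cents, direction).
    Return {transaction_id: net_cents} for every transaction that does not net to 0."""
    totals: dict[str, int] = defaultdict(int)
    for txn_id, _account_id, amount_cents, direction in entries:
        totals[txn_id] += SIGN[direction] * amount_cents   # integer math, exact
    return {txn: net for txn, net in totals.items() if net != 0}
```

Running the check after each individual row would flag every transaction at the moment only its debit exists, which is why the database check has to wait until commit.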

Saga with compensating entries for cross-account transfer

When: A transfer debits account A and credits account B. If the process dies after the debit but before the credit, money is stranded — destroyed from A, never created in B. A naive two-step write has no atomicity across the legs.
AWS: Model the transfer as a single Aurora transaction when both legs share a database (the transaction supplies atomicity, and the deferred SUM=0 constraint rejects any unbalanced commit for free). When legs cross service boundaries, use a Step Functions Express saga: each step is idempotent on transaction_id#step_name, and any failure after the debit triggers a compensating reversal transaction (a new credit back to A) rather than a destructive rollback. Express (at-least-once, ~$8k/month at 2,000 sagas/s) is chosen over Standard (exactly-once, ~$648k/month) because exactly-once belongs on the per-step idempotency key, not on the orchestrator - which the design already provides.
Trade-off: A saga is eventually consistent and money is briefly in an intermediate state (debited from A, not yet credited to B) visible to reconciliation. Express gives at-least-once execution, so each step must be idempotent — a burden the design already carries. Compensation is itself a journal entry, so the audit trail shows the failed-and-reversed path rather than hiding it — the ledger records attempts, not just successes.

from: Financial ledger and double-entry accounting at scale
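
The two monthly figures can be roughly reproduced with back-of-envelope arithmetic. Every input except the 2,000 sagas/s rate is an assumption: a 30-day month, 5 state transitions per saga (chosen because it reproduces the quoted Standard figure), and list prices of $25 per million Standard transitions and $1 per million Express requests. Check current pricing before relying on these numbers.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600                 # assumption: 30-day month
SAGAS_PER_SECOND = 2_000                           # from the pattern text
TRANSITIONS_PER_SAGA = 5                           # assumption, not stated in the text
STANDARD_USD_PER_MILLION_TRANSITIONS = 25          # assumed list price
EXPRESS_USD_PER_MILLION_REQUESTS = 1               # assumed list price, excludes duration

sagas_per_month = SAGAS_PER_SECOND * SECONDS_PER_MONTH

# Standard Workflows bill per state transition.
standard_usd = (sagas_per_month * TRANSITIONS_PER_SAGA
                * STANDARD_USD_PER_MILLION_TRANSITIONS) // 1_000_000

# Express Workflows bill per request plus duration; only the request part is computed.
express_request_usd = (sagas_per_month * EXPRESS_USD_PER_MILLION_REQUESTS) // 1_000_000

print(standard_usd)          # 648000
print(express_request_usd)   # 5184
```

Under these assumptions Standard lands on the quoted ~$648k. The Express request charge alone is about $5.2k, so the rest of the quoted ~$8k would be duration charges, which depend on memory and run time not given here.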

Idempotency key with cached response replay

When: A mutating operation has an external side effect (a charge) that must run exactly once even when the response is lost to a timeout and the client retries, and concurrent retries can race in. You need an explicit identity for the operation - not a guess from request fields - and a way to return the original answer on every retry.
AWS: Client generates a V4 UUID once per logical operation and sends it as an Idempotency-Key header, reusing it across retries. The server persists ((tenant_id, idempotency_uuid), request_hash over a canonical form, status, response, expires_at, lease_expires_at) in Aurora PostgreSQL via INSERT ... ON CONFLICT DO NOTHING. First caller executes and stores the outcome; later callers replay the cached response on terminal status, get 409 while PROCESSING, and 422 when the same key arrives with a different request hash. Reads filter on expires_at in application code so best-effort TTL deletion is never a correctness boundary; expiry is jittered to avoid a stampede.
Trade-off: Every mutating request now pays a key-store round trip before doing work, and clients must persist and correctly reuse the key across retries (a new key means a new charge). Binding the key to a canonicalised request hash means a legitimate retry with any payload drift is rejected with 422 rather than silently replayed. The PROCESSING lease must be tuned: too short and a slow saga gets reclaimed mid-flight, too long and a crashed saga stays stuck until the sweeper finds it.

from: Idempotent payment gateway
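
The replay, 409, and 422 outcomes can be modelled with a dictionary standing in for the Aurora table. This single-threaded sketch leaves out expiry and the PROCESSING lease; `handle`, `request_hash`, and the record fields are illustrative names, and the membership check models winning `INSERT ... ON CONFLICT DO NOTHING`.

```python
import hashlib
import json


def request_hash(payload: dict) -> str:
    """Hash a canonical form so key order and whitespace do not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def handle(store: dict, tenant_id: str, key: str, payload: dict, execute):
    """Return (http_status, response) for one request carrying an Idempotency-Key."""
    slot = (tenant_id, key)
    digest = request_hash(payload)
    if slot not in store:
        # First caller: claim the key, run the side effect once, store the outcome.
        store[slot] = {"hash": digest, "status": "PROCESSING", "response": None}
        response = execute(payload)
        store[slot].update(status="COMPLETED", response=response)
        return 200, response
    record = store[slot]
    if record["hash"] != digest:
        return 422, None                 # same key, different request body
    if record["status"] == "PROCESSING":
        return 409, None                 # first attempt still in flight
    return 200, record["response"]       # replay the cached response
```

A retry with the same key and body gets the stored response without calling `execute` again. Changing any field of the body under the same key yields 422 instead of a second charge.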

Append-only ledger with streamed tamper-evident audit

When: A financial system must record every state transition (created, charged, settled, voided) as an immutable system of record, prove to an auditor that no record was altered, and fan the terminal outcome out to downstream consumers without coupling them to the charge path.
AWS: Each saga transition is appended to a DynamoDB ledger that is the system of record for SETTLED and VOIDED states. DynamoDB Streams feeds those transitions to S3 with Object Lock (WORM) in a dedicated logging account, so records cannot be altered or deleted within the retention window even by an admin - satisfying PCI Req. 10 and SOC 2 CC7 / NIST AU-9. Terminal outcomes also emit payment.completed / payment.failed to EventBridge for decoupled downstream fan-out.
Trade-off: Append-only means the ledger only grows; you pay storage and need a retention/archival strategy, and a correction is a new compensating entry rather than an update, so reads must fold the event history to get current state. The WORM immutability that satisfies auditors also means a genuinely wrong record cannot be deleted within retention - only annotated.

from: Idempotent payment gateway

Gate every payment on a conditional write to the primary

When: A client may retry a payment request after a lost response, a timeout, or a load-balancer hiccup, and a second execution would move money twice.
AWS: Client mints a UUID idempotency key before the first attempt and resends it on every retry. The server claims the key at PENDING with a conditional PutItem using attribute_not_exists(pk) on a DynamoDB table (key sharded tenant#id#shard#mod_N to avoid a hot partition), then drives it PENDING to PROCESSING to COMPLETE. A leaseExpiry on the PROCESSING row lets a crashed worker self-heal in ~30 s instead of orphaning the key for the TTL. On a ConditionalCheckFailed, the server reads the winner's stored response with ConsistentRead true and returns it verbatim.
Trade-off: Every payment pays for one strongly consistent DynamoDB write plus a consistent read on the retry path, and you must fail closed (reject) when the gate is unavailable — single-region and strongly consistent on purpose, since DynamoDB Global Tables (last-writer-wins, eventually consistent) would reopen the replica-lag double-charge hole. The gate's availability becomes a hard dependency of accepting any payment.

from: Distributed payment ledger with idempotent settlement

Model money as append-only double-entry pairs that sum to zero

When: You need an auditable, regulator-grade record of fund flows where balances are derivable and never silently corrupted by a partial write.
AWS: Write exactly one DEBIT and one CREDIT row per transaction in a single Aurora PostgreSQL ACID transaction, with a UNIQUE constraint on idempotency_key as a second line of defense behind the DynamoDB gate; never UPDATE or DELETE a row, only append reversing entries.
Trade-off: The ledger grows monotonically forever (7-year retention, no compaction) and corrections cost two extra reversing rows instead of an edit, in exchange for an immutable audit trail and a database-enforced no-duplicate guarantee that survives application bugs.

from: Distributed payment ledger with idempotent settlement

Orchestrate multi-step settlement with compensating transactions

When: A settlement spans steps that cannot share one ACID transaction (hold, external KYC verify, release, collect fee) and a mid-flight failure must reverse only the steps that already committed.
AWS: Step Functions runs the saga (Express for short settlements, Standard for long-running limbo cases); each state persists, steps retry with full-jitter backoff and a TimeoutSeconds on the external KYC call guarded by a DynamoDB circuit breaker; a Catch block runs a compensating state machine that reverses committed steps using derived idempotency keys ({key}:compensate:{step}); a stuck compensation flips the account to LIMBO and pages via EventBridge to SNS to AWS Systems Manager Incident Manager.
Trade-off: You accept eventual consistency across the settlement and the operational burden of compensation logic plus a manual-resolution path for stuck reversals, in exchange for a durable, restartable workflow with no distributed-transaction coordinator across external APIs.

from: Distributed payment ledger with idempotent settlement

Read money state in-band and fail closed

When: Stateless nodes spend a shared budget and a lost connection to the spend counter tempts a fail-open default - the exact gap that cost Meta advertisers $100K-$500K overnight during a DB failover.
AWS: A control-plane Lambda distributes the daily budget as token-chunk leases against a PID pacing curve. Nodes spend locally with no hot-path hop, and return a fast HTTP 204 within an 8 ms internal deadline if they cannot read their allowance or the lease has expired (after 2 missed reconciliation intervals). Spend is reconciled from a Kinesis burl stream every 1-5 s. A circuit-breaker Lambda, fed by both Kinesis and the CloudWatch EMF spend-velocity metric so that a Kinesis stall cannot disable it, pauses any campaign running over 3x its target for 60 s.
Trade-off: Accept 1-2 percent overspend and a 1-5 s reconciliation blind window in exchange for an off-hot-path budget check and a hard ceiling on runaway spend; fail-closed means a fast 204 (never a timeout, which Google throttles on), and the lease bounds even a stalled reconciliation.
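
A minimal sketch of the node-side check, assuming a lease is just a remaining allowance plus an expiry time. BudgetLease, grant_lease, decide, and the micro-currency units are hypothetical; the 5 s interval is the upper end of the 1-5 s range above, and the expiry of two intervals follows the pattern's rule.

```python
from dataclasses import dataclass
from typing import Optional

HTTP_BID = 200
HTTP_NO_BID = 204

RECONCILE_INTERVAL_S = 5.0                 # assumed; the pattern allows 1-5 s
LEASE_TTL_S = 2 * RECONCILE_INTERVAL_S     # lease dies after 2 missed intervals


@dataclass
class BudgetLease:
    remaining_micros: int  # allowance granted by the control plane
    expires_at: float      # seconds, on the same clock the caller passes as `now`

    def try_spend(self, price_micros: int, now: float) -> bool:
        """Spend locally; refuse if the lease has expired or cannot cover the price."""
        if now >= self.expires_at or price_micros > self.remaining_micros:
            return False
        self.remaining_micros -= price_micros
        return True


def grant_lease(chunk_micros: int, now: float) -> BudgetLease:
    """What a reconciliation tick would hand the node (control plane not shown)."""
    return BudgetLease(remaining_micros=chunk_micros, expires_at=now + LEASE_TTL_S)


def decide(lease: Optional[BudgetLease], price_micros: int, now: float) -> int:
    """Fail closed: a missing, expired, or exhausted lease yields a no-bid 204."""
    if lease is None or not lease.try_spend(price_micros, now):
        return HTTP_NO_BID
    return HTTP_BID
```

The point of the design is that every failure mode on the node collapses into the same cheap answer: no allowance in hand means no bid, so a stalled reconciliation can overspend by at most one outstanding chunk.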

from: Real-time bidding engine at scale

Account spend effectively-once on the billing notice

When: An auction win does not equal a charge - the impression may never render - and the same impression can arrive via multiple supply paths, so naively billing on the win notice double-charges or over-charges.
AWS: Bill on the OpenRTB billing notice (burl), not the win notice (nurl); stream events through Kinesis Data Streams (consumer event source mapping with a 1 s batching window, BisectBatchOnFunctionError, and a DLQ), deduplicated by the SSP-generated burl transaction id (trid); fold into a Snowflake-keyed ledger and archive to S3 under Object Lock as a tamper-evident audit trail.
Trade-off: This is effectively-once, not exactly-once: the burl trid is SSP-generated per auction and stays stable, but Prebid Aug-2025 trid fragmentation breaks real-time dedup on the request path - so real-time request dedup is abandoned in favor of structural Supply Path Optimization, at the cost of bidding into fewer paths.
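
The fold step can be sketched as below, assuming each billing notice arrives as a dict with trid, campaign_id, and price_micros fields (all hypothetical names). The in-memory set stands in for whatever durable store holds seen transaction ids in a real deployment.

```python
from collections import defaultdict


def fold_billing_notices(events, ledger=None, seen=None):
    """Add each billing notice to the ledger once, keyed by its trid.

    Returns (ledger, seen) so the caller can carry state across batches.
    """
    ledger = defaultdict(int) if ledger is None else ledger
    seen = set() if seen is None else seen
    for event in events:
        trid = event["trid"]
        if trid in seen:
            continue  # same impression via a redelivery or another supply path
        seen.add(trid)
        ledger[event["campaign_id"]] += event["price_micros"]
    return ledger, seen


batch = [
    {"trid": "t-1", "campaign_id": "c-9", "price_micros": 1200},
    {"trid": "t-1", "campaign_id": "c-9", "price_micros": 1200},  # duplicate
    {"trid": "t-2", "campaign_id": "c-9", "price_micros": 800},
]
ledger, seen = fold_billing_notices(batch)
```

This only deduplicates notices that share a trid. When the same impression reaches the ledger under different ids, the fold cannot tell, which is why the guarantee is described as effectively-once.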

from: Real-time bidding engine at scale

notification-fanout

Hybrid two-tier fan-out (write-time + read-time merge)

When: An event must reach every follower, but follower counts span six orders of magnitude — most users have hundreds, a celebrity has tens of millions. A single fan-out strategy either detonates on write (celebrity) or wastes reads (everyone else).
AWS: Kinesis Data Streams (partition by source user) feeds fan-out Lambdas. Below the ~10k-follower threshold, write one notification per follower into a DynamoDB inbox (materialised, cheap reads). Above it, write one celebrity-event pointer and merge it into followers' streams at read time. The threshold is config, tuned from cost/latency telemetry.
Trade-off: Celebrity-follower inbox reads become more expensive and complex — every read must fetch and time-merge celebrity pointers. You also accept a small consistency window where a celebrity post appears in followers' streams milliseconds apart rather than atomically.
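
A minimal in-memory sketch of the two tiers. The Feed class and its dictionaries are hypothetical stand-ins for the fan-out Lambdas and the DynamoDB inbox, and treating a count exactly at the threshold as celebrity is an arbitrary choice made here.

```python
import heapq
from collections import defaultdict

FANOUT_THRESHOLD = 10_000  # config in the real system, tuned from telemetry


class Feed:
    def __init__(self, followers, threshold=FANOUT_THRESHOLD):
        self.followers = followers                 # author -> set of follower ids
        self.threshold = threshold
        self.inbox = defaultdict(list)             # follower -> [(ts, author, event)]
        self.celebrity_events = defaultdict(list)  # author -> [(ts, author, event)]

    def publish(self, author, ts, event):
        item = (ts, author, event)
        audience = self.followers.get(author, set())
        if len(audience) >= self.threshold:
            # Celebrity: one write regardless of follower count.
            self.celebrity_events[author].append(item)
        else:
            # Everyone else: materialise one inbox row per follower.
            for follower in audience:
                self.inbox[follower].append(item)

    def read(self, user, following):
        """Time-merge the user's inbox with events from followed celebrities."""
        streams = [sorted(self.inbox[user])]
        streams += [
            sorted(self.celebrity_events[author])
            for author in following
            if author in self.celebrity_events
        ]
        return list(heapq.merge(*streams))
```

The read path shows where the extra cost lands: each read does one lookup per followed celebrity plus a merge, while non-celebrity events are already sitting in the inbox.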

from: Push notification fan-out at scale

Effectively-once delivery via MemoryDB SET NX dedup window

When: An at-least-once delivery pipeline must not double-notify users, the dedup window must outlast every downstream provider's retry horizon, and the dedup state must survive a failover rather than resetting empty and re-delivering everything in flight.
AWS: Before each send, the consumer Lambda runs SET NX on Amazon MemoryDB for Redis with key sha256(tenant_id + notification_ulid + recipient_id) and a 30-day TTL. MemoryDB's durable multi-AZ transaction log survives failover without losing dedup state. The atomic NX guarantees one consumer wins the race; the tenant prefix prevents cross-tenant suppression. On MemoryDB unavailability the consumer fails closed (message stays in queue, retried with backoff — never a blind send); consent-bearing channels fall back to a durable DynamoDB conditional PutItem (attribute_not_exists, 30-day TTL).
Trade-off: The dedup keyspace is large and stateful (billions of 30-day keys run to hundreds of GB), and MemoryDB charges a premium over plain ElastiCache for its durability. The honest guarantee is effectively-once, not exactly-once: residual duplicates remain possible when an un-flushed write is lost in failover or when a DLQ redrive lands after the 30-day TTL expires.
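
A minimal sketch of the dedup check, written against any client that exposes a redis-py style set(name, value, nx=..., ex=...) method. The "|" delimiter between key fields is an assumption added here so that different field splits cannot produce the same hash input; InMemoryStore is a test double, not a MemoryDB client, and it ignores TTL expiry.

```python
import hashlib

DEDUP_TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day window from the pattern


def dedup_key(tenant_id: str, notification_ulid: str, recipient_id: str) -> str:
    """SHA-256 over the tenant-prefixed tuple; the delimiter is an assumption."""
    raw = f"{tenant_id}|{notification_ulid}|{recipient_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def should_send(client, tenant_id: str, notification_ulid: str, recipient_id: str) -> bool:
    """True only for the one caller that wins the SET NX race.

    Any client error propagates, so the caller fails closed and the message
    stays in the queue for retry instead of being sent blind.
    """
    key = dedup_key(tenant_id, notification_ulid, recipient_id)
    # redis-py returns True when the key was set and None when it already existed.
    return bool(client.set(key, "1", nx=True, ex=DEDUP_TTL_SECONDS))


class InMemoryStore:
    """Test double that mimics SET NX semantics only."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True


store = InMemoryStore()
first = should_send(store, "tenant-a", "01HZX", "user-1")   # wins the race
second = should_send(store, "tenant-a", "01HZX", "user-1")  # suppressed
other = should_send(store, "tenant-b", "01HZX", "user-1")   # different tenant
```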

from: Push notification fan-out at scale

Managed push transport with token lifecycle (SNS + Pinpoint)

When: You must deliver to APNs/FCM at burst scale, honour per-provider rate limits and warm connection pools, reap stale/invalid tokens, and schedule around quiet hours — without owning a stateful, throttle-sensitive client fleet.
AWS: SNS mobile push owns platform endpoints, the APNs/FCM feedback loop, and auto-deregistration of invalid tokens. Amazon Pinpoint layers quiet-hours and per-user timezone scheduling on top; Amazon SES handles email with sending-rate control. The fan-out tier never calls a provider directly.
Trade-off: You lose fine-grained control over connection-pool tuning and provider-specific behaviour, and you inherit SNS/Pinpoint quotas and abstractions. FCM's 600k msg/min cap is handled by sharding across K SNS platform applications (one per FCM sender project), with the consumer routing on recipient_user_id mod K to keep a device sticky to one project.
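
The routing rule can be sketched as follows, assuming recipient_user_id is a string. In that case it needs a stable hash before the modulo, since Python's built-in hash() is salted per process; an integer id could be reduced mod K directly. platform_app_index is a hypothetical helper name.

```python
import hashlib


def platform_app_index(recipient_user_id: str, k: int) -> int:
    """Map a user to one of K SNS platform applications, stably across processes."""
    if k <= 0:
        raise ValueError("k must be positive")
    digest = hashlib.sha256(recipient_user_id.encode("utf-8")).digest()
    # The first 8 bytes are plenty of entropy for an even spread over small K.
    return int.from_bytes(digest[:8], "big") % k


shard = platform_app_index("user-42", 4)  # same value on every consumer
```

Plain modulo routing remaps most users whenever K changes, so K is best treated as fixed, or changed only alongside a planned re-registration of devices.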

from: Push notification fan-out at scale