Patterns from this design

Real-time bidding engine at scale

realtime

When: A request handler accepts only a small fraction of traffic (a DSP bids on 2-5 percent and no-bids the other 95-97 percent), but a naive design does equal work on both paths, so the fleet ends up sized for the rejected majority.
AWS: Drop non-qualifying requests at the network layer with the AWS RTB Fabric inline OpenRTB filter and per-partner rate limiter before they reach the app; short-circuit cheap eligibility checks in the bidder before invoking the in-process model so the expensive scoring runs only on the ~3 percent that can win.
Trade-off: More logic at the edge and a coarse filter that may drop a marginally winnable request, in exchange for sizing compute for the filtered ~3 percent rather than for the raw bidstream, which is roughly 33x larger.
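A minimal sketch of the in-bidder short-circuit, assuming hypothetical BidRequest and Campaign shapes and a caller-supplied score function standing in for the in-process model (none of these names come from the design itself). The point is only the ordering: cheap set-membership and floor checks run first, and the model is never invoked for a request that cannot win.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BidRequest:          # hypothetical, heavily reduced OpenRTB request
    geo: str
    ad_format: str
    floor_cpm: float


@dataclass(frozen=True)
class Campaign:            # hypothetical targeting summary
    geos: frozenset
    formats: frozenset
    max_cpm: float


def is_eligible(req: BidRequest, camp: Campaign) -> bool:
    # Cheap checks only: two set lookups and one comparison.
    return (
        req.geo in camp.geos
        and req.ad_format in camp.formats
        and req.floor_cpm <= camp.max_cpm
    )


def handle(
    req: BidRequest,
    camp: Campaign,
    score: Callable[[BidRequest], float],
) -> Optional[float]:
    """Return a bid price in CPM, or None for a no-bid (HTTP 204 upstream)."""
    if not is_eligible(req, camp):
        return None                      # expensive scoring is skipped entirely
    return min(score(req), camp.max_cpm)  # never bid above the campaign ceiling
```

In this sketch a wrong-geo request returns None without ever calling score, which is the property that lets the scoring tier be sized for the eligible minority.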

payments-exactly-once

When: Stateless nodes spend a shared budget and a lost connection to the spend counter tempts a fail-open default - the exact gap that cost Meta advertisers $100K-$500K overnight during a DB failover.
AWS: Distribute the daily budget as token-chunk leases from a control-plane Lambda that follows a PID pacing curve. Nodes spend locally with no hot-path hop and return a fast HTTP 204 within an 8 ms internal deadline if they cannot read their allowance or the lease has expired (2 missed reconciliation intervals). Reconcile from a Kinesis stream of billing-notice (burl) events every 1-5 s. A circuit-breaker Lambda - fed by both Kinesis and the CloudWatch EMF spend-velocity metric, so a Kinesis stall cannot disable it - pauses any campaign over 3x target for 60 s.
Trade-off: Accept 1-2 percent overspend and a 1-5 s reconciliation blind window in exchange for an off-hot-path budget check and a hard ceiling on runaway spend. Fail-closed here means a fast 204, never a timeout (Google throttles bidders that time out), and the lease bounds the damage even if reconciliation stalls.
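The fail-closed lease check could look roughly like the following. This is a sketch under stated assumptions: BudgetLease, its field names, and the micros unit are hypothetical, time is passed in explicitly rather than read from a clock, and every bid is treated as a spend for simplicity (a real bidder would settle against billing notices during reconciliation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetLease:                 # hypothetical node-local lease state
    remaining_micros: int          # tokens granted by the control plane
    last_reconciled: float         # time of last successful reconciliation (s)
    reconcile_interval: float      # 1-5 s in the design above
    max_missed: int = 2            # lease expires after 2 missed intervals

    def expired(self, now: float) -> bool:
        return now - self.last_reconciled > self.max_missed * self.reconcile_interval

    def try_spend(self, price_micros: int, now: float) -> bool:
        # Fail closed: an expired or exhausted lease never spends.
        if self.expired(now) or price_micros > self.remaining_micros:
            return False
        self.remaining_micros -= price_micros
        return True


def respond(lease: Optional[BudgetLease], price_micros: int, now: float) -> int:
    """Return the HTTP status the bidder would send: 200 (bid) or 204 (no-bid)."""
    if lease is None or not lease.try_spend(price_micros, now):
        return 204                 # fast no-bid, never a timeout
    return 200
```

Note that every failure mode (no lease, expired lease, insufficient tokens) collapses to the same immediate 204, so the node cannot overspend by more than the chunk it was last granted.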

payments-exactly-once

When: An auction win does not equal a charge - the impression may never render - and the same impression can arrive via multiple supply paths, so naively billing on the win notice double-charges or over-charges.
AWS: Bill on the OpenRTB billing notice (burl), not the win notice (nurl); stream events through Kinesis Data Streams (BatchWindow 1s, bisectBatchOnFunctionError, DLQ) deduplicated by the SSP-generated burl transaction id (trid); fold into a Snowflake-keyed ledger and archive to S3 under Object Lock as a tamper-evident audit trail.
Trade-off: This is effectively-once, not exactly-once: the burl trid is SSP-generated per auction and stays stable, but Prebid Aug-2025 trid fragmentation breaks real-time dedup on the request path - so real-time request dedup is abandoned in favor of structural Supply Path Optimization, at the cost of bidding into fewer paths.
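A sketch of the dedup-then-fold step a stream consumer might run per batch. The event shape (trid, campaign_id, price_micros) and the function name are assumptions for illustration, and the in-memory set and dict stand in for whatever durable dedup store and ledger the real pipeline uses.

```python
from typing import Dict, Iterable, List, Set


def fold_billing_events(
    events: Iterable[dict],
    seen_trids: Set[str],
    ledger: Dict[str, int],
) -> List[dict]:
    """Apply each billing notice at most once; return the duplicates skipped."""
    duplicates: List[dict] = []
    for ev in events:
        trid = ev["trid"]
        if trid in seen_trids:
            # Same impression delivered again (retry or second supply path).
            duplicates.append(ev)
            continue
        seen_trids.add(trid)
        campaign = ev["campaign_id"]
        ledger[campaign] = ledger.get(campaign, 0) + ev["price_micros"]
    return duplicates
```

Because seen_trids persists across calls, a batch that is retried after a partial failure re-applies nothing, which is what makes the fold effectively-once rather than at-least-once.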

rate-limiting

When: A per-user frequency cap must be checked hundreds of thousands of times per second, more than a single counter primary can serve, but the cap protects user experience rather than dollars.
AWS: Store freq:{user}:{campaign}:{day} integer counters on ElastiCache for Valkey 8 in cluster mode (shard count derived from target ops/s divided by ~100k ops/s per shard); read from per-shard replicas at bid time (1-5 ms lag) and INCR + EXPIRE 86400 on the primary at billing time; use Count-Min Sketch for extreme-scale soft caps and a Bloom filter for has-seen-at-all checks.
Trade-off: Accept 5-10 percent over-cap from replication lag, plus some premature capping where the Count-Min Sketch over-counts, in exchange for linear read scalability - a looseness acceptable for impressions but never reused for money.
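The shard-count arithmetic and key layout can be sketched as below. The ~100k ops/s per shard figure comes from the text; the helper names, the ISO date format for the day component, and treating a missing replica key as zero impressions are assumptions.

```python
import math
from datetime import date
from typing import Optional


def shard_count(target_ops_per_s: int, ops_per_shard: int = 100_000) -> int:
    # Round up so the cluster is never sized below the target; minimum 1 shard.
    return max(1, math.ceil(target_ops_per_s / ops_per_shard))


def freq_key(user: str, campaign: str, day: date) -> str:
    # Mirrors freq:{user}:{campaign}:{day}; ISO date format is an assumption.
    return f"freq:{user}:{campaign}:{day.isoformat()}"


def under_cap(replica_value: Optional[int], cap: int) -> bool:
    # A key that does not exist yet means no impressions have been billed today.
    # The replica may lag the primary, so this can admit a few extra impressions.
    return (replica_value or 0) < cap
```

For example, a target of 750,000 ops/s gives ceil(7.5) = 8 shards under the default per-shard figure.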

id-generation

When: You must mint hundreds of millions of IDs per second for ledger and log entries, where OS-entropy UUID generation becomes a contention hotspot and random UUIDs throw away the ordering reconciliation needs.
AWS: Use Snowflake-style 64-bit IDs (timestamp + worker + sequence, worker id from ECS task metadata) for spend-ledger and bid-log entries - 4M+ monotonic sortable IDs per worker per second with no coordination; use per-thread PRNG-seeded generators for within-response Bid.id where ordering is not needed.
Trade-off: Snowflake leaks approximate creation time and demands worker-id assignment and backward-clock-skew handling, in exchange for never touching a central sequence or shared entropy pool in the hot path.
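A minimal single-threaded sketch of such a generator. The text specifies only timestamp + worker + sequence; the 41/10/12 bit split below is the classic Twitter Snowflake layout and is an assumption here, as are the custom epoch and the choice to raise on a backward clock step (waiting it out is another option). A 12-bit sequence allows 4,096 IDs per millisecond, about 4.1M per worker per second, which is consistent with the figure above.

```python
import time
from typing import Callable

EPOCH_MS = 1_700_000_000_000   # hypothetical custom epoch (ms since Unix epoch)
WORKER_BITS = 10               # assumed: up to 1,024 workers
SEQ_BITS = 12                  # assumed: 4,096 IDs per worker per millisecond
MAX_SEQ = (1 << SEQ_BITS) - 1


def _wall_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Not thread-safe: use one instance per thread or guard it with a lock."""

    def __init__(self, worker_id: int, clock_ms: Callable[[], int] = _wall_clock_ms):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = self.clock_ms()
        if now < self.last_ms:
            # Backward clock skew: refuse rather than risk a duplicate ID.
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            self.seq = (self.seq + 1) & MAX_SEQ
            if self.seq == 0:
                # Sequence exhausted for this millisecond: spin to the next one.
                while now <= self.last_ms:
                    now = self.clock_ms()
        else:
            self.seq = 0
        self.last_ms = now
        return (
            ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
            | (self.worker_id << SEQ_BITS)
            | self.seq
        )
```

The clock is injectable so the skew and rollover paths can be exercised deterministically; in production the worker id would come from the task metadata mentioned above rather than a constructor literal.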

rate-limiting

When: A sub-10 ms latency budget and a 10x thundering-herd surge (header-bidding inflation sends one impression as 1,500 requests) make it impossible to filter and rate-limit junk inside the app without melting the fleet.
AWS: Use AWS RTB Fabric for ingress - single-digit-ms private networking with an inline Rate Limiter (per-partner QPS caps and bid-stuffing defense), an inline OpenRTB Filter that drops wrong-geo and wrong-format requests before the app, and Error Masking; keep supply-chain checks (sellers.json, ads.txt) in the bidder app, not the Fabric.
Trade-off: Coupling to a young, six-region managed service billed per message hop (two hops per auction - request in and response/204 out). The ~80 percent networking saving holds only when exchange partners are RTB Fabric participants (internal rate vs ~7x external); assume 70 percent-plus internal volume and verify per partner. Accepted because the combined sub-10 ms latency and internal-traffic saving have no equivalent in composed AWS primitives.
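To see how sensitive the per-hop bill is to the partner mix, the arithmetic can be sketched in relative units. This assumes the internal per-message rate is normalized to 1.0 and the external rate is 7x that, per the ~7x figure above; actual prices are not given here, and the function name is hypothetical.

```python
def relative_cost_per_auction(
    internal_share: float,
    external_multiplier: float = 7.0,   # assumed from the ~7x figure above
    hops: int = 2,                      # request in, response/204 out
) -> float:
    """Blended per-auction messaging cost, in units of one internal message."""
    if not 0.0 <= internal_share <= 1.0:
        raise ValueError("internal_share must be between 0 and 1")
    blended_rate = internal_share * 1.0 + (1.0 - internal_share) * external_multiplier
    return hops * blended_rate
```

Under these assumptions an all-internal mix costs 2 units per auction and an all-external mix costs 14, so each partner that is not a Fabric participant moves the blended figure noticeably, which is why the mix should be verified per partner.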