Patterns

Every reusable pattern these designs relied on, grouped by family. This is the library you actually revise from.

messaging

Let delivery prune the registry

When
A connection registry drifts as clients disconnect uncleanly, leaving dead entries that waste pushes.
AWS
On a 410 Gone from @connections POST, delete that connection from DynamoDB inline; TTL sweeps the rest.
Trade-off
A little extra write traffic on the delivery path, in exchange for a registry that needs no separate reaper.

from: Real-time notification fan-out
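A minimal sketch of the prune-on-delivery loop, assuming the push and the registry delete are injected callables so it runs without AWS. GoneError, deliver, fake_post, and the in-memory registry are hypothetical names for illustration; with boto3 the equivalent would likely be catching the API Gateway Management API client's GoneException around post_to_connection.

```python
class GoneError(Exception):
    """Hypothetical stand-in for the 410 Gone returned by the @connections POST."""


def deliver(connection_ids, payload, post, delete):
    """Push payload to each connection and prune the ones that report Gone.

    post(connection_id, payload) and delete(connection_id) are injected
    callables (assumptions). Returns (delivered, pruned) lists of ids.
    """
    delivered, pruned = [], []
    for cid in connection_ids:
        try:
            post(cid, payload)
            delivered.append(cid)
        except GoneError:
            delete(cid)  # inline delete on the delivery path, no separate reaper
            pruned.append(cid)
    return delivered, pruned


# Usage with a dict standing in for the DynamoDB registry table.
registry = {"a": "alice", "b": "bob", "c": "carol"}
dead = {"b"}  # connections that dropped without a clean disconnect


def fake_post(cid, payload):
    if cid in dead:
        raise GoneError(cid)


# list(registry) snapshots the ids so deleting while iterating is safe.
delivered, pruned = deliver(list(registry), b"hi", fake_post, registry.pop)
```

Entries that are never pushed to are not touched by this loop, which is why the pattern still leans on TTL for the rest.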

realtime

Let a managed edge hold the socket

When
You need millions of long-lived client connections but don't want to own draining, sticky routing, and fleet scaling.
AWS
API Gateway WebSocket API with $connect/$disconnect Lambdas; push later with @connections POST.
Trade-off
Per-message and per-connection-minute cost and a 128 KB message payload limit (individual frames are capped at 32 KB), in exchange for deleting socket-fleet operations.

from: Real-time notification fan-out
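A rough sketch of the $connect/$disconnect side, assuming one handler serves both routes (the pattern describes separate Lambdas) and a module-level dict stands in for the connection registry. CONNECTIONS and handler are illustrative names; the requestContext fields used are the ones API Gateway WebSocket events carry.

```python
CONNECTIONS = {}  # stand-in for the connection registry table (assumption)


def handler(event, context=None):
    """Register or remove a connection based on the WebSocket route key."""
    ctx = event["requestContext"]
    route, cid = ctx["routeKey"], ctx["connectionId"]
    if route == "$connect":
        CONNECTIONS[cid] = {"connected_at": ctx.get("connectedAt")}
    elif route == "$disconnect":
        # $disconnect is best effort; unclean closes may never reach this branch.
        CONNECTIONS.pop(cid, None)
    return {"statusCode": 200}


# Usage: simulate a connect followed by a disconnect.
handler({"requestContext": {"routeKey": "$connect", "connectionId": "c1"}})
connected_after_connect = "c1" in CONNECTIONS
handler({"requestContext": {"routeKey": "$disconnect", "connectionId": "c1"}})
```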

rate-limiting

Make the limiter decision atomic with one Lua script

When
A rate-limit decision reads a counter, branches on it, increments, and conditionally sets a TTL. Concurrent requests or a mid-sequence crash can leave a key with no expiry, silently disabling the limit.
AWS
Run the whole read-branch-increment-expire as a single EVAL on ElastiCache for Redis, using redis.call('TIME') as the one authoritative clock.
Trade-off
All decision logic lives in a Lua script you must test and version, in exchange for true atomicity that MULTI/EXEC and WATCH can't give you without retry contention.

from: Distributed rate limiting at API gateway scale
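One way the single-EVAL decision could look, as a sketch rather than the design's actual script. It assumes a fixed-window counter and leans on the key's TTL instead of redis.call('TIME'), which a windowed variant would add. FIXED_WINDOW_LUA, check, and RecordingClient are hypothetical names; the client only needs an eval(script, numkeys, *keys_and_args) method, which is the shape redis-py exposes.

```python
FIXED_WINDOW_LUA = """
-- KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current >= limit then
  return {0, redis.call('TTL', KEYS[1])}   -- denied, seconds until reset
end
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], window)    -- only the first write sets the TTL
end
return {1, limit - count}                  -- allowed, remaining quota
"""


def check(client, key, limit, window_s):
    """Run read-branch-increment-expire as one atomic EVAL.

    Returns (allowed, detail): remaining quota when allowed,
    seconds until reset when denied.
    """
    allowed, detail = client.eval(FIXED_WINDOW_LUA, 1, key, limit, window_s)
    return bool(allowed), int(detail)


class RecordingClient:
    """Test double (hypothetical): records the call instead of running Lua."""

    def __init__(self):
        self.calls = []

    def eval(self, script, numkeys, *keys_and_args):
        self.calls.append((numkeys, keys_and_args))
        return [1, 9]
```

Because the INCR and the EXPIRE run inside one script, no other command can interleave between them, which is the property the pattern is after.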

Approximate the window with two counters, not a log

When
You need a smooth rolling limit without the 2x boundary burst of a fixed window, but a per-request timestamp log would OOM the store under attack (tens of GB at the limit).
AWS
Store two integer keys per tenant (current + previous window) on ElastiCache and compute a weighted estimate; hash-tag the keys so the pair lands on one cluster slot; guard EXPIRE to fire only on the first write of the window (cur == 1), not every request.
Trade-off
A bounded approximation error (~half the previous window's rate at the transition) in exchange for ~11,700x less memory than a sliding-window log. Not suitable for billing-grade exactness. Treat slot-migration TRYAGAIN/ASK/MOVED errors as fail-open rather than retrying.

from: Distributed rate limiting at API gateway scale
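The weighted estimate can be sketched in a few lines. This is an in-memory model, assuming a dict in place of ElastiCache and millisecond timestamps supplied by the caller; SlidingWindowCounter and its key layout are illustrative. In Redis the same arithmetic would run inside one script, with EXPIRE guarded to the first write of the window.

```python
class SlidingWindowCounter:
    """Two-counter sliding-window approximation (in-memory sketch)."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.store = {}  # stands in for Redis; no expiry in this sketch

    def keys(self, tenant, now_ms):
        idx = now_ms // self.window_ms
        tag = f"{{rl:{tenant}}}"  # shared hash tag keeps both keys on one slot
        return f"{tag}:{idx}", f"{tag}:{idx - 1}"

    def allow(self, tenant, now_ms):
        cur_key, prev_key = self.keys(tenant, now_ms)
        cur = self.store.get(cur_key, 0)
        prev = self.store.get(prev_key, 0)
        elapsed = now_ms % self.window_ms
        # Weight the previous window by the share of it still inside the
        # rolling window, then add the current window's count.
        estimate = cur + prev * (self.window_ms - elapsed) / self.window_ms
        if estimate >= self.limit:
            return False
        self.store[cur_key] = cur + 1
        return True


# Usage: limit of 10 per 1000 ms window.
sw = SlidingWindowCounter(limit=10, window_ms=1000)
first_window = [sw.allow("t1", 500) for _ in range(11)]
at_boundary = sw.allow("t1", 1000)                       # previous window counts fully
half_way = sum(sw.allow("t1", 1500) for _ in range(10))  # previous window counts half
```

At the boundary the previous window still carries full weight, so the 2x burst of a fixed window does not occur; halfway through, half of it has aged out.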

Reward well-behaved bursts with a token bucket

When
Tenants have legitimately bursty traffic (batch jobs, expensive queries) that a flat per-second window would unfairly punish.
AWS
A Redis hash per tenant (tokens, last_refill) refilled lazily in Lua at rate r up to cap b; bill expensive operations (e.g. GraphQL query cost) as more tokens.
Trade-off
You allow bursts up to b above the sustained rate, and must use integer/server-clock arithmetic to avoid float drift and clock-skew bugs, versus the perfectly smooth output of a leaky bucket.

from: Distributed rate limiting at API gateway scale
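A minimal sketch of the lazy refill using integer arithmetic only, assuming tokens are stored in thousandths and the caller passes a single server clock in milliseconds. Bucket and take are hypothetical names; in the pattern this logic would live in Lua against a Redis hash.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens_milli: int    # tokens scaled by 1000 to avoid float drift
    last_refill_ms: int  # timestamp of the last refill, from one clock


def take(bucket, now_ms, rate_per_sec, cap, cost=1):
    """Refill lazily at rate_per_sec up to cap, then try to spend cost tokens."""
    elapsed_ms = max(0, now_ms - bucket.last_refill_ms)  # ignore clock going backwards
    # tokens/sec * ms elapsed = thousandths of a token earned
    refilled = min(cap * 1000, bucket.tokens_milli + elapsed_ms * rate_per_sec)
    bucket.tokens_milli = refilled
    bucket.last_refill_ms = max(now_ms, bucket.last_refill_ms)
    need = cost * 1000
    if refilled < need:
        return False
    bucket.tokens_milli -= need
    return True


# Usage: 2 tokens/sec sustained, burst capacity of 5.
b = Bucket(tokens_milli=5000, last_refill_ms=0)
burst = [take(b, 0, 2, 5) for _ in range(6)]    # five pass, the sixth is denied
after_half_second = take(b, 500, 2, 5)          # one token has refilled
expensive_early = take(b, 1000, 2, 5, cost=3)   # only one token available
expensive_later = take(b, 2000, 2, 5, cost=3)   # three tokens available
```

The cost parameter is where an expensive operation, such as a high-cost GraphQL query, would be billed as more than one token.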

Counter-shard the whale tenants within a cell

When
Hash-tagging pins a tenant's keys to one slot for atomicity, but a single very-high-rate tenant then drives all its EVALs to one shard and saturates it while others sit idle.
AWS
For tenants above an ops threshold, split the key into M sub-keys ({rl:t:s0}..{rl:t:sM-1}) on ElastiCache, route each request to a random sub-shard, and sum the approximate sub-counts.
Trade-off
An extra over-count on whale tenants (the same approximation accepted at the cell level, one level down) in exchange for spreading their load off a single hot slot; the common tenant stays exact.

from: Distributed rate limiting at API gateway scale
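A sketch of the sub-key split, assuming a dict in place of ElastiCache and a caller-supplied random source. sub_keys, record_hit, and total are illustrative names; the key format follows the {rl:t:s0}..{rl:t:sM-1} layout above, where each sub-key carries its own hash tag and so can land on a different slot.

```python
import random


def sub_keys(tenant, m):
    """The M sub-keys for one whale tenant, each with a distinct hash tag."""
    return [f"{{rl:{tenant}:s{i}}}" for i in range(m)]


def record_hit(store, tenant, m, rng):
    """Increment one randomly chosen sub-counter and return its key."""
    key = rng.choice(sub_keys(tenant, m))
    store[key] = store.get(key, 0) + 1
    return key


def total(store, tenant, m):
    """Sum the sub-counts; in Redis each read may be slightly stale."""
    return sum(store.get(k, 0) for k in sub_keys(tenant, m))


# Usage: 1000 requests spread over 4 sub-shards.
store = {}
rng = random.Random(0)
for _ in range(1000):
    record_hit(store, "t", 4, rng)
```

In this in-memory model the sum is exact; the over-count the trade-off mentions comes from sub-counts being read at slightly different moments in a real cluster.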

coordination

Limit per cell, accept bounded over-counting

When
A globally exact counter would need a cross-region synchronization point whose round-trip exceeds your entire latency budget.
AWS
Run an independent ElastiCache cluster per cell. A single control-plane authority computes per_cell_limit = global_limit / active_cell_count from a DynamoDB cell registry and republishes on cell join/leave. Pin clients to scaleReads:'master' and assert on startup that the node is the primary (Redis ROLE reports it as master); never read from replicas, because a stale count is a silent bypass.
Trade-off
A tenant balanced across C cells can consume up to C times its global limit, but only within the recompute window (alarm at 1.5x, page at 3x), in exchange for local sub-millisecond hops and a cell-scoped blast radius.

from: Distributed rate limiting at API gateway scale
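The per-cell arithmetic, worked as a small sketch. It assumes integer division with a floor of 1 and models the recompute window as cells still enforcing a stale limit after the cell count changes; per_cell_limit and worst_case_total are hypothetical helper names and the numbers are illustrative.

```python
def per_cell_limit(global_limit, active_cells):
    """Split the global limit evenly across the cells currently active."""
    if active_cells < 1:
        raise ValueError("need at least one active cell")
    return max(1, global_limit // active_cells)


def worst_case_total(stale_per_cell_limit, active_cells):
    """Upper bound on admitted traffic while cells still hold a stale limit."""
    return stale_per_cell_limit * active_cells


# Usage: global limit 1000, scaling from 2 to 4 cells before the republish lands.
stale = per_cell_limit(1000, 2)             # 500 per cell
during_window = worst_case_total(stale, 4)  # 2000, i.e. 2x the global limit
after_republish = per_cell_limit(1000, 4) * 4
```

In this example the transient 2x would cross the 1.5x alarm but stay under the 3x page threshold.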

Fail open with a local backstop and a circuit breaker

When
The shared rate-limit store can fail or slow down, and a limiter bug must never reject legitimate paying traffic — but going fully open invites abuse during the outage.
AWS
On ElastiCache error/timeout, allow and emit a metric; keep WAF/usage-plan throttles as the volumetric floor that survives a Redis outage; size the in-process backstop bucket as tenant_limit / current_healthy_instances (from EDS) with a hard ceiling; trip a circuit breaker on sustained errors and recover half-open with U(0, 0.2*cooldown) jitter and a single probe.
Trade-off
Brief over-allowance during failover and coarser enforcement in degraded mode, bounded by the edge floor so an induced Redis failure can't become unbounded pass-through, in exchange for never blocking real customers when the limiter is sick.

from: Distributed rate limiting at API gateway scale
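A sketch of the breaker and backstop sizing, under stated assumptions: time is passed in by the caller, the shared check raises on error or timeout, and the local bucket is an injected callable. Breaker, backstop_limit, and decide are illustrative names, and the consecutive-failure threshold is an assumed trip condition; the jitter follows the U(0, 0.2*cooldown) rule above.

```python
import random


class Breaker:
    """Circuit breaker with a jittered cooldown and a single half-open probe."""

    def __init__(self, threshold, cooldown_s, rng=random.random):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.rng = rng
        self.failures = 0
        self.open_until = None  # None means closed
        self.probing = False

    def allow_call(self, now):
        if self.open_until is None:
            return True
        if now >= self.open_until and not self.probing:
            self.probing = True  # half-open: exactly one probe gets through
            return True
        return False

    def record(self, ok, now):
        if ok:
            self.failures, self.open_until, self.probing = 0, None, False
            return
        self.failures += 1
        if self.probing or self.failures >= self.threshold:
            jitter = self.rng() * 0.2 * self.cooldown_s  # U(0, 0.2 * cooldown)
            self.open_until = now + self.cooldown_s + jitter
            self.probing = False


def backstop_limit(tenant_limit, healthy_instances, ceiling):
    """Per-instance share of the tenant limit, capped by a hard ceiling."""
    return min(ceiling, max(1, tenant_limit // max(1, healthy_instances)))


def decide(breaker, check_shared, local_allow, now):
    """Use the shared store when healthy, else fall back to the local bucket."""
    if breaker.allow_call(now):
        try:
            allowed = check_shared()
            breaker.record(True, now)
            return allowed
        except Exception:
            breaker.record(False, now)  # a real decider would also emit a metric
    return local_allow()  # fail open, bounded by the in-process backstop
```

The edge throttles (WAF, usage plans) sit outside this code entirely, which is what keeps them alive when Redis is not.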

Distribute quota config without a single-coordinator freeze

When
Live config (per-tenant limits) is distributed from a store to many stateless deciders; a single Streams->Lambda->push path can stall (iterator lag, poison record, fan-out storm) and freeze all tenants on stale limits.
AWS
DynamoDB Streams triggers a reserved-concurrency Lambda (DLQ + BisectBatchOnFunctionError, IteratorAge alarm) that publishes one event to SNS; each cell consumes via its own SQS queue; deciders snapshot config from DynamoDB on startup and serve last-known-good on staleness; AppConfig gates staged rollout/rollback.
Trade-off
More moving parts than a direct push, in exchange for bounded fan-out (O(M) publishes + O(N) batched consumers), no forward-progress single point, and a deterministic cold-start config.

from: Distributed rate limiting at API gateway scale
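On the decider side, the snapshot plus last-known-good behaviour might look like the sketch below. It assumes each config record carries a monotonically increasing version (not stated in the pattern) so duplicate or out-of-order queue deliveries can be ignored; ConfigCache and its methods are hypothetical names.

```python
class ConfigCache:
    """Per-decider quota config: startup snapshot, then versioned updates."""

    def __init__(self, snapshot, loaded_at, max_age_s):
        self.limits = dict(snapshot)  # tenant -> (version, limit)
        self.last_update = loaded_at
        self.max_age_s = max_age_s

    def apply(self, tenant, version, limit, now):
        """Apply an update from the queue; returns False if it was not newer."""
        self.last_update = now  # any delivery shows the pipeline is alive
        current = self.limits.get(tenant)
        if current is not None and current[0] >= version:
            return False  # duplicate or out-of-order delivery
        self.limits[tenant] = (version, limit)
        return True

    def limit_for(self, tenant, default):
        """Serve last-known-good; fall back to a default for unknown tenants."""
        entry = self.limits.get(tenant)
        return entry[1] if entry else default

    def is_stale(self, now):
        """True when no update has arrived within max_age_s (alarm, keep serving)."""
        return now - self.last_update > self.max_age_s


# Usage: snapshot at startup, then one stale and one fresh update.
cache = ConfigCache({"t1": (3, 100)}, loaded_at=0, max_age_s=60)
stale_applied = cache.apply("t1", 2, 50, now=10)
fresh_applied = cache.apply("t1", 4, 200, now=20)
```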

Jitter the reset to defuse the retry stampede

When
Throttled clients all read the same reset timestamp and retry on the same millisecond, turning a transient limit into a self-inflicted synchronized DDoS.
AWS
Return 429 with a jittered Retry-After and prefer no-hard-reset algorithms (sliding window / token bucket); have clients use full-jitter exponential backoff: sleep = min(cap, base*2^n) * U(0,1).
Trade-off
Slightly longer worst-case client wait in exchange for spreading retries across a window instead of concentrating them at a clock boundary.

from: Distributed rate limiting at API gateway scale
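The backoff formula above, written out as a sketch. full_jitter_sleep implements sleep = min(cap, base*2^n) * U(0,1); jittered_retry_after is one assumed way a server could spread the Retry-After value, with spread_s as an illustrative parameter. The random source is injectable so the behaviour can be checked deterministically.

```python
import math
import random


def full_jitter_sleep(attempt, base_s, cap_s, rng=random.random):
    """Client side: exponential backoff with full jitter, in seconds."""
    return min(cap_s, base_s * 2 ** attempt) * rng()


def jittered_retry_after(reset_in_s, spread_s, rng=random.random):
    """Server side: whole-second Retry-After spread over [reset, reset + spread]."""
    return math.ceil(reset_in_s + rng() * spread_s)


# Usage with a fixed random source standing in for U(0, 1) = 0.5.
mid = lambda: 0.5
third_retry = full_jitter_sleep(3, base_s=1, cap_s=10, rng=mid)   # min(10, 8) * 0.5
capped_retry = full_jitter_sleep(10, base_s=1, cap_s=10, rng=mid)  # min(10, 1024) * 0.5
header_value = jittered_retry_after(2, spread_s=4, rng=mid)
```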

notification-fanout

Decouple fan-out behind pub/sub + queues

When
One event must reach many consumers and the producer must not feel the fan-out or the slowest consumer.
AWS
SNS topic fanning out to per-shard SQS queues, drained by Lambda workers; DLQ for poison messages.
Trade-off
At-least-once delivery (consumers must dedupe) and per-message SQS cost, in exchange for buffering, retries, and zero servers.

from: Real-time notification fan-out
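Since delivery is at-least-once, each consumer needs some form of dedupe. A minimal sketch, assuming every message carries a stable id field and using an in-memory set where a real worker would more likely use a durable store with a conditional write; process_batch and the message shape are hypothetical.

```python
def process_batch(messages, seen_ids, handle):
    """Handle each message at most once per id; returns how many were handled."""
    handled = 0
    for msg in messages:
        mid = msg["id"]
        if mid in seen_ids:
            continue  # redelivery of something already processed
        handle(msg)
        seen_ids.add(mid)  # mark only after the handler succeeded
        handled += 1
    return handled


# Usage: a batch containing a duplicate, then the whole batch redelivered.
seen, out = set(), []
batch = [{"id": "m1"}, {"id": "m2"}, {"id": "m1"}]
first_pass = process_batch(batch, seen, out.append)
second_pass = process_batch(batch, seen, out.append)
```

Marking the id after the handler runs means a crash mid-handle leads to a retry rather than a lost message, at the cost of the handler needing to tolerate a repeat.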

Shard the hot partition

When
A few topics have orders-of-magnitude more subscribers than the rest, creating a hot key on lookup and delivery.
AWS
Append a shard suffix to the DynamoDB partition key (TOPIC#id#shard) and run one delivery worker per shard.
Trade-off
Delivery code must scatter-gather across shards; more workers and queries for the few topics that need it.

from: Real-time notification fan-out
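A sketch of the suffix scheme and the scatter-gather read it forces, assuming a dict in place of the DynamoDB table and a CRC32 of the subscriber id to pick the shard (the pattern does not say how the shard is chosen). shard_key, all_shard_keys, and subscribers are illustrative names.

```python
import zlib


def shard_key(topic_id, subscriber_id, shards):
    """Partition key for one subscriber: TOPIC#id#shard, stable across runs."""
    shard = zlib.crc32(subscriber_id.encode()) % shards
    return f"TOPIC#{topic_id}#{shard}"


def all_shard_keys(topic_id, shards):
    """Every partition key a delivery pass must query for this topic."""
    return [f"TOPIC#{topic_id}#{s}" for s in range(shards)]


def subscribers(table, topic_id, shards):
    """Scatter-gather: read each shard and merge the results."""
    out = []
    for key in all_shard_keys(topic_id, shards):
        out.extend(table.get(key, []))
    return out


# Usage: five subscribers of topic 42 spread over 4 shards.
table = {}
for sub in ["u1", "u2", "u3", "u4", "u5"]:
    table.setdefault(shard_key("42", sub, 4), []).append(sub)
merged = sorted(subscribers(table, "42", 4))
```

With one delivery worker per shard, each worker would read only its own key instead of looping over all of them.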