Patterns from this design

Distributed rate limiting at API gateway scale

rate-limiting

When: A rate-limit decision reads a counter, branches on it, increments, and conditionally sets a TTL — and concurrent requests or a mid-sequence crash can leave a key with no expiry, silently disabling the limit.
AWS: Run the whole read-branch-increment-expire as a single EVAL on ElastiCache for Redis, using redis.call('TIME') as the one authoritative clock.
Trade-off: All decision logic lives in a Lua script you must test and version, in exchange for true atomicity that MULTI/EXEC and WATCH can't give you without retry contention.
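
A minimal sketch of the single-EVAL pattern, assuming a redis-py style client whose eval method takes (script, numkeys, *keys_and_args) and Redis 5 or later, where scripts replicate by effects so calling TIME before a write is permitted. The key layout, argument order, and return shape are hypothetical choices for illustration, not part of the design above.

```python
# Lua body run by Redis as one atomic unit: no other command can interleave.
FIXED_WINDOW_LUA = """
local limit  = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now    = tonumber(redis.call('TIME')[1])   -- server clock, seconds
local count  = tonumber(redis.call('GET', KEYS[1]) or '0')
local allowed = 0
if count < limit then
  count = redis.call('INCR', KEYS[1])
  allowed = 1
end
local ttl = redis.call('TTL', KEYS[1])
if ttl < 0 then                                  -- new key, or a key left without expiry
  redis.call('EXPIRE', KEYS[1], window)
  ttl = window
end
return {allowed, count, now + ttl}
"""


def check_rate_limit(client, tenant_id, limit, window_s):
    """Run the whole read-branch-increment-expire sequence in one round trip."""
    key = f"{{rl:{tenant_id}}}:fixed"  # hypothetical key layout
    allowed, count, reset_at = client.eval(FIXED_WINDOW_LUA, 1, key, limit, window_s)
    return {"allowed": allowed == 1, "count": int(count), "reset_at": int(reset_at)}
```

Because the script re-checks the TTL on every call, a key that somehow lost its expiry is repaired on the next request instead of disabling the limit indefinitely.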

rate-limiting

When: You need a smooth rolling limit without the 2x boundary burst of a fixed window, but a per-request timestamp log would OOM the store under attack (tens of GB at the limit).
AWS: Store two integer keys per tenant (current + previous window) on ElastiCache and compute a weighted estimate; hash-tag the keys so the pair lands on one cluster slot; guard EXPIRE to fire only on the first write of the window (cur == 1), not every request.
Trade-off: A bounded approximation error (roughly half the previous window's rate at the transition) in exchange for ~11,700x less memory than a sliding-window log. It is not suitable for billing-grade exactness. Treat slot-migration errors (TRYAGAIN, ASK, MOVED) as fail-open rather than retrying.
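
The weighted estimate can be sketched in plain Python as follows. This models only the arithmetic and the key naming; in the design above the same logic would run inside Redis. The function names and the key suffix scheme (window index after the hash tag) are assumptions made for illustration.

```python
def window_keys(tenant_id, now_ms, window_ms):
    """Return (current_key, previous_key); both share one hash tag, so one slot."""
    idx = now_ms // window_ms
    tag = f"{{rl:{tenant_id}}}"
    return f"{tag}:{idx}", f"{tag}:{idx - 1}"


def weighted_estimate(cur_count, prev_count, now_ms, window_ms):
    """Current count plus the share of the previous window still inside the
    rolling window, using integer arithmetic only."""
    elapsed_ms = now_ms % window_ms
    return cur_count + prev_count * (window_ms - elapsed_ms) // window_ms


def allow(cur_count, prev_count, now_ms, window_ms, limit):
    return weighted_estimate(cur_count, prev_count, now_ms, window_ms) < limit
```

For example, 15 seconds into a 60-second window with 20 requests so far and 100 in the previous window, the estimate is 20 + 100 * 45/60 = 95.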

rate-limiting

When: Tenants have legitimately bursty traffic (batch jobs, expensive queries) that a flat per-second window would unfairly punish.
AWS: A Redis hash per tenant (tokens, last_refill) refilled lazily in Lua at rate r up to cap b; bill expensive operations (e.g. GraphQL query cost) as more tokens.
Trade-off: You allow bursts up to b above the sustained rate, and must use integer/server-clock arithmetic to avoid float drift and clock-skew bugs, versus the perfectly smooth output of a leaky bucket.
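
A sketch of the lazy refill in pure Python, assuming tokens are stored as integer milli-tokens and time as integer milliseconds to avoid float drift. In the design above this logic would live in Lua and take its clock from the Redis server; the Bucket type and the take function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens_milli: int    # tokens * 1000, kept as an integer
    last_refill_ms: int  # timestamp from the authoritative clock


def take(bucket, now_ms, rate_per_sec, capacity, cost=1):
    """Refill lazily at rate r up to cap b, then try to spend `cost` tokens."""
    elapsed_ms = max(0, now_ms - bucket.last_refill_ms)  # ignore a backwards clock
    # r tokens per second equals r milli-tokens per millisecond.
    refilled = bucket.tokens_milli + elapsed_ms * rate_per_sec
    bucket.tokens_milli = min(capacity * 1000, refilled)
    bucket.last_refill_ms = max(bucket.last_refill_ms, now_ms)
    needed = cost * 1000
    if bucket.tokens_milli >= needed:
        bucket.tokens_milli -= needed
        return True
    return False
```

An expensive operation is billed by passing a larger cost, for example take(bucket, now_ms, 5, 10, cost=4) for a query weighted at four tokens.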

coordination

When: A globally exact counter would need a cross-region synchronization point whose round-trip exceeds your entire latency budget.
AWS: Run an independent ElastiCache cluster per cell. A single control-plane authority computes per_cell_limit = global_limit / active_cell_count from a DynamoDB cell registry and republishes on cell join/leave. Pin clients to scaleReads:'master' and assert ROLE=primary on startup; never read from replicas, because a stale count is a silent bypass.
Trade-off: A tenant balanced across C cells can consume up to C times its global limit, but only within the recompute window (alarm at 1.5x, page at 3x), in exchange for local sub-millisecond hops and a cell-scoped blast radius.
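
The per-cell arithmetic and the alerting thresholds can be sketched as below. Floor division is an assumption made here so that the per-cell shares never sum to more than the global limit; the function names and severity labels are hypothetical.

```python
def per_cell_limit(global_limit, active_cell_count):
    """per_cell_limit = global_limit / active_cell_count, rounded down."""
    if active_cell_count < 1:
        raise ValueError("at least one active cell is required")
    return global_limit // active_cell_count


def overage_severity(observed, global_limit):
    """Classify observed consumption against the global limit:
    alarm at 1.5x, page at 3x (integer comparisons, no floats)."""
    if observed >= 3 * global_limit:
        return "page"
    if 2 * observed >= 3 * global_limit:
        return "alarm"
    return "ok"
```

With a global limit of 1000, four active cells each enforce 250; if one cell leaves, the republished share becomes 333.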

rate-limiting

When: Hash-tagging pins a tenant's keys to one slot for atomicity, but a single very-high-rate tenant then drives all its EVALs to one shard and saturates it while others sit idle.
AWS: For tenants above an ops threshold, split the key into M sub-keys ({rl:t:s0} through {rl:t:s(M-1)}) on ElastiCache, route each request to a random sub-shard, and sum the approximate sub-counts.
Trade-off: An extra over-count on whale tenants (the same approximation accepted at the cell level, one level down) in exchange for spreading their load off a single hot slot; the common tenant stays exact.
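
A small in-memory model of the sub-key split, using a Counter in place of Redis. In this model the sum is exact; against a real cluster the M reads hit different slots and are not atomic with each other, which is where the approximation comes from. The helper names are hypothetical.

```python
import random
from collections import Counter


def sub_key(tenant_id, shard):
    """Each sub-key carries its own hash tag, so sub-keys spread across slots."""
    return f"{{rl:{tenant_id}:s{shard}}}"


def record_request(store, tenant_id, num_shards, rng=random):
    """Increment one randomly chosen sub-counter and return the key used."""
    key = sub_key(tenant_id, rng.randrange(num_shards))
    store[key] += 1
    return key


def approximate_total(store, tenant_id, num_shards):
    """Sum the sub-counts to recover the tenant's count."""
    return sum(store[sub_key(tenant_id, s)] for s in range(num_shards))
```

Passing a seeded random.Random instance as rng makes the routing reproducible in tests.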

coordination

When: The shared rate-limit store can fail or slow down, and a limiter bug must never reject legitimate paying traffic — but going fully open invites abuse during the outage.
AWS: On ElastiCache error/timeout, allow and emit a metric; keep WAF/usage-plan throttles as the volumetric floor that survives a Redis outage; size the in-process backstop bucket as tenant_limit / current_healthy_instances (from EDS) with a hard ceiling; trip a circuit breaker on sustained errors and recover half-open with U(0, 0.2*cooldown) jitter and a single probe.
Trade-off: Brief over-allowance during failover and coarser enforcement in degraded mode, bounded by the edge floor so that an induced Redis failure can't become unbounded pass-through, in exchange for never blocking real customers when the limiter is sick.
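
One way the degraded-mode pieces could fit together, as a sketch rather than a definitive implementation. It reads the half-open jitter as cooldown plus U(0, 0.2*cooldown), which is an interpretation; the class, function, and parameter names are hypothetical, and flooring the backstop share at 1 is an added assumption.

```python
import random


def backstop_limit(tenant_limit, healthy_instances, ceiling):
    """In-process share: tenant_limit / healthy instances, with a hard ceiling."""
    share = tenant_limit // max(1, healthy_instances)
    return min(ceiling, max(1, share))


class CircuitBreaker:
    def __init__(self, failure_threshold, cooldown_s, rand=random.random):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.rand = rand
        self.failures = 0
        self.retry_at = None          # None means closed
        self.probe_in_flight = False

    def should_call_store(self, now):
        if self.retry_at is None:
            return True
        if now >= self.retry_at and not self.probe_in_flight:
            self.probe_in_flight = True   # half-open: exactly one probe
            return True
        return False

    def record_success(self):
        self.failures, self.retry_at, self.probe_in_flight = 0, None, False

    def record_failure(self, now):
        if self.retry_at is not None and not self.probe_in_flight:
            return                        # already open; late failures change nothing
        self.failures += 1
        if self.probe_in_flight or self.failures >= self.failure_threshold:
            self.probe_in_flight = False
            jitter = self.rand() * 0.2 * self.cooldown_s
            self.retry_at = now + self.cooldown_s + jitter


def decide(breaker, store_check, backstop_check, now, on_error=lambda: None):
    """Use the shared store when healthy; otherwise fall back to the backstop."""
    if breaker.should_call_store(now):
        try:
            allowed = store_check()
        except Exception:
            on_error()                    # emit a metric here
            breaker.record_failure(now)
        else:
            breaker.record_success()
            return allowed
    return backstop_check()
```

The backstop_check callable would typically be a small local bucket sized with backstop_limit, so enforcement gets coarser during an outage but does not disappear.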

coordination

When: Live config (per-tenant limits) is distributed from a store to many stateless deciders; a single Streams->Lambda->push path can stall (iterator lag, poison record, fan-out storm) and freeze all tenants on stale limits.
AWS: DynamoDB Streams triggers a reserved-concurrency Lambda (DLQ + BisectBatchOnFunctionError, IteratorAge alarm) that publishes one event to SNS; each cell consumes via its own SQS queue; deciders snapshot config from DynamoDB on startup and serve last-known-good on staleness; AppConfig gates staged rollout/rollback.
Trade-off: More moving parts than a direct push, in exchange for bounded fan-out (O(M) publishes + O(N) batched consumers), no forward-progress single point, and a deterministic cold-start config.

coordination

When: Throttled clients all read the same reset timestamp and retry on the same millisecond, turning a transient limit into a self-inflicted synchronized DDoS.
AWS: Return 429 with a jittered Retry-After and prefer no-hard-reset algorithms (sliding window / token bucket); have clients use full-jitter exponential backoff: sleep = min(cap, base*2^n) * U(0,1).
Trade-off: Slightly longer worst-case client wait in exchange for spreading retries across a window instead of concentrating them at a clock boundary.
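
Both sides of the retry behaviour can be sketched as follows. The client function implements the formula above directly; the server-side helper assumes the jitter is added on top of the time until reset and rounded up to whole seconds, which is one possible reading. Default values and names are hypothetical.

```python
import math
import random


def full_jitter_backoff(attempt, base=0.1, cap=30.0, rand=random.random):
    """Client side: sleep = min(cap, base * 2^n) * U(0, 1), in seconds."""
    return min(cap, base * 2 ** attempt) * rand()


def jittered_retry_after(seconds_until_reset, spread_s, rand=random.random):
    """Server side: a Retry-After value in whole seconds, spread over
    [seconds_until_reset, seconds_until_reset + spread_s]."""
    return math.ceil(seconds_until_reset + rand() * spread_s)
```

A client retry loop would call time.sleep(full_jitter_backoff(n)) before attempt n + 1, so clients throttled at the same moment wake at different times.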