Limit per cell, accept bounded over-counting
- When
- A globally exact counter would need a cross-region synchronization point whose round-trip exceeds your entire latency budget.
- AWS
- Run an independent ElastiCache cluster per cell. A single control-plane authority reads the active cells from a cell registry in DynamoDB, computes per_cell_limit = global_limit / active_cell_count, and republishes it whenever a cell joins or leaves. Pin clients to the primary with scaleReads:'master' and assert on startup that the node is the primary (Redis's ROLE reply starts with master). Never read counters from replicas: a stale count silently lets traffic bypass the limit.
- Trade-off
- A tenant spread across C cells can consume up to C times its global limit, but only within the recompute window, before every cell enforces the republished per-cell limit (alarm at 1.5x, page at 3x). In exchange, checks stay on local sub-millisecond hops and the blast radius is scoped to one cell.
from: Distributed rate limiting at API gateway scale
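A minimal Python sketch of the per-cell arithmetic follows. It assumes a tenant spread evenly over all cells and uses the 1.5x/3x thresholds above. The helper names (per_cell_limit, admission_ratio, classify, assert_primary) are hypothetical, and floor division is an assumption chosen so per-cell limits never sum past the global limit. admission_ratio adds up the limit each cell is enforcing right now, which is what a fully spread tenant could get while a republished value is still propagating.

```python
from typing import Sequence

ALARM_RATIO = 1.5  # thresholds taken from the text
PAGE_RATIO = 3.0


def per_cell_limit(global_limit: int, active_cell_count: int) -> int:
    """Split a tenant's global limit evenly across the active cells.

    Floor division (an assumption) keeps per-cell limits from summing
    past the global limit.
    """
    if active_cell_count <= 0:
        raise ValueError("need at least one active cell")
    return global_limit // active_cell_count


def admission_ratio(enforced_limits: Sequence[int], global_limit: int) -> float:
    """Worst-case admitted traffic for a tenant spread over every cell,
    relative to its global limit, given what each cell enforces right now."""
    return sum(enforced_limits) / global_limit


def classify(ratio: float) -> str:
    if ratio >= PAGE_RATIO:
        return "page"
    if ratio >= ALARM_RATIO:
        return "alarm"
    return "ok"


def assert_primary(role_reply) -> None:
    """Refuse to start against a replica.

    Redis's ROLE command replies with a list whose first element is
    "master", "slave" or "sentinel"; role_reply is that raw reply.
    """
    role = role_reply[0]
    if isinstance(role, bytes):
        role = role.decode()
    if role != "master":
        raise RuntimeError(f"connected to a {role}, not the primary")


# Scale from 2 to 4 cells: the two old cells still enforce the stale 500,
# the two new cells already enforce the recomputed 250.
stale = [per_cell_limit(1000, 2)] * 2 + [per_cell_limit(1000, 4)] * 2
print(admission_ratio(stale, 1000), classify(admission_ratio(stale, 1000)))  # prints 1.5 alarm
```

Once all four cells enforce 250, the sum falls back to 1.0 ("ok"). In this model, the C-times bound from the text is reached only if every cell enforces the full global limit at the same time.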
Fail open with a local backstop and a circuit breaker
- When
- The shared rate-limit store can fail or slow down, and a limiter bug must never reject legitimate paying traffic; yet going fully open invites abuse for the length of the outage.
- AWS
- On an ElastiCache error or timeout, allow the request and emit a metric. Keep WAF rate-based rules and API Gateway usage-plan throttles as the volumetric floor that still holds during a Redis outage. Size the in-process backstop bucket as tenant_limit / current_healthy_instances (from EDS), capped by a hard ceiling. Trip a circuit breaker on sustained errors; to recover, wait the cooldown plus U(0, 0.2*cooldown) of jitter, then go half-open and admit a single probe.
- Trade-off
- Brief over-admission during failover and coarser enforcement in degraded mode, bounded by the edge floor so that an induced Redis failure cannot become unbounded pass-through. In exchange, the limiter never blocks real customers when it is itself unhealthy.
from: Distributed rate limiting at API gateway scale
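The degraded-mode path can be sketched in Python as below. Everything here is an illustrative assumption: the names (backstop_rate, TokenBucket, CircuitBreaker, decide), treating "sustained errors" as N consecutive failures, and the injected clock and random source, which exist only to make the logic testable. The sketch fails open on a store error, serves from the in-process bucket while the breaker is open or a probe is in flight, and lets exactly one probe through after cooldown plus U(0, 0.2*cooldown) jitter.

```python
import random
import time
from typing import Callable


def backstop_rate(tenant_limit: float, healthy_instances: int,
                  hard_ceiling: float) -> float:
    """This instance's share of the tenant limit, capped by a hard ceiling."""
    return min(tenant_limit / max(1, healthy_instances), hard_ceiling)


class TokenBucket:
    """In-process bucket consulted only while the breaker is open or half-open."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens, self.last = capacity, clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """closed -> open after `threshold` consecutive errors; half-open after
    cooldown + U(0, 0.2*cooldown); exactly one probe while half-open."""

    def __init__(self, threshold: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Callable[[], float] = random.random):
        self.threshold, self.cooldown = threshold, cooldown
        self.clock, self.rng = clock, rng
        self.state, self.errors = "closed", 0
        self.reopen_at, self.probe_in_flight = 0.0, False

    def should_call_store(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open" and self.clock() >= self.reopen_at:
            self.state = "half_open"
        if self.state == "half_open" and not self.probe_in_flight:
            self.probe_in_flight = True  # the single probe
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.state, self.errors, self.probe_in_flight = "closed", 0, False
            return
        self.errors += 1
        if self.state == "half_open" or self.errors >= self.threshold:
            jitter = self.rng() * 0.2 * self.cooldown
            self.state = "open"
            self.reopen_at = self.clock() + self.cooldown + jitter
            self.probe_in_flight = False


def decide(breaker: CircuitBreaker, backstop: TokenBucket,
           call_store: Callable[[], bool]) -> bool:
    """True = admit. call_store() returns the store's verdict or raises."""
    if not breaker.should_call_store():
        return backstop.allow()  # degraded mode: coarse local limit
    try:
        verdict = call_store()
    except Exception:
        breaker.record(ok=False)
        # Fail open (emit a metric here); the edge floor still bounds volume.
        return True
    breaker.record(ok=True)
    return verdict
```

The jitter on the half-open transition keeps a fleet of instances that all tripped together from probing the recovering store in the same instant.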
Distribute quota config without a single-coordinator freeze
- When
- Live config (per-tenant limits) is distributed from a store to many stateless deciders; a single Streams->Lambda->push path can stall (iterator lag, poison record, fan-out storm) and freeze all tenants on stale limits.
- AWS
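- DynamoDB Streams triggers a reserved-concurrency Lambda (BisectBatchOnFunctionError, a bounded MaximumRetryAttempts, an on-failure destination for poison records, which is the stream equivalent of a DLQ, and an IteratorAge alarm) that publishes one event per change to SNS; each cell consumes through its own SQS queue subscribed to that topic; deciders snapshot config from DynamoDB on startup and keep serving last-known-good limits when updates go stale; AppConfig gates staged rollout and rollback.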
- Trade-off
- More moving parts than a direct push, in exchange for bounded fan-out (O(M) publishes for M config changes, plus O(N) batched consumers for N cells), no single point that forward progress depends on, and a deterministic config on cold start.
from: Distributed rate limiting at API gateway scale
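On the decider side, "snapshot on startup, serve last-known-good" might look like this Python sketch. It assumes each limit carries a monotonically increasing version (for example a counter attribute on the DynamoDB item) so that duplicate or reordered SQS deliveries can be discarded. The class and method names, the version field, and the heartbeat-based staleness check are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Limits = Dict[str, Tuple[int, int]]  # tenant -> (limit, version)


class QuotaConfigCache:
    """Decider-side view of per-tenant limits (hypothetical sketch)."""

    def __init__(self, load_snapshot: Callable[[], Limits],
                 stale_after: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.load_snapshot = load_snapshot
        self.stale_after = stale_after
        self.clock = clock
        self.limits: Limits = {}
        self.last_seen = 0.0

    def start(self) -> None:
        # Cold start from a full snapshot (e.g. a DynamoDB scan), never from
        # whatever happens to be sitting in the queue.
        self.limits = dict(self.load_snapshot())
        self.last_seen = self.clock()

    def apply(self, tenant: str, limit: int, version: int) -> bool:
        # SQS standard queues deliver at least once with best-effort ordering,
        # so drop duplicates and out-of-order updates by version.
        self.last_seen = self.clock()
        current = self.limits.get(tenant)
        if current is not None and version <= current[1]:
            return False
        self.limits[tenant] = (limit, version)
        return True

    def limit_for(self, tenant: str, default: int) -> int:
        # Always answer from last-known-good; staleness is reported, not fatal.
        entry = self.limits.get(tenant)
        return entry[0] if entry is not None else default

    def is_stale(self) -> bool:
        # Assumes some message (a change or a heartbeat) arrives regularly;
        # a quiet period without heartbeats would also read as stale.
        return self.clock() - self.last_seen > self.stale_after
```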
Jitter the reset to defuse the retry stampede
- When
- Throttled clients all read the same reset timestamp and retry in the same millisecond, turning a transient limit into a self-inflicted, synchronized DDoS.
- AWS
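- Return 429 with a jittered Retry-After (the header carries whole seconds, so jitter in whole-second steps) and prefer algorithms without a hard reset boundary (sliding window, token bucket) over fixed windows. Have clients use full-jitter exponential backoff: sleep = min(cap, base*2^n) * U(0,1), where n is the retry attempt.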
- Trade-off
- Slightly longer worst-case client wait in exchange for spreading retries across a window instead of concentrating them at a clock boundary.
from: Distributed rate limiting at API gateway scale
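Both halves can be sketched in Python. full_jitter_sleep is the formula from the text. jittered_retry_after assumes the server knows how many seconds remain until the client's budget refills and adds a uniform 0..spread whole seconds on top; peak_per_bucket is a hypothetical helper that measures how many retries land in the same 100 ms slot. All three names and the spread parameter are illustrative.

```python
import math
import random
from collections import Counter
from typing import Callable, Iterable, Optional


def full_jitter_sleep(attempt: int, base: float, cap: float,
                      rng: Callable[[], float] = random.random) -> float:
    """Client side: sleep = min(cap, base * 2**attempt) * U(0, 1)."""
    return min(cap, base * 2 ** attempt) * rng()


def jittered_retry_after(seconds_until_reset: float, spread: int,
                         rng: Optional[random.Random] = None) -> str:
    """Server side: Retry-After in whole seconds (delay-seconds), i.e. the
    reset rounded up plus a uniform 0..spread extra seconds."""
    extra = (rng or random).randint(0, spread)
    return str(math.ceil(seconds_until_reset) + extra)


def peak_per_bucket(delays: Iterable[float], width: float = 0.1) -> int:
    """Largest number of retries landing in any one `width`-second slot."""
    return max(Counter(int(d / width) for d in delays).values())


rng = random.Random(7)
synced = [1.0] * 1000  # every client retries exactly at the reset
jittered = [full_jitter_sleep(3, 0.5, 30.0, rng.random) for _ in range(1000)]
print(peak_per_bucket(synced), peak_per_bucket(jittered))
```

The synchronized list puts all 1000 retries into a single 100 ms slot, while the jittered list (attempt 3, so U(0, 4 s)) spreads them over roughly forty slots.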