rate-limiting
Make the limiter decision atomic with one Lua script
- When
- A rate-limit decision reads a counter, branches on it, increments, and conditionally sets a TTL. Concurrent requests or a mid-sequence crash can leave a key with no expiry, so the counter never resets and the limiter silently misbehaves until someone deletes the key.
- AWS
- Run the whole read-branch-increment-expire as a single EVAL on ElastiCache for Redis, using redis.call('TIME') as the one authoritative clock.
- Trade-off
- All decision logic lives in a Lua script you must test and version, in exchange for true atomicity that MULTI/EXEC and WATCH can't give you without retry contention.
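A minimal sketch of the single-script decision, assuming a fixed window and a redis-py-style client that exposes `eval(script, numkeys, *keys_and_args)`. The script body, `FIXED_WINDOW_LUA`, and `check_limit` are illustrative names, not something the card prescribes.

```python
# Sketch only: the script body and the names below are illustrative assumptions.
FIXED_WINDOW_LUA = """
local limit     = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
-- Server clock only. Assumes effects replication (the default since Redis 5),
-- which permits writes after the non-deterministic TIME call.
local t      = redis.call('TIME')
local now_ms = tonumber(t[1]) * 1000 + math.floor(tonumber(t[2]) / 1000)
local count  = tonumber(redis.call('GET', KEYS[1]) or '0')
if count >= limit then
  return {0, redis.call('PTTL', KEYS[1])}      -- denied, ms until reset
end
count = redis.call('INCR', KEYS[1])
if count == 1 then
  -- First write of the window: expire at the next window boundary.
  redis.call('PEXPIRE', KEYS[1], window_ms - (now_ms % window_ms))
end
return {1, limit - count}                      -- allowed, remaining quota
"""


def check_limit(client, key, limit, window_ms):
    """Run read-branch-increment-expire as one EVAL.

    Returns (allowed, detail): detail is the remaining quota when allowed,
    or the milliseconds until reset when denied.
    """
    allowed, detail = client.eval(FIXED_WINDOW_LUA, 1, key, limit, window_ms)
    return bool(allowed), int(detail)
```

Because the script runs as one unit on the server, there is no point between INCR and PEXPIRE at which another client can observe the key or a client crash can strand it without a TTL.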
rate-limiting
Approximate the window with two counters, not a log
- When
- You need a smooth rolling limit without the 2x boundary burst of a fixed window, but a per-request timestamp log would OOM the store under attack (tens of GB at the limit).
- AWS
- Store two integer keys per tenant (current + previous window) on ElastiCache and compute a weighted estimate; hash-tag the keys so the pair lands on one cluster slot; guard EXPIRE to fire only on the first write of the window (cur == 1), not every request.
- Trade-off
- A bounded approximation error (~half the previous window's rate at the transition) in exchange for ~11,700x less memory than a sliding-window log. Not suitable for billing-grade exactness. Treat slot-migration errors (TRYAGAIN/ASK/MOVED) as fail-open rather than retrying.
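The weighted estimate can be sketched as follows. The key layout (`{rl:<tenant>}:<window index>`) and the function names are assumptions for illustration. The comparison is done in integers so no float rounding enters the decision.

```python
def window_keys(tenant, now_ms, window_ms):
    """Return (current_key, previous_key) sharing one hash tag.

    The braces are the Redis Cluster hash tag, so both keys map to one slot.
    """
    idx = now_ms // window_ms
    tag = "{rl:%s}" % tenant
    return f"{tag}:{idx}", f"{tag}:{idx - 1}"


def allow(prev_count, cur_count, now_ms, window_ms, limit):
    """Sliding-window-counter check.

    Integer form of: cur + prev * (1 - elapsed / window) < limit
    """
    elapsed = now_ms % window_ms
    weighted = cur_count * window_ms + prev_count * (window_ms - elapsed)
    return weighted < limit * window_ms
```

For example, 15 s into a 60 s window with 100 requests in the previous window and 20 in the current one, the estimate is 20 + 100 * 0.75 = 95, so a limit of 100 still admits the request.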
rate-limiting
Reward well-behaved bursts with a token bucket
- When
- Tenants have legitimately bursty traffic (batch jobs, expensive queries) that a flat per-second window would unfairly punish.
- AWS
- A Redis hash per tenant (tokens, last_refill) refilled lazily in Lua at rate r up to cap b; bill expensive operations (e.g. GraphQL query cost) as more tokens.
- Trade-off
- You allow bursts up to b above the sustained rate, and must use integer/server-clock arithmetic to avoid float drift and clock-skew bugs, versus the perfectly smooth output of a leaky bucket.
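A minimal sketch of the lazy refill, written in Python to show the arithmetic that the Lua script would perform. Tokens are held as integer milli-tokens to avoid float drift. `Bucket`, `try_consume`, and `SCALE` are hypothetical names, and `now_ms` is assumed to come from the server clock.

```python
from dataclasses import dataclass

SCALE = 1000  # store tokens as integer milli-tokens


@dataclass
class Bucket:
    tokens_milli: int
    last_refill_ms: int


def try_consume(bucket, now_ms, rate_per_sec, capacity, cost=1):
    """Refill lazily at rate_per_sec up to capacity, then charge `cost` tokens."""
    # A clock that steps backwards contributes zero elapsed time.
    elapsed_ms = max(0, now_ms - bucket.last_refill_ms)
    # r tokens/second equals r milli-tokens/millisecond, so this stays integral.
    refilled = bucket.tokens_milli + elapsed_ms * rate_per_sec
    bucket.tokens_milli = min(capacity * SCALE, refilled)
    bucket.last_refill_ms = max(now_ms, bucket.last_refill_ms)
    need = cost * SCALE
    if bucket.tokens_milli >= need:
        bucket.tokens_milli -= need
        return True
    return False
```

An expensive operation simply passes a larger `cost`; a full bucket of capacity 10 admits one cost-10 query immediately, then refills at the sustained rate.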
coordination
Limit per cell, accept bounded over-counting
- When
- A globally exact counter would need a cross-region synchronization point whose round-trip exceeds your entire latency budget.
- AWS
- Run an independent ElastiCache cluster per cell. A single control-plane authority computes per_cell_limit = global_limit / active_cell_count from a DynamoDB cell registry and republishes on cell join/leave. Pin clients to scaleReads:'master' and assert on startup that ROLE reports master, i.e. the primary (never read from replicas: a stale counter is a silent bypass).
- Trade-off
- A tenant balanced across C cells can consume up to C times its global limit, but only within the recompute window (alarm at 1.5x, page at 3x), in exchange for local sub-millisecond hops and a cell-scoped blast radius.
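The per-cell split and the overage thresholds can be sketched as below. Flooring the division and keeping a minimum of 1 are assumptions, as are the function names; the 1.5x and 3x thresholds come from the card.

```python
def per_cell_limit(global_limit, active_cell_count):
    """Split the global limit evenly across active cells (floored, minimum 1)."""
    if active_cell_count < 1:
        raise ValueError("need at least one active cell")
    return max(1, global_limit // active_cell_count)


def overage_action(observed_total, global_limit):
    """Classify the observed cross-cell total against the global limit.

    Integer comparisons: 'alarm' at 1.5x and 'page' at 3x the global limit.
    """
    if observed_total >= 3 * global_limit:
        return "page"
    if 2 * observed_total >= 3 * global_limit:
        return "alarm"
    return "ok"
```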
rate-limiting
Counter-shard the whale tenants within a cell
- When
- Hash-tagging pins a tenant's keys to one slot for atomicity, but a single very-high-rate tenant then drives all its EVALs to one shard and saturates it while others sit idle.
- AWS
- For tenants above an ops threshold, split the key into M sub-keys ({rl:t:s0}..{rl:t:sM-1}) on ElastiCache, route each request to a random sub-shard, and sum the approximate sub-counts.
- Trade-off
- An extra over-count on whale tenants (the same approximation accepted at the cell level, one level down) in exchange for spreading their load off a single hot slot; the common tenant stays exact.
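A minimal sketch of the sub-key routing, using a plain dict as a stand-in for the store. The function names are hypothetical; the key shape follows the `{rl:t:s0}..{rl:t:sM-1}` pattern in the card, where each sub-key carries its own hash tag and can therefore land on a different slot.

```python
import random


def sub_keys(tenant, m):
    """One key per sub-shard; a distinct hash tag per key spreads them over slots."""
    return ["{rl:%s:s%d}" % (tenant, i) for i in range(m)]


def record_request(store, tenant, m, rng=random):
    """Increment a randomly chosen sub-counter and return the key used."""
    key = rng.choice(sub_keys(tenant, m))
    store[key] = store.get(key, 0) + 1
    return key


def approximate_count(store, tenant, m):
    """Sum the sub-counters.

    Against a real store the sub-counts are read at slightly different
    moments, which is why the total is only approximate.
    """
    return sum(store.get(k, 0) for k in sub_keys(tenant, m))
```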
coordination
Fail open with a local backstop and a circuit breaker
- When
- The shared rate-limit store can fail or slow down, and a limiter bug must never reject legitimate paying traffic — but going fully open invites abuse during the outage.
- AWS
- On ElastiCache error/timeout, allow and emit a metric; keep WAF/usage-plan throttles as the volumetric floor that survives a Redis outage; size the in-process backstop bucket as tenant_limit / current_healthy_instances (from EDS) with a hard ceiling; trip a circuit breaker on sustained errors and recover half-open with U(0, 0.2*cooldown) jitter and a single probe.
- Trade-off
- Brief over-allowance during failover and coarser enforcement in degraded mode, bounded by the edge floor so that a deliberately induced Redis failure can't become unbounded pass-through, in exchange for never blocking real customers when the limiter is sick.
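The backstop sizing and the half-open recovery can be sketched as below. This assumes the jitter is added on top of the cooldown and that a failed probe reopens the breaker; the class, method, and parameter names are hypothetical.

```python
import random


def backstop_limit(tenant_limit, healthy_instances, ceiling):
    """Per-instance share of the tenant limit, clamped to [1, ceiling]."""
    share = tenant_limit // max(1, healthy_instances)
    return max(1, min(ceiling, share))


class CircuitBreaker:
    def __init__(self, failure_threshold, cooldown_s, rng=None):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.rng = rng or random.Random()
        self.failures = 0
        self.open_until = None       # None means the breaker is closed
        self.probe_in_flight = False

    def allow_store_call(self, now_s):
        """True if the caller may hit the shared store; otherwise use the backstop."""
        if self.open_until is None:
            return True
        if now_s < self.open_until or self.probe_in_flight:
            return False
        self.probe_in_flight = True  # half-open: exactly one probe goes through
        return True

    def record_success(self):
        self.failures = 0
        self.open_until = None
        self.probe_in_flight = False

    def record_failure(self, now_s):
        self.failures += 1
        if self.probe_in_flight or self.failures >= self.failure_threshold:
            # Jitter of U(0, 0.2 * cooldown) so instances do not probe in lockstep.
            jitter = self.rng.uniform(0, 0.2 * self.cooldown_s)
            self.open_until = now_s + self.cooldown_s + jitter
            self.probe_in_flight = False
```

While `allow_store_call` returns False, the decider would consult the in-process bucket sized by `backstop_limit` instead of the shared store.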
coordination
Distribute quota config without a single-coordinator freeze
- When
- Live config (per-tenant limits) is distributed from a store to many stateless deciders; a single Streams->Lambda->push path can stall (iterator lag, poison record, fan-out storm) and freeze all tenants on stale limits.
- AWS
- DynamoDB Streams triggers a reserved-concurrency Lambda (DLQ + BisectBatchOnFunctionError, IteratorAge alarm) that publishes one event to SNS; each cell consumes via its own SQS queue; deciders snapshot config from DynamoDB on startup and serve last-known-good on staleness; AppConfig gates staged rollout/rollback.
- Trade-off
- More moving parts than a direct push, in exchange for bounded fan-out (O(M) publishes + O(N) batched consumers), no forward-progress single point, and a deterministic cold-start config.
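On the decider side, "snapshot on startup, serve last-known-good on staleness" might look like the sketch below. The per-tenant version number used to discard out-of-order events is an added assumption (standard SNS-to-SQS delivery does not guarantee ordering), and `QuotaConfigCache` is a hypothetical name.

```python
class QuotaConfigCache:
    """In-memory view of per-tenant limits held by one decider."""

    def __init__(self, snapshot, now_s, max_age_s):
        # snapshot: tenant -> (version, limit), loaded once at startup.
        self.limits = dict(snapshot)
        self.last_update_s = now_s
        self.max_age_s = max_age_s

    def apply_event(self, tenant, version, limit, now_s):
        """Apply a config event; older or duplicate versions are ignored."""
        current = self.limits.get(tenant)
        if current is None or version > current[0]:
            self.limits[tenant] = (version, limit)
        # Any delivered event shows the pipeline is still moving.
        self.last_update_s = now_s

    def limit_for(self, tenant, default_limit):
        """Always answer from last-known-good; never block on the config path."""
        entry = self.limits.get(tenant)
        return entry[1] if entry else default_limit

    def is_stale(self, now_s):
        """True when no event has arrived within max_age_s (alarm, keep serving)."""
        return now_s - self.last_update_s > self.max_age_s
```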
coordination
Jitter the reset to defuse the retry stampede
- When
- Throttled clients all read the same reset timestamp and retry on the same millisecond, turning a transient limit into a self-inflicted synchronized DDoS.
- AWS
- Return 429 with a jittered Retry-After and prefer no-hard-reset algorithms (sliding window / token bucket); have clients use full-jitter exponential backoff: sleep = min(cap, base*2^n) * U(0,1).
- Trade-off
- Slightly longer worst-case client wait in exchange for spreading retries across a window instead of concentrating them at a clock boundary.
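Both sides of the jitter can be sketched in a few lines. `full_jitter_sleep` follows the formula in the card; `jittered_retry_after` and its `spread_s` parameter are illustrative assumptions about how a server might spread the advertised reset.

```python
import math
import random


def full_jitter_sleep(attempt, base_s, cap_s, rng=random):
    """Client side: sleep = min(cap, base * 2^n) * U(0, 1)."""
    return min(cap_s, base_s * 2 ** attempt) * rng.random()


def jittered_retry_after(seconds_until_reset, spread_s, rng=random):
    """Server side: whole-second Retry-After value with up to spread_s added.

    Rounded up so a client never retries before the real reset.
    """
    return math.ceil(seconds_until_reset + rng.uniform(0, spread_s))
```

With `base_s=0.1` and `cap_s=5.0`, the upper bound on the sleep doubles per attempt (0.1 s, 0.2 s, 0.4 s, ...) until it reaches 5 s, and each client draws uniformly below that bound.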