Architecture
Distributed rate limiting at API gateway scale
The same limiter, assembled one layer at a time.
Shed the cheap traffic at the edge
Before any stateful check runs, CloudFront with AWS WAF rate-based rules absorbs volumetric IP floods, and API Gateway REST usage plans apply a coarse per-API-key throttle. This edge layer is also the volumetric floor that stays enforced through a Redis outage. At roughly $370k/month, it is the dominant cost driver, not the Redis fleet.
Envoy fronts the cell
An Envoy proxy in each cell carries the rate-limit filter. It turns an authenticated request into ordered descriptors, (tenant, op), (tenant), and (ip), with the tenant identity taken from a verified JWT claim and never from a client-supplied header.
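A minimal sketch of the descriptor ordering described above, written in Python for illustration only. In practice this mapping is expressed in Envoy's rate-limit configuration, not application code. The claim name `tenant_id` and the function `build_descriptors` are assumptions, not names from the actual system.

```python
from typing import List, Mapping, Tuple

# A descriptor is an ordered tuple of (key, value) entries.
Descriptor = Tuple[Tuple[str, str], ...]


def build_descriptors(verified_claims: Mapping[str, str], op: str, peer_ip: str) -> List[Descriptor]:
    """Build ordered descriptors, most specific first.

    `verified_claims` must come from a JWT whose signature has already been
    checked. The function deliberately takes no request headers, so a
    client-supplied value can never pick the tenant.
    """
    tenant = verified_claims.get("tenant_id")  # hypothetical claim name
    if not tenant:
        raise ValueError("verified JWT carries no tenant claim")
    return [
        (("tenant", tenant), ("op", op)),  # per-tenant, per-operation limit
        (("tenant", tenant),),             # per-tenant aggregate limit
        (("ip", peer_ip),),                # per-source-address limit
    ]


if __name__ == "__main__":
    for d in build_descriptors({"tenant_id": "acme"}, op="GetOrder", peer_ip="203.0.113.7"):
        print(d)
```

Ordering the descriptors from most to least specific keeps the evaluation order explicit when several limits apply to one request.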
Atomic decision in-cell
Envoy calls a stateless ratelimit gRPC service, which runs one atomic Lua EVAL against the cell's ElastiCache Redis Cluster. The script implements a sliding-window counter or a token bucket over two hash-tagged keys per tenant. The shared hash tag places both keys in the same cluster slot, and the hop stays sub-millisecond.
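To make the decision logic concrete, here is a sketch of the token-bucket variant in plain Python, using an in-memory dict in place of Redis. It is not the Lua script itself: in the real path the same read, refill, and decrement would run inside one EVAL so that no other command interleaves. The key layout in `bucket_keys` and the class name are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


def bucket_keys(tenant: str) -> Tuple[str, str]:
    """Two keys per tenant sharing one hash tag (hypothetical layout).

    Redis Cluster hashes only the text inside {...}, so both keys map to
    the same slot and can be touched by a single script.
    """
    return f"rl:{{{tenant}}}:tokens", f"rl:{{{tenant}}}:ts"


@dataclass
class Bucket:
    tokens: float
    updated_at: float


class TokenBucketLimiter:
    """In-memory stand-in for the per-cell Redis state."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._buckets: Dict[str, Bucket] = {}

    def allow(self, tenant: str, now: float, cost: float = 1.0) -> bool:
        bucket = self._buckets.setdefault(tenant, Bucket(self.capacity, now))
        # Refill for the time elapsed since the last decision, capped at capacity.
        elapsed = max(0.0, now - bucket.updated_at)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.updated_at = now
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(capacity=2, refill_per_sec=1)
    print(bucket_keys("acme"))
    print([limiter.allow("acme", now=0.0) for _ in range(3)])
```

Passing `now` in explicitly keeps the sketch deterministic. A Redis-side script would more likely read the server clock so that all callers agree on time.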
Allow, or 429 with jitter
Under the limit, the request flows to the upstream service. Over it, the client gets a 429 with a jittered Retry-After so 10k throttled clients don't retry on the same millisecond.
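One way to produce the jittered Retry-After is to add a random whole number of seconds to a base delay, since the header's delay form takes integer seconds. The sketch below assumes that approach. The function name and the jitter range are illustrative, not taken from the system described here.

```python
import random
from typing import Dict, Tuple


def throttled_response(
    base_seconds: int, max_jitter_seconds: int, rng: random.Random
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for an over-limit request.

    The delay is drawn uniformly from
    [base_seconds, base_seconds + max_jitter_seconds], so clients that were
    throttled together are spread across several seconds when they retry.
    """
    retry_after = base_seconds + rng.randint(0, max_jitter_seconds)
    return 429, {"Retry-After": str(retry_after)}


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(throttled_response(base_seconds=1, max_jitter_seconds=5, rng=rng))
```

Because Retry-After has one-second resolution, the jitter window needs to span several seconds to spread a large retry wave meaningfully.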
Quota config as data, not deploys
Tenant plans live in DynamoDB. A control-plane Lambda, triggered by DynamoDB Streams, capped with reserved concurrency, and backed by a DLQ, publishes resolved limits to SNS. Each cell drains its own SQS queue, which avoids a fan-out storm. Services snapshot config from DynamoDB on startup and keep serving last-known-good values when updates go stale, so a stalled push never freezes enforcement.
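The last-known-good behavior can be sketched as a small cache that is seeded from a startup snapshot, accepts pushed updates, and keeps answering from its current contents even when no update has arrived for a while. All names here (`Limit`, `LimitConfigCache`, the `version` field) are assumptions. The version check is one possible guard against duplicate or out-of-order queue deliveries, not necessarily what the real service does.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Limit:
    requests_per_sec: int
    version: int  # hypothetical monotonically increasing plan version


class LimitConfigCache:
    def __init__(self, snapshot: Dict[str, Limit], loaded_at: float, max_age_s: float) -> None:
        self._limits = dict(snapshot)  # startup snapshot from the plan table
        self._last_message_at = loaded_at
        self._max_age_s = max_age_s

    def apply_update(self, tenant: str, limit: Limit, now: float) -> bool:
        """Apply a pushed limit; return False if it is older than what we hold."""
        self._last_message_at = now  # any delivery shows the push path is alive
        current = self._limits.get(tenant)
        if current is not None and limit.version <= current.version:
            return False
        self._limits[tenant] = limit
        return True

    def is_stale(self, now: float) -> bool:
        """True when no update has arrived within max_age_s. Used for alerting only."""
        return now - self._last_message_at > self._max_age_s

    def limit_for(self, tenant: str, default: Limit) -> Limit:
        # Staleness never blocks a lookup: last-known-good is always served.
        return self._limits.get(tenant, default)


if __name__ == "__main__":
    cache = LimitConfigCache({"acme": Limit(100, 1)}, loaded_at=0.0, max_age_s=60.0)
    cache.apply_update("acme", Limit(200, 2), now=10.0)
    print(cache.limit_for("acme", Limit(10, 0)), cache.is_stale(now=500.0))
```

Keeping `is_stale` separate from `limit_for` reflects the design choice in the text: staleness is a signal to operators, not a reason to stop enforcing.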
Observe, then enforce
The ratelimit service aggregates allow/deny counts per descriptor in 1-second buckets and flushes them in CloudWatch Embedded Metric Format (EMF). Per-decision PutMetricData calls are not feasible at 1M req/s. CloudWatch extracts metrics from the EMF log events, and those metrics power dark launch: a new limit runs in count mode before it is flipped to enforce.
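The aggregation step can be sketched as a counter keyed by (second, descriptor, outcome) that emits one EMF-shaped JSON line per closed bucket. The namespace `RateLimit`, the dimension name `Descriptor`, and the metric names `Allowed` and `Denied` are assumptions chosen for the example. The `_aws` envelope follows the published EMF layout.

```python
import json
from collections import Counter
from typing import List


class DecisionAggregator:
    def __init__(self, namespace: str = "RateLimit") -> None:
        self.namespace = namespace  # hypothetical CloudWatch namespace
        self._counts: Counter = Counter()

    def record(self, descriptor: str, allowed: bool, ts: float) -> None:
        """Count one decision in the 1-second bucket containing ts."""
        outcome = "Allowed" if allowed else "Denied"
        self._counts[(int(ts), descriptor, outcome)] += 1

    def flush(self, before: float) -> List[str]:
        """Emit one EMF line per (second, descriptor) bucket that closed before `before`."""
        cutoff = int(before)
        ready = sorted({(b, d) for (b, d, _) in self._counts if b < cutoff})
        lines = []
        for bucket, descriptor in ready:
            doc = {
                "_aws": {
                    "Timestamp": bucket * 1000,  # EMF expects epoch milliseconds
                    "CloudWatchMetrics": [{
                        "Namespace": self.namespace,
                        "Dimensions": [["Descriptor"]],
                        "Metrics": [
                            {"Name": "Allowed", "Unit": "Count"},
                            {"Name": "Denied", "Unit": "Count"},
                        ],
                    }],
                },
                "Descriptor": descriptor,
                "Allowed": self._counts.pop((bucket, descriptor, "Allowed"), 0),
                "Denied": self._counts.pop((bucket, descriptor, "Denied"), 0),
            }
            lines.append(json.dumps(doc))
        return lines


if __name__ == "__main__":
    agg = DecisionAggregator()
    agg.record("tenant=acme", True, ts=100.2)
    agg.record("tenant=acme", False, ts=100.5)
    for line in agg.flush(before=101.5):
        print(line)  # in a real service this line would go to the log stream
```

With this shape, the write volume scales with the number of active descriptors per second, not with the request rate, which is what makes per-decision API calls unnecessary.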