Architecture

Distributed rate limiting at API gateway scale

The same limiter, assembled one layer at a time.

01

Shed the cheap traffic at the edge

Before anything stateful, CloudFront + WAF rate-based rules absorb volumetric IP floods and API Gateway REST usage plans apply a coarse per-API-key throttle. This edge layer is also the enforced volumetric floor that survives a Redis outage — and, at ~$370k/month, the dominant cost driver, not the Redis fleet.

02

Envoy fronts the cell

An Envoy proxy in each cell carries the rate-limit filter. It turns an authenticated request into ordered descriptors — (tenant, op), (tenant), (ip) — from a verified JWT claim, never a client-supplied header.
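
The descriptor ordering above can be illustrated with a small Python sketch. It is not Envoy configuration: the function name `build_descriptors` and the claim name `tenant_id` are assumptions made for illustration, and the claims mapping is assumed to come from a JWT whose signature has already been verified.

```python
from typing import List, Mapping, Tuple

# One descriptor is an ordered tuple of (key, value) entries.
Descriptor = Tuple[Tuple[str, str], ...]


def build_descriptors(claims: Mapping[str, str], op: str, client_ip: str) -> List[Descriptor]:
    """Return descriptors ordered most specific first: (tenant, op), (tenant), (ip).

    `claims` must come from an already-verified JWT; the tenant is never
    read from a client-supplied header. The claim name is hypothetical.
    """
    tenant = claims.get("tenant_id")
    if not tenant:
        raise ValueError("verified token carries no tenant claim")
    return [
        (("tenant", tenant), ("op", op)),
        (("tenant", tenant),),
        (("ip", client_ip),),
    ]
```

For example, `build_descriptors({"tenant_id": "acme"}, "GetOrder", "203.0.113.7")` yields three descriptors, so one request can be checked against a per-operation limit, a per-tenant limit, and a per-IP limit.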

03

Atomic decision in-cell

Envoy calls a stateless ratelimit gRPC service, which runs one atomic Lua EVAL against the cell's ElastiCache Redis Cluster. The script implements a sliding-window counter or token bucket over two hash-tagged keys per tenant, so both keys map to the same cluster slot, and the in-cell hop stays sub-millisecond.
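
To make the decision logic concrete, here is a minimal in-memory sketch of a sliding-window counter in Python. It only mirrors the arithmetic that a Lua script could run atomically inside Redis; it is not the script itself. The class name, the key layout (one key for the current window and one for the previous window), and the parameters are assumptions, and a real deployment would also set a TTL on each key.

```python
import time
from typing import Dict, Optional


class SlidingWindowCounter:
    """In-memory stand-in for logic that would run atomically in Redis."""

    def __init__(self, limit: int, window_s: int = 60) -> None:
        self.limit = limit
        self.window_s = window_s
        self.store: Dict[str, int] = {}  # stands in for Redis keys

    def allow(self, tenant: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_s)
        # The {tenant} hash tag would place both keys in one cluster slot.
        cur_key = f"{{{tenant}}}:{window}"
        prev_key = f"{{{tenant}}}:{window - 1}"
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_s) / self.window_s
        # Weight the previous window by the share still inside the sliding window.
        estimate = self.store.get(prev_key, 0) * (1.0 - elapsed) + self.store.get(cur_key, 0)
        if estimate >= self.limit:
            return False
        self.store[cur_key] = self.store.get(cur_key, 0) + 1
        return True
```

The read, compare, and increment steps must happen as one unit; that is the reason for running them as a single script on the Redis side rather than as separate round trips.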

04

Allow, or 429 with jitter

Under the limit, the request flows to the upstream service. Over it, the client gets a 429 with a jittered Retry-After so 10k throttled clients don't retry on the same millisecond.
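
One way to jitter the header is sketched below. The function name `jittered_retry_after` and the `spread` parameter are hypothetical; the sketch assumes the limiter already knows a base delay (for example, the time until the window resets) and adds a random fraction on top. Retry-After in its delay form is a whole number of seconds, so the result is rounded up.

```python
import math
import random
from typing import Optional


def jittered_retry_after(base_s: float, spread: float = 0.5,
                         rng: Optional[random.Random] = None) -> int:
    """Return a Retry-After value in whole seconds with additive random jitter.

    The delay is drawn uniformly from [base_s, base_s * (1 + spread)], so
    throttled clients are spread out instead of retrying in lockstep.
    """
    source = rng if rng is not None else random
    delay = base_s * (1.0 + source.uniform(0.0, spread))
    return max(1, math.ceil(delay))
```

With `base_s=10` and `spread=0.5`, clients receive values between 10 and 15 seconds. Jitter is only ever added, never subtracted, so no client is told to come back before the limit can have reset.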

05

Quota config as data, not deploys

Tenant plans live in DynamoDB; a Streams-driven, reserved-concurrency control-plane Lambda (DLQ-backed) publishes resolved limits to SNS, and each cell drains its own SQS queue — no fan-out storm. Services snapshot config from DynamoDB on startup and serve last-known-good on staleness, so a stalled push never freezes enforcement.
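
The last-known-good behavior can be sketched as a small cache. Everything named here (`QuotaCache`, `load_snapshot`, `max_age_s`) is an assumption for illustration; `load_snapshot` stands in for the DynamoDB read at startup and `apply_update` for a message drained from the cell's queue.

```python
import time
from typing import Callable, Dict

Limits = Dict[str, int]


class QuotaCache:
    """Holds resolved limits and keeps serving them when updates stall."""

    def __init__(self, load_snapshot: Callable[[], Limits], max_age_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._max_age_s = max_age_s
        self._limits = dict(load_snapshot())  # startup snapshot
        self._updated_at = clock()

    def apply_update(self, tenant: str, limit: int) -> None:
        """Apply one pushed limit change and refresh the freshness timestamp."""
        self._limits[tenant] = limit
        self._updated_at = self._clock()

    def is_stale(self) -> bool:
        """True when no update has arrived within max_age_s; useful for alarms."""
        return self._clock() - self._updated_at > self._max_age_s

    def limit_for(self, tenant: str, default: int) -> int:
        # Staleness never blocks the lookup: the last-known-good value is served.
        return self._limits.get(tenant, default)
```

Staleness is exposed as a signal rather than as an error, so the enforcement path has no dependency on the push pipeline being healthy.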

06

Observe, then enforce

The ratelimit service aggregates allow/deny per descriptor in 1-second buckets and flushes them as EMF (not per-decision PutMetricData, which is impossible at 1M req/s); CloudWatch extracts metrics and powers dark-launch — a new limit runs in count mode before it's flipped to enforce.
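
A rough sketch of the aggregation step follows. The class name, the `RateLimit` namespace, the `Descriptor` dimension, and the `Allow`/`Deny` metric names are all hypothetical; the JSON shape follows the CloudWatch Embedded Metric Format, where a `_aws` block declares which top-level fields are metrics and which are dimensions. In practice each returned line would be written to a log stream that CloudWatch reads.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


class DecisionAggregator:
    """Counts decisions per (second, descriptor) and flushes them as EMF lines."""

    def __init__(self, namespace: str = "RateLimit") -> None:
        self.namespace = namespace
        self.counts: Counter = Counter()  # (second, descriptor, decision) -> count

    def record(self, descriptor: str, allowed: bool, now: float) -> None:
        decision = "Allow" if allowed else "Deny"
        self.counts[(int(now), descriptor, decision)] += 1

    def flush(self) -> List[str]:
        """Return one EMF JSON line per (second, descriptor), then reset."""
        grouped: Dict[Tuple[int, str], Dict[str, int]] = {}
        for (sec, desc, decision), n in self.counts.items():
            grouped.setdefault((sec, desc), {"Allow": 0, "Deny": 0})[decision] = n
        self.counts.clear()
        lines = []
        for (sec, desc), totals in sorted(grouped.items()):
            lines.append(json.dumps({
                "_aws": {
                    "Timestamp": sec * 1000,  # EMF expects epoch milliseconds
                    "CloudWatchMetrics": [{
                        "Namespace": self.namespace,
                        "Dimensions": [["Descriptor"]],
                        "Metrics": [{"Name": "Allow", "Unit": "Count"},
                                    {"Name": "Deny", "Unit": "Count"}],
                    }],
                },
                "Descriptor": desc,
                "Allow": totals["Allow"],
                "Deny": totals["Deny"],
            }))
        return lines
```

The number of lines written scales with the number of active descriptors per second, not with the request rate, which is what makes this approach workable where a per-decision API call is not.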

[Diagram: Clients (10k tenants) → https → CloudFront + WAF (rate rules) → API Gateway (usage-plan throttle) → auth'd req → Envoy sidecar (ratelimit gRPC filter) → descriptors → ratelimit svc (gRPC, stateless) → EVAL (Lua) → ElastiCache Redis Cluster (sliding-window / token). Allow → Upstream service; deny → 429 + Retry-After (jittered backoff). Quota sync: DynamoDB (tenant → plan limits) → Streams → control plane (Lambda → SNS/SQS). Metrics → CloudWatch (alarms + dark-launch).]