Architecture

Geospatial proximity search at scale

A driver-location index assembled one layer at a time.

Absorb the write firehose at the edge

flowchart LR
    Drivers([10M Drivers - 2M writes/s]) --> NLB[NLB]
    NLB --> FF[Fargate Connection Fleet]
    FF -->|PutRecords batch| KDS[Kinesis Data Streams]

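10M drivers pinging every 5 s produce 2M writes/s. Driver apps hold persistent TLS connections to an NLB-fronted Fargate connection fleet, which batches 50-100 fixes per PutRecords call into Kinesis. Compared with one API Gateway POST per fix, this cuts upstream requests by 50-100x, and it cuts PUT payload units by a similar factor when the fixes in a batch are aggregated into shared records.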
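The throughput arithmetic and the effect of batching can be checked in a few lines. This is a minimal sketch: `writes_per_second`, `requests_per_second`, and `batch_fixes` are illustrative names, not part of any AWS SDK, and the batch sizes are simply the 50-100 range quoted above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def writes_per_second(drivers: int, ping_interval_s: float) -> float:
    """Steady-state write rate when every driver pings once per interval."""
    return drivers / ping_interval_s


def requests_per_second(write_rate: float, batch_size: int) -> float:
    """Upstream request rate once fixes are grouped batch_size at a time."""
    return write_rate / batch_size


def batch_fixes(fixes: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size fixes (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(fixes), batch_size):
        yield list(fixes[start:start + batch_size])


rate = writes_per_second(10_000_000, 5)  # 2,000,000 fixes/s
for size in (50, 100):
    print(size, requests_per_second(rate, size))
```

A real connection fleet would also flush on a timer so that a partially filled batch does not delay fixes; that is omitted here.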

Fan out to the live index

flowchart LR
    Drivers([10M Drivers]) --> NLB[NLB] --> FF[Fargate Fleet]
    FF --> KDS[Kinesis]
    KDS --> LC[Lambda Consumer Fleet]
    LC -->|GEOADD TTL 30s| RD[(ElastiCache Redis Geo Index)]

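A Lambda consumer fleet drains Kinesis in micro-batches and writes each fix to ElastiCache for Redis with GEOADD, and every key carries a 30-second TTL. Because GEOADD takes no TTL argument of its own, the expiry is set on the key with a separate EXPIRE. The TTL is the entire anti-ghost mechanism: a driver whose phone dies simply ages out of the index.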
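The aging-out behaviour can be illustrated without a Redis server. The sketch below is an in-memory stand-in, not the Redis data model: `TtlLocationIndex`, `upsert`, and `live_drivers` are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List

TTL_SECONDS = 30.0  # the 30-second TTL described above


@dataclass
class Fix:
    lon: float
    lat: float
    expires_at: float


class TtlLocationIndex:
    """In-memory stand-in for the live index; models expiry only, not geo search."""

    def __init__(self, ttl_s: float = TTL_SECONDS) -> None:
        self.ttl_s = ttl_s
        self._fixes: Dict[str, Fix] = {}

    def upsert(self, driver_id: str, lon: float, lat: float, now: float) -> None:
        # Every ping rewrites the entry and pushes its expiry out by one TTL.
        self._fixes[driver_id] = Fix(lon, lat, now + self.ttl_s)

    def live_drivers(self, now: float) -> List[str]:
        # Entries past their expiry are treated as gone, as a TTL'd key would be.
        return sorted(d for d, f in self._fixes.items() if f.expires_at > now)


index = TtlLocationIndex()
index.upsert("driver-a", -73.9855, 40.7580, now=0.0)
index.upsert("driver-b", -73.9900, 40.7500, now=0.0)
index.upsert("driver-a", -73.9850, 40.7585, now=5.0)  # driver-a keeps pinging
print(index.live_drivers(now=31.0))                     # driver-b has aged out
```

No explicit "driver offline" message is needed: silence for longer than one TTL is enough to remove a driver from results.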

Shard by spatial cell, not by city

flowchart LR
    Drivers([10M Drivers]) --> NLB --> FF[Fargate]
    FF --> KDS[Kinesis]
    KDS --> LC[Lambda Fleet]
    LC -->|GEOADD key=S2-cell-driverID| RD[(ElastiCache Redis Cluster - sharded by S2 cell)]

The Redis Cluster hash-tags location keys by S2 level-13 cell (~1 km²), so a dense downtown spreads across dozens of shards instead of hammering a single NYC node. This is the fix for the hot-shard failure that city-level sharding tends to produce.
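To see how a hash tag pins every key for one cell to the same slot, the slot calculation can be reproduced locally. This is a sketch under assumptions: the `loc:{cell}:driver` key layout and the cell tokens are placeholders invented for illustration, not the author's actual scheme. The slot rule itself (CRC16/XMODEM of the tag, modulo 16384) follows the Redis Cluster specification.

```python
import binascii

SLOTS = 16384  # number of hash slots in a Redis Cluster


def location_key(cell_token: str, driver_id: str) -> str:
    """Hypothetical key layout: the cell token sits inside the {...} hash tag."""
    return f"loc:{{{cell_token}}}:{driver_id}"


def hash_slot(key: str) -> int:
    """CRC16 (XMODEM) of the hash tag if one is present, else of the whole key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end > start + 1:  # an empty "{}" tag is ignored
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOTS


# Placeholder cell tokens, not real S2 tokens.
same_cell_a = location_key("cell-0001", "driver-1")
same_cell_b = location_key("cell-0001", "driver-2")
print(same_cell_a, hash_slot(same_cell_a))
print(hash_slot(same_cell_a) == hash_slot(same_cell_b))  # same tag, same slot
```

Different cells usually land in different slots, but nothing guarantees it: two tags can still hash to the same slot, or to slots owned by the same node.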

The rider query path — deliberately separate

flowchart LR
    subgraph WP[Write Path - 2M writes/s]
        Drivers([10M Drivers]) --> NLB --> FF[Fargate] --> KDS[Kinesis]
        KDS --> LC[Lambda] -->|GEOADD| RD[(ElastiCache Redis)]
    end
    subgraph RP[Read Path]
        Rider([Rider]) --> ALB[ALB] --> QS[Query Service ECS]
    end

A rider's find-nearby-drivers request comes in through an ALB to an ECS-hosted query service. The read path is deliberately separate from the write path so a query storm and a write firehose never contend for the same compute.

Scatter-gather GEOSEARCH across S2 cells

flowchart LR
    subgraph WP[Write Path]
        Drivers([10M Drivers]) --> NLB --> FF[Fargate] --> KDS[Kinesis]
        KDS --> LC[Lambda] -->|GEOADD| RD[(ElastiCache Redis)]
    end
    subgraph RP[Read Path]
        Rider([Rider]) --> ALB --> QS[Query Service]
        QS -->|parallel GEOSEARCH per S2 cell| RD
        QS --> RS([Top-N under 50ms p99])
    end

A 5 km circle covers 20-80 S2 cells spread across many shards, and GEOSEARCH reads a single key, so one call cannot cross slots. The query service computes the cell covering, fires one GEOSEARCH per cell in parallel, merges the results and re-sorts them by true distance, and returns the top-N under 50 ms p99.
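The merge step is the part most easily gotten wrong, so here is a minimal scatter-gather sketch. It assumes the cell covering has already been computed and that `search_cell` is a caller-supplied function standing in for one GEOSEARCH against one cell. `nearest_drivers`, `search_cell`, and the sample data are all hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Hit = Tuple[str, float, float]            # (driver_id, lat, lon)
CellSearch = Callable[[str], List[Hit]]   # stand-in for one per-cell GEOSEARCH

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_drivers(rider_lat: float, rider_lon: float, cells: Sequence[str],
                    search_cell: CellSearch, top_n: int) -> List[Tuple[str, float]]:
    """Query every cell in parallel, dedupe by driver, return the top_n closest."""
    workers = max(1, min(32, len(cells)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_cell = list(pool.map(search_cell, cells))   # scatter
    best: Dict[str, float] = {}
    for hits in per_cell:                               # gather
        for driver_id, lat, lon in hits:
            d = haversine_m(rider_lat, rider_lon, lat, lon)
            if d < best.get(driver_id, math.inf):
                best[driver_id] = d
    return heapq.nsmallest(top_n, best.items(), key=lambda kv: kv[1])


# Fake per-cell results; d1 appears in two cells, as can happen near a boundary.
FAKE = {
    "cellA": [("d2", 40.7680, -73.9855), ("d1", 40.7590, -73.9855)],
    "cellB": [("d3", 40.7780, -73.9855), ("d1", 40.7590, -73.9855)],
}
result = nearest_drivers(40.7580, -73.9855, ["cellA", "cellB"], lambda c: FAKE[c], top_n=2)
print([driver for driver, _ in result])
```

Re-sorting by computed distance matters because per-cell results are each ordered only within their own cell; concatenating them does not give a globally sorted list.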

Persist for history and cross-region

Raw fixes stream to S3 via Kinesis Data Firehose (Parquet/Snappy, ~$400/month, queried with Athena) — not DynamoDB, which at 2M writes/s would cost ~$3.2M/month. DynamoDB keeps only by-id driver state, replicated cross-region by Global Tables.
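The DynamoDB figure can be sanity-checked with back-of-the-envelope arithmetic. The per-write price below is an assumption chosen for illustration, not a number from this deck. It also assumes each fix fits in one write request unit (items up to 1 KB) and a 30-day month. Check current pricing before relying on it.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month


def monthly_writes(writes_per_second: int) -> int:
    """Total writes in a 30-day month at a constant write rate."""
    return writes_per_second * SECONDS_PER_MONTH


def on_demand_write_cost(writes: int, usd_per_million_writes: float) -> float:
    """Cost if every fix consumes exactly one write request unit."""
    return writes / 1_000_000 * usd_per_million_writes


# Assumption, not from the source: on-demand price per million write request units.
ASSUMED_USD_PER_MILLION_WRITES = 0.625

writes = monthly_writes(2_000_000)
cost = on_demand_write_cost(writes, ASSUMED_USD_PER_MILLION_WRITES)
print(f"{writes:,} writes/month")
print(f"${cost:,.0f}/month")
```

Under these assumptions the result lands near the ~$3.2M/month quoted above. The point is the order of magnitude, which is why per-fix history does not belong in a per-write-priced store.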

Tap the stream for surge pricing

An analytics Lambda reads the same Kinesis stream as a second consumer and aggregates driver density into per-S2-cell DynamoDB counters, which the pricing service reads. A supply-cliff breaker freezes the multiplier at its last-known-good value when a cell's driver count craters.
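One way to read the breaker is as a small piece of per-cell state. The sketch below is one possible interpretation, not the author's implementation: the class name, the 50% drop threshold, and the choice to keep the pre-cliff count as the baseline while frozen are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SupplyCliffBreaker:
    # Hypothetical threshold: trip when supply falls by more than half in one update.
    max_drop_fraction: float = 0.5
    _last_count: Dict[str, int] = field(default_factory=dict)
    _last_good: Dict[str, float] = field(default_factory=dict)

    def multiplier(self, cell: str, driver_count: int, proposed: float) -> float:
        """Return the multiplier to publish for this cell."""
        prev = self._last_count.get(cell)
        cliff = (
            prev is not None
            and prev > 0
            and driver_count < prev * (1 - self.max_drop_fraction)
        )
        if cliff and cell in self._last_good:
            # Frozen: publish the last-known-good value and keep the old baseline,
            # so the breaker stays tripped until the count recovers.
            return self._last_good[cell]
        self._last_count[cell] = driver_count
        self._last_good[cell] = proposed
        return proposed


breaker = SupplyCliffBreaker()
history = [
    breaker.multiplier("cell-1", 40, 1.2),  # normal
    breaker.multiplier("cell-1", 38, 1.3),  # normal
    breaker.multiplier("cell-1", 3, 4.0),   # count craters: frozen at last good
    breaker.multiplier("cell-1", 30, 1.5),  # supply recovers: updates resume
]
print(history)
```

The design intent is that a sudden drop in reported drivers is more likely an ingestion problem than a real supply collapse, so the breaker declines to turn it into a price spike.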