Patterns from this design

Geospatial proximity search at scale

geo

Index points in a geo set and answer radius queries directly

When: You need 'all entities within R of a point' under tight latency and you don't want to roll your own spatial index, scan-and-filter the whole table, or stand up a search cluster for what is fundamentally a sorted-set lookup. Reach here for the live 'now' index, not for cold analytics.
AWS: Store points with GEOADD on ElastiCache for Redis and answer with GEOSEARCH ... BYRADIUS (the successor to GEORADIUS). Redis encodes lon/lat as a 52-bit geohash inside a sorted set, so a single-cell radius query is range scans over the set — sub-millisecond and built in. Rejected alternatives: Amazon Location Service Trackers (no radius-query API, no per-asset TTL, no p99 SLA at high write rates) and Amazon OpenSearch (segment-refresh lag fights sub-10 s freshness, no sub-second per-document TTL, segment merges wreck p99 under heavy writes).
Trade-off: GEOSEARCH returns everything in a circle/box but can't express arbitrary polygons or 'inside this delivery zone' — anything beyond a radius needs a point-in-polygon post-filter in your service, and the set holds only point geometry, not the rich attributes you must join back from another store. GEOSEARCH is also a single-key command: a query whose radius spans many sharded cells needs the scatter-gather pattern, not one call.

geo

Scatter-gather across sharded geo cells, merge in-process

When: You shard a geo index by fixed-area cell (to spread dense-city write/read load) but a query radius covers many cells across many shards. A single-key radius command can't cross shard slots, so the 'one round trip' latency story is false the moment the radius exceeds one cell.
AWS: In the query service, compute the cell covering of the query circle (S2/H3), fire one GEOSEARCH per covering cell in parallel — pipelined per shard node — then merge and re-sort the candidate sets by true distance and apply attribute filters in-process. Co-locate any per-cell attribute lookup (e.g. an availability HMGET) on the same shard via a matching hash tag so each shard answers in one pipelined round trip.
Trade-off: A single query becomes dozens of parallel commands (read amplification), and p99 is bounded by the slowest shard in the fan-out, not the average — so the cell size must be chosen against the radius to keep the covering in the tens, not hundreds. Coarsen the shard cell if the covering blows up, trading write-load distribution for a tighter fan-out.

geo

Always query the cell and its 8 neighbors

When: You shard or index by geohash/grid cell and answer proximity with a prefix or single-cell lookup — two entities 10 m apart but across a cell boundary get different prefixes, so a single-cell query silently drops half of them right where density matters most.
AWS: Expand the target cell to its 3x3 (N+8) neighbor set before querying, or let Redis GEOSEARCH do the expansion for you — it computes the covering neighbor cells internally so the boundary bug never reaches your code.
Trade-off: Correctness costs you up to a 9x read amplification per query (nine cell ranges instead of one), and at fine resolutions the radius may still spill past the first neighbor ring, so the cell size must be chosen against the query radius.

storage

Shard by spatial cell, not by region name

When: Geographically clustered load (downtown SF, downtown NYC at rush hour) concentrates every write and read for a city onto one shard when you shard by city/region — Lyft hit exactly this ceiling at ~100k ops/s per region shard.
AWS: Use a fixed-area spatial cell (S2 level 5 ≈ 1 km², or an H3 res) as the shard key and hash-tag the Redis key by that cell, so the ElastiCache Cluster's CRC16 slotting spreads a dense city across dozens of shards while keeping nearby data co-located for radius queries.
Trade-off: Cell population is not uniform — a stadium at game time still hot-spots its own cell — and changing the cell-to-shard mapping is a reshard, so you must pick a resolution and online-reshard during low-traffic windows rather than re-key on the fly.

storage

Let ephemeral presence expire instead of deleting it

When: You track 'who is here right now' (online drivers, live cursors, active sessions) and a client that vanishes without a goodbye — a dead phone, a dropped socket — would otherwise linger in the index forever and poison results.
AWS: Write each presence record to ElastiCache with a short TTL (30 s) refreshed on every heartbeat; absence of a refresh expires the record automatically, so you never need a separate reaper job or an explicit 'I'm leaving' message you might never get.
Trade-off: A network partition that drops heartbeats expires healthy clients into false absence, and the TTL is a direct freshness/load knob — shorter means fewer ghosts but more write pressure to keep everyone alive.

ingestion

Batch at an aggregation tier, buffer through a stream, fan out from a consumer

When: A huge, spiky fleet of writers (millions of mobile devices) would overwhelm a stateful store that can't hold millions of connections or absorb sudden bursts, and you need durable replay if a downstream consumer falls over. Watch the per-request edge cost: a managed API at millions of req/s and one-record-per-call stream PUTs can each be a seven- to eight-figure monthly bill before any compute runs.
AWS: Front the stream with an NLB + stateless connection-server fleet (ECS Fargate) that holds the persistent device connections and batches 50-100 records per Kinesis PutRecords call — cutting PUT units and per-request charges ~100x versus a direct API Gateway POST. Land writes in Kinesis Data Streams (1 MB/s per shard) and run a Lambda consumer fleet (enhanced fan-out for the latency-critical consumer, standard polling for lag-tolerant ones; explicit BatchSize and reserved concurrency) that drains micro-batches and fans out to ElastiCache and an S3 firehose. Report per-sink batch-item failures independently and buffer failed cache writes to a short-TTL SQS retry queue so one sink's failover never stalls the shard iterator.
Trade-off: You add up to a couple hundred ms of buffering latency to the write path and inherit shard-count capacity planning (and resharding), in exchange for protecting the store from connection storms, collapsing edge cost by two orders of magnitude, and getting durable replay for free. Stream replay is at-least-once, so every sink must be idempotent or dedup by per-shard sequence number.