Architecture

Push notification fan-out at scale

Two-tier fan-out from a viral event to a ULID-keyed inbox to effectively-once APNs/FCM delivery, built entirely on AWS-managed services.

01

Event ingestion

flowchart LR User([App user action]) --> API[Event API] API --> Kinesis[(Kinesis Data Streams\npartition by source user)]

Author actions (post, like, comment) land on a durable ordered log. Kinesis is partitioned by source user ID so each originator's events stay in order and load spreads by author.

02

Two-tier fan-out tier

flowchart LR User([App user action]) --> API[Event API] API --> Kinesis[(Kinesis Data Streams\npartition by source user)] Kinesis --> Worker[Fan-out Lambda] Worker --> Redis[(ElastiCache\nhot follower lists)] Worker --> Followers[(DynamoDB\nfull follower lists)] Worker -->|over 10k followers| Pointer[Celebrity event pointer] Worker -->|under 10k followers| FanWrite[Write to follower inboxes]

Lambda workers triggered by Kinesis read the follower list. Regular accounts fan out on write to follower inboxes; accounts over 10k followers store one celebrity-event pointer for read-time merge.

03

ULID-keyed notification inbox

flowchart LR Kinesis[(Kinesis Data Streams)] --> Worker[Fan-out Lambda] Worker --> Redis[(ElastiCache\nfollower lists)] Worker --> Inbox[(DynamoDB inbox\ntenant-user PK plus ULID SK)] Inbox --> GSI[(GSI tenant-user and read_status\nunread count)] Inbox --> Global[(Global Tables\ncross-region)]

Fan-out writes notification records into a DynamoDB inbox keyed by tenant_id#user_id plus a ULID sort key. Time-ordered keys give newest-first reads, cursor pagination, a 90-day TTL, and an unread-count GSI. The GSI doubles write cost and Global Tables replicate cross-region.

04

Delivery queues and channel workers

flowchart TD Worker[Fan-out Lambda] --> Inbox[(DynamoDB inbox\nULID SK)] Worker --> Push[SQS Standard mobile-push] Worker --> Email[SQS Standard email] Worker --> InApp[SQS Standard in-app] Worker --> SMS[SQS Standard sms] Push --> PushW[Push consumer Lambda] Email --> EmailW[Email consumer Lambda] Push --> DLQ[(Dead-letter queue\n3 retries)]

After writing the inbox row the worker enqueues a delivery job (with a tenant_id attribute) onto a per-channel SQS Standard queue. Channel-specific consumer Lambdas drain each queue at a controlled rate, with a dead-letter queue after three retries.

05

Full system - delivery, dedup, preferences, audit

flowchart TD User([App user action]) --> API[Event API] subgraph Ingest[Ingest] API --> Kinesis[(Kinesis Data Streams)] end subgraph Fanout[Fan-out tier] Kinesis --> Worker[Fan-out Lambda] Worker --> Redis[(ElastiCache\nfollower lists)] end subgraph Data[Data tier] Worker --> Inbox[(DynamoDB inbox\nULID SK)] Prefs[(DynamoDB preferences\nplus timezone)] end subgraph Delivery[Delivery tier] Worker --> Push[SQS Standard mobile-push] Worker --> EmailQ[SQS Standard email] Push --> PushW[Push consumer Lambda] EmailQ --> EmailW[Email consumer Lambda] PushW --> Dedup[(MemoryDB dedup\nSET NX 30-day TTL)] PushW --> Pinpoint[Consent plus Pinpoint quiet hours] Pinpoint --> SNS[SNS mobile push] SNS --> APNs([APNs and FCM]) EmailW --> SES[Amazon SES] Push --> DLQ[(Dead-letter queue)] end Prefs --> DDBStream[(DynamoDB Streams)] DDBStream --> Pipe[EventBridge Pipe] Pipe --> Invalidate[Cache invalidation Lambda] Invalidate --> PrefCache[(ElastiCache pref cache)] Prefs --> PrefCache PrefCache --> PushW PushW --> Audit[(Firehose to S3\nObject Lock audit)]

The complete picture: effectively-once dedup via MemoryDB SET NX (fail-closed), consent and quiet-hours checks before Pinpoint, transport via SNS mobile push to APNs/FCM and SES for email, preference cache invalidated by DynamoDB Streams plus an EventBridge Pipe, and a Firehose to S3 Object Lock audit trail.