Patterns from this design

Push notification fan-out at scale

notification-fanout

When: An event must reach every follower, but follower counts span about five orders of magnitude: most users have hundreds, a celebrity has tens of millions. A single fan-out strategy either detonates on write (celebrity) or wastes reads (everyone else).
AWS: Kinesis Data Streams (partition by source user) feeds fan-out Lambdas. Below the ~10k-follower threshold, write one notification per follower into a DynamoDB inbox (materialised, cheap reads). Above it, write one celebrity-event pointer and merge it into followers' streams at read time. The threshold is config, tuned from cost/latency telemetry.
Trade-off: Celebrity-follower inbox reads become more expensive and complex — every read must fetch and time-merge celebrity pointers. You also accept a small consistency window where a celebrity post appears in followers' streams milliseconds apart rather than atomically.
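
The threshold split and the read-time merge can be sketched in a few lines of Python. This is a minimal illustration, not the design's actual code: the names fanout_strategy and merged_feed, the constant FANOUT_THRESHOLD, and the (id, payload) tuple shape are all assumptions, and the behaviour at exactly the threshold is an arbitrary choice.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical default; the text treats the threshold as tunable config.
FANOUT_THRESHOLD = 10_000

Item = Tuple[str, str]  # (time-sortable id such as a ULID, payload)


def fanout_strategy(follower_count: int, threshold: int = FANOUT_THRESHOLD) -> str:
    """Fan out on write below the threshold, on read at or above it."""
    return "write" if follower_count < threshold else "read"


def merged_feed(inbox: Iterable[Item], celebrity_events: Iterable[Item],
                limit: int) -> List[Item]:
    """Merge two newest-first streams into one newest-first page.

    Both inputs must already be sorted by id, descending.
    """
    merged = heapq.merge(inbox, celebrity_events,
                         key=lambda item: item[0], reverse=True)
    return list(islice(merged, limit))
```

Because heapq.merge is lazy, a page of size limit only pulls as many items from each source as it needs, which is where the extra read cost described in the trade-off comes from.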

id-generation

When: You need a globally unique ID for high-volume time-series rows that are read newest-first, without a central coordinator, and you want pagination for free.
AWS: Generate a ULID (48-bit ms timestamp + 80-bit random) in the worker and store it as the DynamoDB sort key under a user partition key. Lexical order equals time order, so newest-first is a descending range query and the sort key doubles as the pagination cursor.
Trade-off: Ordering is only monotonic to the millisecond: two IDs minted in the same ms on different workers have arbitrary relative order, and cross-worker order is only as good as clock sync. You give up the tighter ordering of a Snowflake-style scheme (per-worker sequence numbers under coordinated machine IDs) for zero coordination.
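
As a rough illustration of why lexical order equals time order, here is a minimal ULID encoder using only the standard library. The function name new_ulid and its optional timestamp_ms parameter are assumptions for the sketch; a production worker would more likely use a maintained ULID library.

```python
import os
import time
from typing import Optional

# Crockford base32: digits then letters, skipping I, L, O and U.
# The alphabet is in ascending ASCII order, so string order matches numeric order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-char ULID: 48-bit ms timestamp followed by 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    value = (timestamp_ms << 80) | randomness
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits; the top 2 bits are zero padding
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

The first 10 characters carry the timestamp, so two IDs from different milliseconds always compare in time order, while two IDs from the same millisecond compare by their random tails.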

messaging

When: Delivery targets (APNs, FCM, email, SMS) have independent rate limits and failure modes, and a burst (10M jobs from one celebrity event) can exceed by orders of magnitude the sustainable throughput of any single provider, or of SQS FIFO at roughly 3,000 messages/s with batching (per queue, or per partition in high-throughput mode).
AWS: One SQS Standard queue per channel (mobile-push, email, in-app, SMS), each carrying a mandatory tenant_id message attribute and drained by a channel-specific consumer Lambda at a controlled rate. Standard is chosen over FIFO because the dedup store (not the queue) enforces idempotency and the ULID (not arrival order) carries display order, so FIFO's guarantees buy nothing here and its throughput ceiling is pure cost. Cap Lambda reserved concurrency as the back-pressure valve. After 3 failed receives a message leaves the source queue: transient failures land in the channel DLQ and are later redriven to source, while malformed payloads route to a separate poison DLQ. A CloudWatch alarm fires on DLQ depth over 100.
Trade-off: SQS Standard is at-least-once (and unlike FIFO offers no ordering or built-in dedup), so a separate 30-day idempotency layer is mandatory. Per-channel queues also multiply operational surface (more queues, more DLQs, more alarms).
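
One way to read the failure-handling rule above is as a small routing decision inside the consumer. The sketch below is an interpretation, not the design's code: Action, MalformedPayload, route_failure and MAX_RECEIVES are hypothetical names, and it assumes the consumer can tell a malformed payload from a transient error by exception type.

```python
from enum import Enum

MAX_RECEIVES = 3  # mirrors the "3 failed receives" rule in the text


class Action(Enum):
    RETRY = "leave on the source queue for another receive"
    DLQ = "park in the channel DLQ for a later redrive"
    POISON = "send to the poison DLQ"


class MalformedPayload(Exception):
    """Hypothetical error raised when a message cannot be parsed or validated."""


def route_failure(error: Exception, receive_count: int) -> Action:
    """Decide what happens to a message whose delivery attempt failed."""
    if isinstance(error, MalformedPayload):
        return Action.POISON  # retrying cannot fix a bad payload
    if receive_count >= MAX_RECEIVES:
        return Action.DLQ     # transient, but out of receive attempts
    return Action.RETRY
```

Keeping poison messages out of the redrivable DLQ means a redrive never replays payloads that are guaranteed to fail again.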

notification-fanout

When: An at-least-once delivery pipeline must not double-notify users, the dedup window must outlast every downstream provider's retry horizon, and the dedup state must survive a failover rather than resetting empty and re-delivering everything in flight.
AWS: Before each send, the consumer Lambda runs SET NX on Amazon MemoryDB for Redis with key sha256(tenant_id + notification_ulid + recipient_id) and a 30-day TTL. MemoryDB's durable multi-AZ transaction log survives failover without losing dedup state. The atomic NX guarantees one consumer wins the race; including tenant_id in the hashed key prevents cross-tenant suppression. On MemoryDB unavailability the consumer fails closed (the message stays in the queue and is retried with backoff, never sent blind); consent-bearing channels fall back to a durable DynamoDB conditional PutItem (attribute_not_exists, 30-day TTL).
Trade-off: The dedup keyspace is large and stateful (billions of 30-day keys run to hundreds of GB), and MemoryDB's durability carries a price premium over plain ElastiCache. The honest guarantee is effectively-once, not exactly-once: residual duplicates remain possible on an un-flushed write lost in failover or a DLQ redrive after the 30-day TTL expires.
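
The claim-before-send step might look like the following sketch. It assumes a client whose set method accepts nx and ex keyword arguments and returns a falsy value when the key already exists, as redis-py's Redis.set does. The names dedup_key, claim_send and InMemoryStore are hypothetical, and the unit-separator delimiter between fields is an addition of this sketch (the text describes plain concatenation) so that ("ab", "c") and ("a", "bc") cannot hash to the same key.

```python
import hashlib

DEDUP_TTL_SECONDS = 30 * 24 * 60 * 60  # the 30-day window from the text


def dedup_key(tenant_id: str, notification_ulid: str, recipient_id: str) -> str:
    """Hash the three fields, delimited so field boundaries stay unambiguous."""
    material = "\x1f".join((tenant_id, notification_ulid, recipient_id))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def claim_send(store, tenant_id: str, notification_ulid: str,
               recipient_id: str) -> bool:
    """Return True only for the first caller to claim this send."""
    key = dedup_key(tenant_id, notification_ulid, recipient_id)
    # SET key 1 NX EX <ttl>: atomic, so exactly one racing consumer gets True.
    return bool(store.set(key, "1", nx=True, ex=DEDUP_TTL_SECONDS))


class InMemoryStore:
    """Stand-in for the real client in tests; it ignores expiry."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True
```

A consumer would send only when claim_send returns True, and would let any connection error propagate so the message stays on the queue, matching the fail-closed behaviour described above.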

notification-fanout

When: You must deliver to APNs/FCM at burst scale, honour per-provider rate limits and warm connection pools, reap stale/invalid tokens, and schedule around quiet hours — without owning a stateful, throttle-sensitive client fleet.
AWS: SNS mobile push owns platform endpoints, the APNs/FCM feedback loop, and auto-deregistration of invalid tokens. Amazon Pinpoint layers quiet-hours and per-user timezone scheduling on top; Amazon SES handles email with sending-rate control. The fan-out tier never calls a provider directly.
Trade-off: You lose fine-grained control over connection-pool tuning and provider-specific behaviour, and you inherit SNS/Pinpoint quotas and abstractions. FCM's 600k msg/min per-project cap is handled by sharding across K SNS platform applications (one per FCM sender project), with the consumer routing on recipient_user_id mod K to keep a device sticky to one project.
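
The sticky routing rule can be sketched as below. The function name fcm_shard is hypothetical, and the sketch assumes recipient_user_id is a string, so it is hashed before the modulo. Python's built-in hash() is deliberately avoided because it is randomised per process for strings and would break stickiness across Lambda invocations.

```python
import hashlib


def fcm_shard(recipient_user_id: str, shard_count: int) -> int:
    """Map a user to one of K platform applications, stably across processes."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(recipient_user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Note that changing K remaps most users to a different shard, so the shard count is effectively fixed once devices have registered against a project.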

caching

When: Hot per-user preferences (quiet hours, opt-outs, consent) are read on every delivery and cannot be fetched from the database each time, but stale preferences cause consent violations and timezone bugs.
AWS: Write-through ElastiCache in front of a DynamoDB preferences table with per-user IANA timezone and push_consent. Change detection is source-of-truth-driven: DynamoDB Streams on the preferences table feeds an EventBridge Pipe to the cache-invalidation Lambda — no dedicated Kinesis stream, since pref changes are low-volume and the table already emits change records. A 5-minute TTL is only a backstop for missed records. Every push send checks push_consent (default-deny when absent, GDPR Art. 7); quiet-hours and consent both fail closed when the preference cannot be resolved.
Trade-off: Worst-case staleness equals the TTL if a stream record is dropped before the Pipe delivers. Synchronous source-of-truth re-checks for consent withdrawal add one read per consent-bearing delivery, and default-deny means a missing consent record suppresses sends until it is written.
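
The default-deny and fail-closed rules reduce to a small gate in front of every push send. The sketch below is illustrative only: may_send_push, in_quiet_hours and the quiet_start / quiet_end field names are assumptions, and converting the current instant to the user's local time from the stored IANA timezone is left to the caller.

```python
from datetime import time
from typing import Mapping, Optional


def in_quiet_hours(local_now: time, start: time, end: time) -> bool:
    """True if local_now falls in [start, end), including windows past midnight."""
    if start <= end:
        return start <= local_now < end
    return local_now >= start or local_now < end


def may_send_push(prefs: Optional[Mapping[str, object]],
                  local_now: Optional[time]) -> bool:
    """Allow a send only when consent is explicit and quiet hours are resolved."""
    if prefs is None or local_now is None:
        return False  # preference or local time unresolved: fail closed
    if prefs.get("push_consent") is not True:
        return False  # absent or falsy consent: default-deny
    start, end = prefs.get("quiet_start"), prefs.get("quiet_end")
    if start is None and end is None:
        return True   # assumption: no quiet hours configured means none apply
    if not isinstance(start, time) or not isinstance(end, time):
        return False  # half-configured or malformed window: fail closed
    return not in_quiet_hours(local_now, start, end)
```

For a user with quiet hours from 22:00 to 07:00, a send at 23:30 local time is suppressed and one at 12:00 goes through, provided push_consent is present and true.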