Patterns from this design

Distributed payment ledger with idempotent settlement

payments-exactly-once

When: A client may retry a payment request after a lost response, a timeout, or a load-balancer hiccup, and a second execution would move money twice.
AWS: The client mints a UUID idempotency key before the first attempt and resends it on every retry. The server claims the key at PENDING with a conditional PutItem using attribute_not_exists(pk) on a DynamoDB table (key sharded tenant#id#shard#mod_N to avoid a hot partition), then drives it PENDING to PROCESSING to COMPLETE. A leaseExpiry on the PROCESSING row lets a crashed worker's key self-heal in ~30 s instead of staying orphaned until the TTL expires. On a ConditionalCheckFailedException the server reads the winner's stored response with ConsistentRead set to true and returns it verbatim.
Trade-off: Every payment pays for one strongly consistent DynamoDB write plus a consistent read on the retry path, and you must fail closed (reject) when the gate is unavailable. The gate is single-region and strongly consistent on purpose, since DynamoDB Global Tables (last-writer-wins, eventually consistent) would reopen the replica-lag double-charge hole. The gate's availability becomes a hard dependency of accepting any payment.
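
The claim, lease, and replay flow can be sketched in plain Python. This is a minimal in-memory simulation, not the DynamoDB implementation: InMemoryGate, InProgress, settle, and the rule that an expired PROCESSING lease may be taken over are all assumptions made for illustration. In a real service, claim() would be the conditional PutItem and read() a GetItem with ConsistentRead set to true.

```python
import time
import uuid


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class InProgress(Exception):
    """Hypothetical signal: another worker holds a live lease on this key."""


class InMemoryGate:
    """Hypothetical in-memory stand-in for the DynamoDB idempotency table."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.rows = {}
        self.lease_seconds = lease_seconds
        self.clock = clock

    def claim(self, key):
        # Mimics PutItem with attribute_not_exists(pk): only the first writer wins.
        if key in self.rows:
            raise ConditionalCheckFailed(key)
        self.rows[key] = {"status": "PENDING", "response": None, "lease_expiry": None}

    def start_processing(self, key):
        self.rows[key].update(status="PROCESSING",
                              lease_expiry=self.clock() + self.lease_seconds)

    def take_over_if_lease_expired(self, key):
        # Assumed recovery rule: an expired PROCESSING lease may be re-acquired.
        row = self.rows[key]
        if row["status"] == "PROCESSING" and row["lease_expiry"] <= self.clock():
            self.start_processing(key)
            return True
        return False

    def complete(self, key, response):
        self.rows[key].update(status="COMPLETE", response=response, lease_expiry=None)

    def read(self, key):
        # Stands in for GetItem with ConsistentRead=True.
        return dict(self.rows[key])


def settle(gate, key, charge):
    """Claim the key, run charge(), store the response; retries replay it."""
    try:
        gate.claim(key)
    except ConditionalCheckFailed:
        row = gate.read(key)
        if row["status"] == "COMPLETE":
            return row["response"]  # replay the winner's response verbatim
        if not gate.take_over_if_lease_expired(key):
            raise InProgress(key)
    else:
        gate.start_processing(key)
    response = charge()
    gate.complete(key, response)
    return response


gate = InMemoryGate()
key = str(uuid.uuid4())  # minted once by the client, resent on every retry
first = settle(gate, key, lambda: {"status": "charged", "amount_cents": 1200})
retry = settle(gate, key, lambda: {"status": "charged", "amount_cents": 9999})
# retry replays the stored first response; the second lambda never runs
```

The injectable clock is only there so lease expiry can be exercised without sleeping.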

payments-exactly-once

When: You need an auditable, regulator-grade record of fund flows where balances are derivable and never silently corrupted by a partial write.
AWS: Write exactly one DEBIT and one CREDIT row per transaction in a single Aurora PostgreSQL ACID transaction, with a UNIQUE constraint on idempotency_key as a second line of defense behind the DynamoDB gate; never UPDATE or DELETE a row, only append reversing entries.
Trade-off: The ledger grows monotonically forever (7-year retention, no compaction) and corrections cost two extra reversing rows instead of an edit, in exchange for an immutable audit trail and a database-enforced no-duplicate guarantee that survives application bugs.
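
A minimal sketch of the append-only double-entry write, using SQLite from the Python standard library as a stand-in for Aurora PostgreSQL. The table and column names, the triggers that reject UPDATE and DELETE, and the choice to scope uniqueness to (idempotency_key, direction) so the DEBIT and CREDIT rows can share one key are all assumptions for illustration, not the schema from this design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ledger_entries (
    id INTEGER PRIMARY KEY,
    idempotency_key TEXT NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT')),
    account TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    UNIQUE (idempotency_key, direction)  -- assumed shape of the uniqueness rule
);
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger_entries
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger_entries
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
"""

INSERT = ("INSERT INTO ledger_entries (idempotency_key, direction, account, amount_cents) "
          "VALUES (?, ?, ?, ?)")


def open_ledger():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def post_transfer(conn, key, src, dst, amount_cents):
    """Append one DEBIT and one CREDIT atomically; return False on a duplicate key."""
    try:
        with conn:  # commits both rows together, or rolls both back
            conn.execute(INSERT, (key, "DEBIT", src, amount_cents))
            conn.execute(INSERT, (key, "CREDIT", dst, amount_cents))
    except sqlite3.IntegrityError:
        return False  # the database, not the application, rejected the duplicate
    return True


def balance(conn, account):
    """Derive a balance from the entries; nothing stores it directly."""
    row = conn.execute(
        "SELECT COALESCE(SUM(CASE direction WHEN 'CREDIT' THEN amount_cents "
        "ELSE -amount_cents END), 0) FROM ledger_entries WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]


conn = open_ledger()
post_transfer(conn, "pay-1", "alice", "bob", 500)
post_transfer(conn, "pay-1", "alice", "bob", 500)          # duplicate, rejected
post_transfer(conn, "pay-1:reverse", "bob", "alice", 500)  # correction by reversal
```

The correction is two new rows under a hypothetical "pay-1:reverse" key, so the original entries stay untouched and both balances return to zero.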

messaging

When: Downstream services must learn about a committed ledger write, but a publish-after-commit can lose the event on a crash and a publish-before-commit can emit a phantom event on rollback.
AWS: Insert the event row into an outbox table in the same Aurora transaction as the ledger entries. A single Lambda (reserved concurrency 1) polls WHERE published_at IS NULL ... FOR UPDATE SKIP LOCKED every 100 ms with a tunable batch size, publishes to SQS with exponential backoff, and stamps published_at. A publish_attempts counter routes poison rows to a DLQ and advances the head so one bad row cannot block the stream. Consumers dedupe on the idempotency key in their own inbox.
Trade-off: Delivery is at-least-once (consumers must be idempotent) and events lag the commit by up to a poll interval plus cold start. The Lambda poller is defensible to ~1,000 events/s; above that you graduate to DMS-to-Kinesis CDC (no MSK required) rather than relax the never-lost, never-phantom guarantee.
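
The outbox flow can be sketched end to end with SQLite standing in for Aurora and plain callables standing in for SQS and the DLQ. Everything named here is an assumption for illustration: MAX_ATTEMPTS, the dead_lettered_at column used to advance past poison rows, and the InboxConsumer class. SQLite has no FOR UPDATE SKIP LOCKED, and the backoff between polls is left out, so this shows only the transactional insert, the attempt counter, and consumer-side dedupe.

```python
import json
import sqlite3

MAX_ATTEMPTS = 3  # assumed poison threshold; the design does not state one


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ledger_entries (
            id INTEGER PRIMARY KEY,
            idempotency_key TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY,
            idempotency_key TEXT NOT NULL,
            payload TEXT NOT NULL,
            publish_attempts INTEGER NOT NULL DEFAULT 0,
            published_at TEXT,
            dead_lettered_at TEXT  -- hypothetical marker for rows sent to the DLQ
        );
    """)
    return conn


def commit_payment(conn, key, amount_cents):
    """Ledger row and event row commit together or not at all."""
    payload = json.dumps({"idempotency_key": key, "amount_cents": amount_cents})
    with conn:
        conn.execute("INSERT INTO ledger_entries (idempotency_key, amount_cents) "
                     "VALUES (?, ?)", (key, amount_cents))
        conn.execute("INSERT INTO outbox (idempotency_key, payload) VALUES (?, ?)",
                     (key, payload))


def poll_once(conn, publish, dead_letter, batch_size=10):
    """One poller pass: publish pending rows in order, park poison rows."""
    rows = conn.execute(
        "SELECT id, idempotency_key, payload, publish_attempts FROM outbox "
        "WHERE published_at IS NULL AND dead_lettered_at IS NULL "
        "ORDER BY id LIMIT ?", (batch_size,)).fetchall()
    for row_id, key, payload, attempts in rows:
        try:
            publish(key, payload)
        except Exception:
            attempts += 1
            with conn:
                if attempts >= MAX_ATTEMPTS:
                    dead_letter(key, payload)
                    conn.execute(
                        "UPDATE outbox SET publish_attempts = ?, "
                        "dead_lettered_at = datetime('now') WHERE id = ?",
                        (attempts, row_id))
                else:
                    conn.execute("UPDATE outbox SET publish_attempts = ? WHERE id = ?",
                                 (attempts, row_id))
            continue  # a failing row does not block the rows behind it
        with conn:
            conn.execute("UPDATE outbox SET published_at = datetime('now') WHERE id = ?",
                         (row_id,))


class InboxConsumer:
    """Hypothetical consumer that dedupes on the idempotency key."""

    def __init__(self):
        self.inbox = set()
        self.handled = []

    def receive(self, key, payload):
        if key in self.inbox:
            return  # duplicate delivery, already handled
        self.inbox.add(key)
        self.handled.append(payload)
```

If the poller crashes after publish() but before stamping published_at, the row is published again on the next pass, which is why the consumer-side inbox is part of the pattern rather than an optional extra.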

payments-exactly-once

When: A settlement spans steps that cannot share one ACID transaction (hold, external KYC verify, release, collect fee) and a mid-flight failure must reverse only the steps that already committed.
AWS: Step Functions runs the saga (Express for short settlements, Standard for long-running limbo cases), and each state is persisted. Steps retry with full-jitter backoff, and the external KYC call carries a TimeoutSeconds and is guarded by a DynamoDB circuit breaker. A Catch block runs a compensating state machine that reverses committed steps using derived idempotency keys ({key}:compensate:{step}). A stuck compensation flips the account to LIMBO and pages via EventBridge to SNS to AWS Systems Manager Incident Manager.
Trade-off: You accept eventual consistency across the settlement and the operational burden of compensation logic plus a manual-resolution path for stuck reversals, in exchange for a durable, restartable workflow with no distributed-transaction coordinator across external APIs.
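
The compensation logic can be illustrated without Step Functions. The sketch below is a plain-Python simulation under stated assumptions: Step and run_saga are hypothetical names, the forward key format {key}:{step} is invented for the example, and retries, timeouts, and the circuit breaker are omitted. Only the {key}:compensate:{step} format comes from the design above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[str], None]      # forward work, given an idempotency key
    compensate: Callable[[str], None]  # reversal, given a derived key


def run_saga(key: str, steps: List[Step]) -> str:
    """Run steps in order; on failure, reverse only the steps that committed."""
    committed: List[Step] = []
    failed = False
    for step in steps:
        try:
            step.action(f"{key}:{step.name}")  # forward key format is an assumption
        except Exception:
            failed = True
            break
        committed.append(step)
    if not failed:
        return "SETTLED"
    # Compensate in reverse order, each reversal under its own derived key so a
    # retried compensation is itself idempotent.
    for step in reversed(committed):
        try:
            step.compensate(f"{key}:compensate:{step.name}")
        except Exception:
            return "LIMBO"  # stuck reversal: stop and hand off to a human
    return "COMPENSATED"


log = []


def record(label):
    return lambda derived_key: log.append((label, derived_key))


def kyc_times_out(derived_key):
    raise TimeoutError("KYC provider did not answer")


steps = [
    Step("hold", record("do"), record("undo")),
    Step("kyc", kyc_times_out, record("undo")),
    Step("release", record("do"), record("undo")),
]
outcome = run_saga("pay-1", steps)
# The hold committed and is reversed; kyc never committed and release never
# ran, so neither is compensated.
```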

storage

When: An idempotency check that reads a stale replica can return not-found for a key already committed to the primary, and the system then re-executes a payment it already ran.
AWS: Issue idempotency reads with DynamoDB ConsistentRead true (or against the Aurora writer endpoint), never an eventually-consistent read or a read replica; treat replica lag on this path as a correctness defect, not a performance one.
Trade-off: Strongly consistent reads cost twice the read capacity of eventually consistent ones and forgo the latency win of a nearby replica, in exchange for eliminating the replica-lag duplicate charge that Airbnb traced to reading idempotency keys from a MySQL read replica.
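
The failure mode is easy to reproduce in a toy model. The sketch below assumes a hypothetical LaggingStore in which the replica only catches up when replicate() is called; it is not DynamoDB or Aurora, just enough state to show why the idempotency check must read from the primary.

```python
class LaggingStore:
    """Hypothetical primary/replica pair with explicit, manual replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value  # the replica does not see this yet

    def replicate(self):
        self.replica.update(self.primary)

    def read(self, key, consistent):
        source = self.primary if consistent else self.replica
        return source.get(key)


def pay(store, key, charges, consistent):
    """Charge once per key, trusting the idempotency read to detect retries."""
    if store.read(key, consistent) is not None:
        return "replayed"
    charges.append(key)  # money moves here
    store.write(key, "done")
    return "charged"


# Idempotency check against the lagging replica: the retry charges again.
stale_charges = []
stale = LaggingStore()
pay(stale, "pay-1", stale_charges, consistent=False)
pay(stale, "pay-1", stale_charges, consistent=False)

# Idempotency check against the primary: the retry is recognised.
safe_charges = []
safe = LaggingStore()
pay(safe, "pay-1", safe_charges, consistent=True)
pay(safe, "pay-1", safe_charges, consistent=True)
```

In the first run the list of charges ends up with two entries for the same key; in the second it has one.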