payments-exactly-once
Gate every payment on a conditional write to the primary
- When
- A client may retry a payment request after a lost response, a timeout, or a load-balancer hiccup, and a second execution would move money twice.
- AWS
- The client mints a UUID idempotency key before the first attempt and resends it on every retry. The server claims the key at PENDING with a conditional PutItem using attribute_not_exists(pk) on a DynamoDB table (key sharded as tenant#id#shard#mod_N to avoid a hot partition), then drives it PENDING to PROCESSING to COMPLETE. A leaseExpiry on the PROCESSING row lets a crashed worker's key self-heal in ~30 s instead of staying orphaned until the TTL expires. On a ConditionalCheckFailedException the server reads the winner's stored response with ConsistentRead set to true and returns it verbatim.
- Trade-off
- Every payment pays for one conditional DynamoDB write plus a strongly consistent read on the retry path, and you must fail closed (reject the payment) when the gate is unavailable. The table is single-region and strongly consistent on purpose: DynamoDB Global Tables replicating with last-writer-wins, eventually consistent semantics would reopen the replica-lag double-charge hole. The gate's availability becomes a hard dependency of accepting any payment.
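
The claim, lease, and replay flow can be sketched without AWS. This is a minimal in-memory model, not the DynamoDB implementation: `InMemoryGate`, `GateUnavailable`, `handle_payment`, and the 409 response shape are hypothetical names for illustration, a dict stands in for the table, and the PENDING and PROCESSING states are collapsed into one claim step to keep the sketch short. Against DynamoDB, the `claim` branch for a missing row would be a conditional `PutItem`, and the replay branch a `GetItem` with `ConsistentRead` set to true.

```python
import time

LEASE_SECONDS = 30  # mirrors the ~30 s self-heal window; tune to your slowest charge


class GateUnavailable(Exception):
    """The store could not be reached; the caller must reject the payment."""


class InMemoryGate:
    """Hypothetical stand-in for the DynamoDB table: a dict plays the primary."""

    def __init__(self):
        self.rows = {}
        self.available = True

    def claim(self, key, now):
        """Return (outcome, stored_response); outcome is RUN, REPLAY, or BUSY."""
        if not self.available:
            raise GateUnavailable(key)  # fail closed: no gate, no charge
        row = self.rows.get(key)
        if row is None:
            # Plays the role of PutItem with attribute_not_exists(pk).
            self.rows[key] = {"state": "PROCESSING", "lease_expiry": now + LEASE_SECONDS}
            return "RUN", None
        if row["state"] == "COMPLETE":
            return "REPLAY", row["response"]
        if row["lease_expiry"] <= now:
            # The previous worker's lease lapsed, so this caller takes over.
            row["lease_expiry"] = now + LEASE_SECONDS
            return "RUN", None
        return "BUSY", None

    def complete(self, key, response):
        self.rows[key].update(state="COMPLETE", response=response)


def handle_payment(gate, key, charge, now=None):
    """Run charge() only for the caller that owns the key; replay otherwise."""
    now = time.time() if now is None else now
    outcome, stored = gate.claim(key, now)
    if outcome == "REPLAY":
        return stored  # the winner's response, returned verbatim
    if outcome == "BUSY":
        return {"status": 409, "body": "payment in progress"}
    response = charge()
    gate.complete(key, response)
    return response
```

A lease takeover can re-run `charge()` if the first worker charged and then crashed before calling `complete`, so the step behind the gate still needs its own duplicate protection.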
payments-exactly-once
Model money as append-only double-entry pairs that sum to zero
- When
- You need an auditable, regulator-grade record of fund flows where balances are derivable and never silently corrupted by a partial write.
- AWS
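- Write exactly one DEBIT and one CREDIT row per transaction in a single Aurora PostgreSQL ACID transaction, with a UNIQUE constraint covering idempotency_key as a second line of defense behind the DynamoDB gate. Because both rows of a pair carry the same key, the constraint has to sit on a transaction header row or on the key plus the entry's DEBIT/CREDIT side, not on the entry's key column alone. Never UPDATE or DELETE a row; only append reversing entries.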
- Trade-off
- The ledger only grows (7-year retention, no compaction), and each correction costs two extra reversing rows instead of an in-place edit. In exchange you get an immutable audit trail and a database-enforced no-duplicate guarantee that survives application bugs.
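
The append-only pair can be tried locally with SQLite standing in for Aurora PostgreSQL. This is a sketch under assumptions: the table and column names (`ledger_entries`, `direction`, `amount_cents`), the sign convention (CREDIT adds, DEBIT subtracts), the `:reversal` key suffix, and the triggers that block UPDATE and DELETE are all illustrative choices, not the source's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ledger_entries (
    entry_id        INTEGER PRIMARY KEY,
    idempotency_key TEXT    NOT NULL,
    direction       TEXT    NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT')),
    account         TEXT    NOT NULL,
    amount_cents    INTEGER NOT NULL CHECK (amount_cents > 0),
    UNIQUE (idempotency_key, direction)
);
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger_entries
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger_entries
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
"""
INSERT = ("INSERT INTO ledger_entries (idempotency_key, direction, account, amount_cents) "
          "VALUES (?, ?, ?, ?)")
# Assumed sign convention: a CREDIT raises an account's balance, a DEBIT lowers it.
SIGNED = "CASE direction WHEN 'CREDIT' THEN amount_cents ELSE -amount_cents END"


def open_ledger():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def post_transfer(conn, key, from_account, to_account, amount_cents):
    """Append one DEBIT and one CREDIT atomically; a replayed key raises IntegrityError."""
    with conn:  # commits both rows together, or rolls both back on any error
        conn.execute(INSERT, (key, "DEBIT", from_account, amount_cents))
        conn.execute(INSERT, (key, "CREDIT", to_account, amount_cents))


def reverse_transfer(conn, key, from_account, to_account, amount_cents):
    """Correct a transfer by appending its mirror image under a derived key."""
    post_transfer(conn, key + ":reversal", to_account, from_account, amount_cents)


def balance(conn, account):
    sql = f"SELECT COALESCE(SUM({SIGNED}), 0) FROM ledger_entries WHERE account = ?"
    return conn.execute(sql, (account,)).fetchone()[0]


def ledger_total(conn):
    """Signed sum over every row; stays at zero while every pair is intact."""
    sql = f"SELECT COALESCE(SUM({SIGNED}), 0) FROM ledger_entries"
    return conn.execute(sql).fetchone()[0]
```

Balances are derived by summing rows rather than stored, so a partial write cannot leave a balance out of step with its entries.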
messaging
Publish events by writing them inside the same transaction
- When
- Downstream services must learn about a committed ledger write, but a publish-after-commit can lose the event on a crash and a publish-before-commit can emit a phantom event on rollback.
- AWS
- Insert the event row into an outbox table in the same Aurora transaction as the ledger entries. A single Lambda (reserved concurrency 1) polls WHERE published_at IS NULL ... FOR UPDATE SKIP LOCKED every 100 ms with a tunable batch size, publishes to SQS with exponential backoff, and stamps published_at. A publish_attempts counter routes poison rows to a DLQ and advances the head so one bad row cannot block the stream. Consumers dedupe on the idempotency key in their own inbox.
- Trade-off
- Delivery is at-least-once (consumers must be idempotent), and events lag the commit by up to one poll interval plus any Lambda cold start. The Lambda poller is defensible to ~1,000 events/s; above that, graduate to change data capture from AWS DMS into Kinesis (no MSK required) rather than relax the never-lost, never-phantom guarantee.
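
The outbox write, the poller, and the consumer inbox can be sketched with SQLite in place of Aurora and plain callables in place of SQS and the DLQ. The assumptions are loud ones: SQLite has no FOR UPDATE SKIP LOCKED, so the row lock is omitted; `MAX_ATTEMPTS`, the `dead_lettered_at` column, the payload format, and every function name here are hypothetical.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical poison threshold


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ledger (id INTEGER PRIMARY KEY,
                             idempotency_key TEXT UNIQUE, amount_cents INTEGER);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY,
                             idempotency_key TEXT NOT NULL, payload TEXT NOT NULL,
                             publish_attempts INTEGER NOT NULL DEFAULT 0,
                             published_at TEXT, dead_lettered_at TEXT);
    """)
    return conn


def record_payment(conn, key, amount_cents, fail=False):
    """Ledger row and event row commit together or roll back together."""
    with conn:
        conn.execute("INSERT INTO ledger (idempotency_key, amount_cents) VALUES (?, ?)",
                     (key, amount_cents))
        conn.execute("INSERT INTO outbox (idempotency_key, payload) VALUES (?, ?)",
                     (key, f"payment:{key}:{amount_cents}"))
        if fail:
            raise RuntimeError("simulated crash before commit")


def poll_once(conn, publish, dead_letter, batch_size=10):
    """Publish pending rows in order; park a row once it has failed MAX_ATTEMPTS times."""
    rows = conn.execute(
        "SELECT id, idempotency_key, payload, publish_attempts FROM outbox "
        "WHERE published_at IS NULL AND dead_lettered_at IS NULL "
        "ORDER BY id LIMIT ?", (batch_size,)).fetchall()
    for row_id, key, payload, attempts in rows:
        try:
            publish(key, payload)
        except Exception:
            with conn:
                conn.execute("UPDATE outbox SET publish_attempts = publish_attempts + 1 "
                             "WHERE id = ?", (row_id,))
                if attempts + 1 >= MAX_ATTEMPTS:
                    dead_letter(key, payload)
                    conn.execute("UPDATE outbox SET dead_lettered_at = datetime('now') "
                                 "WHERE id = ?", (row_id,))
            continue  # a bad row never blocks the rows behind it
        with conn:
            conn.execute("UPDATE outbox SET published_at = datetime('now') WHERE id = ?",
                         (row_id,))


class Inbox:
    """Consumer-side dedupe: at-least-once delivery means a key can arrive twice."""

    def __init__(self):
        self.seen = set()

    def receive(self, key):
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

A crash between `publish` and the `published_at` stamp republishes the row on the next poll, which is why the consumer keeps its own inbox.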
payments-exactly-once
Orchestrate multi-step settlement with compensating transactions
- When
- A settlement spans steps that cannot share one ACID transaction (hold, external KYC verify, release, collect fee) and a mid-flight failure must reverse only the steps that already committed.
- AWS
- Step Functions runs the saga (Express for short settlements that fit its five-minute execution limit, Standard for long-running limbo cases). Each state persists, steps retry with full-jitter backoff, and a TimeoutSeconds on the external KYC call is guarded by a DynamoDB circuit breaker. A Catch block runs a compensating state machine that reverses committed steps using derived idempotency keys ({key}:compensate:{step}). A stuck compensation flips the account to LIMBO and pages via EventBridge to SNS to AWS Systems Manager Incident Manager.
- Trade-off
- You accept eventual consistency across the settlement and the operational burden of compensation logic plus a manual-resolution path for stuck reversals, in exchange for a durable, restartable workflow with no distributed-transaction coordinator across external APIs.
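
The compensation order is easier to see in plain code than in a state-machine definition. The following is a small simulator of the control flow only, not Step Functions itself: `Step`, `SagaFailed`, and `run_saga` are hypothetical names, the forward key format `{key}:{step}` is an assumption, and only the `{key}:compensate:{step}` format comes from the pattern above. Retries, timeouts, and the circuit breaker are left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[str], None]      # receives the step's idempotency key
    compensate: Callable[[str], None]  # receives the derived compensation key


class SagaFailed(Exception):
    def __init__(self, failed_step: str, compensated: List[str], stuck: List[str]):
        super().__init__(f"saga failed at {failed_step}")
        self.failed_step = failed_step
        self.compensated = compensated  # steps reversed successfully
        self.stuck = stuck              # steps whose reversal also failed


def run_saga(key: str, steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, reverse only the committed ones, newest first."""
    committed: List[Step] = []
    for step in steps:
        try:
            step.action(f"{key}:{step.name}")
        except Exception as exc:
            compensated, stuck = [], []
            for done in reversed(committed):
                try:
                    done.compensate(f"{key}:compensate:{done.name}")
                    compensated.append(done.name)
                except Exception:
                    # In the pattern above this is where the account would
                    # flip to LIMBO and a human would be paged.
                    stuck.append(done.name)
            raise SagaFailed(step.name, compensated, stuck) from exc
        committed.append(step)
    return [step.name for step in committed]
```

Because each compensation carries its own derived key, rerunning the reversal after a crash cannot reverse the same step twice, provided the downstream call honours the key.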
storage
Read the idempotency store from the primary, never a replica
- When
- An idempotency check that reads a stale replica can return not-found for a key already committed to the primary, and the system then re-executes a payment it already ran.
- AWS
- Issue idempotency reads with DynamoDB ConsistentRead true (or against the Aurora writer endpoint), never an eventually-consistent read or a read replica; treat replica lag on this path as a correctness defect, not a performance one.
- Trade-off
- Strongly consistent reads cost twice the read-capacity of eventually-consistent ones and forgo the latency win of a nearby replica, in exchange for eliminating the replica-lag duplicate-charge that Airbnb traced to reading idempotency keys from a MySQL read replica.
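
The failure mode is easy to reproduce with a toy model. In this sketch `ReplicatedStore` and `pay` are hypothetical: two dicts stand in for the primary and a lagging replica, and `replicate()` models replication catching up. With DynamoDB the `consistent_read` flag corresponds to passing `ConsistentRead=True` to `GetItem`.

```python
class ReplicatedStore:
    """Toy primary plus lagging replica; writes reach the replica only on replicate()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value

    def replicate(self):
        self.replica = dict(self.primary)

    def get(self, key, consistent_read):
        source = self.primary if consistent_read else self.replica
        return source.get(key)


def pay(store, key, charges, consistent_read):
    """Charge unless the idempotency key is already recorded."""
    # Check-then-write is not atomic; this sketch isolates the stale-read
    # problem only and is run single-threaded.
    if store.get(key, consistent_read) is not None:
        return "replayed"
    charges.append(key)  # stands in for moving money
    store.put(key, "charged")
    return "charged"


if __name__ == "__main__":
    lagging, charges = ReplicatedStore(), []
    pay(lagging, "k1", charges, consistent_read=False)
    pay(lagging, "k1", charges, consistent_read=False)  # retry lands before replication
    print(len(charges))  # 2: the stale replica hid the first charge
```

Reading with `consistent_read=True` in the same scenario returns "replayed" on the retry and leaves a single charge.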