payments-exactly-once
Idempotency-key dedup gate with atomic conditional insert
- When
- A payment command can be retried — by a flaky client, an API Gateway retry, an SQS redrive — and two retries can arrive concurrently before the first has committed. A naive read-then-write check ('does this key exist yet?') lets both retries pass and double-charges the account.
- AWS
- A DynamoDB idempotency table keyed on tenant_id#sha256(client_id + request_id + amount + currency + timestamp_bucket); the tenant_id prefix lets dynamodb:LeadingKeys clamp each tenant's credential to its own partition space. The first thing every command does is a conditional PutItem with attribute_not_exists: an atomic compare-and-set that claims the key with status=PENDING. Exactly one writer wins the key; the loser catches ConditionalCheckFailedException, reads the existing item, and returns the stored prior result if the claim is COMPLETE or an in-progress response if it is still PENDING. The Aurora journal commits as a SEPARATE phase (DynamoDB and Aurora are distinct transactional domains, with no 2PC between them), then the claim flips to status=COMPLETE. A sweeper Lambda (EventBridge every 5 minutes) resolves any claim stranded in PENDING against Aurora, either completing it or releasing it for retry. A 30-day TTL reaps the keyspace.
- Trade-off
- This is safe eventual consistency, not a single atomic unit: there is an observable window between the claim and the journal commit, bounded to ~5 minutes by the sweeper rather than to the 30-day TTL. The conditional write adds one synchronous round-trip on the hot path. The timestamp_bucket bounds the dedup window: a retry that lands in a later bucket hashes to a different key and is treated as a new command. The idempotency store is a correctness-critical dependency, not a cache: if it is unavailable you must fail closed, rejecting the command rather than processing it without dedup.
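The claim-then-commit flow of the dedup gate can be sketched in memory as follows. This is a minimal illustration, not the production path: IdempotencyTable, ConditionFailed, handle_payment and the "|" field separator are hypothetical names and choices, and a lock stands in for the atomicity that a conditional PutItem with attribute_not_exists provides.

```python
import hashlib
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class ConditionFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


@dataclass
class Claim:
    status: str                    # "PENDING" or "COMPLETE"
    result: Optional[dict] = None  # stored once the command has completed


class IdempotencyTable:
    """In-memory stand-in for the idempotency table (hypothetical helper)."""

    def __init__(self) -> None:
        self._items: Dict[str, Claim] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key: str) -> None:
        # Models PutItem with attribute_not_exists: the existence check and
        # the write are one atomic step, so exactly one caller can succeed.
        with self._lock:
            if key in self._items:
                raise ConditionFailed(key)
            self._items[key] = Claim(status="PENDING")

    def complete(self, key: str, result: dict) -> None:
        with self._lock:
            self._items[key] = Claim(status="COMPLETE", result=result)

    def get(self, key: str) -> Claim:
        with self._lock:
            return self._items[key]


def idempotency_key(tenant_id: str, client_id: str, request_id: str,
                    amount_cents: int, currency: str, bucket: int) -> str:
    # The "|" separator is an assumption; the text only lists the fields.
    material = "|".join([client_id, request_id, str(amount_cents), currency, str(bucket)])
    return f"{tenant_id}#{hashlib.sha256(material.encode()).hexdigest()}"


def handle_payment(table: IdempotencyTable, key: str,
                   charge: Callable[[], dict]) -> dict:
    try:
        table.put_if_absent(key)           # claim first, before any side effect
    except ConditionFailed:
        prior = table.get(key)
        if prior.status == "COMPLETE":
            return prior.result            # replay the stored result
        return {"status": "IN_PROGRESS"}   # winner has not finished yet
    result = charge()                      # separate phase: the journal commit
    table.complete(key, result)            # flip PENDING -> COMPLETE
    return result
```

Because the claim is taken before the charge runs, any number of concurrent retries with the same key call charge at most once. The sketch omits the sweeper, so a crash between the claim and complete would leave the key in PENDING.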
storage
Write-sharded hot account (virtual sub-accounts summed at read)
- When
- A single popular account — a marketplace platform account, a viral seller — receives thousands of writes per second, far past the ~1,000 writes/s a single partition key sustains. Every journal entry for that account contends on one row or one partition and the hot key throttles.
- AWS
- Split the hot account into N=256 virtual shards (account_id#000 .. account_id#255). At a ~1,000 writes/s per-key ceiling, 200,000 writes/s needs 200 shards; 256 is the next power of two and leaves about 22% of its 256,000 writes/s capacity spare, so no overflow tier is needed. Writes round-robin or hash across shards, so 256 partition keys absorb the load instead of one. The true balance is SUM over all shards, materialised in ElastiCache so the read is O(1) and unchanged by N. A control table flags which accounts are hot and stores N per account; known-hot accounts are pre-sharded at creation or early detection, not reactively mid-storm.
- Trade-off
- Re-sharding later is a migration, not a config flip, which is why hot accounts are pre-sharded. Reads fan in across the shards only on a cache rebuild; the hot path trusts the materialised sum. You give up the simplicity of one row per account balance for the ability to absorb a write storm.
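The shard-count arithmetic and the write and read paths of the sharded account can be sketched as below. This assumes hashing on a per-entry id with CRC32 (the text allows round-robin or any hash), and uses in-memory dictionaries in place of the shard rows and the ElastiCache sum; ShardedAccount and the function names are hypothetical.

```python
import zlib
from collections import defaultdict

PER_KEY_WRITES_PER_S = 1_000  # per-key ceiling quoted in the text


def shards_needed(target_writes_per_s: int) -> int:
    """Smallest power of two that covers the target write rate."""
    minimum = -(-target_writes_per_s // PER_KEY_WRITES_PER_S)  # ceiling division
    n = 1
    while n < minimum:
        n *= 2
    return n


def shard_key(account_id: str, entry_id: str, n_shards: int) -> str:
    # Hash the entry id so writes spread evenly across account_id#000..#NNN.
    shard = zlib.crc32(entry_id.encode()) % n_shards
    return f"{account_id}#{shard:03d}"


class ShardedAccount:
    def __init__(self, account_id: str, n_shards: int) -> None:
        self.account_id = account_id
        self.n_shards = n_shards
        self.shard_totals = defaultdict(int)  # stand-in for the per-shard rows
        self.cached_balance = 0               # stand-in for the materialised sum

    def post(self, entry_id: str, amount_cents: int) -> None:
        key = shard_key(self.account_id, entry_id, self.n_shards)
        self.shard_totals[key] += amount_cents  # write lands on one shard only
        self.cached_balance += amount_cents     # hot-path read model

    def rebuild_balance(self) -> int:
        # Fan-in read across every shard; only needed when the cache is rebuilt.
        self.cached_balance = sum(self.shard_totals.values())
        return self.cached_balance
```

With the figures from the text, shards_needed(200_000) yields 256. Reading cached_balance never touches the shards, which is why the read cost does not grow with N.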
payments-exactly-once
Deferred double-entry invariant (SUM of entries = 0 per transaction)
- When
- Every money movement must conserve value: for each transaction the debits and credits must net to exactly zero, or money was created or destroyed. But the debit row and the credit row cannot both be inserted in the same instant, so a row-level check would reject the half-written transaction.
- AWS
- Aurora PostgreSQL holds immutable journal entries (journal_entry_id, transaction_id, account_id, amount_cents BIGINT, direction). A constraint trigger declared DEFERRABLE INITIALLY DEFERRED evaluates SUM(signed_amount) = 0 for the row's transaction_id at COMMIT, not per row, so a transaction can write its debit and credit legs and is only validated when complete. (PostgreSQL CHECK constraints cannot be deferred and cannot aggregate across rows, so a plain table constraint cannot express this.) Amounts are integer cents (BIGINT), never floats, so the sum is exact. Entries are append-only: no UPDATE, no DELETE; a reversal is a new compensating transaction.
- Trade-off
- Deferred constraints defer the failure to commit time, so the application must handle a late rollback of an otherwise-accepted transaction. Integer cents means currency precision is fixed at the minor unit — sub-cent intermediate math (FX, interest accrual) must round explicitly and book the rounding remainder somewhere.
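The commit-time check of the double-entry invariant can be modelled in plain Python to show the behaviour the application has to handle. This is a sketch of the semantics only, not the database mechanism: Journal, Transaction and UnbalancedTransaction are hypothetical names, and the sign convention (credits positive, debits negative) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


class UnbalancedTransaction(Exception):
    """Raised at commit when a transaction's legs do not net to zero."""


@dataclass(frozen=True)
class Entry:
    transaction_id: str
    account_id: str
    amount_cents: int   # integer minor units, never a float
    direction: str      # "DEBIT" or "CREDIT"

    @property
    def signed_amount(self) -> int:
        # Assumed sign convention: credits positive, debits negative.
        return self.amount_cents if self.direction == "CREDIT" else -self.amount_cents


class Journal:
    """Append-only list of committed entries (stand-in for the journal table)."""

    def __init__(self) -> None:
        self._committed: List[Entry] = []

    @property
    def entries(self) -> Tuple[Entry, ...]:
        return tuple(self._committed)

    def transaction(self, transaction_id: str) -> "Transaction":
        return Transaction(self, transaction_id)


class Transaction:
    def __init__(self, journal: Journal, transaction_id: str) -> None:
        self._journal = journal
        self._transaction_id = transaction_id
        self._pending: List[Entry] = []

    def __enter__(self) -> "Transaction":
        return self

    def write(self, account_id: str, amount_cents: int, direction: str) -> None:
        # No per-row check here: a lone debit may exist mid-transaction.
        self._pending.append(Entry(self._transaction_id, account_id, amount_cents, direction))

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            return False  # body failed: pending rows are dropped, error propagates
        total = sum(e.signed_amount for e in self._pending)
        if total != 0:
            # The deferred check fails at commit, after every write was accepted.
            raise UnbalancedTransaction(f"{self._transaction_id} nets to {total} cents")
        self._journal._committed.extend(self._pending)
        return False
```

A caller writes both legs inside `with journal.transaction("t1") as tx:` and must be prepared for UnbalancedTransaction to surface when the block exits, which mirrors the late rollback described in the trade-off.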
analytics-olap
CQRS balance projection — immutable journal, materialised read model
- When
- An account's balance is the sum of all its journal entries. Computing it by scanning every entry is O(entries) and a high-velocity account accumulates millions of rows — a balance read that re-sums history is unusable on the hot path.
- AWS
- The write side is the immutable Aurora journal. A CQRS projector — Lambda triggered by a Kinesis On-Demand stream of journal entries (fed by DMS / logical replication, partitioned by logical account_id so deltas apply in LSN order) — incrementally maintains a materialised balance per account in ElastiCache via atomic INCRBY (read model) and analytics snapshots in DynamoDB. ElastiCache Valkey is chosen over DAX because the hot path is an atomic in-memory increment, which DAX (a DynamoDB read-cache) does not provide. The API reads the cache; on a miss it falls back to a bounded Aurora aggregate over the latest snapshot plus recent entries, never the full history.
- Trade-off
- The read model is eventually consistent — there is a projection lag (typically sub-second) between a posted journal entry and the visible balance. You accept showing a slightly stale balance for the ability to read in O(1), and you must reconcile the projection against the journal of record to catch drift.
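The projector and the cache-miss fallback of the balance projection might look like the sketch below. It assumes the stream can redeliver events, so it keeps a per-account LSN high-water mark to skip replays; that mark, and the names JournalEvent, BalanceProjection and read_balance, are hypothetical and not part of the pattern as described. Dictionaries stand in for the ElastiCache counters.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class JournalEvent:
    lsn: int            # position in the replication stream
    account_id: str
    signed_cents: int   # positive credit, negative debit (assumed convention)


class BalanceProjection:
    def __init__(self) -> None:
        self.balance: Dict[str, int] = {}   # stand-in for the cached counter
        self.last_lsn: Dict[str, int] = {}  # hypothetical replay guard

    def apply(self, event: JournalEvent) -> bool:
        """Apply one delta in LSN order; return False for a replayed event."""
        if event.lsn <= self.last_lsn.get(event.account_id, -1):
            return False
        # In the real cache this addition would be the atomic INCRBY.
        self.balance[event.account_id] = (
            self.balance.get(event.account_id, 0) + event.signed_cents
        )
        self.last_lsn[event.account_id] = event.lsn
        return True


def read_balance(projection: BalanceProjection, account_id: str,
                 snapshot: Tuple[int, int],
                 recent_entries: Iterable[JournalEvent]) -> int:
    """Serve from the cache; on a miss, use snapshot plus newer entries only."""
    cached = projection.balance.get(account_id)
    if cached is not None:
        return cached
    snapshot_lsn, snapshot_balance = snapshot
    return snapshot_balance + sum(
        e.signed_cents for e in recent_entries
        if e.account_id == account_id and e.lsn > snapshot_lsn
    )
```

In a real cache the LSN comparison and the increment would need to happen atomically together (for example in a server-side script); the sketch sidesteps that by being single-threaded.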
payments-exactly-once
Saga with compensating entries for cross-account transfer
- When
- A transfer debits account A and credits account B. If the process dies after the debit but before the credit, money is stranded — destroyed from A, never created in B. A naive two-step write has no atomicity across the legs.
- AWS
- Model the transfer as a single Aurora transaction when both legs share a database (the transaction makes the two legs atomic, and the deferred SUM=0 check verifies that they balance at no extra cost). When legs cross service boundaries, use a Step Functions Express saga: each step is idempotent on transaction_id#step_name, and any failure after the debit triggers a compensating reversal transaction (a new credit back to A) rather than a destructive rollback. Express (at-least-once, ~$8k/month at 2,000 sagas/s) is chosen over Standard (exactly-once, ~$648k/month) because exactly-once belongs on the per-step idempotency key, not on the orchestrator, and the design already provides that key.
- Trade-off
- A saga is eventually consistent and money is briefly in an intermediate state (debited from A, not yet credited to B) visible to reconciliation. Express gives at-least-once execution, so each step must be idempotent — a burden the design already carries. Compensation is itself a journal entry, so the audit trail shows the failed-and-reversed path rather than hiding it — the ledger records attempts, not just successes.
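The cross-service branch of the saga, with per-step idempotency and a compensating entry, can be sketched as follows. The orchestrator is reduced to one function and the ledger to an in-memory list; Ledger, run_transfer, the credit_available flag and the step names are hypothetical, chosen only to follow the transaction_id#step_name keying.

```python
from typing import List, Set, Tuple


class StepFailed(Exception):
    """Raised when a saga step cannot be carried out."""


class Ledger:
    def __init__(self) -> None:
        self.entries: List[Tuple[str, str, int]] = []  # append-only journal
        self._applied: Set[str] = set()                # step keys already posted

    def post(self, step_key: str, account_id: str, signed_cents: int) -> bool:
        # Idempotent on transaction_id#step_name: a re-executed step is a no-op.
        if step_key in self._applied:
            return False
        self._applied.add(step_key)
        self.entries.append((step_key, account_id, signed_cents))
        return True

    def balance(self, account_id: str) -> int:
        return sum(cents for _, acct, cents in self.entries if acct == account_id)


def run_transfer(ledger: Ledger, transaction_id: str, src: str, dst: str,
                 amount_cents: int, credit_available: bool = True) -> str:
    ledger.post(f"{transaction_id}#debit", src, -amount_cents)
    try:
        if not credit_available:
            raise StepFailed("credit leg could not be applied")
        ledger.post(f"{transaction_id}#credit", dst, amount_cents)
        return "COMPLETED"
    except StepFailed:
        # Compensate with a new entry instead of deleting the debit.
        ledger.post(f"{transaction_id}#compensate_debit", src, amount_cents)
        return "COMPENSATED"
```

Running the same transfer twice, as an at-least-once orchestrator may do, leaves the ledger unchanged on the second pass. A compensated transfer keeps both the debit and its reversal in entries, so the failed attempt stays visible to audit.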
analytics-olap
WORM audit trail with nightly reconciliation sweep
- When
- Compliance requires an immutable, queryable record of every journal entry retained for years, and the materialised balances must be provably equal to the sum of the journal of record. The reconciliation job itself is the only independent drift detector, so its silent failure must also be caught.
- AWS
- DMS streams journal entries to S3 as Parquet under S3 Object Lock (WORM, compliance mode) — write-once, immutable, queried by compliance via Athena. The same export doubles as the journal archive so Aurora keeps only a 90-day hot window. A nightly Step Functions + Athena job re-sums every account (GROUP BY account_id over Parquet, ~$17/run) and compares to the materialised balance cache, writing discrepancies to S3 and alerting when drift exceeds $0.01; Glue is reserved for the bulk genesis scan. A CloudWatch Alarm fires if the reconciliation has not SUCCEEDED within 26 hours — a dead-man's switch, because absence of a success is itself a page-worthy event. Audit is layered: CloudTrail covers the control plane; the Parquet export is the data-plane record.
- Trade-off
- Object Lock means data genuinely cannot be deleted before its retention expires — a misclassified or PII-bearing record is stuck for the retention term, so the schema must guarantee no PII enters the journal. Reconciliation is a batch backstop, not a real-time guarantee: drift is detected within a day, not within a second.
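The comparison step of the reconciliation sweep and its dead-man's switch reduce to two small checks, sketched here over plain dictionaries. The function names are hypothetical; in the pattern itself the journal sums would come from the Athena query and the staleness check from a CloudWatch Alarm. The thresholds are the $0.01 and 26 hours given in the text.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

DRIFT_THRESHOLD_CENTS = 1          # alert when drift exceeds $0.01
MAX_SILENCE = timedelta(hours=26)  # nightly run plus slack


def find_drift(journal_sums: Dict[str, int],
               cached_balances: Dict[str, int]) -> Dict[str, int]:
    """Return account -> (cached - journal) where drift exceeds the threshold."""
    drifted: Dict[str, int] = {}
    # Union of keys, so an account missing from either side is also compared.
    for account_id in sorted(journal_sums.keys() | cached_balances.keys()):
        drift = cached_balances.get(account_id, 0) - journal_sums.get(account_id, 0)
        if abs(drift) > DRIFT_THRESHOLD_CENTS:
            drifted[account_id] = drift
    return drifted


def dead_mans_switch_tripped(last_success: Optional[datetime],
                             now: datetime) -> bool:
    """True when no successful run has been recorded within MAX_SILENCE."""
    return last_success is None or now - last_success > MAX_SILENCE
```

The second check matters because a reconciliation job that never runs reports no drift at all; treating the absence of a recent success as an alarm closes that gap.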