Real-time payment processing with distributed sagas
One checkout click, assembled one layer at a time: idempotency gate, ID, saga, gateway fence, ledger, reconciliation.
01
The idempotency gate is the first thing on the path
flowchart LR
Client([Merchant Checkout]) -->|POST with Idempotency-Key| AG[API Gateway plus WAF]
AG --> H[Payment Handler Lambda]
H -->|conditional put attribute_not_exists| IDM[(DynamoDB Idempotency)]
POST /v1/payments carries an Idempotency-Key header. The Lambda handler does a DynamoDB conditional put (attribute_not_exists) on sha256(merchant_id + key). The first writer wins the slot and proceeds. Every retry is answered from that stored record instead of executing again, so a retried request never sends a second copy of the same logical payment to the gateway.
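A minimal sketch of the gate, assuming an in-memory dict in place of DynamoDB. `InMemoryIdempotencyTable`, `put_if_absent`, `handle_payment` and the `IN_PROGRESS`/`COMPLETED` statuses are hypothetical names, not the deck's code or the boto3 API. The `":"` separator inside the hash input is also an assumption; without some delimiter, `("ab", "c")` and `("a", "bc")` would hash to the same slot.

```python
import hashlib
import threading
from typing import Any, Callable


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class InMemoryIdempotencyTable:
    """Hypothetical in-memory stand-in for the DynamoDB idempotency table."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, pk: str, item: dict[str, Any]) -> None:
        # Mimics a put guarded by attribute_not_exists(pk): first writer wins.
        with self._lock:
            if pk in self._items:
                raise ConditionalCheckFailed(pk)
            self._items[pk] = dict(item)

    def get(self, pk: str) -> dict[str, Any]:
        with self._lock:
            return dict(self._items[pk])

    def update(self, pk: str, **attrs: Any) -> None:
        with self._lock:
            self._items[pk].update(attrs)


def idempotency_pk(merchant_id: str, key: str) -> str:
    # Assumed ":" separator keeps distinct (merchant_id, key) pairs distinct.
    return hashlib.sha256(f"{merchant_id}:{key}".encode()).hexdigest()


def handle_payment(table: InMemoryIdempotencyTable, merchant_id: str, key: str,
                   process: Callable[[], dict]) -> dict:
    pk = idempotency_pk(merchant_id, key)
    try:
        table.put_if_absent(pk, {"status": "IN_PROGRESS"})
    except ConditionalCheckFailed:
        existing = table.get(pk)
        # A retry never re-runs process(): it gets the stored result, or learns
        # that the first attempt is still in flight.
        if existing["status"] == "COMPLETED":
            return existing["result"]
        return {"status": "IN_PROGRESS"}
    result = process()
    table.update(pk, status="COMPLETED", result=result)
    return result


calls = []


def charge() -> dict:
    calls.append(1)
    return {"payment_id": 42, "status": "AUTHORIZED"}


table = InMemoryIdempotencyTable()
first = handle_payment(table, "m_1", "key-abc", charge)
retry = handle_payment(table, "m_1", "key-abc", charge)
print(first == retry, len(calls))  # True 1
```

The sketch leaves the slot stuck at `IN_PROGRESS` if `process()` raises; a real handler would need to clear or expire that record so a later retry can proceed.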
02
A k-sortable payment ID, no coordinator
flowchart LR
Client([Merchant Checkout]) -->|POST with Idempotency-Key| AG[API Gateway plus WAF]
AG --> H[Payment Handler Lambda]
H -->|conditional put| IDM[(DynamoDB Idempotency)]
H -->|claim worker slot| WID[(DynamoDB Worker Slots)]
H --> SID[Snowflake ID Generator]
Having won the idempotency slot, the handler mints a 63-bit Snowflake-variant payment ID: a 41-bit millisecond timestamp, a 10-bit worker ID claimed from DynamoDB at cold start, and a 12-bit sequence. The IDs are time-ordered, which suits ledger range scans; audit events are labelled with ULIDs.
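The 41/10/12 bit layout can be sketched as below. The custom epoch (2024-01-01 UTC) is an assumption, since the slides do not name one, and the worker ID is simply passed in; the DynamoDB slot claim that would supply it at cold start is not shown. `SnowflakeGenerator` and `decode` are hypothetical names.

```python
import threading
import time
from typing import Callable

WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1        # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095
TIMESTAMP_SHIFT = WORKER_BITS + SEQUENCE_BITS  # 22; the 41 timestamp bits sit above this
# Hypothetical custom epoch (2024-01-01T00:00:00Z); the slides do not name one.
EPOCH_MS = 1_704_067_200_000


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, worker_id: int, clock: Callable[[], int] = _now_ms) -> None:
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id  # per the slides, claimed from DynamoDB at cold start
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Never step backwards, even if the wall clock does.
            now = max(self._clock(), self._last_ms)
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << TIMESTAMP_SHIFT) | (self.worker_id << SEQUENCE_BITS) | self._sequence


def decode(payment_id: int) -> tuple[int, int, int]:
    """Split an ID back into (unix_ms, worker_id, sequence)."""
    sequence = payment_id & MAX_SEQUENCE
    worker_id = (payment_id >> SEQUENCE_BITS) & MAX_WORKER
    unix_ms = (payment_id >> TIMESTAMP_SHIFT) + EPOCH_MS
    return unix_ms, worker_id, sequence


gen = SnowflakeGenerator(worker_id=7)
a, b = gen.next_id(), gen.next_id()
print(a < b, a.bit_length() <= 63, decode(b)[1])  # True True 7
```

IDs from one worker are strictly increasing; IDs from different workers in the same millisecond are only roughly ordered, which is why the scheme is called k-sortable rather than totally ordered.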
03
The saga, orchestrated — not two-phase commit
flowchart TD
Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
AG --> H[Payment Handler Lambda]
H -->|conditional put| IDM[(DynamoDB Idempotency)]
H --> SID[Snowflake ID Generator]
H -->|StartSyncExecution| SF[Step Functions Express]
subgraph SAGA[Saga steps - each idempotent plus compensation]
S1[1 ValidatePayment] --> S2[2 ReserveBalance]
S2 --> S3[3 AuthorizeGateway]
S3 --> S4[4 CaptureOrVoid]
S4 --> S5[5 CommitLedger]
S5 --> S6[6 NotifyMerchant]
end
SF --> SAGA
The handler starts a Step Functions Express Workflow synchronously (5-minute maximum duration). Each of the six steps is idempotent, and each forward step has a compensating step wired through a Catch. When a step fails mid-flight, the compensations for the steps that already completed run in reverse order, releasing funds instead of stranding them in limbo.
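The control flow that the Catch-to-compensation wiring expresses can be sketched in plain Python. This is not Amazon States Language and not the deck's implementation; `run_saga`, `Step`, and the compensation names `ReleaseBalance` and `VoidAuthorization` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Action = Callable[[dict], None]


@dataclass
class Step:
    name: str
    action: Action
    compensate: Optional[Action] = None  # None when there is nothing to undo


class SagaFailed(Exception):
    def __init__(self, failed_step: str, cause: Exception) -> None:
        super().__init__(f"{failed_step} failed: {cause}")
        self.failed_step = failed_step


def run_saga(steps: list[Step], ctx: dict) -> dict:
    completed: list[Step] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            # Like a Catch routing to compensation states: undo finished steps newest-first.
            for done in reversed(completed):
                if done.compensate is not None:
                    done.compensate(ctx)
            raise SagaFailed(step.name, exc) from exc
        completed.append(step)
    return ctx


log: list[str] = []


def record(name: str) -> Action:
    return lambda ctx: log.append(name)


def capture_times_out(ctx: dict) -> None:
    raise TimeoutError("gateway capture timed out")


steps = [
    Step("ValidatePayment", record("ValidatePayment")),
    Step("ReserveBalance", record("ReserveBalance"), record("ReleaseBalance")),
    Step("AuthorizeGateway", record("AuthorizeGateway"), record("VoidAuthorization")),
    Step("CaptureOrVoid", capture_times_out),
    Step("CommitLedger", record("CommitLedger")),
    Step("NotifyMerchant", record("NotifyMerchant")),
]

failed_step = None
try:
    run_saga(steps, {"payment_id": 42})
except SagaFailed as err:
    failed_step = err.failed_step
print(failed_step, log)
# CaptureOrVoid ['ValidatePayment', 'ReserveBalance', 'AuthorizeGateway', 'VoidAuthorization', 'ReleaseBalance']
```

Because compensations can themselves fail, a production workflow would give them their own Retry policies; the sketch assumes they succeed.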
04
Fence the gateway call, then leave AWS for one hop
flowchart TD
Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
AG --> H[Payment Handler Lambda]
H -->|conditional put| IDM[(DynamoDB Idempotency)]
H --> SID[Snowflake ID Generator]
H -->|StartSyncExecution| SF[Step Functions Express]
subgraph SAGA[Saga steps]
S1[1 ValidatePayment] --> S2[2 ReserveBalance]
S2 --> S3[3 AuthorizeGateway]
S3 --> S4[4 CaptureOrVoid]
S4 --> S5[5 CommitLedger]
S5 --> S6[6 NotifyMerchant]
end
SF --> SAGA
S1 -.-> FD[Fraud Detector]
S3 -->|SET NX lock 5s| RL[(ElastiCache Redis Lock)]
S3 -->|idempotency-key eq payment_id| GW([Stripe or Adyen Gateway])
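Step 3 is the one call we cannot blindly retry. A Redis SET NX lock with a 5 s TTL keeps a second Lambda from calling the external gateway for the same payment while the lock is held. Because the lock expires after 5 s, a gateway call that outlives it could overlap with a retry, so the internal payment ID is also passed as the gateway idempotency-key as a second net. The Fraud Detector scores the transaction in step 1.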
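The two nets can be sketched together, assuming in-memory fakes for Redis and the gateway. `FakeRedis.set_nx_px` models `SET key value NX PX ttl` and `delete_if_equals` models the compare-and-delete that real Redis would need to run atomically (typically in a Lua script). `FakeGateway`, `authorize_step`, `LockBusy` and the `lock:authorize:<id>` key format are hypothetical, not a Redis client or Stripe/Adyen SDK API.

```python
import time
import uuid
from typing import Callable


class FakeRedis:
    """Hypothetical in-memory stand-in for the two Redis operations the step needs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        # Models SET key value NX PX ttl_ms: succeeds only if the key is absent or expired.
        now = self._clock()
        held = self._data.get(key)
        if held is not None and held[1] > now:
            return False
        self._data[key] = (value, now + ttl_ms / 1000)
        return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        # Real Redis needs this compare-and-delete to be atomic, typically via a Lua script.
        held = self._data.get(key)
        if held is not None and held[0] == value and held[1] > self._clock():
            del self._data[key]
            return True
        return False


class FakeGateway:
    """Hypothetical gateway that replays the first response for a repeated idempotency key."""

    def __init__(self) -> None:
        self.responses: dict[str, dict] = {}
        self.calls = 0

    def authorize(self, idempotency_key: str, amount: int) -> dict:
        self.calls += 1
        if idempotency_key not in self.responses:
            self.responses[idempotency_key] = {"auth_id": f"auth_{len(self.responses) + 1}", "amount": amount}
        return self.responses[idempotency_key]


class LockBusy(Exception):
    """Raised so the state machine's Retry policy can try the step again later."""


def authorize_step(redis: FakeRedis, gateway: FakeGateway, payment_id: int, amount: int) -> dict:
    lock_key = f"lock:authorize:{payment_id}"
    token = uuid.uuid4().hex  # per-attempt token, so an attempt never releases another's lock
    if not redis.set_nx_px(lock_key, token, ttl_ms=5000):
        raise LockBusy(lock_key)
    try:
        # Second net: if the lock expired mid-call, the gateway dedupes on the payment ID.
        return gateway.authorize(idempotency_key=str(payment_id), amount=amount)
    finally:
        redis.delete_if_equals(lock_key, token)


now = [0.0]
redis = FakeRedis(clock=lambda: now[0])
gateway = FakeGateway()

redis.set_nx_px("lock:authorize:42", "attempt-A", ttl_ms=5000)  # attempt A is mid-call
try:
    authorize_step(redis, gateway, 42, 1999)  # attempt B is turned away while A holds the lock
except LockBusy:
    pass
now[0] = 6.0  # A's lock has expired; two more attempts get through
first = authorize_step(redis, gateway, 42, 1999)
second = authorize_step(redis, gateway, 42, 1999)
print(gateway.calls, len(gateway.responses), first == second)  # 2 1 True
```

The run shows why the lock alone is not enough: once the TTL lapses, repeated attempts reach the gateway, and only the idempotency key keeps them to a single authorization.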
05
The ledger needs multi-row ACID: Aurora, not DynamoDB
flowchart TD
Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
AG --> H[Payment Handler Lambda]
H -->|conditional put| IDM[(DynamoDB Idempotency)]
H --> SID[Snowflake ID Generator]
H -->|StartSyncExecution| SF[Step Functions Express]
subgraph SAGA[Saga steps]
S2[2 ReserveBalance] --> S3[3 AuthorizeGateway]
S3 --> S4[4 CaptureOrVoid]
S4 --> S5[5 CommitLedger]
S5 --> S6[6 NotifyMerchant]
end
SF --> SAGA
S3 -->|SET NX lock| RL[(ElastiCache Redis Lock)]
S3 --> GW([Stripe or Adyen Gateway])
S2 --> RP[RDS Proxy]
S4 --> RP
S5 --> RP
RP -->|optimistic lock on version| AUR[(Aurora PostgreSQL Ledger)]
Steps 2, 4 and 5 write the ledger: the payment's status in payments and its ledger_entries rows must change in one transaction, guarded by an optimistic-lock version check. An Aurora PostgreSQL Global Database holds the ledger; RDS Proxy pools connections so that Lambdas serving 10k TPS share a bounded set of database connections instead of each execution environment opening its own.
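The version-guarded transaction can be sketched with the standard-library `sqlite3` module standing in for Aurora PostgreSQL; the same `UPDATE ... WHERE version = ?` pattern carries over. The table columns, the `commit_ledger` helper and the customer/merchant entry pair are assumptions for illustration, not the deck's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE payments (
    payment_id INTEGER PRIMARY KEY,
    status     TEXT    NOT NULL,
    version    INTEGER NOT NULL
);
CREATE TABLE ledger_entries (
    entry_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    payment_id INTEGER NOT NULL REFERENCES payments(payment_id),
    account    TEXT    NOT NULL,
    amount     INTEGER NOT NULL  -- minor units (e.g. cents)
);
"""


class StaleVersion(Exception):
    """Another writer changed the payment since we read it; re-read and retry."""


def commit_ledger(conn: sqlite3.Connection, payment_id: int, expected_version: int,
                  new_status: str, entries: list[tuple[str, int]]) -> int:
    # `with conn` commits if the block succeeds and rolls back if it raises, so the
    # status change and its ledger rows are written together or not at all.
    with conn:
        cur = conn.execute(
            "UPDATE payments SET status = ?, version = version + 1 "
            "WHERE payment_id = ? AND version = ?",
            (new_status, payment_id, expected_version),
        )
        if cur.rowcount != 1:
            raise StaleVersion(payment_id)  # optimistic-lock guard tripped
        conn.executemany(
            "INSERT INTO ledger_entries (payment_id, account, amount) VALUES (?, ?, ?)",
            [(payment_id, account, amount) for account, amount in entries],
        )
    return expected_version + 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO payments VALUES (42, 'AUTHORIZED', 1)")
conn.commit()

new_version = commit_ledger(conn, 42, 1, "CAPTURED", [("customer", 1999), ("merchant", -1999)])
try:
    # A second writer still holding version 1 loses and writes nothing.
    commit_ledger(conn, 42, 1, "VOIDED", [("customer", -1999), ("merchant", 1999)])
except StaleVersion:
    pass
print(new_version, conn.execute("SELECT status, version FROM payments").fetchone(),
      conn.execute("SELECT COUNT(*) FROM ledger_entries").fetchone()[0])
# 2 ('CAPTURED', 2) 2
```

The losing writer gets `StaleVersion` rather than blocking on a row lock, which fits short-lived Lambdas: the saga step can re-read the row and decide whether its transition still applies.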
NotifyMerchant hands off to SQS, from which events fan out to SNS, merchant webhooks and EventBridge. Firehose lands ledger_entries in S3 as Parquet. A Lambda triggered by EventBridge Scheduler runs Athena queries that join them against the gateway feed hourly, and again at T+7, T+14 and T+30 days, publishing any divergence to SNS. CloudWatch EMF metrics and X-Ray traces cover the whole path.
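The reconciliation join can be sketched in memory, assuming both sides have already been reduced to `(payment_id, amount)` rows; several rows for one payment (for example partial captures) are summed. `reconcile`, `Divergence` and the three divergence kinds are hypothetical names; the real job would express the same full outer join as an Athena query over the Parquet files. Presumably the T+7/14/30d reruns exist so that late gateway records can appear, so a fresh `missing_at_gateway` might reasonably be treated as provisional (an assumption, not stated in the slides).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Divergence:
    payment_id: int
    kind: str  # "missing_at_gateway", "missing_in_ledger" or "amount_mismatch"
    ledger_amount: Optional[int]
    gateway_amount: Optional[int]


def _totals(rows: Iterable[tuple[int, int]]) -> dict[int, int]:
    # Sum several rows per payment (e.g. partial captures) into one net amount.
    totals: dict[int, int] = defaultdict(int)
    for payment_id, amount in rows:
        totals[payment_id] += amount
    return totals


def reconcile(ledger_rows: Iterable[tuple[int, int]],
              gateway_rows: Iterable[tuple[int, int]]) -> list[Divergence]:
    ledger, gateway = _totals(ledger_rows), _totals(gateway_rows)
    divergences = []
    # Full outer join on payment_id, the in-memory analogue of the Athena query.
    for pid in sorted(ledger.keys() | gateway.keys()):
        l_amt, g_amt = ledger.get(pid), gateway.get(pid)
        if g_amt is None:
            divergences.append(Divergence(pid, "missing_at_gateway", l_amt, None))
        elif l_amt is None:
            divergences.append(Divergence(pid, "missing_in_ledger", None, g_amt))
        elif l_amt != g_amt:
            divergences.append(Divergence(pid, "amount_mismatch", l_amt, g_amt))
    return divergences


ledger_rows = [(1, 1999), (2, 500), (3, 700)]
gateway_rows = [(1, 1000), (1, 999), (2, 450), (4, 300)]
for d in reconcile(ledger_rows, gateway_rows):
    print(d)  # flags payments 2, 3 and 4; payment 1 matches once its two rows are summed
```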