Architecture

Real-time payment processing with distributed sagas

One checkout click, assembled one layer at a time: idempotency gate, ID, saga, gateway fence, ledger, reconciliation.

01

The idempotency gate is the first thing on the path

flowchart LR
    Client([Merchant Checkout]) -->|POST with Idempotency-Key| AG[API Gateway plus WAF]
    AG --> H[Payment Handler Lambda]
    H -->|conditional put attribute_not_exists| IDM[(DynamoDB Idempotency)]

POST /v1/payments carries an Idempotency-Key header. The Lambda handler does a DynamoDB conditional put on sha256(merchant_id + key): the first writer wins the slot and proceeds; every retry gets the cached result. No two copies of one logical payment ever reach the gateway.
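A minimal sketch of the gate's request, under stated assumptions: the table name, the `pk`, `status` and `ttl` attribute names, and the 24-hour retention are all hypothetical, since the source names none of them. The separator between merchant ID and key is also an assumption, added so that two different (merchant, key) pairs cannot concatenate to the same string. The function only builds the arguments for a DynamoDB `PutItem` call.

```python
import hashlib
import time


def idempotency_pk(merchant_id: str, idempotency_key: str) -> str:
    """Hash merchant ID and key into one fixed-length partition key."""
    # The "\x1f" separator is an assumption: without it, ("ab", "c") and
    # ("a", "bc") would hash to the same value.
    raw = f"{merchant_id}\x1f{idempotency_key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def build_claim_request(merchant_id, idempotency_key,
                        table="payment-idempotency",  # hypothetical name
                        ttl_seconds=86_400, now=None):
    """Build PutItem arguments that succeed only for the first writer."""
    now = int(time.time()) if now is None else now
    return {
        "TableName": table,
        "Item": {
            "pk": {"S": idempotency_pk(merchant_id, idempotency_key)},
            "status": {"S": "IN_PROGRESS"},
            "ttl": {"N": str(now + ttl_seconds)},
        },
        # Rejects the write if an item with this key already exists.
        "ConditionExpression": "attribute_not_exists(pk)",
    }
```

With boto3 the dictionary would be passed as `client.put_item(**request)`; a `ConditionalCheckFailedException` marks the call as a retry, at which point the handler would read the stored result instead of proceeding.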

02

A k-sortable payment ID, no coordinator

flowchart LR
    Client([Merchant Checkout]) -->|POST with Idempotency-Key| AG[API Gateway plus WAF]
    AG --> H[Payment Handler Lambda]
    H -->|conditional put| IDM[(DynamoDB Idempotency)]
    H -->|claim worker slot| WID[(DynamoDB Worker Slots)]
    H --> SID[Snowflake ID Generator]

Owning the slot, the handler mints a 63-bit Snowflake-variant payment ID: 41-bit ms timestamp, 10-bit worker ID claimed at cold-start from DynamoDB, 12-bit sequence. Time-ordered for ledger range scans; ULIDs label audit events.
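The bit layout above can be sketched as follows. This is an illustration rather than the system's actual generator: the custom epoch and the injectable clock are assumptions, and the worker ID is taken as a plain argument instead of being claimed from DynamoDB.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (assumption)
WORKER_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """41-bit ms timestamp | 10-bit worker ID | 12-bit sequence = 63 bits."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted in this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self._seq)


def decode(payment_id: int):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    seq = payment_id & MAX_SEQ
    worker = (payment_id >> SEQ_BITS) & ((1 << WORKER_BITS) - 1)
    ts = (payment_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS
    return ts, worker, seq
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts them by creation time to millisecond resolution, which is what makes range scans over the ledger cheap.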

03

The saga, orchestrated — not two-phase commit

flowchart TD
    Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
    AG --> H[Payment Handler Lambda]
    H -->|conditional put| IDM[(DynamoDB Idempotency)]
    H --> SID[Snowflake ID Generator]
    H -->|StartSyncExecution| SF[Step Functions Express]
    subgraph SAGA[Saga steps - each idempotent plus compensation]
        S1[1 ValidatePayment] --> S2[2 ReserveBalance]
        S2 --> S3[3 AuthorizeGateway]
        S3 --> S4[4 CaptureOrVoid]
        S4 --> S5[5 CommitLedger]
        S5 --> S6[6 NotifyMerchant]
    end
    SF --> SAGA

The handler starts a Step Functions Express Workflow synchronously (5-minute ceiling). It runs six idempotent steps, and each forward step has a compensating step wired through a Catch. A failure mid-flight unwinds via compensation instead of stranding funds in limbo, which is the risk when a two-phase-commit coordinator crashes.
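The unwind behaviour can be illustrated in a few lines of plain Python. This is an in-process sketch of the pattern only, not the Step Functions state machine itself; `run_saga` and the shape of a step tuple are hypothetical names introduced for the example.

```python
from typing import Callable, List, Tuple

# (step name, forward action, compensating action)
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], ctx: dict) -> bool:
    """Run forward steps in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, forward, _ = step
        try:
            forward(ctx)
        except Exception as exc:
            ctx["failed_step"] = name
            ctx["error"] = str(exc)
            # Compensate only the steps that actually finished, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                ctx.setdefault("compensated", []).append(done_name)
            return False
        completed.append(step)
    return True
```

If AuthorizeGateway raises, the sketch compensates ReserveBalance and then ValidatePayment. Since a compensation may itself be retried, each one would need to be idempotent, as the steps in the text are.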

04

Fence the gateway call, then leave AWS for one hop

flowchart TD
    Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
    AG --> H[Payment Handler Lambda]
    H -->|conditional put| IDM[(DynamoDB Idempotency)]
    H --> SID[Snowflake ID Generator]
    H -->|StartSyncExecution| SF[Step Functions Express]
    subgraph SAGA[Saga steps]
        S1[1 ValidatePayment] --> S2[2 ReserveBalance]
        S2 --> S3[3 AuthorizeGateway]
        S3 --> S4[4 CaptureOrVoid]
        S4 --> S5[5 CommitLedger]
        S5 --> S6[6 NotifyMerchant]
    end
    SF --> SAGA
    S1 -.-> FD[Fraud Detector]
    S3 -->|SET NX lock 5s| RL[(ElastiCache Redis Lock)]
    S3 -->|idempotency-key eq payment_id| GW([Stripe or Adyen Gateway])

Step 3 is the one call we cannot blindly retry. A Redis SET NX lock with a 5 s TTL lets only one Lambda at a time call the external gateway for a given payment. Because the lock can expire while a slow call is still in flight, the internal payment ID is also passed as the gateway idempotency-key as a second net. Fraud Detector scores the transaction in step 1.
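A sketch of the fence, assuming a client that behaves like redis-py's `redis.Redis` (its `set(..., nx=True, ex=...)` and `eval` methods). The `authorize_once` function, the key prefix, and the `gateway_call` callable are hypothetical names for illustration; the compare-and-delete release script is a common pattern, not something the source specifies.

```python
import uuid

# Delete the lock only if this worker still owns it (compare-and-delete).
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


def authorize_once(redis_client, gateway_call, payment_id: str,
                   ttl_seconds: int = 5):
    """Call the gateway only if this worker wins the lock for payment_id.

    redis_client is assumed to behave like redis.Redis; gateway_call is a
    hypothetical callable accepting idempotency_key=... Returns the gateway
    response, or None if another worker currently holds the lock.
    """
    lock_key = f"lock:authorize:{payment_id}"  # hypothetical key format
    token = uuid.uuid4().hex
    # SET key token NX EX ttl: succeeds only if the key does not exist yet.
    if not redis_client.set(lock_key, token, nx=True, ex=ttl_seconds):
        return None
    try:
        # Second net: the gateway deduplicates on the payment ID.
        return gateway_call(idempotency_key=payment_id)
    finally:
        redis_client.eval(RELEASE_SCRIPT, 1, lock_key, token)
```

The random token keeps a worker whose lock has already expired from deleting a lock that a later worker acquired.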

05

The ledger needs multi-row ACID — Aurora, not Dynamo

flowchart TD
    Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
    AG --> H[Payment Handler Lambda]
    H -->|conditional put| IDM[(DynamoDB Idempotency)]
    H --> SID[Snowflake ID Generator]
    H -->|StartSyncExecution| SF[Step Functions Express]
    subgraph SAGA[Saga steps]
        S2[2 ReserveBalance] --> S3[3 AuthorizeGateway]
        S3 --> S4[4 CaptureOrVoid]
        S4 --> S5[5 CommitLedger]
        S5 --> S6[6 NotifyMerchant]
    end
    SF --> SAGA
    S3 -->|SET NX lock| RL[(ElastiCache Redis Lock)]
    S3 --> GW([Stripe or Adyen Gateway])
    S2 --> RP[RDS Proxy]
    S5 --> RP
    RP -->|optimistic lock on version| AUR[(Aurora PostgreSQL Ledger)]

Steps 2, 4 and 5 write the ledger: payments status and ledger_entries must move in one transaction, with an optimistic-lock version guard. Aurora PostgreSQL Global Database holds it; RDS Proxy pools connections so 10k TPS of Lambdas do not open 10k raw connections.
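The single-transaction write with a version guard might look like the sketch below. It uses the standard-library `sqlite3` module purely as a runnable stand-in for Aurora PostgreSQL; the column names, `create_schema`, `commit_ledger`, and `StaleVersionError` are assumptions made for the example.

```python
import sqlite3


class StaleVersionError(Exception):
    """Raised when another writer changed the payment row first."""


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; the real tables are not given in the text.
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, "
                 "status TEXT NOT NULL, version INTEGER NOT NULL)")
    conn.execute("CREATE TABLE ledger_entries (payment_id INTEGER NOT NULL, "
                 "account TEXT NOT NULL, amount_minor INTEGER NOT NULL)")
    conn.commit()


def commit_ledger(conn, payment_id, expected_version, new_status, entries):
    """Update payment status and insert ledger rows in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE payments SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_status, payment_id, expected_version),
        )
        if cur.rowcount != 1:
            # Version moved on (or the row is missing): abort, write nothing.
            raise StaleVersionError(payment_id)
        conn.executemany(
            "INSERT INTO ledger_entries (payment_id, account, amount_minor) "
            "VALUES (?, ?, ?)",
            [(payment_id, account, amount) for account, amount in entries],
        )
```

A writer holding a stale version gets an exception and leaves both tables untouched, so the status and its ledger rows can never disagree.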

06

Reconciliation and observability close the loop

flowchart TD
    Client([Merchant Checkout]) --> AG[API Gateway plus WAF]
    AG --> H[Payment Handler Lambda]
    H -->|conditional put| IDM[(DynamoDB Idempotency)]
    H -->|StartSyncExecution| SF[Step Functions Express]
    subgraph SAGA[Saga steps]
        S3[3 AuthorizeGateway] --> S4[4 CaptureOrVoid]
        S4 --> S5[5 CommitLedger]
        S5 --> S6[6 NotifyMerchant]
    end
    SF --> SAGA
    S3 -->|SET NX lock| RL[(ElastiCache Redis Lock)]
    S3 --> GW([Stripe or Adyen Gateway])
    S5 --> RP[RDS Proxy] --> AUR[(Aurora PostgreSQL Ledger)]
    S6 --> SQS[SQS] --> SNS[SNS Fan-out]
    SNS --> WH([Merchant Webhook])
    SNS --> EB[EventBridge Bus]
    AUR -->|Firehose Parquet| S3L[(S3 Ledger Snapshots)]
    SCH[EventBridge Scheduler] --> REC[Reconciliation Lambda]
    REC -->|Athena join| S3L
    REC -.->|pull feed| GW
    REC -->|mismatch| ALARM[SNS Ops Alert]

NotifyMerchant publishes through SQS to an SNS fan-out, which delivers to the merchant webhook and an EventBridge bus. Firehose lands ledger_entries in S3 as Parquet; a Lambda triggered by EventBridge Scheduler joins them against the gateway feed via Athena hourly and again at T+7, T+14 and T+30 days, flagging divergence to an SNS ops alert. EMF metrics and X-Ray trace the whole path.
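The comparison the reconciliation job performs can be sketched in plain Python. This is an illustration of the logic only: the real join runs in Athena over Parquet, and the `reconcile` function, its inputs (amounts in minor units keyed by payment ID), and the mismatch labels are assumptions.

```python
from typing import Dict, List, Tuple


def reconcile(ledger: Dict[str, int],
              gateway: Dict[str, int]) -> List[Tuple[str, str]]:
    """Compare captured amounts per payment ID from both sides.

    Behaves like a full outer join on payment ID: every ID present on
    either side is checked, and each divergence is reported with a reason.
    """
    mismatches = []
    for payment_id in sorted(ledger.keys() | gateway.keys()):
        if payment_id not in gateway:
            mismatches.append((payment_id, "missing_in_gateway"))
        elif payment_id not in ledger:
            mismatches.append((payment_id, "missing_in_ledger"))
        elif ledger[payment_id] != gateway[payment_id]:
            mismatches.append((payment_id, "amount_mismatch"))
    return mismatches
```

An empty list means the two sides agree for the window; any entry would be what triggers the ops alert.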