Charge a card exactly once through timeouts, retries, concurrent duplicates, and partial sagas at 50,000 payments/second across two regions - idempotency keys, an in-process Fargate saga checkpointed with DynamoDB conditional writes, and a DynamoDB-fronted Aurora key store, built entirely on AWS.
01
Entry layer and idempotency check
flowchart LR
Client([Client with Idempotency-Key]) --> ALB[ALB]
ALB --> Gateway[ECS Fargate gateway\nidempotency check]
The client sends each charge with an Idempotency-Key header holding a version-4 UUID, and reuses that same key on every retry of one logical payment. An ALB routes requests to a tier of ECS Fargate tasks rather than Lambda behind API Gateway: at this load Lambda would need on the order of 10,000 concurrent executions, far above default account concurrency quotas. The gateway makes the idempotency decision before any mutating step runs.
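The key-reuse rule is the whole client-side contract, so a minimal Python sketch may help. It assumes a hypothetical `send(headers, payload)` transport and a hypothetical `TransientError` for outcomes the client cannot observe (timeouts, dropped connections); `new_idempotency_key` and `charge_with_retries` are illustrative names, not part of any SDK.

```python
import uuid


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or dropped connection (outcome unknown)."""


def new_idempotency_key():
    """Mint a version-4 UUID once per logical payment, never once per attempt."""
    return str(uuid.uuid4())


def charge_with_retries(send, payload, key, max_attempts=4):
    """Send one logical payment, presenting the same Idempotency-Key every time.

    `send(headers, payload)` is a hypothetical transport callable that raises
    TransientError when the client cannot tell whether the charge happened.
    """
    headers = {"Idempotency-Key": key}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers, payload)
        except TransientError:
            if attempt == max_attempts:
                raise  # any later retry must reuse `key` as well
            # Backoff between attempts is omitted here for brevity.


if __name__ == "__main__":
    seen_keys = []

    def flaky_send(headers, payload):
        seen_keys.append(headers["Idempotency-Key"])
        if len(seen_keys) < 3:
            raise TransientError("timeout")
        return {"status": "SETTLED"}

    print(charge_with_retries(flaky_send, {"amount_cents": 1000}, new_idempotency_key()))
    print(len(seen_keys), len(set(seen_keys)))  # three attempts, one key
```

Backoff between attempts is left out to keep the focus on the key; the same full-jitter pattern that guards the PSP call would apply here.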
02
Idempotency key store - cache plus source of truth
flowchart LR
Client([Client with Idempotency-Key]) --> ALB[ALB]
ALB --> Gateway[ECS Fargate gateway\nidempotency check]
Gateway --> Dynamo[(DynamoDB\ncache and checkpoint)]
Gateway -->|cache miss| Proxy[RDS Proxy]
Proxy --> Aurora[(Aurora PostgreSQL\nsharded key store and lock)]
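The gateway first reads the key from DynamoDB (partition key tenant_id, sort key uuid, the Idempotency-Key value). DynamoDB TTL deletes expired items asynchronously, so an expired item can still be returned; the gateway therefore filters on expires_at at read time and treats expired items as misses. A hit on a terminal status replays the cached response. A miss falls through to sharded Aurora PostgreSQL via RDS Proxy, where a row-level lock with a PROCESSING lease serialises concurrent duplicates so that only one of them executes.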
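The read path above can be sketched in Python as an in-memory simulation. `CachedKey` stands in for a DynamoDB item, `KeyStore` for the Aurora table (a `threading.Lock` plays the role of the row lock), and the lease length, the `claim`/`complete`/`lookup` names, and the `in_progress` outcome for losing duplicates are all assumptions; the source does not say what a losing duplicate receives.

```python
import threading
import time
from dataclasses import dataclass

TERMINAL = {"SETTLED", "VOIDED"}  # terminal statuses named in this design


@dataclass
class CachedKey:
    """One DynamoDB cache item (hypothetical shape)."""
    status: str
    response: dict
    expires_at: float  # epoch seconds; the TTL attribute


class KeyStore:
    """In-memory stand-in for the Aurora key table; a mutex models the row lock."""

    def __init__(self, lease_seconds=30.0):
        self._rows = {}
        self._mutex = threading.Lock()
        self.lease_seconds = lease_seconds

    def claim(self, tenant_id, key, owner, now):
        """Return ("replay" | "execute" | "in_progress", response)."""
        with self._mutex:  # stands in for SELECT ... FOR UPDATE on the key's row
            row = self._rows.get((tenant_id, key))
            if row is not None and row["status"] in TERMINAL:
                return "replay", row["response"]
            if row is not None and row["lease_until"] > now:
                return "in_progress", None  # a live PROCESSING lease exists
            # No row yet, or an expired lease: this caller takes the lease.
            self._rows[(tenant_id, key)] = {
                "status": "PROCESSING", "owner": owner,
                "lease_until": now + self.lease_seconds, "response": None,
            }
            return "execute", None

    def complete(self, tenant_id, key, owner, status, response):
        """Record a terminal outcome, but only if `owner` still holds the lease."""
        with self._mutex:
            row = self._rows.get((tenant_id, key))
            if row is None or row["owner"] != owner:
                return False
            row.update(status=status, response=response)
            return True


def lookup(cache, store, tenant_id, key, owner, now=None):
    now = time.time() if now is None else now
    item = cache.get((tenant_id, key))
    # TTL deletion is asynchronous, so ignore items whose expires_at has passed.
    if item is not None and item.expires_at > now and item.status in TERMINAL:
        return "replay", item.response
    return store.claim(tenant_id, key, owner, now)  # cache miss: go to Aurora
```

Keeping the lease owner on the row lets `complete` refuse a write from a task whose lease was taken over, which stops a slow, presumed-dead task from overwriting the winner's outcome.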
03
In-process saga with the PSP
flowchart TD
Client([Client with Idempotency-Key]) --> ALB[ALB]
ALB --> Gateway[ECS Fargate gateway\nin-process saga]
Gateway --> Dynamo[(DynamoDB\ncache and checkpoint)]
Gateway -->|cache miss| Proxy[RDS Proxy]
Proxy --> Aurora[(Aurora PostgreSQL\nsharded key store and lock)]
Gateway -->|same idempotency key| PSP([PSP - Stripe or Braintree])
Gateway -->|on settle failure| Compensate[Compensating void]
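On first execution the Fargate task runs an in-process saga - validate, reserve funds, call the PSP, settle the ledger, emit events - persisting each step to DynamoDB with a conditional write before proceeding. A conditional PutItem (for example, conditioned on attribute_not_exists for the key) gates launch, so two duplicates cannot both start the saga. The same idempotency key is forwarded to the PSP so that retried PSP calls are deduplicated on the PSP side as well. Full-jitter exponential backoff and a circuit breaker guard the PSP call; if settling the ledger fails after the PSP has charged, the saga issues a compensating void.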
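A condensed sketch of the saga loop, again as an in-memory Python simulation: `CheckpointTable` models the conditional PutItem and UpdateItem semantics, `call_with_backoff` applies full-jitter exponential backoff (each delay drawn uniformly from zero up to `min(cap, base * 2**n)`), and only the settle-failure compensation described above is wired in. The circuit breaker and resuming from the last checkpoint after a crash are omitted, and the step names, parameter defaults, and hooks are hypothetical.

```python
import random
import time


class ConditionalCheckFailed(Exception):
    """Mirrors a DynamoDB write rejected because its condition did not hold."""


class CheckpointTable:
    """In-memory stand-in for the DynamoDB checkpoint table (hypothetical)."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, value):
        # Models PutItem with an attribute_not_exists condition on the key.
        if key in self.items:
            raise ConditionalCheckFailed(key)
        self.items[key] = value

    def advance(self, key, expected, new):
        # Models an UpdateItem conditioned on the last checkpointed step.
        if self.items.get(key) != expected:
            raise ConditionalCheckFailed(key)
        self.items[key] = new


def call_with_backoff(fn, retryable, attempts=4, base=0.05, cap=2.0,
                      sleep=time.sleep, rng=random.random):
    """Retry fn on `retryable`, sleeping with full jitter between attempts."""
    for n in range(attempts):
        try:
            return fn()
        except retryable:
            if n == attempts - 1:
                raise
            # Full jitter: a uniform random delay in [0, min(cap, base * 2**n)).
            sleep(rng() * min(cap, base * 2 ** n))


STEPS = ["validate", "reserve", "charge_psp", "settle_ledger", "emit_events"]


def run_saga(table, key, actions, void, sleep=time.sleep):
    """Run STEPS in order, checkpointing each one before starting the next.

    `actions` maps step name -> zero-argument callable (the PSP action is
    assumed to forward the idempotency key); `void` undoes the PSP charge.
    Returns the final checkpoint value.
    """
    table.put_if_absent(key, "STARTED")  # a concurrent duplicate fails here
    done = "STARTED"
    for step in STEPS:
        try:
            if step == "charge_psp":
                call_with_backoff(actions[step], TimeoutError, sleep=sleep)
            else:
                actions[step]()
        except Exception:
            if step != "settle_ledger":
                raise  # other failure paths are out of scope for this sketch
            void()  # compensate: the PSP charged but the ledger did not settle
            table.advance(key, done, "VOIDED")
            return "VOIDED"
        table.advance(key, done, step)
        done = step
    return done
```

Checkpointing only after a step succeeds, with each update conditioned on the previous checkpoint, means two tasks can never both advance the same saga.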
04
Ledger and event fan-out
flowchart TD
Client([Client with Idempotency-Key]) --> ALB[ALB]
ALB --> Gateway[ECS Fargate gateway\nin-process saga]
Gateway --> Dynamo[(DynamoDB\ncache and checkpoint)]
Gateway -->|cache miss| Proxy[RDS Proxy]
Proxy --> Aurora[(Aurora PostgreSQL\nsharded key store and lock)]
Gateway -->|same idempotency key| PSP([PSP - Stripe or Braintree])
Gateway -->|on settle failure| Compensate[Compensating void]
Gateway --> Ledger[(DynamoDB append-only ledger\nSETTLED or VOIDED)]
Gateway --> Events[EventBridge]
Events --> SQS[SQS per consumer\nplus DLQ]
SQS --> Downstream[Order, receipts, analytics]
The saga writes each state transition to an append-only DynamoDB ledger - the system of record for the terminal SETTLED and VOIDED states. On a terminal outcome it publishes payment.completed or payment.failed to EventBridge, which routes each event to one SQS queue per consumer (each with a DLQ) so consumers poll at their own rate, decoupled from the charge path. EventBridge and standard SQS queues both deliver at least once, so each consumer must deduplicate, for example on the event id.
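Two properties carry the exactly-once guarantee past the charge path: ledger entries are only ever inserted, and consumers tolerate redelivery. The Python sketch below models both in memory. Keying entries by `(payment_id, seq)`, the `Ledger` and `IdempotentConsumer` classes, and the in-memory `seen` set are assumptions; the `id` and `detail-type` fields follow the EventBridge event envelope.

```python
from dataclasses import dataclass, field


class DuplicateEntry(Exception):
    """Raised when an entry with the same (payment_id, seq) already exists."""


@dataclass
class Ledger:
    """Append-only ledger; each entry is keyed by (payment_id, seq)."""
    entries: dict = field(default_factory=dict)

    def append(self, payment_id, seq, state):
        # Models a PutItem conditioned on attribute_not_exists: entries are
        # only ever inserted, never updated or deleted.
        if (payment_id, seq) in self.entries:
            raise DuplicateEntry((payment_id, seq))
        self.entries[(payment_id, seq)] = state

    def current_state(self, payment_id):
        """The latest state is the entry with the highest sequence number."""
        seqs = [s for (p, s) in self.entries if p == payment_id]
        return self.entries[(payment_id, max(seqs))] if seqs else None


class IdempotentConsumer:
    """SQS consumer that applies each event at most once by remembering event ids."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # a real consumer would use a durable store

    def on_message(self, event):
        if event["id"] in self.seen:
            return False  # redelivery: acknowledge without reprocessing
        self.handler(event)
        self.seen.add(event["id"])  # recorded only after the handler succeeds
        return True
```

Marking an event as seen only after the handler succeeds turns a mid-handler crash into a redelivery rather than a loss; closing the remaining gap (handler succeeded, crash before recording) requires writing the dedupe record in the same transaction as the handler's effect.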
05
Full system - two regions, security, observability
flowchart TD
Client([Client with Idempotency-Key]) --> ALB[ALB]
subgraph Edge[Edge and compute]
ALB --> Gateway[ECS Fargate gateway\nin-process saga]
Gateway --> Secrets[Secrets Manager PSP creds]
Gateway --> KMS[KMS envelope encryption]
end
subgraph KeyStore[Idempotency key store]
Gateway --> Dynamo[(DynamoDB cache\nGlobal Tables)]
Gateway -->|cache miss| Proxy[RDS Proxy]
Proxy --> Aurora[(Aurora PostgreSQL\nGlobal Database sharded)]
end
subgraph Saga[In-process saga]
Gateway -->|same idempotency key| PSP([PSP - Stripe or Braintree])
Gateway -->|on settle failure| Compensate[Compensating void]
Sweeper[EventBridge Scheduler sweeper] --> Aurora
end
subgraph DataAndEvents[Ledger and delivery]
Gateway --> Ledger[(DynamoDB append-only ledger)]
Gateway --> Events[EventBridge]
Events --> SQS[SQS per consumer plus DLQ]
Ledger --> Stream[(DynamoDB Streams)]
Stream --> Audit[(S3 Object Lock WORM audit)]
end
Gateway --> Trace[X-Ray and CloudWatch]
The complete picture adds security, cross-region resilience, recovery, audit, and observability. Card tokens use KMS envelope encryption, PSP credentials live in Secrets Manager, every hop runs over TLS, and incoming webhooks are HMAC-verified. Aurora Global Database (RTO under 30s, RPO ~1s) and DynamoDB Global Tables provide cross-region failover. An EventBridge Scheduler sweeper reclaims PROCESSING leases orphaned by tasks that died mid-saga. The ledger is streamed through DynamoDB Streams to S3 Object Lock for a tamper-evident WORM audit trail, and X-Ray plus CloudWatch trace the saga end to end.
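Two of these pieces are small enough to sketch in Python: constant-time HMAC verification of a webhook body and the sweeper's lease-release pass. Real PSPs define their own signed string and header format (Stripe, for example, signs a timestamp together with the payload), so the plain body-only HMAC-SHA256 here is an assumption, as are the row shape and the choice to release an orphaned lease rather than resume its saga.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body; compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
    return hmac.compare_digest(expected, signature_hex.encode())


def sweep_expired_leases(rows, now):
    """Release PROCESSING leases whose expiry has passed (their task likely died).

    `rows` is a hypothetical dict of key -> {"status", "owner", "lease_until"};
    a real sweeper would run the equivalent UPDATE against Aurora on a schedule.
    """
    released = []
    for key, row in rows.items():
        if row["status"] == "PROCESSING" and row["lease_until"] <= now:
            row["owner"] = None  # the next claim for this key may take over
            released.append(key)
    return released
```

Verifying against the raw bytes matters: re-serialising parsed JSON can change whitespace or key order and break the signature.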