Architecture

Idempotent payment gateway

Charge a card exactly once despite timeouts, retries, concurrent duplicates, and partially completed sagas, at 50,000 payments/second across two regions. The design combines idempotency keys, an in-process Fargate saga checkpointed with DynamoDB conditional writes, and an Aurora key store fronted by a DynamoDB cache, built entirely on AWS.

01

Entry layer and idempotency check

flowchart LR
    Client([Client with Idempotency-Key]) --> ALB[ALB]
    ALB --> Gateway[ECS Fargate gateway\nidempotency check]

The client sends a charge request with a UUID v4 Idempotency-Key header and reuses the same key for every retry of one logical payment. An ALB routes to a tier of ECS Fargate tasks, which make the idempotency decision before any mutating step runs. The design rules out Lambda plus API Gateway, which would have to hold 10,000 or more concurrent executions at this load.
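The client-side half of this contract can be sketched as follows. This is a minimal illustration, not the gateway's actual client: `charge_with_retries` and the `send` transport callable are hypothetical names, and the sketch treats only `TimeoutError` as a retryable, outcome-unknown failure.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: takes headers and a body, returns (status, response).
Send = Callable[[Dict[str, str], dict], Tuple[int, dict]]


def charge_with_retries(send: Send, body: dict, max_attempts: int = 3) -> Tuple[int, dict]:
    """Send one logical payment, reusing a single key across all attempts."""
    # Generated once per logical payment, never once per HTTP attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except TimeoutError as exc:
            # Outcome unknown: retry with the SAME key so the gateway can dedupe.
            last_error = exc
    raise RuntimeError("payment outcome unknown after retries") from last_error
```

Generating the key outside the retry loop is the whole point: a fresh key per attempt would make every retry look like a new payment.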

02

Idempotency key store - cache plus source of truth

flowchart LR
    Client([Client with Idempotency-Key]) --> ALB[ALB]
    ALB --> Gateway[ECS Fargate gateway\nidempotency check]
    Gateway --> Dynamo[(DynamoDB\ncache and checkpoint)]
    Gateway -->|cache miss| Proxy[RDS Proxy]
    Proxy --> Aurora[(Aurora PostgreSQL\nsharded key store and lock)]

The gateway reads the key from DynamoDB (partition key tenant_id, sort key uuid). It filters on expires_at at read time because DynamoDB TTL deletes expired items asynchronously, so an expired item can still be returned by a read. A hit on a terminal status replays the cached response. A miss falls through to sharded Aurora PostgreSQL via RDS Proxy, where a row-level lock with a PROCESSING lease serialises concurrent duplicates.
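The read-path decision above can be sketched with an in-memory stand-in for the DynamoDB table. The names `KeyRecord`, `check_cache`, and the `FAILED` status are assumptions made for illustration. Sending a non-terminal hit to Aurora is also an assumption, since the text only specifies the terminal-hit and miss cases.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# SETTLED and VOIDED come from the ledger description; FAILED is assumed.
TERMINAL = {"SETTLED", "VOIDED", "FAILED"}


@dataclass
class KeyRecord:
    status: str
    response: dict
    expires_at: float  # epoch seconds


# Stand-in for the DynamoDB table, keyed by (tenant_id, uuid).
Cache = Dict[Tuple[str, str], KeyRecord]


def check_cache(
    cache: Cache, tenant_id: str, key: str, now: Optional[float] = None
) -> Tuple[str, Optional[dict]]:
    """Return ("REPLAY", response) or ("FALL_THROUGH", None)."""
    now = time.time() if now is None else now
    record = cache.get((tenant_id, key))
    # Expired items may still be physically present, so filter at read time.
    if record is None or record.expires_at <= now:
        return "FALL_THROUGH", None
    if record.status in TERMINAL:
        return "REPLAY", record.response
    # Non-terminal hit: let the Aurora lock and lease decide (assumption).
    return "FALL_THROUGH", None
```

Only a live, terminal record short-circuits the request. Everything else reaches the Aurora lock, which stays the single point of serialisation.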

03

In-process saga with the PSP

flowchart TD
    Client([Client with Idempotency-Key]) --> ALB[ALB]
    ALB --> Gateway[ECS Fargate gateway\nin-process saga]
    Gateway --> Dynamo[(DynamoDB\ncache and checkpoint)]
    Gateway -->|cache miss| Proxy[RDS Proxy]
    Proxy --> Aurora[(Aurora PostgreSQL\nsharded key store and lock)]
    Gateway -->|same idempotency key| PSP([PSP - Stripe or Braintree])
    Gateway -->|on settle failure| Compensate[Compensating void]

On first execution the Fargate task runs an in-process saga (validate, reserve funds, call the PSP, settle the ledger, emit events), persisting each step to DynamoDB with a conditional write before proceeding to the next. A conditional PutItem gates the launch, so two duplicates cannot both start the saga. The same idempotency key is passed to the PSP, so the PSP can deduplicate retried calls as well. Full-jitter backoff and a circuit breaker guard the PSP call; a failed settle triggers a compensating void.
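The saga's control flow can be sketched against an in-memory stand-in for the checkpoint table. Everything named here (`CheckpointStore`, `run_saga`, the step names, the `START` and `END` markers) is hypothetical. `put_if_absent` imitates a PutItem with an `attribute_not_exists` condition, and the circuit breaker is left out for brevity. Re-raising on any non-settle failure, which leaves the record in PROCESSING, is an assumption.

```python
import random
import time
from typing import Dict, Tuple


class ConditionFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class CheckpointStore:
    """In-memory stand-in for a table written with attribute_not_exists conditions."""

    def __init__(self) -> None:
        self.items: Dict[Tuple[str, str], str] = {}

    def put_if_absent(self, key: str, step: str, value: str) -> None:
        if (key, step) in self.items:
            raise ConditionFailed(f"{key}/{step} already written")
        self.items[(key, step)] = value


def full_jitter(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(fn, key, attempts=3, sleep=time.sleep):
    """Retry a timed-out call; safe only because the PSP dedupes on the same key."""
    for attempt in range(attempts):
        try:
            return fn(key)
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            sleep(full_jitter(attempt))


STEPS = ["validate", "reserve", "psp_charge", "settle", "emit"]


def run_saga(store, key, actions, void, sleep=time.sleep) -> str:
    # Launch gate: only the first writer of START may run the saga.
    try:
        store.put_if_absent(key, "START", "PROCESSING")
    except ConditionFailed:
        return "DUPLICATE"
    for step in STEPS:
        try:
            if step == "psp_charge":
                call_with_backoff(actions[step], key, sleep=sleep)
            else:
                actions[step](key)
        except Exception:
            if step != "settle":
                raise  # stays PROCESSING; checkpoints allow a later resume
            void(key)  # compensating void after a failed settle
            store.put_if_absent(key, "END", "VOIDED")
            return "VOIDED"
        # Checkpoint the completed step before moving to the next one.
        store.put_if_absent(key, step, "DONE")
    store.put_if_absent(key, "END", "SETTLED")
    return "SETTLED"
```

Because every checkpoint is conditional, a second task that wakes up with the same key fails its first write and returns before touching the PSP.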

04

Ledger and event fan-out

flowchart TD
    Client([Client with Idempotency-Key]) --> ALB[ALB]
    ALB --> Gateway[ECS Fargate gateway\nin-process saga]
    Gateway --> Dynamo[(DynamoDB\ncache and checkpoint)]
    Gateway -->|cache miss| Proxy[RDS Proxy]
    Proxy --> Aurora[(Aurora PostgreSQL\nsharded key store and lock)]
    Gateway -->|same idempotency key| PSP([PSP - Stripe or Braintree])
    Gateway -->|on settle failure| Compensate[Compensating void]
    Gateway --> Ledger[(DynamoDB append-only ledger\nSETTLED or VOIDED)]
    Gateway --> Events[EventBridge]
    Events --> SQS[SQS per consumer\nplus DLQ]
    SQS --> Downstream[Order, receipts, analytics]

The saga writes each state transition to an append-only DynamoDB ledger - the system of record for SETTLED and VOIDED. On a terminal outcome it publishes payment.completed or payment.failed to EventBridge, which routes to one SQS queue per consumer (each with a DLQ) so consumers poll at their own rate, decoupled from the charge path.
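The ledger-then-fan-out shape can be sketched with in-memory stand-ins. `Ledger`, `Bus`, `record_transition`, and `poll` are hypothetical names. The mapping from VOIDED to payment.failed is an assumption. The `max_receives` parameter imitates an SQS redrive policy's receive limit.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Assumed mapping from terminal ledger states to event names.
TERMINAL_EVENTS = {"SETTLED": "payment.completed", "VOIDED": "payment.failed"}


class Ledger:
    """Append-only: a (payment_id, seq) pair can be written once, never updated."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, int, str]] = []
        self._seen = set()

    def append(self, payment_id: str, seq: int, state: str) -> None:
        if (payment_id, seq) in self._seen:
            raise ValueError("transition already recorded")
        self._seen.add((payment_id, seq))
        self.entries.append((payment_id, seq, state))


class Bus:
    """Stand-in for EventBridge rules targeting one SQS queue per consumer."""

    def __init__(self, consumers: List[str]) -> None:
        self.queues: Dict[str, Deque[dict]] = {name: deque() for name in consumers}

    def publish(self, detail_type: str, detail: dict) -> None:
        for queue in self.queues.values():
            # Each consumer gets its own copy of the envelope.
            queue.append({"detail-type": detail_type, "detail": detail})


def record_transition(ledger: Ledger, bus: Bus, payment_id: str, seq: int, state: str) -> None:
    # Ledger first: the event is published only after the record exists.
    ledger.append(payment_id, seq, state)
    event = TERMINAL_EVENTS.get(state)
    if event is not None:
        bus.publish(event, {"payment_id": payment_id, "state": state})


def poll(queue: Deque[dict], dlq: Deque[dict], handler, max_receives: int = 3) -> None:
    """Drain one consumer's queue; park repeatedly failing messages in the DLQ."""
    while queue:
        msg = queue.popleft()
        try:
            handler(msg)
        except Exception:
            msg["receives"] = msg.get("receives", 0) + 1
            (dlq if msg["receives"] >= max_receives else queue).append(msg)
```

A slow or failing consumer only affects its own queue and DLQ, which is what keeps downstream problems off the charge path.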

05

Full system - two regions, security, observability

flowchart TD
    Client([Client with Idempotency-Key]) --> ALB[ALB]
    subgraph Edge[Edge and compute]
        ALB --> Gateway[ECS Fargate gateway\nin-process saga]
        Gateway --> Secrets[Secrets Manager PSP creds]
        Gateway --> KMS[KMS envelope encryption]
    end
    subgraph KeyStore[Idempotency key store]
        Gateway --> Dynamo[(DynamoDB cache\nGlobal Tables)]
        Gateway -->|cache miss| Proxy[RDS Proxy]
        Proxy --> Aurora[(Aurora PostgreSQL\nGlobal Database sharded)]
    end
    subgraph Saga[In-process saga]
        Gateway -->|same idempotency key| PSP([PSP - Stripe or Braintree])
        Gateway -->|on settle failure| Compensate[Compensating void]
        Sweeper[EventBridge Scheduler sweeper] --> Aurora
    end
    subgraph DataAndEvents[Ledger and delivery]
        Gateway --> Ledger[(DynamoDB append-only ledger)]
        Gateway --> Events[EventBridge]
        Events --> SQS[SQS per consumer plus DLQ]
        Ledger --> Stream[(DynamoDB Streams)]
        Stream --> Audit[(S3 Object Lock WORM audit)]
    end
    Gateway --> Trace[X-Ray and CloudWatch]

The complete picture adds security, cross-region failover, and observability. Tokens are envelope-encrypted with KMS, PSP credentials live in Secrets Manager, every hop uses TLS, and webhooks are HMAC-verified. Aurora Global Database (RPO of about 1s, RTO typically under a minute) and DynamoDB Global Tables provide cross-region failover. An EventBridge Scheduler sweeper reclaims orphaned PROCESSING leases, the ledger is streamed to S3 Object Lock for tamper-evident audit, and X-Ray plus CloudWatch trace the saga end to end.
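HMAC verification of an incoming webhook can be sketched with the standard library alone. The scheme shown (hex-encoded HMAC-SHA256 over the raw request body) is a generic assumption. Each PSP defines its own header format and signed payload, so the function names and the signature layout here are illustrative only.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, not a specific PSP's)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(secret, payload).encode("utf-8")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_hex.encode("utf-8"))
```

Verification should run over the exact bytes received, before any JSON parsing, because re-serialising the body can change the bytes and invalidate the signature.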