Every second, payment systems across the globe process over 50,000 transactions, moving trillions of dollars annually with a target uptime of 99.999% and an acceptable error rate approaching zero. A single miscalculated cent across millions of transactions translates to millions in discrepancies. A single security breach can cost companies billions in fines, lawsuits, and lost trust.
Payment system design is not merely a software engineering challenge—it's a discipline where correctness is non-negotiable, where eventual consistency is often unacceptable, and where regulatory compliance isn't optional but legally mandated. This module will equip you with the knowledge to design payment infrastructure that handles the most demanding financial workloads while maintaining absolute reliability and security.
By the end of this page, you will understand the comprehensive requirements for a production-grade payment system—functional capabilities, non-functional constraints, regulatory obligations, and the unique challenges that distinguish payment systems from typical distributed systems. You'll learn to think like a payments engineer, where every decision carries financial and legal consequences.
Before diving into requirements, we must understand the fundamental concepts and stakeholders in the payment ecosystem. Payment processing is not a single operation—it's a complex orchestration involving multiple parties, each with distinct responsibilities, incentives, and failure modes.
The Payment Ecosystem:
A typical payment transaction involves a minimum of five distinct parties, each adding latency, potential failure points, and processing fees:
| Stakeholder | Role | Concerns | Example |
|---|---|---|---|
| Customer (Cardholder) | Initiates payment using payment instrument | Security, convenience, transaction speed, dispute resolution | End user with credit card |
| Merchant | Sells goods/services and accepts payment | Transaction fees, settlement speed, chargeback protection | Your e-commerce platform |
| Payment Gateway | Encrypts and routes transaction data | API reliability, security, multi-processor support | Stripe, Braintree, Adyen |
| Payment Processor | Handles transaction authorization and settlement | Transaction throughput, network connectivity, fraud prevention | First Data, Worldpay, Chase Paymentech |
| Card Network | Routes transactions between processors and issuers | Network availability, interchange fee collection, brand standards | Visa, Mastercard, American Express |
| Issuing Bank | Issues payment cards and approves/declines transactions | Credit risk, fraud detection, customer service | Chase, Bank of America, Capital One |
| Acquiring Bank | Merchant's bank that receives funds | Risk management, merchant underwriting, settlement | Bank that holds merchant account |
Payment Transaction Lifecycle:
Understanding the full lifecycle of a payment is crucial for requirement definition. A single "payment" actually consists of multiple discrete operations:
Each phase has distinct timing, failure modes, and recovery procedures. A robust payment system must handle all phases correctly.
When a customer sees "Payment Successful," money hasn't actually transferred yet. Authorization only reserves funds. Settlement (actual fund transfer) typically occurs 1-2 business days later. Your system must accurately represent this reality and handle the interim states correctly.
Functional requirements define what the payment system must do. We'll categorize these into core payment operations, payment method support, merchant capabilities, and customer-facing features.
In a system design interview, you cannot cover all functional requirements in 45 minutes. Clarify with your interviewer which subset to focus on. A typical scope might be: 'Let's design the core payment processing flow—authorization, capture, and refund—for card payments, with stored credentials for repeat customers.' This gives you a focused problem while demonstrating awareness of the broader scope.
Non-functional requirements define how well the system must perform. For payment systems, these requirements are exceptionally stringent—financial systems cannot afford the relaxed consistency or "good enough" availability acceptable in other domains.
| Requirement | Target | Justification | Consequence of Failure |
|---|---|---|---|
| Availability | 99.99% - 99.999% (4-5 nines) | Payment failures directly translate to lost revenue | Lost sales, customer churn, SLA penalties |
| Latency (P99) | < 500ms end-to-end | Customer abandonment increases sharply with latency | Cart abandonment, poor user experience |
| Throughput | 10,000 - 1,000,000+ TPS | Handle peak loads during sales events, Black Friday | System collapse during high-value periods |
| Consistency | Strong consistency for transactions | Double-charging or lost payments are unacceptable | Financial loss, legal liability, customer trust |
| Durability | Zero transaction loss | Every transaction must be recorded and recoverable | Financial reconciliation failures, audit failures |
| Security | PCI DSS Level 1 compliance | Legal requirement for card data handling | Fines up to $500K/month, losing ability to process cards |
| Auditability | Complete transaction history | Regulatory requirements, dispute resolution | Compliance violations, failed audits |
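To make the availability targets in the table concrete, the following minimal sketch converts an availability percentage into an annual downtime budget. The function name `downtimeSecondsPerYear` is illustrative, and the calculation assumes a 365-day year.

```typescript
// Seconds in a non-leap year: 365 * 24 * 60 * 60 = 31,536,000.
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60;

// Maximum downtime (in seconds per year) permitted by an availability target.
function downtimeSecondsPerYear(availabilityPercent: number): number {
  return SECONDS_PER_YEAR * (1 - availabilityPercent / 100);
}

for (const target of [99.9, 99.99, 99.999]) {
  const minutes = downtimeSecondsPerYear(target) / 60;
  console.log(`${target}% availability allows about ${minutes.toFixed(1)} minutes of downtime per year`);
}
```

Under these assumptions, four nines leaves roughly 52.6 minutes per year and five nines roughly 5.3 minutes, which is why the difference between the two targets has a large effect on how deployments, failover, and maintenance must be handled.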
Let's analyze each requirement in depth:
Payment systems are one of the few domains where you cannot trade consistency for availability. Double-charging a customer or losing a payment is never acceptable. This means payment systems often choose CP (Consistency + Partition tolerance) over AP in CAP theorem terms, accepting potential unavailability during network partitions rather than risking inconsistent state.
Before designing a system, we must understand the scale we're designing for. Let's work through a realistic estimation for a mid-to-large-scale payment processor.
Assumptions:
## Transaction Volume Estimation

### Daily Transactions
- Merchants: 100,000
- Transactions per merchant per day: 1,000
- Daily transactions: 100,000 × 1,000 = 100,000,000 (100M transactions/day)

### Transactions Per Second (TPS)
- Seconds per day: 86,400
- Average TPS: 100,000,000 / 86,400 ≈ 1,157 TPS
- Peak TPS (10x): ~11,570 TPS
- Design target (with headroom): 20,000 TPS

### Daily Payment Volume
- Average transaction: $50
- Daily volume: 100M × $50 = $5 billion/day
- Annual volume: ~$1.8 trillion/year

## Storage Estimation

### Transaction Record Size
- Transaction ID: 16 bytes (UUID)
- Merchant ID: 16 bytes
- Customer ID: 16 bytes
- Amount/Currency: 12 bytes
- Status: 4 bytes
- Timestamps: 16 bytes
- Payment method token: 64 bytes
- Metadata (JSON): ~500 bytes
- Total per transaction: ~650 bytes

### Daily Storage
- Transactions: 100M × 650 bytes = 65 GB/day
- With indexes (2x): 130 GB/day
- Monthly: ~4 TB
- Yearly: ~48 TB (before archival)

### Hot Storage (90 days)
- 90 × 130 GB = 11.7 TB hot storage

## Bandwidth Estimation

### Request Size
- Average API request: 2 KB
- Average API response: 1 KB

### Daily Bandwidth
- Inbound: 100M × 2 KB = 200 GB/day
- Outbound: 100M × 1 KB = 100 GB/day
- Total: 300 GB/day ≈ 28 Mbps average
- Peak (10x): 280 Mbps

In interviews, always show your calculations explicitly. Even if your numbers aren't perfect, interviewers want to see structured thinking. Round aggressively (100M not 100,000,000) and always add headroom for growth (2-3x current estimates).
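The estimates above can be reproduced with a short script, which makes it easy to see how the results shift when an assumption changes. This is a minimal sketch: the `Assumptions` interface and `estimate` function are illustrative names, and units are decimal (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) to match the figures above.

```typescript
// Inputs mirror the assumptions listed above; all names are illustrative.
interface Assumptions {
  merchants: number;
  txPerMerchantPerDay: number;
  avgAmountUsd: number;
  recordBytes: number;     // approximate size of one transaction record
  indexMultiplier: number; // storage overhead for indexes
  peakMultiplier: number;  // peak traffic relative to the daily average
  requestBytes: number;
  responseBytes: number;
}

const SECONDS_PER_DAY = 86400;

function estimate(a: Assumptions) {
  const dailyTx = a.merchants * a.txPerMerchantPerDay;
  const avgTps = dailyTx / SECONDS_PER_DAY;
  const dailyStorageGb = (dailyTx * a.recordBytes * a.indexMultiplier) / 1e9;
  const dailyBandwidthBytes = dailyTx * (a.requestBytes + a.responseBytes);
  // bytes/day -> bits/day -> bits/second -> megabits/second
  const avgMbps = (dailyBandwidthBytes * 8) / SECONDS_PER_DAY / 1e6;
  return {
    dailyTx,
    avgTps,
    peakTps: avgTps * a.peakMultiplier,
    dailyVolumeUsd: dailyTx * a.avgAmountUsd,
    dailyStorageGb,
    hotStorageTb: (dailyStorageGb * 90) / 1000, // 90 days kept hot
    avgMbps,
    peakMbps: avgMbps * a.peakMultiplier,
  };
}

const result = estimate({
  merchants: 100000,
  txPerMerchantPerDay: 1000,
  avgAmountUsd: 50,
  recordBytes: 650,
  indexMultiplier: 2,
  peakMultiplier: 10,
  requestBytes: 2000,
  responseBytes: 1000,
});

console.log(result);
```

Keeping the assumptions in one object is a deliberate choice: in an interview or a design review, the inputs are what gets challenged, and the derived numbers should follow mechanically from them.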
Payment systems face challenges that are either unique to financial systems or present at a severity level not seen in typical applications. Understanding these challenges is essential for designing robust payment infrastructure.
Consider this scenario: A customer clicks "Pay," the request reaches your server, you call the payment gateway, the gateway charges the card successfully, but the response times out before reaching your server. Is the payment successful? If you retry, you'll double-charge. If you don't, you might lose the payment. This exact scenario is why idempotency is the most critical design consideration in payment systems.
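One common way to make the retry in this scenario safe is to key every charge attempt on a client-supplied idempotency key and replay the stored outcome on repeats. The sketch below is a simplified in-memory illustration, not a production design: `IdempotentPayments`, `Gateway`, and `fakeGateway` are hypothetical names, the `Map` stands in for a durable store with a unique constraint on the key, and it assumes the downstream gateway also accepts the key so that it can deduplicate on its side.

```typescript
type ChargeResult = { paymentId: string; status: 'succeeded' | 'failed' };

// Hypothetical gateway signature; the key is forwarded so the gateway can deduplicate too.
type Gateway = (amount: number, currency: string, idempotencyKey: string) => Promise<ChargeResult>;

interface StoredCharge {
  fingerprint: string;           // amount and currency of the first request
  result: Promise<ChargeResult>; // shared by the original call and every replay
}

class IdempotentPayments {
  // Stand-in for a durable store; an in-memory Map does not survive restarts.
  private readonly store = new Map<string, StoredCharge>();
  private readonly gateway: Gateway;

  constructor(gateway: Gateway) {
    this.gateway = gateway;
  }

  charge(idempotencyKey: string, amount: number, currency: string) {
    const fingerprint = `${amount}:${currency}`;
    const existing = this.store.get(idempotencyKey);
    if (existing) {
      // Reusing a key for a different payment is a client bug, not a retry.
      if (existing.fingerprint !== fingerprint) {
        throw new Error('idempotency_error: key reused with different parameters');
      }
      return { result: existing.result, idempotentReplayed: true };
    }
    // First time this key is seen: call the gateway once and remember the outcome.
    const result = this.gateway(amount, currency, idempotencyKey);
    this.store.set(idempotencyKey, { fingerprint, result });
    return { result, idempotentReplayed: false };
  }
}

// Usage: a retry with the same key does not reach the gateway a second time.
let gatewayCalls = 0;
const fakeGateway: Gateway = async (_amount, _currency, key) => {
  gatewayCalls += 1;
  return { paymentId: `pay_${key}`, status: 'succeeded' };
};

const service = new IdempotentPayments(fakeGateway);
const first = service.charge('key-123', 1000, 'USD');
const retry = service.charge('key-123', 1000, 'USD');
console.log(first.idempotentReplayed, retry.idempotentReplayed, gatewayCalls); // false true 1
```

The sketch replays whatever the first attempt produced, including a rejected promise. A real system would additionally need a way to query the gateway for the true outcome when the first attempt timed out.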
Before implementation, we must define the API contract. Payment APIs have particular requirements around clarity, safety, and compatibility that differ from typical REST APIs.
```typescript
// Core Payment API Types
interface CreatePaymentRequest {
  // Idempotency key - REQUIRED for safe retries
  idempotencyKey: string; // UUID, unique per intended payment

  // Amount in smallest currency unit (cents for USD)
  amount: number; // 1000 = $10.00
  currency: string; // ISO 4217 code: "USD", "EUR", "GBP"

  // Payment method (token, not raw card data)
  paymentMethodId: string; // Token from secure tokenization

  // Merchant context
  merchantId: string;
  orderId?: string; // Merchant's order reference

  // Customer context (for fraud detection)
  customerId?: string;
  customerEmail?: string;
  customerIp?: string;

  // Processing options
  captureMethod: 'automatic' | 'manual'; // Auth+capture or auth-only

  // Metadata for merchant's use
  metadata?: Record<string, string>;
}

interface PaymentResponse {
  id: string; // Our payment ID
  status: PaymentStatus;
  amount: number;
  currency: string;

  // Processing details
  authorizationCode?: string;
  networkTransactionId?: string;

  // Outcome details
  outcome: {
    networkStatus: 'approved' | 'declined' | 'pending';
    type?: 'issuer_declined' | 'fraud' | 'invalid' | 'network_error';
    reason?: string;
    riskScore?: number;
  };

  // Timestamps
  createdAt: string; // ISO 8601
  updatedAt: string;
  authorizedAt?: string;
  capturedAt?: string;

  // Idempotency tracking
  idempotencyKey: string;
  idempotentReplayed: boolean; // True if this was a replay
}

type PaymentStatus =
  | 'pending'             // Created, not yet processed
  | 'requires_action'     // Needs 3DS or other auth
  | 'processing'          // Being processed
  | 'authorized'          // Auth successful, not captured
  | 'succeeded'           // Captured successfully
  | 'failed'              // Permanently failed
  | 'canceled'            // Voided before capture
  | 'refunded'            // Fully refunded
  | 'partially_refunded'; // Partially refunded

// Error response structure
interface PaymentError {
  code: string;    // Machine-readable code
  message: string; // Human-readable message
  type: 'card_error' | 'validation_error' | 'api_error' | 'idempotency_error';
  declineCode?: string; // Card network decline code
  param?: string;       // Which parameter caused the error
  requestId: string;    // For support/debugging
}
```

Key API Design Principles:
- Amounts are integers in the smallest currency unit: `amount: 1000` = $10.00.

Payment transactions follow a well-defined state machine. Understanding valid state transitions is crucial for implementing correct payment logic and handling edge cases.
Critical State Transition Rules:
- Once a payment reaches `failed`, `canceled`, or `refunded`, it cannot transition to any other state. These are terminal states.
- After capture (`succeeded`), a payment cannot be voided, only refunded. Void is cheaper than refund, so prefer voiding when possible.
- Track `amount_captured` and `amount_refunded` separately from the original amount.

Never allow: failed → succeeded (retry creates new payment), refunded → authorized (refunds are final), canceled → processing (cancellation is terminal). Invalid state transitions indicate bugs or fraud attempts. Log and alert on any attempts.
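The transition rules can be enforced in code with an explicit allow-list, so that any transition not listed is rejected by default. The sketch below is one possible encoding: the `ALLOWED_TRANSITIONS` table is an assumption assembled from the status names and rules on this page (the terminal states and the void-versus-refund rule come from the text, while the remaining edges are illustrative), and `canTransition` and `transition` are hypothetical helper names.

```typescript
type PaymentStatus =
  | 'pending' | 'requires_action' | 'processing' | 'authorized'
  | 'succeeded' | 'failed' | 'canceled' | 'refunded' | 'partially_refunded';

// Allow-list of transitions. Terminal states map to an empty list.
// Edges other than the documented rules are assumptions for illustration.
const ALLOWED_TRANSITIONS: Record<PaymentStatus, readonly PaymentStatus[]> = {
  pending: ['requires_action', 'processing', 'failed', 'canceled'],
  requires_action: ['processing', 'failed', 'canceled'],
  processing: ['authorized', 'succeeded', 'failed'],
  authorized: ['succeeded', 'canceled'],          // capture, or void before capture
  succeeded: ['partially_refunded', 'refunded'],  // after capture: refund only, no void
  partially_refunded: ['partially_refunded', 'refunded'],
  failed: [],
  canceled: [],
  refunded: [],
};

function canTransition(from: PaymentStatus, to: PaymentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws so the caller can log and alert on the attempt.
function transition(from: PaymentStatus, to: PaymentStatus): PaymentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid payment state transition: ${from} -> ${to}`);
  }
  return to;
}

console.log(canTransition('authorized', 'canceled')); // true  (void before capture)
console.log(canTransition('succeeded', 'canceled'));  // false (captured payments are refunded, not voided)
console.log(canTransition('failed', 'succeeded'));    // false (a retry creates a new payment)
```

An allow-list is a deliberately conservative choice: a newly added status has no outgoing transitions until someone declares them, which fails closed instead of silently permitting an unintended path.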
We've established a comprehensive requirements foundation for designing a payment system. Let's consolidate the key takeaways:
What's Next:
With requirements established, we'll dive into payment gateway integration in the next page. We'll explore how to connect with external payment processors, handle their diverse APIs and protocols, implement fallback routing for resilience, and manage the complexity of multi-gateway architectures.
You now understand the comprehensive requirements for designing a production-grade payment system. These requirements will guide every architectural decision in subsequent pages—from gateway integration through fraud detection to regulatory compliance.