
Building Real-Time Payment Infrastructure: Architecture and Lessons

Myles Ndlovu
Fintech Entrepreneur & Developer

Real-time payments are becoming the global standard. Myles Ndlovu has built payment processing systems that handle thousands of transactions per second, and the engineering challenges are both fascinating and unforgiving.

What Makes Payments Hard

Payment systems have zero tolerance for certain types of failure. If you lose a social media post, it’s annoying. If you lose a payment, it’s someone’s money. This constraint shapes every architectural decision.

The core requirements:

  • Exactly-once processing: A payment must happen exactly once. Not zero times (lost). Not twice (double charge).
  • Durability: Once a payment is accepted, it cannot be lost, even if servers crash.
  • Auditability: Every state change must be recorded and traceable.
  • Latency: Users expect confirmation in seconds.

The Ledger Is the Source of Truth

At the heart of every payment system is a ledger. The ledger records every credit and debit. Account balances are derived from ledger entries, not stored as a mutable field.

-- Every balance change is a ledger entry.
-- Both legs of a transfer are written in one transaction,
-- so the debit can never be committed without the credit.
BEGIN;

INSERT INTO ledger (account_id, amount, type, reference, created_at)
VALUES ('acc_123', -5000, 'DEBIT', 'txn_abc', NOW());

INSERT INTO ledger (account_id, amount, type, reference, created_at)
VALUES ('acc_456', 5000, 'CREDIT', 'txn_abc', NOW());

COMMIT;

-- Balance is a derived value
SELECT SUM(amount) AS balance FROM ledger WHERE account_id = 'acc_123';

This append-only approach means you never lose history. Every balance can be reconstructed from the ledger entries. Auditors love this.
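
The same idea can be sketched in application code. This is a minimal in-memory illustration rather than a production ledger: the `Ledger` class, the `LedgerEntry` shape, and the `transfer` method are hypothetical names, and amounts are assumed to be integers in minor units (for example cents).

```typescript
// Hypothetical in-memory ledger; a real system would persist entries durably.
type EntryType = "DEBIT" | "CREDIT";

interface LedgerEntry {
  readonly accountId: string;
  readonly amount: number; // signed integer in minor units (assumption)
  readonly type: EntryType;
  readonly reference: string;
  readonly createdAt: Date;
}

class Ledger {
  // Append-only: entries are added, never updated or deleted.
  private readonly entries: LedgerEntry[] = [];

  // Record both legs of a transfer together so they always sum to zero.
  transfer(from: string, to: string, amount: number, reference: string): void {
    if (!Number.isInteger(amount) || amount <= 0) {
      throw new Error("amount must be a positive integer");
    }
    const createdAt = new Date();
    this.entries.push(
      { accountId: from, amount: -amount, type: "DEBIT", reference, createdAt },
      { accountId: to, amount: amount, type: "CREDIT", reference, createdAt }
    );
  }

  // Balance is derived by summing entries, not read from a mutable field.
  balance(accountId: string): number {
    return this.entries
      .filter((entry) => entry.accountId === accountId)
      .reduce((sum, entry) => sum + entry.amount, 0);
  }

  // All entries for one transaction reference, useful when auditing.
  history(reference: string): LedgerEntry[] {
    return this.entries.filter((entry) => entry.reference === reference);
  }
}

const demoLedger = new Ledger();
demoLedger.transfer("acc_123", "acc_456", 5000, "txn_abc");
```

Because `transfer` is the only way to add entries, the sum of all amounts in this sketch stays at zero, which is a cheap invariant to check in tests.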

Idempotency: The Most Important Pattern

Network failures happen. Timeouts happen. Retries happen. Without idempotency, a retry can cause a double payment.

Every payment request must include an idempotency key — a unique identifier chosen by the caller. If the same key is submitted twice, the system returns the result of the first attempt without processing again.

async function processPayment(idempotencyKey: string, payment: Payment) {
  // Check if we've seen this key before
  const existing = await db.getByIdempotencyKey(idempotencyKey);
  if (existing) return existing.result;

  // Process the payment
  const result = await executePayment(payment);

  // Store the result with the idempotency key
  await db.saveIdempotencyResult(idempotencyKey, result);
  return result;
}

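The idempotency check, the payment execution, and the stored result should share one database transaction, and the idempotency key column should carry a unique constraint. A transaction on its own does not stop two concurrent requests from both finding no existing key. With the constraint in place, the second writer fails or waits instead of executing the payment again.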
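
To make the race concrete, here is a minimal single-process sketch in which claiming the key and starting the payment happen with no `await` in between, so two concurrent calls with the same key share one execution. Everything here is an assumption for illustration: `processPaymentOnce`, the `Payment` and `PaymentResult` shapes, the in-memory `Map`, and the stand-in `executePayment`. It does not replace the database-level guarantee, which is still needed once more than one process handles requests.

```typescript
// Hypothetical types and helpers, for illustration only.
interface Payment {
  from: string;
  to: string;
  amount: number;
}

interface PaymentResult {
  status: "succeeded" | "failed";
  paymentId: string;
}

let executions = 0; // counts how often the side effect actually runs

// Stand-in for the real payment execution.
async function executePayment(payment: Payment): Promise<PaymentResult> {
  executions += 1;
  return {
    status: payment.amount > 0 ? "succeeded" : "failed",
    paymentId: `pay_${executions}`,
  };
}

// Maps each idempotency key to its result, which may still be in flight.
const resultsByKey = new Map<string, Promise<PaymentResult>>();

function processPaymentOnce(
  idempotencyKey: string,
  payment: Payment
): Promise<PaymentResult> {
  const existing = resultsByKey.get(idempotencyKey);
  if (existing !== undefined) {
    return existing; // duplicate request: reuse the first attempt's result
  }
  // No await between the lookup above and the set below, so within one
  // process no other request can claim the same key in between.
  const pending = executePayment(payment);
  resultsByKey.set(idempotencyKey, pending);
  return pending;
}
```

Storing the in-flight promise rather than the finished result is the design choice that matters here: a retry that arrives while the first attempt is still running waits for that attempt instead of starting a second one.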

Message Queues for Reliability

Real-time doesn’t mean synchronous. The user-facing response should be fast, but downstream processing can be asynchronous.

A common pattern:

  1. Accept the payment request, validate inputs, check balance
  2. Write the transaction to the database (committed, durable)
  3. Return success to the user
  4. Publish an event to a message queue
  5. Downstream services (notifications, analytics, partner APIs) consume the event

If a downstream service fails, the message stays in the queue and gets retried. The payment is already committed — the downstream processing is eventually consistent.
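
The flow above can be sketched with in-memory stand-ins. The names `acceptPayment`, `drainQueue`, `committed`, and `queue` are hypothetical, and the arrays stand in for a real database and message broker. The publish appears before the `return` only because a function cannot do work after returning; the caller still never waits on a consumer.

```typescript
// Hypothetical in-memory stand-ins for a database and a message queue.
interface PaymentEvent {
  paymentId: string;
  amount: number;
}

type Consumer = (event: PaymentEvent) => Promise<void>;

const committed: PaymentEvent[] = []; // stands in for the durable database
const queue: PaymentEvent[] = []; // stands in for the message queue

// Steps 1 to 4: validate, commit, publish, and answer the caller.
function acceptPayment(paymentId: string, amount: number): { accepted: boolean } {
  if (!Number.isInteger(amount) || amount <= 0) {
    return { accepted: false }; // validation failure: nothing is written
  }
  const event: PaymentEvent = { paymentId, amount };
  committed.push(event); // the payment is now committed
  queue.push(event); // downstream work is deferred to consumers
  return { accepted: true };
}

// Step 5: deliver each queued event once per pass.
// A failed event goes back on the queue for a later pass.
async function drainQueue(consumer: Consumer): Promise<number> {
  let delivered = 0;
  const batchSize = queue.length;
  for (let i = 0; i < batchSize; i++) {
    const event = queue.shift();
    if (event === undefined) break;
    try {
      await consumer(event);
      delivered += 1;
    } catch {
      queue.push(event); // keep it for the next attempt
    }
  }
  return delivered;
}
```

Retrying means a consumer can see the same event more than once, so consumers in this style need to be idempotent as well.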

Handling Failures

Payments fail at every layer. Your system must handle each gracefully:

Validation failures: Bad input, insufficient balance, blocked account. Return a clear error immediately.

Processing failures: Database timeout, external API error. These require careful handling — did the payment go through or not?

Partner failures: The bank’s API is down, the card network is slow. Implement circuit breakers to stop hammering a failing service.
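
A circuit breaker for the partner-failure case might look like the sketch below. The `CircuitBreaker` class, its thresholds, and the injectable `now` clock are illustrative assumptions, and the half-open handling is simplified: once the cooldown has passed, every caller is let through until one of them fails again.

```typescript
// Minimal circuit breaker sketch; names and thresholds are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now()
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      // Open: fail fast without touching the partner at all.
      throw new Error("circuit open: call skipped");
    }
    // Closed, or cooldown elapsed (half-open): let the call through.
    try {
      const result = await operation();
      this.failures = 0; // any success closes the circuit again
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open, or re-open after a failed trial
      }
      throw err;
    }
  }
}
```

A caller would wrap each partner request, for example `breaker.call(() => sendToBank(instruction))`, where `sendToBank` is whatever client function the integration already has.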

The most dangerous failure is the ambiguous one: you sent a payment instruction to a bank, but the connection dropped before you received a response. Did the bank process it? You don’t know.

Solutions:

  • Query the bank’s API for the transaction status
  • Use reconciliation processes to match your records against the bank’s
  • Implement a “pending” state that resolves through reconciliation
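
One way to model the "pending" state is a small resolver that only moves a payment out of pending when the bank gives a definite answer. The `BankStatus` values, the `PendingPayment` shape, and the `queryBankStatus` callback are assumptions for this sketch; real partner APIs report status in their own formats.

```typescript
type PaymentState = "pending" | "succeeded" | "failed";

// Hypothetical normalised answers from a bank status query.
type BankStatus = "processed" | "rejected" | "unknown";

interface PendingPayment {
  id: string;
  state: PaymentState;
}

async function resolvePending(
  payment: PendingPayment,
  queryBankStatus: (id: string) => Promise<BankStatus>
): Promise<PendingPayment> {
  if (payment.state !== "pending") {
    return payment; // already resolved, nothing to do
  }
  let status: BankStatus;
  try {
    status = await queryBankStatus(payment.id);
  } catch {
    status = "unknown"; // the status query itself failed
  }
  if (status === "processed") {
    return { ...payment, state: "succeeded" };
  }
  if (status === "rejected") {
    return { ...payment, state: "failed" };
  }
  // No definite answer: stay pending and let a later query or
  // reconciliation run decide. Never guess the outcome.
  return payment;
}
```

The important property is that an ambiguous answer leaves the state unchanged, so the payment is neither retried blindly nor reported as failed while the bank may still have processed it.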

Reconciliation

Every payment system needs reconciliation — comparing your records against your partners’ records to ensure they match.

Daily reconciliation catches:

  • Payments you think succeeded but the bank rejected
  • Payments the bank processed that you don’t have records for
  • Amount mismatches due to currency conversion or fee differences

Automate reconciliation as much as possible, but always have human review for exceptions.
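
A minimal sketch of the matching step, covering the three cases listed above, could look like this. The `PaymentRecord` shape and the `reconcile` function are hypothetical, records are assumed to share a common `reference`, and amounts are assumed to be integers in minor units.

```typescript
// Hypothetical record shape; real bank files differ by partner.
interface PaymentRecord {
  reference: string;
  amount: number; // integer minor units (assumption)
  status: "succeeded" | "rejected";
}

interface ReconciliationReport {
  matched: string[];
  statusMismatches: string[]; // e.g. we recorded success, the bank rejected
  missingInternally: string[]; // the bank processed it, we have no record
  amountMismatches: string[]; // e.g. currency conversion or fee differences
}

function reconcile(
  ours: PaymentRecord[],
  bank: PaymentRecord[]
): ReconciliationReport {
  const oursByReference = new Map<string, PaymentRecord>();
  for (const record of ours) {
    oursByReference.set(record.reference, record);
  }

  const report: ReconciliationReport = {
    matched: [],
    statusMismatches: [],
    missingInternally: [],
    amountMismatches: [],
  };

  // This sketch walks the bank's records only; internal records the bank
  // never mentions are not reported here.
  for (const bankRecord of bank) {
    const internal = oursByReference.get(bankRecord.reference);
    if (internal === undefined) {
      if (bankRecord.status === "succeeded") {
        report.missingInternally.push(bankRecord.reference);
      }
    } else if (internal.status !== bankRecord.status) {
      report.statusMismatches.push(bankRecord.reference);
    } else if (internal.amount !== bankRecord.amount) {
      report.amountMismatches.push(bankRecord.reference);
    } else {
      report.matched.push(bankRecord.reference);
    }
  }
  return report;
}
```

Everything outside `matched` is an exception list, which is the natural hand-off point for the human review mentioned above.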

Scaling Considerations

Database design: Partition your ledger by account or time period. A single table with billions of rows will eventually become a bottleneck.

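Read replicas: Balance queries for display can hit read replicas, which may lag slightly behind the primary. Write operations (debits, credits), and any balance check that authorises a debit, must hit the primary.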

Horizontal scaling: Payment processing can be parallelised by account. Two payments that touch different accounts can process simultaneously. Two payments that touch the same account must be serialised to prevent balance inconsistencies.
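
Within a single process, per-account serialisation can be sketched as a chain of promises keyed by account. The `runSerialised` helper and the `tails` map are hypothetical names. The sketch covers one account per task, and a payment that touches two accounts would need both keys, taken in a consistent order. Across several processes the same effect needs database locks or partitioned queues instead.

```typescript
// Last queued task per account; later tasks chain onto it.
const tails = new Map<string, Promise<void>>();

function runSerialised<T>(
  accountId: string,
  task: () => Promise<T>
): Promise<T> {
  // Tasks for the same account wait for the previous one to settle.
  // Tasks for other accounts use a different chain and run in parallel.
  const previous = tails.get(accountId) ?? Promise.resolve();
  const result = previous.then(task);
  // The stored tail swallows errors so one failed task
  // does not block every later task for that account.
  tails.set(
    accountId,
    result.then(
      () => undefined,
      () => undefined
    )
  );
  return result;
}
```

The caller still receives the task's own result or error through the returned promise; only the internal chain ignores failures.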

Monitoring

Monitor everything:

  • Transaction success/failure rates
  • Processing latency (p50, p95, p99)
  • Queue depth and consumer lag
  • Partner API response times
  • Reconciliation match rates

Set alerts for anomalies. A sudden spike in failures or a drop in transaction volume often indicates a systemic issue that needs immediate attention.
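
For the latency figures, a percentile can be computed from raw samples with the nearest-rank method. This is a small sketch with a hypothetical `percentile` helper; production monitoring systems usually estimate percentiles from histograms rather than sorting every sample.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("rank out of range");
  }
  return value;
}

// Illustrative latencies in milliseconds, not measured data.
const latenciesMs = [120, 80, 95, 300, 110];
const p50 = percentile(latenciesMs, 50);
const p99 = percentile(latenciesMs, 99);
```

With only five samples the p99 is simply the slowest request, which is a reminder that tail percentiles need a reasonable sample count before they are worth alerting on.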

Building payment infrastructure is demanding, but there’s something satisfying about building systems where reliability isn’t optional — it’s the entire point.
