Machine Learning for Fraud Detection in Financial Systems

Financial fraud is an arms race. Fraudsters adapt constantly, and rule-based detection systems can’t keep up. Myles Ndlovu has implemented ML-based fraud detection in payment systems, and the results consistently outperform hand-coded rules.

Why Rules Aren’t Enough

Traditional fraud detection uses rules:

Flag transactions over $10,000
Block more than 5 transactions in 10 minutes
Reject transactions from blacklisted countries
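
As a rough illustration, the three rules above could be hand-coded like this. It is a minimal sketch: the `Transaction` fields, the `BLOCKED_COUNTRIES` placeholder set, and the `check_rules` function are assumptions made for this example, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical placeholder codes; a real list would come from compliance policy.
BLOCKED_COUNTRIES = {"XX", "YY"}


@dataclass
class Transaction:
    amount: float
    country: str
    time: datetime


def check_rules(txn, recent_times):
    """Return the names of the rules that fire for txn.

    recent_times holds the timestamps of the user's earlier transactions.
    """
    fired = []
    if txn.amount > 10_000:
        fired.append("large_amount")
    window_start = txn.time - timedelta(minutes=10)
    in_window = sum(1 for t in recent_times if window_start <= t <= txn.time)
    if in_window + 1 > 5:  # count the current transaction too
        fired.append("velocity")
    if txn.country in BLOCKED_COUNTRIES:
        fired.append("blocked_country")
    return fired
```

Every threshold here is a fixed constant, so a transaction that stays just under each limit is never flagged.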

Rules work for known patterns. But they have two fatal flaws:

They can’t detect novel fraud: If a new fraud pattern emerges that doesn’t match existing rules, it passes through undetected.
They generate too many false positives: Strict rules block legitimate customers. Loose rules miss fraud. The sweet spot is narrow and changes constantly.

Machine learning models learn patterns from historical data, including patterns that are impractical to write down as explicit rules.

Feature Engineering

The most important part of any fraud detection system isn’t the model — it’s the features. Features are the inputs that describe each transaction.

Transaction Features

Amount
Currency
Merchant category
Payment method (card, bank transfer, mobile money)
Time of day, day of week

Velocity Features

Number of transactions in the last hour, day, week
Total amount spent in the last hour, day, week
Number of unique merchants in the last day
Number of failed transactions recently
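
The velocity features listed above can be sketched as a scan over one user's recent history. The tuple layout, the feature names, and the choice of a 24-hour window for "recently" are assumptions made for this example.

```python
from datetime import datetime, timedelta


def velocity_features(history, now):
    """Compute count and total-amount features over trailing windows.

    history is a list of (timestamp, amount, merchant_id, succeeded) tuples
    for one user; this layout is an assumption of the sketch.
    """
    windows = {
        "1h": timedelta(hours=1),
        "24h": timedelta(days=1),
        "7d": timedelta(days=7),
    }
    features = {}
    for label, width in windows.items():
        recent = [row for row in history if now - width <= row[0] <= now]
        features[f"txn_count_{label}"] = len(recent)
        features[f"amount_sum_{label}"] = sum(row[1] for row in recent)

    last_day = [row for row in history if now - timedelta(days=1) <= row[0] <= now]
    features["unique_merchants_24h"] = len({row[2] for row in last_day})
    features["failed_count_24h"] = sum(1 for row in last_day if not row[3])
    return features
```

Rescanning the full history on every transaction is fine for a sketch; a latency-sensitive system would more likely keep these counters updated incrementally.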

Behavioural Features

How different is this transaction from the user’s typical pattern?
Is the device/IP new for this user?
Is the merchant category new for this user?
Has the user’s transaction velocity suddenly changed?

Network Features

Is this merchant associated with other fraudulent transactions?
Is this device linked to multiple accounts?
Is this IP address associated with fraud across the platform?

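from statistics import mean, pstdev

def z_score(value, history):
    """Standard score of value against past amounts (0.0 if undefined)."""
    spread = pstdev(history) if len(history) > 1 else 0.0
    return (value - mean(history)) / spread if spread else 0.0

def compute_features(transaction, user_history, device_registry):
    # user_history and device_registry are assumed helper objects exposing
    # the methods and attributes used below.
    return {
        'amount': transaction.amount,
        'amount_zscore': z_score(transaction.amount, user_history.amounts),
        'txn_count_1h': user_history.count(hours=1),
        'txn_count_24h': user_history.count(hours=24),
        'unique_merchants_7d': user_history.unique_merchants(days=7),
        'is_new_device': transaction.device_id not in user_history.devices,
        'is_new_merchant_category': transaction.mcc not in user_history.mccs,
        # A time difference; convert to seconds if the timestamps are datetimes.
        'time_since_last_txn': transaction.time - user_history.last_txn_time,
        'device_account_count': device_registry.account_count(transaction.device_id),
    }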
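
The `device_registry` object used above is not defined in this article. One possible shape for it, offered purely as an illustration, is an in-memory index from device to accounts. The class name `DeviceRegistry` and its `record` method are hypothetical; only `account_count` is taken from the snippet above.

```python
from collections import defaultdict


class DeviceRegistry:
    """In-memory stand-in for a device-to-accounts index (illustrative only)."""

    def __init__(self):
        self._accounts_by_device = defaultdict(set)

    def record(self, account_id, device_id):
        """Note that account_id was seen using device_id."""
        self._accounts_by_device[device_id].add(account_id)

    def account_count(self, device_id):
        """Number of distinct accounts linked to device_id (0 if unseen)."""
        return len(self._accounts_by_device.get(device_id, ()))
```

A device that suddenly maps to many accounts is the kind of network signal described under Network Features.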

Model Selection

For real-time fraud detection, the model needs to be:

Fast (inference in milliseconds)
Accurate (high precision and recall)
Interpretable (you need to explain why a transaction was blocked)

Gradient Boosted Trees (XGBoost, LightGBM) are the industry standard for tabular fraud detection. They’re fast, handle mixed feature types well, and provide feature importance scores.

Neural networks work for large-scale systems with complex patterns but sacrifice interpretability.

Ensemble models combine multiple models for better performance. For example, a gradient boosted tree can handle initial scoring, with a neural network reserved for edge cases.
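
The two-stage arrangement described above could be sketched as a simple cascade. Both models are assumed to be plain callables that return a fraud probability, and the `low` and `high` band limits are illustrative values, not recommendations.

```python
def cascade_score(features, fast_model, slow_model, low=0.2, high=0.8):
    """Score with fast_model; consult slow_model only in the uncertain band.

    Both models are assumed to be callables returning a fraud probability
    in [0, 1]. Returns (score, name of the stage that produced it).
    """
    score = fast_model(features)
    if low <= score <= high:
        # Edge case: the first stage is unsure, so defer to the second stage.
        return slow_model(features), "slow"
    return score, "fast"
```

Because the slower model only runs on the uncertain band, most transactions pay only the latency of the first stage.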

The Imbalanced Data Problem

Fraud is rare, typically 0.1% to 1% of transactions. A model that predicts “not fraud” for everything would be at least 99% accurate and completely useless, because it would catch no fraud at all.
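
To make the arithmetic concrete, here is the calculation for a hypothetical batch of 100,000 transactions at a 0.5% fraud rate. The numbers are invented for illustration.

```python
n_total = 100_000
n_fraud = 500  # 0.5% fraud rate (illustrative)

# A model that always predicts "not fraud" gets every legitimate
# transaction right and every fraudulent one wrong.
true_negatives = n_total - n_fraud
true_positives = 0

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / n_fraud

print(f"accuracy={accuracy:.3f} recall={recall:.1f}")
```

Accuracy comes out at 0.995 while recall is 0.0, which is why precision and recall are the more useful measures here.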

Solutions:

Oversampling: SMOTE or similar techniques to create synthetic fraud examples
Undersampling: Reduce the non-fraud majority class
Cost-sensitive learning: Penalise missed fraud more heavily than false positives
Anomaly detection: Train only on legitimate transactions and flag anomalies
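
As one concrete instance of cost-sensitive learning, classes can be weighted by inverse frequency so that each missed fraud case costs more during training. The sketch below uses the same heuristic as scikit-learn's `class_weight="balanced"` option, `n_samples / (n_classes * class_count)`; the function name and the toy labels are assumptions for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [1] * 10 + [0] * 990  # toy data with 1% fraud
weights = balanced_class_weights(labels)
print(weights[1], round(weights[0], 3))
```

With 1% fraud, a fraud example is weighted 50.0 and a legitimate one about 0.505, so the two classes contribute equally to the total loss.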

Real-Time Scoring

The model must score transactions in real time, ideally in under 100 ms. This requires:

Feature store: Pre-computed features stored in a low-latency database (Redis, DynamoDB). User history features are updated after each transaction.
Model serving: The trained model is deployed as a microservice. Incoming transactions are enriched with features and scored.
Decision engine: The model’s score is combined with business rules to make a final decision — approve, decline, or step-up (require additional authentication).
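
The decision-engine step can be sketched as a small function that layers a business rule on top of the model score. The thresholds, the 10,000 amount rule, and the function name `decide` are illustrative assumptions.

```python
def decide(score, amount, is_new_device, decline_at=0.9, step_up_at=0.5):
    """Combine a model score with a simple business rule.

    Returns "approve", "decline" or "step_up". All thresholds are
    illustrative, not recommended values.
    """
    if score >= decline_at:
        return "decline"
    if score >= step_up_at:
        return "step_up"
    # Business rule: large payments from an unseen device are always
    # challenged, even when the model score is low.
    if amount > 10_000 and is_new_device:
        return "step_up"
    return "approve"
```

Keeping the rule outside the model means it can be changed without retraining.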

Monitoring and Feedback Loops

A fraud model degrades over time as fraud patterns change. You need:

Model monitoring: Track precision, recall, and false positive rates in production. Alert when performance degrades.
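
A minimal sketch of such a check, assuming confusion counts are available for a baseline period and the current period. The function names and tolerance values are hypothetical, and in practice the counts lag because confirmed fraud labels arrive late.

```python
def rates(tp, fp, tn, fn):
    """Precision, recall and false positive rate from confusion counts."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def degradation_alerts(current, baseline, tolerance=0.05, fpr_tolerance=0.01):
    """List the metrics that moved the wrong way by more than the tolerance."""
    alerts = []
    for name in ("precision", "recall"):
        if baseline[name] - current[name] > tolerance:
            alerts.append(name)
    fpr_rise = current["false_positive_rate"] - baseline["false_positive_rate"]
    if fpr_rise > fpr_tolerance:
        alerts.append("false_positive_rate")
    return alerts
```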

Feedback loops: When fraud is confirmed (chargebacks, user reports), feed that data back into the training pipeline. Retraining on these confirmed labels lets the model learn from its mistakes.

Champion-challenger testing: Run a new model alongside the existing one in shadow mode. Compare performance before swapping.
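
Shadow mode can be sketched as scoring every transaction with both models while acting only on the champion's output. The models here are assumed to be callables returning a fraud probability, and `shadow_compare` is a hypothetical helper for this example.

```python
def shadow_compare(transactions, champion, challenger, threshold=0.5):
    """Score each transaction with both models; only the champion decides.

    transactions is a list of (features, is_fraud) pairs with confirmed
    labels. Returns the champion's decisions and per-model tallies.
    """
    stats = {
        "champion": {"caught": 0, "false_alarms": 0},
        "challenger": {"caught": 0, "false_alarms": 0},
    }
    decisions = []
    for features, is_fraud in transactions:
        flags = {
            "champion": champion(features) >= threshold,
            "challenger": challenger(features) >= threshold,
        }
        # The challenger's flag is recorded for comparison, never acted on.
        decisions.append(flags["champion"])
        for name, flagged in flags.items():
            if flagged and is_fraud:
                stats[name]["caught"] += 1
            elif flagged:
                stats[name]["false_alarms"] += 1
    return decisions, stats
```

The tallies count caught fraud and false alarms rather than overall accuracy, since accuracy says little when fraud is rare.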

Explainability

Regulators and customers want to know why a transaction was blocked. “The model said so” isn’t acceptable.

Use SHAP values or feature importance scores to explain individual decisions:

“Transaction blocked because: unusual amount ($5,000 vs. typical $50), new device, velocity spike (5 transactions in 10 minutes vs. typical 2 per day)”
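
A message like the one above could be assembled from per-feature contributions. This sketch assumes the contributions (for example SHAP values) have already been computed elsewhere; the function names, feature names, and wording map are hypothetical.

```python
def top_reasons(contributions, limit=3):
    """Return the features pushing the score up the most, largest first.

    contributions maps a feature name to its signed contribution to the
    fraud score; only positive contributions count as reasons.
    """
    positive = [(name, value) for name, value in contributions.items() if value > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in positive[:limit]]


def explain(contributions, descriptions, limit=3):
    """Build a human-readable reason string from the top contributions."""
    names = top_reasons(contributions, limit)
    reasons = [descriptions.get(name, name) for name in names]
    return "Transaction blocked because: " + ", ".join(reasons)
```

Features with negative contributions are left out because they argued against blocking the transaction.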

The Human Layer

ML doesn’t replace human fraud analysts; it augments them. The model handles the volume (scoring every transaction in real time). Humans handle the edge cases, investigate confirmed fraud, and identify new fraud patterns that inform the next model iteration.

The best fraud detection systems combine ML speed with human judgment.