Machine Learning for Fraud Detection in Financial Systems

Financial fraud is an arms race. Fraudsters adapt constantly, and rule-based detection systems can’t keep up. Myles Ndlovu has implemented ML-based fraud detection in payment systems, and the results consistently outperform hand-coded rules.
Why Rules Aren’t Enough
Traditional fraud detection uses rules:
- Flag transactions over $10,000
- Block more than 5 transactions in 10 minutes
- Reject transactions from blacklisted countries
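To make the contrast concrete, the three rules above can be written as a few hard-coded checks. This is a minimal sketch. The `Txn` fields, the constants and the blacklist contents are illustrative assumptions, not values from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative thresholds and blacklist (assumptions, not recommendations)
AMOUNT_LIMIT = 10_000
MAX_TXNS = 5
WINDOW = timedelta(minutes=10)
BLACKLISTED_COUNTRIES = {"XX", "YY"}  # placeholder country codes


@dataclass
class Txn:
    amount: float
    country: str
    time: datetime


def apply_rules(txn, recent_times):
    """Return the rules that fire; recent_times are the user's earlier txn timestamps."""
    reasons = []
    if txn.amount > AMOUNT_LIMIT:
        reasons.append("flag: amount over limit")
    # Count this transaction plus any earlier ones inside the 10-minute window
    in_window = 1 + sum(1 for t in recent_times if timedelta(0) <= txn.time - t <= WINDOW)
    if in_window > MAX_TXNS:
        reasons.append("block: velocity")
    if txn.country in BLACKLISTED_COUNTRIES:
        reasons.append("reject: blacklisted country")
    return reasons
```

A $9,999 payment from an allowed country at normal velocity fires none of these checks, which is exactly the kind of blind spot that hand-written thresholds leave.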
Rules work for known patterns. But they have two fatal flaws:
- They can’t detect novel fraud: If a new fraud pattern emerges that doesn’t match existing rules, it passes through undetected.
- They generate too many false positives: Strict rules block legitimate customers. Loose rules miss fraud. The sweet spot is narrow and changes constantly.
Machine learning models learn from data to find patterns that humans can’t explicitly code.
Feature Engineering
The most important part of any fraud detection system isn’t the model — it’s the features. Features are the inputs that describe each transaction.
Transaction Features
- Amount
- Currency
- Merchant category
- Payment method (card, bank transfer, mobile money)
- Time of day, day of week
Velocity Features
- Number of transactions in the last hour, day, week
- Total amount spent in the last hour, day, week
- Number of unique merchants in the last day
- Number of failed transactions recently
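Velocity features are usually maintained incrementally rather than recomputed from a user's full history on every transaction. A minimal sketch, assuming one user's events arrive in time order and the window fits in memory; the `VelocityWindow` class is hypothetical, and a production system would keep this state in its low-latency feature store instead.

```python
from collections import deque
from datetime import datetime, timedelta


class VelocityWindow:
    """Count and sum of one user's transactions inside a sliding time window."""

    def __init__(self, window: timedelta):
        self.window = window
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0.0

    def _evict(self, now: datetime):
        # Drop events that have fallen out of the window
        while self.events and now - self.events[0][0] > self.window:
            _, amount = self.events.popleft()
            self.total -= amount

    def add(self, ts: datetime, amount: float):
        self._evict(ts)
        self.events.append((ts, amount))
        self.total += amount

    def features(self, now: datetime):
        self._evict(now)
        return {"count": len(self.events), "total_amount": self.total}
```

One window per horizon (hour, day, week) yields the count and amount features. Evicting on read as well as on write keeps the numbers correct for users who have gone quiet.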
Behavioural Features
- How different is this transaction from the user’s typical pattern?
- Is the device/IP new for this user?
- Is the merchant category new for this user?
- Has the user’s transaction velocity suddenly changed?
Network Features
- Is this merchant associated with other fraudulent transactions?
- Is this device linked to multiple accounts?
- Is this IP address associated with fraud across the platform?
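Network features need platform-wide state rather than per-user history. As a minimal in-memory sketch of the "device linked to multiple accounts" signal, the hypothetical `DeviceRegistry` below maps each device ID to the set of accounts seen on it. Its `account_count` method matches the name used in `compute_features`, but the class itself is an assumption.

```python
from collections import defaultdict


class DeviceRegistry:
    """Hypothetical in-memory map of device ID -> accounts that have used it."""

    def __init__(self):
        self._accounts = defaultdict(set)

    def record(self, device_id: str, account_id: str) -> None:
        self._accounts[device_id].add(account_id)

    def account_count(self, device_id: str) -> int:
        # .get avoids creating empty entries for devices never seen before
        return len(self._accounts.get(device_id, ()))
```

The same pattern extends to IP addresses and merchants for the other two network signals.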
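```python
import statistics

def z_score(value, history):
    # How many standard deviations `value` sits from the user's historical mean
    if len(history) < 2 or statistics.stdev(history) == 0:
        return 0.0  # too little history, or no spread, to measure deviation
    return (value - statistics.mean(history)) / statistics.stdev(history)

# user_history and device_registry are assumed to be looked up from the feature store
def compute_features(transaction, user_history, device_registry):
    return {
        'amount': transaction.amount,
        'amount_zscore': z_score(transaction.amount, user_history.amounts),
        'txn_count_1h': user_history.count(hours=1),
        'txn_count_24h': user_history.count(hours=24),
        'unique_merchants_7d': user_history.unique_merchants(days=7),
        'is_new_device': transaction.device_id not in user_history.devices,
        'is_new_merchant_category': transaction.mcc not in user_history.mccs,
        'time_since_last_txn': transaction.time - user_history.last_txn_time,
        'device_account_count': device_registry.account_count(transaction.device_id),
    }
```
Model Selection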
For real-time fraud detection, the model needs to be:
- Fast (inference in milliseconds)
- Accurate (high precision and recall)
- Interpretable (you need to explain why a transaction was blocked)
Gradient Boosted Trees (XGBoost, LightGBM) are the industry standard for tabular fraud detection. They’re fast, handle mixed feature types well, and provide feature importance scores.
Neural networks work for large-scale systems with complex patterns but sacrifice interpretability.
Ensemble models combine multiple models for better performance. For example, a gradient boosted tree can produce the initial score while a neural network handles the edge cases.
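One way to wire such an ensemble is a cascade: the fast tree model scores everything, and only transactions whose score falls in an uncertain band go to the slower neural network. A minimal sketch; the band limits and the two model callables are hypothetical stand-ins, not any specific library's API.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Model = Callable[[Features], float]  # returns a fraud probability in [0, 1]


def cascade_score(features: Features, tree_model: Model, nn_model: Model,
                  low: float = 0.2, high: float = 0.8) -> float:
    """Score with the tree model; defer to the neural net only for edge cases."""
    score = tree_model(features)
    if low <= score <= high:
        # Uncertain band: pay the extra latency of the second model
        return nn_model(features)
    return score
```

If most transactions score clearly low or clearly high, the expensive model runs on only a small share of traffic, which helps keep inference within the millisecond budget.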
The Imbalanced Data Problem
Fraud is rare: typically 0.1% to 1% of transactions. A model that predicts “not fraud” for everything would be 99% to 99.9% accurate and completely useless.
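The arithmetic is easy to check. A short sketch with an illustrative batch of 1,000 transactions containing 10 frauds (a 1% fraud rate):

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def recall(preds, labels):
    # Share of actual fraud (label 1) that the model caught
    caught = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    return caught / sum(labels)


labels = [1] * 10 + [0] * 990    # 1% fraud
always_legit = [0] * len(labels)  # predicts "not fraud" for everything

print(accuracy(always_legit, labels))  # 0.99
print(recall(always_legit, labels))    # 0.0
```

This is why fraud models are judged on precision and recall rather than raw accuracy.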
Solutions:
- Oversampling: SMOTE or similar techniques to create synthetic fraud examples
- Undersampling: Reduce the non-fraud majority class
- Cost-sensitive learning: Penalise missed fraud more heavily than false positives
- Anomaly detection: Train only on legitimate transactions and flag anomalies
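Cost-sensitive learning is often the simplest of these to apply, because it leaves the data untouched and only reweights errors. A minimal sketch of a weighted binary log loss; defaulting the fraud weight to the negative/positive ratio is a common heuristic (XGBoost exposes the same idea through its `scale_pos_weight` parameter), and the toy numbers are illustrative.

```python
import math


def weighted_log_loss(probs, labels, fraud_weight=None):
    """Binary log loss where each fraud example (label 1) counts fraud_weight times."""
    if fraud_weight is None:
        # Heuristic: weight fraud by the ratio of legitimate to fraudulent examples
        positives = sum(labels)
        fraud_weight = (len(labels) - positives) / max(positives, 1)
    eps = 1e-12  # guard against log(0)
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        w = fraud_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


labels = [1, 0, 0, 0]
missed_fraud = [0.1, 0.1, 0.1, 0.1]    # fraud scored low
false_positive = [0.9, 0.1, 0.1, 0.9]  # fraud caught, one legit flagged
```

In this toy example, missing the fraud and flagging one legitimate transaction cost the same with equal weights (`fraud_weight=1.0`); with the default ratio weight of 3, the miss costs noticeably more.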
Real-Time Scoring
The model must score transactions in real time, ideally in under 100 ms. This requires:
- Feature store: Pre-computed features stored in a low-latency database (Redis, DynamoDB). User history features are updated after each transaction.
- Model serving: The trained model is deployed as a microservice. Incoming transactions are enriched with features and scored.
- Decision engine: The model’s score is combined with business rules to make a final decision — approve, decline, or step-up (require additional authentication).
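The decision engine can be sketched as a small function that applies hard business rules first and then maps the model score onto the three outcomes. The thresholds, the `Decision` enum and the large-payment rule are assumptions for illustration; real thresholds are tuned against the cost of fraud versus the cost of customer friction.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    STEP_UP = "step_up"  # require additional authentication
    DECLINE = "decline"


def decide(score: float, amount: float, merchant_blocked: bool,
           step_up_at: float = 0.5, decline_at: float = 0.9) -> Decision:
    """Combine a model fraud score (0 to 1) with business rules."""
    # Hard business rules override the model
    if merchant_blocked:
        return Decision.DECLINE
    if score >= decline_at:
        return Decision.DECLINE
    # Large payments get extra friction at a lower score (illustrative rule)
    if score >= step_up_at or (amount > 10_000 and score >= step_up_at / 2):
        return Decision.STEP_UP
    return Decision.APPROVE
```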
Monitoring and Feedback Loops
A fraud model degrades over time as fraud patterns change. You need:
- Model monitoring: Track precision, recall, and false positive rates in production. Alert when performance degrades.
- Feedback loops: When fraud is confirmed (chargebacks, user reports), feed that data back into the training pipeline. The model learns from its mistakes.
- Champion-challenger testing: Run a new model alongside the existing one in shadow mode. Compare performance before swapping.
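Monitoring and champion-challenger testing rest on the same computation: precision, recall and false positive rate over transactions whose outcomes are now known. A minimal sketch; the alert thresholds and toy labels are illustrative assumptions, and because confirmed labels can arrive weeks later (chargebacks), these windows are evaluated in arrears.

```python
def confusion_metrics(preds, labels):
    """Precision, recall and false positive rate for binary predictions (1 = fraud)."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


def should_alert(metrics, min_recall=0.7, max_fpr=0.01):
    # Illustrative thresholds; tune to your own fraud and friction costs
    return metrics["recall"] < min_recall or metrics["fpr"] > max_fpr


# Shadow mode: both models score the same traffic, only the champion's decisions act
labels     = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
champion   = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
challenger = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

for name, preds in [("champion", champion), ("challenger", challenger)]:
    m = confusion_metrics(preds, labels)
    print(name, m, "ALERT" if should_alert(m) else "ok")
```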
Explainability
Regulators and customers want to know why a transaction was blocked. “The model said so” isn’t acceptable.
Use SHAP values or feature importance scores to explain individual decisions:
- “Transaction blocked because: unusual amount ($5,000 vs. typical $50), new device, velocity spike (5 transactions in 10 minutes vs. typical 2 per day)”
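Turning per-feature attributions into a message like the one above is mostly formatting. The sketch below assumes the attributions have already been computed (for example with SHAP's `TreeExplainer`) and arrive as a dict of feature name to contribution, where positive values push towards fraud. The `REASON_TEMPLATES` wording is hypothetical.

```python
REASON_TEMPLATES = {  # hypothetical feature -> human-readable reason
    "amount_zscore": "unusual amount",
    "is_new_device": "new device",
    "txn_count_1h": "velocity spike",
}


def explain(contributions, top_k=3):
    """Describe the top_k features that pushed the score towards fraud."""
    risky = [(name, value) for name, value in contributions.items() if value > 0]
    risky.sort(key=lambda item: item[1], reverse=True)
    reasons = [REASON_TEMPLATES.get(name, name) for name, _ in risky[:top_k]]
    return "Transaction blocked because: " + ", ".join(reasons)
```

Features with negative contributions are dropped, since they pull the score towards legitimate and do not explain a block.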
The Human Layer
ML doesn’t replace human fraud analysts — it augments them. The model handles the volume (scoring every transaction in real-time). Humans handle the edge cases, investigate confirmed fraud, and identify new fraud patterns that inform the next model iteration.
The best fraud detection systems combine ML speed with human judgment.
Myles Ndlovu builds algorithmic trading engines, crypto platforms, and payment infrastructure for emerging markets.