International Money Flow
RESEARCH · v1.0 · 2026-04-22

Could your choice of payment-fraud telemetry be harming your detection?

Most fraud-detection programmes report on the wrong things. Alert volume and mean time to disposition measure the work that the platform produces, not the work that catches an attacker. This paper looks at four metrics we routinely see harming fraud detection in the institutions we work with, and proposes four replacements.

Tags: detection-engineering · fraud · metrics · SOC
Published: 2026-04-22 · Status: PUBLISHED · Version: 1.0
Author: IMF Research — Detection engineering practice
Reading time: 7 min

Fraud-detection programmes, like every other SOC, are measured. The question is whether they are measured on the things that change the outcome for the customer. In our engagements with mid-sized payment processors and corporate-banking platforms over the past eighteen months, we have repeatedly seen the same set of metrics being reported to executives — and the same set of attacks getting through. The two observations are connected.

This paper is not an argument that metrics are bad. It is an argument that the wrong metrics are worse than no metrics, because they manufacture a confidence which displaces the work that detection actually requires.

Why metrics matter more in fraud than in general SOC work

Fraud detection sits in a particular position. Unlike in a generic security operations centre, the operating model is adversarial in real time: a competent fraud ring iterates against your detections during the working day, observes which patterns get challenged, and adapts. Unlike with anti-fraud at a credit-card scheme, the loss event is on your books, not the network’s. And unlike in most other SOC functions, the regulator inspects the analytics directly.

The programme is therefore measured by three audiences who want different things:

  • The board wants assurance that losses are bounded.
  • The regulator wants evidence that controls are reasonable and adequately calibrated.
  • The line-of-business wants the friction kept off the legitimate customer.

Most fraud-metric dashboards we see resolve these three pressures by showing volume statistics — alerts raised, alerts cleared, false positives suppressed, mean time to disposition. Each of those is a quantity the platform can produce. None of them is a quantity that moves the outcome.

Four metrics we see harming fraud detection

Alert volume per analyst

The default metric on every commercial platform we have evaluated. Every fraud team reports it because every fraud platform produces it.

The problem is the incentive: a high alert volume looks productive, because more cases are being closed per analyst per shift, but it correlates strongly with poorly tuned detections that auto-close the obviously legitimate transactions and quietly defer the cognitively expensive ones. A queue of 200 manageable alerts per shift masks the five alerts the analyst could have caught had they been the only thing on screen. We have walked into engagements where the team’s “case throughput” was being celebrated as a year-on-year improvement while the fraud-loss line on the same dashboard was trending the wrong way.

Mean time to disposition

Mean time to disposition, MTTD, is reported on most fraud dashboards we see. It measures the clock from alert raised to alert closed. As a service-level indicator it is fine — long pending queues are bad. As a detection-quality indicator it is dangerous, because the fastest way to bring MTTD down is to teach the model to suppress more cases earlier, and that suppression removes the harder cases at the same rate as the easier ones.

A fraud team that has driven MTTD down to under six minutes has, in our experience, almost always driven recall on multi-leg attacks down with it. The clock rewards the same automation that the adversary is counting on you to deploy.
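
A toy illustration of the arithmetic, with invented numbers: closing the easy cases faster pulls the mean down while the multi-leg cases that actually carry the loss are untouched.

```python
from statistics import mean

# Toy queue for one shift, disposition times in minutes (invented numbers).
easy = [3] * 180          # auto-closed or near-instant dispositions
hard = [90] * 20          # multi-leg cases that need an analyst
print(mean(easy + hard))  # 11.7 minutes

# "Improve" MTTD by suppressing the easy cases even earlier.
more_suppression = [1] * 180
print(mean(more_suppression + hard))  # 9.9 minutes, and the hard cases gained nothing
```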

False-positive rate

Every reviewer asks for it; every dashboard shows it. The implicit model is that low false-positive rates are good and high ones are bad.

That model treats every false positive as equivalent. In practice they are not: a false positive on a customer who has the time and disposition to call your contact-centre is not the same as a false positive on a corporate treasurer whose payment file has a settlement deadline. A blunt FP-rate target almost always pushes detection toward the population of customers who can least afford the friction, because that is where the cheapest reductions live. The metric, on its own, is regressive.

Total alerts cleared, year on year

This is the metric we see most often as a board-level KPI, typically on a slide titled “fraud programme effectiveness.” It is the one that does the most damage. Year-on-year alert volume tracks the rate at which the platform classifies things, not the rate at which the customer loses money. It is a measure of supply, not of outcome.

A fraud platform whose detection model has become marginally more sensitive year-on-year will close more alerts. A fraud platform whose analysts have become marginally less rigorous year-on-year will also close more alerts. The metric does not distinguish.

THE FOUR METRICS, AS WE SEE THEM

  • 4 of 4: engagements in 2026 reporting at least three of these as primary KPIs
  • 0: engagements where we believe these four together correlate with outcome
  • 6 weeks: typical lead time from raising the issue to a measurable shift in dashboard composition
  • ~12 mo.: observed lag from a worsened dashboard to a worsened loss line

Four replacements

We are not proposing to reshuffle someone else’s KPI deck on a slide; the four substitutions below are the ones we propose at discovery, refine during scope, and ship as deliverables.

Time-to-detect for a known adversarial pattern

Pick a small panel of attacks that are known to occur in your sector — invoice-redirection, mule-account onboarding, business-email compromise, instant-payment lift — and instrument the time between the first analytically-detectable event and the alert reaching an analyst. The metric is reported per pattern, not in aggregate. It incentivises detection latency on the patterns that matter, and it exposes detections that are present-but-late, which volume metrics hide.
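
As a sketch only, assuming the platform can export per-alert records carrying a pattern label, the first-signal timestamp and the time the alert reached the analyst queue (the field names below are ours, not any vendor’s):

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Illustrative records: one row per alert on the panel of known patterns.
alerts = [
    {"pattern": "invoice-redirection",
     "first_signal": datetime(2026, 3, 2, 9, 14),
     "reached_analyst": datetime(2026, 3, 2, 11, 40)},
    {"pattern": "mule-account-onboarding",
     "first_signal": datetime(2026, 3, 3, 8, 5),
     "reached_analyst": datetime(2026, 3, 3, 8, 22)},
]

def time_to_detect_by_pattern(alerts):
    """Report detection latency per pattern, never in aggregate."""
    latencies = defaultdict(list)
    for a in alerts:
        delta = a["reached_analyst"] - a["first_signal"]
        latencies[a["pattern"]].append(delta.total_seconds() / 60)
    return {p: {"n": len(v), "median_minutes": median(v)}
            for p, v in latencies.items()}

print(time_to_detect_by_pattern(alerts))
```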

False-positive cost ratio

Replace flat false-positive rate with a weighted false-positive cost that recognises which customers can absorb friction. The weighting needs to be made explicit and signed off by the line-of-business because the choice is a business decision; calibrate it once a quarter, hold it constant in between. The metric pushes detection toward suppressing FPs where they hurt customers, and accepting them where the customer has the means and time to clear them.
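
A minimal sketch of the weighting, with illustrative segment names and weights standing in for whatever the line-of-business actually signs off:

```python
# Illustrative weights: a flat FP rate treats every false positive as equal,
# a cost-weighted version does not. The real values are a quarterly,
# line-of-business decision, held constant in between.
SEGMENT_WEIGHTS = {
    "retail-low-urgency": 1.0,     # customer can absorb the friction
    "retail-vulnerable": 4.0,
    "corporate-treasury": 10.0,    # settlement-deadline payment files
}

def fp_cost_ratio(dispositions):
    """Weighted false-positive cost per confirmed fraud case."""
    fp_cost = sum(SEGMENT_WEIGHTS[d["segment"]]
                  for d in dispositions if d["outcome"] == "false_positive")
    tp_count = sum(1 for d in dispositions if d["outcome"] == "fraud_confirmed")
    return fp_cost / max(tp_count, 1)

dispositions = [
    {"segment": "retail-low-urgency", "outcome": "false_positive"},
    {"segment": "corporate-treasury", "outcome": "false_positive"},
    {"segment": "retail-low-urgency", "outcome": "fraud_confirmed"},
]
print(fp_cost_ratio(dispositions))  # 11.0: dominated by the corporate-treasury FP
```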

Suppressed-alert lineage

Instrument and report on which detections are silently auto-closing the most. Almost every platform has a layer of model-driven suppression that compresses the queue; almost no team reviews it weekly. Treat it as a first-class metric — what fraction of would-be alerts are being suppressed by which sub-model, and on which populations.
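
A sketch of the lineage report, assuming the platform writes a suppression log with a sub-model identifier and a customer-population label (field names are ours):

```python
from collections import Counter

# Illustrative suppression log entries and the alert count for the same window.
suppressions = [
    {"sub_model": "velocity-v3", "population": "retail"},
    {"sub_model": "velocity-v3", "population": "retail"},
    {"sub_model": "device-trust-v1", "population": "corporate"},
]
raised_to_analyst = 40  # alerts that actually reached the queue in the window

def suppression_lineage(suppressions, raised_to_analyst):
    """What fraction of would-be alerts was suppressed, by which sub-model, on which population."""
    by_source = Counter((s["sub_model"], s["population"]) for s in suppressions)
    total_would_be = len(suppressions) + raised_to_analyst
    return {
        "suppressed_fraction": len(suppressions) / total_would_be,
        "by_sub_model_and_population": dict(by_source),
    }

print(suppression_lineage(suppressions, raised_to_analyst))
```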

Time-to-revert on a recently-shipped detection

Detection content ships and breaks. The number that matters for operational quality is how long it takes to roll back a detection that has produced a customer-facing issue. Sub-day is good; sub-shift is the goal. If the best you can claim is sub-week, nobody owns the detection lifecycle.
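
A sketch of the measurement, assuming ship and rollback events are timestamped in the detection-content change log and tied to the same detection identifier (event and field names are ours):

```python
from datetime import datetime

# Illustrative change-log entries for one detection.
events = [
    {"detection": "rule-7412", "event": "shipped",     "at": datetime(2026, 4, 1, 10, 0)},
    {"detection": "rule-7412", "event": "rolled_back", "at": datetime(2026, 4, 2, 9, 0)},
]

def time_to_revert(events, detection):
    """Elapsed time from first ship to first rollback for one detection."""
    shipped = min(e["at"] for e in events
                  if e["detection"] == detection and e["event"] == "shipped")
    reverted = min(e["at"] for e in events
                   if e["detection"] == detection and e["event"] == "rolled_back")
    return reverted - shipped

print(time_to_revert(events, "rule-7412"))  # 23:00:00, sub-day but not sub-shift
```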

How to make the change without breaking the dashboard

Replacing executive metrics is hard because the existing ones are load-bearing for assurance. We do not advise turning them off. We do advise running both sets in parallel for a quarter, with the new metrics reported alongside the old ones and the relationship between them annotated. After a quarter the executive audience will have learned which metrics correlate with the loss line they actually care about and which do not. The vestigial ones quietly drop off the next iteration of the deck.

Two operational notes:

  • If the platform vendor cannot expose the data the new metrics require, treat it as a procurement issue. Vendor lock-in around metric production is the most common reason institutions stay on metrics they know to be inadequate.
  • Calibrate the false-positive cost ratio with an actual line-of-business signoff. Without it the weighting will drift to the analyst’s intuition, which usually re-introduces the bias the metric was meant to remove.

What to do next week

If your dashboard reports any of the four metrics in the first list as a primary detection-quality indicator, the action this week is to nominate someone — usually the senior detection engineer, not the fraud manager — to draft the replacement. The draft should fit on a single page and define the metric, its data source, the cadence of reporting, and the line-of-business sign-off needed for any thresholds.
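
One way to keep the draft to a single page is to force it into a fixed structure. The sketch below uses a small Python record; every field value is illustrative.

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    name: str
    definition: str
    data_source: str
    reporting_cadence: str
    threshold_signoff: str  # which line-of-business owner signs off thresholds

# Illustrative draft for one of the replacement metrics.
draft = MetricDefinition(
    name="Time-to-detect: invoice-redirection",
    definition="Median minutes from first analytically-detectable event "
               "to the alert reaching an analyst queue, reported per pattern",
    data_source="case-management export joined to the payment-event log",
    reporting_cadence="weekly, reviewed at the quarterly metrics review",
    threshold_signoff="named line-of-business owner (illustrative)",
)
```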

The substitution itself should land in the next quarterly review. The follow-on work — getting the suppressed-alert lineage piped, getting the cost-ratio weighting agreed in writing — is a six-month programme, not a sprint.
