According to the FTC, US consumers reported losing $10 billion to fraud in 2023, a 14% increase over 2022. With fraud attacks increasing yearly, companies must deploy real-time fraud detection systems to protect their customers and their assets.
But standard data architectures are not ideal for fraud detection. Traditional batch architectures delay fraud determinations, yet fraud detection needs to happen while the fraud is in progress. Anti-fraud measures that take hours, or even minutes, allow fraudsters to escape with their loot.
Most companies capture the data they need to detect fraud, including user behavior and account activity, in company databases. The challenge is transforming this data in a split second.
A traditional data warehouse, with SQL support and the ability to ingest diverse data sources, seems like a potential platform to power anti-fraud services.
But traditional data warehouses are designed around batch-loading and caching. They are optimized for analytics and historical reporting. Fraud detection requires the continuous transformation of real-time data, a task that is expensive and difficult for traditional data warehouses.
With the rise of operational data warehouses, companies can now build cost-effective, real-time fraud detection systems. Operational data warehouses combine streaming data, SQL support, and continuous data transformation to calculate fraud scores in real time, stopping fraudsters in their tracks.
Materialize is an operational data warehouse that fuels real-time fraud detection systems for many of our customers in the financial services sector.
After working with several leading data teams to build these streaming anti-fraud systems from scratch, we want to share what we’ve learned about reference architectures for real-time fraud detection.
In this blog, we’ll explain the different roles of analytical and operational data warehouses in building real-time fraud detection systems.
Fraud Detection: Accuracy vs. Latency
Effective fraud detection depends on two critical factors: accuracy and latency. Fraud detection workflows must predict fraud accurately in order to stop bad actors, without disrupting real customers. And fraud detection must achieve low latency in order to detect fraudulent activity in time to stop it.
Accuracy is essential not just to stop fraud, but to avoid disrupting legitimate customers: missed fraud and wrongly blocked transactions both cut into profit margins. Companies will never detect fraud with absolute accuracy. However, they can assign a well-refined, probabilistic fraud score to each transaction and apply automated deterrence actions when the score passes certain thresholds.
Companies can refine fraud score criteria over time as more fraud data is verified. SQL remains a popular choice for programming fraud scores because it expresses business logic clearly and excels at manipulating data. That’s why companies turn to data warehouses to power fraud detection.
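To make the threshold idea concrete, here is a minimal Python sketch that maps a fraud score to an automated action. The threshold values, the `Transaction` type, and the action names are all hypothetical; in practice, teams would tune thresholds against verified fraud data, and the score itself would come from their SQL logic.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would be tuned against verified fraud data.
REVIEW_THRESHOLD = 0.6
BLOCK_THRESHOLD = 0.9


@dataclass
class Transaction:
    txn_id: str
    fraud_score: float  # probabilistic score in [0, 1]


def deterrence_action(txn: Transaction) -> str:
    """Map a transaction's fraud score to a hypothetical automated action."""
    if not 0.0 <= txn.fraud_score <= 1.0:
        raise ValueError(f"fraud score out of range: {txn.fraud_score}")
    if txn.fraud_score >= BLOCK_THRESHOLD:
        return "block"
    if txn.fraud_score >= REVIEW_THRESHOLD:
        return "step_up_verification"  # e.g. ask the customer to confirm
    return "approve"


if __name__ == "__main__":
    for txn in [Transaction("t1", 0.12), Transaction("t2", 0.71), Transaction("t3", 0.95)]:
        print(txn.txn_id, deterrence_action(txn))
```

Using two thresholds lets a team reserve hard blocks for high-confidence cases while routing borderline scores to lighter-touch checks, which limits disruption to legitimate customers.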
Data warehouses can ingest, join, and transform large volumes of data. Teams use data warehouses to amalgamate fraud signals from different sources, including product sources and business systems. They restructure this data via SQL queries into business outputs for fraud workflows.
Typically, companies leverage traditional data warehouses — or ‘analytical’ data warehouses — to perform fraud detection. And when it comes to accuracy, analytical data warehouses are viable options.
Teams can analyze historical fraud data with analytical data warehouses. They can use this historical data to develop SQL logic that detects fraudulent activity. But because analytical data warehouses harness historical data, they can only detect fraud after it happens, rather than during the act.
In other words, you can use an analytical data warehouse to build SQL logic for detecting fraud. And this SQL logic can accurately identify fraudulent activity. But the data itself is hours or days old.
In terms of actually stopping fraud, analytical data warehouses have limited use. Fraud detection needs to occur within seconds in order to be effective. Otherwise, fraudsters can easily escape with their ill-gotten gains.
Thus, the problem with analytical data warehouses is not one of accuracy, but one of high latency.
The Cost of Latency for Analytical Data Warehouses
This problem of high latency is built into the way analytical data warehouses are designed.
Analytical data warehouses rely on batch processing. Data is processed in batches at set intervals rather than in real time. Queries also run at intervals, perhaps a few times a day at most.
By the time the data is queried, it’s out of date, and the window for acting on it has closed. For operational use cases such as fraud detection, this delay is unacceptable. Querying batched data every few hours is not sufficient when the window for stopping fraudsters is measured in seconds.
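A rough way to see why batch intervals matter is to bound how stale the data can be when a query finally reads it. The Python sketch below assumes that data loads and scheduled queries run on independent, fixed intervals; the function name and the example schedule are hypothetical.

```python
def worst_case_staleness_s(load_interval_s: float, query_interval_s: float,
                           processing_s: float = 0.0) -> float:
    """Approximate upper bound on an event's age when a scheduled query first reflects it.

    Assumes loads and queries run on independent fixed schedules: an event that
    arrives just after a load waits up to one load interval to be loaded, then up
    to one query interval for the next scheduled query, plus processing time.
    """
    return load_interval_s + query_interval_s + processing_s


HOUR = 3_600
# Hypothetical schedule: hourly batch loads, fraud queries every four hours.
print(worst_case_staleness_s(HOUR, 4 * HOUR) / HOUR)  # 5.0 hours
```

Under that schedule, a transaction can be up to five hours old before any fraud query sees it, far outside a window measured in seconds.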
However, cloud-native data warehouses are still in many ways ideal for the fraud detection use case. The ability to combine large volumes of disparate data sources, and utilize SQL for logic, is an attractive option for data teams. In fact, some teams are willing to push traditional data warehouses to their limits to keep this convenient architecture.
Teams can develop their SQL-powered fraud scores on analytical data warehouses. And by natural extension, they do try to use their analytical data warehouses for real-time fraud detection.
While it’s not impossible to implement fraud detection on an analytical data warehouse, it’s far from optimal. Analytical warehouses are designed around batch transfer and caching for existing queries.
This design makes sense if your data doesn’t change very often. Query results are stored in memory and cached as long as possible so that similar queries can reuse them. Since queries are infrequent, the database can maintain consistency with simple table-locking mechanisms.
However, this design is cumbersome for operational workloads such as fraud detection. Computational limits on large batches obstruct data freshness, and cached query results are not helpful when new data is constantly loaded.
This pushes analytical data warehouses to their technical limits. As more data is queried, computation takes longer, and anti-fraud workflows slow down. Every few seconds of added response time can translate into thousands of dollars in losses.
This approach is also much more expensive in terms of compute resources, because rapidly re-running queries demands excessive computation. With an analytical data warehouse, the pricing model is pay-per-query, so cost is tied to data freshness: the more often you re-run a query, the more you pay. For operational use cases such as fraud detection, which require continuous query execution, costs skyrocket.
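The link between freshness and cost under pay-per-query pricing comes down to simple arithmetic. This Python sketch assumes every refresh re-runs the full query at the same hypothetical cost per run; actual pricing varies by vendor and workload, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def runs_per_day(refresh_interval_s: int) -> int:
    """Number of times a query must re-run per day to keep results this fresh."""
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    return SECONDS_PER_DAY // refresh_interval_s


def daily_query_cost(refresh_interval_s: int, cost_per_run: float) -> float:
    """Daily spend if each refresh re-executes the full query at a flat (hypothetical) cost."""
    return runs_per_day(refresh_interval_s) * cost_per_run


hourly = runs_per_day(3_600)      # 24 runs per day
near_real_time = runs_per_day(5)  # 17,280 runs per day
print(near_real_time // hourly)   # 720
```

Whatever the per-run price, moving from hourly refreshes to 5-second refreshes multiplies the number of full query executions, and therefore the bill, by 720.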
With these limitations, teams soon realize that while traditional data warehouses can serve as testing grounds for SQL, they cannot operationalize real-time fraud detection. At least, not with the latency that the use case requires. And so, they’re left with accurate but out-of-date fraud scores.
But what if teams could combine the ease and power of a data warehouse with this elusive low latency?
That would allow teams to engage in effective real-time fraud detection directly from their data warehouse. And this is not a thought experiment: teams are accomplishing this right now with operational data warehouses.
Operational Data Warehouse: Streaming Data + SQL Support + Continuous Transformation
Operational data warehouses combine streaming, real-time data with continuous data transformation to power essential business operations, including fraud detection.
Operational data warehouses (ODWs) leverage streaming data to enable use cases that require low latency. ODWs process data continuously and incrementally, so results are updated as the underlying data changes, rather than recomputed all at once in a batch job.
To power real-time use cases, operational data warehouses continuously transform streams of raw data into actionable outputs. ODWs allow you to execute SQL queries on fresh data continuously.
This combination of streaming data and continuous transformation makes operational data warehouses ideal for fraud detection.
Real-time data ensures that fraud signals are always up to date. ODWs receive these signals as they occur, so you can act on fraudulent activity in real time.
Operational data warehouses also let you continuously transform this fresh data, reshaping it into usable inputs for your anti-fraud workflows every few seconds rather than every few minutes or hours.
While traditional data warehouses detect fraud hours after it occurs, operational data warehouses combine streaming data and continuous transformation to detect fraud almost instantly. This lets teams operationalize their SQL logic for fraud detection in real time, so they can stop fraud as it occurs rather than identify it after the fact.
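Continuous transformation means a derived signal is updated as each event arrives rather than rebuilt on a schedule. As a minimal illustration in Python, the sketch below maintains one hypothetical fraud signal: the number of transactions per account in a trailing 60-second window. It is not Materialize’s API; in an operational data warehouse, the same signal would be written as SQL.

```python
from collections import defaultdict, deque


class VelocityTracker:
    """Maintain a hypothetical fraud signal: transactions per account in a trailing window.

    The count is updated as each event arrives instead of being recomputed by a
    scheduled batch query. Assumes each account's events arrive in timestamp order.
    """

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._timestamps: defaultdict[str, deque[float]] = defaultdict(deque)

    def observe(self, account_id: str, ts: float) -> int:
        """Record one transaction and return the account's count in (ts - window_s, ts]."""
        events = self._timestamps[account_id]
        events.append(ts)
        # Evict events that have slid out of the trailing window.
        while events and events[0] <= ts - self.window_s:
            events.popleft()
        return len(events)


tracker = VelocityTracker(window_s=60.0)
for ts in [0, 10, 30, 65, 100]:
    print(ts, tracker.observe("acct-42", ts))
```

A sudden spike in this count, such as one card used many times within a minute, is the kind of input that could feed into a transaction’s fraud score.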
The cost of operationalizing fraud detection is high for analytical data warehouses. Constantly re-running anti-fraud queries is expensive in a pay-per-query pricing model. But with operational data warehouses, price is not tied to query execution.
Instead, Materialize avoids constant query recomputation. By maintaining views incrementally, Materialize decouples compute cost from query execution. Materialize uses materialized views and indexes to provide up-to-date query outputs at a fraction of the cost.
Rather than re-running the query from scratch, Materialize updates only the results that have changed. This keeps the query output fresh while keeping costs down considerably. Materialize harnesses Timely Dataflow, a low-latency computation model, to perform efficient and correct incremental computation.
By updating query results rapidly, Materialize allows teams to continuously transform data for fraud workflows and detect fraud in real time. This enables them to stop fraudulent activity as it happens, without the cost or technical limitations of traditional data warehouses.
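The core idea of incremental view maintenance can be illustrated with a toy example. The Python sketch below maintains the equivalent of a per-account `SUM(amount)` grouped by account from a stream of `(row, diff)` changes, and checks it against full recomputation. It is only an illustration of the concept under simplifying assumptions (amounts in integer cents, a well-formed change stream); it is not how Materialize or Timely Dataflow are implemented, and all names are hypothetical.

```python
def recompute_totals(rows: list[tuple[str, int]]) -> dict[str, int]:
    """Batch approach: rescan every row to rebuild SUM(amount_cents) per account."""
    totals: dict[str, int] = {}
    for account_id, amount_cents in rows:
        totals[account_id] = totals.get(account_id, 0) + amount_cents
    return totals


class IncrementalTotals:
    """Incremental approach: apply only the changes to keep per-account sums current.

    Each change is (account_id, amount_cents, diff), with diff=+1 for an inserted
    row and diff=-1 for a deleted row. Assumes deletes only target existing rows.
    """

    def __init__(self) -> None:
        self.totals: dict[str, int] = {}
        self._row_counts: dict[str, int] = {}

    def apply(self, changes: list[tuple[str, int, int]]) -> None:
        for account_id, amount_cents, diff in changes:
            count = self._row_counts.get(account_id, 0) + diff
            if count == 0:
                # The account's last row was deleted, so its group disappears (as in SQL).
                self._row_counts.pop(account_id, None)
                self.totals.pop(account_id, None)
            else:
                self._row_counts[account_id] = count
                self.totals[account_id] = self.totals.get(account_id, 0) + amount_cents * diff


rows = [("a", 500), ("b", 1_200), ("a", 300)]
view = IncrementalTotals()
view.apply([(acct, amount, +1) for acct, amount in rows])

# New activity: one insert for "b" and one delete for "a".
view.apply([("b", 250, +1), ("a", 500, -1)])
rows = [("b", 1_200), ("a", 300), ("b", 250)]
assert view.totals == recompute_totals(rows)  # same answer as a full rescan
```

The incremental path touches only the accounts named in each change, so its work grows with the volume of changes rather than the size of the table, while the batch path rescans every row on every refresh.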
White Paper: A Reference Architecture for Real-Time Fraud Detection
Now that you understand the role of data warehouses, download our free white paper to learn how to build a reference architecture for real-time fraud detection.
See a full walkthrough of how our customer, Ramp, built a real-time fraud detection system for its corporate card product.