Data intensive applications often have complex questions they need to ask about the current state of the world, be it tracking packages, buying tickets, or fetching account balances. These queries impose load on the primary database, and when that load is too high we often turn to caching results in a serving layer like Redis. 

These cached results become stale, and can very quickly become useless unless refreshed. When should they be refreshed, and how? Materialize’s SUBSCRIBE command provides a live changelog of query results, reporting every change that occurs, in sync with transactional commits to the upstream primary. It solves the cache invalidation problem by proactively correcting (not invalidating) your caches the moment it sees a change in what they should store.

The Cache Invalidation Dilemma

Transactional databases are designed to efficiently process concurrent reads and writes over indexed data. But for read queries that process a lot of data and perform a lot of computation, they are not the right fit. These data intensive workloads can hog resources and slow down all the concurrent queries on the system. To solve this problem, developers often turn to caching. A cache allows for these queries to be run fewer times, or even just once. The result is saved in the cache, which can comfortably handle highly concurrent reads of the result.

There is a catch: as the underlying data continues to change, the cached result may or may not become invalid due to those changes. There are two problems: first, detecting when the result is invalidated, and thus minimizing the amount of queries that are served a stale result. The second problem is efficiently recomputing the results.

This problem is ubiquitous in building data platforms, and often called “one of the two hard things in computer science”. Today, we are going to walk you through how Materialize, in conjunction with Redis, can help you design a data architecture that efficiently and correctly, solves the cache invalidation problem for the set of data intensive workloads that you can ergonomically express in SQL.

You might already be familiar with using Redis as a caching layer. We will focus on how Materialize simplifies Redis cache management in complex query environments. We’ll develop an integration that keeps your application fast, consistent, and reliable, even when dealing with the most demanding queries.

How Materialize and Redis Work Together

Materialize’s core strength lies in incremental view maintenance, moving work from the read path (when you issue queries) to the write path (when you update your data). Rather than doing heavy work every time you read, it spreads out the effort, doing small, incremental updates on every write. Even for demanding data intensive workloads. This allows complex queries involving multiple joins or aggregations stay up to date without reprocessing all of the data.

Views in Materialize are live, always up to date, and you can run any SQL query on them. However, for the highest concurrency reads, or to fit into your existing caching strategy, you might want to store those results in Redis. For that, Materialize’s “SUBSCRIBE” command allows you to monitor a query or view and receive a continuous stream of updates as soon as results change. No need to guess or reprocess entire datasets — just update systems with the relevant changes. 

For example, let’s say a financial institution wants to track the current value of a customer’s portfolio, derived from stock prices, bond values, and cash holdings. When stock prices fluctuate, the corresponding data in the cache should change immediately to mirror the new stock prices.

Use Case Example: SQL Code

sql
SUBSCRIBE (
   SELECT 
          p.customer_id, 
          SUM(s.quantity * m.price) AS portfolio_value
   FROM portfolios p
   JOIN securities s ON p.security_id = s.security_id
   JOIN market_data m ON s.security_id = m.security_id
   GROUP BY p.customer_id
)  WITH (PROGRESS) ENVELOPE UPSERT (KEY (customer_id));

By continuously streaming updates whenever the underlying data changes, the financial institution can ensure that each customer’s portfolio value in the cache remains accurate and up-to-date. This real-time synchronization enhances the user experience by providing instant access to the latest data without the need for manual refreshes or heavy computational loads.

But what makes SUBSCRIBE so effective for cache invalidation?

Let’s dive into how SUBSCRIBE makes it easy to update your cache:

  • Fresh: SUBSCRIBE delivers updates continuously, ensuring your cache is refreshed with minimal delay. This is crucial for applications where real-time data accuracy directly impacts user experience and performance. By pushing changes immediately, you avoid the overhead of periodic polling or manual refreshes.

  • Precise Invalidation: Unlike traditional approaches, Materialize not only tracks the specific rows that change but also sends updates only when those changes affect the query to which you’re subscribed. This eliminates the guesswork that often leads to redundant computations “just in case,” making it more efficient than basic CDC.

  • Transactional Atomicity: SUBSCRIBE communicates updates in bundles that align with transaction boundaries. If these bundles of updates are applied to the cache atomically, the cached results will always reflect a valid database state.

  • Native Postgres Client Compatibility: SUBSCRIBE behaves like an infinite cursor, continuously streaming updates via any standard Postgres client. This enables easy integration into existing architectures without the need for additional message brokers or complex middleware, reducing operational complexity.

  • Support for Complex Queries: Whether you’re performing multi-table joins, aggregations, or window functions, SUBSCRIBE ensures incremental updates to complex queries are accurately reflected in your cache. This is critical for use cases with complex data dependencies, maintaining consistency between your live system and cached results.

Introducing mz-redis-sync: A Real-Time Cache Synchronization integration

We think this pattern is so sensible, we bundled it up for you. We call it mz-redis-sync and it’s as simple as using SUBSCRIBE for your business logic, just as we’ve describe above. You can try it out here.

mz-redis-sync is a prebuilt integration that simplifies the connection between Materialize and Redis, delivering both transactional consistency and real-time updates to your Redis cache with minimal effort. It automatically runs a SUBSCRIBE on your materialized views and continuously streams updates to Redis as the underlying data changes.

This means that as soon as view results are updated, all changes from a single database transaction are applied atomically to Redis, preserving both the timeliness and integrity of your cache. By handling the complexities of cache consistency, mz-redis-sync allows developers to focus on building features, freeing them from the burden of managing data synchronization.

How It Stacks Up Against Other Caching Strategies

Cache-aside Pattern

In the cache-aside pattern, the application manually manages when to load data into the cache and when to invalidate it. This works well for simple queries, but when dealing with complex queries, it adds a lot of complexity to your application as you have to ensure that your application triggers an invalidation in every case where the data is changed. It also means that your data intensive computations are defined imperatively in your application codebase, causing a lot of lines of code that need to be maintained.

Materialize lets the user define their computation declaratively in standard SQL. Materialize also tracks changes to the underlying data and updates only what’s necessary using incremental view maintenance. SUBSCRIBE pushes those updates into Redis instantly. This means you get the benefits of the cache-aside pattern, but without the guesswork and manual effort—Materialize does it perfectly and automatically.

Write-through Caching and Dual Writes

Write-through caching works well for simple, raw data where each update maps directly to a cache entry. However, it falls short with complex queries involving multi-table joins, aggregations, or filters. These results don’t align neatly with individual writes, making it impractical to keep the cache updated without recalculating the entire query.

Additionally, write-through caching introduces the dual writes problem. Your application must write data to both the database and the cache separately, which can lead to race conditions and inconsistencies if one write succeeds while the other fails.

With SUBSCRIBE, you overcome both challenges. For complex queries, SUBSCRIBE provides precise incremental updates by telling you exactly what has changed in the underlying data. This eliminates the need to recompute entire results to keep the cache up to date. It also eliminates dual writes by streaming updates from the database to the cache, ensuring atomicity and consistency without extra write operations from your application.

Periodic Cache Refresh

Some systems rely on periodic cache refreshes, where the cache is updated at regular intervals. While simpler to implement, this method can result in stale data being served between refresh cycles and unnecessary load on the database if nothing changed. It also doesn’t efficiently handle real-time updates, especially in high-frequency data environments.

Each caching strategy has its strengths and limitations, but Materialize paired with Redis offers a unique advantage in handling complex queries.  Leveraging the power of incremental view maintenance, Materialize avoids developers having to manually craft invalidation rules or elaborate update strategies. 

Instead, Materialize handles the complexity by automatically maintaining views as the underlying data changes. With mz-redis-sync, these updates are instantly reflected in Redis, ensuring your cache is always in sync with the most current data without the overhead and risk associated with manual cache management.

No additional load to your transactional database

Materialize connects to each of your transactional databases like a regular read replica, and performs the data intensive computation – even across sources – in dedicated compute clusters. This means that your transactional database does not have to perform data intensive compute, while you still get the benefits of writing the computation logic in SQL.

A Scalable Solution for Complex Queries

Combining Materialize’s real-time data processing with Redis’s high-performance caching offers a straightforward solution to the cache invalidation dilemma. Whether you’re dealing with simple or complex queries, this integration ensures your application remains fast, consistent, and ready to scale. 

Sign up for a free trial with Materialize now and see the difference yourself.

More Articles

Technical Article

Materialize: More Cost-Effective than Aurora Read Replicas

Materialize costs 1/20th what Aurora PostgreSQL read replicas cost, when you have non-trivial business logic.
Seth Wiesman
Arjun Narayan
Frank McSherry

Sep 9, 2024

Technical Article

Materialize + Novu: Real-Time Alerting Powered by a Cloud Operational Data Store

In the following blog, we’ll show you how to create real-time alerts using Materialize’s integration with Novu.
Josh Arenberg

Sep 12, 2024

Conceptual Article

Zero-Staleness: Like using your primary, but faster

Materialize can respond faster than your primary database, with results that are at least as fresh as your primary would provide.
Frank McSherry

Sep 13, 2024

Try Materialize Free