This post will look at the cost of querying databases, use cases for Materialized Views, how they work in specific databases, and what it looks like to create Materialized Views.

The Cost of Querying Adds Up Fast

Each time you query a database you incur some cost. Your database will parse, validate, plan, optimize, and execute your query, using up wall clock time, CPU time, memory, opportunity cost, and, potentially, actual dollars. As you grow your application, you will likely wonder how you can reduce these costs.

To dig into this, let’s look at an example.

Example

Imagine you work at a company that sells widgets. Each time a widget is sold, you record some purchase information in a purchase table that looks like this:

Diagram of the `purchase` table showing sample data

With this data, you can answer questions that are critical to your business: what’s our best selling widget? Which widgets need to be shipped today? And, the evergreen question: how much money have we made?

Each of these questions can be answered by querying the purchase table directly. For example, you could calculate your total revenue with the following query:

SQL query calculating total revenue from the `purchase` table

When this query is run, your database will scan each row of the purchase table to aggregate the values in the amount column. Your database will have spent X resources to return your result. (As mentioned before, X is comprised of wall clock time, CPU time, memory, opportunity cost, and dollars.)

The good news is you’re happy to pay X because this information is important and valuable to your business. The bad news is that you will pay X each time your query is run. Because queries against tables and views do not reference previous results, they are forced to completely recalculate their results each time. You might be wondering: is this a big deal? If your query is exploratory, ad hoc, or infrequent, it’s probably not. But what if you suddenly needed to calculate total revenue once an hour instead of once a day? Or once a second?

Illustration highlighting the rising costs of frequent queries

Suddenly, even a small X starts to feel expensive. As your queries are run more frequently, process more data, or perform more complex computation, these costs may even feel prohibitive.

To provide an alternative, we created Materialized Views.

Overview: Materialized Views

Let’s first define regular views, and build up to materialized views.

What is a view?

A view is a derived relation defined in terms of base (stored) relations. A view defines a SQL transformation from a set of base tables to a derived table; this transformation is typically recomputed every time the view is referenced.

You can think of a view as a saved query on your database. Future SELECT queries can reference the view by name.

Example

Imagine a database with two tables: users and purchases, here is the SQL statement (in PostgreSQL syntax) we use to create a view that summarizes user purchases:

sql
CREATE VIEW user_purchase_summary AS SELECT
  u.id as user_id,
  COUNT(*) as total_purchases,
  SUM(purchases.amount) as lifetime_value
FROM users u
JOIN purchases p ON p.user_id = u.id;

As you can see, it’s a select query with CREATE VIEW [my_view_name] AS prepended to it. Upon creating the view, the database doesn’t compute any results or make any changes to how data is stored or indexed. But now queries can reference the view as if it were a table:

sql
SELECT
  user_id
FROM user_purchase_summary
WHERE
  lifetime_value > 500;

Every time the database gets a query referencing a view, it needs to first compute the results of the view, and then compute the rest of the query using those results. In almost all modern databases, you can also “stack” views: You can create a view that references another view.

What is a materialized view?

A materialized view takes the regular view described above and materializes it by proactively computing the results and storing them in a “virtual” table.

A view can be “materialized” by storing the tuples of the view in the database. Index structures can be built on the materialized view. Consequently, database accesses to the materialized view can be much faster than recomputing the view. A materialized view is like a cache --- a copy of the data that can be accessed quickly.

If a regular view is a saved query, a materialized view is a saved query plus its results stored as a table.

The Implications of Views Being “Materialized”

There are a few important implications of a view being “materialized:”

  • When referenced in a query, a materialized view doesn’t need to be recomputed. — The results are stored, so querying materialized views tends to be faster.
  • Because it’s stored as if it were a table, indexes can be built on the columns of a materialized view.
  • A new problem of “view maintenance” arises. — Once a view is materialized, it is only accurate until the underlying base relations are modified. The process of updating a materialized view in response to these changes is called view maintenance.

Example

Here is the user_purchase_summary view from before, turned into a materialized view:

sql
CREATE MATERIALIZED VIEW user_purchase_summary AS SELECT
  u.id as user_id,
  COUNT(*) as total_purchases,
  SUM(CASE when p.status = 'cancelled' THEN 1 ELSE 0 END) as cancelled_purchases
FROM users u
JOIN purchases p ON p.user_id = u.id;

In terms of SQL, all that has changed is the addition of the MATERIALIZED keyword. But when executed, this statement instructs the database to:

  • Execute the SELECT query within the materialized view definition.
  • Cache the results in a new “virtual” table named user_purchase_summary
  • Save the original query so it knows how to update the materialized view in the future.

Use Cases for Materialized Views

Materialized views are great for use cases where:

  • The SQL query is known ahead of time and needs to be repeatedly recalculated.
  • It’s valuable to have low end-to-end latency from when data originates to when it is reflected in a query.
  • It’s valuable to have low-latency query response times, high concurrency, or high volume of queries.

We see these requirements most often in areas of analytics and data-intensive applications.

Materialized views for analytics

The extract-load-transform (ELT) pattern where raw data is loaded in bulk into a warehouse and then transformed via SQL typically relies on alternatives to materialized views for the transform step. In dbt, these are referred to as materializations. A materialization can use a regular view (where nothing is cached) or cached tables built from the results of a SELECT query, or an incrementally updated table where the user is responsible for writing the update strategy.

Historically, support for materialized views in data warehouses has been so bad that SQL modeling services like dbt don’t even have the syntax to allow users to create them. However, the dbt-materialize adapter allows dbt users building on Materialize to use materialized views.

Here’s the standard advice given to dbt users on when to use the different types of materializations available to them:

  • If using a view isn’t too slow for your end-users, use a view.
  • If a view gets too slow for your end-users, use a table.
  • If building a table with dbt gets too slow, use incremental models in dbt.

Proper automatically updated materialized views are a fourth approach that takes the simplicity of a regular view (the analytics engineer need only write the SQL transformation) and the query latency of a cached table (because results are stored, no computation is done on SELECT queries).

Materialized views are a particularly more compute-efficient alternative to regular views when the dataset is large and the view is queried often. They are a more up-to-date, more automated and less error-prone alternative to cached tables because they remove the burden of deciding when and how to update from the end-user.

Stacks of Views

A common pattern in analytics tools like dbt is to define “stacks” of views that transform and aggregate the data in steps. Converting stacked views to either:

  1. Stacks of materialized views or
  2. Stacks of regular views with a materialized view at the end

simplifies the job of the user or the external tool by shifting the responsibility of orchestrating the order and frequency of updates into the database engine.

Materialized views for applications

Incrementally updated materialized views can replace the caching and denormalization traditionally done to “guard” OLTP databases from read-side latency and overload. They do this by moving the computation work from read to write-side of your database: Instead of waiting for a query and doing computation to get the answer, we are now asking for the query upfront and doing the computation to update the results as the writes (creates, updates, and deletes) come in. This inverts the constraints of traditional database architectures, allowing developers to build data-intensive applications without complex cache invalidation or denormalization.

How do materialized views work in specific databases?

Not every database supports materialized views, and those that do each handle them

a little differently, especially when it comes to the approach to view maintenance.

Database Materialized Views? View Maintenance Notes
PostgreSQL Yes, in v9.3+ Manual Materialized views are populated at time of creation and must be manually refreshed via REFRESH MATERIALIZED VIEW statements that recompute the entire view.
MySQL No N/A
Microsoft SQL Server Yes Automatic SQL Server calls them “Indexed Views” because the materialization step is a matter of creating an index on a regular view. SQL Server limits indexed views to basic SQL queries.
Oracle Yes Multiple Options Materialized views in Oracle databases can be set to manually refresh, refresh on a schedule, or, if the SQL query meets these requirements, automatically refreshed.
Snowflake Yes Automatic Materialized views in Snowflake are updated automatically. This automated approach results in SQL limitations, and a limited set of use cases where a materialized view is recommended over a view or a table.
Redshift Yes Multiple Options By default, materialized views in Redshift must be manually refreshed, but an AUTO REFRESH option will attempt to update the view when base data changes. When a REFRESH MATERIALIZED VIEW call is made, certain types of views can be updated incrementally.
MongoDB Yes, in 4.2+ Manual MongoDB has a NoSQL equivalent of materialized views in the form of aggregation functions.
Materialize Yes Automatic Materialize can incrementally maintain views that other databases cannot: Views with complex joins and aggregations, CTEs, views on views. The biggest SQL gap is WINDOW functions.

Creating Materialized Views

Unlike queries that calculate their results from tables and views, queries that read from materialized views do not recalculate their results each time. To understand why, let’s create a materialized view together.

SQL statement creating a materialized view

When this statement is executed, your database will run your query once to calculate the result. Then, it will physically store that result in a newly created database object — in our case an object named total_revenue. Now when you query total_revenue, your database will return the results stored in total_revenue without performing any additional computation.

Query against the materialized view `total_revenue`

This means that unlike querying a view directly, you only pay the cost of your query once when creating a materialized view. Future queries will pay some constant cost 1 to read from total_revenue, instead of X. This seems like a perfect solution for our problem, so what’s the catch? Why don’t we use materialized views for everything?

Two Qualifications for Working with Materialized Views

The first catch is that you pay for materialized views with memory. While materialized views physically storing prior results reduces your query cost from X to 1, it also requires memory that was not required to simply query a table or a logical view.

The second, trickier catch surfaces when a new purchase is added to the purchase table. To illustrate, let’s walk through this scenario together.

Example of querying an outdated materialized view

Noticing that we have a new sale in our purchase table, we query our materialized view for our updated total revenue. But, we get back our old value. What gives?

Illustration of a materialized view not reflecting recent changes

When we created our materialized view, our database ran our query once to store the results in total_revenue. However, since the query was run before the newest purchase was added to the purchase table, the result calculated and stored in total_revenue does not take this new purchase into account. To update the result to include new purchases, you will have to refresh the materialized view. Generally, there are two mechanisms for refreshing materialized views: complete refreshes or incremental refreshes.

Complete refreshes rerun the query underlying a materialized view to completely recalculate and replace the stored results. Just like queries against tables and views, every complete refresh will cost X. To complicate matters, you will have to decide when to perform a refresh: when someone queries the view? Once an hour? Once a week? You will want to choose a cadence that does not force the database to do unnecessary work. For example, you don’t want to pay X to recompute total_revenue if there are no new purchases. In real world systems, users will have to choose a refresh schedule that maintains a careful balance between data freshness requirements and acceptable query costs.

By contrast, incremental refreshes keep materialized views up to date by only performing work only on data that has changed. To return to our example, instead of rescanning each row of the purchase table to calculate the sum, an incrementally maintained materialized view would only do the following work:

Diagram explaining incremental refresh of a materialized view

This means that instead of paying X to refresh the view, your database will only pay a fraction of X proportional to the size of the changed data. And, because this refresh mechanism is triggered by changes in the underlying data, you can be certain that your database is refreshing your view no more than necessary. In our example, total_revenue will only be updated when a row in the purchase table is inserted, updated, or deleted.

Conclusion

Creating and maintaining a materialized view can reduce the query costs paid for expensive or frequently run queries. However, the potential cost reduction of a materialized view greatly depends on its underlying refresh mechanism. Only incremental refreshes can reduce the per-refresh cost of maintaining a materialized view while also ensuring that views are updated no more than necessary.

While incremental refreshes are available in a handful of databases, they often come with a long list of restrictions. At the moment, Materialize is the only system that supports incremental maintained materialized views without restriction. If you identified with the pain points in this blog post, check us out!

More Articles

Technical Article

Change Data Capture (part 1)

Here we set the context for and propose a change data capture protocol: a means of writing down and reading back changes to data.

Frank McSherry

Aug 13, 2020

Technical Article

Why not RocksDB for streaming storage?

An explanation of our rationale for why Materialize chose not to use RocksDB as its underlying storage engine.

Arjun Narayan

Aug 6, 2020

Technical Article

Robust Reductions in Materialize

Comprehensive guide to implementing robust reductions in Materialize, ensuring efficient & real-time data processing.

Frank McSherry

Aug 4, 2020

Try Materialize Free