When to Use Indexes and Materialized Views

tip

Materialize is an operational data store used by teams to serve continually-updated data transformations using the same SQL workflows used with batch processing. To see if it works for your use-case, register for access here.

Understanding how to use materialized views and indexes is important for designing data-intensive applications with any database, including Materialize. In this post, we will see how knowing a bit about Materialize’s architecture will help you leverage indexes and materialized views to optimize the performance of your queries.

Broad Definitions

Here are some definitions that roughly capture how the words are used in traditional databases.

View

A view is simply a convenient name for a SQL query that you can reference in other queries. No computation happens when you create a view.

Materialized View

A materialized view precomputes the results of a query and stores those results for fast read access. Computation happens when you create a materialized view.

Index

An index is a data structure that allows the database to find and retrieve specific rows much faster than it could do otherwise. Computation happens when you create an index.

What Happens in Materialize

Here are some key characteristics about Materialize that play into how you should think about indexes and materialized views:

Updates results eagerly and incrementally as new data streams into the system.
Shares indexes across queries, which means your use cases can scale according to the number of input collections, not the number of output views
Scales storage and compute independently — durable cloud object storage for the persistence layer and clusters for the compute layer.

With these in mind, here is how views, materialized views, and indexes work together:

	no index	with index
View	Creates name for query. Shares name to other clusters. No computation when created. Serves ad-hoc, one-off queries.	Reads input from storage and memory. Computes results from memory. Serves point look-ups.
Materialized View	Reads input from storage and memory. Computes results incrementally. Persists results to storage. Shares results to other clusters. Serves with full table scans.	Loads results into memory. Serves results from memory. Serves point look-ups. No additional computation.

note

Each ad-hoc SELECT ... FROM my_view query on a view will ingest all the input data, return the result, and then throw the results away. The magic of incremental computation doesn’t happen automatically when you create my_view, so don’t forget to create an index on it!

A Sample Deployment

Here is a Materialize deployment from a real customer that uses clusters, materialized views, and indexes in a clever way to serve their access patterns.

They have three clusters:

A ingest cluster handles streaming data sources.
A processing cluster processes incoming data using materialized views so the results are persisted to storage.
A serving cluster uses indexes to load results from storage into memory to serve point lookups.

Benefits:

If the processing cluster fails due to hardware malfunction or bad data, the serving cluster can continue to serve (stale) results.
If the serving cluster fails, it can quickly come back online without having to reprocess the inputs. All the results are already computed in the processing cluster. The indexes just have to load the results into memory from storage again.

Tradeoffs:

It is possible to process the data and serve the results in one cluster rather than two by creating a view along with an index. Doing so would lose the fault tolerant benefits, but result in:

Less storage cost because results aren’t persisted.
Less compute cost running only one cluster.
Slightly less latency, since it avoids a round trip to storage.

Conclusion

Materialize uses views, materialized views, and indexes much like other databases, but knowing how the system works with memory and storage can help you optimize your deployment. Here are some quick rules of thumb for indexes and materialized views.

Index:

Faster sequential access.
Fast random access for queries selecting individual keys.
Can be consumed by views and queries executed on the same cluster.
Reads are served from memory.
Use this if you are querying Materialize directly from a Postgres client application.
Use this on columns of collections that frequently appear in equality predicates in your queries.

Materialized View:

Results persisted to storage.
Results can be consumed by views and queries executed on any cluster.
Reads are served from object storage.

For more optimization tips (especially for joins), check out our optimization documentation!

When to Use Indexes and Materialized Views

Transform, Deliver, and Act

Related Posts You’ll Love

tip

Broad Definitions

View

Materialized View

Index

What Happens in Materialize

note

A Sample Deployment

Conclusion

More Articles

The Software Architecture of Materialize

Building Differential Dataflow from Scratch

Clusters, explained with Data Warehouses

Get Started with Materialize