Indexes

Overview

In Materialize, indexes represent query results stored in memory within a cluster. You can create indexes on sources, views, or materialized views.

Indexes on sources

NOTE: In practice, you may find that you rarely need to index a source without performing some transformation using a view, etc.

In Materialize, you can create indexes on a source to maintain in-memory up-to-date source data within the cluster you create the index. This can help improve query performance when serving results directly from the source or when using joins. However, in practice, you may find that you rarely need to index a source directly.

CREATE INDEX idx_on_my_source ON my_source (...);

Indexes on views

In Materialize, you can create indexes on a view to maintain up-to-date view results in memory within the cluster you create the index.

CREATE INDEX idx_on_my_view ON my_view_name(...) ;

During the index creation on a view, the view is executed and the view results are stored in memory within the cluster. As new data arrives, the index incrementally updates the view results in memory.

Within the cluster, querying an indexed view is:

  • fast because the results are served from memory, and

  • computationally free because no computation is performed on read.

For best practices on using indexes, and understanding when to use indexed views vs. materialized views, see Usage patterns.

Indexes on materialized views

In Materialize, materialized view results are stored in durable storage and incrementally updated as new data arrives. Indexing a materialized view makes the already up-to-date view results available in memory within the cluster you create the index. That is, indexes on materialized views require no additional computation to keep results up-to-date.

NOTE: A materialized view can be queried from any cluster whereas its indexed results are available only within the cluster you create the index. Querying a materialized view, whether indexed or not, from any cluster is computationally free. However, querying an indexed materialized view within the cluster where the index is created is faster since the results are served from memory rather than from storage.

For best practices on using indexes, and understanding when to use indexed views vs. materialized views, see Usage patterns.

CREATE INDEX idx_on_my_mat_view ON my_mat_view_name(...) ;

Indexes and clusters

Indexes are local to a cluster. Queries in a different cluster cannot use the indexes in another cluster.

For example, to create an index in the current cluster:

CREATE INDEX idx_on_my_view ON my_view_name(...) ;

You can also explicitly specify the cluster:

CREATE INDEX idx_on_my_view IN CLUSTER active_cluster ON my_view (...);

Usage patterns

Index usage

! Important: Indexes are local to a cluster. Queries in one cluster cannot use the indexes in another, different cluster.

Unlike some other databases, Materialize can use an index to serve query results even if the query does not specify a WHERE condition on the index key. Serving queries from an index is fast since the results are already up-to-date and in memory.

For example, consider the following index:

CREATE INDEX idx_orders_view_qty ON orders_view (quantity);

Materialize will maintain the orders_view in memory in idx_orders_view_qty, and it will be able to use the index to serve a various queries on the orders_view (and not just queries that specify conditions on orders_view.quantity).

Materialize can use the index for the following queries (issued from the same cluster as the index) on orders_view:

SELECT * FROM orders_view;  -- scans the index
SELECT * FROM orders_view WHERE status = 'shipped';  -- scans the index
SELECT * FROM orders_view WHERE quantity = 10;  -- point lookup on the index

For the queries that do not specify a condition on the indexed field, Materialize scans the index. For the query that specifies an equality condition on the indexed field, Materialize performs a point lookup on the index (i.e., reads just the matching records from the index). Point lookups are the most efficient use of an index.

Point lookups

Materialize performs point lookup (i.e., reads just the matching records from the index) on the index if the query’s WHERE clause:

  • Specifies equality (= or IN) condition and only equality conditions on all the indexed fields. The equality conditions must specify the exact index key expression (including type) for point lookups. For example:

    • If the index is on round(quantity), the query must specify equality condition on round(quantity) (and not just quanity) for Materialize to perform a point lookup.

    • If the index is on quantity * price, the query must specify equality condition on quantity * price (and not price * quantity) for Materialize to perform a point lookup.

    • If the index is on the quantity field which is an integer, the query must specify an equality condition on quantity with a value that is an integer.

  • Only uses AND (conjunction) to combine conditions for different fields.

Point lookups are the most efficient use of an index.

For queries whose WHERE clause meets the point lookup criteria and includes conditions on additional fields (also using AND conjunction), Materialize performs a point lookup on the index keys and then filters the results using the additional conditions on the non-indexed fields.

For queries that do not meet the point lookup criteria, Materialize performs a full index scan (including for range queries). That is, Materialize performs a full index scan if the WHERE clause:

  • Does not specify all the indexed fields.
  • Does not specify only equality conditions on the index fields or specifies an equality condition that specifies a different value type than the index key type.
  • Uses OR (disjunction) to combine conditions for different fields.

Full index scans are less efficient than point lookups. The performance of full index scans will degrade with data volume; i.e., as you get more data, full scans will get slower.

Examples

Consider again the following index on a view:

CREATE INDEX idx_orders_view_qty on orders_view (quantity);

The following table shows various queries and whether Materialize performs a point lookup or an index scan.

Query Index usage
SELECT * FROM orders_view;
Index scan.
SELECT * FROM orders_view WHERE quantity = 10;
Point lookup.
SELECT * FROM orders_view WHERE quantity IN (10, 20);
Point lookup.
SELECT * FROM orders_view WHERE quantity = 10 OR quantity = 20;
Point lookup. Query uses OR to combine conditions on the same field.
SELECT * FROM orders_view WHERE quantity = 10 AND price = 5.00;
Point lookup on quantity, then filter on price.
SELECT * FROM orders_view WHERE (quantity, price) = (10, 5.00);
Point lookup on quantity, then filter on price.
SELECT * FROM orders_view WHERE quantity = 10 OR price = 5.00;
Index scan. Query uses OR to combine conditions on different fields.
SELECT * FROM orders_view WHERE quantity <= 10;
Index scan.
SELECT * FROM orders_view WHERE round(quantity) = 20;
Index scan.
-- Assume quantity is an integer
SELECT * FROM orders_view WHERE quantity = 'hello';
SELECT * FROM orders_view WHERE quantity::TEXT = 'hello';
Index scan, assuming quantity field in orders_view is an integer. In the first query, the quantity is implicitly cast to text. In the second query, the quantity is explicitly cast to text.

Consider that the view has an index on the quantity and price fields instead of an index on the quantity field:

DROP INDEX idx_orders_view_qty;
CREATE INDEX idx_orders_view_qty_price on orders_view (quantity, price);
Query Index usage
SELECT * FROM orders_view;
Index scan.
SELECT * FROM orders_view WHERE quantity = 10;
Index scan. Query does not include equality conditions on all indexed fields.
SELECT * FROM orders_view WHERE quantity = 10 AND price = 2.50;
Point lookup.
SELECT * FROM orders_view WHERE quantity = 10 OR price = 2.50;
Index scan. Query uses OR to combine conditions on different fields.
SELECT * FROM orders_view
WHERE quantity = 10 AND (price = 2.50 OR price = 3.00);
Point lookup. Query uses OR to combine conditions on same field and AND to combine conditions on different fields.
SELECT * FROM orders_view
WHERE quantity = 10 AND price = 2.50 AND item = 'cupcake';
Point lookup on the index keys quantity and price, then filter on item.
SELECT * FROM orders_view
WHERE quantity = 10 AND price = 2.50 OR item = 'cupcake';
Index scan. Query uses OR to combine conditions on different fields.

Limitations

Indexes in Materialize do not order their keys using the data type’s natural ordering and instead orders by its internal representation of the key (the tuple of key length and value).

As such, indexes in Materialize currently do not provide optimizations for:

  • Range queries; that is queries using >, >=, <, <=, BETWEEN clauses (e.g., WHERE quantity > 10, price >= 10 AND price <= 50, and WHERE quantity BETWEEN 10 AND 20).

  • GROUP BY, ORDER BY and LIMIT clauses.

Indexes on views vs. materialized views

In Materialize, both indexes on views and materialized views incrementally update the view results when Materialize ingests new data. Whereas materialized views persist the view results in durable storage and can be accessed across clusters, indexes on views compute and store view results in memory within a single cluster.

Maintaining a materialized view in durable storage has resource and latency costs that should be carefully considered depending on the main usage of the view, while maintaining an index has memory costs.

Some general guidelines for usage patterns include:

Usage Pattern General Guideline
View results are accessed from a single cluster only View with an index
View results are accessed across clusters Materialized view
Final consumer of the view is a sink or a SUBSCRIBE operation Materialized view
View maintenance and query serving are scaled independently Materialized view

Indexes and query optimizations

By making up-to-date results available in memory, indexes can help optimize query performance, such as:

  • Provide faster sequential access than unindexed data.

  • Provide fast random access for lookup queries (i.e., selecting individual keys).

Specific instances where indexes can be useful to improve performance include:

  • When used in ad-hoc queries.

  • When used by multiple queries within the same cluster.

  • When used to enable delta joins.

For more information, see Optimization.

Best practices

Before creating an index, consider the following:

  • If you create stacked views (i.e., views that depend on other views) to reduce SQL complexity, we recommend that you create an index only on the view that will serve results, taking into account the expected data access patterns.

  • Materialize can reuse indexes across queries that concurrently access the same data in memory, which reduces redundancy and resource utilization per query. In particular, this means that joins do not need to store data in memory multiple times.

  • For queries that have no supporting indexes, Materialize uses the same mechanics used by indexes to optimize computations. However, since this underlying work is discarded after each query run, take into account the expected data access patterns to determine if you need to index or not.

Back to top ↑