Indexes
Overview
In Materialize, indexes represent query results stored in memory within a cluster. You can create indexes on sources, views, or materialized views.
Indexes on sources
In Materialize, you can create indexes on a source to maintain in-memory up-to-date source data within the cluster you create the index. This can help improve query performance when serving results directly from the source or when using joins. However, in practice, you may find that you rarely need to index a source directly.
CREATE INDEX idx_on_my_source ON my_source (...);
Indexes on views
In Materialize, you can create indexes on a view to maintain up-to-date view results in memory within the cluster you create the index.
CREATE INDEX idx_on_my_view ON my_view_name(...) ;
During the index creation on a view, the view is executed and the view results are stored in memory within the cluster. As new data arrives, the index incrementally updates the view results in memory.
Within the cluster, querying an indexed view is:
-
fast because the results are served from memory, and
-
computationally free because no computation is performed on read.
For best practices on using indexes, and understanding when to use indexed views vs. materialized views, see Usage patterns.
Indexes on materialized views
In Materialize, materialized view results are stored in durable storage and incrementally updated as new data arrives. Indexing a materialized view makes the already up-to-date view results available in memory within the cluster you create the index. That is, indexes on materialized views require no additional computation to keep results up-to-date.
For best practices on using indexes, and understanding when to use indexed views vs. materialized views, see Usage patterns.
CREATE INDEX idx_on_my_mat_view ON my_mat_view_name(...) ;
Indexes and clusters
Indexes are local to a cluster. Queries in a different cluster cannot use the indexes in another cluster.
For example, to create an index in the current cluster:
CREATE INDEX idx_on_my_view ON my_view_name(...) ;
You can also explicitly specify the cluster:
CREATE INDEX idx_on_my_view IN CLUSTER active_cluster ON my_view (...);
Usage patterns
Index usage
Unlike some other databases, Materialize can use an index to serve query results
even if the query does not specify a WHERE
condition on the index key. Serving
queries from an index is fast since the results are already up-to-date and in
memory.
For example, consider the following index:
CREATE INDEX idx_orders_view_qty ON orders_view (quantity);
Materialize will maintain the orders_view
in memory in idx_orders_view_qty
,
and it will be able to use the index to serve a various queries on the
orders_view
(and not just queries that specify conditions on
orders_view.quantity
).
Materialize can use the index for the following queries (issued from the same
cluster as the index) on orders_view
:
SELECT * FROM orders_view; -- scans the index
SELECT * FROM orders_view WHERE status = 'shipped'; -- scans the index
SELECT * FROM orders_view WHERE quantity = 10; -- point lookup on the index
For the queries that do not specify a condition on the indexed field, Materialize scans the index. For the query that specifies an equality condition on the indexed field, Materialize performs a point lookup on the index (i.e., reads just the matching records from the index). Point lookups are the most efficient use of an index.
Point lookups
Materialize performs point lookup (i.e., reads just the matching records
from the index) on the index if the query’s WHERE
clause:
-
Specifies equality (
=
orIN
) condition and only equality conditions on all the indexed fields. The equality conditions must specify the exact index key expression (including type) for point lookups. For example:-
If the index is on
round(quantity)
, the query must specify equality condition onround(quantity)
(and not justquanity
) for Materialize to perform a point lookup. -
If the index is on
quantity * price
, the query must specify equality condition onquantity * price
(and notprice * quantity
) for Materialize to perform a point lookup. -
If the index is on the
quantity
field which is an integer, the query must specify an equality condition onquantity
with a value that is an integer.
-
-
Only uses
AND
(conjunction) to combine conditions for different fields.
Point lookups are the most efficient use of an index.
For queries whose WHERE
clause meets the point lookup criteria and includes
conditions on additional fields (also using AND
conjunction), Materialize
performs a point lookup on the index keys and then filters the results using the
additional conditions on the non-indexed fields.
For queries that do not meet the point lookup criteria, Materialize performs a
full index scan (including for range queries). That is, Materialize performs a
full index scan if the WHERE
clause:
- Does not specify all the indexed fields.
- Does not specify only equality conditions on the index fields or specifies an equality condition that specifies a different value type than the index key type.
- Uses OR (disjunction) to combine conditions for different fields.
Full index scans are less efficient than point lookups. The performance of full index scans will degrade with data volume; i.e., as you get more data, full scans will get slower.
Examples
Consider again the following index on a view:
CREATE INDEX idx_orders_view_qty on orders_view (quantity);
The following table shows various queries and whether Materialize performs a point lookup or an index scan.
Query | Index usage |
---|---|
|
Index scan. |
|
Point lookup. |
|
Point lookup. |
|
Point lookup. Query uses OR to combine conditions on the same field.
|
|
Point lookup on quantity , then filter on price .
|
|
Point lookup on quantity , then filter on price .
|
|
Index scan. Query uses OR to combine conditions on different fields.
|
|
Index scan. |
|
Index scan. |
|
Index scan, assuming quantity field in orders_view is an integer.
In the first query, the quantity is implicitly cast to text.
In the second query, the quantity is explicitly cast to text.
|
Consider that the view has an index on the quantity
and price
fields
instead of an index on the quantity
field:
DROP INDEX idx_orders_view_qty;
CREATE INDEX idx_orders_view_qty_price on orders_view (quantity, price);
Query | Index usage |
---|---|
|
Index scan. |
|
Index scan. Query does not include equality conditions on all indexed fields. |
|
Point lookup. |
|
Index scan. Query uses OR to combine conditions on different fields.
|
|
Point lookup. Query uses OR to combine conditions on same field and AND to combine conditions on different fields.
|
|
Point lookup on the index keys quantity and price , then filter on
item .
|
|
Index scan. Query uses OR to combine conditions on different fields.
|
Limitations
Indexes in Materialize do not order their keys using the data type’s natural ordering and instead orders by its internal representation of the key (the tuple of key length and value).
As such, indexes in Materialize currently do not provide optimizations for:
-
Range queries; that is queries using
>
,>=
,<
,<=
,BETWEEN
clauses (e.g.,WHERE quantity > 10
,price >= 10 AND price <= 50
, andWHERE quantity BETWEEN 10 AND 20
). -
GROUP BY
,ORDER BY
andLIMIT
clauses.
Indexes on views vs. materialized views
In Materialize, both indexes on views and materialized views incrementally update the view results when Materialize ingests new data. Whereas materialized views persist the view results in durable storage and can be accessed across clusters, indexes on views compute and store view results in memory within a single cluster.
Maintaining a materialized view in durable storage has resource and latency costs that should be carefully considered depending on the main usage of the view, while maintaining an index has memory costs.
Some general guidelines for usage patterns include:
Usage Pattern | General Guideline |
---|---|
View results are accessed from a single cluster only | View with an index |
View results are accessed across clusters | Materialized view |
Final consumer of the view is a sink or a SUBSCRIBE operation |
Materialized view |
View maintenance and query serving are scaled independently | Materialized view |
Indexes and query optimizations
By making up-to-date results available in memory, indexes can help optimize query performance, such as:
-
Provide faster sequential access than unindexed data.
-
Provide fast random access for lookup queries (i.e., selecting individual keys).
Specific instances where indexes can be useful to improve performance include:
-
When used in ad-hoc queries.
-
When used by multiple queries within the same cluster.
-
When used to enable delta joins.
For more information, see Optimization.
Best practices
Before creating an index, consider the following:
-
If you create stacked views (i.e., views that depend on other views) to reduce SQL complexity, we recommend that you create an index only on the view that will serve results, taking into account the expected data access patterns.
-
Materialize can reuse indexes across queries that concurrently access the same data in memory, which reduces redundancy and resource utilization per query. In particular, this means that joins do not need to store data in memory multiple times.
-
For queries that have no supporting indexes, Materialize uses the same mechanics used by indexes to optimize computations. However, since this underlying work is discarded after each query run, take into account the expected data access patterns to determine if you need to index or not.