Materialize Documentation
s
Join the Community github/materialize

CREATE INDEX

CREATE INDEX creates an in-memory index on a source or view.

Conceptual framework

Indexes assemble and maintain in memory a query’s results, which can provide future queries the data they need pre-arranged in a format they can immediately use. In particular, this can be very helpful for the JOIN operator which needs to build and maintain the appropriate indexes if they do not otherwise exist. For more information, see Key Concepts: Indexes.

When to create indexes

You might want to create indexes when…

Syntax

CREATE INDEX index_name ON obj_name USING method ( col_expr , ) DEFAULT INDEX ON obj_name USING method WITH ( field = val , )
Field Use
DEFAULT Creates a default index with the same structure as the index automatically created with CREATE MATERIALIZED VIEW. This provides a simple method to convert a non-materialized object to a materialized one.
index_name A name for the index.
obj_name The name of the source or view on which you want to create an index.
method The name of the index method to use. The only supported method is arrangement.
col_expr The expressions to use as the key for the index.
field The name of the option you want to set.
val The value for the option.

Changed in v0.7.1: The WITH (field = val, ...) clause was added to allow setting index options when creating the index.

New in v0.23.0: The USING clause.

WITH options

The following option is valid within the WITH clause:

Name Permitted values Default value Description
logical_compaction_window SQL interval string ‘1ms’ Overrides the logical compaction window for the data stored in this index. The default value is controlled by the --logical-compaction-window command-line option.

Details

Restrictions

Structure

Indexes in Materialize have the following structure for each unique row:

((tuple of indexed expressions), (tuple of the row, i.e. stored columns))

Indexed expressions vs. stored columns

Automatically created indexes will use all columns as key expressions for the index, unless Materialize is provided or can infer a unique key for the source or view.

For instance, unique keys can be…

When creating your own indexes, you can choose the indexed expressions.

Memory footprint

The in-memory sizes of indexes are proportional to the current size of the source or view they represent. The actual amount of memory required depends on several details related to the rate of compaction and the representation of the types of data in the source or view. We are working on a feature to let you see the size each index consumes (#1532).

Creating an index may also force the first materialization of a view, which may cause Materialize to install a dataflow to determine and maintain the results of the view. This dataflow may have a memory footprint itself, in addition to that of the index.

Examples

Optimizing joins with indexes

You can optimize the performance of JOIN on two relations by ensuring their join keys are the key columns in an index.

CREATE MATERIALIZED VIEW active_customers AS
    SELECT guid, geo_id, last_active_on
    FROM customer_source
    WHERE last_active_on > now() - INTERVAL '30' DAYS;

CREATE INDEX active_customers_geo_idx ON active_customers (geo_id);

CREATE MATERIALIZED VIEW active_customer_per_geo AS
    SELECT geo.name, count(*)
    FROM geo_regions AS geo
    JOIN active_customers ON active_customers.geo_id = geo.id
    GROUP BY geo.name;

In the above example, the index active_customers_geo_idx

Materializing views

You can convert a non-materialized view into a materialized view by adding an index to it.

CREATE VIEW active_customers AS
    SELECT guid, geo_id, last_active_on
    FROM customer_source
    WHERE last_active_on > now() - INTERVAL '30' DAYS;

CREATE INDEX active_customers_primary_idx ON active_customers (guid);

Note that this index is different than the primary index that Materialize would automatically create if you had used CREATE MATERIALIZED VIEW. Indexes that are automatically created contain an index of all columns in the result set, unless they contain a unique key. (Remember that indexes store a copy of a row’s indexed columns and a copy of the entire row.)

Speed up filtering with indexes

You can set up an index over a column were filtering by literal values or expressions are common to improve the performance.

CREATE MATERIALIZED VIEW active_customers AS
    SELECT guid, geo_id, last_active_on
    FROM customer_source
    GROUP BY geo_id;

CREATE INDEX active_customers_idx ON active_customers (guid);

SELECT * FROM active_customers WHERE guid = 'd868a5bf-2430-461d-a665-40418b1125e7';

-- Using expressions
CREATE INDEX active_customers_exp_idx ON active_customers (upper(guid));

SELECT * FROM active_customers WHERE upper(guid) = 'D868A5BF-2430-461D-A665-40418B1125E7';

-- Filter using an expression in one field and a literal in another field
CREATE INDEX active_customers_exp_field_idx ON active_customers (upper(guid), geo_id);

SELECT * FROM active_customers WHERE upper(guid) = 'D868A5BF-2430-461D-A665-40418B1125E7' and geo_id = 'ID_8482';

Create an index with an expression to improve query performance over a frequent used expression and avoid building downstream views to apply the function like the one used in the example: upper(). Take into account that aggregations like count() are not possible to use as expressions.