Materialize Documentation
Join the Community github/materialize

SELECT

The SELECT statement is the root of a SQL query, and is used both to bind SQL queries to named views or materialized views, and to interactively query data maintained in Materialize. For interactive queries, you should consider creating indexes on the underlying relations based on common query patterns.

Syntax

WITH cte_ident ( col_ident , ) AS ( select_stmt ) , SELECT ALL DISTINCT ON ( col_ref , ) target_elem , FROM table_expr , join_expr WHERE expr GROUP BY col_ref , OPTION ( option = val , ) HAVING expr ORDER BY col_ref ASC DESC NULLS LAST NULLS FIRST , LIMIT integer OFFSET integer UNION INTERSECT EXCEPT ALL DISTINCT another_select_stmt
Field Use
WITHAS Common table expressions (CTEs) for this query.
( col_ident) Rename the CTE’s columns to the list of identifiers, both of which must be the same length.
ALL Return all rows from query (Default).
DISTINCT Return only distinct values.
DISTINCT ON ( col_ref) Return only the first row with a distinct value for col_ref.
target_elem Return identified columns or functions.
FROM table_ref The tables you want to read from; note that these can also be other SELECT statements or Common Table Expressions (CTEs).
join_expr A join expression; for more details, see the JOIN documentation.
WHERE expression Filter tuples by expression.
GROUP BY col_ref Group aggregations by col_ref.
OPTIONS ( hint_list ) Specify one or more query hints.
HAVING expression Filter aggregations by expression.
ORDER BY col_ref Sort results in either ASC or DESC order (default: ASC).

Use the NULLS FIRST and NULLS LAST options to determine whether nulls appear before or after non-null values in the sort ordering (default: NULLS LAST for ASC, NULLS FIRST for DESC).

LIMIT Limit the number of returned results to integer.
OFFSET Skip the first integer number of rows.
UNION Records present in select_stmt or another_select_stmt.

DISTINCT returns only unique rows from these results (implied default).

With ALL specified, each record occurs a number of times equal to the sum of the times it occurs in each input statement.
INTERSECT Records present in both select_stmt and another_select_stmt.

DISTINCT returns only unique rows from these results (implied default).

With ALL specified, each record occurs a number of times equal to the lesser of the times it occurs in each input statement.
EXCEPT Records present in select_stmt but not in another_select_stmt.

DISTINCT returns only unique rows from these results (implied default).

With ALL specified, each record occurs a number of times equal to the times it occurs in select_stmt less the times it occurs in another_select_stmt, or not at all if the former is greater than latter.

Details

Because Materialize works very differently from a traditional RDBMS, it’s important to understand the implications that certain features of SELECT will have on Materialize.

Creating materialized views

Creating a materialized view generates a persistent dataflow, which has a different performance profile from performing a SELECT in an RDBMS.

A materialized view has resource and latency costs that should be carefully considered depending on its main usage. Materialize must maintain the results of the query in durable storage, but often it must also maintain additional intermediate state.

Reading from indexed relations

Performing a SELECT on an indexed source, view or materialized view is Materialize’s ideal operation. When Materialize receives such a SELECT query, it quickly returns the maintained results from memory.

Materialize also quickly returns results for queries that only filter, project, and re-order results.

Ad hoc queries

Queries over non-materialized views will create an ephemeral dataflow to compute the results. These dataflows are bound to the active cluster, which you can change using:

SET cluster = <cluster name>;

Performing a SELECT query that does not directly read out of a dataflow requires Materialize to evaluate your query. Materialize will construct a temporary dataflow to materialize your query, and remove the dataflow as soon as it returns the query results to you.

Common table expressions (CTEs)

Common table expressions, also known as CTEs or WITH queries, create aliases for statements that subsequent expressions can refer to (including subsequent CTEs). This can enhance legibility of complex queries, but doesn’t alter the queries' semantics.

For an example, see Using CTEs.

Known limitations

CTEs have the following limitations, which we are working to improve:

Query hints

Users can specify any query hints to help Materialize optimize query planning more efficiently.

The following query hints are valid within the OPTION clause.

Hint Value type Description
EXPECTED GROUP SIZE int How many rows will have the same group key. Materialize can render min and max expressions more efficiently with this information.

For an example, see Using query hints.

Column references

Within a given SELECT statement, we refer to the columns from the tables in the FROM clause as the input columns, and columns in the SELECT list as the output columns.

Expressions in the SELECT list, WHERE clause, and HAVING clause may refer only to input columns.

Column references in the ORDER BY and DISTINCT ON clauses may be the name of an output column, the ordinal number of an output column, or an arbitrary expression of only input columns. If an unqualified name refers to both an input and output column, ORDER BY chooses the output column.

Column references in the GROUP BY clause may be the name of an output column, the ordinal number of an output column, or an arbitrary expression of only input columns. If an unqualified name refers to both an input and output column, GROUP BY chooses the input column.

Examples

Creating a view

This assumes you’ve already created a source.

The following query creates a materialized view representing the total of all purchases made by users per region.

CREATE MATERIALIZED VIEW mat_view AS
    SELECT region.id, sum(purchase.total)
    FROM mysql_simple_purchase AS purchase
    JOIN mysql_simple_user AS user ON purchase.user_id = user.id
    JOIN mysql_simple_region AS region ON user.region_id = region.id
    GROUP BY region.id;

In this case, Materialized will create a dataflow to maintain the results of this query, and that dataflow will live on until the view it’s maintaining is dropped.

Reading from a view

Assuming you create the view listed above, named mat_view:

SELECT * FROM mat_view

In this case, Materialized simply returns the results of the dataflow you created to maintain the view.

Querying views

SELECT region.id, sum(purchase.total)
FROM mysql_simple_purchase AS purchase
JOIN mysql_simple_user AS user ON purchase.user_id = user.id
JOIN mysql_simple_region AS region ON user.region_id = region.id
GROUP BY region.id;

In this case, Materialized will spin up the same dataflow as it did for creating a materialized view, but it will tear down the dataflow once it’s returned its results to the client. If you regularly want to view the results of this query, you may want to create a view for it.

Using CTEs

WITH
  regional_sales (region, total_sales) AS (
    SELECT region, sum(amount)
    FROM orders
    GROUP BY region
  ),
  top_regions AS (
    SELECT region
    FROM regional_sales
    ORDER BY total_sales DESC
    LIMIT 5
  )
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

Both regional_sales and top_regions are CTEs. You could write a query that produces the same results by replacing references to the CTE with the query it names, but the CTEs make the entire query simpler to understand.

With regard to dataflows, this is similar to Querying views above: Materialize tears down the created dataflow after returning the results.

Using query hints

SELECT a,
       min(b) AS min
FROM example
GROUP BY a
OPTIONS (EXPECTED GROUP SIZE = 100)

Here the hint indicates that there may be up to a hundred distinct (a, b) pairs for each a value, and Materialize can optimize its dataflow rendering with that knowledge.