SELECT

View as Markdown

The SELECT statement is the root of a SQL query, and is used both to bind SQL queries to named views or materialized views, and to interactively query data maintained in Materialize. For interactive queries, you should consider creating indexes on the underlying relations based on common query patterns.

Syntax

[WITH <cte_binding> [, ...]]
SELECT [ALL | DISTINCT [ON ( <col_ref> [, ...] )]]
  <target_elem> [, ...]
[FROM <table_expr> [, ...] [<join_expr>]]
[WHERE <expression>]
[GROUP BY <col_ref> [, ...]]
[OPTIONS ( <option> = <val> [, ...] )]
[HAVING <expression>]
[ORDER BY <col_ref> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...]]
[LIMIT <expression>]
[OFFSET <integer>]
[{UNION | INTERSECT | EXCEPT} [ALL | DISTINCT] <another_select_stmt>]

Syntax element Description

WITH <cte_binding> [, …] Optional. Common table expressions (CTEs) for this query. See Regular CTEs for details.

ALL | DISTINCT [ON ( <col_ref> [, …] )]

Optional. Specifies which rows to return:

Option	Description
`ALL`	Return all rows from query (default).
`DISTINCT`	Return only distinct values.
`DISTINCT ON ( <col_ref> [, ...] )`	Return only the first row with a distinct value for `<col_ref>`. If an `ORDER BY` clause is also present, then `DISTINCT ON` will respect that ordering when choosing which row to return for each distinct value. You should start the `ORDER BY` clause with the same `<col_ref>` as the `DISTINCT ON` clause.

<target_elem> [, …] The columns or expressions to return. Can include column names, functions, or expressions.

FROM <table_expr> [, …] The tables you want to read from. These can be table names, other SELECT statements, Common Table Expressions (CTEs), or table function calls.

<join_expr> Optional. A join expression to combine table expressions. For more details, see the JOIN documentation.

WHERE <expression> Optional. Filter tuples by <expression>.

GROUP BY <col_ref> [, …] Optional. Group aggregations by <col_ref>. Column references may be the name of an output column, the ordinal number of an output column, or an arbitrary expression of only input columns.

OPTIONS ( <option> = <val> [, …] )

Optional. Specify one or more query hints. Valid hints:

Hint	Value type	Description
`AGGREGATE INPUT GROUP SIZE`	`uint8`	How many rows will have the same group key in an aggregation. Materialize can render `min` and `max` expressions more efficiently with this information.
`DISTINCT ON INPUT GROUP SIZE`	`uint8`	How many rows will have the same group key in a `DISTINCT ON` expression. Materialize can render Top K patterns based on `DISTINCT ON` more efficiently with this information.
`LIMIT INPUT GROUP SIZE`	`uint8`	How many rows will be given as a group to a `LIMIT` restriction. Materialize can render Top K patterns based on `LIMIT` more efficiently with this information.

HAVING <expression> Optional. Filter aggregations by <expression>.

ORDER BY <col_ref> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, …] Optional. Sort results in either ASC (default) or DESC order. Use the NULLS FIRST and NULLS LAST options to determine whether nulls appear before or after non-null values in the sort ordering (default: NULLS LAST for ASC, NULLS FIRST for DESC). Column references may be the name of an output column, the ordinal number of an output column, or an arbitrary expression of only input columns.

LIMIT <expression> Optional. Limit the number of returned results to <expression>.

OFFSET <integer> Optional. Skip the first <integer> number of rows.

UNION [ALL | DISTINCT] <another_select_stmt> Optional. Records present in select_stmt or another_select_stmt. DISTINCT returns only unique rows from these results (implied default). With ALL specified, each record occurs a number of times equal to the sum of the times it occurs in each input statement.

INTERSECT [ALL | DISTINCT] <another_select_stmt> Optional. Records present in both select_stmt and another_select_stmt. DISTINCT returns only unique rows from these results (implied default). With ALL specified, each record occurs a number of times equal to the lesser of the times it occurs in each input statement.

EXCEPT [ALL | DISTINCT] <another_select_stmt> Optional. Records present in select_stmt but not in another_select_stmt. DISTINCT returns only unique rows from these results (implied default). With ALL specified, each record occurs a number of times equal to the times it occurs in select_stmt less the times it occurs in another_select_stmt, or not at all if the former is greater than latter.

Common table expressions (CTEs)

Regular CTEs

WITH <cte_ident> [( <col_ident> [, ...] )] AS ( <select_stmt> )
  [, <cte_ident> [( <col_ident> [, ...] )] AS ( <select_stmt> ) [, ...]]
<select_stmt>

Syntax element	Description
`<cte_ident>`	The name of the common table expression (CTE).
( `<col_ident>` [, …] )	Optional. Rename the CTE’s columns to the list of identifiers. The number of identifiers must match the number of columns returned by the CTE’s `select_stmt`.
AS ( `<select_stmt>` )	The `SELECT` statement that defines the CTE. Any `cte_ident` alias can be referenced in subsequent `cte_binding` definitions and in the final `select_stmt`.

Recursive CTEs

WITH MUTUALLY RECURSIVE
  [((RETURN AT | ERROR AT) RECURSION LIMIT <limit>)]
  <cte_ident> ( <col_ident> <col_type> [, ...] ) AS ( <select_stmt> )
  [, <cte_ident> ( <col_ident> <col_type> [, ...] ) AS ( <select_stmt> ) [, ...]]
<select_stmt>

Syntax element Description

(RETURN AT | ERROR AT) RECURSION LIMIT <limit>

Optional. Control the recursion behavior:

Option	Description
`RETURN AT RECURSION LIMIT <limit>`	Stop the fixpoint computation after `<limit>` iterations and use the current values computed for each recursive CTE binding in the `select_stmt`. Useful when debugging and validating the correctness of recursive queries.
`ERROR AT RECURSION LIMIT <limit>`	Stop the fixpoint computation after `<limit>` iterations and fail the query with an error. A good safeguard against accidentally running a non-terminating dataflow in production clusters.

<cte_ident> ( <col_ident> <col_type> [, …] ) A binding that gives the SQL fragment defined under select_stmt a cte_ident alias. Unlike regular CTEs, a recursive CTE binding must explicitly state its type as a comma-separated list of (col_ident col_type) pairs. This alias can be used in the same binding or in all other (preceding and subsequent) bindings in the enclosing recursive CTE block.

AS ( <select_stmt> ) The SELECT statement that defines the recursive CTE. Any cte_ident alias can be referenced in all recursive_cte_binding definitions that live under the same block, as well as in the final select_stmt for that block.

For details and examples, see the Recursive CTEs page.

Details

Because Materialize works very differently from a traditional RDBMS, it’s important to understand the implications that certain features of SELECT will have on Materialize.

Creating materialized views

Creating a materialized view generates a persistent dataflow, which has a different performance profile from performing a SELECT in an RDBMS.

A materialized view has resource and latency costs that should be carefully considered depending on its main usage. Materialize must maintain the results of the query in durable storage, but often it must also maintain additional intermediate state.

Creating indexes

Creating an index also generates a persistent dataflow. The difference from a materialized view is that the results are maintained in memory rather than on persistent storage. This allows ad hoc queries to perform efficient point-lookups in indexes.

Ad hoc queries

An ad hoc query (a.k.a. one-off SELECT) simply performs the query once and returns the results. Ad hoc queries can either read from an existing index, or they can start an ephemeral dataflow to compute the results.

Performing a SELECT on an indexed source, view or materialized view is Materialize’s ideal operation. When Materialize receives such a SELECT query, it quickly returns the maintained results from memory. Materialize also quickly returns results for queries that only filter, project, transform with scalar functions, and re-order data that is maintained by an index.

Queries that can’t simply read out from an index will create an ephemeral dataflow to compute the results. These dataflows are bound to the active cluster, which you can change using:

SET cluster = <cluster name>;

Materialize will remove the dataflow as soon as it has returned the query results to you.

Known limitations

CTEs have the following limitations, which we are working to improve:

INSERT/UPDATE/DELETE (with RETURNING) is not supported inside a CTE.
SQL99-compliant WITH RECURSIVE CTEs are not supported (use the non-standard flavor instead).

Query hints

Users can specify query hints to help Materialize optimize queries.

The following query hints are valid within the OPTIONS clause.

Hint	Value type	Description
`AGGREGATE INPUT GROUP SIZE`	`uint8`	How many rows will have the same group key in an aggregation. Materialize can render `min` and `max` expressions more efficiently with this information.
`DISTINCT ON INPUT GROUP SIZE`	`uint8`	How many rows will have the same group key in a `DISTINCT ON` expression. Materialize can render Top K patterns based on `DISTINCT ON` more efficiently with this information. To determine the query hint size, see `EXPLAIN ANALYZE HINTS`.
`LIMIT INPUT GROUP SIZE`	`uint8`	How many rows will be given as a group to a `LIMIT` restriction. Materialize can render Top K patterns based on `LIMIT` more efficiently with this information.

For examples, see the Optimization page.

Column references

Within a given SELECT statement, we refer to the columns from the tables in the FROM clause as the input columns, and columns in the SELECT list as the output columns.

Expressions in the SELECT list, WHERE clause, and HAVING clause may refer only to input columns.

Column references in the ORDER BY and DISTINCT ON clauses may be the name of an output column, the ordinal number of an output column, or an arbitrary expression of only input columns. If an unqualified name refers to both an input and output column, ORDER BY chooses the output column.

Column references in the GROUP BY clause may be the name of an output column, the ordinal number of an output column, or an arbitrary expression of only input columns. If an unqualified name refers to both an input and output column, GROUP BY chooses the input column.

Connection pooling

Because Materialize is wire-compatible with PostgreSQL, you can use any PostgreSQL connection pooler with Materialize. For example in using PgBouncer, see Connection Pooling.

Examples

Creating an indexed view

This assumes you’ve already created a source.

The following query creates a view representing the total of all purchases made by users per region, and then creates an index on this view.

CREATE VIEW purchases_by_region AS
    SELECT region.id, sum(purchase.total)
    FROM mysql_simple_purchase AS purchase
    JOIN mysql_simple_user AS user ON purchase.user_id = user.id
    JOIN mysql_simple_region AS region ON user.region_id = region.id
    GROUP BY region.id;

CREATE INDEX purchases_by_region_idx ON purchases_by_region(id);

In this case, Materialize will create a dataflow to maintain the results of this query, and that dataflow will live on until the index it’s maintaining is dropped.

Reading from a view

Assuming you’ve created the indexed view listed above, named purchases_by_region, you can simply read from the index with an ad hoc SELECT query:

SELECT * FROM purchases_by_region;

In this case, Materialize simply returns the results that the index is maintaining, by reading from memory.

Ad hoc querying

SELECT region.id, sum(purchase.total)
FROM mysql_simple_purchase AS purchase
JOIN mysql_simple_user AS user ON purchase.user_id = user.id
JOIN mysql_simple_region AS region ON user.region_id = region.id
GROUP BY region.id;

In this case, Materialize will spin up a similar dataflow as it did for creating the above indexed view, but it will tear down the dataflow once it’s returned its results to the client. If you regularly want to view the results of this query, you may want to create an index (in memory) and/or a materialized view (on persistent storage) for it.

Using regular CTEs

WITH
  regional_sales (region, total_sales) AS (
    SELECT region, sum(amount)
    FROM orders
    GROUP BY region
  ),
  top_regions AS (
    SELECT region
    FROM regional_sales
    ORDER BY total_sales DESC
    LIMIT 5
  )
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

Both regional_sales and top_regions are CTEs. You could write a query that produces the same results by replacing references to the CTE with the query it names, but the CTEs make the entire query simpler to understand.

With regard to dataflows, this is similar to ad hoc querying above: Materialize tears down the created dataflow after returning the results.

Privileges

The privileges required to execute this statement are:

SELECT privileges on all directly referenced relations in the query. If the directly referenced relation is a view or materialized view:
SELECT privileges are required only on the directly referenced view/materialized view. SELECT privileges are not required for the underlying relations referenced in the view/materialized view definition unless those relations themselves are directly referenced in the query.
However, the owner of the view/materialized view (including those with superuser privileges) must have all required SELECT and USAGE privileges to run the view definition regardless of who is selecting from the view/materialized view.
USAGE privileges on the schemas that contain the relations in the query.
USAGE privileges on the active cluster.

JOIN

Recursive CTEs

SELECT

Syntax

Common table expressions (CTEs)

Regular CTEs

Recursive CTEs

Details

Creating materialized views

Creating indexes

Ad hoc queries

Known limitations

Query hints

Column references

Connection pooling

Examples

Creating an indexed view

Reading from a view

Ad hoc querying

Using regular CTEs

Privileges

Related pages