EXPLAIN

EXPLAIN displays the plan used for a SELECT statement, a view, or a materialized view.

WARNING! EXPLAIN is not part of Materialize’s stable interface and is not subject to our backwards compatibility guarantee. The syntax and output of EXPLAIN may change arbitrarily in future versions of Materialize.

Conceptual framework

To execute SELECT statements, Materialize generates a plan consisting of operators that interface with our underlying Differential dataflow engine. EXPLAIN lets you see the plan for a given query, which can provide insight into Materialize’s behavior for a specific query, such as its expected performance.

Syntax

EXPLAIN
    [ [ RAW | DECORRELATED | OPTIMIZED | PHYSICAL ] PLAN ]
    [ WITH ( output_modifier [, ...] ) ]
    [ AS { TEXT | JSON } ]
    [ FOR ]
    { select_stmt | VIEW view_name | MATERIALIZED VIEW view_name }

Explained object

The following three objects can be explained.

Explained object Description
select_stmt Display the plan for an ad hoc SELECT statement.
VIEW view_name Display the plan for an existing view.
MATERIALIZED VIEW view_name Display the plan for an existing materialized view.
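
For example, assuming a table t, a view v_orders, and a materialized view mv_orders exist (all names here are illustrative), each of the three forms can be explained at the default OPTIMIZED stage as follows:

EXPLAIN OPTIMIZED PLAN FOR SELECT * FROM t;
EXPLAIN OPTIMIZED PLAN FOR VIEW v_orders;
EXPLAIN OPTIMIZED PLAN FOR MATERIALIZED VIEW mv_orders;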

Output format

You can select between JSON and TEXT for the output format of EXPLAIN. Non-text output is more machine-readable and can be parsed by common graph visualization libraries, while formatted text is more human-readable.

Output type Description
TEXT Format the explanation output as UTF-8 text.
JSON Format the explanation output as a JSON object.
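
For example, to produce the optimized plan for an ad hoc query as a JSON object rather than as text (the query itself is illustrative):

EXPLAIN OPTIMIZED PLAN AS JSON FOR SELECT 1;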

Explained stage

This option determines the query optimization stage at which the plan snapshot is taken.

Plan Stage Description
RAW Display the raw plan.
DECORRELATED Display the decorrelated plan.
OPTIMIZED (Default) Display the optimized plan.
PHYSICAL Display the physical plan.
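
For example, the same (illustrative) query can be inspected at each stage of the pipeline:

EXPLAIN RAW PLAN FOR SELECT a, b FROM t WHERE a > 0;
EXPLAIN DECORRELATED PLAN FOR SELECT a, b FROM t WHERE a > 0;
EXPLAIN OPTIMIZED PLAN FOR SELECT a, b FROM t WHERE a > 0;
EXPLAIN PHYSICAL PLAN FOR SELECT a, b FROM t WHERE a > 0;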

Output modifiers

Output modifiers act as boolean toggles and can be combined in order to slightly tweak the information and rendering style of the generated explanation output.

Modifier Description
arity Annotate each subplan with its number of produced columns. This is useful due to the use of offset-based column names.
join_impls Render details about the implementation strategy of optimized MIR Join nodes.
keys Annotate each subplan with its unique keys.
types Annotate each subplan with its inferred type.
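
Modifiers are listed in the WITH clause. For example, to annotate each subplan with its unique keys and its inferred type (the query is illustrative):

EXPLAIN OPTIMIZED PLAN WITH (keys, types) FOR SELECT a, b FROM t;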

Query compilation pipeline

The job of the Materialize planner is to turn SQL code into a differential dataflow program. We get there via a series of progressively lower-level plans:

SQL ⇒ raw plan ⇒ decorrelated plan ⇒ optimized plan ⇒ physical plan ⇒ dataflow

From SQL to raw plan

In this stage, the planner:

  • Replaces SQL variable names with column numbers.
  • Infers the type of each expression.
  • Chooses the correct overload for each function.

From raw plan to decorrelated plan

In this stage, the planner:

  • Replaces subqueries and lateral joins with non-nested operations.
  • Replaces OUTER joins with lower-level operations.
  • Replaces aggregate default values with lower-level operations.

From decorrelated plan to optimized plan

In this stage, the planner performs various optimizing rewrites:

  • Coalesces joins.
  • Chooses join order and implementation.
  • Fuses adjacent operations.
  • Removes redundant operations.
  • Evaluates any operations on constants.

From optimized plan to physical plan

In this stage, the planner:

  • Maps plan operators to differential dataflow operators.
  • Locates existing arrangements which can be reused.

From physical plan to dataflow

In the final stage, the planner:

  • Renders an actual dataflow from the physical plan, and
  • Installs the new dataflow into the running system.

No smart logic runs as part of the rendering step, as the physical plan is meant to be a definitive and complete description of the rendered dataflow.

Reading decorrelated/optimized plans

Materialize plans are directed acyclic graphs of operators. Each operator in the graph receives inputs from zero or more other operators and produces a single output. Sub-graphs where each output is consumed only once are rendered as tree-shaped fragments. Sub-graphs consumed more than once are represented as common table expressions (CTEs). In the example below, the CTE l0 represents a linear sub-plan (a chain of Get, Filter, and Project operators) which is used in both inputs of a self-join.

Return
  Join on=(#1 = #2)
    Get l0
    Get l0
With
  cte l0 =
    Project (#0, #1)
      Filter (#0 > #2)
        Get materialize.public.t
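
For reference, a plan of this shape could arise from a self-join over a filtered projection, e.g. a query roughly like the following (the table t and its columns a, b, and c are illustrative, and the exact plan depends on the optimizer version):

SELECT *
FROM (SELECT a, b FROM t WHERE a > c) AS lhs,
     (SELECT a, b FROM t WHERE a > c) AS rhs
WHERE lhs.b = rhs.a;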

Many operators need to refer to columns in their input. These are displayed like #3 for column number 3. (Columns are numbered starting from column 0). To get a better sense of columns assigned to Map operators, it might be useful to request the arity output modifier.
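
For example, to see the arity annotations for a query whose plan contains a Map (the table t and its columns are illustrative):

EXPLAIN OPTIMIZED PLAN WITH (arity) FOR SELECT a, b, a + b AS total FROM t;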

Each operator can also be annotated with additional metadata. Details are shown by default in the EXPLAIN PHYSICAL PLAN output, but are hidden elsewhere. In EXPLAIN OPTIMIZED PLAN, details about the implementation in the Join operator can be requested with the join_impls output modifier:

Join on=(#1 = #2 AND #3 = #4) type=delta
  implementation
    %0:t » %1:u[#0]KA » %2:v[#0]KA
    %1:u » %0:t[#1]KA » %2:v[#0]KA
    %2:v » %1:u[#1]KA » %0:t[#1]KA
  ArrangeBy keys=[[#1]]
    Get materialize.public.t
  ArrangeBy keys=[[#0], [#1]]
    Get materialize.public.u
  ArrangeBy keys=[[#0]]
    Get materialize.public.v

The %0, %1, etc. refer to each of the join inputs. A differential join shows one join path, which is simply a sequence of binary joins (each of whose results needs to be maintained as state). A delta join shows a join path for each of the inputs. The expressions in brackets show the key for joining with that input. The capital letters after the brackets indicate the input characteristics used for join ordering: U means the key is unique, the number of Ks indicates the key length, and A means the input is already arranged (e.g., an index exists). The lowercase letters refer to filter characteristics: equality to a literal, LIKE, IS NULL, inequality to a literal, or any other filter.
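
A statement along the following lines could produce annotations like those shown above (the tables t, u, and v and their join columns are illustrative):

EXPLAIN OPTIMIZED PLAN WITH (join_impls) FOR
  SELECT * FROM t, u, v WHERE t.b = u.a AND u.b = v.a;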

A plan can optionally end with a finishing action which can sort, limit and project the result data. This operator is special, as it can only occur at the top of the plan. Finishing actions are executed outside the parallel dataflow that implements the rest of the plan.

Finish order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 output=[#0, #1]
  CrossJoin
    Get materialize.public.r
    Get materialize.public.s
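
Such a plan could result from a query roughly like the following (assuming single-column tables r(a) and s(b), which are illustrative):

SELECT r.a, s.b
FROM r, s
ORDER BY s.b ASC NULLS LAST, r.a DESC NULLS FIRST
LIMIT 5;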

Finally, simple queries are sometimes implemented using a so-called fast path. In this mode, the program that implements the query will just hit an existing index, transform the results, and optionally apply a finishing action. For fast path queries, all of these actions happen outside of the regular dataflow engine. The fast path is indicated by an “Explained Query (fast path):” heading before the explained query in the EXPLAIN OPTIMIZED PLAN and EXPLAIN PHYSICAL PLAN result.

Explained Query (fast path):
  Finish order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 output=[#0, #1]
    ReadExistingIndex materialize.public.t_a_idx
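
For example, assuming a table t with an index t_a_idx on column a (names are illustrative), a point lookup like the following may be answered via the fast path:

EXPLAIN OPTIMIZED PLAN FOR SELECT * FROM t WHERE a = 42;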

Operators in decorrelated and optimized plans

Operator Meaning Example
Constant Always produces the same collection of rows. Constant
- ((1, 2) x 2)
- (3, 4)
Get Produces rows from an existing source/view or from a previous operator in the same plan. Get materialize.public.ordered
Project Produces a subset of the columns in the input rows. Project (#2, #3)
Map Appends the results of some scalar expressions to each row in the input. Map (((#1 * 10000000dec) / #2) * 1000dec)
FlatMap Appends the result of some table function to each row in the input. FlatMap jsonb_foreach(#3)
Filter Removes rows of the input for which some scalar predicates return false. Filter (#20 < #21)
Join Returns combinations of rows from each input whenever some equality predicates are true. Join on=(#1 = #2)
CrossJoin An alias for a Join with an empty predicate (emits all combinations). CrossJoin
Reduce Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs. Reduce group_by=[#0] aggregates=[max((#0 * #1))]
Distinct Alias for a Reduce with an empty aggregate list. Distinct
TopK Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top offset rows in each group, and returns the next limit rows. TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5
Negate Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Negate
Threshold Removes any rows with negative counts. Threshold
Union Sums the counts of each row of all inputs. Union
ArrangeBy Indicates a point that will become an arrangement in the dataflow engine (each keys element will be a different arrangement). ArrangeBy keys=[[#0]]
Return … With … Binds sub-plans consumed multiple times by downstream operators. See above

Operators in raw plans

Operator Meaning Example
Constant Always produces the same collection of rows. Constant
- ((1, 2) x 2)
- (3, 4)
Get Produces rows from an existing source/view or from a previous operator in the same plan. Get materialize.public.ordered
Project Produces a subset of the columns in the input rows. Project (#2, #3)
Map Appends the results of some scalar expressions to each row in the input. Map (((#1 * 10000000dec) / #2) * 1000dec)
CallTable Appends the result of some table function to each row in the input. CallTable generate_series(1, 7, 1)
Filter Removes rows of the input for which some scalar predicates return false. Filter (#20 < #21)
~Join Performs an INNER, LEFT, RIGHT, FULL OUTER, or CROSS join of the two inputs, using the given predicate. InnerJoin (#3 = #5)
Reduce Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs. In the case where the group key is empty and the input is empty, returns a single row with the aggregate functions applied to the empty input collection. Reduce group_by=[#2] aggregates=[min(#0), max(#0)]
Distinct Removes duplicate copies of input rows. Distinct
TopK Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top offset rows in each group, and returns the next limit rows. TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5
Negate Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Negate
Threshold Removes any rows with negative counts. Threshold
Union Sums the counts of each row of all inputs. Union
Return … With … Binds sub-plans consumed multiple times by downstream operators. See above

Timestamps

EXPLAIN TIMESTAMP displays the timestamps used for a SELECT statement, view, or materialized view – information that is useful for understanding query delays.

The explanation is divided into two parts:

  1. Determinations for a timestamp
  2. Sources frontiers

Determinations for a timestamp

Queries in Materialize have a logical timestamp, known as the query timestamp. It plays a critical role in returning a correct result: a correct result requires retrieving data at the same logical time from each source present in the query.

In this context, sources are any objects that provide data: materialized views, views, indexes, tables, or sources. Each has a pair of logical timestamp frontiers, referred to as its source frontiers.

Sources frontiers

Every source has a read frontier and a write frontier. Together they bound the range of logical timestamps at which the source can immediately return a correct result:

  • Read frontier: Indicates the minimum logical timestamp at which the source can return a correct result; data below this timestamp has been compacted away.
  • Write frontier: Indicates the maximum logical timestamp at which the source can return a correct result without waiting for unprocessed data.

A query timestamp that falls outside these frontiers can explain the presence of delays. A query timestamp that falls between them, in the range of data that has been processed but not yet compacted, allows a correct result to be built and returned immediately.

Example

EXPLAIN TIMESTAMP FOR MATERIALIZED VIEW users;
                                 Timestamp
---------------------------------------------------------------------------
                 query timestamp: 1673618185152 (2023-01-13 13:56:25.152) +
           oracle read timestamp: 1673618185152 (2023-01-13 13:56:25.152) +
 largest not in advance of upper: 1673618185152 (2023-01-13 13:56:25.152) +
                           upper:[1673618185153 (2023-01-13 13:56:25.153)]+
                           since:[1673618184000 (2023-01-13 13:56:24.000)]+
         can respond immediately: true                                    +
                        timeline: Some(EpochMilliseconds)                 +
                                                                          +
 source materialize.public.raw_users (u2014, storage):                    +
                   read frontier:[1673618184000 (2023-01-13 13:56:24.000)]+
                  write frontier:[1673618185153 (2023-01-13 13:56:25.153)]+

Definitions

Field Meaning Example
query timestamp The query timestamp value 1673612424151 (2023-01-13 12:20:24.151)
oracle read timestamp The value of the timeline’s oracle timestamp, if used. 1673612424151 (2023-01-13 12:20:24.151)
largest not in advance of upper The largest timestamp not in advance of upper. 1673612424151 (2023-01-13 12:20:24.151)
since The maximum read frontier of all involved sources. [1673612423000 (2023-01-13 12:20:23.000)]
upper The minimum write frontier of all involved sources. [1673612424152 (2023-01-13 12:20:24.152)]
can respond immediately True when the query timestamp is greater than or equal to since and less than upper. true
timeline The type of timeline the query’s timestamp belongs to. Some(EpochMilliseconds)

Timeline values

Field Meaning Example
EpochMilliseconds Means the timestamp is the number of milliseconds since the Unix epoch. Some(EpochMilliseconds)
External Means the timestamp comes from an external data source and we don’t know what the number means. The attached String is the source’s name, which will result in different sources being incomparable. Some(External)
User Means the user has manually specified a timeline. The attached String is specified by the user, allowing them to decide which sources are joinable. Some(User)

Sources frontiers

Field Meaning Example
source The source’s identifier. source materialize.public.raw_users (u2014, storage)
read frontier Minimum logical timestamp. [1673612423000 (2023-01-13 12:20:23.000)]
write frontier Maximum logical timestamp. [1673612424152 (2023-01-13 12:20:24.152)]