EXPLAIN PLAN

EXPLAIN PLAN displays the plans used for:


`SELECT` statements `CREATE VIEW` statements `CREATE INDEX` statements `CREATE MATERIALIZED VIEW` statements	Existing views Existing indexes Existing materialized views

WARNING! EXPLAIN is not part of Materialize’s stable interface and is not subject to our backwards compatibility guarantee. The syntax and output of EXPLAIN may change arbitrarily in future versions of Materialize.

Syntax

EXPLAIN [ [ RAW | DECORRELATED | [LOCALLY] OPTIMIZED | PHYSICAL ] PLAN
    [ WITH (<output_modifier> [, <output_modifier> ...])]
    [ AS TEXT | AS JSON ]
FOR ]       -- The FOR keyword is required if the PLAN keyword is specified
    <SELECT ...>
;

EXPLAIN <RAW | DECORRELATED | LOCALLY OPTIMIZED> PLAN
    [ WITH (<output_modifier> [, <output_modifier> ...]) ]
    [ AS TEXT | AS JSON ]
FOR
    <CREATE VIEW ...>
;

EXPLAIN [ [ OPTIMIZED | PHYSICAL ] PLAN
    [ WITH (<output_modifier> [, <output_modifier> ...]) ]
    [ AS TEXT | AS JSON ]
FOR ]  -- The FOR keyword is required if the PLAN keyword is specified
    <CREATE INDEX ...>
;

EXPLAIN [ [ RAW | DECORRELATED | [LOCALLY] OPTIMIZED | PHYSICAL ] PLAN
    [ WITH (<output_modifier> [, <output_modifier> ...])]
    [ AS TEXT | AS JSON ]
FOR ]          -- The FOR keyword is required if the PLAN keyword is specified
    <CREATE MATERIALIZED VIEW ...>
;

EXPLAIN <RAW | LOCALLY OPTIMIZED> PLAN
    [ WITH (<output_modifier> [, <output_modifier> ...])]
    [ AS TEXT | AS JSON ]
FOR
  VIEW <name>
;

EXPLAIN [ [ OPTIMIZED | PHYSICAL ] PLAN
      [ WITH (<output_modifier> [, <output_modifier> ...]) ]
      [ AS TEXT | AS JSON ]
FOR ]  -- The FOR keyword is required if the PLAN keyword is specified
  INDEX <name>
;

EXPLAIN [[ RAW | [LOCALLY] OPTIMIZED | PHYSICAL ] PLAN
    [ WITH (<output_modifier> [, <output_modifier> ...]) ]
    [ AS TEXT | AS JSON ]
FOR ] -- The FOR keyword is required if the PLAN keyword is specified
  MATERIALIZED VIEW <name>
;

Note that the FOR keyword is required if the PLAN keyword is present. The following four statements are equivalent:

EXPLAIN <explainee>;
EXPLAIN PLAN FOR <explainee>;
EXPLAIN PHYSICAL PLAN FOR <explainee>;
EXPLAIN PHYSICAL PLAN AS TEXT FOR <explainee>;

Explained object

The following object types can be explained.

Explained object	Description
select_stmt	Display a plan for an ad-hoc `SELECT` statement.
create_view	Display a plan for a `CREATE VIEW` statement.
create_index	Display a plan for a `CREATE INDEX` statement.
create_materialized_view	Display a plan for a `CREATE MATERIALIZED VIEW` statement.
VIEW name	Display the `RAW` or `LOCALLY OPTIMIZED` plan for an existing view.
INDEX name	Display the `OPTIMIZED` or `PHYSICAL` plan for an existing index.
MATERIALIZED VIEW name	Display the `OPTIMIZED` or `PHYSICAL` plan for an existing materialized view.

Output format

You can select between JSON and TEXT for the output format of EXPLAIN PLAN. Non-text output is more machine-readable and can be parsed by common graph visualization libraries, while formatted text is more human-readable.

Output type	Description
TEXT	Format the explanation output as UTF-8 text.
JSON	Format the explanation output as a JSON object.

Explained stage

This stage determines the query optimization stage at which the plan snapshot will be taken.

Plan Stage	Description
RAW PLAN	Display the raw plan; this is closest to the original SQL.
DECORRELATED PLAN	Display the decorrelated but not-yet-optimized plan.
LOCALLY OPTIMIZED	Display the locally optimized plan (before view inlining and access path selection). This is the final stage for regular `CREATE VIEW` optimization.
OPTIMIZED PLAN	Display the optimized plan.
PHYSICAL PLAN	(Default) Display the physical plan; this corresponds to the operators shown in `mz_introspection.mz_lir_mapping`.

Output modifiers

Output modifiers act as boolean toggles and can be combined in order to slightly tweak the information and rendering style of the generated explanation output.

Modifier	Description
arity	(on by default) Annotate each subplan with its number of produced columns. This is useful due to the use of offset-based column names.
cardinality	Annotate each subplan with a symbolic estimate of its cardinality.
join implementations	Render details about the implementation strategy of optimized MIR `Join` nodes.
keys	Annotates each subplan with a parenthesized list of unique keys. Each unique key is presented as a bracketed list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` is a list of two unique keys: column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows.
node identifiers	Annotate each subplan in a `PHYSICAL PLAN` with its node ID.
redacted	Anonymize literals in the output.
timing	Annotate the output with the optimization time.
types	Annotate each subplan with its inferred type.
humanized expressions	(on by default) Add human-readable column names to column references. For example, `#0{id}` refers to column 0, whose name is `id`. Note that SQL-level aliasing is not considered when inferring column names, which means that the displayed column names can be ambiguous.
filter pushdown	(on by default) For each source, include a `pushdown` field that explains which filters can be pushed down to the storage layer.

Note that most modifiers are currently only supported for the AS TEXT output.

Details

To execute SELECT statements, Materialize generates a plan consisting of operators that interface with our underlying Differential dataflow engine. EXPLAIN PLAN lets you see the plan for a given query, which can provide insight into Materialize’s behavior for specific queries, e.g. performance.

Query compilation pipeline

The job of the Materialize planner is to turn SQL code into a differential dataflow program. We get there via a series of progressively lower-level plans:

SQL ⇒ raw plan ⇒ decorrelated plan ⇒ optimized plan ⇒ physical plan ⇒ dataflow

From SQL to raw plan

In this stage, the planner:

Replaces SQL variable names with column numbers.
Infers the type of each expression.
Chooses the correct overload for each function.

From raw plan to decorrelated plan

In this stage, the planner:

Replaces subqueries and lateral joins with non-nested operations.
Replaces OUTER joins with lower-level operations.
Replaces global aggregate default values with lower-level operations.

From decorrelated plan to optimized plan

In this stage, the planner performs various optimizing rewrites:

Coalesces joins.
Chooses join order and implementation.
Fuses adjacent operations.
Removes redundant operations.
Pushes down predicates.
Evaluates any operations on constants.

From optimized plan to physical plan

In this stage, the planner:

Decides on the exact execution details of each operator, and maps plan operators to differential dataflow operators.
Makes the final choices about creating or reusing arrangements.

From physical plan to dataflow

In the final stage, the planner:

Renders an actual dataflow from the physical plan, and
Installs the new dataflow into the running system.

The rendering step does not make any further optimization choices, as the physical plan is meant to be a definitive and complete description of the rendered dataflow.

Fast path queries

Queries are sometimes implemented using a fast path. In this mode, the program that implements the query will just hit an existing index, transform the results, and optionally apply a finishing action. For fast path queries, all of these actions happen outside of the regular dataflow engine. The fast path is indicated by an “Explained Query (fast path):” heading before the explained query in the EXPLAIN OPTIMIZED PLAN and EXPLAIN PHYSICAL PLAN result.

Explained Query (fast path):
  Project (#0, #1)
    ReadIndex on=materialize.public.t1 t1_x_idx=[lookup value=(5)]

Used Indexes:
  - materialize.public.t1_x_idx (lookup)

Reading decorrelated and optimized plans

Materialize plans are directed, potentially cyclic, graphs of operators. Each operator in the graph receives inputs from zero or more other operators and produces a single output. Sub-graphs where each output is consumed only once are rendered as tree-shaped fragments. Sub-graphs consumed more than once are represented as common table expressions (CTEs). In the example below, the CTE l0 represents a linear sub-plan (a chain of Read from the table t) which is used in both inputs of a self-join (Differential Join).

> CREATE TABLE t(x INT NOT NULL, y INT NOT NULL);
CREATE TABLE
> EXPLAIN SELECT t1.x, t1.y
          FROM (SELECT * FROM t WHERE x > y) AS t1,
               (SELECT * FROM t where x > y) AS t2
          WHERE t1.y = t2.y;
                     Physical Plan
--------------------------------------------------------
 Explained Query:                                      +
   →With                                               +
     cte l0 =                                          +
       →Read materialize.public.t                      +
   →Return                                             +
     →Differential Join %0 » %1                        +
       Join stage %0: Lookup key #0{y} in %1           +
       →Arrange                                        +
         Keys: 1 arrangement available, plus raw stream+
           Arrangement 0: #1{y}                        +
         →Stream l0                                    +
       →Arrange                                        +
         Keys: 1 arrangement available, plus raw stream+
           Arrangement 0: #0{y}                        +
         →Read l0                                      +
                                                       +
 Source materialize.public.t                           +
   filter=((#0{x} > #1{y}))                            +
                                                       +
 Target cluster: quickstart                            +

Note that CTEs in optimized plans do not directly correspond to CTEs in your original SQL query: For example, CTEs might disappear due to inlining (i.e., when a CTE is used only once, its definition is copied to that usage site); new CTEs can appear due to the optimizer recognizing that a part of the query appears more than once (aka common subexpression elimination). Also, certain SQL-level concepts, such as outer joins or subqueries, do not have an explicit representation in optimized plans, and are instead expressed as a pattern of operators involving CTEs. CTE names are always l0, l1, l2, …, and do not correspond to SQL-level CTE names.

Many operators need to refer to columns in their input. These are displayed like #3 for column number 3. (Columns are numbered starting from column 0). To get a better sense of columns assigned to Map operators, it might be useful to request the arity output modifier.

Each operator can also be annotated with additional metadata. Details are shown by default in the EXPLAIN PHYSICAL PLAN output (the default), but are hidden elsewhere. In EXPLAIN OPTIMIZED PLAN, details about the implementation in the Join operator can be requested with the join implementations output modifier (that is, EXPLAIN OPTIMIZED PLAN WITH (join implementations) FOR ...).

Join on=(#1 = #2 AND #3 = #4) type=delta
  implementation
    %0:t » %1:u[#0]K » %2:v[#0]K
    %1:u » %0:t[#1]K » %2:v[#0]K
    %2:v » %1:u[#1]K » %0:t[#1]K
  ArrangeBy keys=[[#1]]
    ReadStorage materialize.public.t
  ArrangeBy keys=[[#0], [#1]]
    ReadStorage materialize.public.u
  ArrangeBy keys=[[#0]]
    ReadStorage materialize.public.v

The %0, %1, etc. refer to each of the join inputs. A differential join shows one join path, which is simply a sequence of binary joins (each of whose results need to be maintained as state). A delta join shows a join path for each of the inputs. The expressions in a bracket show the key for joining with that input. The letters after the brackets indicate the input characteristics used for join ordering. U means unique, the number of Ks means the key length, A means already arranged (e.g., an index exists). The small letters refer to filter characteristics: equality to a literal, like, is null, inequality to a literal, any filter.

A plan can optionally end with a finishing action, which can sort, limit and project the result data. This operator is special, as it can only occur at the top of the plan. Finishing actions are executed outside the parallel dataflow that implements the rest of the plan.

Finish order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 output=[#0, #1]
  CrossJoin
    ReadStorage materialize.public.r
    ReadStorage materialize.public.s

Below the plan, a “Used indexes” section indicates which indexes will be used by the query, and in what way.

Reference: Plan operators

Materialize offers several output formats for EXPLAIN and debugging. LIR plans as rendered in mz_introspection.mz_lir_mapping are deliberately succinct, while the plans in other formats give more detail.

The decorrelated and optimized plans from EXPLAIN DECORRELATED PLAN FOR ..., EXPLAIN LOCALLY OPTIMIZED PLAN FOR ..., and EXPLAIN OPTIMIZED PLAN FOR ... are in a mid-level representation that is closer to LIR than SQL. The raw plans from EXPLAIN RAW PLAN FOR ... are closer to SQL (and therefore less indicative of how the query will actually run).

The following table lists the operators that are available in the LIR plan.

For those operators that require memory to maintain intermediate state, Uses memory is marked with Yes.
For those operators that expand the data size (either rows or columns), Can increase data size is marked with Yes.

Operator	Description	Example
Constant	Always produces the same collection of rows. Can increase data size: No Uses memory: No	`→Constant (2 rows)`
Stream, Arranged, Index Lookup, Read	Produces rows from either an existing relation (source/view/materialized view/table) or from a previous CTE in the same plan. A parent `Fused Map/Filter/Project` operator can combine with this operator. There are four types of `Get`. `Stream` indicates that the results are not arranged in memory and will be streamed directly. `Arranged` indicates that the results are arranged in memory. `Index Lookup` indicates the results will be looked up in an existing [arrangement]((/get-started/arrangements/#arrangements). `Read` indicates that the results are unarranged, and will be processed as they arrive. Can increase data size: No Uses memory: No	`Arranged materialize.public.t`
Map/Filter/Project	Computes new columns (maps), filters columns, and projects away columns. Works row-by-row. Maps and filters will be printed, but projects will not. These may be marked as `Fused` `Map/Filter/Project`, which means they will combine with the operator beneath them to run more efficiently. Can increase data size: Each row may have more data, from the `Map`. Each row may also have less data, from the `Project`. There may be fewer rows, from the `Filter`. Uses memory: No	`→Map/Filter/Project Filter: (#0{a} < 7) Map: (#0{a} + #1{b})`
Table Function	Appends the result of some (one-to-many) table function to each row in the input. A parent `Fused Table Function unnest_list` operator will fuse with its child `GroupAggregate` operator. Fusing these operator is part of how we efficiently compile window functions from SQL to dataflows. A parent `Fused Map/Filter/Project` can combine with this operator. Can increase data size: Depends on the table function used. Uses memory: No	`→Table Function generate_series(#0{a}, #1{b}, 1) Input key: (#0{a})`
Differential Join, Delta Join	Both join operators indicate the join ordering selected. Returns combinations of rows from each input whenever some equality predicates are `true`. Joins will indicate the join order of their children, starting from 0. For example, `Differential Join %1 » %0` will join its second child into its first. The two joins differ in performance characteristics. Can increase data size: Depends on the join order and facts about the joined collections. Uses memory: ✅ Uses memory for 3-way or more differential joins.	`→Differential Join %1 » %0 Join stage %0: Lookup key #0{a} in %0`
GroupAggregate	Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs. There are five types of `GroupAggregate`, ordered by increasing complexity: `Distinct GroupAggregate` corresponds to the SQL `DISTINCT` operator. `Accumulable GroupAggregate` (e.g., `SUM`, `COUNT`) corresponds to several easy to implement aggregations that can be executed simultaneously and efficiently. `Hierarchical GroupAggregate` (e.g., `MIN`, `MAX`) corresponds to an aggregation requiring a tower of arrangements. These can be either monotonic (more efficient) or bucketed. These may benefit from a hint; see `mz_introspection.mz_expected_group_size_advice`. These may either be bucketed or monotonic (more efficient). These may consolidate their output, which will increase memory usage. `Collated Multi-GroupAggregate` corresponds to an arbitrary mix of reductions of different types, which will be performed separately and then joined together. `Non-incremental GroupAggregate` (e.g., window functions, `list_agg`) corresponds to a single non-incremental aggregation. These are the most computationally intensive reductions. A parent `Fused Map/Filter/Project` can combine with this operator. Can increase data size: No Uses memory: ✅ `Distinct` and `Accumulable` aggregates use a moderate amount of memory (proportional to twice the output size). `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see `mz_introspection.mz_expected_group_size_advice`. `Non-incremental` aggregates use memory proportional to the input + output size. `Collated` aggregates use memory that is the sum of their constituents, plus some memory for the join at the end.	`→Accumulable GroupAggregate Simple aggregates: count(*) Post-process Map/Filter/Project Filter: (#0 > 1)`
TopK	Groups the input rows, sorts them according to some ordering, and returns at most `K` rows at some offset from the top of the list, where `K` is some (possibly computed) limit. There are three types of `TopK`. Two are special cased for monotonic inputs (i.e., inputs which never retract data). `Monotonic Top1`. `Monotonic TopK`, which may give an expression indicating the limit. `Non-monotonic TopK`, a generic `TopK` plan. Each version of the `TopK` operator may include grouping, ordering, and limit directives. Can increase data size: No Uses memory: ✅ `Monotonic Top1` and `Monotonic TopK` use a moderate amount of memory. `Non-monotonic TopK` uses significantly more memory as the operator can significantly overestimate the group sizes. Consult `mz_introspection.mz_expected_group_size_advice`.	`→Consolidating Monotonic TopK Order By #1 asc nulls_last, #0 desc nulls_first Limit 5`
Negate Diffs	Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Can increase data size: No Uses memory: No	`→Negate Diffs`
Threshold Diffs	Removes any rows with negative counts. Can increase data size: No Uses memory: ✅ Uses memory proportional to the input and output size, twice.	`→Threshold Diffs`
Union	Combines its inputs into a unified output, emitting one row for each row on any input. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.) Can increase data size: No Uses memory: ✅ A `Consolidating Union` will make moderate use of memory, particularly at hydration time. A `Union` that is not `Consolidating` will not consume memory.	`→Consolidating Union`
Arrange	Indicates a point that will become an arrangement in the dataflow engine, i.e., it will consume memory to cache results. Can increase data size: No Uses memory: ✅ Uses memory proportional to the input size. Note that in the LIR / physical plan, `Arrange`/`ArrangeBy` almost always means that an arrangement will actually be created. (This is in contrast to the “optimized” plan, where an `ArrangeBy` being present in the plan often does not mean that an arrangement will actually be created.)	`→Arrange Keys: 1 arrangement available, plus raw stream Arrangement 0: #0`
Unarranged Raw Stream	Indicates a point where data will be streamed (even if it is somehow already arranged). Can increase data size: No Uses memory: No	`→Unarranged Raw Stream`
With ... Return ...	Introduces CTEs, i.e., makes it possible for sub-plans to be consumed multiple times by downstream operators. Can increase data size: No Uses memory: No	See above

Can increase data size: Specifies whether the operator can increase the data size (can be the number of rows or the number of columns).
Uses memory: Specifies whether the operator use memory to maintain state for its inputs.

The following table lists the operators that are available in the optimized plan.

For those operators that require memory to maintain intermediate state, Uses memory is marked with Yes.
For those operators that expand the data size (either rows or columns), Can increase data size is marked with Yes.

Operator	Description	Example
Constant	Always produces the same collection of rows. Can increase data size: No Uses memory: No	`Constant - ((1, 2) x 2) - (3, 4)`
Get	Produces rows from either an existing relation (source/view/materialized view/table) or from a previous CTE in the same plan. Can increase data size: No Uses memory: No	`Get materialize.public.ordered`
Project	Produces a subset of the columns in the input rows. See also column numbering. Can increase data size: No Uses memory: No	`Project (#2, #3)`
Map	Appends the results of some scalar expressions to each row in the input. Can increase data size: Each row has more data (i.e., longer rows but same number of rows). Uses memory: No	`Map (((#1 * 10000000dec) / #2) * 1000dec)`
FlatMap	Appends the result of some (one-to-many) table function to each row in the input. Can increase data size: Depends on the table function used. Uses memory: No	`FlatMap jsonb_foreach(#3)`
Filter	Removes rows of the input for which some scalar predicates return `false`. Can increase data size: No Uses memory: No	`Filter (#20 < #21)`
Join	Returns combinations of rows from each input whenever some equality predicates are `true`. Can increase data size: Depends on the join order and facts about the joined collections. Uses memory: ✅ The `Join` operator itself uses memory only for `type=differential` with more than 2 inputs. However, `Join` operators need arrangements on their inputs (shown by the `ArrangeBy` operator). These arrangements use memory proportional to the input sizes. If an input has an appropriate index, then the arrangement of the index will be reused.	`Join on=(#1 = #2) type=delta`
CrossJoin	An alias for a `Join` with an empty predicate (emits all combinations). Note that not all cross joins are marked as `CrossJoin`: In a join with more than 2 inputs, it can happen that there is a cross join between some of the inputs. You can recognize this case by `ArrangeBy` operators having empty keys, i.e., `ArrangeBy keys=[[]]`. Can increase data size: Cartesian product of the inputs (\|N\| x \|M\|). Uses memory: ✅ Uses memory for 3-way or more differential joins.	`CrossJoin type=differential`
Reduce	Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs. Can increase data size: No Uses memory: ✅ `SUM`, `COUNT`, and most other aggregations use a moderate amount of memory (proportional either to twice the output size or to input size + output size). `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see `mz_introspection.mz_expected_group_size_advice`.	`Reduce group_by=[#0] aggregates=[max((#0 * #1))]`
Distinct	Alias for a `Reduce` with an empty aggregate list. Can increase data size: No Uses memory: ✅ Uses memory proportional to twice the output size.	`Distinct`
TopK	Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top `offset` rows in each group, and returns the next `limit` rows. Can increase data size: No Uses memory: ✅ Can use significant amount as the operator can significantly overestimate the group sizes. Consult `mz_introspection.mz_expected_group_size_advice`.	`TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5`
Negate	Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Can increase data size: No Uses memory: No	`Negate`
Threshold	Removes any rows with negative counts. Can increase data size: No Uses memory: ✅ Uses memory proportional to the input and output size, twice.	`Threshold`
Union	Sums the counts of each row of all inputs. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.) Can increase data size: No Uses memory: ✅ Moderate use of memory. Some union operators force consolidation, which results in a memory spike, largely at hydration time.	`Union`
ArrangeBy	Indicates a point that will become an arrangement in the dataflow engine (each `keys` element will be a different arrangement). Note that if an appropriate index already exists on the input or the output of the previous operator is already arranged with a key that is also requested here, then this operator will just pass on that existing arrangement instead of creating a new one. Can increase data size: No Uses memory: ✅ Depends. If arrangements need to be created, they use memory proportional to the input size.	`ArrangeBy keys=[[#0]]`
With ... Return ...	Introduces CTEs, i.e., makes it possible for sub-plans to be consumed multiple times by downstream operators. Can increase data size: No Uses memory: No	See above

Can increase data size: Specifies whether the operator can increase the data size (can be the number of rows or the number of columns).
Uses memory: Specifies whether the operator use memory to maintain state for its inputs.

The following table lists the operators that are available in the raw plan.

For those operators that require memory to maintain intermediate state, Uses memory is marked with Yes.
For those operators that expand the data size (either rows or columns), Can increase data size is marked with Yes.

Operator	Description	Example
Constant	Always produces the same collection of rows. Can increase data size: No Uses memory: No	`Constant - ((1, 2) x 2) - (3, 4)`
Get	Produces rows from either an existing relation (source/view/materialized view/table) or from a previous CTE in the same plan. Can increase data size: No Uses memory: No	`Get materialize.public.ordered`
Project	Produces a subset of the columns in the input rows. See also column numbering. Can increase data size: No Uses memory: No	`Project (#2, #3)`
Map	Appends the results of some scalar expressions to each row in the input. Can increase data size: Each row has more data (i.e., longer rows but same number of rows). Uses memory: No	`Map (((#1 * 10000000dec) / #2) * 1000dec)`
CallTable	Appends the result of some (one-to-many) table function to each row in the input. Can increase data size: Depends on the table function used. Uses memory: No	`CallTable generate_series(1, 7, 1)`
Filter	Removes rows of the input for which some scalar predicates return `false`. Can increase data size: No Uses memory: No	`Filter (#20 < #21)`
~Join	Performs one of `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` join on the two inputs, using the given predicate. Can increase data size: For `CrossJoin`s, Cartesian product of the inputs (\|N\| x \|M\|). Note that, in many cases, a join that shows up as a cross join in the RAW PLAN will actually be turned into an inner join in the OPTIMIZED PLAN, by making use of an equality WHERE condition. For other join types, depends on the join order and facts about the joined collections. Uses memory: ✅ Uses memory proportional to the input sizes, unless the inputs have appropriate indexes. Certain joins with more than 2 inputs use additional memory, see details in the optimized plan.	`InnerJoin (#0 = #2)`
Reduce	Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs. In the case where the group key is empty and the input is empty, returns a single row with the aggregate functions applied to the empty input collection. Can increase data size: No Uses memory: ✅ `SUM`, `COUNT`, and most other aggregations use a moderate amount of memory (proportional either to twice the output size or to input size + output size). `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see `mz_introspection.mz_expected_group_size_advice`.	`Reduce group_by=[#0] aggregates=[max((#0 * #1))]`
Distinct	Removes duplicate copies of input rows. Can increase data size: No Uses memory: ✅ Uses memory proportional to twice the output size.	`Distinct`
TopK	Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top `offset` rows in each group, and returns the next `limit` rows. Can increase data size: No Uses memory: ✅ Can use significant amount as the operator can significantly overestimate the group sizes. Consult `mz_introspection.mz_expected_group_size_advice`.	`TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5`
Negate	Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Can increase data size: No Uses memory: No	`Negate`
Threshold	Removes any rows with negative counts. Can increase data size: No Uses memory: ✅ Uses memory proportional to the input and output size, twice.	`Threshold`
Union	Sums the counts of each row of all inputs. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.) Can increase data size: No Uses memory: ✅ Moderate use of memory. Some union operators force consolidation, which results in a memory spike, largely at hydration time.	`Union`
With ... Return ...	Introduces CTEs, i.e., makes it possible for sub-plans to be consumed multiple times by downstream operators. Can increase data size: No Uses memory: No	See above

Can increase data size: Specifies whether the operator can increase the data size (can be the number of rows or the number of columns).
Uses memory: Specifies whether the operator use memory to maintain state for its inputs.

Operators are sometimes marked as Fused .... We write this to mean that the operator is fused with its input. That is, if you see a Fused X operator above a Y operator:

→Fused X
  →Y

Then the X and Y operators will be combined into a single, more efficient operator.

Examples

For the following examples, let’s assume that you have the auction house load generator created in your current environment.

Explaining a `SELECT` query

Let’s start with a simple join query that lists the total amounts bid per buyer.

Explain the optimized plan as text:

EXPLAIN
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;

Same as above, but a bit more verbose:

EXPLAIN PLAN
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;

Same as above, but even more verbose:

EXPLAIN OPTIMIZED PLAN AS TEXT FOR
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;

Same as above, but every sub-plan is annotated with its schema types:

EXPLAIN WITH(types) FOR
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;

Explain the physical plan as text:

EXPLAIN PHYSICAL PLAN FOR
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;

Explaining an index on a view

Let’s create a view with an index for the above query.

-- create the view
CREATE VIEW my_view AS
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
-- create an index on the view
CREATE INDEX my_view_idx ON my_view(id);

You can inspect the plan of the dataflow that will maintain your index with the following statements.

Explain the optimized plan as text:

EXPLAIN
INDEX my_view_idx;

Same as above, but a bit more verbose:

EXPLAIN PLAN FOR
INDEX my_view_idx;

Same as above, but even more verbose:

EXPLAIN OPTIMIZED PLAN AS TEXT FOR
INDEX my_view_idx;

Same as above, but every sub-plan is annotated with its schema types:

EXPLAIN WITH(types) FOR
INDEX my_view_idx;

Explain the physical plan as text:

EXPLAIN PHYSICAL PLAN FOR
INDEX my_view_idx;

Explaining a materialized view

Let’s create a materialized view for the above SELECT query.

CREATE MATERIALIZED VIEW my_mat_view AS
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;

You can inspect the plan of the dataflow that will maintain your view with the following statements.

Explain the optimized plan as text:

EXPLAIN
MATERIALIZED VIEW my_mat_view;

Same as above, but a bit more verbose:

EXPLAIN PLAN FOR
MATERIALIZED VIEW my_mat_view;

Same as above, but even more verbose:

EXPLAIN OPTIMIZED PLAN AS TEXT FOR
MATERIALIZED VIEW my_mat_view;

Same as above, but every sub-plan is annotated with its schema types:

EXPLAIN WITH(types)
MATERIALIZED VIEW my_mat_view;

Explain the physical plan as text:

EXPLAIN PHYSICAL PLAN FOR
MATERIALIZED VIEW my_mat_view;

Debugging running dataflows

The EXPLAIN ANALYZE statement will let you debug memory and cpu usage (optionally with information about worker skew) for existing indexes and materialized views in terms of their physical plan operators. It can also attribute TopK hints to individual operators.

Privileges

The privileges required to execute this statement are:

USAGE privileges on the schemas that all relations in the explainee are contained in.

EXPLAIN PLAN

Syntax

Explained object

Output format

Explained stage

Output modifiers

Details

Query compilation pipeline

From SQL to raw plan

From raw plan to decorrelated plan

From decorrelated plan to optimized plan

From optimized plan to physical plan

From physical plan to dataflow

Fast path queries

Reading decorrelated and optimized plans

Reference: Plan operators

Examples

Explaining a SELECT query

Explaining an index on a view

Explaining a materialized view

Debugging running dataflows

Privileges

Explaining a `SELECT` query