EXPLAIN PLAN
EXPLAIN PLAN
displays the plans used for SELECT
statements, indexes, and materialized views.
EXPLAIN
is not part of Materialize’s stable interface and is not subject to
our backwards compatibility guarantee. The syntax and output of EXPLAIN
may
change arbitrarily in future versions of Materialize.
Syntax
Note that the FOR
keyword is required if the PLAN
keyword is present. In other words, the following three statements are equivalent:
EXPLAIN <explainee>;
EXPLAIN PLAN FOR <explainee>;
EXPLAIN OPTIMIZED PLAN FOR <explainee>;
Explained object
The following three objects can be explained.
Explained object | Description |
---|---|
select_stmt | Display a plan for an ad-hoc SELECT statement. |
create_view | Display a plan for a CREATE VIEW statement. |
create_index | Display a plan for a CREATE INDEX statement. |
create_materialized_view | Display a plan for a CREATE MATERIALIZED VIEW statement. |
VIEW name | Display the RAW or LOCALLY OPTIMIZED plan for an existing view. |
INDEX name | Display the OPTIMIZED or PHYSICAL plan for an existing index. |
MATERIALIZED VIEW name | Display the OPTIMIZED or PHYSICAL plan for an existing materialized view. |
Output format
You can select between JSON
and TEXT
for the output format of EXPLAIN PLAN
. Non-text
output is more machine-readable and can be parsed by common graph visualization libraries,
while formatted text is more human-readable.
Output type | Description |
---|---|
TEXT | Format the explanation output as UTF-8 text. |
JSON | Format the explanation output as a JSON object. |
Explained stage
This stage determines the query optimization stage at which the plan snapshot will be taken.
Plan Stage | Description |
---|---|
RAW PLAN | Display the raw plan. |
DECORRELATED PLAN | Display the decorrelated plan. |
LOCALLY OPTIMIZED | Display the locally optimized plan (before view inlining and access path selection). This is the final stage for regular CREATE VIEW optimization. |
OPTIMIZED PLAN | (Default) Display the optimized plan. |
PHYSICAL PLAN | Display the physical plan. |
Output modifiers
Output modifiers act as boolean toggles and can be combined in order to slightly tweak the information and rendering style of the generated explanation output.
Modifier | Description |
---|---|
arity | Annotate each subplan with its number of produced columns. This is useful due to the use of offset-based column names. |
cardinality | Annotate each subplan with a symbolic estimate of its cardinality. |
join implementations | Render details about the implementation strategy of optimized MIR Join nodes. |
keys | Annotate each subplan with its unique keys. |
node identifiers | Annotate each subplan in a PHYSICAL PLAN with its node ID. |
redacted | Anonymize literals in the output. |
timing | Annotate the output with the optimization time. |
types | Annotate each subplan with its inferred type. |
humanized expressions | Render EXPLAIN AS TEXT output with human-readable column references in operator expressions. Warning: SQL-level aliasing is not considered when inferring column names, so the plan output might become ambiguous if you use this modifier. |
filter pushdown | Private preview For each source, include a pushdown field that explains which filters can be pushed down. |
Note that most modifiers are currently only supported for the AS TEXT
output.
Details
To execute SELECT
statements, Materialize generates a plan consisting of
operators that interface with our underlying Differential dataflow engine.
EXPLAIN PLAN
lets you see the plan for a given query, which can provide insight
into Materialize’s behavior for specific queries, e.g. performance.
Query compilation pipeline
The job of the Materialize planner is to turn SQL code into a differential dataflow program. We get there via a series of progressively lower-level plans:
SQL ⇒ raw plan ⇒ decorrelated plan ⇒ optimized plan ⇒ physical plan ⇒ dataflow
From SQL to raw plan
In this stage, the planner:
- Replaces SQL variable names with column numbers.
- Infers the type of each expression.
- Chooses the correct overload for each function.
From raw plan to decorrelated plan
In this stage, the planner:
- Replaces subqueries and lateral joins with non-nested operations.
- Replaces
OUTER
joins with lower-level operations. - Replaces global aggregate default values with lower-level operations.
From decorrelated plan to optimized plan
In this stage, the planner performs various optimizing rewrites:
- Coalesces joins.
- Chooses join order and implementation.
- Fuses adjacent operations.
- Removes redundant operations.
- Pushes down predicates.
- Evaluates any operations on constants.
From optimized plan to physical plan
In this stage, the planner:
- Maps plan operators to differential dataflow operators.
- Locates existing arrangements which can be reused.
From physical plan to dataflow
In the final stage, the planner:
- Renders an actual dataflow from the physical plan, and
- Installs the new dataflow into the running system.
No smart logic runs as part of the rendering step, as the physical plan is meant to be a definitive and complete description of the rendered dataflow.
Fast path queries
Queries are sometimes implemented using a fast path.
In this mode, the program that implements the query will just hit an existing index,
transform the results, and optionally apply a finishing action. For fast path queries,
all of these actions happen outside of the regular dataflow engine. The fast path is
indicated by an “Explained Query (fast path):” heading before the explained query in the
EXPLAIN OPTIMIZED PLAN
and EXPLAIN PHYSICAL PLAN
result.
Explained Query (fast path):
Finish order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 output=[#0, #1]
ReadExistingIndex materialize.public.t_a_idx
Reading decorrelated and optimized plans
Materialize plans are directed, potentially cyclic, graphs of operators. Each operator in the graph
receives inputs from zero or more other operators and produces a single output.
Sub-graphs where each output is consumed only once are rendered as tree-shaped fragments.
Sub-graphs consumed more than once are represented as common table expressions (CTEs).
In the example below, the CTE l0
represents a linear sub-plan (a chain of Get
,
Filter
, and Project
operators) which is used in both inputs of a self-join.
With
cte l0 =
Project (#0, #1)
Filter (#0 > #2)
ReadStorage materialize.public.t
Return
Join on=(#1 = #2)
Get l0
Get l0
Many operators need to refer to columns in their input. These are displayed like
#3
for column number 3. (Columns are numbered starting from column 0). To get a better sense of
columns assigned to Map
operators, it might be useful to request the arity
output modifier.
Each operator can also be annotated with additional metadata. Details are shown
by default in the EXPLAIN PHYSICAL PLAN
output, but are hidden elsewhere. In EXPLAIN OPTIMIZED PLAN
, details about the implementation in the Join
operator can be requested
with the join implementations
output modifier (that is,
EXPLAIN OPTIMIZED PLAN WITH (join implementations) FOR ...
).
Join on=(#1 = #2 AND #3 = #4) type=delta
implementation
%0:t » %1:u[#0]K » %2:v[#0]K
%1:u » %0:t[#1]K » %2:v[#0]K
%2:v » %1:u[#1]K » %0:t[#1]K
ArrangeBy keys=[[#1]]
ReadStorage materialize.public.t
ArrangeBy keys=[[#0], [#1]]
ReadStorage materialize.public.u
ArrangeBy keys=[[#0]]
ReadStorage materialize.public.v
The %0
, %1
, etc. refer to each of the join inputs.
A differential join shows one join path, which is simply a sequence of binary
joins (each of whose results need to be maintained as state).
A delta join
shows a join path for each of the inputs.
The expressions in
a bracket show the key for joining with that input. The letters after the brackets
indicate the input characteristics used for join ordering. U
means unique, the
number of K
s means the key length, A
means already arranged (e.g., an index
exists). The small letters refer to filter characteristics:
equality to a literal,
like,
is null,
inequality to a literal,
any filter.
A plan can optionally end with a finishing action which can sort, limit and project the result data. This operator is special, as it can only occur at the top of the plan. Finishing actions are executed outside the parallel dataflow that implements the rest of the plan.
Finish order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 output=[#0, #1]
CrossJoin
ReadStorage materialize.public.r
ReadStorage materialize.public.s
Below the plan, a “Used indexes” section indicates which indexes will be used by the query, and in what way.
Reference: Plan operators
The following table lists the operators that are available in the optimized plan.
- For those operators that require memory to maintain intermediate state, Uses memory is marked with Yes.
- For those operators that expand the data size (either rows or columns), Can increase data size is marked with Yes.
Operator | Description | Example |
---|---|---|
Constant |
Always produces the same collection of rows. Can increase data size: No Uses memory: No |
|
Get |
Produces rows from either an existing source/view or from a previous
operator in the same plan. Can increase data size: No Uses memory: No |
Get materialize.public.ordered |
Project |
Produces a subset of the columns in the input
rows. See also column numbering. Can increase data size: No Uses memory: No |
Project (#2, #3) |
Map |
Appends the results of some scalar expressions to each row in the input. Can increase data size: Each row has more data (i.e., longer rows but same number of rows). Uses memory: No |
Map (((#1 * 10000000dec) / #2) * 1000dec) |
FlatMap |
Appends the result of some (one-to-many) table function to each row in the input. Can increase data size: Depends on the table function used. Uses memory: No |
FlatMap jsonb_foreach(#3) |
Filter |
Removes rows of the input for which some scalar predicates return false .Can increase data size: No Uses memory: No |
Filter (#20 < #21) |
Join |
Returns combinations of rows from each input whenever some equality predicates are true .Can increase data size: Depends on the join order and facts about the joined collections. Uses memory: ✅ Uses memory for 3-way or more differential joins. |
Join on=(#1 = #2) type=delta |
CrossJoin |
An alias for a Join with an empty predicate (emits all combinations).Can increase data size: Cartesian product of the inputs (|N| x |M|). Uses memory: ✅ Uses memory for 3-way or more differential joins. |
CrossJoin type=differential |
Reduce |
Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs. Can increase data size: No Uses memory: ✅ Can use significant amount as the operator can significantly overestimate the size. For MIN and MAX aggregates, consult
mz_introspection.mz_expected_group_size_advice .
|
Reduce group_by=[#0] aggregates=[max((#0 * #1))] |
Distinct |
Alias for a Reduce with an empty aggregate list.Can increase data size: No Uses memory: ✅ Uses memory proportional to the number of input updates, twice. |
Distinct |
TopK |
Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top offset rows in each group, and returns the next limit rows.Can increase data size: No Uses memory: ✅ Can use significant amount as the operator can significantly overestimate the size. Consult mz_introspection.mz_expected_group_size_advice .
|
TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 |
Negate |
Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Can increase data size: No Uses memory: No |
Negate |
Threshold |
Removes any rows with negative counts. Can increase data size: No Uses memory: ✅ Uses memory proportional to the number of input updates, twice. |
Threshold |
Union |
Sums the counts of each row of all inputs. Can increase data size: No Uses memory: ✅ Moderate use of memory. Union stages force consolidation, which results in a memory spike, largely at hydration time. |
Union |
ArrangeBy |
Indicates a point that will become an arrangement in the dataflow engine (each keys element will be a different arrangement). Note that if the output of the previous operator is already arranged with a key that is also requested here, then this operator will just pass on that existing arrangement instead of creating a new one.Can increase data size: No Uses memory: ✅ Depends. When it does, uses memory proportional to the number of input updates. |
ArrangeBy keys=[[#0]] |
Return ... With ... |
Binds sub-plans consumed multiple times by downstream operators. Can increase data size: No Uses memory: No |
See above |
The following table lists the operators that are available in the raw plan.
- For those operators that require memory to maintain intermediate state, Uses memory is marked with Yes.
- For those operators that expand the data size (either rows or columns), Can increase data size is marked with Yes.
Operator | Description | Example |
---|---|---|
Constant |
Always produces the same collection of rows. Can increase data size: No Uses memory: No |
|
Get |
Produces rows from either an existing source/view or from a previous
operator in the same plan. Can increase data size: No Uses memory: No |
Get materialize.public.ordered |
Project |
Produces a subset of the columns in the input
rows. See also column numbering. Can increase data size: No Uses memory: No |
Project (#2, #3) |
Map |
Appends the results of some scalar expressions to each row in the input. Can increase data size: Each row has more data (i.e., longer rows but same number of rows). Uses memory: No |
Map (((#1 * 10000000dec) / #2) * 1000dec) |
CallTable |
Appends the result of some (one-to-many) table function to each row in the input. Can increase data size: Depends on the table function used. Uses memory: No |
CallTable generate_series(1, 7, 1) |
Filter |
Removes rows of the input for which some scalar predicates return false .Can increase data size: No Uses memory: No |
Filter (#20 < #21) |
~Join |
Performs one of INNER / LEFT / RIGHT / FULL OUTER / CROSS join on the two inputs, using the given predicate.Can increase data size: For CROSSJOINS , Cartesian product of the inputs (|N| x |M|).
For other join types, depends on the join order and facts about the joined collections.
Uses memory: ✅ Uses memory for 3-way or more differential joins. |
Join on=(#1 = #2) type=delta |
Reduce |
Groups the input rows by some scalar expressions, reduces each group using
some aggregate functions, and produces rows containing the group key and
aggregate outputs. In the case where the group key is empty and the input
is empty, returns a single row with the aggregate functions applied to the
empty input collection. Can increase data size: No Uses memory: ✅ Can use significant amount as the operator can significantly overestimate the size. For MIN and MAX aggregates, consult
mz_introspection.mz_expected_group_size_advice .
|
Reduce group_by=[#0] aggregates=[max((#0 * #1))] |
Distinct |
Removes duplicate copies of input rows. Can increase data size: No Uses memory: ✅ Uses memory proportional to the number of input updates, twice. |
Distinct |
TopK |
Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top offset rows in each group, and returns the next limit rows.Can increase data size: No Uses memory: ✅ Can use significant amount as the operator can significantly overestimate the size. Consult mz_introspection.mz_expected_group_size_advice .
|
TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5 |
Negate |
Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. Can increase data size: No Uses memory: No |
Negate |
Threshold |
Removes any rows with negative counts. Can increase data size: No Uses memory: ✅ Uses memory proportional to the number of input updates, twice. |
Threshold |
Union |
Sums the counts of each row of all inputs. Can increase data size: No Uses memory: ✅ Moderate use of memory. Union stages force consolidation, which results in a memory spike, largely at hydration time. |
Union |
Return ... With ... |
Binds sub-plans consumed multiple times by downstream operators. Can increase data size: No Uses memory: No |
See above |
Examples
For the following examples, let’s assume that you have the auction house load generator created in your current environment.
Explaining a SELECT
query
Let’s start with a simple join query that lists the total amounts bid per buyer.
Explain the optimized plan as text:
EXPLAIN
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
Same as above, but a bit more verbose:
EXPLAIN PLAN
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
Same as above, but even more verbose:
EXPLAIN OPTIMIZED PLAN AS TEXT FOR
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
Same as above, but every sub-plan is annotated with its schema types:
EXPLAIN WITH(types) FOR
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
Explain the physical plan as text:
EXPLAIN PHYSICAL PLAN FOR
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
Explaining an index on a view
Let’s create a view with an index for the above query.
-- create the view
CREATE VIEW my_view AS
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
-- create an index on the view
CREATE INDEX my_view_idx ON my_view(id);
You can inspect the plan of the dataflow that will maintain your index with the following statements.
Explain the optimized plan as text:
EXPLAIN
INDEX my_view_idx;
Same as above, but a bit more verbose:
EXPLAIN PLAN FOR
INDEX my_view_idx;
Same as above, but even more verbose:
EXPLAIN OPTIMIZED PLAN AS TEXT FOR
INDEX my_view_idx;
Same as above, but every sub-plan is annotated with its schema types:
EXPLAIN WITH(types) FOR
INDEX my_view_idx;
Explain the physical plan as text:
EXPLAIN PHYSICAL PLAN FOR
INDEX my_view_idx;
Explaining a materialized view
Let’s create a materialized view for the above SELECT
query.
CREATE MATERIALIZED VIEW my_mat_view AS
SELECT a.id, sum(b.amount) FROM accounts a JOIN bids b ON(a.id = b.buyer) GROUP BY a.id;
You can inspect the plan of the dataflow that will maintain your view with the following statements.
Explain the optimized plan as text:
EXPLAIN
MATERIALIZED VIEW my_mat_view;
Same as above, but a bit more verbose:
EXPLAIN PLAN FOR
MATERIALIZED VIEW my_mat_view;
Same as above, but even more verbose:
EXPLAIN OPTIMIZED PLAN AS TEXT FOR
MATERIALIZED VIEW my_mat_view;
Same as above, but every sub-plan is annotated with its schema types:
EXPLAIN WITH(types)
MATERIALIZED VIEW my_mat_view;
Explain the physical plan as text:
EXPLAIN PHYSICAL PLAN FOR
MATERIALIZED VIEW my_mat_view;
Privileges
The privileges required to execute this statement are:
USAGE
privileges on the schemas that all relations in the explainee are contained in.