EXPLAIN
EXPLAIN
displays the plan used for a SELECT
statement or a view.
EXPLAIN
is not part of Materialize’s stable interface and is not subject to
our backwards compatibility guarantee. The
syntax and output of EXPLAIN
may change arbitrarily in future versions of
Materialize.
Conceptual framework
To execute SELECT
statements, Materialize generates a plan consisting of
operators that interface with our underlying Differential dataflow engine.
EXPLAIN
lets you see the plan for a given query, which can provide insight
into Materialize’s behavior for specific queries, e.g. performance.
Syntax
Field | Use |
---|---|
TYPED | Annotate the plan with column types and unique keys |
RAW | Display the raw plan |
DECORRELATED | Display the decorrelated plan |
OPTIMIZED | (Default) Display the optimized plan |
VIEW | Display the plan for an existing view |
Changed in v0.4.0:
Accept EXPLAIN <statement>
as shorthand for EXPLAIN PLAN FOR <statement>
.
Details
The job of the Materialize planner is to turn SQL code into differential dataflow. We get there via a series of progressively lower-level plans:
SQL -> raw plan -> decorrelated plan -> optimized plan -> dataflow
From SQL to raw plan
In this stage, the planner:
- Replaces SQL variable names with column numbers
- Infers the type of each expression
- Choose the correct overload for each function
From raw plan to decorrelated plan
In this stage, the planner:
- Replaces subqueries and lateral joins with non-nested operations
- Replaces
OUTER
joins with lower-level operations - Replaces aggregate default values with lower-level operations
From decorrelated plan to optimized plan
In this stage, the planner:
- Coalesces joins
- Chooses join order and implementation
- Fuses adjacent operations
- Removes redundant operations
- Evaluates any operations on constants
From optimized plan to dataflow
In this stage, the planner:
- Maps plan operators to differential dataflow operators
- Locates existing arrangements which can be reused
- Installs the new dataflow into the running system
For more information on using EXPLAIN
, see the details below.
Reading decorrelated/optimized plans
Materialize plans are directed acyclic graphs of operators. Each operator in the graph receives inputs from zero or more other operators and produces a single output.
Most operators have only one input, so to make the displayed plan easier to read we group chains of single-input operators together into a single block of text in the display. Here is a single chain of operators:
%0 =
| Get materialize.public.customer (u15)
| ArrangeBy (#0)
Multiple chains are separated by blank lines.
%0 =
| Get materialize.public.customer (u15)
| ArrangeBy (#0)
%1 =
| Get materialize.public.orders (u18)
| ArrangeBy (#0) (#1)
Each chain is assigned a number eg %1
. Operators which have multiple inputs
refer to these numbers to specify their inputs. In the next example the Join
operator use the outputs of chains %0
, %1
and %2
as its inputs.
%0 =
| Get materialize.public.customer (u15)
| ArrangeBy (#0)
%1 =
| Get materialize.public.orders (u18)
| ArrangeBy (#0) (#1)
%2 =
| Get materialize.public.lineitem (u21)
| ArrangeBy (#0)
%3 =
| Join %0 %1 %2 (= #8 #17) (= #0 #9)
| Filter (#6 = "BUILDING"), (#12 < 1995-03-15), (#27 > 1995-03-15)
| Reduce group=(#8, #12, #15)
| | agg sum((#22 * (100dec - #23)))
| Project (#0, #3, #1, #2)
Many operators need to refer to columns in their input. These are displayed like
#3
for column number 3. (Columns are numbered starting from column 0).
If you are explaining view creation for a view or one-off query that refers to unmaterialized sources, you may see Filter
or Project
operators attached to the unmaterialized source. This indicates that those operators will run before the upsert operator, and the upsert operator will not maintain state corresponding to data removed by the filter and project operators.
Here’s an example of a materialized query pulling from unmaterialized sources and the EXPLAIN PLAN
command:
CREATE MATERIALIZED VIEW mv as SELECT a, c FROM data EXCEPT ALL SELECT a, c FROM data2 where d is null;
EXPLAIN PLAN FOR VIEW mv;
First, Materialize reads from two unmaterialized sources, u1
and u3
. You can see the Filter
and Project
operators applied during this process.
Source materialize.public.data (u1):
| Project (#0, #2)
Source materialize.public.data2 (u3):
| Filter isnull(#3)
| Project (#0, #2, #3)
Next, Materialize runs the query on the ingested sources. Because the FILTER
and PROJECT
operators have already been applied during the ingestion, they don’t need be run again.
Query:
%0 =
| Get materialize.public.data (u1)
| Project (#0, #2)
%1 =
| Get materialize.public.data2 (u3)
| Filter isnull(#3)
| Project (#0, #2)
| Negate
%2 =
| Union %0 %1
| Threshold
Each operator can also be annotated with additional metadata. The most common
example is the choice of implementation in the Join
operator.
%3 =
| Join %0 %1 %2 (= #8 #17) (= #0 #9)
| | implementation = DeltaQuery
| | delta %0 %1.(#1) %2.(#0)
| | delta %1 %0.(#0) %2.(#0)
| | delta %2 %1.(#0) %0.(#0)
| Filter (#6 = "BUILDING"), (#12 < 1995-03-15), (#27 > 1995-03-15)
| Reduce group=(#8, #12, #15) sum((#22 * (100dec - #23)))
| Project (#0, #3, #1, #2)
Finally, a plan can optionally have a finishing action which can sort, limit and project the data. This is executed outside of the dataflow engine, allowing many simple queries to just hit an existing index instead of installing a temporary dataflow.
Finish order_by=(#1 desc, #2 asc) limit=none offset=0 project=(#0..=#3)
Operators in decorrelated and optimized plans
Operator | Meaning | Example |
---|---|---|
Constant | Always produces the same collection of rows | Constant (1) |
Get | Produces rows from either an existing source/view or from a previous operator in the same plan | Get materialize.public.ordered (u2) |
Project | Produces a subset of the columns in the input rows | Project (#2, #3) |
Map | Appends the results of some scalar expressions to each row in the input | Map (((#1 * 10000000dec) / #2) * 1000dec) |
FlatMap | Appends the result of some table function to each row in the input | FlatMap jsonb_foreach(#3) |
Filter | Remove rows of the input for which some scalar predicates return false | Filter (#20 < #21) |
Join | Returns combinations of rows from each input whenever some scalar predicates are true | Join %1 %4 (= #0 #9) |
Reduce | Groups the input rows by some scalar expressions, reduces each groups using some aggregate functions and produce rows containing the group key and aggregate outputs | Reduce group=(#5) countall(null) |
TopK | Groups the inputs rows by some scalar expressions, sorts each group using the group key, removes the top offset rows in each group and returns the next limit rows |
TopK group=() order=(#1 asc, #0 desc) limit=5 offset=0 |
Negate | Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. | Negate |
Threshold | Removes any rows with negative counts. | Threshold |
Union | Sums the rows counts of both inputs | Union %2 %3 |
ArrangeBy | Indicates a point that will become an arrangement in the dataflow engine | ArrangeBy (#0) (#3) |
Let | Marks a branch of computation whose result is used later by other branches with %0 . |
%0 = Let 10 = |
Reading raw plans
Raw plans are similar to decorrelated/optimized plans, but may also contain subqueries. For example:
EXPLAIN RAW PLAN FOR SELECT (SELECT 1)
0 =
| Constant ()
| Map select(%1)
| |
| | 1 =
| | | Constant ()
| | | Map 1
| | | Project (#0)
| | | Map
| | | Project (#0)
| |
| Project (#0)
| Map
The select(%1)
indicates that the subquery block %1 below is run for each row
in the input. The outputs of each run are unioned together and then passed to
the select
operator.
Inside a subquery, scalar expressions can refer to columns of the outside rows
by adding a ^
to the column number eg #^1
.
Operators in raw plans
Operator | Meaning | Example |
---|---|---|
Constant | Always produces the same collection of rows | Constant (1) |
Get | Produces rows from either an existing source/view or from a previous operator in the same plan | Get materialize.public.ordered (u2) |
Project | Produces a subset of the columns in the input rows | Project (#2, #3) |
Map | Appends the results of some scalar expressions to each row in the input | Map (((#1 * 10000000dec) / #2) * 1000dec) |
CallTable | Appends the result of some table function to each row in the input | CallTable generate_series (#3) |
Filter | Remove rows of the input for which some scalar predicates return false | Filter (#20 < #21) |
Join | Perform one of INNER / LEFT / RIGHT / FULL OUTER on the two inputs, using the given predicate | InnerJoin %0 %1 on (#1 = #2) |
Reduce | Groups the input rows by some scalar expressions, reduces each group using some aggregate functions and produce rows containing the group key and aggregate outputs. In the case where the group key is empty and the input is empty, returns a single row with the aggregate functions applied to empty rows. | Reduce group=(#5) agg countall(null) |
Distinct | Remove duplicate copies of input rows | Distinct |
TopK | Groups the inputs rows by some scalar expressions, sorts each group using the group key, removes the top offset rows in each group and returns the next limit rows |
TopK group=() order=(#1 asc, #0 desc) limit=5 offset=0 |
Negate | Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input. | Negate |
Threshold | Removes any rows with negative counts. | Threshold |
Union | Sums the rows counts of both inputs | Union %2 %3 |