We’ve made some big changes to how Materialize EXPLAIN
s itself.
New EXPLAIN
format
First, we’ve redesigned the output of EXPLAIN
to help users focus on
the most important parts of their queries when optimizing
performance. Materialize power users should find the new format more
concise and easier to understand. In the name of making things
accessible to non-power users, we’ve adopted syntactic conventions
that look more like Postgres’s output:
CREATE TABLE t (a int, b int);
CREATE INDEX t_a_idx ON t(a);
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE a IS NOT NULL;
EXPLAIN SELECT t1.a, t1.b FROM t t1, t t2 WHERE t1.a = t2.b;
Yields the following plan:
Explained Query:
→Differential Join %0 » %1
Join stage 0 in %1 with lookup key #0{b}
→Arranged materialize.public.t
→Arrange
Keys: 1 arrangement available, plus raw stream
Arrangement 0: #0{b}
→Fused Map/Filter/Project
Filter: (#1{b}) IS NOT NULL
→Arranged materialize.public.t
Key: (#0{a})
Used Indexes:
- materialize.public.t_a_idx (*** full scan ***, differential join)
The *** full scan ***
indicates that we’ll need to fully scan the
t_a_idx
index. Looking at the plan, the Arrange
on the second
input of the join creates a new, local arrangement corresponding to
t.b
. If we add an index on b
, we’ll get a much better plan:
CREATE INDEX t_b_idx ON t(b);
EXPLAIN SELECT t1.a, t1.b FROM t t1, t t2 WHERE t1.a = t2.b;
Yields the following, much better plan, where both uses of t
will be
Arranged
, i.e., we won’t be creating any local in-memory
arrangements---we’re able to re-use existing indices.
Explained Query:
→Differential Join %0 » %1
Join stage 0 in %1 with lookup key #1{b}
project=(#0, #1)
filter=((#0{a}) IS NOT NULL)
→Arranged materialize.public.t
→Arranged materialize.public.t
Used Indexes:
- materialize.public.t_a_idx (differential join)
- materialize.public.t_b_idx (differential join)
Concretely, the new format uses physical plans (which gives a more direct accounting of where memory is consumed by arrangements) rather than the higher-level plans we used to report (which are our optimizer’s primary internal representation).
A quick reminder: the EXPLAIN
syntax isn’t a stable interface, and
we’ll be improving on the output here over time. We’d be happy to
hear your feedback!
EXPLAIN ANALYZE
your indexes and materialized views
We’ve already taken advantage of Materialize’s “introspection
sources”
to build tools for understanding long-running dataflows, like indexes
and materialized views, with a
blogpost
documenting the possibility. Now, we’ve made a top-level SQL
statement that lets you take advantage of these queries without having
to write a lick of SQL yourself: EXPLAIN ANALYZE
.
Under the hood, EXPLAIN ANALYZE
generates SQL that reports on your
long-running dataflows. For example, you can EXPLAIN ANALYZE MEMORY FOR ...
an index or materialized view to see total memory consumption
and the number of records, like so:
operator | total_memory | total_records |
---|---|---|
Arrange | 386 kB | 15409 |
Stream u8 | ||
Non-monotonic TopK | 36 MB | 731975 |
Differential Join %0 » %1 | ||
Arrange | 2010 kB | 84622 |
Stream u5 | ||
Arrange | 591 kB | 15410 |
Read u4 |
The new statement supports explaining MEMORY
and CPU
; it can also
break things down by worker (WITH SKEW
), allowing you to see if your
data are skewed. Skew happens when work is unevenly distributed
across
workers,
leading to higher latency as some workers having to deal with much
more than their fair share of data while other workers twiddle their
thumbs. Finally, you can also EXPLAIN ANALYZE HINTS FOR ...
to see
hints on sizing TopK
forms:
operator | levels | to_cut | hint | savings |
---|---|---|---|---|
Arrange | ||||
Stream u8 | ||||
Non-monotonic TopK | 8 | 6 | 255 | 26 MB |
Differential Join %0 » %1 | ||||
Arrange | ||||
Stream u5 | ||||
Arrange | ||||
Read u4 |
Since EXPLAIN ANALYZE
is just running SQL queries under the hood,
you can EXPLAIN ANALYZE ... AS SQL
to see the query that it
runs---which you can then tune to your liking. Read more about
EXPLAIN ANALYZE
in the
docs!