
Diagnosing Using SQL

Diagnostic queries that can help you understand apparent performance defects.

Sections are organized thematically, grouping queries by the kind of problem you might be experiencing.

The queries may be useful for spot debugging, but it may also be smart to create them as views and TAIL them, or to create them as materialized views so they can be read more efficiently. (Be aware that any of these approaches can itself impact the system you are measuring.)
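
For example, here is a minimal sketch of that pattern, using the materialization frontiers query from the next section (the view name is illustrative):

-- Maintain a diagnostic query as a materialized view so that reads are cheap.
create materialized view materialization_frontiers as
select * from mz_materialization_frontiers;

-- Read the current contents on demand, or stream changes as they happen.
select * from materialization_frontiers;
tail materialization_frontiers;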

Are my sources loading data in a reasonable fashion?

Count the number of records accepted in a materialized source or view. Note that this makes less sense for a non-materialized source or view: invoking it creates a new dataflow and runs it until it has caught up with its sources, so the elapsed time may be informative, but it tells you something other than how quickly a collection is populated.

-- Report the number of records available from the materialization.
select count(*) from my_materialized_source_or_view;

The mz_materialization_frontiers logging source indicates the upper frontier of each materialization. It reports timestamp-based progress: not the volume of data, but how closely the contents track the source timestamps.

-- For each materialization, the next timestamp to be added.
select * from mz_materialization_frontiers;
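
The raw output identifies each materialization by its internal global id. If you prefer human-readable names, one option is to join against mz_catalog_names; this is a sketch that assumes both relations expose a global_id column, with mz_catalog_names supplying the name:

-- Pair each materialization frontier with the name of its catalog item.
select mcn.name, mmf.*
from mz_materialization_frontiers as mmf,
     mz_catalog_names as mcn
where mmf.global_id = mcn.global_id;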

It seems like things aren’t getting done as fast as I would like!

Materialize spends time in various dataflow operators maintaining materialized views. The amount of time may be more than one expects, either because Materialize is behaving badly or because expectations aren’t aligned with reality. These queries reveal which operators take the largest total amount of time.

-- Extract raw elapsed time information, by worker
select mdo.id, mdo.name, mdo.worker, mse.elapsed_ns
from mz_scheduling_elapsed as mse,
     mz_dataflow_operators as mdo
where
    mse.id = mdo.id and
    mse.worker = mdo.worker
order by elapsed_ns desc;
-- Extract raw elapsed time information, summed across workers
select mdo.id, mdo.name, sum(mse.elapsed_ns) as elapsed_ns
from mz_scheduling_elapsed as mse,
     mz_dataflow_operators as mdo
where
    mse.id = mdo.id and
    mse.worker = mdo.worker
group by mdo.id, mdo.name
order by elapsed_ns desc;
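
Raw nanosecond totals can be hard to read at a glance. A small variation of the summed query above reports seconds instead and keeps only the top offenders (the 20-row cutoff is arbitrary):

-- Top 20 operators by total elapsed time, reported in seconds.
select mdo.id, mdo.name, sum(mse.elapsed_ns) / 1000000000.0 as elapsed_s
from mz_scheduling_elapsed as mse,
     mz_dataflow_operators as mdo
where
    mse.id = mdo.id and
    mse.worker = mdo.worker
group by mdo.id, mdo.name
order by elapsed_s desc
limit 20;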

Materialize becomes unresponsive for seconds at a time!

What causes Materialize to take control away for seconds at a time? Materialize operators get scheduled and try to behave themselves by returning control promptly, but for various reasons that doesn’t always happen. These queries reveal how many times each operator was scheduled for each power-of-two elapsed time: high durations indicate an event that took roughly that amount of time before it yielded, and incriminate the subject.

-- Extract raw scheduling histogram information, by worker.
select mdo.id, mdo.name, mdo.worker, msh.duration_ns, msh.count
from mz_scheduling_histogram as msh,
     mz_dataflow_operators as mdo
where
    msh.id = mdo.id and
    msh.worker = mdo.worker
order by msh.duration_ns desc;
-- Extract raw scheduling histogram information, summed across workers.
select mdo.id, mdo.name, msh.duration_ns, sum(msh.count) as count
from mz_scheduling_histogram as msh,
     mz_dataflow_operators as mdo
where
    msh.id = mdo.id and
    msh.worker = mdo.worker
group by mdo.id, mdo.name, msh.duration_ns
order by msh.duration_ns desc;
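
If you only care about long pauses, the same query can be restricted to the larger histogram buckets; the 100ms threshold here is purely illustrative:

-- Count only schedulings that took roughly 100ms or longer before yielding.
select mdo.id, mdo.name, msh.duration_ns, sum(msh.count) as count
from mz_scheduling_histogram as msh,
     mz_dataflow_operators as mdo
where
    msh.id = mdo.id and
    msh.worker = mdo.worker and
    msh.duration_ns >= 100000000
group by mdo.id, mdo.name, msh.duration_ns
order by msh.duration_ns desc;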

Materialize is using lots of memory! What gives?

The majority of Materialize’s memory lives in “arrangements”, which are differential dataflow structures that maintain indexes for data as they change. These queries extract the number of records and batches backing each arrangement. The reported record counts may exceed the number of logical records, as they reflect un-compacted state. The number of batches should be roughly logarithmic in the record count, and anything significantly larger is probably a bug.

-- Extract arrangement records and batches, by worker.
select mdo.id, mdo.name, mdo.worker, mas.records, mas.batches
from mz_arrangement_sizes as mas,
     mz_dataflow_operators as mdo
where
    mas.operator = mdo.id and
    mas.worker = mdo.worker
order by mas.records desc;
-- Extract arrangement records and batches, summed across workers.
select mdo.id, mdo.name, sum(mas.records) as records, sum(mas.batches) as batches
from mz_arrangement_sizes as mas,
     mz_dataflow_operators as mdo
where
    mas.operator = mdo.id and
    mas.worker = mdo.worker
group by mdo.id, mdo.name
order by sum(mas.records) desc;
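
A single overall figure can also be useful when comparing against the process’s actual memory use; this simply sums the per-arrangement counts from the same source:

-- Total records and batches held across all arrangements, on all workers.
select sum(records) as total_records, sum(batches) as total_batches
from mz_arrangement_sizes;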

We’ve also bundled a memory usage visualization tool to aid in debugging. The SQL queries above show all arrangements in Materialize (including system arrangements), whereas the memory visualization tool shows only user-created arrangements, grouped by dataflow. The amount of memory used by Materialize should correlate with the number of arrangement records displayed by either the visual interface or the SQL queries.

How can I check whether work is distributed equally across workers?

Work is distributed across workers by the hash of their keys. Thus, work can become skewed if Materialize needs to use arrangements with very few or no distinct keys, for example in joins on columns with very few distinct values.

Additionally, the operators that implement data sources may demonstrate skew, as they (currently) have a granularity determined by the source itself. For example, Kafka topic ingestion work can become skewed if most of the data is in only one out of multiple partitions.

-- Average the total time spent by each operator across all workers.
create view avg_elapsed_by_id as
select
    id,
    avg(elapsed_ns) as avg_ns
from
    mz_scheduling_elapsed
group by
    id;

-- Get operators where one worker has spent more than 2 times the average
-- amount of time spent. The number 2 can be changed according to the threshold
-- for the amount of skew deemed problematic.
select
    mse.id,
    dod.name,
    mse.worker,
    elapsed_ns,
    avg_ns,
    elapsed_ns/avg_ns as ratio
from
    mz_scheduling_elapsed mse,
    avg_elapsed_by_id aebi,
    mz_dataflow_operator_dataflows dod
where
    mse.id = aebi.id and
    mse.elapsed_ns > 2 * aebi.avg_ns and
    mse.id = dod.id
order by ratio desc;

I found a problematic operator! How do I find where it came from?

Look up the operator in mz_dataflow_operator_addresses. If an operator has value x in slot n, then it is part of subregion x of the region defined by slots 0..n-1. The example SQL query and result below show an operator with id 515 that belongs to “subregion 5 of region 1 of dataflow 21”.

select * from mz_dataflow_operator_addresses where id=515 and worker=0;
 id  | worker | slot | value
-----+--------+------+-------
 515 |      0 |    0 |    21
 515 |      0 |    1 |     1
 515 |      0 |    2 |     5
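
The same table can also be read in the other direction: filtering on slot 0 lists every operator whose address places it inside a given dataflow. For the example above:

-- All operators (on worker 0) that belong to dataflow 21.
select id from mz_dataflow_operator_addresses
where worker = 0 and slot = 0 and value = 21;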

Usually, it is only important to know the name of the dataflow a problematic operator comes from. Once the name is known, the dataflow can be correlated to an index or view in Materialize.

Each dataflow has an operator representing the entire dataflow. The address of said operator has only a single slot. For the example operator 515 above, you can find the name of the dataflow if you can find the name of the operator whose address is just “dataflow 21.”

-- get id and name of the operator representing the entirety of the dataflow
-- that a problematic operator comes from
SELECT
    mdo.id as id,
    mdo.name as name
FROM
    mz_dataflow_operator_addresses mdoa,
    -- source of operator names
    mz_dataflow_operators mdo,
    -- view containing operators representing entire dataflows
    (SELECT
      mdoa.id as dataflow_operator,
      sum(mdoa.value) as dataflow_value
    FROM
      mz_dataflow_operator_addresses mdoa
    WHERE
      mdoa.worker = 0
    GROUP BY
      mdoa.id,
      mdoa.worker
    HAVING
      count(*) = 1) dataflows
WHERE
    mdoa.slot = 0
    AND mdoa.worker = 0
    AND mdoa.id = <problematic_operator_id>
    AND mdoa.value = dataflows.dataflow_value
    AND mdo.id = dataflows.dataflow_operator
    AND mdo.worker = 0;
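
As an alternative to the join above, the mz_dataflow_operator_dataflows view used in the skew section records the operator-to-dataflow association directly; this sketch assumes it exposes dataflow and dataflow_name columns alongside the operator id and name:

-- Look up the parent dataflow of an operator directly
-- (dataflow and dataflow_name columns assumed, as noted above).
SELECT dod.id, dod.name, dod.dataflow, dod.dataflow_name
FROM mz_dataflow_operator_dataflows dod
WHERE dod.id = <problematic_operator_id> AND dod.worker = 0;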