mz_introspection
The following sections describe the available objects in the mz_introspection
schema.
The objects in the mz_introspection schema are not part of Materialize’s stable interface.
Backwards-incompatible changes to these objects may be made at any time.
SELECT
statements may reference these objects, but creating views that
reference these objects is not allowed.
Introspection relations are maintained by independently collecting internal logging information within each of the replicas of a cluster.
Thus, in a multi-replica cluster, queries to these relations need to be directed to a specific replica by issuing the command SET cluster_replica = <replica_name>.
Note that once this command is issued, all subsequent SELECT queries, for introspection relations or not, will be directed to the targeted replica.
Replica targeting can be cancelled by issuing the command RESET cluster_replica.
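As a minimal sketch of replica targeting, the session below pins queries to one replica, reads an introspection relation, and then releases the pin. The replica name r1 is a placeholder, not a name taken from this page; substitute a replica of the active cluster.

```sql
-- Direct all subsequent SELECT queries to one replica (r1 is a placeholder name).
SET cluster_replica = r1;

-- Reports the peeks pending on the targeted replica only.
SELECT * FROM mz_introspection.mz_active_peeks;

-- Cancel replica targeting so later queries are no longer pinned.
RESET cluster_replica;
```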
For each of the introspection relations below, there is also a variant with a _per_worker name suffix.
Per-worker relations expose the same data as their global counterparts, but have an extra worker_id column that splits the information by Timely Dataflow worker.
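For illustration, a per-worker relation can be aggregated by worker_id to see how the data is spread across workers. The relation name below is an assumption derived from the suffix rule applied to mz_arrangement_sizes.

```sql
-- Total arrangement size held by each Timely Dataflow worker, largest first.
-- mz_arrangement_sizes_per_worker is assumed to follow the _per_worker naming rule.
SELECT worker_id, sum(size) AS total_size
FROM mz_introspection.mz_arrangement_sizes_per_worker
GROUP BY worker_id
ORDER BY total_size DESC;
```

A markedly uneven distribution across workers may point to skew in the underlying data.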
mz_active_peeks
The mz_active_peeks
view describes all read queries (“peeks”) that are pending in the dataflow layer.
Field | Type | Meaning |
---|---|---|
id | uuid | The ID of the peek request. |
object_id | text | The ID of the collection the peek is targeting. Corresponds to mz_catalog.mz_indexes.id, mz_catalog.mz_materialized_views.id, mz_catalog.mz_sources.id, or mz_catalog.mz_tables.id. |
type | text | The type of the corresponding peek: index if targeting an index or temporary dataflow; persist for a source, materialized view, or table. |
time | mz_timestamp | The timestamp the peek has requested. |
mz_arrangement_sharing
The mz_arrangement_sharing
view describes how many times each arrangement in the system is used.
Field | Type | Meaning |
---|---|---|
operator_id | uint8 | The ID of the operator that created the arrangement. Corresponds to mz_dataflow_operators.id. |
count | bigint | The number of operators that share the arrangement. |
mz_arrangement_sizes
The mz_arrangement_sizes
view describes the size of each arrangement in the system.
The size, capacity, and allocations are approximations, which may underestimate the actual size in memory. Specifically, reductions can use more memory than is shown here.
Field | Type | Meaning |
---|---|---|
operator_id | uint8 | The ID of the operator that created the arrangement. Corresponds to mz_dataflow_operators.id. |
records | numeric | The number of records in the arrangement. |
batches | numeric | The number of batches in the arrangement. |
size | numeric | The utilized size in bytes of the arrangement. |
capacity | numeric | The capacity in bytes of the arrangement. Can be larger than the size. |
allocations | numeric | The number of separate memory allocations backing the arrangement. |
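Because operator_id corresponds to mz_dataflow_operators.id, the two relations can be joined to put names on the largest arrangements. The query below is a sketch using only the columns documented on this page; the limit of 10 is arbitrary.

```sql
-- The ten largest arrangements, labelled with the operator that created them.
SELECT o.id, o.name, s.records, s.size, s.capacity
FROM mz_introspection.mz_arrangement_sizes AS s
JOIN mz_introspection.mz_dataflow_operators AS o ON o.id = s.operator_id
ORDER BY s.size DESC
LIMIT 10;
```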
mz_compute_error_counts
The mz_compute_error_counts
view describes the counts of errors in objects exported by dataflows in the system.
Dataflow exports that don’t have any errors are not included in this view.
Field | Type | Meaning |
---|---|---|
export_id | text | The ID of the dataflow export. Corresponds to mz_compute_exports.export_id. |
count | numeric | The count of errors present in this dataflow export. |
mz_compute_exports
The mz_compute_exports
view describes the objects exported by dataflows in the system.
Field | Type | Meaning |
---|---|---|
export_id | text | The ID of the index, materialized view, or subscription exported by the dataflow. Corresponds to mz_catalog.mz_indexes.id, mz_catalog.mz_materialized_views.id, or mz_internal.mz_subscriptions. |
dataflow_id | uint8 | The ID of the dataflow. Corresponds to mz_dataflows.id. |
mz_compute_frontiers
The mz_compute_frontiers
view describes the frontier of each dataflow export in the system.
The frontier describes the earliest timestamp at which the output of the dataflow may change; data prior to that timestamp is sealed.
Field | Type | Meaning |
---|---|---|
export_id | text | The ID of the dataflow export. Corresponds to mz_compute_exports.export_id. |
time | mz_timestamp | The next timestamp at which the dataflow output may change. |
mz_compute_import_frontiers
The mz_compute_import_frontiers
view describes the frontiers of each dataflow import in the system.
The frontier describes the earliest timestamp at which the input into the dataflow may change; data prior to that timestamp is sealed.
Field | Type | Meaning |
---|---|---|
export_id | text | The ID of the dataflow export. Corresponds to mz_compute_exports.export_id. |
import_id | text | The ID of the dataflow import. Corresponds to mz_catalog.mz_sources.id, mz_catalog.mz_tables.id, or mz_compute_exports.export_id. |
time | mz_timestamp | The next timestamp at which the dataflow input may change. |
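One way to use the import and export frontiers together is to list them side by side for each export. This is a sketch under the assumption that both relations are joined on export_id, as their column descriptions suggest. The time column is quoted only as a precaution, since time is also a type name.

```sql
-- Input and output frontiers of each dataflow export, side by side.
SELECT
    i.export_id,
    i.import_id,
    i."time" AS import_frontier,
    e."time" AS export_frontier
FROM mz_introspection.mz_compute_import_frontiers AS i
JOIN mz_introspection.mz_compute_frontiers AS e ON e.export_id = i.export_id
ORDER BY i.export_id, i.import_id;
```

An export frontier that trails its import frontiers may indicate that the dataflow has input it has not yet processed.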
mz_compute_operator_durations_histogram
The mz_compute_operator_durations_histogram
view describes a histogram of the duration in nanoseconds of each invocation for each dataflow operator.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the operator. Corresponds to mz_dataflow_operators.id. |
duration_ns | uint8 | The upper bound of the duration bucket in nanoseconds. |
count | numeric | The (noncumulative) count of invocations in the bucket. |
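As an illustration of reading the histogram, the query below lists the buckets with the longest durations along with the operator names. It is a sketch built from the documented columns; the limit of 20 is arbitrary.

```sql
-- Buckets with the longest invocation durations, with operator names.
SELECT o.name, h.duration_ns, h.count
FROM mz_introspection.mz_compute_operator_durations_histogram AS h
JOIN mz_introspection.mz_dataflow_operators AS o ON o.id = h.id
ORDER BY h.duration_ns DESC, h.count DESC
LIMIT 20;
```

Since the counts are noncumulative, the total number of invocations of an operator is the sum of count over all of its buckets.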
mz_dataflows
The mz_dataflows
view describes the dataflows in the system.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the dataflow. |
name | text | The internal name of the dataflow. |
mz_dataflow_addresses
The mz_dataflow_addresses
view describes how the dataflow channels and operators in the system are nested into scopes.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the channel or operator. Corresponds to mz_dataflow_channels.id or mz_dataflow_operators.id. |
address | bigint list | A list of scope-local indexes indicating the path from the root to this channel or operator. |
mz_dataflow_arrangement_sizes
The mz_dataflow_arrangement_sizes
view describes the size of the arrangements across all operators in each dataflow.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the dataflow. Corresponds to mz_dataflows.id. |
name | text | The name of the dataflow. |
records | numeric | The number of records in all arrangements in the dataflow. |
batches | numeric | The number of batches in all arrangements in the dataflow. |
size | numeric | The utilized size in bytes of the arrangements. |
capacity | numeric | The capacity in bytes of the arrangements. Can be larger than the size. |
allocations | numeric | The number of separate memory allocations backing the arrangements. |
mz_dataflow_channels
The mz_dataflow_channels
view describes the communication channels between dataflow operators.
A communication channel connects one of the outputs of a source operator to one of the inputs of a target operator.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the channel. |
from_index | uint8 | The scope-local index of the source operator. Corresponds to mz_dataflow_addresses.address. |
from_port | uint8 | The source operator’s output port. |
to_index | uint8 | The scope-local index of the target operator. Corresponds to mz_dataflow_addresses.address. |
to_port | uint8 | The target operator’s input port. |
mz_dataflow_channel_operators
The mz_dataflow_channel_operators
view associates dataflow channels with the operators that are their endpoints.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the channel. Corresponds to mz_dataflow_channels.id. |
from_operator_id | uint8 | The ID of the source of the channel. Corresponds to mz_dataflow_operators.id. |
from_operator_address | uint8 list | The address of the source of the channel. Corresponds to mz_dataflow_addresses.address. |
to_operator_id | uint8 | The ID of the target of the channel. Corresponds to mz_dataflow_operators.id. |
to_operator_address | uint8 list | The address of the target of the channel. Corresponds to mz_dataflow_addresses.address. |
mz_dataflow_global_ids
The mz_dataflow_global_ids
view associates dataflow ids with global ids (ids of the form u8 or t5).
Field | Type | Meaning |
---|---|---|
id | uint8 | The dataflow ID. |
global_id | text | A global ID associated with that dataflow. |
mz_dataflow_operators
The mz_dataflow_operators
view describes the dataflow operators in the system.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the operator. |
name | text | The internal name of the operator. |
mz_dataflow_operator_dataflows
The mz_dataflow_operator_dataflows
view describes the dataflow to which each operator belongs.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the operator. Corresponds to mz_dataflow_operators.id. |
name | text | The internal name of the operator. |
dataflow_id | uint8 | The ID of the dataflow hosting the operator. Corresponds to mz_dataflows.id. |
dataflow_name | text | The internal name of the dataflow hosting the operator. |
mz_dataflow_operator_parents
The mz_dataflow_operator_parents
view describes how dataflow operators are nested into scopes, by relating operators to their parent operators.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the operator. Corresponds to mz_dataflow_operators.id. |
parent_id | uint8 | The ID of the operator’s parent operator. Corresponds to mz_dataflow_operators.id. |
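Since both id and parent_id correspond to mz_dataflow_operators.id, the nesting can be made readable by joining the operators relation twice. The following is a sketch; the aliases child and parent are chosen here for illustration.

```sql
-- Each nested operator alongside the name of its parent operator.
SELECT
    p.id,
    child.name AS operator_name,
    p.parent_id,
    parent.name AS parent_name
FROM mz_introspection.mz_dataflow_operator_parents AS p
JOIN mz_introspection.mz_dataflow_operators AS child ON child.id = p.id
JOIN mz_introspection.mz_dataflow_operators AS parent ON parent.id = p.parent_id
ORDER BY p.parent_id, p.id;
```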
mz_dataflow_shutdown_durations_histogram
The mz_dataflow_shutdown_durations_histogram
view describes a histogram of the time in nanoseconds required to fully shut down dropped dataflows.
Field | Type | Meaning |
---|---|---|
duration_ns | uint8 | The upper bound of the bucket in nanoseconds. |
count | numeric | The (noncumulative) count of dataflows in this bucket. |
mz_expected_group_size_advice
The mz_expected_group_size_advice
view provides advice on opportunities to set query hints.
Query hints are applicable to dataflows maintaining MIN, MAX, or Top K query patterns.
The maintenance of these query patterns is implemented inside an operator scope, called a region, through a hierarchical scheme for either aggregation or Top K computations.
Field | Type | Meaning |
---|---|---|
dataflow_id | uint8 | The ID of the dataflow. Corresponds to mz_dataflows.id. |
dataflow_name | text | The internal name of the dataflow hosting the min/max aggregation or Top K. |
region_id | uint8 | The ID of the root operator scope. Corresponds to mz_dataflow_operators.id. |
region_name | text | The internal name of the root operator scope for the min/max aggregation or Top K. |
levels | bigint | The number of levels in the hierarchical scheme implemented by the region. |
to_cut | bigint | The number of levels that can be eliminated (cut) from the region’s hierarchy. |
savings | numeric | A conservative estimate of the amount of memory in bytes to be saved by applying the hint. |
hint | double precision | The hint value that will eliminate to_cut levels from the region’s hierarchy. |
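A simple way to review the advice is to rank regions by their estimated savings. The query below is a sketch that uses only the columns listed in the table above.

```sql
-- Regions where a query hint is expected to save the most memory, largest first.
SELECT dataflow_name, region_name, levels, to_cut, savings, hint
FROM mz_introspection.mz_expected_group_size_advice
ORDER BY savings DESC;
```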
mz_lir_mapping
The mz_lir_mapping
view describes the low-level internal representation (LIR) plan that corresponds to global ids.
LIR is a higher-level representation than dataflows; this view is used for profiling and debugging indices and materialized views.
Note that LIR is not a stable interface and may change at any time.
In particular, you should not attempt to parse operator descriptions.
LIR nodes are implemented by zero or more dataflow operators with sequential ids.
We use the range [operator_id_start, operator_id_end) to record this information.
If an LIR node was implemented without any dataflow operators, operator_id_start will be equal to operator_id_end.
Field | Type | Meaning |
---|---|---|
global_id | text | The global ID. |
lir_id | uint8 | The LIR node ID. |
operator | text | The LIR operator, in the format OperatorName INPUTS [OPTIONS]. |
parent_lir_id | uint8 | The parent of this LIR node. May be NULL. |
nesting | uint2 | The nesting level of this LIR node. |
operator_id_start | uint8 | The first dataflow operator ID implementing this LIR operator (inclusive). |
operator_id_end | uint8 | The first dataflow operator ID after this LIR operator (exclusive). |
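The half-open range can be applied directly as a join condition to find the dataflow operators that implement each LIR node. This is a sketch; the global ID 'u8' is a placeholder for the object being inspected.

```sql
-- Dataflow operators implementing each LIR node of one object ('u8' is a placeholder).
SELECT m.lir_id, m.operator, o.id AS operator_id, o.name AS operator_name
FROM mz_introspection.mz_lir_mapping AS m
JOIN mz_introspection.mz_dataflow_operators AS o
  ON o.id >= m.operator_id_start   -- start of the range is inclusive
 AND o.id <  m.operator_id_end     -- end of the range is exclusive
WHERE m.global_id = 'u8'
ORDER BY m.lir_id, o.id;
```

Because this is an inner join, LIR nodes whose range is empty (operator_id_start equal to operator_id_end) do not appear in the result.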
mz_message_counts
The mz_message_counts
view describes the messages and message batches sent and received over the dataflow channels in the system.
It distinguishes between individual records (sent, received) and batches of records (batch_sent, batch_received).
Field | Type | Meaning |
---|---|---|
channel_id | uint8 | The ID of the channel. Corresponds to mz_dataflow_channels.id. |
sent | numeric | The number of messages sent. |
received | numeric | The number of messages received. |
batch_sent | numeric | The number of batches sent. |
batch_received | numeric | The number of batches received. |
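To see which operators the busiest channels connect, the message counts can be joined with mz_dataflow_channel_operators on the channel ID. This is a sketch using the documented columns; the limit of 10 is arbitrary.

```sql
-- The ten channels that have sent the most messages, with their endpoint operators.
SELECT c.channel_id, co.from_operator_id, co.to_operator_id, c.sent, c.received
FROM mz_introspection.mz_message_counts AS c
JOIN mz_introspection.mz_dataflow_channel_operators AS co ON co.id = c.channel_id
ORDER BY c.sent DESC
LIMIT 10;
```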
mz_peek_durations_histogram
The mz_peek_durations_histogram
view describes a histogram of the duration in nanoseconds of read queries (“peeks”) in the dataflow layer.
Field | Type | Meaning |
---|---|---|
type | text | The peek variant: index or persist. |
duration_ns | uint8 | The upper bound of the bucket in nanoseconds. |
count | numeric | The (noncumulative) count of peeks in this bucket. |
mz_records_per_dataflow
The mz_records_per_dataflow
view describes the number of records in each dataflow.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the dataflow. Corresponds to mz_dataflows.id. |
name | text | The internal name of the dataflow. |
records | numeric | The number of records in the dataflow. |
batches | numeric | The number of batches in the dataflow. |
size | numeric | The utilized size in bytes of the arrangements. |
capacity | numeric | The capacity in bytes of the arrangements. Can be larger than the size. |
allocations | numeric | The number of separate memory allocations backing the arrangements. |
mz_records_per_dataflow_operator
The mz_records_per_dataflow_operator
view describes the number of records in each dataflow operator in the system.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the operator. Corresponds to mz_dataflow_operators.id. |
name | text | The internal name of the operator. |
dataflow_id | uint8 | The ID of the dataflow. Corresponds to mz_dataflows.id. |
records | numeric | The number of records in the operator. |
batches | numeric | The number of batches in the operator. |
size | numeric | The utilized size in bytes of the arrangement. |
capacity | numeric | The capacity in bytes of the arrangement. Can be larger than the size. |
allocations | numeric | The number of separate memory allocations backing the arrangement. |
mz_scheduling_elapsed
The mz_scheduling_elapsed
view describes the total amount of time spent in each dataflow operator.
Field | Type | Meaning |
---|---|---|
id | uint8 | The ID of the operator. Corresponds to mz_dataflow_operators.id. |
elapsed_ns | numeric | The total elapsed time spent in the operator in nanoseconds. |
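For example, the operators that have consumed the most time can be listed together with the dataflow they belong to. The query is a sketch that joins on the operator ID as described in the tables on this page; the limit of 10 is arbitrary.

```sql
-- The ten operators with the most elapsed time, with their hosting dataflow.
SELECT od.dataflow_name, od.name AS operator_name, e.elapsed_ns
FROM mz_introspection.mz_scheduling_elapsed AS e
JOIN mz_introspection.mz_dataflow_operator_dataflows AS od ON od.id = e.id
ORDER BY e.elapsed_ns DESC
LIMIT 10;
```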
mz_scheduling_parks_histogram
The mz_scheduling_parks_histogram
view describes a histogram of dataflow worker park events. A park event occurs when a worker has no outstanding work.
Field | Type | Meaning |
---|---|---|
slept_for_ns | uint8 | The actual length of the park event in nanoseconds. |
requested_ns | uint8 | The requested length of the park event in nanoseconds. |
count | numeric | The (noncumulative) count of park events in this bucket. |