System catalog
New in v0.5.0.
Materialize exposes a system catalog that contains metadata about the running Materialize instance.
Details
The system catalog consists of two schemas that are implicitly available in all databases:
-
mz_catalog
, which exposes metadata in Materialize’s native format. -
pg_catalog
, which presents the data inmz_catalog
in the format used by PostgreSQL.
These schemas contain sources, tables, and views that expose metadata like:
-
Descriptions of each database, schema, source, table, view, sink, and index in the system.
-
Descriptions of all running dataflows.
-
Metrics about dataflow execution.
Whenever possible, applications should prefer to query mz_catalog
over
pg_catalog
. The mapping between Materialize concepts and PostgreSQL concepts
is not one-to-one, and so the data in pg_catalog
cannot accurately represent
the particulars of Materialize. For example, PostgreSQL has no notion of
sinks, and therefore pg_catalog
does not display
information about the sinks available in a Materialize.
mz_catalog
The following sections describe the available objects in the mz_catalog
schema.
mz_array_types
The mz_array_types
table contains a row for each array type in the system.
Field | Type | Meaning |
---|---|---|
type_id |
text |
The ID of the array type. |
element_id |
text |
The ID of the array’s element type. |
mz_arrangement_sharing
The mz_arrangement_sharing
source describes how many times each arrangement
in the system is used.
Field | Type | Meaning |
---|---|---|
operator |
bigint |
The ID of the operator that created the arrangement. Corresponds to mz_dataflow_operators.id . |
worker |
bigint |
The ID of the worker thread hosting the arrangement. |
count |
bigint |
The number of operators that share the arrangement. |
mz_arrangement_sizes
The mz_arrangement_sizes
source describes the size of each arrangement in
the system.
Field | Type | Meaning |
---|---|---|
operator |
bigint |
The ID of the operator that created the arrangement. Corresponds to mz_dataflow_operators.id . |
worker |
bigint |
The worker hosting the arrangement. |
records |
bigint |
The number of records in the arrangement. |
batches |
bigint |
The number of batches in the arrangement. |
mz_avro_ocf_sinks
The mz_avro_ocf_sinks
table contains a row for each Avro OCF sink in the
system.
Field | Type | Meaning |
---|---|---|
sink_id |
text |
The ID of the sink. |
path |
bytea |
The path to the Avro OCF file into which the sink is writing. |
mz_base_types
The mz_base_types
table contains a row for each base type in the system.
Field | Type | Meaning |
---|---|---|
type_id |
text |
The ID of the type. |
mz_columns
The mz_columns
contains a row for each column in each table, source, and view
in the system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The unique ID of the table, source, or view containing the column. |
name |
text |
The name of the column. |
position |
bigint |
The 1-indexed position of the column in its containing table, source, or view. |
nullable |
boolean |
Can the column contain a NULL value? |
type |
text |
The data type of the column. |
default |
text |
The default expression of the column. |
type_oid |
oid |
The OID of the type of the column (references mz_types ). |
mz_databases
The mz_databases
table contains a row for each database in the system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
Materialize’s unique ID for the database. |
oid |
oid |
A PostgreSQL-compatible OID for the database. |
name |
text |
The name of the database. |
mz_dataflow_channels
The mz_dataflow_channels
source describes the communication channels between
dataflow operators. A communication channel connects one of the outputs of a
source operator to one of the inputs of a target operator.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the channel. |
worker |
bigint |
The ID of the worker thread hosting the channel. |
source_node |
bigint |
The ID of the source operator. Corresponds to mz_dataflow_operators.id . |
source_port |
bigint |
The source operator’s output port. |
target_node |
bigint |
The ID of the target operator. Corresponds to mz_dataflow_operators.id . |
target_port |
bigint |
The target operator’s input port. |
mz_dataflow_names
The mz_dataflow_names
view describes the dataflows in the system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the dataflow. |
worker |
bigint |
The ID of the worker thread hosting the dataflow. |
local_id |
bigint |
The scope-local index of the dataflow. |
name |
text |
The internal name of the dataflow. |
mz_dataflow_operator_addresses
The mz_dataflow_operator_addresses
source describes how the dataflow channels
and operators in the system are nested into scopes.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the channel or operator. Corresponds to mz_dataflow_channels.id or mz_dataflow_operators.id . |
worker |
bigint |
The ID of the worker thread hosting the channel or operator. |
address |
bigint list |
A list of scope-local indexes indicating the path from the root to this channel or operator. |
mz_dataflow_operator_dataflows
The mz_dataflow_operator_dataflows
view describes the dataflow to which each
dataflow operator belongs.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the operator. |
name |
text |
The internal name of the operator. |
worker |
bigint |
The ID of the worker thread hosting the operator. |
dataflow_id |
bigint |
The ID of the dataflow hosting the operator. |
dataflow_name |
text |
The name of the dataflow hosting the operator. |
mz_dataflow_operators
The mz_dataflow_operators
source describes the dataflow operators in the
system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the operator. |
worker |
bigint |
The ID of the worker thread hosting the operator. |
name |
text |
The name of the operator. |
mz_functions
The mz_functions
table contains a row for each function in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the function. |
oid |
oid |
A PostgreSQL-compatible OID for the function. |
schema_id |
bigint |
The ID of the schema to which the function belongs. |
name |
text |
The name of the function. |
arg_ids |
text array |
The function’s arguments' types. Elements refers to mz_types.id . |
variadic_id |
text |
The variadic array parameter’s elements, or NULL if the function does not have a variadic parameter. Refers to mz_types.id . |
ret_id |
text |
The returned value’s type, or NULL if the function does not return a value. Refers to mz_types.id . Note that for table functions with > 1 column, this type corresponds to [record ]. |
ret_set |
[bool ] |
Whether the returned value is a set, i.e. the function is a table function. |
mz_indexes
The mz_indexes
table contains a row for each index in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the index. |
oid |
oid |
A PostgreSQL-compatible OID for the index. |
name |
text |
The name of the index. |
on_id |
text |
The ID of the relation on which the index is built. |
volatility |
text |
Whether the index is volatile. Either volatile , nonvolatile , or unknown . |
enabled |
[bool ] |
Whether or not the index represents an arrangement. false only in the case of Disabling user indexes. |
mz_index_columns
The mz_index_columns
table contains a row for each column in each index in the
system. For example, an index on (a, b + 1)
would have two rows in this table,
one for each of the two columns in the index.
For a given row, if field_number
is null then expression
will be nonnull, or
vice-versa.
Field | Type | Meaning |
---|---|---|
index_id |
text |
The ID of the index which contains this column. |
index_position |
bigint |
The 1-indexed position of this column within the index. (The order of columns in an index does not necessarily match the order of columns in the relation on which the index is built.) |
on_position |
bigint |
If not NULL , specifies the 1-indexed position of a column in the relation on which this index is built that determines the value of this index column. |
on_expression |
text |
If not NULL , specifies a SQL expression that is evaluated to compute the value of this index column. The expression may contain references to any of the columns of the relation. |
nullable |
boolean |
Can this column of the index evaluate to NULL ? |
mz_kafka_source_statistics
The mz_kafka_source_statistics
table contains statistics for Kafka sources
from the underlying librdkafka library.
Note that the contents of this table may evolve without warning when Materialize updates the version of librdkafka in use.
Field | Type | Meaning |
---|---|---|
source_id |
text |
The ID of the source. Corresponds to mz_source_info.source_id . |
worker_id |
bigint |
The ID of the worker thread hosting the dataflow. |
statistics |
jsonb |
A JSON object containing the statistics. See STATISTICS.md in the librdkafka documentation for details. |
mz_kafka_sinks
The mz_kafka_sinks
table contains a row for each Kafka sink in the system.
Field | Type | Meaning |
---|---|---|
sink_id |
text |
The ID of the sink. |
topic |
text |
The name of the Kafka topic into which the sink is writing. |
consistency_topic |
text |
The name of the Kafka topic into which the sink is writing consistency information. This is NULL when the sink does not write consistency information. |
mz_list_types
The mz_list_types
table contains a row for each list type in the system.
Field | Type | Meaning |
---|---|---|
type_id |
text |
The ID of the list type. |
element_id |
text |
The IID of the list’s element type. |
mz_message_counts
The mz_message_counts
source describes the messages sent and received over the
dataflow channels in the system.
Field | Type | Meaning |
---|---|---|
channel |
bigint |
The ID of the channel. Corresponds to mz_dataflow_channels.id . |
source_worker |
bigint |
The ID of the worker thread sending the message. |
target_worker |
bigint |
The ID of the worker thread receiving the message. |
sent |
bigint |
The number of messages sent. |
received |
bigint |
The number of messages received. |
mz_materialization_dependencies
The mz_materialization_dependencies
source describes the sources that each
dataflow depends on.
Field | Type | Meaning |
---|---|---|
dataflow |
text |
The ID of the index that created the dataflow. Corresponds to mz_indexes.id . |
source |
text |
The ID of the source. Corresponds to mz_sources.id . |
worker |
bigint |
The ID of the worker thread hosting the dataflow. |
mz_materialization_frontiers
The mz_materialization_frontiers
view describes the frontier for each
dataflow in the system across all workers. The frontier describes the earliest
timestamp at which the output of the dataflow may change; data prior to that
timestamp is sealed.
For per-worker frontier information, see
mz_worker_materialization_frontiers
.
Field | Type | Meaning |
---|---|---|
global_id |
text |
The ID of the index that created the dataflow. Corresponds to mz_indexes.id . |
time |
bigint |
The next timestamp at which the materialization may change. |
mz_materializations
The mz_materializations
source describes the indexes in the system.
Field | Type | Meaning |
---|---|---|
name |
text |
The ID of the index. (name is a misnomer.) |
worker |
bigint |
The ID of the worker thread hosting the index. |
mz_map_types
The mz_map_types
table contains a row for each map type in the system.
Field | Type | Meaning |
---|---|---|
type_id |
text |
The ID of the map type. |
key_id |
text |
The ID of the map’s key type. |
value_id |
text |
The ID of the map’s value type. |
mz_metrics
The mz_metrics
table contains recordings for each Prometheus
metric in the system that is a counter or gauge.
The values are recorded every 30 seconds and retained for five minutes.
Field | Type | Meaning |
---|---|---|
metric |
text |
The name of the metric. |
time |
timestamp with time zone |
The time at which the metric was recorded. |
labels |
jsonb |
The metric’s labels and their associated values as a JSON object. |
value |
double precision |
The value of the counter or gauge. |
mz_metrics_meta
The mz_metrics_meta
table describes metadata about each Prometheus
metric in the system.
Field | Type | Meaning |
---|---|---|
metric |
text |
The name of the metric. |
type |
text |
The type of the metric (counter , gauge , or histogram ). |
help |
text |
The description of the metric. |
mz_metric_histograms
The mz_metric_histograms
table contains recordings for each Prometheus
metric in the system that is a histogram. The
values are recorded every 30 seconds and retained for five minutes. Each row of
the table represents one bucket of one histogram from one recording. Note that
Prometheus histograms are cumulative.
Field | Type | Meaning |
---|---|---|
metric |
text |
The name of the metric. |
time |
timestamp |
The time at which the metric was recor |
labels |
jsonb |
The metric’s labels and their associated values as a JSON object. |
bound |
double precision |
The upper bound of the bucket. |
count |
bigint |
The (cumulative) count of observations in the bucket. |
mz_objects
The mz_objects
view contains a row for each table, source, view, sink, and
index in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the object. |
oid |
oid |
A PostgreSQL-compatible OID for the object. |
schema_id |
bigint |
The ID of the schema to which the object belongs. |
name |
text |
The name of the object. |
type |
text |
The type of the object: either table , source , view , sink , or index . |
mz_peek_active
The mz_peek_active
source describes all read queries (“peeks”) that are
pending in the dataflow layer.
Field | Type | Meaning |
---|---|---|
id |
uuid |
The ID of the peek request. |
worker |
bigint |
The ID of the worker thread servicing the peek. |
index_id |
text |
The ID of the index the peek is targeting. |
time |
bigint |
The timestamp the peek has requested. |
mz_peek_durations
The mz_peek_durations
source describes a histogram of the duration of read
queries (“peeks”) in the dataflow layer.
Field | Type | Meaning |
---|---|---|
worker |
bigint |
The ID of the worker thread servicing the peek. |
duration_ns |
bigint |
The upper bound of the bucket in nanoseconds. |
count |
bigint |
The (noncumulative) count of peeks in this bucket. |
mz_perf_dependency_frontiers
The mz_perf_dependency_frontiers
view describes the lag between a dataflow’s
outputs and its sources. It contains a row for every dataflow in the system that
depends on at least one source.
Field | Type | Meaning |
---|---|---|
dataflow |
text |
The name of the dataflow hosting the source. |
source |
text |
The name of the source. |
lag_ms |
bigint |
The amount of lag between when a record is emitted by the source and when the output of the dataflow reflects that record. |
mz_pseudo_types
The mz_pseudo_types
table contains a row for each pseudo type in the system.
Field | Type | Meaning |
---|---|---|
type_id |
text |
The ID of the type. |
mz_records_per_dataflow
The mz_records_per_dataflow
view describes the number of records in each
dataflow on each worker in the system.
For the same information aggregated across all workers, see
mz_records_per_dataflow_global
.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the dataflow. Corresponds to mz_dataflow_names.id . |
name |
text |
The internal name of the dataflow. |
worker |
bigint |
The ID of the worker thread hosting the dataflow. |
records |
numeric |
The number of records in the dataflow. |
mz_records_per_dataflow_global
The mz_records_per_dataflow_global
view describes the number of records in each
dataflow in the system.
For the same information broken down across workers, see
mz_records_per_dataflow
.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the dataflow. Corresponds to mz_dataflow_names.id . |
name |
text |
The internal name of the dataflow. |
records |
numeric |
The number of records in the dataflow. |
mz_records_per_dataflow_operator
The mz_records_per_dataflow_operator
view describes the number of records in
each dataflow operator in the system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the operator. Corresponds to mz_dataflow_operators.id . |
name |
text |
The internal name of the dataflow. |
worker |
bigint |
The ID of the worker thread hosting the dataflow. |
dataflow_id |
bigint |
The ID of the dataflow. Corresponds to mz_dataflow_names.id . |
records |
numeric |
The number of records in the dataflow. |
mz_relations
The mz_relations
view contains a row for each table, source, and view in the
system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the relation. |
oid |
oid |
A PostgreSQL-compatible OID for the relation. |
schema_id |
bigint |
The ID of the schema to which the relation belongs. |
name |
text |
The name of the relation. |
type |
text |
The type of the relation: either table , source , or view . |
mz_roles
The mz_roles
table contains a row for each role in the system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
Materialize’s unique ID for the role. |
oid |
oid |
A PostgreSQL-compatible OID for the role. |
name |
text |
The name of the role. |
mz_scheduling_elapsed
The mz_scheduling_elapsed
source describes the total amount of time spent in
each dataflow operator.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the operator. Corresponds to mz_dataflow_operators.id . |
worker |
bigint |
The ID of the worker thread hosting the operator. |
elapsed_ns |
bigint |
The total elapsed time spent in the operator in nanoseconds. |
mz_scheduling_histogram
The mz_scheduling_histogram
source stores a histogram describing the
duration of each invocation for each dataflow operator.
Field | Type | Meaning |
---|---|---|
id |
bigint |
The ID of the operator. Corresponds to mz_dataflow_operators.id . |
worker |
bigint |
The ID of the worker thread hosting the operator. |
duration_ns |
bigint |
The upper bound of the bucket. |
count |
bigint |
The number of recordings in the bucket. |
mz_scheduling_parks
The mz_scheduling_parks
source stores a histogram describing dataflow worker
park events. A park event occurs when a worker has no outstanding work.
Field | Type | Meaning |
---|---|---|
worker |
bigint |
The ID of the worker thread. |
slept_for |
bigint |
The actual length of the park event. |
requested |
bigint |
The requested length of the park event. |
count |
bigint |
The number of park events in this bucket. |
mz_schemas
The mz_schemas
table contains a row for each schema in the system.
Field | Type | Meaning |
---|---|---|
id |
bigint |
Materialize’s unique ID for the schema. |
oid |
oid |
A PostgreSQL-compatible oid for the schema. |
database_id |
bigint |
The ID of the database containing the schema. |
name |
text |
The name of the schema. |
mz_sinks
The mz_sinks
table contains a row for each sink in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the sink. |
oid |
oid |
A PostgreSQL-compatible OID for the sink. |
schema_id |
bigint |
The ID of the schema to which the sink belongs. |
name |
text |
The name of the sink. |
connector_type |
text |
The type of the sink: avro-ocf or kafka . |
volatility |
text |
Whether the sink is volatile. Either volatile , nonvolatile , or unknown . |
mz_source_info
The mz_source_info
table contains a row for each partition of each source
in the system.
Field | Type | Meaning |
---|---|---|
source_name |
text |
Materialize’s internal name for the source. |
source_id |
text |
Materialize’s unique ID for the source. Corresponds to mz_sources.id . |
dataflow_id |
bigint |
The ID of the dataflow responsible for processing this source. |
partition_id |
text |
The ID of the partition within the source. The concept of partition varies by source type. Not all sources types have multiple partitions, in which case there will be only one entry for partition ID 0 . |
offset |
bigint |
The highest offset processed by this source. |
timestamp |
bigint |
The largest mz_timestamp processed by this source. |
mz_sources
The mz_sources
table contains a row for each source in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the source. |
oid |
oid |
A PostgreSQL-compatible OID for the source. |
schema_id |
bigint |
The ID of the schema to which the source belongs. |
name |
text |
The name of the source. |
connector_type |
text |
The type of the source: avro-ocf , file , kafka , kinesis , s3 , postgres , or pubnub . |
volatility |
text |
Whether the source is volatile. Either volatile , nonvolatile , or unknown . |
mz_tables
The mz_tables
table contains a row for each table in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the table. |
oid |
oid |
A PostgreSQL-compatible OID for the table. |
schema_id |
bigint |
The ID of the schema to which the table belongs. |
name |
text |
The name of the table. |
persisted_name |
text |
The name of the table’s persisted materialization, or NULL if the table is not being persisted. |
mz_types
The mz_types
table contains a row for each type in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the type. |
oid |
oid |
A PostgreSQL-compatible OID for the type. |
schema_id |
bigint |
The ID of the schema to which the type belongs. |
name |
text |
The name of the type. |
mz_views
The mz_views
table contains a row for each view in the system.
Field | Type | Meaning |
---|---|---|
id |
text |
Materialize’s unique ID for the view. |
oid |
oid |
A PostgreSQL-compatible OID for the view. |
schema_id |
bigint |
The ID of the schema to which the view belongs. |
name |
text |
The name of the view. |
volatility |
text |
Whether the view is volatile. Either volatile , nonvolatile , or unknown . |
definition |
text |
The view definition (a SELECT query). |
mz_worker_materialization_frontiers
The mz_worker_materialization_frontiers
source describes each worker’s
frontier for each dataflow in the system. The frontier describes the earliest
timestamp at which the output of the dataflow may change; data prior to that
timestamp is sealed.
For frontier information aggregated across all workers, see
mz_materialization_frontiers
.
Field | Type | Meaning |
---|---|---|
global_id |
text |
The ID of the index that created the dataflow. Corresponds to mz_indexes.id . |
worker |
bigint |
The ID of the worker thread hosting the dataflow. |
time |
bigint |
The next timestamp at which the dataflow may change. |
pg_catalog
Materialize has compatibility shims for the following relations from PostgreSQL’s system catalog:
pg_am
pg_attribute
pg_class
pg_collation
pg_constraint
pg_database
pg_description
pg_enum
pg_index
pg_inherits
pg_namespace
pg_policy
pg_proc
pg_range
pg_roles
pg_settings
pg_tables
pg_type
pg_views
These compatibility shims are largely incomplete. Most are lacking some columns
that are present in PostgreSQL, or if they do include the column the result set
its value may always be NULL
. The precise nature of the incompleteness is
intentionally undocumented. New tools developed against Materialize should use
the documented mz_catalog
API instead.
If you are having trouble making a PostgreSQL tool work with Materialize, please
file a GitHub issue. Many PostgreSQL tools can be made to work with
Materialize with minor changes to the pg_catalog
compatibility shim.
information_schema
Materialize has compatibility shims for the following relations from the
SQL standard information_schema
schema, which is automatically available in all databases:
These compatibility shims are largely incomplete. Most are lacking some columns
that are present in the SQL standard, or if they do include the column the
result set its value may always be NULL
. The precise nature of the
incompleteness is intentionally undocumented. New tools developed against
Materialize should use the documented mz_catalog
API instead.