This page is a work in progress and will have more detail in the coming months. If you have specific questions, feel free to file a GitHub issue.
This page provides an overview of the tools available to diagnose a running materialized instance. You can use the same features that are used by these tools to integrate Materialize with other observability infrastructure.
Quick monitoring dashboard
The monitoring dashboard is provided on a best-effort basis. It relies on Materialize’s unstable Prometheus metrics and occasionally lags behind changes to these metrics.
For best results, use only the latest version of the dashboard with the latest version of Materialize.
Materialize provides a recommended Grafana dashboard and an all-inclusive Docker image
preconfigured to run it as materialize/dashboard.
The only configuration required to get started with the Docker image is the
MATERIALIZED_URL=<host>:<port> environment variable.
As an example, if you are running Materialize in a cloud instance at the IP address
172.16.0.0, you can get a dashboard by running this command and
opening http://localhost:3000 in your web browser:
# -p exposes the Grafana port; MATERIALIZED_URL points the dashboard at Materialize
$ docker run -d -p 3000:3000 -e MATERIALIZED_URL=172.16.0.0:6875 materialize/dashboard
To instead run the dashboard on the machine on which you are running Materialize, see the Observing local Materialize section below.
The dashboard Docker image bundles Prometheus and Grafana together to make it easy to get insight into Materialize’s performance. It offers little configuration and is not designed to handle large metric volumes or long uptimes: it starts truncating metrics history after about 1 GB of storage, which corresponds to about 3 days of data at the very fine-grained collection interval used inside the container.
The dashboard is provided as a convenience and should not be relied on for
production monitoring. If you would still like to persist metrics across restarts of the
container, you can mount a Docker volume onto /prometheus:
$ docker run -d \
    -v /tmp/prom-data:/prometheus -u "$(id -u):$(id -g)" \
    -p 3000:3000 -e MATERIALIZED_URL=172.16.0.0:6875 \
    materialize/dashboard
Observing local Materialize
Using the dashboard to observe a Materialize instance running on the same machine as the dashboard is complicated by Docker. The solution depends upon your host platform.
Inside Docker Compose or Kubernetes
Local schedulers like Docker Compose (which we use for our demos) or Kubernetes will typically expose running containers to each other using their service name as a public DNS hostname, but only within the network that they are running in.
The easiest way to use the dashboard inside a scheduler is to tell the scheduler to run it. Check out the example configuration for Docker Compose.
On macOS, with Materialize running outside of Docker
The problem here is that, on Docker for Mac, localhost inside a container
does not refer to the macOS network. Instead, you must use
host.docker.internal:
docker run -p 3000:3000 -e MATERIALIZED_URL=host.docker.internal:6875 materialize/dashboard
On Linux, with Materialize running outside of Docker
Docker containers use a different network than their host by default, but that is easy to
override by using the
--network host option. Using the host network means that ports will be
allocated from the host, so the
-p flag is no longer necessary:
docker run --network host -e MATERIALIZED_URL=localhost:6875 materialize/dashboard
Materialize supports a basic HTTP health check at
The health check returns HTTP status code 200 as long as Materialize has enough resources to respond to HTTP requests. It does not otherwise assess the state of the system.
To perform health checks that assess other metrics, consider using the Prometheus metrics endpoint.
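As a rough illustration of how an external monitor might consume the health check, the following Python sketch treats an HTTP 200 response as healthy and anything else (including a refused connection or a timeout) as unhealthy. The function name `is_healthy`, the `opener` parameter, and the timeout value are assumptions made for this example. The health check URL is passed in by the caller rather than hard-coded.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if a GET request to `url` answers with HTTP 200.

    `url` is the full health check URL of your instance, supplied by the
    caller. `opener` exists only so the function can be exercised without
    a network connection; it is not part of any Materialize interface.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status raised as HTTPError.
        return False
```

Because the health check only confirms that Materialize can answer HTTP requests, a `True` result here says nothing about the state of sources or views.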
Memory usage visualization
Materialize exposes an interactive, web-based memory usage visualization at
http://<materialized host>:6875/memory to aid in diagnosing unexpected memory
consumption. The visualization can display a diagram of the operators in each
running dataflow overlaid with the number of rows stored by each operator.
Materialize exposes Prometheus metrics at the default
Materialize broadly publishes the following types of data there:
- Materialize-specific data with a mz_* prefix. For example, rate(mz_responses_sent_total[10s]) shows the per-second rate of responses sent, averaged over 10-second windows.
- Standard process metrics with a process_* prefix. For example,
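To see which of the two families a given instance publishes, you can group metric names by prefix. The sketch below is a minimal example, assuming the endpoint returns the standard Prometheus text exposition format. The function name `group_metrics_by_prefix`, the label in the sample, and all sample values are made up for illustration.

```python
from collections import defaultdict


def group_metrics_by_prefix(exposition_text):
    """Group metric names by the text before their first underscore."""
    groups = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        prefix = name.split("_", 1)[0]
        groups[prefix].add(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}


# Hypothetical sample; the label and the values are illustrative only.
sample = """\
# HELP mz_responses_sent_total Illustrative help text
# TYPE mz_responses_sent_total counter
mz_responses_sent_total{status="success"} 42
process_cpu_seconds_total 1.5
process_resident_memory_bytes 1048576
"""
print(group_metrics_by_prefix(sample))
```

In practice you would pass in the body of a response from the metrics endpoint instead of the hard-coded sample.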
System catalog SQL interface
The mz_catalog SQL interface provides a variety of ways to introspect Materialize. An
introduction to this catalog is available as part of our SQL documentation.
A user guide for debugging a running
materialized using the system catalog is available
in the form of a walkthrough of useful diagnostic queries.
Materialize metrics can be sent to Datadog via the OpenMetrics agent check, which is bundled with recent versions of the Datadog agent.
Simply add the following configuration parameters to
Materialize periodically emits messages to its log file. These log messages serve several purposes:
- To alert operators to critical issues
- To record system status changes
- To provide developers with visibility into the system’s execution when troubleshooting issues
We recommend that you monitor for messages at the WARN and ERROR
levels. Every message at either of these levels indicates an issue
that must be investigated and resolved.
Each log message is a single line with the following format:
<timestamp> <level> <module>: [<tag>]... <body>
For example, Materialize emits the following log message when a table named t is created:
2021-04-08T04:12:25.927738Z INFO coord::catalog: create table materialize.public.t (u1)
The timestamp is always in UTC and formatted according to ISO 8601.
The log level is one of the five levels described in the next section, formatted with all uppercase letters.
The module path reflects the log message’s location in Materialize’s source code. Module paths change frequently from release to release and are not part of Materialize’s stable interface.
The tags, if included, further categorize the message. Tags are surrounded by square brackets. The currently used tags are:
[customer-data]: the message includes clear-text contents of data in the system
[deprecation]: a feature in use will be removed or changed in a future release
The body is unstructured text intended for human consumption. It may include embedded newlines.
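The format described above can be split into its parts with a short parser. The following Python sketch is one possible approach, not an official tool. The names `LogRecord` and `parse_log_line` are invented for this example. It expects one complete message (including any embedded newlines in the body) and treats every leading bracketed token as a tag, so a body that itself begins with `[` would be misread.

```python
import re
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    module: str
    tags: list
    body: str


# <timestamp> <level> <module>: <rest>; the module path may contain '::'.
_HEADER = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<module>\S+?):\s+(?P<rest>.*)$",
    re.DOTALL,  # the body may include embedded newlines
)
_TAG = re.compile(r"\[([^\]]+)\]\s*")


def parse_log_line(message):
    """Parse one log message; return None if it does not match the format."""
    match = _HEADER.match(message)
    if match is None:
        return None
    rest = match.group("rest")
    tags = []
    # Peel off zero or more leading [tag] markers.
    while True:
        tag = _TAG.match(rest)
        if tag is None:
            break
        tags.append(tag.group(1))
        rest = rest[tag.end():]
    return LogRecord(
        match.group("timestamp"),
        match.group("level"),
        match.group("module"),
        tags,
        rest,
    )


record = parse_log_line(
    "2021-04-08T04:12:25.927738Z INFO coord::catalog: "
    "create table materialize.public.t (u1)"
)
print(record.level, record.module, record.body)
```

Because module paths are not part of the stable interface, it is safer to key alerting on the level and tags than on the module field.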
Every log message is associated with a level that indicates the severity of the
message. The levels are, in decreasing order of severity, ERROR, WARN, INFO, DEBUG, and TRACE.
Log levels are used in the
--log-filter command-line option
to determine which log messages to emit.
Messages at each level must meet the indicated standard:
ERROR: Reports an error that has caused data corruption, data loss, or unavailability. You should page on-call staff immediately about these errors.
- Authentication with an external system (e.g., Amazon S3) has failed.
- A source ingested the deletion of a record that does not exist, causing a “negative multiplicity.”
WARN: Reports an issue that may lead to data corruption, data loss, or unavailability. It is reasonable to check for
WARN-level messages once per day during normal business hours.
- A network request (e.g., downloading an object from Amazon S3) has failed several times, but a retry is in progress.
INFO: Reports normal system status changes. Messages at this level may be of interest to operators, but do not typically require attention.
- A view was created.
- A view was dropped.
DEBUG: Provides information that may help when troubleshooting issues. Messages at this level are primarily of interest to Materialize engineers.
- An HTTP request was routed through a proxy specified by the
- An S3 object downloaded by an S3 source had an invalid Content-Encoding header that was ignored, but the object was nonetheless decoded successfully.
TRACE: Like DEBUG, but the information meets a lower standard of relevance or importance.
TRACE logs can generate multiple gigabytes of log messages per hour. We recommend that you only enable this level in development or at the direction of a Materialize engineer.
- A Kafka source consumed a message.
- A SQL client issued a command.
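The guidance for each level can be turned into a simple routing rule in a log pipeline. The sketch below is a hypothetical example. The `Level` enum, the `recommended_action` function, and the action strings are assumptions made for illustration, and only the ordering and the ERROR and WARN guidance come from the descriptions above.

```python
from enum import IntEnum


class Level(IntEnum):
    # A larger value means a more severe level.
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4


def recommended_action(level_name):
    """Map a log level name such as 'WARN' to a suggested operator response."""
    level = Level[level_name]  # raises KeyError for an unknown level name
    if level is Level.ERROR:
        return "page on-call"
    if level is Level.WARN:
        return "review daily"
    return "no action required"


print(recommended_action("WARN"))
```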