Troubleshooting
As you wire up data ingestion in Materialize, you might run into some snags or
unexpected scenarios. This guide collects common questions around data
ingestion to help you troubleshoot your sources. If you’re looking for
troubleshooting guidance for slow or unresponsive queries, check out the
Transform data troubleshooting guide instead.
Why isn’t my source ingesting data?
First, check the status of your source in the Materialize console by navigating to https://console.materialize.com/, clicking the Sources tab in the navigation bar, and clicking the affected source.
Alternatively, you can get this information from the system catalog by querying
the mz_source_statuses table:
SELECT * FROM mz_internal.mz_source_statuses
WHERE name = <SOURCE_NAME>;
If your source reports a status of stalled or failed, you likely have a
configuration issue. The returned error field will provide more details.
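As a quick sketch, using the status and error columns described above, you can also list every source currently reporting a problem status:

SELECT name, status, error
FROM mz_internal.mz_source_statuses
WHERE status IN ('stalled', 'failed');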
If your source reports a status of starting for more than a few minutes,
reach out to our team for support.
If your source reports a status of running, but you are not receiving data
when you query the source, the source may still be ingesting its initial
snapshot. See Has my source ingested its initial snapshot?.
Has my source ingested its initial snapshot?
When you create a source, Materialize takes a snapshot of any existing data in the upstream external system and ingests that snapshotted data before it starts ingesting new data. To ensure correct and consistent results, this snapshot is committed atomically at a specific timestamp. Because of that, you will not be able to query your source (or, queries will return no data) until Materialize has finished ingesting the initial snapshot.
Snapshotting can take anywhere from a few minutes to several hours, depending on the size of your dataset and the size of your ingestion cluster.
To determine whether your source has completed ingesting the initial snapshot,
you can query the mz_source_statistics system catalog table:
SELECT bool_and(snapshot_committed) AS snapshot_committed
FROM mz_internal.mz_source_statistics
WHERE id = <SOURCE_ID>;
You generally want to aggregate the snapshot_committed field across all worker
threads, as done in the above query. The snapshot is only considered committed
for the source as a whole once all worker threads have committed their
components of the snapshot.
Even if your source has not yet committed its initial snapshot, you can still monitor its progress. See How do I monitor source ingestion progress?.
How do I monitor source ingestion progress?
Repeatedly query the mz_source_statistics table and look for ingestion
statistics that advance over time:
SELECT
bytes_received,
messages_received,
updates_staged,
updates_committed
FROM mz_internal.mz_source_statistics
WHERE id = <SOURCE_ID>;
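If you're connected through psql, one convenient way to poll these statistics is the \watch metacommand (a psql feature, not SQL), which re-runs the query on an interval:

-- in psql: re-run the statistics query every 5 seconds
SELECT bytes_received, messages_received, updates_staged, updates_committed
FROM mz_internal.mz_source_statistics
WHERE id = <SOURCE_ID>
\watch 5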
You can also look at statistics for individual worker threads to evaluate whether ingestion progress is skewed, but it’s generally simplest to start by looking at the aggregate statistics for the whole source.
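As a sketch: if your Materialize version exposes per-worker statistics through a mz_internal.mz_source_statistics_per_worker relation (check your version's system catalog), a query along these lines would surface skew across workers:

-- compare progress across workers; a lagging worker suggests skewed ingestion
SELECT worker_id, updates_staged, updates_committed
FROM mz_internal.mz_source_statistics_per_worker
WHERE id = <SOURCE_ID>
ORDER BY worker_id;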
The bytes_received and messages_received statistics should roughly match the
external system's measure of progress. For example, the bytes_received and
messages_received fields for a Kafka source should roughly match what the
upstream Kafka broker reports as the number of bytes (including the key) and
number of messages transmitted, respectively.
During the initial snapshot, updates_committed will remain at zero until all
messages in the snapshot have been staged. Only then will updates_committed
advance. This is expected, and not a cause for concern.
After the initial snapshot, there should be relatively little skew between
updates_staged and updates_committed. A large gap is usually an indication
that the source has fallen behind, and that you likely need to scale it up.
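As a minimal sketch, you can compute that gap directly from the same statistics table:

-- a persistently large or growing value suggests the source has fallen behind
SELECT updates_staged - updates_committed AS uncommitted_updates
FROM mz_internal.mz_source_statistics
WHERE id = <SOURCE_ID>;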
messages_received does not necessarily correspond with updates_staged and
updates_committed. For example, a source with ENVELOPE UPSERT can have more
updates than messages, because messages can cause both deletions and insertions
(i.e., when they update a value for a key), which are both counted in the
updates_* statistics. There can also be fewer updates than messages, as
many messages for a single key can be consolidated if they occur within a
(small) internally configured window. That said, messages_received making
steady progress while updates_staged/updates_committed doesn't is also
evidence that a source has fallen behind, and may need to be scaled up.
Beware that these statistics periodically reset to zero, as internal components of the system restart. This is expected behavior. As a result, you should restrict your attention to how these statistics evolve over time, and not their absolute values at any moment in time.