Materialize Documentation
s
Join the Community github/materialize

TAIL

TAIL streams updates from a source, table, or view as they occur.

Conceptual framework

The TAIL statement is a more general form of a SELECT statement. While a SELECT statement computes a relation at a moment in time, a tail operation computes how a relation changes over time.

Fundamentally, TAIL produces a sequence of updates. An update describes either the insertion or deletion of a row to the relation at a specific time. Taken together, the updates describe the complete set of changes to a relation, in order, while the TAIL is active.

You can use TAIL to:

Syntax

TAIL object_name ( select_stmt ) WITH ( option_name = option_value , ) AS OF timestamp_expression
Field Use
object_name The name of the source, table, or view that you want to tail.
select_stmt The SELECT statement whose output you want to tail.
timestamp_expression The logical time at which the TAIL begins as a [bigint] representing milliseconds since the Unix epoch. See AS OF below.

WITH options

The following options are valid within the WITH clause.

Option name Value type Default Describes
SNAPSHOT boolean true Whether to emit a snapshot of the current state of the relation at the start of the operation. See SNAPSHOT below.
PROGRESS boolean false Whether to include detailed progress information. See PROGRESS below.

Details

Output

TAIL emits a sequence of updates as rows. Each row contains all of the columns of the tailed relation or derived from the SELECT statement, prepended with several additional columns that describe the nature of the update:

Column Type Represents
mz_timestamp numeric Materialize's internal logical timestamp. This will never be less than any timestamp previously emitted by the same TAIL operation.
mz_progressed boolean

This column is only present if the PROGRESS option is specified.

If true, indicates that the TAIL will not emit additional records at times strictly less than mz_timestamp. See PROGRESS below.
mz_diff bigint The change in frequency of the row. A positive number indicates that mz_diff copies of the row were inserted, while a negative number indicates that |mz_diff| copies of the row were deleted.
Column 1 Varies The columns from the tailed relation, each as its own column, representing the data that was inserted into or deleted from the relation.
...
Column N Varies

Duration

TAIL will continue to run until canceled, session ends, or until all updates have been presented. The latter case typically occurs when tailing constant views (e.g. CREATE VIEW v AS SELECT 1) or file sources that were created in non-tailing mode (tail = false).

WARNING!

Many PostgreSQL drivers wait for a query to complete before returning its results. Since TAIL can run forever, naively executing a TAIL using your driver’s standard query API may never return.

Either use an API in your driver that does not buffer rows or use the FETCH statement to fetch rows from a TAIL in batches. See the examples for details.

AS OF

The AS OF clause specifies the time at which a TAIL operation begins. See SNAPSHOT below for details on what this means.

If you don’t specify AS OF explicitly, Materialize will pick a timestamp automatically:

A given timestamp will be rejected if data it would report has already been compacted by Materialize. See the --logical-compaction-window command-line option for details on Materialize’s compaction policy.

SNAPSHOT

By default, a TAIL begins by emitting a snapshot of the tailed relation, which consists of a series of updates describing the contents of the relation at its AS OF timestamp. After the snapshot, TAIL emits further updates as they occur.

For updates in the snapshot, the mz_timestamp field will be fast-forwarded to the AS OF timestamp. For example, TAIL ... AS OF 21 would present an insert that occurred at time 15 as if it occurred at time 21.

To see only updates after the AS OF timestamp, specify WITH (SNAPSHOT = false).

PROGRESS

Intuitively, progress messages communicate that no updates have occurred in a given time window. Without explicit progress messages, it is impossible to distinguish between a stall in Materialize and a legitimate period of no updates.

If the PROGRESS option is specified via WITH (PROGRESS), an additional mz_progressed column appears in the output. It is false if there may be more rows with the same timestamp. It is true if no more timestamps will appear that are strictly less than the timestamp. All further columns after mz_progressed will be NULL in the true case.

Not all timestamps that appear will have a corresponding mz_progressed row. For example, the following is a valid sequence of updates:

mz_timestamp | mz_progressed | mz_diff | column1
-------------|---------------|---------|--------------
1            | false         | 1       | data
2            | false         | 1       | more data
3            | false         | 1       | even more data
4            | true          | NULL    | NULL

Notice how Materialize did not emit explicit progress messages for timestamps 1 or 2. The receipt of the update at timestamp 2 implies that there are no more updates for timestamp 1, because timestamps are always presented in non-decreasing order. The receipt of the explicit progress message at timestamp 4 implies that there are no more updates for either timestamp 2 or 3—but that there may be more data arriving at timestamp 4.

Examples

TAIL produces rows similar to a SELECT statement, except that TAIL may never complete. Many drivers buffer all results until a query is complete, and so will never return. Below are the recommended ways to work around this.

Tailing with FETCH

The recommended way to use TAIL is with DECLARE and FETCH. These must be used within a transaction, with a single DECLARE per transaction. This allows you to limit the number of rows and the time window of your requests. As an example, let’s tail the mz_scheduling_elapsed system table, which shows the total amount of time each worker spends in each dataflow.

First, declare a TAIL cursor:

BEGIN;
DECLARE c CURSOR FOR TAIL (SELECT * FROM mz_scheduling_elapsed);

Then, use FETCH in a loop to retrieve each batch of results as soon as it’s ready:

FETCH ALL c;

That will retrieve all of the rows that are currently available. If there are no rows available, it will wait until there are some ready and return those. A timeout can be used to specify a window in which to wait for rows. This will return up to the specified count (or ALL) of rows that are ready within the timeout. To retrieve up to 100 rows that are available in at most the next 1s:

FETCH 100 c WITH (timeout='1s');

To retrieve all available rows available over the next 1s:

FETCH ALL c WITH (timeout='1s');

A 0s timeout can be used to return rows that are available now without waiting:

FETCH ALL c WITH (timeout='0s');

Using clients

If you want to use TAIL from an interactive SQL session (e.g.psql), wrap the query in COPY:

COPY (TAIL (SELECT * FROM mz_scheduling_elapsed)) TO STDOUT;
Additional guides
Go
Java
Node.js
PHP
Python
Ruby

Using AS OF

AS OF requires Materialize to start with a custom compaction window otherwise it will default to 0.

docker run -p 6875:6875 materialize/materialized:v0.26.4 --logical-compaction-window 10s

Create a non-materialized view:

CREATE VIEW most_scheduled_worker AS
  SELECT
    worker,
    SUM(elapsed_ns) as time_working
  FROM mz_scheduling_elapsed
  GROUP BY worker
  ORDER BY time_working DESC
  LIMIT 1;

Create an index and set the compaction window:

CREATE INDEX most_scheduled_worker_idx
  ON most_scheduled_worker (worker, time_working)
  WITH (logical_compaction_window = '10 seconds');

Stream out all changes starting 10 seconds before the statement’s execution time:

COPY (TAIL most_scheduled_worker AS OF NOW() - INTERVAL '10 seconds') TO STDOUT;

Take into account that, in this example, 10 logical seconds need to pass within Materialize to browse and recover changes from the last 10 seconds.

Mapping rows to their updates

After all the rows from the SNAPSHOT have been transmitted, the updates will be emitted as they occur. How can you map each row to its corresponding update?

mz_timestamp mz_progressed mz_diff Column 1 …. Column N
1 false 1 id1 value1
1 false 1 id2 value2
1 false 1 id3 value3 <- Last row from SNAPSHOT
2 false -1 id1 value1
2 false 1 id1 value4

If your row has a unique column key, it is possible to map the update to its corresponding origin row; if the key is unknown, you can use the output of hash(columns_values) instead.

In the example above, Column 1 acts as the column key that uniquely identifies the origin row the update refers to; in case this was unknown, hashing the values from Column 1 to Column N would identify the origin row.