TAIL streams updates from a source, table, or view as they occur.
TAIL statement is a more general form of a
statement. While a
SELECT statement computes a relation at a moment in time, a
tail operation computes how a relation changes over time.
TAIL produces a sequence of updates. An update describes either
the insertion or deletion of a row to the relation at a specific time. Taken
together, the updates describe the complete set of changes to a relation, in
order, while the
TAIL is active.
You can use
- Power event processors that react to every change to a relation or an arbitrary
- Replicate the complete history of a relation while the
- Test a SQL
SELECTstatement over non-materialized views
|object_name||The name of the source, table, or view that you want to tail.|
|timestamp_expression||The logical time at which the
The following options are valid within the
|Option name||Value type||Default||Describes|
||Whether to emit a snapshot of the current state of the relation at the start of the operation. See
||Whether to include detailed progress information. See
TAIL emits a sequence of updates as rows. Each row contains all of the columns of
the tailed relation or derived from the
SELECT statement, prepended with several additional columns that describe
the nature of the update:
Materialize's internal logical timestamp. This will never be less than any
timestamp previously emitted by the same
This column is only present if the
The change in frequency of the row. A positive number indicates that
|Column 1||Varies||The columns from the tailed relation, each as its own column, representing the data that was inserted into or deleted from the relation.|
TAIL will continue to run until canceled, session ends, or until all updates have been presented. The latter case typically occurs when
tailing constant views (e.g.
CREATE VIEW v AS SELECT 1) or
file sources that were created in non-tailing
tail = false).
Many PostgreSQL drivers wait for a query to complete before returning its
TAIL can run forever, naively executing a
TAIL using your
driver’s standard query API may never return.
AS OF clause specifies the time at which a
TAIL operation begins.
SNAPSHOT below for details on what this means.
If you don’t specify
AS OF explicitly, Materialize will pick a timestamp
- If the tailed relation is materialized, Materialize picks the latest time for which results are computed.
- If the tailed relation is not materialized, Materialize picks time
A given timestamp will be rejected if data it would report has already been
compacted by Materialize. See the
--logical-compaction-window command-line option for
details on Materialize’s compaction policy.
By default, a
TAIL begins by emitting a snapshot of the tailed relation, which
consists of a series of updates describing the contents of the relation at its
AS OF timestamp. After the snapshot,
TAIL emits further updates as
For updates in the snapshot, the
mz_timestamp field will be fast-forwarded to the
AS OF timestamp. For example,
TAIL ... AS OF 21 would present an insert that
occured at time 15 as if it occurred at time 21.
To see only updates after the
AS OF timestamp, specify
WITH (SNAPSHOT = false).
Intuitively, progress messages communicate that no updates have occurred in a given time window. Without explicit progress messages, it is impossible to distinguish between a stall in Materialize and a legitimate period of no updates.
PROGRESS option is specified via
WITH (PROGRESS), an additional
mz_progressed column appears in the output.
false if there may be more rows with the same timestamp.
true if no more timestamps will appear that are strictly less than the
All further columns after
mz_progressed will be
NULL in the
Not all timestamps that appear will have a corresponding
For example, the following is a valid sequence of updates:
mz_timestamp | mz_progressed | mz_diff | column1 -------------|---------------|---------|-------------- 1 | false | 1 | data 2 | false | 1 | more data 3 | false | 1 | even more data 4 | true | NULL | NULL
Notice how Materialize did not emit explicit progress messages for timestamps
2. The receipt of the update at timestamp
2 implies that there
are no more updates for timestamp
1, because timestamps are always presented
in non-decreasing order. The receipt of the explicit progress message at
4 implies that there are no more updates for either timestamp
3—but that there may be more data arriving at timestamp
TAIL produces rows similar to a
SELECT statement, except that
TAIL may never complete.
Many drivers buffer all results until a query is complete, and so will never return.
Below are the recommended ways to work around this.
The recommended way to use
TAIL is with
These must be used within a transaction, with a single
DECLARE per transaction.
This allows you to limit the number of rows and the time window of your requests.
As an example, let’s tail the
mz_scheduling_elapsed system table, which shows the total amount of time each worker spends in each dataflow.
First, declare a
BEGIN; DECLARE c CURSOR FOR TAIL (SELECT * FROM mz_scheduling_elapsed);
FETCH in a loop to retrieve each batch of results as soon as it’s ready:
FETCH ALL c;
That will retrieve all of the rows that are currently available.
If there are no rows available, it will wait until there are some ready and return those.
timeout can be used to specify a window in which to wait for rows. This will return up to the specified count (or
ALL) of rows that are ready within the timeout. To retrieve up to 100 rows that are available in at most the next
FETCH 100 c WITH (timeout='1s');
To retrieve all available rows available over the next
FETCH ALL c WITH (timeout='1s');
0s timeout can be used to return rows that are available now without waiting:
FETCH ALL c WITH (timeout='0s');
If you want to use
TAIL from an interactive SQL session (e.g.
psql), wrap the query in
COPY (TAIL (SELECT * FROM mz_scheduling_elapsed)) TO STDOUT;
Using AS OF
docker run -p 6875:6875 materialize/materialized:v0.26.3 --logical-compaction-window 10s
Create a non-materialized view:
CREATE VIEW most_scheduled_worker AS SELECT worker, SUM(elapsed_ns) as time_working FROM mz_scheduling_elapsed GROUP BY worker ORDER BY time_working DESC LIMIT 1;
Create an index and set the compaction window:
CREATE INDEX most_scheduled_worker_idx ON most_scheduled_worker (worker, time_working) WITH (logical_compaction_window = '10 seconds');
Stream out all changes starting 10 seconds before the statement’s execution time:
COPY (TAIL most_scheduled_worker AS OF NOW() - INTERVAL '10 seconds') TO STDOUT;
Take into account that, in this example, 10 logical seconds need to pass within Materialize to browse and recover changes from the last 10 seconds.
Mapping rows to their updates
After all the rows from the
SNAPSHOT have been transmitted, the updates will be emitted as they occur. How can you map each row to its corresponding update?
|mz_timestamp||mz_progressed||mz_diff||Column 1||….||Column N|
|1||false||1||id3||value3||<- Last row from
If your row has a unique column key, it is possible to map the update to its corresponding origin row; if the key is unknown, you can use the output of
In the example above,
Column 1 acts as the column key that uniquely identifies the origin row the update refers to; in case this was unknown, hashing the values from
Column 1 to
Column N would identify the origin row.