Time travel queries

PREVIEW This feature is in private preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.

You must contact us to enable this feature in your Materialize region.

By default, all user-defined sources, tables, materialized views, and indexes keep track of the most recent version of their underlying data. Some use cases require accessing past versions of this data at specific points in time (i.e. time traveling). To allow time travel queries, you can configure these objects to retain history.

When set to a value different than the default (i.e. no history retention), the history retention period of an object determines the duration for which historical versions of its underlying data are available for querying.

The common use cases for time traveling in Materialize are:

  • Lossless, durable subscriptions to your changing results. See the durable subscriptions section below for details on how to create this type of subscription.

  • Accessing a past version of results at a specific point in time.

History retention period

You can change the history retention period for sources, tables, materialized views, and indexes. To configure the history retention period for an object, use the RETAIN HISTORY option in the CREATE statement. This value can also be adjusted at any time using the object-specific ALTER statement.

Configuration

Increasing the history retention period

When you increase the history retention period for an object, both the existing historical data and any subsequently produced historical data are retained for the specified time period.

  • For sources, tables and materialized views: increasing the history retention period will not restore older historical data that was already outside the previous history retention period before the change.

  • For indexes: if all of the underlying source, table, and materialized view historical data is available for the increased history retention period, the index can use that data to backfill as far back as the underlying historical data is available.

Decreasing the history retention period

When you decrease the history retention period for an object:

  • Newly produced historical data is retained for the new, shorter history retention period.

  • Historical data outside the new, shorter history retention period is no longer retained. If you subsequently increase the history retention period again, the older historical data may already be unavailable.

Resource utilization

It’s important to note that increasing the history retention period for an object will lead to increased resource utilization in Materialize.

  • For sources, tables and materialized views: increasing the history retention period for these objects increases the amount of historical data that is retained in the storage layer. You can expect storage resource utilization to increase, which may incur additional costs.

  • For indexes: increasing the history retention period for an index increases the amount of historical data that is retained in memory in the cluster maintaining the index. You can expect memory resource utilization for that cluster to go up, which might require you to size up your cluster and consume additional compute credits.

Clean-up

The history retention period represents the minimum amount of historical data guaranteed to be retained by Materialize. History clean-up is processed in the background, so older history may be accessible for the period of time between when it falls outside the retention period and when it is cleaned up.

Best practices

Given the additional memory costs associated with increasing the history retention period on indexes, if you don’t need the performance benefits of an index—in particular for the use case of durable subscriptions-you should consider creating a materialized view for your subscription query and configure the history retention period on that materialized view instead.

Similarly, because of the increased storage costs and processing time for the additional historical data on whichever object has an increased history retention period, you should consider configuring history retention period on the index or materialized view directly powering the subscription, rather than all the way through the dependency chain from the source to the index or materialized view.

Durable subscriptions

Because SUBSCRIBE requests happen over the network, these connections might get disrupted for both expected and unexpected reasons. You can adjust the history retention period for the objects a subscription depends on, and then use AS OF to pick up where you left off on connection drops—this ensures that no data is lost in the subscription process, and avoids the need for re-snapshotting the data.

Implementing durable subscriptions in your application

To set up a durable subscription in your application:

  1. In Materialize, configure the history retention period for the object (s) queried in the SUBSCRIBE. Choose a duration you expect will allow you to recover in case of connection drops. One hour (1h) is a good place to start, though you should be mindful of the resource utilization impact of increasing an object’s history retention period.

  2. In order to restart your application without losing or re-snapshotting data after a connection drop, you need to store the latest timestamp processed for the subscription (either in Materialize, or elsewhere in your application state). This will allow you to resume using the retained history upstream.

  3. The first time you start the subscription, run the following continuous query against Materialize in your application code:

    SUBSCRIBE (<your query>) WITH (PROGRESS, SNAPSHOT true);
    

    If you do not need a full snapshot to bootstrap your application, change this to SNAPSHOT false.

  4. As results come in continuously, buffer the latest results in memory until you receive a progress message. At that point, the data up until the progress message is complete, so you can:

    1. Process all the buffered data in your application.
    2. Persist the mz_timestamp of the progress message.
  5. To resume the subscription in subsequent restarts, use the following continuous query against Materialize in your application code:

    SUBSCRIBE (<your query>) WITH (PROGRESS, SNAPSHOT false) AS OF <last_progress_mz_timestamp>;
    

    In a similar way, as results come in continuously, buffer the latest results in memory until you receive a progress message. At that point, the data up until the progress message is complete, so you can:

    1. Process all the buffered data in your application.
    2. Persist the mz_timestamp of the progress message.

    You can tweak the flush interval at which you durably record the latest progress timestamp, if every progress message is too frequent.

Idempotency

The guide above recommends you buffer data in memory until receiving a progress message, then persisting the data and progress message mz_timestamp at the same time. This is to ensure data is processed exactly once.

In the case that your application crashes and you need to resume your subscription using the persisted progress message mz_timestamp:

  • If you were processing data in your application before persisting the subsequent progress message’s mz_timestamp: you may end up processing duplicate data.
  • If you were persisting the progress message’s mz_timestamp before processing all the buffered data from before that progress message: you may end up dropping some data.

As a result, the way to guarantee exactly once data processing in the case of your application crashes is to write the progress message mz_timestamp and all buffered data together in a single transaction.

Examples

Enabling history retention

To set a history retention period for an object, use the RETAIN HISTORY option, which accepts positive interval values (e.g. '1hr'):

CREATE MATERIALIZED VIEW winning_bids
WITH (RETAIN HISTORY FOR '1hr') AS
SELECT auction_id,
       bid_id,
       item,
       amount
FROM highest_bid_per_auction
WHERE end_time < mz_now();

Adjusting history retention configuration

To adjust the history retention period for an object, use ALTER:

ALTER MATERIALIZED VIEW winning_bids SET (RETAIN HISTORY FOR '2hr');

Viewing history retention configuration

To see what history retention period has been configured for an object, look up the object in the mz_internal.mz_history_retention_strategies catalog table.

SELECT
    d.name AS database_name,
    s.name AS schema_name,
    mv.name,
    hrs.strategy,
    hrs.value
FROM
    mz_catalog.mz_materialized_views AS mv
        LEFT JOIN mz_schemas AS s ON mv.schema_id = s.id
        LEFT JOIN mz_databases AS d ON s.database_id = d.id
        LEFT JOIN mz_internal.mz_history_retention_strategies AS hrs ON mv.id = hrs.id
WHERE mv.name = 'winning_bids';

 database_name | schema_name |     name     | strategy |  value
---------------+-------------+--------------+----------+---------
 materialize   | public      | winning_bids | FOR      | 3600000

Disabling history retention

To disable history retention, reset the RETAIN HISTORY option:

ALTER MATERIALIZED VIEW winning_bids RESET (RETAIN HISTORY);
Back to top ↑