Guide: Handle upstream schema changes with zero downtime

PREVIEW This feature is in private preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.
To enable this feature in your Materialize region, contact our team.

NOTE:

Changing column types is currently unsupported.
You must be on v26+ to use the new syntax.

Materialize allows you to handle certain types of upstream table schema changes seamlessly, specifically:

Adding a column in the upstream database.
Dropping a column in the upstream database.

This guide walks you through how to handle these changes without any downtime in Materialize.

Prerequisites

Some familiarity with Materialize. If you’ve never used Materialize before, start with our guide to getting started to learn how to connect a database to Materialize.

Set up a PostgreSQL database

For this guide, setup a PostgreSQL 11+ database. In your PostgreSQL, create a table T and populate:

CREATE TABLE T (
    A INT
);

INSERT INTO T (A) VALUES
    (10);

Connect your source database to Materialize

To create a source from PostgreSQL 11+, you must first:

Configure upstream PostgreSQL instance
- Set up logical replication.
- Create a publication.
- Create a replication user and password for Materialize to use to connect.
Configure network security
- Ensure Materialize can connect to your PostgreSQL instance.
Create a connection to PostgreSQL in Materialize
- The connection setup depends on the network security configuration.

For details, see the PostgreSQL integration guides.

Create a source using the new syntax

In Materialize, create a source using the updated CREATE SOURCE syntax.

CREATE SOURCE IF NOT EXISTS my_source
    FROM POSTGRES CONNECTION my_connection (PUBLICATION 'mz_source');

Unlike the legacy syntax, the new syntax does not include the FOR [[ALL] TABLES|SCHEMAS] clause; i.e., the new syntax does not create corresponding subsources in Materialize automatically. Instead, the new syntax requires a separate CREATE TABLE ... FROM SOURCE, which will create the corresponding tables and start the snapshotting process. See Create a table from the source.

NOTE: The legacy syntax is still supported. However, the legacy syntax doesn’t support upstream schema changes.

Create a table from the source

To start ingesting specific tables from your source database, you can create a table in Materialize. We’ll add it into the v1 schema in Materialize.

CREATE SCHEMA v1;

CREATE TABLE v1.T
    FROM SOURCE my_source(REFERENCE public.T);

Once you’ve created a table from source, the initial snapshot of table v1.T will begin.

NOTE: During the snapshotting, the data ingestion for the other tables associated with the source is temporarily blocked. As before, you can monitor progress for the snapshot operation on the overview page for the source in the Materialize console.

Create a view on top of the table.

For this guide, add a materialized view matview (also in schema v1) that sums column A from table T.

CREATE MATERIALIZED VIEW v1.matview AS
    SELECT SUM(A) from v1.T;

Handle upstream column addition

A. Add a column in your upstream PostgreSQL database

In your upstream PostgreSQL database, add a new column B to the table T:

ALTER TABLE T
    ADD COLUMN B BOOLEAN DEFAULT false;

INSERT INTO T (A, B) VALUES
    (20, true);

This operation will have no immediate effect in Materialize. In Materialize, v1.T will continue to ingest only column A. The materialized view v1.matview will continue to have access to column A as well.

B. Incorporate the new column in Materialize

To incorporate the new column into Materialize, create a new v2 schema and recreate the table in the new schema:

CREATE SCHEMA v2;

CREATE TABLE v2.T
    FROM SOURCE my_source(REFERENCE public.T);

The snapshotting of table v2.T will begin. v2.T will include columns A and B.

When the new v2.T table has finished snapshotting, create a new materialized view matview in the new schema. Since the new v2.matview is referencing the new v2.T, it can reference column B:

CREATE MATERIALIZED VIEW v2.matview AS
    SELECT SUM(A)
    FROM v2.T
    WHERE B = true;

Handle upstream column drop

A. Exclude the column in Materialize

To drop a column safely, in Materialize, first, create a new v3 schema, and recreate table T in the new schema but exclude the column to drop. In this example, we’ll drop the column B.

CREATE SCHEMA v3;
CREATE TABLE v3.T
    FROM SOURCE my_source(REFERENCE public.T) WITH (EXCLUDE COLUMNS (B));

B. Drop a column in your upstream PostgreSQL database

In your upstream PostgreSQL database, drop the column B from the table T:

ALTER TABLE T DROP COLUMN B;

Dropping the column B will have no effect on v3.T. However, the drop affects v2.T and v2.matview from our earlier examples. When the user attempts to read from either, Materialize will report an error that the source table schema has been altered.

Optional: Swap schemas

When you’re ready to fully cut over to the new source version, you can optionally swap the schemas and drop the old objects.

ALTER SCHEMA v1 SWAP WITH v3;

DROP SCHEMA v3 CASCADE;