Guide: Handle upstream schema changes with zero downtime
View as MarkdownTo enable this feature in your Materialize region, contact our team.
Materialize allows you to handle certain types of upstream table schema changes seamlessly, specifically:
- Adding a column in the upstream database.
- Dropping a column in the upstream database.
This guide walks you through how to handle these changes without any downtime in Materialize.
Prerequisites
Some familiarity with Materialize. If you’ve never used Materialize before, start with our guide to getting started to learn how to connect a database to Materialize.
Set up a MySQL database
For this guide, setup a MySQL 5.7+ database. In your MySQL database, create a
table t1 and populate it:
CREATE TABLE t1 (
a INT
);
INSERT INTO t1 (a) VALUES (10);
Configure your MySQL database
Configure your MySQL database for GTID-based binlog replication using the configuration instructions for self-hosted MySQL.
Connect your source database to Materialize
Create a connection to your MySQL database using the CREATE CONNECTION syntax.
Create a source
In Materialize, create a source using the CREATE SOURCE
syntax.
CREATE SOURCE my_source
FROM MYSQL CONNECTION mysql_connection;
Create a table from the source
To start ingesting specific tables from your source database, create a
table in Materialize. We’ll add it into the v1 schema.
CREATE SCHEMA v1;
CREATE TABLE v1.t1
FROM SOURCE my_source (REFERENCE mydb.t1);
Once you’ve created a table from source, the initial
snapshot of table v1.t1 will begin.
Create a view on top of the table
For this guide, add a materialized view matview (also in schema v1) that
sums column a from table t1.
CREATE MATERIALIZED VIEW v1.matview AS
SELECT SUM(a) FROM v1.t1;
Handle upstream column addition
A. Add a column in your upstream MySQL database
In your upstream MySQL database, add a new column b to the table t1:
ALTER TABLE t1
ADD COLUMN b BOOLEAN DEFAULT false;
INSERT INTO t1 (a, b) VALUES (20, true);
This operation has no immediate effect in Materialize. In Materialize:
- The table
v1.t1will continue to ingest only columna. - The materialized view
v1.matviewwill continue to have access to columnaonly.
B. Incorporate the new column in Materialize
Unlike SQL Server CDC, MySQL uses binlog-based replication, which automatically
includes all columns. To incorporate the new column into Materialize, create a
new v2 schema and recreate the table in the new schema:
CREATE SCHEMA v2;
CREATE TABLE v2.t1
FROM SOURCE my_source (REFERENCE mydb.t1);
The snapshotting of table v2.t1 will begin.
v2.t1 will include columns a and b.
When v2.t1 has finished snapshotting, create a new materialized view in the
new schema. Since v2.matview references v2.t1, it can now reference column b:
CREATE MATERIALIZED VIEW v2.matview AS
SELECT SUM(a)
FROM v2.t1
WHERE b = true;
Handle upstream column drop
A. Exclude the column in Materialize
To drop a column safely, first create a new schema in Materialize and recreate
the table excluding the column you intend to drop. In this example, we’ll drop
column b.
CREATE SCHEMA v3;
CREATE TABLE v3.t1
FROM SOURCE my_source (REFERENCE mydb.t1) WITH (EXCLUDE COLUMNS (b));
B. Drop the column in your upstream MySQL database
In your upstream MySQL database, drop column b from table t1:
ALTER TABLE t1 DROP COLUMN b;
Dropping column b will have no effect on v3.t1 in Materialize, provided
you completed step A before dropping the column. However, the drop affects
v2.T and v2.matview from our earlier examples. When the user attempts to
read from either, Materialize will report an error that the source table schema
has been altered.
Once you have finished migrating any views and queries from v2 to v3, you
can clean up the old objects:
DROP TABLE v2.t1;
DROP MATERIALIZED VIEW v2.matview;
DROP SCHEMA v2;