CREATE SOURCE: PostgreSQL
New in v0.8.0.
CREATE SOURCE connects Materialize to an external data source and lets you interact
with its data as if the data were in a SQL table.
This page details how to connect Materialize to a PostgreSQL (10+) database to create and efficiently maintain real-time materialized views on top of a replication stream.
Sources represent connections to resources outside Materialize that it can read data from. For more information, see Key Concepts: Sources.
|MATERIALIZED||Materializes the source’s data, which retains all data in memory and makes sources directly selectable. For more information, see Key Concepts — Materialized sources.|
|src_name||The name for the source.|
|IF NOT EXISTS||Do nothing (except issuing a notice) if a source with the same name already exists. Default.|
|CONNECTION connection_info||Postgres connection parameters. See the Postgres documentation on supported connection parameters for details.|
|PUBLICATION publication_name||Postgres publication (the replication data set containing the tables to be streamed to Materialize).|
The following option is valid within the
Change data capture
This source uses PostgreSQL’s native replication protocol to continually ingest changes resulting from
DELETE operations in the upstream database (also know as change data capture).
For this reason, the upstream database must be configured to support logical replication. To get logical replication set up, follow the step-by-step instructions in the Change Data Capture (Postgres) guide.
Creating a source
To avoid creating multiple replication slots upstream and minimize the required bandwidth, Materialize ingests the raw replication stream data for all tables included in a specific publication. This means that, when you define a source:
CREATE SOURCE mz_source FROM POSTGRES CONNECTION 'host=example.com port=5432 user=host dbname=postgres sslmode=require' PUBLICATION 'mz_source';
, its schema looks like:
SHOW COLUMNS FROM mz_source; name | nullable | type ----------+----------+--------- oid | f | integer row_data | f | list
where each row of every upstream table is represented as a single row with two columns:
||A unique identifier for the tables included in the publication.|
||A text-encoded, variable length
It’s important to note that the schema metadata is captured when the source is initially created, and is validated against the upstream schema upon restart. If you wish to add additional tables to the original publication and use them in Materialize, the source must be dropped and recreated.
Creating replication views
From here, you can break down the source into views that reproduce the publication’s original tables based on the
oid identifier and convert the text elements in
row_data to the original data types:
Create views for specific tables included in the Postgres publication
CREATE VIEWS FROM SOURCE mz_source (table1, table2);
Create views for all tables
CREATE VIEWS FROM SOURCE mz_source;
Under the hood, Materialize parses this statement into view definitions for each table that can be used as a base for your materialized view.
CREATE VIEWS will attempt to create each upstream table in the same schema as Postgres. For example, if the publication contains tables
CREATE VIEWS is the equivalent of:
CREATE VIEW public.foo; CREATE VIEW otherschema.foo;
CREATE VIEWS to succeed, either all upstream schemas included in the publication must exist in Materialize as well, or you must explicitly specify the downstream schemas and rename the resulting views:
CREATE VIEWS FROM SOURCE mz_source (public.foo AS foo, otherschema.foo AS foo2);
Creating materialized views
To produce correct results, Postgres sources can only be materialized once. As soon as you define a materialized view, Materialize:
Creates a replication slot in the upstream Postgres database (see Postgres replication slots). The name of the replication slots created by Materialize is prefixed with
materialize_for easy identification.
Performs an initial, snapshot-based sync of the tables in the publication before it starts ingesting change events.
Note: During this phase, disk space consumption may increase proportionally to the size of the upstream database before returning to a steady state. To profile disk usage, see Troubleshooting.
Incrementally updates the view as new change events stream in as a result of
DELETEoperations in the upstream Postgres database.
Postgres replication slots
Each Materialize replication slot can be used to source data for a single materialized view. You can create multiple non-materialized views for the same replication slot using the
CREATE VIEWS statement.
If you stop Materialize or delete the materialized view without also dropping the source, the upstream replication slot will linger and continue to accumulate data so that the source can resume in the future. To avoid unbounded disk space usage, make sure to use
DROP SOURCE or manually delete the replication slot (in case you have deleted the Materialize instance).
For PostgreSQL 13+, it is recommended that you set a reasonable value for
max_slot_wal_keep_size to limit the amount of storage used by replication slots.
Schema changes: Materialize does not support changes to schemas for existing publications, and will set the source into an error state if a breaking DDL change is detected upstream. To handle schema changes, you need to drop the existing sources and then recreate them after creating new publications for the updated schemas.
- Supported data types: Sources can only be created from publications that use data types supported by Materialize. Attempts to create sources from publications which contain unsupported data types will fail with an error.
- Truncation: Tables replicated into Materialize should not be truncated. If a table is truncated while replicated, the whole source becomes inaccessible and will not produce any data until it is recreated.