CREATE SOURCE: PostgreSQL
New in v0.8.0.
CREATE SOURCE connects Materialize to an external data source and lets you interact
with its data as if the data were in a SQL table.
This document details how to connect Materialize to a Postgres database for Postgres versions 10 and higher. Before you create the source in Materialize, you must perform some prerequisite steps in Postgres.
Sources represent connections to resources outside Materialize that it can read data from. For more information, see API Components: Sources.
|MATERIALIZED||Materializes the source’s data, which retains all data in memory and makes sources directly selectable. Currently required for all Postgres sources. For more information, see Materialized source details.|
|src_name||The name for the source.|
|IF NOT EXISTS||Do nothing (except issuing a notice) if a source with the same name already exists. Default.|
|CONNECTION connection_info||Postgres connection parameters. See the Postgres documentation on supported correction parameters for details.|
|PUBLICATION publication_name||Postgres publication (the replication data set containing the tables to be streamed to Materialize).|
Materialized source details
Materializing a source keeps data it receives in an in-memory
index, the presence of which makes the
source directly queryable. In contrast, non-materialized sources cannot process
queries directly; to access the data the source receives, you need to create
materialized views that
SELECT from the
For a mental model, materializing the source is approximately equivalent to creating a non-materialized source, and then creating a materialized view from all of the source’s columns:
CREATE SOURCE src ...; CREATE MATERIALIZED VIEW src_view AS SELECT * FROM src;
The actual implementation of materialized sources differs, though, by letting you refer to the source’s name directly in queries.
For more details about the impact of materializing sources (and implicitly
creating an index), see
CREATE INDEX: Details — Memory
PostgreSQL source details
Materialize makes use of PostgreSQL’s native replication capabilities to create a continuously updated replica of the desired Postgres tables.
Before creating the source in Materialize, you must:
- Set up your Postgres database to allow logical replication.
- Ensure that the user for your Materialize connection has
- Create a Postgres publication, or replication data set, containing the tables to be streamed to Materialize. Since Postgres sources are materialized (kept in memory) in their entirety, we strongly recommend that you limit publications only to the data you need to query.
Once you create a materialized source from the publication, the source will contain the raw data stream of replication updates. You can then break the stream out into views that represent the publication’s original tables with
CREATE VIEWS. You can treat these tables as you would any other source and create other views or materialized views from them.
CREATE VIEWS will attempt to create each upstream table in the same schema as Postgres. For example, if the publication contains tables
CREATE VIEWS is the equivalent of
CREATE VIEW "public"."foo"; CREATE VIEW "otherschema"."foo";
Therefore, in order for
CREATE VIEWS to succeed, all upstream schemas included in the publication must exist in Materialize as well, or you must explicitly specify the downstream schemas and rename the resulting tables. For example:
CREATE VIEWS FROM "mz_source" ("public"."foo" AS "foo", "otherschema"."foo" AS "foo2");
Postgres replication slots
If you stop or delete Materialize without first dropping the Postgres source, the Postgres replication slot isn’t deleted and will continue to accumulate data. In such cases, you should manually delete the Materialize replication slot to recover memory and avoid degraded performance in the upstream database. Materialize replication slot names always begin with
materialize_ for easy identification.
Restrictions on Postgres sources
Materialize does not support changes to schemas for existing publications. You will need to drop the existing sources and then recreate them after creating new publications for the updated schemas.
Sources can only be created from publications that use data types supported by Materialize. Attempts to create sources from publications which contain unsupported data types will fail with an error.
Materialize does not support TOASTED values except in append-only tables. Practically speaking, you can include rows with TOASTED values as long as they are never updated or deleted, or you can disable TOAST on the original Postgres table. If a TOASTED column is updated, the source will enter an errored state that renders all per-table views inaccessible.
To disable TOAST on a column, use the following command in Postgres:
ALTER TABLE table_name ALTER COLUMN column_name SET STORAGE PLAIN;
Tables replicated into Materialize should not be truncated. If a table is truncated while replicated, the whole source becomes inaccessible and will not produce any data until it is re-created.
Since Postgres sources are materialized by default, Postgres table sources must be smaller than the available memory.
Supported Postgres versions
Postgres sources in Materialize require that upstream Postgres instances be version 10 or greater.
Setting up PostgreSQL
Before you create a Postgres source in Materialize, you must complete the following prerequisite steps in Postgres.
Ensure the database configuration allows logical replication. For most configurations, it should suffice to set
wal_level = logicalin
postgresql.conf, but additional steps may be required for Amazon RDS. See the Amazon Relational Database Service Documentation for details.
Assign the user
ALTER ROLE "user" WITH REPLICATION;
Set replica identity to full for all the tables that you wish to replicate:
ALTER TABLE foo REPLICA IDENTITY FULL;
Create a publication containing all the tables you wish to query in Materialize:
For all tables in Postgres:
CREATE PUBLICATION mz_source FOR ALL TABLES;
For specific tables:
CREATE PUBLICATION mz_source FOR TABLE table1, table2;
Creating a source
Once you have set up Postgres, you can create the source in Materialize.
CREATE MATERIALIZED SOURCE "mz_source" FROM POSTGRES CONNECTION 'host=postgres port=5432 user=host sslmode=require dbname=postgres' PUBLICATION 'mz_source';
This creates a source that…
- Connects to a Postgres server
- Contains raw data from all of the tables that went into the publication
- Needs to broken out into more usable views that reproduce the original Postgres tables
Once you have created the Postgres source, you need to create views that represent the upstream publication’s original tables.
Create views for all tables included in the Postgres publication
CREATE VIEWS FROM SOURCE "mz_source"; SHOW FULL VIEWS;
Create views for specific upstream tables
CREATE VIEWS FROM SOURCE "mz_source" ("a", "b"); SHOW FULL VIEWS;