CREATE SOURCE: PostgreSQL (New Syntax)
To enable this feature in your Materialize region, contact our team.
Creates a new source from PostgreSQL. Materialize
supports creating sources from PostgreSQL version 11+. Once a new source is created, you can CREATE TABLE FROM SOURCE from the source
to create the corresponding tables in Materialize and start the data ingestion
process.
Prerequisites
To create a source from PostgreSQL 11+, you must first:
- Configure upstream PostgreSQL instance
- Set up logical replication.
- Create a publication.
- Create a replication user and password for Materialize to use to connect.
- Configure network security
- Ensure Materialize can connect to your PostgreSQL instance.
- Create a connection to PostgreSQL in Materialize
- The connection setup depends on the network security configuration.
For details, see the PostgreSQL integration guides.
Syntax
To create a source from an external PostgreSQL:
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM POSTGRES CONNECTION <connection_name> (PUBLICATION '<publication_name>')
;
| Parameter | Description |
|---|---|
| IF NOT EXISTS | Optional. If specified, do not throw an error if a source with the same name already exists. Instead, issue a notice and skip the source creation. |
<source_name>
|
The name of the source to create. Names for sources must follow the naming guidelines. |
IN CLUSTER <cluster_name>
|
Optional. The cluster to maintain this source. Otherwise, the source will be created in the active cluster.
💡 Tip: If possible, use a cluster dedicated just for sources. See also
Operational guidelines.
|
<connection_name>
|
The name of the PostgreSQL connection to use for the source. For details
on creating connections, check the A connection is reusable across multiple |
<publication_name>
|
The name of the PostgreSQL publication to associate with the source. For details on creating a publication in your PostgreSQL database, see the integration guides for your PostgreSQL. |
Details
Ingesting data
After a source is created, you can create tables from the source, referencing the tables in the publication, to start ingesting data. You can create multiple tables that reference the same table in the publication.
See CREATE TABLE FROM SOURCE for details.
Handling table schema changes
The use of the CREATE SOURCE with the new CREATE TABLE FROM SOURCE allows for the handling of upstream DDL changes without downtime.
See CREATE TABLE FROM SOURCE for details.
Supported types
With the new syntax, after a PostgreSQL source is created, you CREATE TABLE FROM SOURCE to create a corresponding table in
Matererialize and start ingesting data.
Materialize natively supports the following PostgreSQL types (including the array type for each of the types):
boolbpcharbyteachardatedaterangefloat4float8int2int2vectorint4int4rangeint8int8rangeintervaljsonjsonbnumericnumrangeoidtexttimetimestamptimestamptztsrangetstzrangeuuidvarchar
For more information, including strategies for handling unsupported types,
see CREATE TABLE FROM SOURCE.
Upstream table truncation restrictions
Upstream tables replicated into Materialize should not be truncated. If an
upstream table is truncated while replicated, the whole source becomes
inaccessible and will not produce any data until it is recreated. Instead of
truncating, you can use an unqualified DELETE to remove all rows from the
table:
DELETE FROM t;
For additional considerations, see also CREATE TABLE.
Publication membership
PostgreSQL’s logical replication API does not provide a signal when users remove tables from publications. Because of this, Materialize relies on periodic checks to determine if a table has been removed from a publication, at which time it generates an irrevocable error, preventing any values from being read from the table.
However, it is possible to remove a table from a publication and then re-add it before Materialize notices that the table was removed. In this case, Materialize can no longer provide any consistency guarantees about the data we present from the table and, unfortunately, is wholly unaware that this occurred.
To mitigate this issue, if you need to drop and re-add a table to a publication,
ensure that you remove the table/subsource from the source before re-adding it
using the DROP SOURCE command.
PostgreSQL replication slots
When you define a source, Materialize will automatically create a replication slot in the upstream PostgreSQL database (see PostgreSQL replication slots). Each source ingests the raw replication stream data for all tables in the specified publication using a single replication slot. This allows you to minimize the performance impact on the upstream database as well as reuse the same source across multiple materializations.
The name of the replication slot created by Materialize is prefixed with
materialize_. In Materialize, you can query the
mz_internal.mz_postgres_sources to find the replication slots created:
SELECT id, replication_slot FROM mz_internal.mz_postgres_sources;
id | replication_slot
---------+----------------------------------------------
u8 | materialize_7f8a72d0bf2a4b6e9ebc4e61ba769b71
- For PostgreSQL 13+, set a reasonable value for
max_slot_wal_keep_sizeto limit the amount of storage used by replication slots.
-
If you stop using Materialize, or if either the Materialize instance or the PostgreSQL instance crash, delete any replication slots. You can query the
mz_internal.mz_postgres_sourcestable to look up the name of the replication slot created for each source. -
If you delete all objects that depend on a source without also dropping the source, the upstream replication slot remains and will continue to accumulate data so that the source can resume in the future. To avoid unbounded disk space usage, make sure to use
DROP SOURCEor manually delete the replication slot.
Examples
Prerequisites
To create a source from PostgreSQL 11+, you must first:
- Configure upstream PostgreSQL instance
- Set up logical replication.
- Create a publication.
- Create a replication user and password for Materialize to use to connect.
- Configure network security
- Ensure Materialize can connect to your PostgreSQL instance.
- Create a connection to PostgreSQL in Materialize
- The connection setup depends on the network security configuration.
For details, see the PostgreSQL integration guides.
Create a source
Once you have configured the upstream PostgreSQL, network security, and
created the connection, you can create the source. In this example, the
PostgreSQL publication is mz_source and the connection to PostgreSQL is
pg_connection.
/* This example assumes:
- In the upstream PostgreSQL, you have defined:
- replication user and password with the appropriate access.
- a publication named `mz_source` for the `public.items` and `public.orders` tables.
- In Materialize:
- You have created a secret for the PostgreSQL password.
- You have defined the connection to the upstream PostgreSQL.
- You have used the connection to create a source.
For example (substitute with your configuration):
CREATE SECRET pgpass AS '<replication user password>'; -- substitute
CREATE CONNECTION pg_connection TO POSTGRES (
HOST '<hostname>', -- substitute
DATABASE <db>, -- substitute
USER <replication user>, -- substitute
PASSWORD SECRET pgpass
-- [, <network security configuration> ]
);
*/
CREATE SOURCE pg_source
FROM POSTGRES CONNECTION pg_connection (
PUBLICATION 'mz_source'
);
After a source is created, you can create tables from the source,
referencing specific table(s). For example, the following creates a table in Materialize from the upstream table public.items.
CREATE TABLE items
FROM SOURCE pg_source(REFERENCE public.items)
;
- Although the example creates the table with the same name as the upstream table, the table in Materialize can have a different name.
- You can create multiple tables that reference the same upstream table.
For more information, see CREATE TABLE.