CREATE SOURCE: PostgreSQL (New Syntax)

PREVIEW This feature is in private preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.
To enable this feature in your Materialize region, contact our team.
Disambiguation
This page reflects the new syntax which allows Materialize to handle upstream DDL changes, specifically adding or dropping columns, without downtime. For the deprecated syntax, see the old reference page.

Creates a new source from PostgreSQL. Materialize supports creating sources from PostgreSQL version 11+. Once a new source is created, you can CREATE TABLE FROM SOURCE from the source to create the corresponding tables in Materialize and start the data ingestion process.

Prerequisites

To create a source from PostgreSQL 11+, you must first:

  • Configure upstream PostgreSQL instance
    • Set up logical replication.
    • Create a publication.
    • Create a replication user and password for Materialize to use to connect.
  • Configure network security
    • Ensure Materialize can connect to your PostgreSQL instance.
  • Create a connection to PostgreSQL in Materialize
    • The connection setup depends on the network security configuration.

For details, see the PostgreSQL integration guides.

Syntax

To create a source from an external PostgreSQL:

CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM POSTGRES CONNECTION <connection_name> (PUBLICATION '<publication_name>')
;
Parameter Description
IF NOT EXISTS Optional. If specified, do not throw an error if a source with the same name already exists. Instead, issue a notice and skip the source creation.
<source_name> The name of the source to create. Names for sources must follow the naming guidelines.
IN CLUSTER <cluster_name>

Optional. The cluster to maintain this source. Otherwise, the source will be created in the active cluster.

💡 Tip: If possible, use a cluster dedicated just for sources. See also Operational guidelines.
<connection_name>

The name of the PostgreSQL connection to use for the source. For details on creating connections, check the CREATE CONNECTION documentation page.

A connection is reusable across multiple CREATE SOURCE statements.

<publication_name> The name of the PostgreSQL publication to associate with the source. For details on creating a publication in your PostgreSQL database, see the integration guides for your PostgreSQL.

Details

Ingesting data

After a source is created, you can create tables from the source, referencing the tables in the publication, to start ingesting data. You can create multiple tables that reference the same table in the publication.

See CREATE TABLE FROM SOURCE for details.

Handling table schema changes

The use of the CREATE SOURCE with the new CREATE TABLE FROM SOURCE allows for the handling of upstream DDL changes without downtime.

See CREATE TABLE FROM SOURCE for details.

Supported types

With the new syntax, after a PostgreSQL source is created, you CREATE TABLE FROM SOURCE to create a corresponding table in Matererialize and start ingesting data.

Materialize natively supports the following PostgreSQL types (including the array type for each of the types):

  • bool
  • bpchar
  • bytea
  • char
  • date
  • daterange
  • float4
  • float8
  • int2
  • int2vector
  • int4
  • int4range
  • int8
  • int8range
  • interval
  • json
  • jsonb
  • numeric
  • numrange
  • oid
  • text
  • time
  • timestamp
  • timestamptz
  • tsrange
  • tstzrange
  • uuid
  • varchar

For more information, including strategies for handling unsupported types, see CREATE TABLE FROM SOURCE.

Upstream table truncation restrictions

Upstream tables replicated into Materialize should not be truncated. If an upstream table is truncated while replicated, the whole source becomes inaccessible and will not produce any data until it is recreated. Instead of truncating, you can use an unqualified DELETE to remove all rows from the table:

DELETE FROM t;

For additional considerations, see also CREATE TABLE.

Publication membership

PostgreSQL’s logical replication API does not provide a signal when users remove tables from publications. Because of this, Materialize relies on periodic checks to determine if a table has been removed from a publication, at which time it generates an irrevocable error, preventing any values from being read from the table.

However, it is possible to remove a table from a publication and then re-add it before Materialize notices that the table was removed. In this case, Materialize can no longer provide any consistency guarantees about the data we present from the table and, unfortunately, is wholly unaware that this occurred.

To mitigate this issue, if you need to drop and re-add a table to a publication, ensure that you remove the table/subsource from the source before re-adding it using the DROP SOURCE command.

PostgreSQL replication slots

When you define a source, Materialize will automatically create a replication slot in the upstream PostgreSQL database (see PostgreSQL replication slots). Each source ingests the raw replication stream data for all tables in the specified publication using a single replication slot. This allows you to minimize the performance impact on the upstream database as well as reuse the same source across multiple materializations.

The name of the replication slot created by Materialize is prefixed with materialize_. In Materialize, you can query the mz_internal.mz_postgres_sources to find the replication slots created:

SELECT id, replication_slot FROM mz_internal.mz_postgres_sources;
    id   |             replication_slot
---------+----------------------------------------------
  u8     | materialize_7f8a72d0bf2a4b6e9ebc4e61ba769b71
💡 Tip:
  • For PostgreSQL 13+, set a reasonable value for max_slot_wal_keep_size to limit the amount of storage used by replication slots.
  • If you stop using Materialize, or if either the Materialize instance or the PostgreSQL instance crash, delete any replication slots. You can query the mz_internal.mz_postgres_sources table to look up the name of the replication slot created for each source.

  • If you delete all objects that depend on a source without also dropping the source, the upstream replication slot remains and will continue to accumulate data so that the source can resume in the future. To avoid unbounded disk space usage, make sure to use DROP SOURCE or manually delete the replication slot.

Examples

Prerequisites

To create a source from PostgreSQL 11+, you must first:

  • Configure upstream PostgreSQL instance
    • Set up logical replication.
    • Create a publication.
    • Create a replication user and password for Materialize to use to connect.
  • Configure network security
    • Ensure Materialize can connect to your PostgreSQL instance.
  • Create a connection to PostgreSQL in Materialize
    • The connection setup depends on the network security configuration.

For details, see the PostgreSQL integration guides.

Create a source

Once you have configured the upstream PostgreSQL, network security, and created the connection, you can create the source. In this example, the PostgreSQL publication is mz_source and the connection to PostgreSQL is pg_connection.

/* This example assumes:
- In the upstream PostgreSQL, you have defined:
  - replication user and password with the appropriate access.
  - a publication named `mz_source` for the `public.items` and `public.orders` tables.
- In Materialize:
  - You have created a secret for the PostgreSQL password.
  - You have defined the connection to the upstream PostgreSQL.
  - You have used the connection to create a source.

  For example (substitute with your configuration):
    CREATE SECRET pgpass AS '<replication user password>'; -- substitute
    CREATE CONNECTION pg_connection TO POSTGRES (
      HOST '<hostname>',          -- substitute
      DATABASE <db>,              -- substitute
      USER <replication user>,    -- substitute
      PASSWORD SECRET pgpass
      -- [, <network security configuration> ]
    );
*/

CREATE SOURCE pg_source
FROM POSTGRES CONNECTION pg_connection (
  PUBLICATION 'mz_source'
);

After a source is created, you can create tables from the source, referencing specific table(s). For example, the following creates a table in Materialize from the upstream table public.items.

CREATE TABLE items
FROM SOURCE pg_source(REFERENCE public.items)
;
NOTE:
  • Although the example creates the table with the same name as the upstream table, the table in Materialize can have a different name.
  • You can create multiple tables that reference the same upstream table.

For more information, see CREATE TABLE.

Back to top ↑