PostgreSQL

Change Data Capture (CDC)

Materialize supports PostgreSQL as a real-time data source. The PostgreSQL source uses PostgreSQL’s replication protocol to continually ingest changes resulting from CRUD operations in the upstream database. The native support for PostgreSQL Change Data Capture (CDC) in Materialize gives you the following benefits:

  • No additional infrastructure: Ingest PostgreSQL change data into Materialize in real-time with no architectural changes or additional operational overhead. In particular, you do not need to deploy Kafka and Debezium for PostgreSQL CDC.

  • Transactional consistency: The PostgreSQL source ensures that transactions in the upstream PostgreSQL database are respected downstream. Materialize will never show partial results based on partially replicated transactions.

  • Incrementally updated materialized views: Materialized views in PostgreSQL are computationally expensive and require manual refreshes. You can use Materialize as a read-replica to build views on top of your PostgreSQL data that are efficiently maintained and always up-to-date.

Supported versions and services

NOTE: PostgreSQL-compatible database systems are not guaranteed to work with the PostgreSQL source out-of-the-box. Yugabyte is currently supported with limitations.

The PostgreSQL source requires PostgreSQL 11+ and is compatible with most common PostgreSQL hosted services.

Integration guides

If there is a hosted service or PostgreSQL distribution that is not listed above but you would like to use with Materialize, please submit a feature request or reach out in the Materialize Community Slack.

Considerations

Schema changes

NOTE: Work to more smoothly support ddl changes to upstream tables is currently in progress. The work introduces the ability to re-ingest the same upstream table under a new schema and switch over without downtime.

Materialize supports schema changes in the upstream database as follows:

Compatible schema changes

  • Adding columns to tables. Materialize will not ingest new columns added upstream unless you use DROP SOURCE to first drop the affected subsource, and then add the table back to the source using ALTER SOURCE...ADD SUBSOURCE.

  • Dropping columns that were added after the source was created. These columns are never ingested, so you can drop them without issue.

  • Adding or removing NOT NULL constraints to tables that were nullable when the source was created.

Incompatible schema changes

All other schema changes to upstream tables will set the corresponding subsource into an error state, which prevents you from reading from the source.

To handle incompatible schema changes, use DROP SOURCE and ALTER SOURCE...ADD SUBSOURCE to first drop the affected subsource, and then add the table back to the source. When you add the subsource, it will have the updated schema from the corresponding upstream table.

Publication membership

PostgreSQL’s logical replication API does not provide a signal when users remove tables from publications. Because of this, Materialize relies on periodic checks to determine if a table has been removed from a publication, at which time it generates an irrevocable error, preventing any values from being read from the table.

However, it is possible to remove a table from a publication and then re-add it before Materialize notices that the table was removed. In this case, Materialize can no longer provide any consistency guarantees about the data we present from the table and, unfortunately, is wholly unaware that this occurred.

To mitigate this issue, if you need to drop and re-add a table to a publication, ensure that you remove the table/subsource from the source before re-adding it using the DROP SOURCE command.

Supported types

Materialize natively supports the following PostgreSQL types (including the array type for each of the types):

  • bool
  • bpchar
  • bytea
  • char
  • date
  • daterange
  • float4
  • float8
  • int2
  • int2vector
  • int4
  • int4range
  • int8
  • int8range
  • interval
  • json
  • jsonb
  • numeric
  • numrange
  • oid
  • text
  • time
  • timestamp
  • timestamptz
  • tsrange
  • tstzrange
  • uuid
  • varchar

Replicating tables that contain unsupported data types is possible via the TEXT COLUMNS option. The specified columns will be treated as text, and will thus not offer the expected PostgreSQL type features. For example:

  • enum: the implicit ordering of the original PostgreSQL enum type is not preserved, as Materialize will sort values as text.

  • money: the resulting text value cannot be cast back to e.g. numeric, since PostgreSQL adds typical currency formatting to the output.

Truncation

Upstream tables replicated into Materialize should not be truncated. If an upstream table is truncated while replicated, the whole source becomes inaccessible and will not produce any data until it is recreated. Instead of truncating, you can use an unqualified DELETE to remove all rows from the table:

DELETE FROM t;

Inherited tables

When using PostgreSQL table inheritance, PostgreSQL serves data from SELECTs as if the inheriting tables’ data is also present in the inherited table. However, both PostgreSQL’s logical replication and COPY only present data written to the tables themselves, i.e. the inheriting data is not treated as part of the inherited table.

PostgreSQL sources use logical replication and COPY to ingest table data, so inheriting tables’ data will only be ingested as part of the inheriting table, i.e. in Materialize, the data will not be returned when serving SELECTs from the inherited table.

You can mimic PostgreSQL’s SELECT behavior with inherited tables by creating a materialized view that unions data from the inherited and inheriting tables (using UNION ALL). However, if new tables inherit from the table, data from the inheriting tables will not be available in the view. You will need to add the inheriting tables via ADD SUBSOURCE and create a new view (materialized or non-) that unions the new table.

Back to top ↑