PostgreSQL
Change Data Capture (CDC)
Materialize supports PostgreSQL as a real-time data source. The PostgreSQL source uses PostgreSQL’s replication protocol to continually ingest changes resulting from CRUD operations in the upstream database. The native support for PostgreSQL Change Data Capture (CDC) in Materialize gives you the following benefits:
-
No additional infrastructure: Ingest PostgreSQL change data into Materialize in real-time with no architectural changes or additional operational overhead. In particular, you do not need to deploy Kafka and Debezium for PostgreSQL CDC.
-
Transactional consistency: The PostgreSQL source ensures that transactions in the upstream PostgreSQL database are respected downstream. Materialize will never show partial results based on partially replicated transactions.
-
Incrementally updated materialized views: Materialized views in PostgreSQL are computationally expensive and require manual refreshes. You can use Materialize as a read-replica to build views on top of your PostgreSQL data that are efficiently maintained and always up-to-date.
Supported versions and services
The PostgreSQL source requires PostgreSQL 11+ and is compatible with most common PostgreSQL hosted services.
Integration guides |
---|
If there is a hosted service or PostgreSQL distribution that is not listed above but you would like to use with Materialize, please submit a feature request or reach out in the Materialize Community Slack.
Considerations
Schema changes
Materialize supports schema changes in the upstream database as follows:
Compatible schema changes
-
Adding columns to tables. Materialize will not ingest new columns added upstream unless you use
DROP SOURCE
to first drop the affected subsource, and then add the table back to the source usingALTER SOURCE...ADD SUBSOURCE
. -
Dropping columns that were added after the source was created. These columns are never ingested, so you can drop them without issue.
-
Adding or removing
NOT NULL
constraints to tables that were nullable when the source was created.
Incompatible schema changes
All other schema changes to upstream tables will set the corresponding subsource into an error state, which prevents you from reading from the source.
To handle incompatible schema changes, use DROP SOURCE
and ALTER SOURCE...ADD SUBSOURCE
to first drop the
affected subsource, and then add the table back to the source. When you add the
subsource, it will have the updated schema from the corresponding upstream
table.
Publication membership
PostgreSQL’s logical replication API does not provide a signal when users remove tables from publications. Because of this, Materialize relies on periodic checks to determine if a table has been removed from a publication, at which time it generates an irrevocable error, preventing any values from being read from the table.
However, it is possible to remove a table from a publication and then re-add it before Materialize notices that the table was removed. In this case, Materialize can no longer provide any consistency guarantees about the data we present from the table and, unfortunately, is wholly unaware that this occurred.
To mitigate this issue, if you need to drop and re-add a table to a publication,
ensure that you remove the table/subsource from the source before re-adding it
using the DROP SOURCE
command.
Supported types
Materialize natively supports the following PostgreSQL types (including the array type for each of the types):
bool
bpchar
bytea
char
date
daterange
float4
float8
int2
int2vector
int4
int4range
int8
int8range
interval
json
jsonb
numeric
numrange
oid
text
time
timestamp
timestamptz
tsrange
tstzrange
uuid
varchar
Replicating tables that contain unsupported data types is
possible via the TEXT COLUMNS
option. The specified columns will be treated
as text
, and will thus not offer the expected PostgreSQL type features. For
example:
-
enum
: the implicit ordering of the original PostgreSQLenum
type is not preserved, as Materialize will sort values astext
. -
money
: the resultingtext
value cannot be cast back to e.g.numeric
, since PostgreSQL adds typical currency formatting to the output.
Truncation
Upstream tables replicated into Materialize should not be truncated. If an
upstream table is truncated while replicated, the whole source becomes
inaccessible and will not produce any data until it is recreated. Instead of
truncating, you can use an unqualified DELETE
to remove all rows from the
table:
DELETE FROM t;
Inherited tables
When using PostgreSQL table inheritance,
PostgreSQL serves data from SELECT
s as if the inheriting tables’ data is also
present in the inherited table. However, both PostgreSQL’s logical replication
and COPY
only present data written to the tables themselves, i.e. the
inheriting data is not treated as part of the inherited table.
PostgreSQL sources use logical replication and COPY
to ingest table data, so
inheriting tables’ data will only be ingested as part of the inheriting table,
i.e. in Materialize, the data will not be returned when serving SELECT
s from
the inherited table.
You can mimic PostgreSQL’s SELECT
behavior with inherited tables by creating a
materialized view that unions data from the inherited and inheriting tables
(using UNION ALL
). However, if new tables inherit from the table, data from
the inheriting tables will not be available in the view. You will need to add
the inheriting tables via ADD SUBSOURCE
and create a new view (materialized or
non-) that unions the new table.