CREATE SINK
A sink describes an external system you want Materialize to write data to, and provides details about how to encode that data. To create a sink, you must specify a connector, a format and an envelope.
Connectors
Materialize bundles native sink connectors for the following external systems:
For details on the syntax, supported formats and features of each connector,
check out the dedicated CREATE SINK
documentation pages.
Formats
To write to an external data sink, you must specify the format Materialize
should use to encode ouput data. This is handled by specifying a FORMAT
in
the CREATE SINK
statement.
Avro
Syntax: FORMAT AVRO
Materialize can encode output data as Avro messages, and automatically publish a schema to a schema registry based on the columns and data types in the source, table or materialized view you want to send to the sink.
JSON
Syntax: FORMAT JSON
Materialize can encode output data as JSON messages. Publishing schemas to a schema registry is not supported yet for JSON-formatted sinks (#7186).
Envelopes
In addition to determining how to encode output data, Materialize also needs to
understand the desired behaviour of sinked events in the downstream external
system. Whether a change is simply emitted as an event with a diff structure,
or actively inserts, updates, or deletes existing data downstream depends on
the ENVELOPE
specified in the CREATE SINK
statement.
Upsert envelope
Syntax: ENVELOPE UPSERT
The upsert envelope treats all records as having a key and a value, and supports inserts, updates and deletes in the sink:
-
If the key does not match a preexisting record downstream, it inserts the record’s key and value.
-
If the key matches a preexisting record downstream and the value is non-null, Materialize updates the existing record with the new value.
-
If the key matches a preexisting record downstream and the value is null, Materialize deletes the record.
Debezium envelope
Syntax: ENVELOPE DEBEZIUM
Materialize provides a dedicated envelope that describes the decoded records' old and new values with a Debezium-like diff structure, representing inserts, updates, or deletes to the underlying source, table or materialized view being written to the sink:
-
If the
before
field is null, the record represents an insert. -
If the
before
andafter
fields are non-null, the record represents an update. -
If the
after
field is null, the record represents a delete.
Best practices
Sizing a sink
Some sinks require relatively few resources to handle data ingestion, while
others are high traffic and require hefty resource allocations. You choose the
amount of CPU and memory available to a sink using the SIZE
option, and
adjust the provisioned size after sink creation using the ALTER SINK
command.
Sinks that specify the SIZE
option are linked to a single-purpose cluster
dedicated to maintaining that sink.
You can also choose to place a sink in an existing
cluster by using the IN CLUSTER
option.
Sinks in a cluster share the resource allocation of the cluster with all other
objects in the cluster.
Colocating multiple sinks onto the same cluster can be more resource efficient when you have many low-traffic sinks that occasionally need some burst capacity.