CREATE SOURCE: SQL Server
To enable this feature in your Materialize region, contact our team.
CREATE SOURCE
connects Materialize to an external system you want to read data from, and provides details about how to decode and interpret that data.
Materialize supports SQL Server (2016+) as a real-time data source. To connect to a
SQL Server database, you first need to tweak its configuration to enable Change Data
Capture
and SNAPSHOT
transaction isolation
for the database that you would like to replicate. Then create a connection
in Materialize that specifies access and authentication parameters.
Syntax
with_options
Field | Use |
---|---|
src_name | The name for the source. |
IF NOT EXISTS | Do nothing (except issuing a notice) if a source with the same name already exists. Default. |
IN CLUSTER cluster_name | The cluster to maintain this source. |
CONNECTION connection_name | The name of the SQL Server connection to use in the source. For details on creating connections, check the CREATE CONNECTION documentation page. |
FOR ALL TABLES | Create subsources for all tables with CDC enabled in all schemas upstream. |
FOR TABLES ( table_list ) | Create subsources for specific tables upstream. Requires fully-qualified table names (<schema>.<table> ). |
RETAIN HISTORY FOR retention_period |
Private preview. This option has known performance or stability issues and is under active development. Duration for which Materialize retains historical data, which is useful to implement durable subscriptions. Accepts positive interval values (e.g. '1hr' ). Default: 1s . |
Creating a source
Materialize ingests the CDC stream for all (or a specific set of) tables in your upstream SQL Server database that have Change Data Capture enabled.
CREATE SOURCE mz_source
FROM SQL SERVER CONNECTION sql_server_connection
FOR ALL TABLES;
When you define a source, Materialize will automatically:
-
Create a subsource for each capture instance upstream, and perform an initial, snapshot-based sync of the associated tables before it starts ingesting change events.
SHOW SOURCES;
name | type | cluster | ----------------------+------------+------------ mz_source | sql-server | mz_source_progress | progress | table_1 | subsource | table_2 | subsource |
-
Incrementally update any materialized or indexed views that depend on the source as change events stream in, as a result of
INSERT
,UPDATE
andDELETE
operations in the upstream SQL Server database.
It’s important to note that the schema metadata is captured when the source is initially created, and is validated against the upstream schema upon restart. If you create new tables upstream after creating a SQL Server source and want to replicate them to Materialize, the source must be dropped and recreated.
SQL Server schemas
CREATE SOURCE
will attempt to create each upstream table in the same schema as
the source. This may lead to naming collisions if, for example, you are
replicating schema1.table_1
and schema2.table_1
. Use the FOR TABLES
clause to provide aliases for each upstream table, in such cases, or to specify
an alternative destination schema in Materialize.
CREATE SOURCE mz_source
FROM SQL SERVER CONNECTION sql_server_connection
FOR TABLES (schema1.table_1 AS s1_table_1, schema2.table_1 AS s2_table_1);
Monitoring source progress
By default, SQL Server sources expose progress metadata as a subsource that you
can use to monitor source ingestion progress. The name of the progress
subsource can be specified when creating a source using the EXPOSE PROGRESS AS
clause; otherwise, it will be named <src_name>_progress
.
The following metadata is available for each source as a progress subsource:
Field | Type | Details |
---|---|---|
lsn |
bytea |
The upper-bound Log Sequence Number replicated thus far into Materialize. |
And can be queried using:
SELECT lsn
FROM <src_name>_progress;
The reported lsn
should increase as Materialize consumes new CDC events
from the upstream SQL Server database. For more details on monitoring source
ingestion progress and debugging related issues, see Troubleshooting.
Known limitations
Schema changes
Materialize supports schema changes in the upstream database as follows:
Compatible schema changes
-
Adding columns to tables. Materialize will not ingest new columns added upstream unless you use
DROP SOURCE
to first drop the affected subsource, and then add the table back to the source usingALTER SOURCE...ADD SUBSOURCE
. -
Dropping columns that were added after the source was created. These columns are never ingested, so you can drop them without issue.
-
Adding or removing
NOT NULL
constraints to tables that were nullable when the source was created.
Incompatible schema changes
All other schema changes to upstream tables will set the corresponding subsource into an error state, which prevents you from reading from the source.
To handle incompatible schema changes, use DROP SOURCE
and ALTER SOURCE...ADD SUBSOURCE
to first drop the
affected subsource, and then add the table back to the source. When you add the
subsource, it will have the updated schema from the corresponding upstream
table.
Supported types
Materialize natively supports the following SQL Server types:
tinyint
smallint
int
bigint
real
double
bit
decimal
numeric
money
smallmoney
char
nchar
varchar
nvarchar
sysname
binary
varbinary
json
date
time
smalldatetime
datetime
datetime2
datetimeoffset
uniqueidentifier
Replicating tables that contain unsupported data types is possible via the EXCLUDE COLUMNS
option for the
following types:
text
ntext
image
varchar(max)
nvarchar(max)
varbinary(max)
Columns with the specified types need to be excluded because SQL Server does not provide the “before” value when said column is updated.
Timestamp Rounding
The time
, datetime2
, and datetimeoffset
types in SQL Server have a default
scale of 7 decimal places, or in other words a accuracy of 100 nanoseconds. But
the corresponding types in Materialize only support a scale of 6 decimal places.
If a column in SQL Server has a higher scale than what Materialize can support, it
will be rounded up to the largest scale possible.
-- In SQL Server
CREATE TABLE my_timestamps (a datetime2(7));
INSERT INTO my_timestamps VALUES
('2000-12-31 23:59:59.99999'),
('2000-12-31 23:59:59.999999'),
('2000-12-31 23:59:59.9999999');
-- Replicated into Materialize
SELECT * FROM my_timestamps;
'2000-12-31 23:59:59.999990'
'2000-12-31 23:59:59.999999'
'2001-01-01 00:00:00'
Examples
SNAPSHOT
transaction isolation in the upstream database.
Creating a connection
A connection describes how to connect and authenticate to an external system you want Materialize to read data from.
Once created, a connection is reusable across multiple CREATE SOURCE
statements. For more details on creating connections, check the
CREATE CONNECTION
documentation page.
CREATE SECRET sqlserver_pass AS '<SQL_SERVER_PASSWORD>';
CREATE CONNECTION sqlserver_connection TO SQL SERVER (
HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
PORT 1433,
USER 'materialize',
PASSWORD SECRET sqlserver_pass
);
If your SQL Server instance is not exposed to the public internet, you can tunnel the connection through and SSH bastion host.
CREATE CONNECTION ssh_connection TO SSH TUNNEL (
HOST 'bastion-host',
PORT 22,
USER 'materialize'
);
CREATE CONNECTION sqlserver_connection TO SQL SERVER (
HOST 'instance.foo000.us-west-1.rds.amazonaws.com',
SSH TUNNEL ssh_connection
);
For step-by-step instructions on creating SSH tunnel connections and configuring an SSH bastion server to accept connections from Materialize, check this guide.
Creating a source
Once Change Data Capture and SNAPSHOT
transaction isolation are enabled for
the database in your SQL Server instance, you also need to enable Change Data
capture for the specific tables you want to replicate.
-- In SQL Server.
EXEC sys.sp_cdc_enable_table
@source_schema = '<SCHEMA_NAME>',
@source_name = '<TABLE_NAME>',
@role_name = '<ROLE_FROM_MZ_CONNECTION>',
@supports_net_changes = 0;
Once CDC is enabled for all of the relevant tables, you can create a SOURCE
in
Materialize to begin replicating data!
Create subsources for all tables in SQL Server
CREATE SOURCE mz_source
FROM SQL SERVER CONNECTION sqlserver_connection
FOR ALL TABLES;
Create subsources for specific tables in SQL Server
CREATE SOURCE mz_source
FROM SQL SERVER CONNECTION sqlserver_connection
FOR TABLES (mydb.table_1, mydb.table_2 AS alias_table_2);
Handling unsupported types
If you’re replicating tables that use data types unsupported
by SQL Server’s CDC feature, use the EXCLUDE COLUMNS
option to exclude them from
replication. This option expects the upstream fully-qualified names of the
replicated table and column (i.e. as defined in your SQL Server database).
CREATE SOURCE mz_source
FROM SQL SERVER CONNECTION sqlserver_connection (
EXCLUDE COLUMNS (mydb.table_1.column_of_unsupported_type)
)
FOR ALL TABLES;
Handling errors and schema changes
To handle upstream schema changes or errored subsources, use
the DROP SOURCE
syntax to drop the affected
subsource, and then ALTER SOURCE...ADD SUBSOURCE
to add
the subsource back to the source.
-- List all subsources in mz_source
SHOW SUBSOURCES ON mz_source;
-- Get rid of an outdated or errored subsource
DROP SOURCE table_1;
-- Start ingesting the table with the updated schema or fix
ALTER SOURCE mz_source ADD SUBSOURCE table_1;