CREATE SOURCE: MySQL (New Syntax)
View as MarkdownCreates a new source from MySQL. Materialize
supports creating sources from MySQL version 8.0.1+. Once a new source is created, you can CREATE TABLE FROM SOURCE
to create the corresponding tables in Materialize and start the data ingestion
process.
Prerequisites
To create a source from MySQL(8.0.1+), you must first:
- Configure upstream MySQL instance
- Enable GTID-based binary log(binlog)
replication. You must set
binlog_row_metadata=FULLto use the newCREATE SOURCEsyntax. - Create a replication user and password for Materialize to use to connect.
- Enable GTID-based binary log(binlog)
replication. You must set
- Configure network security
- Ensure Materialize can connect to your MySQL instance.
- Create a connection to MySQL in Materialize
- The connection setup depends on the network security configuration.
Syntax
To create a source from an external MySQL database:
CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM MYSQL CONNECTION <connection_name>
;
| Syntax element | Description |
|---|---|
| IF NOT EXISTS | Optional. If specified, do not throw an error if a source with the same name already exists. Instead, issue a notice and skip the source creation. |
<source_name>
|
The name of the source to create. Names for sources must follow the naming guidelines. |
IN CLUSTER <cluster_name>
|
Optional. The cluster to maintain this source. Otherwise, the source will be created in the active cluster.
💡 Tip: If possible, use a cluster dedicated just for sources. See also
Operational guidelines.
|
<connection_name>
|
The name of the MySQL connection to use for the source. For details
on creating connections, see A connection is reusable across multiple To start ingesting data, create a |
Ingesting data
After a source is created, you can create tables from the source referencing
upstream MySQL tables that have GTID-based binlog replication
enabled (Note: binlog_row_metadata=FULL is required to
use the new syntax). You can create multiple tables that reference the same
upstream table. See CREATE TABLE FROM SOURCE for
details.
Handling table schema changes
The use of CREATE SOURCE with the new CREATE TABLE FROM SOURCE allows for the handling of certain upstream DDL
changes, specifically adding or dropping columns in the upstream tables, without
downtime.
See Guide: Handle upstream schema changes for details.
Supported types
With the new syntax, after a MySQL source is created, you CREATE TABLE FROM SOURCE to create a corresponding table in Materialize and
start ingesting data.
Materialize natively supports the following MySQL types:
bigintbinarybitblobbooleanchardatedatetimedecimaldoublefloatintjsonlongbloblongtextmediumblobmediumintmediumtextnumericrealsmallinttexttimetimestamptinyblobtinyinttinytextvarbinaryvarchar
When replicating tables that contain the unsupported data types, you can:
-
Use
TEXT COLUMNSoption for the following unsupported MySQL types:enumyear
The specified columns will be treated as
textand will not offer the expected MySQL type features. -
Use the
EXCLUDE COLUMNSoption to exclude any columns that contain unsupported data types.
For more information, including strategies for handling unsupported types,
see CREATE TABLE FROM SOURCE.
Upstream table truncation restrictions
Avoid truncating upstream tables that are being replicated into Materialize. If a replicated upstream table is truncated, the corresponding subsource in Materialize becomes inaccessible and will not produce any data until it is recreated.
Instead of truncating, use an unqualified DELETE to remove all rows from
the upstream table:
DELETE FROM t;
For additional considerations, see also CREATE TABLE.
Change data capture
For step-by-step instructions on enabling GTID-based binlog replication for your MySQL service, see the integration guides:
The source uses MySQL’s binlog replication protocol to continually ingest
changes resulting from INSERT, UPDATE and DELETE operations in the
upstream database. This process is known as change data capture.
The replication method used is based on global transaction identifiers (GTIDs), and guarantees transactional consistency — any operation inside a MySQL transaction is assigned the same timestamp in Materialize, which means that the source will never show partial results based on partially replicated transactions.
Before creating a source in Materialize, you must configure the upstream MySQL database for GTID-based binlog replication:
| MySQL Configuration | Value | Notes |
|---|---|---|
log_bin
|
ON
|
|
binlog_row_image
|
FULL
|
|
binlog_row_metadata
|
FULL
|
|
binlog_format
|
ROW
|
Deprecated as of MySQL 8.0.34. Newer versions of MySQL default to row-based logging. |
gtid_mode
|
ON
|
|
enforce_gtid_consistency
|
ON
|
|
replica_preserve_commit_order
|
ON
|
Only required when connecting Materialize to a read-replica. |
binlog_row_metadata, using SET GLOBAL binlog_row_metadata = FULL; does
not persist across MySQL server restarts. To make
the setting durable, use SET PERSIST (MySQL 8.0.11+) or set
binlog_row_metadata=FULL in the server’s configuration file. On managed
services, set the variable through the service’s parameter configuration
instead.
If you’re running MySQL using a managed service, additional configuration changes might be required. To enable GTID-based binlog replication for your MySQL service, see the integration guides.
Binlog retention
By default, MySQL retains binlog files for 30 days (i.e., 2592000 seconds)
before automatically removing them. This is configurable via the
binlog_expire_logs_seconds
system variable. We recommend using the default value for this configuration in
order to not compromise Materialize’s ability to resume replication in case of
failures or restarts.
In some MySQL managed services, binlog expiration can be overridden by a service-specific configuration parameter. It’s important that you double-check if such a configuration exists, and ensure it’s set to the maximum interval available.
As an example, Amazon RDS for MySQL has its
own configuration parameter for binlog retention (binlog retention hours)
that overrides binlog_expire_logs_seconds and is set to NULL by default.
Monitoring source progress
By default, MySQL sources expose progress metadata as a subsource that you can
use to monitor source ingestion progress. The name of the progress subsource
can be specified when creating a source using the EXPOSE PROGRESS AS clause;
otherwise, it will be named <src_name>_progress.
The following metadata is available for each source as a progress subsource:
| Field | Type | Details |
|---|---|---|
source_id_lower |
uuid |
The lower-bound GTID source_id of the GTIDs covered by this range. |
source_id_upper |
uuid |
The upper-bound GTID source_id of the GTIDs covered by this range. |
transaction_id |
uint8 |
The transaction_id of the next GTID possible from the GTID source_ids covered by this range. |
And can be queried using:
SELECT transaction_id
FROM <src_name>_progress;
Progress metadata is represented as a GTID set
of future possible GTIDs, which is similar to the
gtid_executed
system variable on a MySQL replica. The reported transaction_id should
increase as Materialize consumes new binlog records from the upstream MySQL
database. For more information, see Troubleshooting.
Example
binlog_row_metadata=FULL to use the new syntax.
Prerequisites
To create a source from MySQL(8.0.1+), you must first:
- Configure upstream MySQL instance
- Enable GTID-based binary log(binlog)
replication. You must set
binlog_row_metadata=FULLto use the newCREATE SOURCEsyntax. - Create a replication user and password for Materialize to use to connect.
- Enable GTID-based binary log(binlog)
replication. You must set
- Configure network security
- Ensure Materialize can connect to your MySQL instance.
- Create a connection to MySQL in Materialize
- The connection setup depends on the network security configuration.
For details, see the MySQL integration guides.
Create a source
Once you have configured the upstream MySQL, network security, and created
the connection to MySQL, you can create
the source. In this example, assume the connection you created is named
mysql_connection.
CREATE SOURCE mysql_source
FROM MYSQL CONNECTION mysql_connection;
After a source is created, you can create a table from the source, referencing specific upstream table(s). Use a DDL transaction block to create multiple tables from the same source.
BEGIN;
CREATE TABLE items
FROM SOURCE mysql_source (REFERENCE mydb.items);
CREATE TABLE orders
FROM SOURCE mysql_source (REFERENCE mydb.orders);
COMMIT;