CREATE SOURCE: MySQL (New Syntax)

View as Markdown
PREVIEW This feature is in public preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.
Disambiguation
This page reflects the new syntax which allows Materialize to handle upstream DDL changes, specifically adding or dropping columns, without downtime. For the deprecated syntax, see the old reference page.

Creates a new source from MySQL. Materialize supports creating sources from MySQL version 8.0.1+. Once a new source is created, you can CREATE TABLE FROM SOURCE to create the corresponding tables in Materialize and start the data ingestion process.

Prerequisites

To create a source from MySQL(8.0.1+), you must first:

  • Configure upstream MySQL instance
  • Configure network security
    • Ensure Materialize can connect to your MySQL instance.
  • Create a connection to MySQL in Materialize

Syntax

To create a source from an external MySQL database:

CREATE SOURCE [IF NOT EXISTS] <source_name>
[IN CLUSTER <cluster_name>]
FROM MYSQL CONNECTION <connection_name>
;
Syntax element Description
IF NOT EXISTS Optional. If specified, do not throw an error if a source with the same name already exists. Instead, issue a notice and skip the source creation.
<source_name> The name of the source to create. Names for sources must follow the naming guidelines.
IN CLUSTER <cluster_name>

Optional. The cluster to maintain this source. Otherwise, the source will be created in the active cluster.

💡 Tip: If possible, use a cluster dedicated just for sources. See also Operational guidelines.
<connection_name>

The name of the MySQL connection to use for the source. For details on creating connections, see CREATE CONNECTION.

A connection is reusable across multiple CREATE SOURCE statements.

To start ingesting data, create a CREATE TABLE FROM SOURCE statement for each upstream table to replicate.

Ingesting data

After a source is created, you can create tables from the source referencing upstream MySQL tables that have GTID-based binlog replication enabled (Note: binlog_row_metadata=FULL is required to use the new syntax). You can create multiple tables that reference the same upstream table. See CREATE TABLE FROM SOURCE for details.

Handling table schema changes

The use of CREATE SOURCE with the new CREATE TABLE FROM SOURCE allows for the handling of certain upstream DDL changes, specifically adding or dropping columns in the upstream tables, without downtime.

See Guide: Handle upstream schema changes for details.

Supported types

With the new syntax, after a MySQL source is created, you CREATE TABLE FROM SOURCE to create a corresponding table in Materialize and start ingesting data.

Materialize natively supports the following MySQL types:

  • bigint
  • binary
  • bit
  • blob
  • boolean
  • char
  • date
  • datetime
  • decimal
  • double
  • float
  • int
  • json
  • longblob
  • longtext
  • mediumblob
  • mediumint
  • mediumtext
  • numeric
  • real
  • smallint
  • text
  • time
  • timestamp
  • tinyblob
  • tinyint
  • tinytext
  • varbinary
  • varchar

When replicating tables that contain the unsupported data types, you can:

  • Use TEXT COLUMNS option for the following unsupported MySQL types:

    • enum
    • year

    The specified columns will be treated as text and will not offer the expected MySQL type features.

  • Use the EXCLUDE COLUMNS option to exclude any columns that contain unsupported data types.

For more information, including strategies for handling unsupported types, see CREATE TABLE FROM SOURCE.

Upstream table truncation restrictions

Avoid truncating upstream tables that are being replicated into Materialize. If a replicated upstream table is truncated, the corresponding subsource in Materialize becomes inaccessible and will not produce any data until it is recreated.

Instead of truncating, use an unqualified DELETE to remove all rows from the upstream table:

DELETE FROM t;

For additional considerations, see also CREATE TABLE.

Change data capture

NOTE:

For step-by-step instructions on enabling GTID-based binlog replication for your MySQL service, see the integration guides:

The source uses MySQL’s binlog replication protocol to continually ingest changes resulting from INSERT, UPDATE and DELETE operations in the upstream database. This process is known as change data capture.

The replication method used is based on global transaction identifiers (GTIDs), and guarantees transactional consistency — any operation inside a MySQL transaction is assigned the same timestamp in Materialize, which means that the source will never show partial results based on partially replicated transactions.

Before creating a source in Materialize, you must configure the upstream MySQL database for GTID-based binlog replication:

MySQL Configuration Value Notes
log_bin ON
binlog_row_image FULL
binlog_row_metadata FULL
binlog_format ROW Deprecated as of MySQL 8.0.34. Newer versions of MySQL default to row-based logging.
gtid_mode ON
enforce_gtid_consistency ON
replica_preserve_commit_order ON Only required when connecting Materialize to a read-replica.
💡 Tip: For binlog_row_metadata, using SET GLOBAL binlog_row_metadata = FULL; does not persist across MySQL server restarts. To make the setting durable, use SET PERSIST (MySQL 8.0.11+) or set binlog_row_metadata=FULL in the server’s configuration file. On managed services, set the variable through the service’s parameter configuration instead.

If you’re running MySQL using a managed service, additional configuration changes might be required. To enable GTID-based binlog replication for your MySQL service, see the integration guides.

Binlog retention

WARNING! If Materialize tries to resume replication and finds GTID gaps due to missing binlog files, the source enters an errored state and you have to drop and recreate it.

By default, MySQL retains binlog files for 30 days (i.e., 2592000 seconds) before automatically removing them. This is configurable via the binlog_expire_logs_seconds system variable. We recommend using the default value for this configuration in order to not compromise Materialize’s ability to resume replication in case of failures or restarts.

In some MySQL managed services, binlog expiration can be overridden by a service-specific configuration parameter. It’s important that you double-check if such a configuration exists, and ensure it’s set to the maximum interval available.

As an example, Amazon RDS for MySQL has its own configuration parameter for binlog retention (binlog retention hours) that overrides binlog_expire_logs_seconds and is set to NULL by default.

Monitoring source progress

By default, MySQL sources expose progress metadata as a subsource that you can use to monitor source ingestion progress. The name of the progress subsource can be specified when creating a source using the EXPOSE PROGRESS AS clause; otherwise, it will be named <src_name>_progress.

The following metadata is available for each source as a progress subsource:

Field Type Details
source_id_lower uuid The lower-bound GTID source_id of the GTIDs covered by this range.
source_id_upper uuid The upper-bound GTID source_id of the GTIDs covered by this range.
transaction_id uint8 The transaction_id of the next GTID possible from the GTID source_ids covered by this range.

And can be queried using:

SELECT transaction_id
FROM <src_name>_progress;

Progress metadata is represented as a GTID set of future possible GTIDs, which is similar to the gtid_executed system variable on a MySQL replica. The reported transaction_id should increase as Materialize consumes new binlog records from the upstream MySQL database. For more information, see Troubleshooting.

Example

! Important: Before creating a MySQL source, you must enable GTID-based binary log (binlog) replication, including setting binlog_row_metadata=FULL to use the new syntax.

Prerequisites

To create a source from MySQL(8.0.1+), you must first:

  • Configure upstream MySQL instance
  • Configure network security
    • Ensure Materialize can connect to your MySQL instance.
  • Create a connection to MySQL in Materialize

For details, see the MySQL integration guides.

Create a source

Once you have configured the upstream MySQL, network security, and created the connection to MySQL, you can create the source. In this example, assume the connection you created is named mysql_connection.

CREATE SOURCE mysql_source
FROM MYSQL CONNECTION mysql_connection;

After a source is created, you can create a table from the source, referencing specific upstream table(s). Use a DDL transaction block to create multiple tables from the same source.

BEGIN;
CREATE TABLE items
FROM SOURCE mysql_source (REFERENCE mydb.items);

CREATE TABLE orders
FROM SOURCE mysql_source (REFERENCE mydb.orders);
COMMIT;
Back to top ↑