We are thrilled to announce that bulk exports to Amazon S3 are now available in Private Preview! This feature lets you export objects and query results from Materialize to object storage (starting with Amazon S3) using the COPY TO command, enabling tasks like periodic backups for auditing or downstream processing in analytical data warehouses like Snowflake, Databricks, or BigQuery. During the preview period, you can reach out to our team to have the feature enabled in your environment. We invite you to try it out and share feedback that helps us refine and improve it.

Why build this?

Materialize is the Operational Data Warehouse that empowers organizations to act confidently on fast-changing data. While an operational data warehouse focuses on serving you the freshest data as it happens, you'll likely need this data in historical contexts as well. This is where bulk exports can help! They let you snapshot your data to a batch system, where it can be integrated into other analytical workloads or periodically backed up.

With bulk exports to S3, you can now write data from Materialize into Amazon S3 object storage. This lets you use S3 as an intermediary to sink data to a broader set of systems downstream, such as batch data warehouses like Amazon Redshift, Snowflake, BigQuery, or Databricks.

In addition, by bulk exporting a snapshot of data from Materialize into Amazon S3, you can use the object store to back up your data, including historical snapshots of the output of a materialized view.

Exporting your data

To export data from Materialize to Amazon S3, you need access to your Amazon Web Services (AWS) account and sufficient permissions to create and manage IAM roles. The first step to exporting data is to configure access to the target S3 bucket: Materialize natively integrates with AWS Identity and Access Management (IAM), so you can do this securely via role assumption. In Materialize, all you need to do is create an AWS connection that uses the configured role:

```sql
CREATE CONNECTION aws_conn TO AWS (
  ASSUME ROLE ARN = 'arn:aws:iam::001234567890:role/MZS3Exporter'
);
```
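
Once the IAM role's trust policy is in place, you can check that Materialize is able to assume the role before running any exports. A minimal sketch using connection validation:

```sql
-- Verify that Materialize can assume the IAM role configured above.
VALIDATE CONNECTION aws_conn;
```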

This connection can then be used with the COPY TO command to export the results of any SQL query or object in Materialize as Parquet or CSV files into an S3 bucket in your AWS account. We kept the Parquet writer configuration deliberately conservative for maximum compatibility with Snowflake, and plan to extend compatibility to other cloud data warehouses, as well as add support for table formats like Apache Iceberg and Delta Lake.

```sql
COPY my_view TO 's3://mz-to-snow/parquet/'
  WITH (
    AWS CONNECTION = aws_conn,
    FORMAT = 'parquet'
  );
```
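
You're not limited to exporting existing objects: the same command accepts an arbitrary query. As a sketch (the WHERE clause and column name are illustrative, not part of the example above), exporting filtered results as CSV might look like:

```sql
-- Export the result of an ad hoc query as CSV.
-- The "region" column and filter value are placeholders.
COPY (SELECT * FROM my_view WHERE region = 'EMEA') TO 's3://mz-to-snow/csv/'
  WITH (
    AWS CONNECTION = aws_conn,
    FORMAT = 'csv'
  );
```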

It’s important to note that COPY TO is a one-shot command: every time you want to export results, you must re-run it. To automate exports on a regular basis, you can set up scheduling, for example using a simple cron-like service or an orchestration platform like Airflow or Dagster.
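
As a sketch, a nightly crontab entry could re-run the export via psql; the connection string is a placeholder for your Materialize region's credentials, and the date-stamped prefix keeps each export in its own path:

```
# Hypothetical crontab entry: export my_view to a date-stamped S3 prefix at 02:00 UTC daily.
# In cron, "%" must be escaped as "\%".
0 2 * * * psql "postgres://<user>@<host>:6875/materialize" -c "COPY my_view TO 's3://mz-to-snow/parquet/$(date +\%Y-\%m-\%d)/' WITH (AWS CONNECTION = aws_conn, FORMAT = 'parquet');"
```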

With the data sitting in S3, you can now, for example, import it into Snowflake using a storage integration, an external stage and the COPY INTO command:

```sql
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/SnowS3Importer'
  STORAGE_ALLOWED_LOCATIONS = ('*');
```

```sql
CREATE STAGE s3_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://mz-to-snow/parquet/';
```

```sql
CREATE TABLE s3_table_parquet (
  mz_snap VARIANT
);
```

```sql
COPY INTO s3_table_parquet
  FROM @s3_stage
  FILE_FORMAT = (TYPE = 'PARQUET');
```
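
Each imported row lands in the mz_snap VARIANT column, so you can unpack individual fields with Snowflake's semi-structured data accessors. A quick sanity check might look like the following (the customer_id field name is illustrative):

```sql
-- Peek at the imported snapshot; "customer_id" is a placeholder field name.
SELECT mz_snap:"customer_id"::STRING AS customer_id
FROM s3_table_parquet
LIMIT 10;
```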

Try it out!

As we launch bulk exports to S3 into Private Preview, we invite you to try it out. Existing customers can work with our team to enable the feature in their environments, and we are eager to hear your feedback. Stay tuned for additional updates as we continue to build out this capability, including support for additional formats (like Apache Iceberg and Delta Lake) and object stores, as well as continual exports. These improvements are part of our ongoing effort to serve and sink data downstream to other systems in your data stack.
