Amazon S3
This guide walks you through the steps required to export results from Materialize to Amazon S3. Copying results to S3 is useful to perform tasks like periodic backups for auditing, or downstream processing in analytical data warehouses like Snowflake, Databricks or BigQuery.
Before you begin
- Ensure you have access to an AWS account, and permissions to create and manage IAM policies and roles. If you’re not an account administrator, you will need support from one!
Step 1. Set up an Amazon S3 bucket
First, you must set up an S3 bucket and give Materialize enough permissions to write files to it. We strongly recommend using role assumption-based authentication to manage access to the target bucket.
Create a bucket
-
Log in to your AWS account.
-
Navigate to AWS Services, then S3.
-
Create a new, general purpose S3 bucket with the suggested default configurations.
Create an IAM policy
Once you create an S3 bucket, you must associate it with an IAM policy that specifies what actions can be performed on the bucket by the Materialize exporter role. For Materialize to be able to write data into the bucket, the IAM policy must allow the following actions:
Action type | Action name | Action description |
---|---|---|
Write | s3:PutObject |
Grants permission to add an object to a bucket. |
List | s3:ListBucket |
Grants permission to list some or all of the objects in a bucket. |
Write | s3:DeleteObject |
Grants permission to remove an object from a bucket. |
To create a new IAM policy:
-
Navigate to AWS Services, then AWS IAM.
-
In the IAM Dashboard, click Policies, then Create policy.
-
For Policy editor, choose JSON.
-
Copy and paste the policy below into the editor, replacing
<bucket>
with the bucket name and<prefix>
with the folder path prefix.{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:DeleteObject" ], "Resource": "arn:aws:s3:::<bucket>/<prefix>/*" }, { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": "arn:aws:s3:::<bucket>", "Condition": { "StringLike": { "s3:prefix": [ "<prefix>/*" ] } } } ] }
-
Click Next.
-
Enter a name for the policy, and click Create policy.
Create an IAM role
Next, you must attach the policy you just created to a Materialize-specific IAM role.
-
Navigate to AWS Services, then AWS IAM.
-
In the IAM Dashboard, click Roles, then Create role.
-
In Trusted entity type, select Custom trust policy, and copy and paste the policy below.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::664411391173:role/MaterializeConnection" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "PENDING" } } } ] }
Materialize always uses the provided IAM principal to assume the role, and also generates an external ID which uniquely identifies your AWS connection across all Materialize regions (see AWS connection permissions). For now, you’ll set this ID to a dummy value; later, you’ll update it with the unique identifier for your Materialize region.
-
Click Next.
-
In Add permissions, select the IAM policy you created in Create an IAM policy, and click Next.
-
Enter a name for the role, and click Create role.
-
Click View role to see the role summary page, and note down the role ARN. You will need it in the next step to create an AWS connection in Materialize.
Step 2. Create a connection
-
In the SQL Shell, or your preferred SQL client connected to Materialize, create an AWS connection, replacing
<account-id>
with the 12-digit number that identifies your AWS account, and<role>
with the name of the role you created in the previous step:CREATE CONNECTION aws_connection TO AWS (ASSUME ROLE ARN = 'arn:aws:iam::<account-id>:role/<role>');
-
Retrieve the external ID for the connection:
SELECT awsc.id, external_id FROM mz_internal.mz_aws_connections awsc JOIN mz_connections c ON awsc.id = c.id WHERE c.name = 'aws_connection';
The external ID will have the following format:
mz_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX_uXXX
-
In your AWS account, find the IAM role you created in Create an IAM role and, under Trust relationships, click Edit trust policy. Use the
external_id
from the previous step to update the trust policy’ssts:ExternalId
, then click Update policy.WARNING! Failing to constrain the external ID in your role trust policy will allow other Materialize customers to assume your role and use AWS privileges you have granted the role! -
Back in Materialize, validate the AWS connection you created using the
VALIDATE CONNECTION
command.VALIDATE CONNECTION aws_connection;
If no validation error is returned, you’re ready to use this connection to run a bulk export from Materialize to your target S3 bucket! 🔥
Step 3. Run a bulk export
To export data to your target S3 bucket, use the COPY TO
command, and the AWS connection you created in the previous step.
COPY some_object TO 's3://<bucket>/<path>'
WITH (
AWS CONNECTION = aws_connection,
FORMAT = 'parquet'
);
For details on the Parquet writer settings Materialize uses, as well as data type support and conversion, check the reference documentation.
COPY some_object TO 's3://<bucket>/<path>'
WITH (
AWS CONNECTION = aws_connection,
FORMAT = 'csv'
);
You might notice that Materialize first writes a sentinel file to the target S3
bucket. When the copy operation is complete, this file is deleted. This allows
using the s3:ObjectRemoved
event
to trigger downstream processing.
Step 4. (Optional) Add scheduling
Bulk exports to Amazon S3 using the COPY TO
command are one-shot: every time
you want to export results, you must run the command. To automate running bulk
exports on a regular basis, you can set up scheduling, for example using a
simple cron
-like service or an orchestration platform like Airflow or
Dagster.