Amazon S3

PREVIEW This feature is in public preview. It is under active development and may have stability or performance issues. It isn't subject to our backwards compatibility guarantees.

This guide walks you through the steps required to export results from Materialize to Amazon S3. Copying results to S3 is useful to perform tasks like periodic backups for auditing, or downstream processing in analytical data warehouses like Snowflake, Databricks or BigQuery.

Before you begin

Ensure you have access to an AWS account, and permissions to create and manage IAM policies and roles. If you’re not an account administrator, you will need support from one!

Step 1. Set up an Amazon S3 bucket

First, you must set up an S3 bucket and give Materialize enough permissions to write files to it. We strongly recommend using role assumption-based authentication to manage access to the target bucket.

Create a bucket

Log in to your AWS account.
Navigate to AWS Services, then S3.
Create a new, general purpose S3 bucket with the suggested default configurations.

Create an IAM policy

Once you create an S3 bucket, you must associate it with an IAM policy that specifies what actions can be performed on the bucket by the Materialize exporter role. For Materialize to be able to write data into the bucket, the IAM policy must allow the following actions:

Action type	Action name	Action description
Write	`s3:PutObject`	Grants permission to add an object to a bucket.
List	`s3:ListBucket`	Grants permission to list some or all of the objects in a bucket.
Write	`s3:DeleteObject`	Grants permission to remove an object from a bucket.

To create a new IAM policy:

Navigate to AWS Services, then AWS IAM.
In the IAM Dashboard, click Policies, then Create policy.
For Policy editor, choose JSON.

Copy and paste the policy below into the editor, replacing <bucket> with the bucket name and <prefix> with the folder path prefix.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<bucket>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "<prefix>/*"
                    ]
                }
            }
        }
    ]
}

Click Next.
Enter a name for the policy, and click Create policy.

Create an IAM role

Next, you must attach the policy you just created to a Materialize-specific IAM role.

Navigate to AWS Services, then AWS IAM.
In the IAM Dashboard, click Roles, then Create role.

In Trusted entity type, select Custom trust policy, and copy and paste the policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::664411391173:role/MaterializeConnection"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "PENDING"
                }
            }
        }
    ]
}

Materialize always uses the provided IAM principal to assume the role, and also generates an external ID which uniquely identifies your AWS connection across all Materialize regions (see AWS connection permissions). For now, you’ll set this ID to a dummy value; later, you’ll update it with the unique identifier for your Materialize region.

Click Next.
In Add permissions, select the IAM policy you created in Create an IAM policy, and click Next.
Enter a name for the role, and click Create role.
Click View role to see the role summary page, and note down the role ARN. You will need it in the next step to create an AWS connection in Materialize.

Step 2. Create a connection

In the SQL Shell, or your preferred SQL client connected to Materialize, create an AWS connection, replacing <account-id> with the 12-digit number that identifies your AWS account, and <role> with the name of the role you created in the previous step:
```
CREATE CONNECTION aws_connection
   TO AWS (ASSUME ROLE ARN = 'arn:aws:iam::<account-id>:role/<role>');
```

Retrieve the external ID for the connection:

SELECT awsc.id, external_id
 FROM mz_internal.mz_aws_connections awsc
 JOIN mz_connections c ON awsc.id = c.id
 WHERE c.name = 'aws_connection';

The external ID will have the following format:

mz_XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX_uXXX

In your AWS account, find the IAM role you created in Create an IAM role and, under Trust relationships, click Edit trust policy. Use the external_id from the previous step to update the trust policy’s sts:ExternalId, then click Update policy.

WARNING! Failing to constrain the external ID in your role trust policy will allow other Materialize customers to assume your role and use AWS privileges you have granted the role!
Back in Materialize, validate the AWS connection you created using the VALIDATE CONNECTION command.
```
VALIDATE CONNECTION aws_connection;
```
If no validation error is returned, you’re ready to use this connection to run a bulk export from Materialize to your target S3 bucket! 🔥

Step 3. Run a bulk export

To export data to your target S3 bucket, use the COPY TO command, and the AWS connection you created in the previous step.

COPY some_object TO 's3://<bucket>/<path>'
WITH (
    AWS CONNECTION = aws_connection,
    FORMAT = 'parquet'
  );

For details on the Parquet writer settings Materialize uses, as well as data type support and conversion, check the reference documentation.

COPY some_object TO 's3://<bucket>/<path>'
WITH (
    AWS CONNECTION = aws_connection,
    FORMAT = 'csv'
  );

You might notice that Materialize first writes a sentinel file to the target S3 bucket. When the copy operation is complete, this file is deleted. This allows using the s3:ObjectRemoved event to trigger downstream processing.

Step 4. (Optional) Add scheduling

Bulk exports to Amazon S3 using the COPY TO command are one-shot: every time you want to export results, you must run the command. To automate running bulk exports on a regular basis, you can set up scheduling, for example using a simple cron-like service or an orchestration platform like Airflow or Dagster.