Snowflake
This guide walks you through the steps required to bulk-export results from Materialize to Snowflake using Amazon S3 as the intermediate object store.
Before you begin
- Ensure you have access to an AWS account, and permissions to create and manage IAM policies and roles. If you’re not an account administrator, you will need support from one!
- Ensure you have access to a Snowflake account, and are able to connect as a user with either the ACCOUNTADMIN role, or a role with the global CREATE INTEGRATION privilege.
Step 1. Set up bulk exports to Amazon S3
Follow the Amazon S3 integration guide to set up an Amazon S3 bucket that Materialize securely writes data into. This will be your starting point for bulk-loading Materialize data into Snowflake.
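For orientation, once that integration is in place, a one-off bulk export from Materialize looks roughly like the sketch below. The view name, bucket, prefix, and AWS connection name are placeholders standing in for the objects you create in that guide; check the guide for the exact syntax and options.

-- Rough sketch of a one-off bulk export from Materialize to S3.
-- my_view, <bucket>, <prefix>, and aws_connection are placeholders.
COPY (SELECT * FROM my_view)
TO 's3://<bucket>/<prefix>/'
WITH (
    AWS CONNECTION = aws_connection,
    FORMAT = 'parquet'
);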
Step 2. Configure a Snowflake storage integration
Create an IAM policy
To bulk-load data from an S3 bucket into Snowflake, you must create a new IAM policy that specifies what actions can be performed on the bucket by the Snowflake importer role. For Snowflake to be able to read data from the bucket, the IAM policy must allow the following actions:
Action type | Action name | Action description |
---|---|---|
Read | s3:GetBucketLocation | Grants permission to return the region the bucket is hosted in. |
Read | s3:GetObject | Grants permission to retrieve objects from a bucket. |
Read | s3:GetObjectVersion | Grants permission to retrieve a specific version of an object from a bucket. |
List | s3:ListBucket | Grants permission to list some or all of the objects in a bucket. |
Write | s3:DeleteObject | (Optional) Grants permission to remove an object from a bucket. |
Write | s3:DeleteObjectVersion | (Optional) Grants permission to remove a specific version of an object from a bucket. |
Write | s3:PutObject | (Optional) Grants permission to add an object to a bucket. |
To create a new IAM policy:
- Navigate to AWS Services, then AWS IAM.
- In the IAM Dashboard, click Policies, then Create policy.
- For Policy editor, choose JSON.
- Copy and paste the policy below into the editor, replacing <bucket> with the bucket name and <prefix> with the folder path prefix.

  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "s3:PutObject",
                  "s3:GetObject",
                  "s3:GetObjectVersion",
                  "s3:DeleteObject",
                  "s3:DeleteObjectVersion"
              ],
              "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "s3:ListBucket",
                  "s3:GetBucketLocation"
              ],
              "Resource": "arn:aws:s3:::<bucket>",
              "Condition": {
                  "StringLike": {
                      "s3:prefix": [
                          "<prefix>/*"
                      ]
                  }
              }
          }
      ]
  }

- Click Next.
- Enter a name for the policy, and click Create policy.
Create an IAM role
Next, you must attach the policy you just created to a Snowflake-specific IAM role.
- Navigate to AWS Services, then AWS IAM.
- In the IAM Dashboard, click Roles, then Create role.
- In Trusted entity type, select AWS account, then This account. Later, you’ll update it with the unique identifier for your Snowflake account.
- Check the Require external ID box. Enter a placeholder External ID (e.g. 0000). Later, you’ll update it with the unique external ID for your Snowflake storage integration.
- Click Next.
- In Add permissions, select the IAM policy you created in Create an IAM policy, and click Next.
- Enter a name for the role, and click Create role.
- Click View role to see the role summary page, and note down the role ARN. You will need it in the next step to create a Snowflake storage integration.
Create a Snowflake storage integration
Only a user with the ACCOUNTADMIN role, or a role with the global CREATE INTEGRATION privilege, can execute this step.
- In Snowsight, or your preferred SQL client connected to Snowflake, create a storage integration, replacing <role> with the name of the role you created in the previous step:

  CREATE STORAGE INTEGRATION s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/<role>'
    STORAGE_ALLOWED_LOCATIONS = ('*');

- Retrieve the IAM principal for your Snowflake account using the DESC INTEGRATION command:

  DESC INTEGRATION s3_int;

  Note down the values for the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties. You will need them in the next step to update the trust policy attached to your IAM role.
Update the IAM role trust policy
- In your AWS account, find the IAM role you created in Create an IAM role and, under Trust relationships, click Edit trust policy. Use the values for the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties from the previous step to update the trust policy’s Principal and ExternalId, then click Update policy. The resulting trust policy should look similar to the sketch below.
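A sketch of the updated trust policy, assuming the standard Snowflake storage integration trust relationship; the placeholders stand for the values returned by DESC INTEGRATION:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
                }
            }
        }
    ]
}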
Step 3. Create a Snowflake external stage
Back in Snowflake, create an external stage that uses the storage integration above and references your S3 bucket.
CREATE STAGE s3_stage
STORAGE_INTEGRATION = s3_int
URL = 's3://<bucket>/<prefix>/';
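To confirm that the stage and storage integration are wired up correctly, you can list the files Materialize has written to the bucket:

-- Lists the objects visible through the stage; the output should match
-- the files produced by the bulk export in Step 1.
LIST @s3_stage;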
To create a stage, you must use a role that has the CREATE STAGE privilege for the active schema, as well as the USAGE privilege on the relevant storage integration.
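If the role you plan to use is missing these privileges, a user with sufficient permissions can grant them. A minimal sketch, where <role>, <database>, and <schema> are placeholders for your own objects:

-- Allow <role> to use the storage integration and create stages in the schema.
GRANT USAGE ON INTEGRATION s3_int TO ROLE <role>;
GRANT CREATE STAGE ON SCHEMA <database>.<schema> TO ROLE <role>;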
Step 4. Import data into Snowflake
To import the data stored in S3 into Snowflake, you can then create a table and use the COPY INTO command to load it from the external stage. The instructions below cover Parquet and CSV exports.
Parquet
Create a table with a single column of type VARIANT:
CREATE TABLE s3_table_parquet (
col VARIANT
);
Use COPY INTO to load the data into the table:
COPY INTO s3_table_parquet
FROM @s3_stage
FILE_FORMAT = (TYPE = 'PARQUET');
For more details on importing Parquet files staged in S3 into Snowflake, check the Snowflake documentation.
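Because the Parquet data lands in a single VARIANT column, individual fields can then be queried by path and cast to concrete types. A small sketch, where the field names are hypothetical and depend on the columns of the object you exported from Materialize:

-- Field names below are hypothetical examples; replace them with the
-- column names of the exported Materialize object.
SELECT
    col:customer_id::INT AS customer_id,
    col:updated_at::TIMESTAMP AS updated_at
FROM s3_table_parquet
LIMIT 10;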
CSV
Create a table with the same number of columns as the number of delimited columns in the input data file:
CREATE TABLE s3_table_csv (
col_1 INT,
col_2 TEXT,
col_3 TIMESTAMP
);
Use COPY INTO to load the data into the table:
COPY INTO s3_table_csv
FROM @s3_stage
FILE_FORMAT = (TYPE = 'CSV');
For more details on importing CSV files staged in S3 into Snowflake, check the Snowflake documentation.
Step 5. (Optional) Add scheduling
Bulk exports to Amazon S3 using the COPY TO command are one-shot: every time you want to export results, you must run the command. To automate running bulk exports from Materialize to Snowflake on a regular basis, you can set up scheduling, for example using a simple cron-like service or an orchestration platform like Airflow or Dagster.
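As a rough illustration, a cron-based setup could simply re-run the export query on a fixed schedule, for example via psql. Everything in the entry below is a placeholder: the connection string, view name, bucket, prefix, and AWS connection name all depend on your own setup.

# Hypothetical crontab entry: re-run the Materialize bulk export at the top of every hour.
0 * * * * psql "postgres://<user>@<materialize-host>:6875/materialize" -c "COPY (SELECT * FROM my_view) TO 's3://<bucket>/<prefix>/' WITH (AWS CONNECTION = aws_connection, FORMAT = 'parquet');"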