Snowflake
This guide walks you through the steps required to bulk-export results from Materialize to Snowflake using Amazon S3 as the intermediate object store.
Before you begin
- Ensure you have access to an AWS account, and permissions to create and manage IAM policies and roles. If you’re not an account administrator, you will need support from one!
- Ensure you have access to a Snowflake account, and are able to connect as a user with either the ACCOUNTADMIN role, or a role with the global CREATE INTEGRATION privilege.
Step 1. Set up bulk exports to Amazon S3
Follow the Amazon S3 integration guide to set up an Amazon S3 bucket that Materialize securely writes data into. This will be your starting point for bulk-loading Materialize data into Snowflake.
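For orientation, once that integration is in place, a one-off bulk export from Materialize looks roughly like the sketch below. The view name, bucket, prefix, and AWS connection name are placeholders standing in for the objects you create in that guide; check the guide for the exact syntax and options.

-- Rough sketch of a one-off bulk export from Materialize to S3.
-- my_view, <bucket>, <prefix>, and aws_connection are placeholders.
COPY (SELECT * FROM my_view)
TO 's3://<bucket>/<prefix>/'
WITH (
    AWS CONNECTION = aws_connection,
    FORMAT = 'parquet'
);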
Step 2. Configure a Snowflake storage integration
Create an IAM policy
To bulk-load data from an S3 bucket into Snowflake, you must create a new IAM policy that specifies what actions can be performed on the bucket by the Snowflake importer role. For Snowflake to be able to read data from the bucket, the IAM policy must allow the following actions:
Action type | Action name | Action description |
---|---|---|
Read | s3:GetBucketLocation | Grants permission to return the region the bucket is hosted in. |
Read | s3:GetObject | Grants permission to retrieve objects from a bucket. |
Read | s3:GetObjectVersion | Grants permission to retrieve a specific version of an object from a bucket. |
List | s3:ListBucket | Grants permission to list some or all of the objects in a bucket. |
Write | s3:DeleteObject | (Optional) Grants permission to remove an object from a bucket. |
Write | s3:DeleteObjectVersion | (Optional) Grants permission to remove a specific version of an object from a bucket. |
Write | s3:PutObject | (Optional) Grants permission to add an object to a bucket. |
To create a new IAM policy:
- Navigate to AWS Services, then AWS IAM.
- In the IAM Dashboard, click Policies, then Create policy.
- For Policy editor, choose JSON.
- Copy and paste the policy below into the editor, replacing <bucket> with the bucket name and <prefix> with the folder path prefix.

  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "s3:PutObject",
                  "s3:GetObject",
                  "s3:GetObjectVersion",
                  "s3:DeleteObject",
                  "s3:DeleteObjectVersion"
              ],
              "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "s3:ListBucket",
                  "s3:GetBucketLocation"
              ],
              "Resource": "arn:aws:s3:::<bucket>",
              "Condition": {
                  "StringLike": {
                      "s3:prefix": [
                          "<prefix>/*"
                      ]
                  }
              }
          }
      ]
  }

- Click Next.
- Enter a name for the policy, and click Create policy.
Create an IAM role
Next, you must attach the policy you just created to a Snowflake-specific IAM role.
- Navigate to AWS Services, then AWS IAM.
- In the IAM Dashboard, click Roles, then Create role.
- In Trusted entity type, select AWS account, then This account. Later, you’ll update it with the unique identifier for your Snowflake account.
- Check the Require external ID box. Enter a placeholder External ID (e.g. 0000). Later, you’ll update it with the unique external ID for your Snowflake storage integration.
- Click Next.
- In Add permissions, select the IAM policy you created in Create an IAM policy, and click Next.
- Enter a name for the role, and click Create role.
- Click View role to see the role summary page, and note down the role ARN. You will need it in the next step to create a Snowflake storage integration.
Create a Snowflake storage integration
Only a user with the ACCOUNTADMIN role, or a role with the global CREATE INTEGRATION privilege, can execute this step.
- In Snowsight, or your preferred SQL client connected to Snowflake, create a storage integration, replacing <role> with the name of the role you created in the previous step:

  CREATE STORAGE INTEGRATION s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/<role>'
    STORAGE_ALLOWED_LOCATIONS = ('*');

- Retrieve the IAM principal for your Snowflake account using the DESC INTEGRATION command:

  DESC INTEGRATION s3_int;

  Note down the values for the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties. You will need them in the next step to update the trust policy attached to your IAM role.
Update the IAM role trust policy
- In your AWS account, find the IAM role you created in Create an IAM role and, under Trust relationships, click Edit trust policy. Use the values for the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID properties from the previous step to update the trust policy’s Principal and ExternalId, then click Update policy. The resulting trust policy should look similar to the sketch below.
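A sketch of the updated trust policy, assuming the standard Snowflake storage integration trust relationship; the placeholders stand for the values returned by DESC INTEGRATION:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
                }
            }
        }
    ]
}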
Step 3. Create a Snowflake external stage
Back in Snowflake, create an external stage that uses the storage integration above and references your S3 bucket.
CREATE STAGE s3_stage
STORAGE_INTEGRATION = s3_int
URL = 's3://<bucket>/<prefix>/';
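To confirm that the stage and storage integration are wired up correctly, you can list the files Materialize has written to the bucket:

-- Lists the objects visible through the stage; the output should match
-- the files produced by the bulk export in Step 1.
LIST @s3_stage;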
To create a stage, you must use a role that has the CREATE STAGE privilege for the active schema, as well as the USAGE privilege on the relevant storage integration.
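If the role you plan to use is missing these privileges, a user with sufficient permissions can grant them. A minimal sketch, where <role>, <database>, and <schema> are placeholders for your own objects:

-- Allow <role> to use the storage integration and create stages in the schema.
GRANT USAGE ON INTEGRATION s3_int TO ROLE <role>;
GRANT CREATE STAGE ON SCHEMA <database>.<schema> TO ROLE <role>;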
Step 4. Import data into Snowflake
To import the data stored in S3 into Snowflake, you can then create a table and use the COPY INTO command to load it from the external stage. The instructions below cover Parquet and CSV exports.
Parquet
Create a table with a single column of type VARIANT:
CREATE TABLE s3_table_parquet (
col VARIANT
);
Use COPY INTO to load the data into the table:
COPY INTO s3_table_parquet
FROM @s3_stage
FILE_FORMAT = (TYPE = 'PARQUET');
For more details on importing Parquet files staged in S3 into Snowflake, check the Snowflake documentation.
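Because the Parquet data lands in a single VARIANT column, individual fields can then be queried by path and cast to concrete types. A small sketch, where the field names are hypothetical and depend on the columns of the object you exported from Materialize:

-- Field names below are hypothetical examples; replace them with the
-- column names of the exported Materialize object.
SELECT
    col:customer_id::INT AS customer_id,
    col:updated_at::TIMESTAMP AS updated_at
FROM s3_table_parquet
LIMIT 10;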
CSV
Create a table with the same number of columns as the number of delimited columns in the input data file:
CREATE TABLE s3_table_csv (
col_1 INT,
col_2 TEXT,
col_3 TIMESTAMP
);
Use COPY INTO to load the data into the table:
COPY INTO s3_table_csv
FROM @s3_stage
FILE_FORMAT = (TYPE = 'CSV');
For more details on importing CSV files staged in S3 into Snowflake, check the Snowflake documentation.
Step 5. (Optional) Add scheduling
Bulk exports to Amazon S3 using the COPY TO command are one-shot: every time you want to export results, you must run the command. To automate running bulk exports from Materialize to Snowflake on a regular basis, you can set up scheduling, for example using a simple cron-like service or an orchestration platform like Airflow or Dagster.
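As a rough illustration, a cron-based setup could simply re-run the export query on a fixed schedule, for example via psql. Everything in the entry below is a placeholder: the connection string, view name, bucket, prefix, and AWS connection name all depend on your own setup.

# Hypothetical crontab entry: re-run the Materialize bulk export at the top of every hour.
0 * * * * psql "postgres://<user>@<materialize-host>:6875/materialize" -c "COPY (SELECT * FROM my_view) TO 's3://<bucket>/<prefix>/' WITH (AWS CONNECTION = aws_connection, FORMAT = 'parquet');"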