CREATE SOURCE: CSV from local file
CREATE SOURCE connects Materialize to an external data source and lets you interact
with its data as if the data were in a SQL table.
This document details how to connect Materialize to local CSV files.
Sources represent connections to resources outside Materialize that it can read data from. For more information, see API Components: Sources.
|MATERIALIZED||Materializes the source’s data, which retains all data in memory and makes sources directly selectable. For more information, see API Components: Materialized sources.|
|src_name||The name for the source, which is used as its table name within SQL.|
|col_name||Override default column name with the provided identifier. If used, a col_name must be provided for each column in the created source.|
|FILE path||The absolute path to the file you want to use as the source.|
|COMPRESSION algorithm||The compression algorithm used with this file.|
|WITH ( option_list )||Options affecting source creation. For more detail, see
|HEADER||Treat the first line of the CSV file as a header. See CSV format details.|
|HEADER (name_list)||Treat the first line of the CSV file as a header, validate that the columns in the file match the provided column list. See CSV format details.|
|n COLUMNS||Format the source’s data as a CSV with n columns. See CSV format details.|
|DELIMITED BY char||Delimit the CSV by char. ASCII comma by default (
|ENVELOPE NONE||(Default) Use an append-only envelope. This means that records will only be appended and cannot be updated or deleted.|
|PRIMARY KEY ( col_list ) NOT ENFORCED||Declare a set of columns as a primary key. For more information, see
The following options are valid within the
||Continually check the file for new content.|
File source details
path values must be the file’s absolute path, e.g.
CREATE SOURCE server_source FROM FILE '/Users/sean/server.log'...
COMPRESSIONis equivalent to
GZIPcompression requires the file to be a normal
.gzfile (e.g. one produced by
gzip) or a concatenation of multiple
CSV format details
Materialize uses the format method you specify to determine the number of columns to create in the source, as well as the columns' names.
|HEADER||Materialize reads the first line of the file to determine:
• The number of columns in the file
• The name of each column
The first line of the file is not ingested as data.
|HEADER (name_list)||All of the same behaviors as bare HEADER with the additional features that:
• Header names from source objects will be validated to exactly match those specified in the name list.
• Specifying a column list allows using CSV format with sources that have headers but individual objects may not yet exist. Primarily this is intended for S3 sources.
|n COLUMNS||• Materialize treats the file as if it has n columns.
• Columns are named
- You can override these naming conventions by explicitly naming columns using
the col_name option in
- All rows without the number of columns determined by the format are dropped, and Materialize logs an error.
- To avoid case-sensitivity conflicts with Materialize identifiers, we recommend you double-quote all field names when working with Avro-formatted sources. For more information, see Identifiers: Case sensitivity.
CREATE MATERIALIZED VIEW salaries AS SELECT (employee_id::int, salary::numeric(38, 2)) FROM csv_employee_data;
Append-only envelope means that all records received by the source is treated as an insert. This is Materialize’s default envelope (i.e. if no envelope is specified), and can be specified with ENVELOPE NONE.
Key constraint details
For other source configurations, the
allows to manually declare a set of columns as a primary key. This enables
optimizations and constructs that rely on a key to be present when it cannot be
WARNING! Materialize will not enforce the constraint and will produce wrong results if it is not correct.
Creating a source from a static CSV
CREATE SOURCE static_w_header FROM FILE '[path to .csv]' FORMAT CSV WITH HEADER;
This creates a source that…
- Is append-only.
- Materialize reads once during source creation.
- Has as many columns as the first row of the CSV file, and uses that row to name the columns.
Creating a source from a dynamic CSV
CREATE SOURCE dynamic_wo_header (col_foo, col_bar, col_baz) FROM FILE '[path to .csv]' WITH (tail = true) FORMAT CSV WITH 3 COLUMNS;
This creates a source that…
- Is append-only.
- Has 3 columns (
col_baz). Materialize will not ingest any row without 3 columns.
- Materialize dynamically checks for new entries.