Materialize Logo

CREATE SOURCE: CSV from local file

CREATE SOURCE connects Materialize to some data source, and lets you interact with its data as if it were in a SQL table.

This document details how to connect Materialize to local CSV files.

Conceptual framework

Sources represent connections to resources outside Materialize that it can read data from. For more information, see API Components: Sources.

Syntax

CREATE MATERIALIZED SOURCE IF NOT EXISTS src_name ( col_name , ) FROM FILE path COMPRESSION NONE GZIP WITH ( field = val , ) FORMAT CSV WITH HEADER n COLUMNS DELIMITED BY char
Field Use
MATERIALIZED Materializes the source’s data, which retains all data in memory and makes sources directly selectable. For more information, see Materialized source details.
src_name The name for the source, which is used as its table name within SQL.
col_name Override default column name with the provided identifier. If used, a col_name must be provided for each column in the created source.
FILE path The absolute path to the file you want to use as the source.
COMPRESSION algorithm The compression algorithm used with this file.
WITH ( option_list ) Options affecting source creation. For more detail, see WITH options.
HEADER Treat the first line of the CSV file as a header. See CSV format details.
n COLUMNS Format the source’s data as a CSV with n columns. See CSV format details.
DELIMITED BY char Delimit the CSV by char. ASCII comma by default (','). This must be an ASCII character; other Unicode code points are not supported.
ENVELOPE NONE (Default) Use an append-only envelope. This means that records will only be appended and cannot be updated or deleted.

WITH options

The following options are valid within the WITH clause.

Field Value type Description
tail boolean Continually check the file for new content.

Details

Materialized source details

Materializing a source keeps data it receives in an in-memory index, the presence of which makes the source directly queryable. In contrast, non-materialized sources cannot process queries directly; to access the data the source receives, you need to create materialized views that SELECT from the source.

For a mental model, materializing the source is approximately equivalent to creating a non-materialized source, and then creating a materialized view from all of the source’s columns:

CREATE SOURCE src ...;
CREATE MATERIALIZED VIEW src_view AS SELECT * FROM src;

The actual implementation of materialized sources differs, though, by letting you refer to the source’s name directly in queries.

For more details about the impact of materializing sources (and implicitly creating an index), see CREATE INDEX: Details — Memory footprint.

File source details

path values must be the file’s absolute path, e.g.

CREATE SOURCE server_source FROM FILE '/Users/sean/server.log'...

Compression

CSV format details

Materialize uses the format method you specify to determine the number of columns to create in the source, as well as the columns’ names.

Method Outcome
HEADER Materialize reads the first line of the file to determine:

• The number of columns in the file

• The name of each column

The first line of the file is not ingested as data.
n COLUMNS • Materialize treats the file as if it has n columns.

• Columns are named column1, column2columnN.

Note that:

Types

Materialize treats all columns in CSV sources as text. You can “type” this data using casts when creating views using this source, e.g.:

CREATE MATERIALIZED VIEW salaries AS
  SELECT (employee_id::int, salary::numeric(38, 2))
  FROM csv_employee_data;

Append-only envelope

Append-only envelope means that all records received by the source is treated as an insert. This is Materialize’s default envelope (i.e. if no envelope is specified), and can be specified with ENVELOPE NONE.

Example

Creating a source from a static CSV

CREATE SOURCE static_w_header
FROM FILE '[path to .csv]'
FORMAT CSV WITH HEADER;

This creates a source that…

Creating a source from a dynamic CSV

CREATE SOURCE dynamic_wo_header (col_foo, col_bar, col_baz)
FROM FILE '[path to .csv]'
WITH (tail = true)
FORMAT CSV WITH 3 COLUMNS;

This creates a source that…

Did this info help?
Yes No