CREATE SOURCE: JSON over Kinesis
CREATE SOURCE connects Materialize to an external data source and lets you interact
with its data as if the data were in a SQL table.
This document details how to connect Materialize to JSON-formatted Kinesis streams.
Sources represent connections to resources outside Materialize that it can read data from. For more information, see API Components: Sources.
|MATERIALIZED||Materializes the source’s data, which retains all data in memory and makes sources directly selectable. For more information, see API Components: Materialized sources.|
|src_name||The name for the source, which is used as its table name within SQL.|
|col_name||Override default column name with the provided identifier. If used, a col_name must be provided for each column in the created source.|
|KINESIS ARN arn||The AWS ARN of the Kinesis Data Stream.|
|WITH ( option_list )||Options affecting source creation. For more detail, see WITH options.|
|FORMAT BYTES||Leave data received from the source as unformatted bytes, and store them in a column named data.|
|ENVELOPE NONE||(Default) Use an append-only envelope. This means that records will only be appended and cannot be updated or deleted.|
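As a rough illustration of how the clauses in the table above fit together, the statement below combines MATERIALIZED, a col_name override, and an explicit ENVELOPE NONE. The source name, the column name payload, and the ARN are hypothetical placeholders, and the clause order is an assumption based on the order in which the fields are listed above.

```sql
-- Sketch only: kinesis_events, payload, and the ARN are made-up placeholders.
CREATE MATERIALIZED SOURCE kinesis_events (payload)  -- rename the single column
FROM KINESIS ARN 'arn:aws:kinesis:us-east-1:123456789012:stream/example-stream'
FORMAT BYTES      -- keep records as raw bytes
ENVELOPE NONE;    -- append-only (the default, written out here for clarity)
```

Because this source is byte-formatted and has only one column, the col_name list contains a single identifier.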
The following options are valid within the WITH clause:
|access_key_id||A valid access key ID for the AWS resource.|
|secret_access_key||A valid secret access key for the AWS resource.|
||The session token associated with the credentials, if the credentials are temporary.|
If you do not provide credentials via WITH options, then
materialized will examine the standard
AWS authorization chain:
- Environment variables:
- credential_process command in the AWS config file, usually located at
- AWS credentials file. Usually located at
- IAM instance profile. Will only work if running on an EC2 instance with an instance profile/role.
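For comparison, a source that relies on the authorization chain above simply omits the WITH clause. This is a minimal sketch: the source name and ARN are hypothetical, and it assumes the machine running materialized can resolve credentials through one of the listed mechanisms.

```sql
-- Sketch only: no access_key_id / secret_access_key are given, so credentials
-- are expected to come from the standard AWS authorization chain.
-- The source name and ARN are made-up placeholders.
CREATE SOURCE kinesis_chain_source
FROM KINESIS ARN 'arn:aws:kinesis:us-east-1:123456789012:stream/example-stream'
FORMAT BYTES;
```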
Credentials fetched from a container or instance profile expire on a fixed schedule. Materialize will attempt to refresh the credentials automatically before they expire, but the source will become inoperable if the refresh operation fails. For the permissions required by the IAM account whose credentials you provide, see Kinesis source details.
Kinesis source details
- A Kinesis source represents a single Kinesis stream.
- By default, Materialize will try to read credentials automatically via Rusoto’s ChainProvider. If credentials are explicitly provided, those will be used instead.
- The IAM account whose credentials you provide requires
kinesis-read permissions and access to
- Kinesis sources will only have one column, which will be named data.
Raw byte format details
Raw byte-formatted sources provide Materialize the raw bytes received from the source without applying any formatting or decoding.
Raw byte-formatted sources have one column, which, by default, is named data.
Extracting JSON data from bytes
Materialize cannot receive JSON data directly from a source. Instead, you must
create a source that stores the data it receives as raw bytes (FORMAT
BYTES), and then construct views that provide access to your JSON data by
casting the source’s bytea column (named data) to text, and then to jsonb.
CREATE MATERIALIZED VIEW jsonified_bytes AS SELECT CAST(data AS JSONB) AS data FROM ( SELECT CONVERT_FROM(data, 'utf8') AS data FROM bytea_source );
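Once the bytes have been cast to jsonb, individual fields can be pulled out with the jsonb operators. The query below is a hedged sketch against the jsonified_bytes view; the keys user_id, amount, and device are hypothetical and stand in for whatever your records actually contain.

```sql
-- Sketch only: the JSON keys used here are assumptions about the payload.
SELECT
    data->>'user_id'            AS user_id,  -- ->> returns the value as text
    (data->>'amount')::numeric  AS amount,   -- cast text to a typed column
    data->'device'->>'os'       AS os        -- -> returns jsonb, so it can be chained
FROM jsonified_bytes;
```

A key that is missing from a record yields NULL for that column rather than an error.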
Append-only envelope means that all records received by the source are treated as inserts. This is Materialize’s default envelope (i.e., the one used if no envelope is specified), and can be specified explicitly with ENVELOPE NONE.
CREATE SOURCE kinesis_source FROM KINESIS ARN ... WITH ( access_key_id = ..., secret_access_key = ... ) FORMAT BYTES;
This creates a source that…
- Is append-only.
- Has one column,
data, which represents the stream’s incoming bytes.
To use this data in views, you can decode its bytes into
jsonb. For example:
CREATE MATERIALIZED VIEW jsonified_kinesis_source AS SELECT CAST(data AS jsonb) AS data FROM ( SELECT convert_from(data, 'utf8') AS data FROM kinesis_source );
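Further views can then be layered on top of jsonified_kinesis_source. As one possible sketch, the view below keeps a running count of records per event type; the view name event_counts and the key event_type are hypothetical.

```sql
-- Sketch only: event_counts and the 'event_type' key are made-up names.
CREATE MATERIALIZED VIEW event_counts AS
SELECT
    data->>'event_type' AS event_type,
    count(*)            AS events
FROM jsonified_kinesis_source
GROUP BY data->>'event_type';
```

Because the source is append-only, these counts only grow as new records arrive on the stream.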