
How to use dbt to manage Materialize

NOTE: The dbt-materialize adapter can only be used with dbt Core. We are working with the dbt community to bring native Materialize support to dbt Cloud!

dbt has become the standard for data transformation (“the T in ELT”). It combines the accessibility of SQL with software engineering best practices, allowing you to not only build reliable data pipelines, but also document, test and version-control them.

In this guide, we’ll cover how to use dbt and Materialize to transform streaming data in real time — from model building to continuous testing.

Setup

Minimum requirements: dbt v0.18.1+

Setting up a dbt project with Materialize is similar to setting it up with any other database that requires a non-native adapter. To get up and running, you need to:

  1. Install the dbt-materialize plugin (optionally using a virtual environment):

    python3 -m venv dbt-venv         # create the virtual environment
    source dbt-venv/bin/activate     # activate the virtual environment
    pip install dbt-materialize      # install the adapter
    

    The installation will include dbt-core and the dbt-postgres dependency. To check that the plugin was successfully installed, run:

    dbt --version
    

    materialize should be listed under “Plugins”. If this is not the case, double-check that the virtual environment is activated!

  2. To get started, make sure you have a Materialize account.

Create and configure a dbt project

A dbt project is a directory that contains everything dbt needs to run and keep track of your transformations. At a minimum, it must have a project file (dbt_project.yml) and at least one model (.sql).

To create a new project, run:

dbt init <project_name>

This command bootstraps a starter project with default configurations and creates a profiles.yml file, if one doesn't already exist.
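
For reference, a minimal dbt_project.yml might look something like this (the project name is a placeholder):

name: my_dbt_project          # placeholder project name
version: '1.0.0'
profile: default              # must match a profile name in profiles.yml
model-paths: ["models"]       # directories where dbt looks for .sql model files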

Connect to Materialize

dbt manages all your connection configurations (or, profiles) in a file called profiles.yml. By default, this file is located under ~/.dbt/.

  1. Locate the profiles.yml file on your machine:

    dbt debug --config-dir
    

    Note: If you started from an existing project but it’s your first time setting up dbt, it’s possible that this file doesn’t exist yet. You can manually create it in the suggested location.

  2. Open profiles.yml and adapt it to connect to your Materialize instance using the reference profile configuration.

    As an example, the following profile would allow you to connect to Materialize in two different environments: a developer environment (dev) and a production environment (prod).

    default:
      outputs:
    
        prod:
          type: materialize
          threads: 1
          host: <host>
          port: 6875
          user: <user@domain.com>
          pass: <password>
          database: materialize
          schema: public
          cluster: default
          sslmode: require
        dev:
          type: materialize
          threads: 1
          host: <host>
          port: 6875
          user: <user@domain.com>
          pass: <password>
          database: <dev_database>
          schema: <dev_schema>
          cluster: <dev_cluster>
          sslmode: require
    
      target: dev
    

    The target parameter allows you to configure the target environment that dbt uses to run your models. You can override it for a single invocation with the --target flag (for example, dbt run --target prod).

  3. To test the connection to Materialize, run:

    dbt debug
    

    If the output reads All checks passed!, you’re good to go! The dbt documentation has some helpful pointers in case you run into errors.

Build and run dbt models

For dbt to know how to persist (or not) a transformation, the model needs to be associated with a materialization strategy. Because Materialize is optimized for real-time transformations of streaming data and the core of dbt is built around batch, the dbt-materialize adapter implements a few custom materialization types:

Type             | Details                                                            | Config Options
-----------------|--------------------------------------------------------------------|-----------------
source           | Creates a source.                                                  | cluster, indexes
view             | Creates a view.                                                    | cluster, indexes
materializedview | Creates a materialized view.                                       | cluster, indexes
table            | Creates a materialized view (actual table support pending #5266). | cluster, indexes
sink             | Creates a sink.                                                    |
ephemeral        | Executes queries using CTEs.                                       |

Create a materialization for each SQL statement you’re planning to deploy. Each individual materialization should be stored as a .sql file under the directory defined by model-paths in dbt_project.yml.
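
Most of these types are illustrated in the sections below; sinks are not, so here is a minimal sketch of a sink model, assuming a Kafka connection named kafka_connection and an upstream model named my_model (both hypothetical):

{{ config(materialized='sink') }}

CREATE SINK IF NOT EXISTS {{ this }}
  FROM {{ ref('my_model') }}
  INTO KAFKA CONNECTION kafka_connection (TOPIC 'topic_b')
  FORMAT JSON
  ENVELOPE DEBEZIUM

The FORMAT and ENVELOPE options depend on your use case; see the CREATE SINK documentation for the supported combinations.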

Sources

In dbt, using a source makes it possible to name and describe the data loaded into Materialize. You can instruct dbt to create a source in Materialize using the custom source materialization, which allows for injecting the complete source statement into your .sql file.

NOTE: To connect to a Kafka broker or PostgreSQL database, you first need to create a connection that specifies access and authentication parameters. Once created, a connection is reusable across multiple CREATE SOURCE statements. For more details on creating connections, check the CREATE CONNECTION documentation page.
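
For example, the connections referenced by the models below could be created ahead of time in a SQL client (host names, credentials, and secret names here are placeholders):

CREATE CONNECTION kafka_connection TO KAFKA (
    BROKER '<broker_host>:9092'
);

CREATE SECRET pg_password AS '<postgres_password>';

CREATE CONNECTION pg_connection TO POSTGRES (
    HOST '<postgres_host>',
    PORT 5432,
    USER '<postgres_user>',
    PASSWORD SECRET pg_password,
    DATABASE '<postgres_database>'
);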

Create a Kafka source.

Filename: sources/kafka_topic_a.sql

{{ config(materialized='source') }}

CREATE SOURCE IF NOT EXISTS {{ this }}
  FROM KAFKA CONNECTION kafka_connection (TOPIC 'topic_a')
  FORMAT TEXT

Create a PostgreSQL source.

Filename: sources/postgres.sql

{{ config(materialized='source',
    post_hook="CREATE VIEWS FROM SOURCE {{ this }} (
                table_a as {{ this.database }}.{{ this.schema }}.table_a,
                table_b as {{ this.database }}.{{ this.schema }}.table_b, ...)") }}

CREATE SOURCE IF NOT EXISTS {{ this }}
  FROM POSTGRES CONNECTION pg_connection (PUBLICATION 'mz_source')

The post-hook defined above is used to create the replication views that reproduce the publication’s original tables.

Sources are defined in .yml files nested under a sources: key.

sources:
  - name: postgres
    schema: "{{ target.schema }}"
    tables:
      - name: postgres_table_a
      - name: postgres_table_b
  - name: kafka
    schema: "{{ target.schema }}"
    tables:
      - name: kafka_topic_a

The sources above would be compiled to:

database.schema.postgres_table_a
database.schema.postgres_table_b
database.schema.kafka_topic_a

Views and materialized views

In dbt, a model is a SELECT statement that encapsulates a data transformation you want to run on top of your database.

When you use dbt with Materialize, your models stay up-to-date without manual or configured refreshes. This allows you to efficiently transform streaming data using the same thought process you’d use for batch transformations on top of any other database.

Views

dbt models are materialized as views by default, so to create a view in Materialize you can simply provide the SQL statement in the model (and skip the materialized configuration parameter).

Filename: models/view_a.sql

SELECT
    col_a, ...
FROM {{ source('kafka','kafka_topic_a') }}

The model above would be compiled to database.schema.view_a. One thing to note here is that the model depends on the Kafka source defined above. To express this dependency and track the lineage of your project, you can use the dbt source() function.

Materialized views

This is where Materialize goes beyond dbt’s incremental models (and traditional databases), with materialized views that continuously update as the underlying data changes:

Filename: models/materialized_view_a.sql

{{ config(materialized='materializedview') }}

SELECT
    col_a, ...
FROM {{ ref('view_a') }}

The model above would be compiled to database.schema.materialized_view_a. Here, the model depends on the view defined above, and is referenced as such via the dbt ref() function.
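
Under the hood, the adapter wraps the model's SELECT statement in the corresponding DDL, along these lines (illustrative; the exact statement is generated by dbt):

CREATE MATERIALIZED VIEW database.schema.materialized_view_a AS
SELECT
    col_a, ...
FROM database.schema.view_a;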

Configuration

The source, view, and materializedview materializations accept the following additional configuration options.

cluster

Use the cluster option to specify the cluster in which the materialization is created. If unspecified, the default cluster for the connection is used.

{{ config(materialized='materializedview', cluster='cluster_a') }}

indexes

Use the indexes option to define a list of indexes on a materialization. Each index specifies the columns to index and, optionally, the cluster in which it is created:

{{ config(materialized='view',
          indexes=[{'columns': ['col_a'], 'cluster': 'cluster_a'}]) }}
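
For reference, the configuration above corresponds roughly to the following Materialize DDL, using the view_a model as an example:

CREATE INDEX IN CLUSTER cluster_a ON database.schema.view_a (col_a);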

Build and run dbt

  1. Run the dbt models:

    dbt run
    

    This command generates executable SQL from the model files under the configured model-paths directory and runs it against the target environment. You can find the compiled statements under target/run and target/compiled in the dbt project folder.

  2. Using a new terminal window, connect to Materialize to double-check that all objects have been created:

    psql "postgres://<user>:<password>@<host>:6875/materialize"
    
    materialize=> SHOW SOURCES [FROM database.schema];
           name
    -------------------
     postgres_table_a
     postgres_table_b
     kafka_topic_a

    materialize=> SHOW VIEWS;
           name
    -------------------
     view_a

    materialize=> SHOW MATERIALIZED VIEWS;
           name
    -------------------
     materialized_view_a
    

That’s it! From here on, Materialize makes sure that your models are incrementally updated as new data streams in, and that you get fresh and correct results with millisecond latency whenever you query your views.

Test and document a dbt project

Continuous testing

Using dbt in a streaming context means that you’re able to run data quality and integrity tests non-stop, and monitor failures as soon as they happen. This is useful for unit testing during the development of your dbt models, and later in production to trigger real-time alerts downstream.

  1. To configure your project for continuous testing, add a tests property to dbt_project.yml with the store_failures configuration:

    tests:
      dbt_project.name:
        models:
          +store_failures: true
          +schema: 'etl_failure'
    

    This will instruct dbt to create a materialized view for each configured test that can keep track of failures over time. By default, test views are created in a schema suffixed with dbt_test__audit. To specify a custom suffix, use the schema config.

    Note: As an alternative, you can specify the --store-failures flag when running dbt test.

  2. Add tests to your models using the tests property in the model configuration .yml files:

    models:
      - name: materialized_view_a
        description: 'materialized view a description'
        columns:
          - name: col_a
            description: 'column a description'
            tests:
              - not_null
              - unique
    

    The type of test and the columns being tested are used as a base for naming the test materialized views. For example, the configuration above would create views named not_null_col_a and unique_col_a.

  3. Run the tests:

    dbt test
    

    When configured to store_failures, this command will create a materialized view for each test using the respective SELECT statements, instead of doing a one-off check for failures as part of its execution.

    This guarantees that your tests keep running in the background as views that are automatically updated as soon as an assertion fails. (A sketch of what such a test view computes follows these steps.)

  4. Using a new terminal window, connect to Materialize to double-check that the schema storing the tests has been created, as well as the test materialized views:

    psql "postgres://<user>:<password>@<host>:6875/materialize"
    
    materialize=> SHOW SCHEMAS;
           name
    -------------------
     public
     public_etl_failure

    materialize=> SHOW MATERIALIZED VIEWS FROM public_etl_failure;
           name
    -------------------
     not_null_col_a
     unique_col_a
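
For reference, each test materialized view stores the rows that currently violate its assertion. The not_null test on col_a, for instance, computes roughly the following query (illustrative; dbt generates the exact SQL):

SELECT col_a
FROM database.schema.materialized_view_a
WHERE col_a IS NULL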
    

With continuous testing in place, you can then build alerts off of the test materialized views using any common PostgreSQL-compatible client library and SUBSCRIBE (see the Python cheatsheet for a reference implementation).
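
For example, a client could listen for new failures as they happen (the schema and view names follow the example above):

-- Emits a change event whenever a row is added to (or retracted from) the test view
SUBSCRIBE (SELECT * FROM public_etl_failure.not_null_col_a);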

Documentation

dbt can automatically generate documentation for your project as a shareable website. This brings data governance to your streaming pipelines, speeding up life-saving processes like data discovery (where to find what data) and lineage (the path data takes from source(s) to sink(s), as well as the transformations that happen along the way).

If you’ve already created .yml files with helpful properties about your project resources (like model and column descriptions, or tests), you are all set.

  1. To generate documentation for your project, run:

    dbt docs generate
    

    dbt will grab any additional project information and Materialize catalog metadata, then compile it into .json files (manifest.json and catalog.json, respectively) that can be used to feed the documentation website. You can find the compiled files under the target/ directory in the dbt project folder.

  2. Launch the documentation website. By default, this command starts a web server on port 8000:

    dbt docs serve #--port <port>
    
  3. In a browser, navigate to localhost:8000. There, you can find an overview of your dbt project, browse existing models and metadata, and in general keep track of what’s going on.

    If you click View Lineage Graph in the lower right corner, you can even inspect the lineage of your streaming pipelines!

    dbt lineage graph