Materialize Documentation
Join the Community github/materialize

dbt and Materialize

NOTE: The dbt-materialize adapter can only be used with dbt Core. We are working with the dbt community to bring native Materialize support to dbt Cloud!

dbt has become the standard for data transformation (“the T in ELT”). It combines the accessibility of SQL with software engineering best practices, allowing you to not only build reliable data pipelines, but also document, test and version-control them.

While dbt is a great fit for batch transformations, it can only approximate transforming streaming data (officially through incremental models, and unofficially through lambda views). In this guide, we’ll cover how to use dbt and Materialize to transform streaming data in real time.

Setup

Minimum requirements: dbt v0.18.1+

Setting up a dbt project with Materialize is similar to setting it up with any other database that requires a non-native adapter. To get up and running, you need to:

  1. Install dbt by following the instructions in the official documentation.

    Note: If you’re completely new to dbt, we also recommend going through their getting started guide to get familiar with the CLI and core concepts.

  2. Install the dbt-materialize plugin (optionally using a virtual environment):

    python3 -m venv dbt-venv         # create the virtual environment
    source dbt-venv/bin/activate     # activate the virtual environment
    pip install dbt-materialize      # install the adapter
    

    To check that the plugin was successfully installed, run:

    dbt --version
    

    materialize should be listed under “Plugins”. If this is not the case, double-check that the virtual environment is activated!

  3. Last but not least, make sure you have Materialize installed and running. You can also use the dbt-materialize adapter with Materialize Cloud.

Create and configure a dbt project

A dbt project is a directory that contains all dbt needs to run and keep track of your transformations. At a minimum, it must have a project file (dbt_project.yml) and at least one model (.sql).

To create a new project, run:

dbt init <project_name>

This command will bootstrap a starter project with default configurations and create a profiles.yml file, if it doesn’t exist.

Connect to Materialize

dbt manages all your connection configurations (or, profiles) in a file called profiles.yml. By default, this file is located under ~/.dbt/.

  1. Locate the profiles.yml file in your machine:

    dbt debug --config-dir
    

    Note: If you started from an existing project but it’s your first time setting up dbt, it’s possible that this file doesn’t exist yet. You can manually create it in the suggested location.

  2. Open profiles.yml and adapt it to connect to your Materialize instance using the reference profile configuration.

    As an example, the following profile would allow you to connect to Materialize in two different environments: an instance running locally (dev) and a Materialize Cloud instance (prod).

    default:
      outputs:
    
        dev:
          type: materialize
          threads: 1
          host: localhost
          port: 6875
          user: materialize
          pass: password
          dbname: materialize
          schema: public
    
        prod:
          type: materialize
          threads: 1
          host: instance.materialize.cloud
          port: 6875
          user: materialize
          pass: password
          dbname: materialize
          schema: analytics
          sslmode: verify-ca
          sslcert: materialize.crt
          sslkey: materialize.key
          sslrootcert: ca.crt
    
      target: dev
    

    The target parameter allows you to configure the target environment that dbt will use to run your models.

  3. To test the connection to Materialize, run:

    dbt debug
    

    If the output reads All checks passed!, you’re good to go! The dbt documentation has some helpful pointers in case you run into errors.

Build and run dbt models

In dbt, a model is a SELECT statement that encapsulates a data transformation you want to run on top of your database. For dbt to know how to persist (or not) a transformation, the model needs to be associated with a materialization strategy.

dbt models

When you use dbt with Materialize, your models stay up-to-date without manual or configured refreshes. This allows you to efficiently transform streaming data using the same thought process you’d use for batch transformations on top of any other database.

  1. Create a model for each SQL statement you’re planning to deploy. Each individual model should be stored as a .sql file under the directory defined by source-paths in dbt_project.yml.

    As an example, we’ll use the SQL statements in our getting started guide and re-write them as dbt models.

    Creating a source

    You can instruct dbt to create a source in Materialize using the custom source materialization:

    {{ config(materialized='source') }}
    
    {% set source_name %}
        {{ mz_generate_name('market_orders_raw') }}
    {% endset %}
    
    CREATE SOURCE {{ source_name }}
    FROM PUBNUB
    SUBSCRIBE KEY 'sub-c-4377ab04-f100-11e3-bffd-02ee2ddab7fe'
    CHANNEL 'pubnub-market-orders'
    

    The mz_generate_name macro allows you to generate a fully-qualified name from a base object name. Here, source_name would be compiled to materialize.public.market_orders_raw.

    Creating a view

    dbt models are materialized as views by default, so to create a view in Materialize you can simply provide the SQL statement in the model (and skip the materialized configuration parameter):

    SELECT
        ((text::jsonb)->>'bid_price')::float AS bid_price,
        (text::jsonb)->>'order_quantity' AS order_quantity,
        (text::jsonb)->>'symbol' AS symbol,
        (text::jsonb)->>'trade_type' AS trade_type,
        to_timestamp(((text::jsonb)->'timestamp')::bigint) AS ts
    FROM {{ source('public','market_orders_raw') }}
    

    One thing to note here is that the model depends on the source defined in the previous step. To express this dependency and track the lineage of your project, you can use the dbt source() and ref() functions.

    Creating a materialized view

    This is where Materialize goes beyond dbt’s incremental models (and traditional databases), with materialized views that continuously update as the underlying data changes:

    {{ config(materialized='materializedview') }}
    
    SELECT symbol,
           AVG(bid_price) AS avg
    FROM {{ ref('market_orders') }}
    GROUP BY symbol
    

    When should you use what? We recommend using materializedview models exclusively for your core business logic to ensure that you’re not consuming more memory than needed in Materialize. Intermediate or staging views should use the view materialization type instead.

  2. Run the dbt models:

    dbt run
    

    This command generates executable SQL code from any model files under the specified directory and runs it in the target environment. You can find the compiled statements under /target/run and target/compiled in the dbt project folder.

  3. Using a new terminal window, connect to Materialize to double-check that all objects have been created:

    psql -U materialize -h localhost -p 6875 materialize
    
    materialize=> SHOW SOURCES;
    
           name
    -------------------
     market_orders_raw
    
     materialize=> SHOW VIEWS;
    
           name
    -------------------
     avg_bid
     market_orders
    

That’s it! From here on, Materialize makes sure that your models are incrementally updated as new data streams in, and that you get fresh and correct results with millisecond latency whenever you query your views.

Materializations

dbt models are materialized as views by default, but can be configured to use a different materialization type through the materialized configuration parameter. This parameter can be set directly in the model file using:

{{ config(materialized='materializedview') }}

Because Materialize is optimized for real-time transformations of streaming data and the core of dbt is built around batch, the dbt-materialize adapter implements a few custom materialization types:

Type Details
source Creates a source.
view Creates a view.
materializedview Creates a materialized view.
table Creates a materialized view (actual table support pending #5266).
index Creates an index.
sink Creates a sink.
ephemeral Executes queries using CTEs.

Document and test a dbt project

dbt can automatically generate documentation for your project as a shareable website. This brings data governance to your streaming pipelines, speeding up life-saving processes like data discovery (where to find what data) and lineage (the path data takes from source(s) to sink(s), as well as the transformations that happen along the way).

  1. Optionally, create a .yml file with helpful properties about your project resources (like model and column descriptions, or tests) and add it to directory where your models live:

    version: 2
    
    sources:
      - name: public
        description: "Public schema"
        tables:
          - name: market_orders_raw
            description: "Market order data source (PubNub)"
    models:
      - name: market_orders
        description: "Converts market order data to proper data types"
    
      - name: avg_bid
        description: "Computes the average bid price"
        columns:
          - name: symbol
            description: "The stock ticker"
            tests:
              - not_null
          - name: avg
            description: "The average bid price"
    
  2. To generate documentation for your project, run:

    dbt docs generate
    

    dbt will grab any additional project information and Materialize catalog metadata, then compile it into .json files (manifest.json and catalog.json, respectively) that can be used to feed the documentation website. You can find the compiled files under /target, in the dbt project folder.

  3. Launch the documentation website. By default, this command starts a web server on port 8000:

    dbt docs serve #--port <port>
    
  4. In a browser, navigate to localhost:8000. There, you can find an overview of your dbt project, browse existing models and metadata, and in general keep track of what’s going on.

    If you click View Lineage Graph in the lower right corner, you can even inspect the lineage of your streaming pipelines!

    dbt lineage graph

  5. Finally, run some tests on your models:

    dbt test
    

    Under the hood, dbt creates a SELECT query for each test that returns the rows where this assertion is not true; if the test returns zero rows, the assertion passes.

Did this info help?
Yes No