Managing streaming analytics pipelines with dbt

Let's explore a hands-on example where we use dbt (data build tool) to manage and document a streaming analytics workflow from a message broker to Metabase.

Marta Paes
Marta PaesDeveloper Experience

This blogpost was originally contributed to the Metabase community blog.

In a previous post, we broke down how to build a streaming analytics pipeline with Materialize and Metabase. The TL;DR is that you can write some SQL against streaming data sources, let Materialize efficiently maintain your results up-to-date as new data arrives, and keep your dashboards light and fresh.

This time around, we’re going to explore how to use dbt to manage and document this workflow end-to-end.

Transforming streaming data with dbt

As much as you want to run your analytics non-stop, one thing you probably don’t want is to let go of the tools that make you productive in batch. What if you could manage your streaming analytics pipelines using the exact same tooling?

Although dbt was built with batch in mind, it’s flexible enough as a framework to serve as a unified transformation layer on top of both batch and streaming backends; as long as that backend is SQL-based, what’s running under the covers becomes an implementation detail.

We’ve built the dbt-materialize adapter to bring streaming transformations (and beyond) to dbt. If this looks familiar…


{{ config(
    materialized ='materializedview'
) }}
SELECT fi.icao24,
       manufacturername,
       model,
       operator,
       origin_country,
       time_position,
       longitude,
       latitude
FROM {{ ref('stg_flight_information') }} fi
JOIN {{ ref('stg_icao_mapping') }} icao ON fi.icao24 = icao.icao24

…that’s all there is: you define your business logic as dbt models using SQL and some Jinja, deploy the pipeline (once), and Materialize keeps things up and running for you. For use cases that would otherwise require you to redeploy your models multiple times a day ($$), maintain complex incremental logic and make some serious trade-offs to optimize for speed (like, say, correctness), using a dedicated streaming database like Materialize can take you further, faster.

Documenting streaming analytics pipelines

More than standardizing how we reason about and manage analytics workflows, dbt made documentation cool again (well, was it ever really cool before dbt?). With a few YAML files, you can bring data governance to your streaming pipelines, speeding up life-saving processes like data discovery and lineage, and even ensure that your metrics stay put. If the tendency is to treat your dbt projects as the source of truth for business logic and documentation, shouldn’t your BI tool just…get in sync?

How to sync dbt and Metabase

The dbt-metabase plugin is a great starting point if you’re looking to bring dbt and Metabase closer together. For example, you can use it to:

Document Metabase items as dbt exposures

Getting a good understanding of the end-to-end dependencies of a dbt project requires also being able to track dependencies that are external to it, like Metabase questions and dashboards. One way to track these dependencies in the DAG is to declare any use of models downstream as exposures (notice the orange node!), which you can generate automatically using the plugin:

Propagate dbt metadata to the Metabase data model

If you’re already documenting your data model in dbt, the plugin also allows you to derive the Metabase data model from existing model properties and configurations, like table and column descriptions, semantic type definitions and other useful metadata that helps create a shared context between data producers and consumers:

There’s a lot more to explore when it comes to managing streaming analytics pipelines with dbt, so if you’re curious to get your hands down, check out the sample demo used to create the examples above and the Materialize+dbt integration guide.

More Articles

Technical Article

Real-time data quality tests using dbt and Materialize

In traditional databases, a SQL query used as a test runs as a point-in-time check. In streaming, the same query can run continually as data changes, creating a SQL-based data monitoring primitive.

Anna Glander

Jul 14, 2022

Technical Article

Virtual Time: The Secret to Strong Consistency and Scalable Performance in Materialize

The key to Materialize's ability to separate compute from storage and scale horizontally without sacrificing consistency is a concept called virtual time.

Frank McSherry

Jun 14, 2022

Technical Article

Let’s talk about Data Apps

What is a Data Application? How do they help our customers? What new challenges do we face when building Data Apps? Here's our perspective.

Joaquin Colacci

Jun 9, 2022

Join the Materialize Community

Join hundreds of other Materialize users and connect directly with our engineers.

Join the Community

© 2022 Materialize, Inc. Terms of Service