Important: This blog post was originally contributed to the Metabase community blog.
In a previous post, we broke down how to build a streaming analytics pipeline with Materialize and Metabase. The TL;DR is that you can write some SQL against streaming data sources, let Materialize efficiently keep your results up to date as new data arrives, and keep your dashboards light and fresh.
This time around, we’re going to explore using dbt to manage and document this workflow end-to-end.
Transforming streaming data with dbt
As much as you want to run your analytics non-stop, one thing you probably don’t want is to let go of the tools that make you productive in batch. What if you could manage your streaming analytics pipelines using the exact same tooling?
Although dbt was built with batch in mind, it’s flexible enough as a framework to serve as a unified transformation layer on top of both batch and streaming backends; as long as that backend is SQL-based, what’s running under the covers becomes an implementation detail.
We’ve built the dbt-materialize adapter to bring streaming transformations (and beyond) to dbt. If this looks familiar…
{{ config(
    materialized = 'materializedview'
) }}

SELECT fi.icao24,
       manufacturername,
       model,
       operator,
       origin_country,
       time_position,
       longitude,
       latitude
FROM {{ ref('stg_flight_information') }} fi
JOIN {{ ref('stg_icao_mapping') }} icao ON fi.icao24 = icao.icao24
…that’s all there is: you define your business logic as dbt models using SQL and some Jinja, deploy the pipeline (once), and Materialize keeps things up and running for you. For use cases that would otherwise require you to redeploy your models multiple times a day ($$), maintain complex incremental logic, and make some serious trade-offs to optimize for speed (like, say, correctness), using a dedicated operational data warehouse like Materialize can take you further, faster.
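To make "deploy the pipeline (once)" concrete, here's a minimal sketch of a connection profile for the dbt-materialize adapter. The profile name, host, and credentials are placeholders, and newer adapter versions may support additional settings, so check the integration guide for your setup:

# profiles.yml — placeholder values for a local Materialize instance
materialize_demo:
  target: dev
  outputs:
    dev:
      type: materialize
      host: localhost
      port: 6875
      user: materialize
      pass: ""
      dbname: materialize
      schema: public

With the profile in place, a single dbt run deploys the models, and Materialize takes over keeping the results fresh; there's no scheduler to babysit.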
Documenting dbt streaming analytics pipelines
More than standardizing how we reason about and manage analytics workflows, dbt made documentation cool. With a few YAML files, you can bring data governance to your streaming pipelines, speeding up life-saving processes like data discovery and lineage and even ensuring that your metrics stay put. If the tendency is to treat your dbt projects as the source of truth for business logic and documentation, shouldn’t your BI tool just… get in sync?
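As a rough sketch of what one of those YAML files looks like, the properties file below documents the flight-tracking model from the example above; the model name and descriptions are illustrative rather than lifted from the sample project:

# models/schema.yml — illustrative documentation for the model shown earlier
version: 2

models:
  - name: flight_information_enriched   # placeholder model name
    description: "Flight positions enriched with aircraft metadata, kept up to date by Materialize."
    columns:
      - name: icao24
        description: "Unique ICAO 24-bit address of the aircraft transponder."
      - name: origin_country
        description: "Country the aircraft is registered in."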
How to sync dbt and Metabase
The dbt-metabase plugin is a great starting point if you’re looking to bring dbt and Metabase closer together. For example, you can use it to:
Document Metabase items as dbt exposures
Getting a good understanding of the end-to-end dependencies of a dbt project requires also being able to track dependencies that are external to it, like Metabase questions and dashboards. One way to track these dependencies in the DAG is to declare any use of models downstream as exposures (notice the orange node!), which you can generate automatically using the plugin:
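For reference, an exposure generated this way might look roughly like the snippet below; the dashboard name, URL, owner, and referenced model are placeholders, and the exact fields depend on the plugin version you're running:

# models/exposures.yml — illustrative output of dbt-metabase exposure extraction
version: 2

exposures:
  - name: flight_tracking_dashboard   # placeholder dashboard name
    type: dashboard
    maturity: medium
    url: https://metabase.example.com/dashboard/1
    description: "Live flight-tracking dashboard built in Metabase."
    depends_on:
      - ref('flight_information_enriched')
    owner:
      name: Data Team
      email: data@example.com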
Propagate dbt metadata to the Metabase data model
If you’re already documenting your data model in dbt, the plugin also allows you to derive the Metabase data model from existing model properties and configurations, like table and column descriptions, semantic type definitions, and other useful metadata that helps create a shared context between data producers and consumers:
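As a hedged example, the snippet below adds Metabase-specific metadata to two columns of the earlier model; the metabase.semantic_type meta key follows the dbt-metabase documentation, but double-check the key names against the plugin version you're running (older releases used metabase.special_type):

# models/schema.yml — column metadata the plugin can push to Metabase
version: 2

models:
  - name: flight_information_enriched   # placeholder model name
    columns:
      - name: latitude
        description: "Last reported latitude of the aircraft."
        meta:
          metabase.semantic_type: type/Latitude
      - name: longitude
        description: "Last reported longitude of the aircraft."
        meta:
          metabase.semantic_type: type/Longitude

Once those properties exist, a single invocation of the plugin (its models command; see the dbt-metabase README for the exact flags) propagates them to the connected Metabase instance, so descriptions and semantic types show up right next to the data people are already querying.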
There’s a lot more to explore when it comes to managing streaming analytics pipelines with dbt
If you’re curious to get your hands dirty, check out the sample demo used to create the examples above and the Materialize+dbt integration guide.