Development workflows
As you progress from development to production, deploying changes to Materialize requires different workflows. This page provides an overview of best practices and deployment patterns across the different stages you will progress through using dbt as your deployment tool.
Development
When you’re prototyping your use case and fine-tuning the underlying data model,
your priority is iteration speed. dbt has many features that can help speed
up development, like node selection and model preview.
Before you start, we recommend getting familiar with how these features
work with the dbt-materialize
adapter to make the most of your development
time.
Node selection
By default, the dbt-materialize
adapter drops and recreates all models on
each dbt run
invocation. This can have unintended consequences, in particular
if you’re managing sources and sinks as models in your dbt project. dbt allows
you to selectively run specific models and exclude specific materialization
types from each run using node selection.
Exclude sources and sinks
You can manually exclude specific materialization types using the
exclude
flag in
your dbt run invocations. To exclude sources and sinks, use:
dbt run --exclude materialized:source materialized:sink
YAML selectors
Instead of manually specifying node selection on each run, you can create a YAML selector that makes this the default behavior when running dbt:
# YAML selectors should be defined in a top-level file named selectors.yml
selectors:
- name: exclude_sources_and_sinks
description: >
Exclude models that use source or sink materializations in the command
invocation.
default: true
definition:
union:
# The fqn method combined with the "*" operator selects all nodes in the
# dbt graph
- method: fqn
value: "*"
- exclude:
- 'config.materialized:source'
- 'config.materialized:sink'
Because default: true
is specified, dbt will use the selector’s criteria
whenever you run an unqualified command (e.g. dbt build
, dbt run
). You can
still override this default by adding selection criteria to commands, or adjust
the value of default
depending on the target environment. To learn more about
using the default
and exclude
properties with YAML selectors, check the
dbt documentation.
Run a subset of models
You can run individual models, or groups of models, using the select
flag
in your dbt run invocations:
dbt run --select "my_dbt_project_name" # runs all models in your project
dbt run --select "my_dbt_model" # runs a specific model
dbt run --select "my_model+" # select my_model and all downstream dependencies
dbt run --select "path.to.my.models" # runs all models in a specific directory
dbt run --select "my_package.some_model" # runs a specific model in a specific package
dbt run --select "tag:nightly" # runs models with the "nightly" tag
dbt run --select "path/to/models" # runs models contained in path/to/models
dbt run --select "path/to/my_model.sql" # runs a specific model by its path
For a full rundown of selection logic options, check the dbt documentation.
Preview model results
dbt show
command uses a LIMIT
clause under the hood, which has
known performance limitations
in Materialize.
To debug and preview the results of your models without materializing the
results, you can use the dbt show
command:
dbt show --select "model_name.sql"
23:02:20 Running with dbt=1.7.7
23:02:20 Registered adapter: materialize=1.7.3
23:02:20 Found 3 models, 1 test, 4 seeds, 1 source, 0 exposures, 0 metrics, 430 macros, 0 groups, 0 semantic models
23:02:20
23:02:23 Previewing node 'model_name':
| col |
| -------------------- |
| value1 |
| value2 |
| value3 |
| value4 |
| value5 |
By default, the dbt show
command will return the first 5 rows from the query
result (i.e. LIMIT 5
). You can adjust the number of rows returned using the
--limit n
flag.
It’s important to note that previewing results compiles the model and runs the
compiled SQL against Materialize; it doesn’t query the already-materialized
database relation (see dbt-core
#7391).
Deployment
Once your dbt project is ready to move out of development, or as soon as you start managing multiple users and deployment environments, we recommend checking the code in to version control and setting up an automated workflow to control the deployment of changes.
Depending on the environment context and your deployment requirements, there are two patterns that help optimize and automate your dbt runs: blue/green deployments and slim deployments.
Blue/green deployments
The dbt-materialize
adapter ships with helper macros to automate blue/green
deployments. We recommend using the blue/green pattern any time you need to
deploy changes to the definition of objects in Materialize in production
environments and can’t tolerate downtime.
For development environments with no downtime considerations, you might prefer to use the slim deployment pattern instead for quicker iteration and reduced CI costs.
Configuration and initialization
In a blue/green deployment, you first deploy your code changes to a deployment environment (“green”) that is a clone of your production environment (“blue”), in order to validate the changes without causing unavailability. These environments are later swapped transparently.
-
In
dbt_project.yml
, use thedeployment
variable to specify the cluster(s) and schema(s) that contain the changes you want to deploy.vars: deployment: default: clusters: # To specify multiple clusters, use [<cluster1_name>, <cluster2_name>]. - <cluster_name> schemas: # to specify multiple schemas, use [<schema1_name>, <schema2_name>]. - <schema_name>
-
Use the
run-operation
command to invoke thedeploy_init
macro:dbt run-operation deploy_init
This macro spins up a new cluster named
<cluster_name>_dbt_deploy
and a new schema named<schema_name>_dbt_deploy
using the same configuration as the current environment to swap with (including privileges). -
Run the dbt project containing the code changes against the new deployment environment.
dbt run --vars 'deploy: True'
The
deploy: True
variable instructs the adapter to append_dbt_deploy
to the original schema or cluster specified for each model scoped for deployment, which transparently handles running that subset of models against the deployment environment.
Validation
We strongly recommend validating the results of the deployed changes on the deployment environment to ensure it’s safe to cutover.
-
After deploying the changes, the objects in the deployment cluster need to fully hydrate before you can safely cut over. Use the
run-operation
command to invoke thedeploy_await
macro, which periodically polls the cluster readiness status, and waits for all objects to have a lag of less than 1 second to return successfully.dbt run-operation deploy_await
The default poll interval is
15
seconds, but can be configured to a different value by passing thepoll_interval=<value>
argument in the macro invocation. -
Once
deploy_await
returns successfully, you can manually run tests against the new deployment environment to validate the results.
Cutover and cleanup
-
Once
deploy_await
returns successfully and you have validated the results of the deployed changes on the deployment environment, it is safe to push the changes to your production environment.Use the
run-operation
command to invoke thedeploy_promote
macro, which (atomically) swaps the environments.dbt run-operation deploy_promote
NOTE: Thedeploy_promote
operation might fail if objects are concurrently modified by a different session. If this occurs, re-run the operation.This macro ensures all deployment targets, including schemas and clusters, are deployed together as a single atomic operation. If any part of the deployment fails, the entire deployment is rolled back to guarantee consistency and prevent partial updates.
-
Use the run
run-operation
command to invoke thedeploy_cleanup
macro, which (cascade) drops the_dbt_deploy
-suffixed cluster(s) and schema(s):dbt run-operation deploy_cleanup
NOTE: Any activeSUBSCRIBE
commands attached to the swapped cluster(s) will break. On retry, the client will automatically connect to the newly deployed cluster
Slim deployments
On each run, dbt generates artifacts
with metadata about your dbt project, including the manifest file
(manifest.json
). This file contains a complete representation of the latest
state of your project, and you can use it to avoid re-deploying resources
that didn’t change since the last run.
We recommend using the slim deployment pattern when you want to reduce development idle time and CI costs in development environments. For production deployments, you should prefer the blue/green deployment pattern.
-
Fetch the production
manifest.json
file into the CI environment:- name: Download production manifest from s3 env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }} AWS_REGION: us-east-1 run: | aws s3 cp s3://mz-test-dbt/manifest.json ./manifest.json
-
Then, instruct dbt to run and test changed models and dependencies only:
- name: Build dbt env: MZ_HOST: ${{ secrets.MZ_HOST }} MZ_USER: ${{ secrets.MZ_USER }} MZ_PASSWORD: ${{ secrets.MZ_PASSWORD }} CI_TAG: "${{ format('{0}_{1}', 'gh_ci', github.event.number ) }}" run: | source .venv/bin/activate dbt run-operation drop_environment dbt build --profiles-dir ./ --select state:modified+ --state ./ --target production
In the example above,
--select state:modified+
instructs dbt to run all models that were modified (state:modified
) and their downstream dependencies (+
). Depending on your deployment requirements, you might want to use a different combination of state selectors, or go a step further and use the--defer
flag to reduce even more the number of models that need to be rebuilt. For a full rundown of the available state modifier and graph operator options, check the dbt documentation. -
Every time you deploy to production, upload the new
manifest.json
file to blob storage (e.g. s3):- name: upload new manifest to s3 env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }} AWS_REGION: us-east-1 run: | aws s3 cp ./target/manifest.json s3://mz-test-dbt
Other deployment tools
As a tool primarily meant to manage your data model, the dbt-materialize
adapter does not expose all Materialize objects types. If there is a clear
separation between data modeling and infrastructure management ownership
in your team, and you want to manage objects like clusters,
connections, or secrets as code,
we recommend using the Materialize Terraform provider as a
complementary deployment tool.