Operational guidelines

The following are general operational guidelines for production environments.

Clusters

Production clusters for production workloads only

Use production cluster(s) for production workloads only. That is, avoid using production cluster(s) to run development workloads or non-production tasks.

Cluster architecture

If feasible, use a three-tier architecture in production. A three-tier architecture consists of:

  • One or more dedicated clusters for sources.

  • One or more dedicated clusters for compute (e.g., materialized views).

  • One or more dedicated clusters for serving queries, including indexes.
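As a minimal sketch of the three tiers, the statements below create one cluster per tier and place a materialized view and an index on the compute and serving tiers. The cluster names, the `orders` relation, its columns, and the `'100cc'` size are hypothetical placeholders; valid size names depend on your deployment.

```sql
-- Hypothetical names and sizes; adjust to your environment.
CREATE CLUSTER ingest  (SIZE = '100cc');  -- tier 1: sources
CREATE CLUSTER compute (SIZE = '100cc');  -- tier 2: materialized views
CREATE CLUSTER serving (SIZE = '100cc');  -- tier 3: indexes and queries

-- Compute tier: maintain the view's results.
-- Assumes a relation named orders with customer_id and amount columns.
CREATE MATERIALIZED VIEW order_totals IN CLUSTER compute AS
  SELECT customer_id, sum(amount) AS total
  FROM orders
  GROUP BY customer_id;

-- Serving tier: index the view so queries on this cluster read from memory.
CREATE INDEX order_totals_idx IN CLUSTER serving ON order_totals (customer_id);
```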

Benefits of a three-tier architecture include:

Upsert source consideration

In addition:

  • Consider separating upsert sources from your other sources. Upsert sources have higher resource requirements because, for each upsert source, Materialize maintains every key and the last value for that key, and also performs deduplication. As such, if possible, use a separate source cluster for upsert sources.

  • Consider using a larger cluster size during snapshotting for upsert sources. Once the snapshotting operation is complete, you can downsize the cluster to align with the steady-state ingestion.
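The resize pattern for upsert sources might look like the following sketch. The cluster name, sizes, topic, and the `kafka_connection` connection are hypothetical and assumed to exist already; the key and value formats are illustrative only.

```sql
-- Start with a larger (hypothetical) size to absorb the snapshot.
CREATE CLUSTER upsert_ingest (SIZE = '400cc');

-- Assumes a Kafka connection named kafka_connection already exists.
CREATE SOURCE customer_profiles
  IN CLUSTER upsert_ingest
  FROM KAFKA CONNECTION kafka_connection (TOPIC 'customer-profiles')
  KEY FORMAT TEXT
  VALUE FORMAT JSON
  ENVELOPE UPSERT;

-- Once snapshotting is complete, downsize to match steady-state ingestion.
ALTER CLUSTER upsert_ingest SET (SIZE = '100cc');
```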

Alternatives

Alternatively, if a three-tier architecture is not feasible, or is unnecessary because of low volume or a non-production setup, a two-cluster or a single-cluster architecture may suffice.

A two-cluster architecture consists of:

  • A dedicated cluster for sources.

  • A dedicated cluster for both compute and serving queries.

Benefits of a two-cluster architecture include:

However, with a two-cluster architecture, compute and queries compete for the same cluster resources.

In a single-cluster architecture, one cluster hosts your sources, compute objects, and query serving.

Benefits of a single-cluster architecture include:

  • Cost-effective.

Limitations of a single-cluster architecture include:

  • Sources, compute objects, and queries compete for cluster resources.

  • Blue/green deployment is unsupported since sources would need to be dropped and recreated, putting strain on your upstream system during source recreation.

    To support blue/green deployments, use a two-cluster architecture by moving compute objects to a new cluster (i.e., recreating compute objects in a new cluster).
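Moving compute objects to a new cluster, as described above, could be sketched as follows. The cluster name, size, view name, and the `orders` relation are hypothetical, and this shows only the recreation step, not a complete blue/green workflow.

```sql
-- New (hypothetical) cluster that will host the recreated compute objects.
CREATE CLUSTER compute_green (SIZE = '100cc');

-- Recreate the compute object in the new cluster. The source it reads from
-- stays where it is, so the upstream system is not re-snapshotted.
CREATE MATERIALIZED VIEW order_totals_green IN CLUSTER compute_green AS
  SELECT customer_id, sum(amount) AS total
  FROM orders
  GROUP BY customer_id;

-- After the new view has hydrated and been validated, repoint readers to it
-- and then drop the old compute objects.
```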

Sources

Scheduling

If possible, schedule creating new sources during off-peak hours to mitigate the impact of snapshotting on both the upstream system and the Materialize cluster.

Separate cluster(s) for sources

In production:

  • If possible, use a dedicated cluster for sources; i.e., avoid putting sources on the same cluster that hosts compute objects or sinks, or that serves queries.

  • Separate upsert sources from other sources. Upsert sources have higher resource requirements (since, for upsert sources, Materialize maintains each key and the key’s last value as well as performs deduplication). As such, if possible, use a separate source cluster for upsert sources.
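For illustration, a source can be pinned to a dedicated cluster with the `IN CLUSTER` clause. In this sketch the cluster name, size, topic, and the `kafka_connection` connection are hypothetical; an upsert source would be created the same way but in its own cluster.

```sql
-- Dedicated (hypothetical) cluster that hosts only non-upsert sources.
CREATE CLUSTER ingest (SIZE = '100cc');

-- Assumes a Kafka connection named kafka_connection already exists.
CREATE SOURCE page_views
  IN CLUSTER ingest
  FROM KAFKA CONNECTION kafka_connection (TOPIC 'page-views')
  FORMAT JSON;
```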

See also Cluster architecture.

Sinks

Separate sinks from sources

Avoid putting sinks on the same cluster that hosts sources to allow for blue/green deployment.
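A sink can likewise be placed on its own cluster, away from sources. The sketch below assumes a hypothetical materialized view `order_totals` keyed by `customer_id` and an existing connection named `kafka_connection`; the cluster name, size, and topic are placeholders.

```sql
-- Hypothetical cluster that hosts sinks only.
CREATE CLUSTER sinks (SIZE = '100cc');

-- Assumes order_totals and kafka_connection already exist.
CREATE SINK order_totals_sink
  IN CLUSTER sinks
  FROM order_totals
  INTO KAFKA CONNECTION kafka_connection (TOPIC 'order-totals')
  KEY (customer_id)
  FORMAT JSON
  ENVELOPE UPSERT;
```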

See also Cluster architecture.

Snapshotting and hydration considerations

  • For upsert sources, snapshotting is a resource-intensive operation that can require a significant amount of CPU and memory.

  • During hydration (both initial and subsequent rehydrations), materialized views require memory proportional to both the input and output. When estimating required resources, consider both the hydration cost and the steady-state cost.

  • During initial hydration, sinks need to load an entire snapshot of the data into memory.
