Clusters
Overview
Clusters are pools of compute resources (CPU, memory, and scratch disk space) for running your workloads.
The following operations require compute resources in Materialize, and so need to be associated with a cluster:
- Maintaining sources and sinks.
- Maintaining indexes and materialized views.
- Executing SELECT and SUBSCRIBE statements.
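As a rough sketch of how these operations get tied to a cluster, the statements below assume that a cluster named prod already exists and that there is a table named orders with a status column. Both names are hypothetical and only serve the illustration.

```sql
-- Assumption: a cluster `prod` and a table `orders(status, ...)` already exist.

-- Maintain a materialized view using the compute resources of `prod`.
CREATE MATERIALIZED VIEW order_counts
IN CLUSTER prod AS
SELECT status, count(*) AS n
FROM orders
GROUP BY status;

-- Route this session's SELECT and SUBSCRIBE statements to `prod`.
SET cluster = prod;

SELECT * FROM order_counts;
```

Objects are bound to a cluster when they are created, while ad hoc queries run on whichever cluster is active for the session.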
Resource isolation
Clusters provide resource isolation. Each cluster provisions dedicated compute resources and can fail independently from other clusters.
Workloads on different clusters are strictly isolated from one another. That is, a given workload has access only to the CPU, memory, and scratch disk of the cluster that it is running on. All workloads on a given cluster compete for access to that cluster’s compute resources.
Best practices
- Use clusters to isolate different classes of workloads. For example, you could place your development workloads in a cluster named dev and your production workloads in a cluster named prod.
- Use different clusters to separate sources from sinks. That is, avoid placing sources and sinks in the same cluster.
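The practices above could be sketched as follows. The cluster names ingest and egress and all of the sizes are assumptions chosen for illustration, not recommendations.

```sql
-- One cluster per class of workload (sizes are placeholders).
CREATE CLUSTER dev (SIZE = '25cc');
CREATE CLUSTER prod (SIZE = '100cc');

-- Separate clusters for sources and for sinks (hypothetical names).
CREATE CLUSTER ingest (SIZE = '50cc');
CREATE CLUSTER egress (SIZE = '50cc');
```

Sources and sinks can then be placed on their respective clusters with the IN CLUSTER clause of CREATE SOURCE and CREATE SINK.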
Fault tolerance
The replication factor of a cluster determines the number of replicas provisioned for the cluster. Each replica of the cluster provisions a new pool of compute resources to perform exactly the same work on exactly the same data.
Provisioning more than one replica for a cluster improves fault tolerance. Clusters with multiple replicas can tolerate failures of the underlying hardware that cause a replica to become unreachable. As long as one replica of the cluster remains available, the cluster can continue to maintain dataflows and serve queries.
- Increasing the replication factor does not increase the cluster’s work capacity. Replicas are exact copies of one another: each replica must do exactly the same work as all the other replicas of the cluster (i.e., maintain the same dataflows and process the same queries). To increase the capacity of a cluster, you must increase its size.
- See also Usage.
Materialize automatically assigns names to replicas (e.g., r1, r2). You can view information about individual replicas in the Materialize console and the system catalog.
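A minimal sketch of setting the replication factor and then inspecting the resulting replicas, assuming a hypothetical cluster named serving and a placeholder size:

```sql
-- Provision two replicas; each one does the same work on the same data.
CREATE CLUSTER serving (SIZE = '100cc', REPLICATION FACTOR = 2);

-- Adjust the replication factor later. This changes fault tolerance,
-- not the cluster's work capacity.
ALTER CLUSTER serving SET (REPLICATION FACTOR 3);

-- List the automatically named replicas (e.g., r1, r2, r3).
SHOW CLUSTER REPLICAS WHERE cluster = 'serving';
```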
Cluster sizing
When creating a cluster, you must choose its size (e.g., 25cc, 50cc, 100cc), which determines its resource allocation (CPU, memory, and scratch disk space).
For self-managed Materialize, the cluster sizes are configured with the following default resource allocations:
Size | Scale | CPU Limit | Disk Limit | Memory Limit |
---|---|---|---|---|
25cc | 1 | 0.5 | 7762MiB | 3881MiB |
50cc | 1 | 1 | 15525MiB | 7762MiB |
100cc | 1 | 2 | 31050MiB | 15525MiB |
200cc | 1 | 4 | 62100MiB | 31050MiB |
300cc | 1 | 6 | 93150MiB | 46575MiB |
400cc | 1 | 8 | 124201MiB | 62100MiB |
600cc | 1 | 12 | 186301MiB | 93150MiB |
800cc | 1 | 16 | 248402MiB | 124201MiB |
1200cc | 1 | 24 | 372603MiB | 186301MiB |
1600cc | 1 | 31 | 481280MiB | 240640MiB |
3200cc | 1 | 62 | 962560MiB | 481280MiB |
6400cc | 2 | 62 | 962560MiB | 481280MiB |
See the mz_cluster_replica_sizes system catalog table for the specific resource allocations.
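For instance, the allocations configured for your deployment can be listed with a plain query against that table. The schema qualifier mz_catalog is an assumption about where the table lives in your version.

```sql
-- Show every available cluster size and its resource allocation.
SELECT * FROM mz_catalog.mz_cluster_replica_sizes;
```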
The appropriate size for a cluster depends on the resource requirements of your workload. Larger clusters have more compute resources available and can therefore process data faster and handle larger data volumes.
As your workload changes, you can resize a cluster. Depending on the type of objects in the cluster, this operation might incur downtime. See Resizing downtime for more details.
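A resize might look like the following sketch, assuming a hypothetical cluster named prod and a placeholder target size:

```sql
-- Move `prod` to a larger size. Depending on the objects in the cluster,
-- this may incur downtime.
ALTER CLUSTER prod SET (SIZE '200cc');
```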