CREATE CLUSTER
CREATE CLUSTER creates a new cluster.
Conceptual framework
A cluster is a pool of compute resources (CPU, memory, and, optionally, scratch disk space) for running your workloads.
The following operations require compute resources in Materialize, and so need to be associated with a cluster:
- Executing SELECT and SUBSCRIBE statements.
- Maintaining indexes and materialized views.
- Maintaining sources and sinks.
Syntax
Options
Field | Value | Description |
---|---|---|
SIZE | text | The size of the resource allocations for the cluster. See Size for details. |
REPLICATION FACTOR | text | The number of replicas to provision for the cluster. See Replication factor for details. Default: 1 |
INTROSPECTION INTERVAL | interval | The interval at which to collect introspection data. See Troubleshooting for details about introspection data. The special value 0 entirely disables the gathering of introspection data. Default: 1s |
INTROSPECTION DEBUGGING | bool | Indicates whether to introspect the gathering of the introspection data. Default: FALSE |
MANAGED | bool | Whether to automatically manage the cluster’s replicas based on the configured size and replication factor. If FALSE, enables the use of the deprecated CREATE CLUSTER REPLICA command. Default: TRUE |
Details
Initial state
Each Materialize region initially contains a pre-installed cluster named quickstart with a size of 25cc and a replication factor of 1. You can drop or alter this cluster to suit your needs.
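As a sketch, inspecting and resizing the pre-installed cluster might look like the following. The target size is an arbitrary example, and SHOW CLUSTERS is assumed from Materialize's family of SHOW commands rather than described on this page.

```sql
-- List the clusters in the current region, including quickstart.
SHOW CLUSTERS;

-- Resize quickstart; '100cc' is only an example target size.
ALTER CLUSTER quickstart SET (SIZE = '100cc');
```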
Choosing a cluster
Every operation that requires a cluster runs on a specific cluster. If you do not explicitly name a cluster, Materialize uses your session’s active cluster.
To show your session’s active cluster, use the SHOW command:
SHOW cluster;
To switch your session’s active cluster, use the SET command:
SET cluster = other_cluster;
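The sketch below contrasts naming a cluster explicitly with relying on the session's active cluster. It assumes the IN CLUSTER clause accepted by CREATE INDEX and CREATE MATERIALIZED VIEW (documented on those statements' pages, not here); the table t, its column k, and the object names are hypothetical.

```sql
-- Explicitly place an index and a materialized view on other_cluster
-- (t and k are hypothetical names).
CREATE INDEX t_k_idx IN CLUSTER other_cluster ON t (k);
CREATE MATERIALIZED VIEW t_count IN CLUSTER other_cluster AS
    SELECT count(*) AS n FROM t;

-- This query runs on whichever cluster is active for the session.
SET cluster = other_cluster;
SELECT n FROM t_count;
```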
Resource isolation
Clusters provide resource isolation. Each cluster provisions a dedicated pool of CPU, memory, and, optionally, scratch disk space.
All workloads on a given cluster will compete for access to these compute resources. However, workloads on different clusters are strictly isolated from one another. A given workload has access only to the CPU, memory, and scratch disk of the cluster that it is running on.
Clusters are commonly used to isolate different classes of workloads. For example, you could place your development workloads in a cluster named dev and your production workloads in a cluster named prod.
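A minimal sketch of that dev/prod split follows. The sizes are illustrative assumptions, not recommendations; the two-replica prod cluster assumes it hosts no sources or sinks, which are limited to a replication factor of 0 or 1 (see Known limitations).

```sql
-- Small single-replica cluster for development work (size is illustrative).
CREATE CLUSTER dev (SIZE = '100cc');

-- Larger cluster with two replicas for fault tolerance in production.
-- Assumes prod hosts only indexes and materialized views, not sources or sinks.
CREATE CLUSTER prod (SIZE = '400cc', REPLICATION FACTOR = 2);
```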
Size
The SIZE option determines the amount of compute resources (CPU, memory, and disk) available to the cluster. Valid sizes are:
25cc
50cc
100cc
200cc
300cc
400cc
600cc
800cc
1200cc
1600cc
3200cc
6400cc
128C
256C
512C
The resource allocations are proportional to the number in the size name. For example, a cluster of size 600cc has 2x as much CPU, memory, and disk as a cluster of size 300cc, and 1.5x as much CPU, memory, and disk as a cluster of size 400cc. To determine the specific resource allocations for a size, query the mz_internal.mz_cluster_replica_sizes table.
Clusters of larger sizes can process data faster and handle larger data volumes.
You can use ALTER CLUSTER to resize the cluster in response to changes in the resource requirements of your workload.
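For example, resizing a hypothetical cluster c1 to 800cc might look like this; the cluster name and target size are placeholders.

```sql
-- Give hypothetical cluster c1 more CPU, memory, and disk by changing its size.
ALTER CLUSTER c1 SET (SIZE = '800cc');
```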
The values in the mz_internal.mz_cluster_replica_sizes table may change at any time. You should not rely on them for any kind of capacity planning.
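A query along these lines shows the current allocations for two sizes side by side. It uses SELECT * so as not to assume particular column names beyond size, and, per the note above, its results are a snapshot rather than a planning guarantee.

```sql
-- Compare the current resource allocations of two standard sizes.
SELECT *
FROM mz_internal.mz_cluster_replica_sizes
WHERE size IN ('300cc', '600cc');
```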
Legacy sizes
Materialize also offers some legacy sizes. Clusters using legacy sizes run on older hardware without local disks attached.
In most cases, you should not use legacy sizes. Standard sizes offer better performance per credit for nearly all workloads. We recommend using standard sizes for all new clusters, and recommend migrating existing legacy-sized clusters to standard sizes. In many cases, migrating from legacy to standard sizes will result in a 25-50% cost reduction.
However, certain rare workloads exhibit better performance per credit on legacy sizes. Materialize is committed to supporting these workloads on legacy sizes until they have equivalent or better performance per credit on standard sizes.
When legacy sizes are enabled for a region, the following sizes are available:
3xsmall
2xsmall
xsmall
small
medium
large
xlarge
2xlarge
3xlarge
4xlarge
5xlarge
6xlarge
The correspondence between non-legacy sizes and legacy sizes is shown in the credit usage table.
Replication factor
The REPLICATION FACTOR option determines the number of replicas provisioned for the cluster. Each replica of the cluster provisions a new pool of compute resources to perform exactly the same computations on exactly the same data.
Provisioning more than one replica improves fault tolerance. Clusters with multiple replicas can tolerate failures of the underlying hardware that cause a replica to become unreachable. As long as one replica of the cluster remains available, the cluster can continue to maintain dataflows and serve queries.
Materialize makes the following guarantees when provisioning replicas:
- Replicas of a given cluster are never provisioned on the same underlying hardware.
- Replicas of a given cluster are spread as evenly as possible across the underlying cloud provider’s availability zones.
Materialize automatically assigns names to replicas, such as r1, r2, and so on. You can view information about individual replicas in the console and the system catalog, but you cannot directly modify individual replicas.
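As a sketch, the replicas of a hypothetical cluster c1 can be listed from the system catalog. This assumes the mz_catalog.mz_clusters and mz_catalog.mz_cluster_replicas tables and their id, name, cluster_id, and size columns, which are documented in Materialize's system catalog reference rather than on this page.

```sql
-- List each replica of hypothetical cluster c1 with its size.
SELECT c.name AS cluster, r.name AS replica, r.size
FROM mz_catalog.mz_cluster_replicas AS r
JOIN mz_catalog.mz_clusters AS c ON r.cluster_id = c.id
WHERE c.name = 'c1';
```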
You can pause a cluster’s work by specifying a replication factor of 0. Doing so removes all replicas of the cluster. Any indexes, materialized views, sources, and sinks on the cluster will cease to make progress, and any queries directed to the cluster will block. You can later resume the cluster’s work by using ALTER CLUSTER to set a nonzero replication factor.
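A minimal pause/resume sequence for a hypothetical cluster c1 could look like this:

```sql
-- Pause: remove every replica of c1. Its dataflows stop making progress,
-- queries against it block, and it consumes no credits while paused.
ALTER CLUSTER c1 SET (REPLICATION FACTOR = 0);

-- Resume: provision one replica again.
ALTER CLUSTER c1 SET (REPLICATION FACTOR = 1);
```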
A common misconception is that increasing a cluster’s replication factor will increase its capacity for work. This is not the case. Increasing the replication factor increases the fault tolerance of the cluster, not its capacity for work. Replicas are exact copies of one another: each replica must do exactly the same work (i.e., maintain the same dataflows and process the same queries) as all the other replicas of the cluster.
To increase a cluster’s capacity, you should instead increase the cluster’s size.
Credit usage
Each replica of the cluster consumes credits at a rate determined by the cluster’s size:
Size | Legacy size | Credits per replica per hour |
---|---|---|
25cc | 3xsmall | 0.25 |
50cc | 2xsmall | 0.5 |
100cc | xsmall | 1 |
200cc | small | 2 |
300cc | | 3 |
400cc | medium | 4 |
600cc | | 6 |
800cc | large | 8 |
1200cc | | 12 |
1600cc | xlarge | 16 |
3200cc | 2xlarge | 32 |
6400cc | 3xlarge | 64 |
128C | 4xlarge | 128 |
256C | 5xlarge | 256 |
512C | 6xlarge | 512 |
Credit usage is measured at one-second granularity. For a given replica, credit usage begins when a CREATE CLUSTER or ALTER CLUSTER statement provisions the replica and ends when an ALTER CLUSTER or DROP CLUSTER statement deprovisions the replica.
A cluster with a replication factor of zero uses no credits.
As an example, consider the following sequence of events:
Time | Event |
---|---|
2023-08-29 3:45:00 | CREATE CLUSTER c (SIZE '400cc', REPLICATION FACTOR 2) |
2023-08-29 3:45:45 | ALTER CLUSTER c SET (REPLICATION FACTOR 1) |
2023-08-29 3:47:15 | DROP CLUSTER c |
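At 4 credits per replica per hour for size 400cc, cluster c will have consumed 0.2 credits in total:
- Replica c.r1 was provisioned from 3:45:00 to 3:47:15 (135 seconds), consuming 4 × 135 / 3600 = 0.15 credits.
- Replica c.r2 was provisioned from 3:45:00 to 3:45:45 (45 seconds), consuming 4 × 45 / 3600 = 0.05 credits.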
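The per-replica arithmetic can be checked with a plain SQL query: credits equal the size's hourly rate (4 for 400cc, per the credit usage table) times the seconds provisioned, divided by 3600. This query only computes on literal values; it does not read any billing or usage data.

```sql
-- credits = credits_per_hour * seconds_provisioned / 3600
SELECT
    4 * 135 / 3600.0        AS r1_credits,    -- 3:45:00 to 3:47:15 is 135 s
    4 * 45 / 3600.0         AS r2_credits,    -- 3:45:00 to 3:45:45 is 45 s
    4 * (135 + 45) / 3600.0 AS total_credits;
```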
Known limitations
Clusters have several known limitations:
- Clusters containing sources and sinks can only have a replication factor of 0 or 1.
- When a cluster of size 3200cc or larger uses multiple replicas, those replicas are not guaranteed to be spread evenly across the underlying cloud provider’s availability zones.
We plan to remove these restrictions in future versions of Materialize.
Examples
Basic
Create a cluster with two 400cc replicas:
CREATE CLUSTER c1 (SIZE = '400cc', REPLICATION FACTOR = 2);
Introspection disabled
Create a cluster with a single replica and introspection disabled:
CREATE CLUSTER c (SIZE = '100cc', INTROSPECTION INTERVAL = 0);
Disabling introspection can yield a small performance improvement, but you lose the ability to run troubleshooting queries against that cluster replica.
Empty
Create a cluster with no replicas:
CREATE CLUSTER c1 (SIZE '100cc', REPLICATION FACTOR = 0);
You can later add replicas to this cluster with ALTER CLUSTER.
Privileges
The privileges required to execute this statement are:
- CREATECLUSTER privileges on the system.
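As a sketch, a superuser might grant this privilege to a role as follows. The role name data_engineer is hypothetical, and the GRANT ... ON SYSTEM form is assumed from Materialize's GRANT PRIVILEGE reference rather than described on this page.

```sql
-- Allow hypothetical role data_engineer to run CREATE CLUSTER.
GRANT CREATECLUSTER ON SYSTEM TO data_engineer;
```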