Materialize now supports a new PARTITION BY
option for materialized views and tables
that allows you to control how Materializes interally groups and stores your data. This
can lead to dramatically faster query performance and rehydration times for certain workloads.
For example, for a timeseries-style event table, you could declare that the data should be partitioned by time:
CREATE TABLE events (event_ts timestamptz, body jsonb)
WITH (
PARTITION BY (event_ts)
);
Materialize already uses filters on queries to winnow down the amount of data it needs to fetch from storage, and by additionally partitioning our events table by time, a query for a narrow range of timestamps can be orders of magnitude faster than if it were partitioned randomly.
We’ve applied this new partitioning strategy to Materialize’s own internal catalog tables,
like mz_statement_lifecycle_history
,
which are too large to fully index, and found a 6x improvement for our most expensive
queries!
For more details, see the documentation on partitioning and filter pushdown.