Introducing Workload Capture & Replay

When customers hit issues in production, it can take real effort to reproduce them locally, especially when external sources are involved. Reproducing an issue is useful not just for finding the root cause, but also for verifying the fix and adding a regression test. The newly introduced workload capture & replay tooling records a Materialize instance's state as well as its recent queries and ingestion rates, then replays them in a Docker Compose environment with synthetic data. In this blog post I’ll show how it works and discuss some of the challenges and future work.

Capturing
In this example we are running the Materialize Emulator locally (see the related blog post) and creating a handful of objects and queries for the capture to pick up.
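Here is a rough sketch of such a setup, assuming the emulator listens on its default SQL port 6875 with the materialize user and database, and using psycopg; the exact objects and queries differ from the original example:

```python
# Hedged sketch: create a small workload against a locally running Materialize
# Emulator so that there is something for the capture tool to record.
import psycopg

conn = psycopg.connect(
    "host=localhost port=6875 user=materialize dbname=materialize",
    autocommit=True,  # run each DDL statement outside an explicit transaction
)
with conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS orders (id int, amount numeric)")
    cur.execute(
        "CREATE MATERIALIZED VIEW IF NOT EXISTS order_totals AS "
        "SELECT id, sum(amount) AS total FROM orders GROUP BY id"
    )
    cur.execute("INSERT INTO orders VALUES (1, 10.0), (2, 20.0)")
    cur.execute("SELECT * FROM order_totals")
    print(cur.fetchall())
```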
Capturing a workload is simple: check out the Materialize repository and run the mz-workload-capture tool against the system user’s port 6877.
Since our Materialize instance has so few objects, the state is captured quickly. By default the last 360 seconds of queries are captured, but you can also specify, for example, --time 3600 for an hour. The output is a YAML workload file.
This was of course a pretty simple setup, but it shows the most basic functionality of mz-workload-capture: the definitions and metadata of all objects are extracted, along with the queries run during the specified time window. For tables we record statistics about how many rows they contain, but not their actual contents.
The capture tool leverages the same introspection views that the Materialize Console uses to show source/sink statistics and the query history.
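As an illustration, the following sketch queries two of these views directly; the view names live in the mz_internal schema and may differ between Materialize versions, so treat them as an assumption:

```python
# Hedged sketch: read source statistics and recent query activity through the
# system port 6877, similar to what the capture tool does.
import psycopg

with psycopg.connect(
    "host=localhost port=6877 user=mz_system dbname=materialize", autocommit=True
) as conn, conn.cursor() as cur:
    # per-source ingestion statistics (messages and bytes received, etc.)
    cur.execute("SELECT * FROM mz_internal.mz_source_statistics LIMIT 5")
    print(cur.fetchall())
    # recently executed SQL statements, as shown in the Console's query history
    cur.execute("SELECT * FROM mz_internal.mz_recent_activity_log LIMIT 5")
    print(cur.fetchall())
```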
What’s been missing in this example are the things that actually make Materialize interesting: ingesting data from large PostgreSQL, MySQL, SQL Server and Kafka sources as well as through webhooks, and writing data out through Kafka sinks. But fear not, all of these are supported by mz-workload-capture as well.
For a PostgreSQL source, for example, the capture includes its subsources and, for each source, statistics about the total number of messages as well as how many were ingested during the captured time period.
Replaying
Now we’re getting to the most interesting part: actually replaying a captured workload, in this case for one hour, with 1% of the initial data generated synthetically and the full volume of queries and ingestions replayed during the continuous phase.
Under the hood this sets up a local Docker Compose environment containing all the required services: it always includes the Materialize Emulator (materialized) and, depending on the sources and sinks in the workload file, Kafka, PostgreSQL, MySQL and SQL Server. This means we are currently limited to workloads that fit on a single machine. We then create all the specified objects: clusters, databases, schemas, types, connections, sources, tables, views, materialized views, sinks and indexes.
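The following sketch shows the general idea of picking services based on the workload file; the service names, images and the workload structure here are illustrative, not the exact ones the tool uses:

```python
# Hedged sketch: decide which Docker Compose services a replay needs,
# based on the source/sink types that appear in the captured workload.
def compose_services(workload: dict) -> dict:
    services = {"materialized": {"image": "materialize/materialized"}}
    kinds = {o["type"] for o in workload.get("sources", []) + workload.get("sinks", [])}
    if "postgres" in kinds:
        services["postgres"] = {"image": "postgres:16"}
    if "mysql" in kinds:
        services["mysql"] = {"image": "mysql:8.0"}
    if "sql-server" in kinds:
        services["sqlserver"] = {"image": "mcr.microsoft.com/mssql/server:2022-latest"}
    if "kafka" in kinds:
        services["kafka"] = {"image": "confluentinc/cp-kafka"}
    return services
```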
All connections to external sources are automatically rewritten to target the instances we are running inside of Docker Compose instead of the original systems. The replayer runs in total isolation from the outside world, and sets up everything it needs itself.
As objects can depend on each other, the order of creation matters, in particular for views and materialized views. One solution would be to build a dependency graph and create the objects in a valid ordering. Instead we chose to retry failed object creations after all other objects have been created, since a failed CREATE DDL statement is cheap.
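A minimal sketch of that retry loop, with execute_ddl standing in for a hypothetical helper that runs a single CREATE statement:

```python
# Hedged sketch: create objects without computing a dependency graph by retrying
# failed CREATE statements until no further progress is possible.
def create_objects(ddl_statements: list[str], execute_ddl) -> None:
    pending = list(ddl_statements)
    while pending:
        failed = []
        for ddl in pending:
            try:
                execute_ddl(ddl)
            except Exception:
                failed.append(ddl)  # likely depends on an object not created yet
        if len(failed) == len(pending):
            raise RuntimeError(f"could not create objects: {failed}")
        pending = failed
```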
After everything is initialized, workload-replay generates synthetic data in the external sources, matching what was recorded for each source, subsource and table in Materialize, and also fills Materialize-native tables and webhook sources. The amount of data can be varied with --factor-initial-data, which defaults to 1.0, meaning we generate as many rows/messages as were recorded in the original Materialize instance. Before we can continue, we have to wait for Materialize to hydrate all its objects.
Care was taken to make data generation fast: we use COPY FROM STDIN for PostgreSQL and Materialize instead of individual INSERT statements, as well as asynchronous data production for Kafka and webhooks. In our CI we are seeing about 20k rows/s for PostgreSQL sources, 10k rows/s for Kafka, and 3k rows/s for webhooks. The exact speed depends on the source definition and on which views, indexes and materialized views depend on the ingested data, since by default we start hydrating them during the initial ingestion.
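As a sketch of the bulk-loading part, here is what scaling the recorded row count by --factor-initial-data and streaming rows via COPY FROM STDIN could look like with psycopg; the table and column names are illustrative:

```python
# Hedged sketch: generate the scaled number of rows and bulk-load them with COPY
# instead of issuing one INSERT per row.
import psycopg

def load_initial_data(conn: psycopg.Connection, recorded_rows: int, factor: float) -> None:
    rows_to_generate = max(1, int(recorded_rows * factor))
    with conn.cursor() as cur:
        with cur.copy("COPY orders (id, amount) FROM STDIN") as copy:
            for i in range(rows_to_generate):
                copy.write_row((i, i * 0.01))
```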
The synthetic data itself is generated with a long-tail distribution, which is common in real data.
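A Zipf distribution, for example, produces the kind of long tail we are after: a few keys occur very often while most keys are rare. The parameters below are illustrative:

```python
# Hedged sketch: draw keys from a Zipf (long-tail) distribution and map them to
# synthetic identifiers.
import numpy as np

rng = np.random.default_rng(seed=1)
keys = rng.zipf(a=1.5, size=10_000)       # heavy head, long tail
values = [f"customer_{k}" for k in keys]  # reuse of frequent keys mimics real data
```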
Finally we have the continuous phase, which in parallel replays data ingestions scaled by --factor-ingestions and queries scaled by --factor-queries. Failing queries, as well as ingestions and queries that are too slow, are logged at the end of the run.
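A simplified sketch of the query side of the continuous phase, where run_query is a hypothetical helper that executes one statement, and the per-query fields (count, sql, duration_seconds) are illustrative:

```python
# Hedged sketch: replay captured queries scaled by --factor-queries, collecting
# failures and queries that run much slower than in the capture for the final report.
import time

def replay_queries(captured: list[dict], factor_queries: float, run_query) -> dict:
    report = {"failed": [], "slow": []}
    for q in captured:
        repetitions = max(1, round(q["count"] * factor_queries))
        for _ in range(repetitions):
            start = time.monotonic()
            try:
                run_query(q["sql"])
            except Exception as e:
                report["failed"].append((q["sql"], str(e)))
                continue
            elapsed = time.monotonic() - start
            if elapsed > 2 * q["duration_seconds"]:
                report["slow"].append((q["sql"], elapsed))
    return report
```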
Regression Tests & Benchmarks
In CI we have a collection of captured workloads and run them against both the previous Materialize version and the current state of the code. When a query produces new errors, we report them as a regression in the new Materialize version and fail the test.

Similarly, we can compare the performance between Materialize versions, both in terms of CPU and memory usage and across the initial data phase and the continuous phase.
Worse performance is detected automatically and causes the CI test to fail.
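Conceptually the check looks something like the following sketch, which compares timings between the two versions and fails when the new one is slower by more than a tolerance; the threshold and metric names are illustrative:

```python
# Hedged sketch: flag phases or queries that got slower than the tolerance allows.
def check_regressions(old: dict[str, float], new: dict[str, float], tolerance: float = 1.2) -> list[str]:
    regressions = []
    for name, old_seconds in old.items():
        new_seconds = new.get(name)
        if new_seconds is not None and new_seconds > old_seconds * tolerance:
            regressions.append(f"{name}: {old_seconds:.1f}s -> {new_seconds:.1f}s")
    return regressions

if found := check_regressions({"initial_data": 60.0}, {"initial_data": 90.0}):
    raise AssertionError("performance regressions detected:\n" + "\n".join(found))
```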
In one of these comparisons we had a nice optimization that caused query times to improve significantly for the workload in question.
Care is taken to run benchmarks against both Materialize versions with the same seed, and make sure a separate RNG is used for each thread. This ensures that the same random data is generated for data ingestions, and the same queries are executed.
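A minimal sketch of that seeding scheme: each worker thread derives its own RNG from a shared base seed, so runs against both versions see identical data regardless of thread interleaving.

```python
# Hedged sketch: deterministic per-thread randomness from a single base seed.
import random
import threading

BASE_SEED = 42

def worker(thread_index: int) -> None:
    rng = random.Random(BASE_SEED + thread_index)  # one RNG per thread, never shared
    print(thread_index, [rng.randint(0, 1_000_000) for _ in range(5)])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```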
Statistics
Workloads captured from production systems can be huge, so inspecting them manually can be daunting. We can print summary statistics about a workload file instead.
Diffing
With a YAML diffing tool like dyff you can get reasonable results for workload files. This lets you compare two states of a Materialize instance, making it easier to figure out what changed and caused the different behavior you might be seeing.
Anonymizing
When you ask someone to hand you a workload YAML file, they can of course inspect it for any information they don’t want to share, be it an identifier, a literal in a query, or a default value in a table.
We have also implemented an initial, simple anonymizer. It currently works on a best-effort basis, as it doesn’t yet properly parse and reconstruct the SQL queries.
After anonymization the workload file keeps its structure, but user-specified identifiers and literals are replaced with non-descriptive placeholders such as table_1, mv_1 and literal_1.
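As an illustration of the best-effort approach, a sketch along these lines replaces known identifiers (taken from the captured objects) and string literals with placeholders using regular expressions; the mapping and patterns here are illustrative and much simpler than the real tool:

```python
# Hedged sketch: best-effort anonymization of captured SQL without parsing it.
import re

# mapping from user-specified identifiers (known from the captured objects)
# to non-descriptive placeholders
IDENTIFIERS = {"orders": "table_1", "order_totals": "mv_1"}

def anonymize(sql: str) -> str:
    for original, placeholder in IDENTIFIERS.items():
        sql = re.sub(rf"\b{re.escape(original)}\b", placeholder, sql)
    counter = 0
    def replace_literal(_match: re.Match) -> str:
        nonlocal counter
        counter += 1
        return f"'literal_{counter}'"
    return re.sub(r"'[^']*'", replace_literal, sql)

print(anonymize("SELECT * FROM order_totals WHERE region = 'EMEA'"))
# SELECT * FROM mv_1 WHERE region = 'literal_1'
```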
Future Work
We have an initial set of workloads that serve as a foundation for internal testing. Expanding the captured workloads would further increase our confidence in Materialize and provide additional assurance to customers by reducing the risk of regressions in their specific use cases.
Today we capture some basic statistics about real data, primarily row counts and total bytes, and we also support collecting average column sizes when needed. Extending the statistics collection would allow us to generate synthetic data whose distributions more closely reflect real-world workloads.
Incorporating real samples, or even full data, would open the door to validating correctness in addition to performance, while also making replayed computations more representative. Achieving this would involve closer integration of the capture tooling into Materialize itself, whereas the current approach relies only on querying Materialize’s introspection views.
We currently don’t support replaying a setup where a Kafka sink writes data to a topic and that same topic is read back into Materialize through a Kafka source; instead the workload replay tool uses two separate topics.
Replayable workload size is currently bounded by what can be executed on a single machine. Supporting distributed replay against both Materialize Self-managed and Materialize Cloud would significantly broaden the scope of testable workloads, with the main challenge being automated setup of the required external sources.
Finally, evolving the anonymization tool to use a full SQL parser and serializer would make identifier replacement more robust and reliable, since we are currently reliant on some stored CREATE statements instead of generating them dynamically.
Conclusion
Creating test cases manually can be challenging, especially when trying to reproduce problems occurring in large Materialize instances with many external systems involved. The newly introduced Workload Capture & Replay tooling simplifies this significantly and allows us to find regressions earlier in the process. Get in touch with us if you are a customer and interested in supplying a captured workload for testing! The source code of the Workload Capture & Replay tooling is available in our Materialize GitHub repository.