How to Use the Materialize Emulator

Dennis Felsing
October 10, 2024

In our last blog about our Quality Assurance (QA) team, we gave an overview of the QA process, including our software and testing methods. One of our key tools during testing is the Materialize Emulator, a Docker image that allows you to maintain a locally hosted version of Materialize.

But there's an important caveat: the Materialize Emulator cannot support production workloads. The Materialize Emulator lacks critical features of our cloud platform, such as fault tolerance and horizontal scalability. However, it does include a local web UI (Materialize Console) available at http://localhost:6874. The Emulator is great for testing and prototyping.

In the following blog, we'll outline a step-by-step walkthrough of how to use the Materialize Emulator.

Materialize Emulator: What Is It?

The Materialize Emulator is an all-in-one Docker image available on Docker Hub for testing and evaluation purposes. The Emulator is not representative of Materialize’s performance and full feature set.

To view a comparison between the Materialize Emulator and the Materialize cloud platform, see the table below:

Materialize Emulator
Materialize SaaS
Production deployments
❌ Not suitable due to performance and license limitations.
✔️
Performance
❌ Limited. Services are bundled in a single container.
✔️ High. Services are scaled across many machines.
Dedicated Support
✔️
Sample data
✔️ Quickstart data source
✔️ Quickstart data source
Data sources
✔️ Connect using a streamlined GUI
✔️ Connect using a streamlined GUI
Version upgrades
✔️ Manual, with no data persistence
✔️ Automated, with data persistence
Use case isolation
✔️
Fault tolerance
✔️
Horizontal scalability
✔️
GUI
✔️ Materialize Console (http://localhost:6874)
✔️ Materialize Console

We've always used the Materialize Emulator for testing, except for the kind that requires cloud integration with Kubernetes.

If you want to use Materialize in production scenarios, sign up for a free trial account or schedule a demo.

Step-by-Step Walkthrough: How to Use the Materialize Emulator

Let's walk through a basic example of how to use the Materialize Emulator with a PostgreSQL source. The only requirements are Docker and the postgres-client (psql).

1
docker network create mznet
2
docker pull materialize/materialized:latest
3
docker run --name materialized --network mznet -d \
4
    -p 127.0.0.1:6874:6874 \
5
    -p 127.0.0.1:6875:6875 \
6
    -p 127.0.0.1:6876:6876 \
7
    -p 127.0.0.1:6877:6877 \
8
    -e MZ_EAT_MY_DATA=1    \
9
    materialize/materialized:latest
10

bash

We'll publish the ports to localhost, since Materialize is running without authentication. Without a NAT or firewall, anyone on the internet can connect to your Materialize instance. You can specify ports if you want to allow access, such as: -p 6874:6874, -p 6875:6875, or -p 6876:6876.

Note that we set MZ_EAT_MY_DATA=1, which disables commands like fsync, so that we might lose data in case of a system crash. Since we are only using the Materialize emulator for local testing, we don't mind the risk of data loss. If you are interested in running production workloads on Materialize, check out Materialize Cloud, Self-managed Materialize or at least don't set MZ_EAT_MY_DATA=1 in your Materialize emulator.

Now Materialize is running locally.

  • Open the Materialize Console (web UI) at http://localhost:6874 to use the built-in SQL Shell.
  • Alternatively, connect with psql:
1
$ psql postgres://materialize@127.0.0.1:6875/materialize
2
NOTICE:  connected to Materialize v0.156.0
3
  Org ID: 4b733a37-b64d-44a2-8e79-e0ebd8a177ba
4
  Region: docker/container
5
  User: materialize
6
  Cluster: quickstart
7
  Database: materialize
8
  Schema: public
9
  Session UUID: 2631437c-61d6-4984-a68b-433f5751cecf
10

11
Issue a SQL query to get started. Need help?
12
  View documentation: https://materialize.com/s/docs
13
  Join our Slack community: https://materialize.com/s/chat
14

15
psql (16.10 (Ubuntu 16.10-0ubuntu0.24.04.1), server 9.5.0)
16
Type "help" for help.
17

18
materialize=>
19

text

Let's start up a Postgres server:

1
docker run --name postgres --network mznet \
2
    -e POSTGRES_PASSWORD=postgres \
3
    -e POSTGRES_INITDB_ARGS="-c wal_level=logical" \
4
    -p 127.0.0.1:5432:5432 -d postgres
5

bash

Connect to the Postgres server. Then generate a simple table. We will replicate this table to Materialize.

1
$ psql postgres://postgres:postgres@127.0.0.1:5432/postgres
2
psql (16.10 (Ubuntu 16.10-0ubuntu0.24.04.1), server 17.6 (Debian 17.6-1.pgdg13+1))
3
WARNING: psql major version 16, server major version 17.
4
         Some psql features might not work.
5
Type "help" for help.
6

7
postgres=# CREATE PUBLICATION mz_source FOR ALL TABLES;
8
CREATE PUBLICATION
9
postgres=# CREATE TABLE t (f1 INTEGER);
10
CREATE TABLE
11
postgres=# ALTER TABLE t REPLICA IDENTITY FULL;
12
ALTER TABLE
13
postgres=# INSERT INTO t VALUES (1), (2), (3);
14
INSERT 0 3
15

text

Now use Materialize to connect to the Postgres instance:

1
materialize=> CREATE SECRET pgpass AS 'postgres';
2
CREATE SECRET
3
materialize=> CREATE CONNECTION pg TO POSTGRES (
4
    HOST postgres, DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
5
);
6
CREATE CONNECTION
7
materialize=> CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
8
    PUBLICATION 'mz_source'
9
) FOR SCHEMAS (public);
10
CREATE SOURCE
11
materialize=> SELECT * FROM t;
12
 f1
13
----
14
  1
15
  2
16
  3
17
(3 rows)
18
materialize=> CREATE MATERIALIZED VIEW mv AS SELECT sum(f1) FROM t;
19
CREATE MATERIALIZED VIEW
20
materialize=> SELECT * FROM mv;
21
 sum
22
-----
23
   6
24
(1 row)
25

text

That's how you replicate the Postgres table in Materialize. Now let's perform a query. Let's execute a one-off query on both Materialize and Postgres. We'll design a heavy workload.

1
\postgres=# \timing
2
Timing is on.
3
postgres=# INSERT INTO t (f1) SELECT * FROM generate_series(4, 10000);
4
INSERT 0 9997
5
Time: 10.137 ms
6
postgres=# SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
7
      sum
8
---------------
9
 1000100000000
10
(1 row)
11
Time: 2323.538 ms (00:02.324)
12

text

Materialize performs the query in 37 seconds, while Postgres performs the query in 2 seconds. This is because Materialize is not designed for one-off queries.

Materialize is optimized for materialized views that update incrementally. Read more about how materialized views work in Materialize. Let's create a materialized view as follows:

1
materialize=> \timing
2
Timing is on.
3
materialize=> SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
4
      sum
5
---------------
6
 1000100000000
7
(1 row)
8
Time: 37277.756 ms (00:37.278)
9
materialize=> DROP MATERIALIZED VIEW mv;
10
DROP MATERIALIZED VIEW
11
Time: 7.374 ms
12
materialize=> CREATE MATERIALIZED VIEW mv AS
13
    SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
14
CREATE MATERIALIZED VIEW
15
Time: 11.144 ms
16
materialize=> SELECT * FROM mv;
17
      sum
18
---------------
19
 1000100000000
20
(1 row)
21
Time: 3.260 ms
22

text

With Materialize, every change to the source table (t1) in Postgres will only require a small amount of incremental work to update the mv materialized view. This is done during INSERT, not during SELECT. And you can use declarative SQL to define the whole view.

1
postgres=# INSERT INTO t (f1) VALUES (10001);
2
INSERT 0 1
3
Time: 5.627 ms
4

5
materialize=> SELECT * FROM mv;
6
      sum
7
---------------
8
 1000400050002
9
(1 row)
10
Time: 3.251 ms
11

text

You can also subscribe to the the materialized view and receive instant updates about all of the changes:

1
materialize=> COPY (SUBSCRIBE (SELECT * FROM mv)) TO STDOUT;
2
1727715520600	1	1000400050002
3
1727715526000	1	1000700160012
4
1727715526000	-1	1000400050002
5
1727715528000	-1	1000700160012
6
1727715528000	1	1001000330036
7

text

This is the output (timestamp, added (1)/removed (-1), value) when these commands run in Postgres:

1
postgres=# INSERT INTO t (f1) VALUES (10002);
2
INSERT 0 1
3
postgres=# INSERT INTO t (f1) VALUES (10003);
4
INSERT 0 1
5

text

To clean up, we can stop the Docker containers again:

1
docker stop materialized postgres
2
docker rm materialized postgres
3
docker network rm mznet
4

bash

And that's it! This is how you launch the Docker image, and define a materialized view, using the Materialize Emulator.

Shell Script: Materialize Emulator as a Docker Compose Project

To tie things together, here is a small shell script (run.sh) that runs the Materialize Emulator as a Docker Compose project.

The shell script contains many of Materialize's features, including a materialized view mv that combines the data of all these sources:

Also, the script uses the following to get the mv out of Materialize:

You can copy the full shell script below:

1
#!/bin/bash
2
set -euo pipefail
3

4
PREF="${PWD##*/}"
5

6
wait_for_health() {
7
  echo -n "waiting for container '$PREF-$1' to be healthy"
8
  while [ "$(docker inspect -f '{{.State.Health.Status}}' "$PREF-$1")" != "healthy" ]; do
9
    echo -n "."
10
    sleep 1
11
  done
12
  printf "\ncontainer '%s' is healthy\n" "$PREF-$1"
13
}
14

15
cat > docker-compose.yml <<EOF
16
version: '3.8'
17
services:
18
  materialized:
19
    image: materialize/materialized:latest
20
    container_name: $PREF-materialized
21
    environment:
22
      MZ_SYSTEM_PARAMETER_DEFAULT: "enable_copy_to_expr=true"
23
    networks:
24
      - network
25
    ports:
26
      - "127.0.0.1:6874:6874" # Console (web UI)
27
      - "127.0.0.1:6875:6875" # Postgres wire
28
      - "127.0.0.1:6876:6876" # HTTP API
29
      - "127.0.0.1:6877:6877" # Postgres wire as mz_system user
30
    environment:
31
      - MZ_EAT_MY_DATA=1      # We don't care about date safety, local testing only!
32
    healthcheck:
33
      test: ["CMD", "curl", "-f", "localhost:6878/api/readyz"]
34
      interval: 1s
35
      start_period: 60s
36

37
  postgres:
38
    image: postgres:latest
39
    container_name: $PREF-postgres
40
    environment:
41
      POSTGRES_PASSWORD: postgres
42
      POSTGRES_INITDB_ARGS: "-c wal_level=logical"
43
    networks:
44
      - network
45
    ports:
46
      - "127.0.0.1:5432:5432"
47
    healthcheck:
48
      test: ["CMD", "pg_isready", "-d", "db_prod"]
49
      interval: 1s
50
      start_period: 60s
51

52
  mysql:
53
    image: mysql:latest
54
    container_name: $PREF-mysql
55
    environment:
56
      MYSQL_ROOT_PASSWORD: mysql
57
    networks:
58
      - network
59
    ports:
60
      - "127.0.0.1:3306:3306"
61
    command:
62
        - "--log-bin=mysql-bin"
63
        - "--gtid_mode=ON"
64
        - "--enforce_gtid_consistency=ON"
65
        - "--binlog-format=row"
66
        - "--binlog-row-image=full"
67
    healthcheck:
68
      test: ["CMD", "mysqladmin", "ping", "--password=mysql", "--protocol=TCP"]
69
      interval: 1s
70
      start_period: 60s
71

72
  redpanda:
73
    image: vectorized/redpanda:latest
74
    container_name: $PREF-redpanda
75
    networks:
76
      - network
77
    ports:
78
      - "127.0.0.1:9092:9092"
79
      - "127.0.0.1:8081:8081"
80
    command:
81
        - "redpanda"
82
        - "start"
83
        - "--overprovisioned"
84
        - "--smp=1"
85
        - "--memory=1G"
86
        - "--reserve-memory=0M"
87
        - "--node-id=0"
88
        - "--check=false"
89
        - "--set"
90
        - "redpanda.enable_transactions=true"
91
        - "--set"
92
        - "redpanda.enable_idempotence=true"
93
        - "--set"
94
        - "--advertise-kafka-addr=redpanda:9092"
95
    healthcheck:
96
      test: ["CMD", "curl", "-f", "localhost:9644/v1/status/ready"]
97
      interval: 1s
98
      start_period: 60s
99

100
  minio:
101
    image: minio/minio:latest
102
    container_name: $PREF-minio
103
    environment:
104
      MINIO_STORAGE_CLASS_STANDARD: "EC:0"
105
    networks:
106
      - network
107
    ports:
108
      - "127.0.0.1:9000:9000"
109
      - "127.0.0.1:9001:9001"
110
    entrypoint: ["sh", "-c"]
111
    command: ["mkdir -p /data/$PREF && minio server /data --console-address :9001"]
112
    healthcheck:
113
      test: ["CMD", "curl", "-f", "localhost:9000/minio/health/live"]
114
      interval: 1s
115
      start_period: 60s
116

117
networks:
118
  network:
119
    driver: bridge
120
EOF
121
docker compose down || true
122
docker compose up -d
123

124
wait_for_health postgres
125
psql postgres://postgres:postgres@127.0.0.1:5432/postgres <<EOF
126
CREATE PUBLICATION mz_source FOR ALL TABLES;
127
CREATE TABLE pg_table (f1 INTEGER);
128
ALTER TABLE pg_table REPLICA IDENTITY FULL;
129
INSERT INTO pg_table VALUES (1), (2), (3);
130
EOF
131

132
wait_for_health mysql
133
mysql --protocol=tcp --user=root --password=mysql <<EOF
134
CREATE DATABASE public;
135
USE public;
136
CREATE TABLE mysql_table (f1 INTEGER);
137
INSERT INTO mysql_table VALUES (1), (2), (3);
138
EOF
139

140
wait_for_health redpanda
141
docker compose exec -T redpanda rpk topic create redpanda_table
142
docker compose exec -T redpanda rpk topic produce redpanda_table <<EOF
143
{"f1": 1}
144
{"f1": 2}
145
{"f1": 3}
146
EOF
147

148
wait_for_health materialized
149
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
150
-- Create a Postgres source
151
CREATE SECRET pgpass AS 'postgres';
152
CREATE CONNECTION pg TO POSTGRES (
153
  HOST '$PREF-postgres', DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
154
);
155
CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
156
  PUBLICATION 'mz_source'
157
) FOR SCHEMAS (public);
158

159
-- Create a MySQL source
160
CREATE SECRET mysqlpass AS 'mysql';
161
CREATE CONNECTION mysql TO MYSQL (
162
  HOST '$PREF-mysql', USER root, PASSWORD SECRET mysqlpass
163
);
164
CREATE SOURCE mysql_source FROM MYSQL CONNECTION mysql FOR ALL TABLES;
165

166
-- Create a Webhook source
167
CREATE SOURCE webhook_table FROM WEBHOOK BODY FORMAT TEXT;
168

169
-- Create a Redpanda (Kafka-compatible) source
170
CREATE CONNECTION kafka_conn TO KAFKA (
171
    BROKER '$PREF-redpanda:9092', SECURITY PROTOCOL PLAINTEXT
172
);
173
CREATE CONNECTION csr_conn TO CONFLUENT SCHEMA REGISTRY (
174
    URL 'http://$PREF-redpanda:8081/'
175
);
176
CREATE SOURCE redpanda_table FROM KAFKA CONNECTION kafka_conn (
177
    TOPIC 'redpanda_table'
178
) FORMAT JSON;
179

180
-- Simple materialized view, incrementally updated, with data from all sources
181
CREATE MATERIALIZED VIEW mv AS
182
SELECT sum(pg_table.f1 + mysql_table.f1 + webhook_table.body::int +
183
           (redpanda_table.data->'f1')::int)
184
FROM pg_table
185
JOIN mysql_table ON TRUE
186
JOIN webhook_table ON TRUE
187
JOIN redpanda_table ON TRUE;
188

189
-- Create a sink to Redpanda so that the topic will always be up to date
190
CREATE SINK sink FROM mv INTO KAFKA CONNECTION kafka_conn (TOPIC 'mv')
191
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
192
ENVELOPE DEBEZIUM;
193

194
-- One-off export of our materialized view to S3-compatible MinIO
195
CREATE SECRET miniopass AS 'minioadmin';
196
CREATE CONNECTION minio TO AWS (
197
    ENDPOINT 'http://minio:9000',
198
    REGION 'minio',
199
    ACCESS KEY ID 'minioadmin',
200
    SECRET ACCESS KEY SECRET miniopass
201
);
202
COPY (SELECT * FROM mv) TO 's3://$PREF/mv' WITH (
203
    AWS CONNECTION = minio,
204
    FORMAT = 'csv'
205
);
206

207
-- Allow HTTP API read requests without a token
208
CREATE ROLE anonymous_http_user;
209
GRANT SELECT ON TABLE mv TO anonymous_http_user;
210
EOF
211

212
# Write additional data into Webhook source
213
curl -d "1" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
214
curl -d "2" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
215
curl -d "3" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
216

217
# Read latest data from Redpanda
218
docker compose exec -T redpanda rpk topic consume mv --num 1
219

220
# CSV exists on S3-compatible MinIO
221
docker compose exec -T minio mc ls data/mzemulator/mv
222

223
# Use Postgres wire-compatible
224
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
225
SELECT * FROM pg_table;
226
SELECT * FROM mysql_table;
227
SELECT * FROM webhook_table;
228
SELECT * FROM redpanda_table;
229
SELECT * FROM mv;
230
EOF
231

232
# Use HTTP API
233
curl -s -X POST -H "Content-Type: application/json" \
234
    --data '{"queries": [{"query": "SELECT * FROM mv"}]}' \
235
    http://localhost:6876/api/sql | jq -r ".results[0].rows[0][0]"
236

bash

Now you can start up a Materialize Emulator in under a minute:

1
$ cd mzemulator
2
$ cat run.sh
3
#!/bin/bash
4
[...]
5
$ time ./run.sh
6
[...]
7
./run.sh  0.34s user 0.36s system 1% cpu 45.462 total
8
$ psql postgres://materialize@127.0.0.1:6875/materialize -c "SELECT * FROM mv"
9
[...]
10
 sum
11
-----
12
 648
13
(1 row)
14
$ docker compose down
15
[+] Running 6/6
16
 ✔ Container mzemulator-redpanda      Removed                             1.3s
17
 ✔ Container mzemulator-mysql         Removed                            10.6s
18
 ✔ Container mzemulator-postgres      Removed                             1.0s
19
 ✔ Container mzemulator-minio         Removed                             0.7s
20
 ✔ Container mzemulator-materialized  Removed                             1.2s
21
 ✔ Network mzemulator_network         Removed                             0.4s
22

text

It's that simple — just use the shell script to launch your Materialize Emulator. And if you prefer a GUI, open the local Materialize Console at http://localhost:6874.

Materialize Emulator: Test Quickly During Development

While Materialize is best experienced in our cloud, the Materialize Emulator allows you to quickly test your releases in a non-production environment.

Although the Materialize Emulator lacks many critical features included in the cloud version, the ability to test rapidly is helpful during development.

Try our Materialize Emulator right now to build your apps more efficiently! And sign up for a free trial of Materialize to see what our full cloud product is like.

Dennis

Dennis Felsing

Software Engineer in Test, Materialize