How to Use the Materialize Emulator

Dennis Felsing
October 10, 2024

In our last blog about our Quality Assurance (QA) team, we gave an overview of the QA process, including our software and testing methods. One of our key tools during testing is the Materialize Emulator, a Docker image that allows you to maintain a locally hosted version of Materialize.

But there's an important caveat: the Materialize Emulator cannot support production workloads. The Materialize Emulator lacks critical features of our cloud platform, including fault tolerance and GUI support. But the Emulator is great for testing and prototyping.

In the following blog, we'll outline a step-by-step walkthrough of how to use the Materialize Emulator.

Materialize Emulator: What Is It?

The Materialize Emulator is an all-in-one Docker image available on Docker Hub for testing and evaluation purposes. The Emulator is not representative of Materialize’s performance and full feature set.

To view a comparison between the Materialize Emulator and the Materialize cloud platform, see the table below:

We've always used the Materialize Emulator for testing, except for the kind that requires cloud integration with Kubernetes.

If you want to use Materialize in production scenarios, sign up for a free trial account or schedule a demo.

Step-by-Step Walkthrough: How to Use the Materialize Emulator

Let's walk through a basic example of how to use the Materialize Emulator with a PostgreSQL source. The only requirements are Docker and the postgres-client (psql).

1
docker network create mznet
2
docker pull materialize/materialized:latest
3
docker run --name materialized --network mznet -d -p 127.0.0.1:6875:6875 \
4
    -p 127.0.0.1:6876:6876 materialize/materialized:latest
5

bash

We'll publish the port to localhost, since Materialize is running without authentication. Without a NAT or firewall, anyone on the internet can connect to your Materialize instance. You can specify ports if you want to allow access, such as: -p 6875:6875 or -p 6876:6876.

Now Materialize is running locally and we can connect to it:

1
$ psql postgres://materialize@127.0.0.1:6875/materialize
2
NOTICE:  connected to Materialize v0.118.0
3
  Org ID: 4b733a37-b64d-44a2-8e79-e0ebd8a177ba
4
  Region: docker/container
5
  User: materialize
6
  Cluster: quickstart
7
  Database: materialize
8
  Schema: public
9
  Session UUID: 2631437c-61d6-4984-a68b-433f5751cecf
10

11
Issue a SQL query to get started. Need help?
12
  View documentation: https://materialize.com/s/docs
13
  Join our Slack community: https://materialize.com/s/chat
14

15
psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 9.5.0)
16
Type "help" for help.
17

18
materialize=>
19

text

Let's start up a Postgres server:

1
docker run --name postgres --network mznet \
2
    -e POSTGRES_PASSWORD=postgres \
3
    -e POSTGRES_INITDB_ARGS="-c wal_level=logical" \
4
    -p 127.0.0.1:5432:5432 -d postgres
5

bash

Connect to the Postgres server. Then generate a simple table. We will replicate this table to Materialize.

1
$ psql postgres://postgres:postgres@127.0.0.1:5432/postgres
2
psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 16.4 (Debian 16.4-1.pgdg120+1))
3
WARNING: psql major version 15, server major version 16.
4
         Some psql features might not work.
5
Type "help" for help.
6

7
postgres=# CREATE PUBLICATION mz_source FOR ALL TABLES;
8
CREATE PUBLICATION
9
postgres=# CREATE TABLE t (f1 INTEGER);
10
CREATE TABLE
11
postgres=# ALTER TABLE t REPLICA IDENTITY FULL;
12
ALTER TABLE
13
postgres=# INSERT INTO t VALUES (1), (2), (3);
14
INSERT 0 3
15

text

Now use Materialize to connect to the Postgres instance:

1
materialize=> CREATE SECRET pgpass AS 'postgres';
2
CREATE SECRET
3
materialize=> CREATE CONNECTION pg TO POSTGRES (
4
    HOST postgres, DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
5
);
6
CREATE CONNECTION
7
materialize=> CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
8
    PUBLICATION 'mz_source'
9
) FOR SCHEMAS (public);
10
CREATE SOURCE
11
materialize=> SELECT * FROM t;
12
 f1
13
----
14
  1
15
  2
16
  3
17
(3 rows)
18
materialize=> CREATE MATERIALIZED VIEW mv AS SELECT sum(f1) FROM t;
19
CREATE MATERIALIZED VIEW
20
materialize=> SELECT * FROM mv;
21
 sum
22
-----
23
   6
24
(1 row)
25

text

That's how you replicate the Postgres table in Materialize. Now let's perform a query. Let's execute a one-off query on both Materialize and Postgres. We'll design a heavy workload.

1
\postgres=# \timing
2
Timing is on.
3
postgres=# INSERT INTO t (f1) SELECT * FROM generate_series(4, 10000);
4
INSERT 0 9997
5
Time: 10.137 ms
6
postgres=# SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
7
      sum
8
---------------
9
 1000100000000
10
(1 row)
11
Time: 2323.538 ms (00:02.324)
12

text

Materialize performs the query in 37 seconds, while Postgres performs the query in 2 seconds. This is because Materialize is not designed for one-off queries.

Materialize is optimized for materialized views that update incrementally. Read more about how materialized views work in Materialize. Let's create a materialized view as follows:

1
materialize=> SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
2
      sum
3
---------------
4
 1000100000000
5
(1 row)
6
Time: 37277.756 ms (00:37.278)
7
materialize=> CREATE MATERIALIZED VIEW mv AS
8
    SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
9
CREATE MATERIALIZED VIEW
10
Time: 327.252 ms
11
materialize=> SELECT * FROM mv;
12
      sum
13
---------------
14
 1000100000000
15
(1 row)
16
Time: 27.919 ms
17

text

With Materialize, every change to the source table (t1) in Postgres will only require a small amount of incremental work to update the mv materialized view. This is done during INSERT, not during SELECT. And you can use declarative SQL to define the whole view.

1
postgres=# INSERT INTO t (f1) VALUES (10001);
2
INSERT 0 1
3
Time: 5.627 ms
4

5
materialize=> SELECT * FROM mv;
6
      sum
7
---------------
8
 1000400050002
9
(1 row)
10
Time: 40.362 ms
11

text

You can also subscribe to the the materialized view and receive instant updates about all of the changes:

1
materialize=> COPY (SUBSCRIBE (SELECT * FROM mv)) TO STDOUT;
2
1727715520600	1	1000400050002
3
1727715526000	1	1000700160012
4
1727715526000	-1	1000400050002
5
1727715528000	-1	1000700160012
6
1727715528000	1	1001000330036
7

text

This is the output (timestamp, added (1)/removed (-1), value) when these commands run in Postgres:

1
postgres=# INSERT INTO t (f1) VALUES (10002);
2
INSERT 0 1
3
postgres=# INSERT INTO t (f1) VALUES (10003);
4
INSERT 0 1
5

text

To clean up, we can stop the Docker containers again:

1
docker stop materialized postgres
2
docker rm materialized postgres
3
docker network rm mznet
4

bash

And that's it! This is how you launch the Docker image, and define a materialized view, using the Materialize Emulator.

Shell Script: Materialize Emulator as a Docker Compose Project

To tie things together, here is a small shell script (run.sh) that runs the Materialize Emulator as a Docker Compose project.

The shell script contains many of Materialize's features, including a materialized view mv that combines the data of all these sources:

Also, the script uses the following to get the mv out of Materialize:

You can copy the full shell script below:

1
#!/bin/bash
2
set -euo pipefail
3

4
PREF="${PWD##*/}"
5

6
wait_for_health() {
7
  echo -n "waiting for container '$PREF-$1' to be healthy"
8
  while [ "$(docker inspect -f '{{.State.Health.Status}}' "$PREF-$1")" != "healthy" ]; do
9
    echo -n "."
10
    sleep 1
11
  done
12
  printf "\ncontainer '%s' is healthy\n" "$PREF-$1"
13
}
14

15
cat > docker-compose.yml <<EOF
16
version: '3.8'
17
services:
18
  materialized:
19
    image: materialize/materialized:latest
20
    container_name: $PREF-materialized
21
    environment:
22
      MZ_SYSTEM_PARAMETER_DEFAULT: "enable_copy_to_expr=true"
23
    networks:
24
      - network
25
    ports:
26
      - "127.0.0.1:6875:6875"
27
      - "127.0.0.1:6876:6876"
28
    healthcheck:
29
      test: ["CMD", "curl", "-f", "localhost:6878/api/readyz"]
30
      interval: 1s
31
      start_period: 60s
32

33
  postgres:
34
    image: postgres:latest
35
    container_name: $PREF-postgres
36
    environment:
37
      POSTGRES_PASSWORD: postgres
38
      POSTGRES_INITDB_ARGS: "-c wal_level=logical"
39
    networks:
40
      - network
41
    ports:
42
      - "127.0.0.1:5432:5432"
43
    healthcheck:
44
      test: ["CMD", "pg_isready", "-d", "db_prod"]
45
      interval: 1s
46
      start_period: 60s
47

48
  mysql:
49
    image: mysql:latest
50
    container_name: $PREF-mysql
51
    environment:
52
      MYSQL_ROOT_PASSWORD: mysql
53
    networks:
54
      - network
55
    ports:
56
      - "127.0.0.1:3306:3306"
57
    command:
58
        - "--log-bin=mysql-bin"
59
        - "--gtid_mode=ON"
60
        - "--enforce_gtid_consistency=ON"
61
        - "--binlog-format=row"
62
        - "--binlog-row-image=full"
63
    healthcheck:
64
      test: ["CMD", "mysqladmin", "ping", "--password=mysql", "--protocol=TCP"]
65
      interval: 1s
66
      start_period: 60s
67

68
  redpanda:
69
    image: vectorized/redpanda:latest
70
    container_name: $PREF-redpanda
71
    networks:
72
      - network
73
    ports:
74
      - "127.0.0.1:9092:9092"
75
      - "127.0.0.1:8081:8081"
76
    command:
77
        - "redpanda"
78
        - "start"
79
        - "--overprovisioned"
80
        - "--smp=1"
81
        - "--memory=1G"
82
        - "--reserve-memory=0M"
83
        - "--node-id=0"
84
        - "--check=false"
85
        - "--set"
86
        - "redpanda.enable_transactions=true"
87
        - "--set"
88
        - "redpanda.enable_idempotence=true"
89
        - "--set"
90
        - "--advertise-kafka-addr=redpanda:9092"
91
    healthcheck:
92
      test: ["CMD", "curl", "-f", "localhost:9644/v1/status/ready"]
93
      interval: 1s
94
      start_period: 60s
95

96
  minio:
97
    image: minio/minio:latest
98
    container_name: $PREF-minio
99
    environment:
100
      MINIO_STORAGE_CLASS_STANDARD: "EC:0"
101
    networks:
102
      - network
103
    ports:
104
      - "127.0.0.1:9000:9000"
105
      - "127.0.0.1:9001:9001"
106
    entrypoint: ["sh", "-c"]
107
    command: ["mkdir -p /data/$PREF && minio server /data --console-address :9001"]
108
    healthcheck:
109
      test: ["CMD", "curl", "-f", "localhost:9000/minio/health/live"]
110
      interval: 1s
111
      start_period: 60s
112

113
networks:
114
  network:
115
    driver: bridge
116
EOF
117
docker compose down || true
118
docker compose up -d
119

120
wait_for_health postgres
121
psql postgres://postgres:postgres@127.0.0.1:5432/postgres <<EOF
122
CREATE PUBLICATION mz_source FOR ALL TABLES;
123
CREATE TABLE pg_table (f1 INTEGER);
124
ALTER TABLE pg_table REPLICA IDENTITY FULL;
125
INSERT INTO pg_table VALUES (1), (2), (3);
126
EOF
127

128
wait_for_health mysql
129
mysql --protocol=tcp --user=root --password=mysql <<EOF
130
CREATE DATABASE public;
131
USE public;
132
CREATE TABLE mysql_table (f1 INTEGER);
133
INSERT INTO mysql_table VALUES (1), (2), (3);
134
EOF
135

136
wait_for_health redpanda
137
docker compose exec -T redpanda rpk topic create redpanda_table
138
docker compose exec -T redpanda rpk topic produce redpanda_table <<EOF
139
{"f1": 1}
140
{"f1": 2}
141
{"f1": 3}
142
EOF
143

144
wait_for_health materialized
145
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
146
-- Create a Postgres source
147
CREATE SECRET pgpass AS 'postgres';
148
CREATE CONNECTION pg TO POSTGRES (
149
  HOST '$PREF-postgres', DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
150
);
151
CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
152
  PUBLICATION 'mz_source'
153
) FOR SCHEMAS (public);
154

155
-- Create a MySQL source
156
CREATE SECRET mysqlpass AS 'mysql';
157
CREATE CONNECTION mysql TO MYSQL (
158
  HOST '$PREF-mysql', USER root, PASSWORD SECRET mysqlpass
159
);
160
CREATE SOURCE mysql_source FROM MYSQL CONNECTION mysql FOR ALL TABLES;
161

162
-- Create a Webhook source
163
CREATE SOURCE webhook_table FROM WEBHOOK BODY FORMAT TEXT;
164

165
-- Create a Redpanda (Kafka-compatible) source
166
CREATE CONNECTION kafka_conn TO KAFKA (
167
    BROKER '$PREF-redpanda:9092', SECURITY PROTOCOL PLAINTEXT
168
);
169
CREATE CONNECTION csr_conn TO CONFLUENT SCHEMA REGISTRY (
170
    URL 'http://$PREF-redpanda:8081/'
171
);
172
CREATE SOURCE redpanda_table FROM KAFKA CONNECTION kafka_conn (
173
    TOPIC 'redpanda_table'
174
) FORMAT JSON;
175

176
-- Simple materialized view, incrementally updated, with data from all sources
177
CREATE MATERIALIZED VIEW mv AS
178
SELECT sum(pg_table.f1 + mysql_table.f1 + webhook_table.body::int +
179
           (redpanda_table.data->'f1')::int)
180
FROM pg_table
181
JOIN mysql_table ON TRUE
182
JOIN webhook_table ON TRUE
183
JOIN redpanda_table ON TRUE;
184

185
-- Create a sink to Redpanda so that the topic will always be up to date
186
CREATE SINK sink FROM mv INTO KAFKA CONNECTION kafka_conn (TOPIC 'mv')
187
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
188
ENVELOPE DEBEZIUM;
189

190
-- One-off export of our materialized view to S3-compatible MinIO
191
CREATE SECRET miniopass AS 'minioadmin';
192
CREATE CONNECTION minio TO AWS (
193
    ENDPOINT 'http://minio:9000',
194
    REGION 'minio',
195
    ACCESS KEY ID 'minioadmin',
196
    SECRET ACCESS KEY SECRET miniopass
197
);
198
COPY (SELECT * FROM mv) TO 's3://$PREF/mv' WITH (
199
    AWS CONNECTION = minio,
200
    FORMAT = 'csv'
201
);
202

203
-- Allow HTTP API read requests without a token
204
CREATE ROLE anonymous_http_user;
205
GRANT SELECT ON TABLE mv TO anonymous_http_user;
206
EOF
207

208
# Write additional data into Webhook source
209
curl -d "1" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
210
curl -d "2" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
211
curl -d "3" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
212

213
# Read latest data from Redpanda
214
docker compose exec -T redpanda rpk topic consume mv --num 1
215

216
# CSV exists on S3-compatible MinIO
217
docker compose exec -T minio mc ls data/mzemulator/mv
218

219
# Use Postgres wire-compatible
220
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
221
SELECT * FROM pg_table;
222
SELECT * FROM mysql_table;
223
SELECT * FROM webhook_table;
224
SELECT * FROM redpanda_table;
225
SELECT * FROM mv;
226
EOF
227

228
# Use HTTP API
229
curl -s -X POST -H "Content-Type: application/json" \
230
    --data '{"queries": [{"query": "SELECT * FROM mv"}]}' \
231
    http://localhost:6876/api/sql | jq -r ".results[0].rows[0][0]"
232

bash

Now you can start up a Materialize Emulator in under a minute:

1
$ cd mzemulator
2
$ cat run.sh
3
#!/bin/bash
4
[...]
5
$ time ./run.sh
6
[...]
7
./run.sh  0.34s user 0.36s system 1% cpu 45.462 total
8
$ psql postgres://materialize@127.0.0.1:6875/materialize -c "SELECT * FROM mv"
9
[...]
10
 sum
11
-----
12
 648
13
(1 row)
14
$ docker compose down
15
[+] Running 6/6
16
 ✔ Container mzemulator-redpanda      Removed                             1.3s
17
 ✔ Container mzemulator-mysql         Removed                            10.6s
18
 ✔ Container mzemulator-postgres      Removed                             1.0s
19
 ✔ Container mzemulator-minio         Removed                             0.7s
20
 ✔ Container mzemulator-materialized  Removed                             1.2s
21
 ✔ Network mzemulator_network         Removed                             0.4s
22

text

It's that simple — just use the shell script to launch your Materialize Emulator.

Materialize Emulator: Test Quickly During Development

While Materialize is best experienced in our cloud, the Materialize Emulator allows you to quickly test your releases in a non-production environment.

Although the Materialize Emulator lacks many critical features included in the cloud version, the ability to test rapidly is helpful during development.

Try our Materialize Emulator right now to build your apps more efficiently! And sign up for a free trial of Materialize to see what our full cloud product is like.

Dennis

Dennis Felsing

Materialize