In our last blog about our Quality Assurance (QA) team, we gave an overview of the QA process, including our software and testing methods. One of our key tools during testing is the Materialize Emulator, a Docker image that allows you to maintain a locally hosted version of Materialize.
But there’s an important caveat: the Materialize Emulator cannot support production workloads. It lacks critical features of our cloud platform, including fault tolerance and GUI support. It is, however, well suited to testing and prototyping.
In this blog, we’ll walk through how to use the Materialize Emulator, step by step.
Materialize Emulator: What Is It?
The Materialize Emulator is an all-in-one Docker image available on Docker Hub for testing and evaluation purposes. The Emulator is not representative of Materialize’s performance and full feature set.
To view a comparison between the Materialize Emulator and the Materialize cloud platform, see the table below:
Feature | Materialize Emulator | Materialize SaaS |
---|---|---|
Production deployments | ❌ Not suitable due to performance and license limitations. | ✔️ |
Performance | ❌ Limited. Services are bundled in a single container. | ✔️ High. Services are scaled across many machines. |
Dedicated Support | ❌ | ✔️ |
Sample data | ✔️ Quickstart data source | ✔️ Quickstart data source |
Data sources | ✔️ Connect using SQL configuration | ✔️ Connect using a streamlined GUI |
Version upgrades | ✔️ Manual, with no data persistence | ✔️ Automated, with data persistence |
Use case isolation | ❌ | ✔️ |
Fault tolerance | ❌ | ✔️ |
Horizontal scalability | ❌ | ✔️ |
GUI | ❌ | ✔️ Materialize Console |
We’ve always used the Materialize Emulator for our testing, except for tests that require cloud integration with Kubernetes.
If you want to use Materialize in production scenarios, sign up for a free trial account or schedule a demo.
Step-by-Step Walkthrough: How to Use the Materialize Emulator
Let’s walk through a basic example of how to use the Materialize Emulator with a PostgreSQL source. The only requirements are Docker and the Postgres client (psql).
docker network create mznet
docker pull materialize/materialized:latest
docker run --name materialized --network mznet -d -p 127.0.0.1:6875:6875 \
-p 127.0.0.1:6876:6876 materialize/materialized:latest
We publish the ports only on localhost (127.0.0.1) because Materialize is running without authentication. If the ports were published on all interfaces, anyone on the internet could connect to your Materialize instance unless a NAT or firewall sits in between. If you do want to allow outside access, publish the ports without the address prefix, such as: -p 6875:6875 or -p 6876:6876.
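To make the difference between the two publish forms concrete, here is a minimal sketch of a shell helper that classifies a -p value by the host address it binds. The function name publish_scope and its output labels are illustrative and not part of Docker, and the pattern matching only covers the common spec shapes.

```shell
# Hypothetical helper: report which host interfaces a Docker -p spec binds.
# Covers the common "[ip:]host_port:container_port" shapes only.
publish_scope() {
  case "$1" in
    127.*:*:*|"[::1]:"*)  echo "loopback only" ;;
    0.0.0.0:*:*|"[::]:"*) echo "all interfaces" ;;
    *:*:*)                echo "one specific address" ;;
    *)                    echo "all interfaces" ;;  # no address prefix: every interface
  esac
}

publish_scope "127.0.0.1:6875:6875"   # loopback only
publish_scope "6875:6875"             # all interfaces
```

On a running container, docker port materialized lists the bindings that are actually in effect.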
Now Materialize is running locally and we can connect to it:
$ psql postgres://materialize@127.0.0.1:6875/materialize
NOTICE: connected to Materialize v0.118.0
Org ID: 4b733a37-b64d-44a2-8e79-e0ebd8a177ba
Region: docker/container
User: materialize
Cluster: quickstart
Database: materialize
Schema: public
Session UUID: 2631437c-61d6-4984-a68b-433f5751cecf
Issue a SQL query to get started. Need help?
View documentation: https://materialize.com/s/docs
Join our Slack community: https://materialize.com/s/chat
psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 9.5.0)
Type "help" for help.
materialize=>
Let’s start up a Postgres server:
docker run --name postgres --network mznet \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_INITDB_ARGS="-c wal_level=logical" \
-p 127.0.0.1:5432:5432 -d postgres
Connect to the Postgres server. Then generate a simple table. We will replicate this table to Materialize.
$ psql postgres://postgres:postgres@127.0.0.1:5432/postgres
psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 16.4 (Debian 16.4-1.pgdg120+1))
WARNING: psql major version 15, server major version 16.
Some psql features might not work.
Type "help" for help.
postgres=# CREATE PUBLICATION mz_source FOR ALL TABLES;
CREATE PUBLICATION
postgres=# CREATE TABLE t (f1 INTEGER);
CREATE TABLE
postgres=# ALTER TABLE t REPLICA IDENTITY FULL;
ALTER TABLE
postgres=# INSERT INTO t VALUES (1), (2), (3);
INSERT 0 3
Now use Materialize to connect to the Postgres instance:
materialize=> CREATE SECRET pgpass AS 'postgres';
CREATE SECRET
materialize=> CREATE CONNECTION pg TO POSTGRES (
HOST postgres, DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
);
CREATE CONNECTION
materialize=> CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
PUBLICATION 'mz_source'
) FOR SCHEMAS (public);
CREATE SOURCE
materialize=> SELECT * FROM t;
f1
----
1
2
3
(3 rows)
materialize=> CREATE MATERIALIZED VIEW mv AS SELECT sum(f1) FROM t;
CREATE MATERIALIZED VIEW
materialize=> SELECT * FROM mv;
sum
-----
6
(1 row)
That’s how you replicate a Postgres table in Materialize. Next, let’s run a heavy one-off query on both Postgres and Materialize, starting with Postgres:
postgres=# \timing
Timing is on.
postgres=# INSERT INTO t (f1) SELECT * FROM generate_series(4, 10000);
INSERT 0 9997
Time: 10.137 ms
postgres=# SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
sum
---------------
1000100000000
(1 row)
Time: 2323.538 ms (00:02.324)
Postgres answers in about 2.3 seconds. The same one-off query takes about 37 seconds in Materialize, as the first statement below shows, because Materialize is not designed for one-off queries.
Materialize is optimized for materialized views that update incrementally. Read more about how materialized views work in Materialize. Let’s define the same query as a materialized view:
materialize=> SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
sum
---------------
1000100000000
(1 row)
Time: 37277.756 ms (00:37.278)
materialize=> CREATE OR REPLACE MATERIALIZED VIEW mv AS
SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
CREATE MATERIALIZED VIEW
Time: 327.252 ms
materialize=> SELECT * FROM mv;
sum
---------------
1000100000000
(1 row)
Time: 27.919 ms
With Materialize, every change to the source table (t) in Postgres requires only a small amount of incremental work to update the mv materialized view. This work is done during INSERT, not during SELECT. And you can use declarative SQL to define the whole view.
postgres=# INSERT INTO t (f1) VALUES (10001);
INSERT 0 1
Time: 5.627 ms
materialize=> SELECT * FROM mv;
sum
---------------
1000400050002
(1 row)
Time: 40.362 ms
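The sums in these outputs can be checked by hand. When t holds the values 1 through n, the self-join pairs every row with every row, so each value appears n times on each side of the addition. A minimal sketch of that arithmetic, with cross_join_sum as an illustrative name:

```shell
# Sum of (a + b) over all pairs (a, b) drawn from 1..n, which is what
# SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true computes when t holds 1..n.
cross_join_sum() {
  local n="$1"
  local s=$(( n * (n + 1) / 2 ))   # sum of 1..n
  echo $(( 2 * n * s ))            # each value appears n times on each side
}

cross_join_sum 10000   # 1000100000000
cross_join_sum 10001   # 1000400050002
```

Inserting one row changes the result by roughly 300 million, yet Materialize only has to account for the new pairs rather than recompute all of them.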
You can also subscribe to the materialized view and receive instant updates about all of the changes:
materialize=> COPY (SUBSCRIBE (SELECT * FROM mv)) TO STDOUT;
1727715520600 1 1000400050002
1727715526000 1 1000700160012
1727715526000 -1 1000400050002
1727715528000 -1 1000700160012
1727715528000 1 1001000330036
Each output row has the form (timestamp, diff, value), where a diff of 1 means the value was added and -1 means it was removed. The output above was produced while these commands ran in Postgres:
postgres=# INSERT INTO t (f1) VALUES (10002);
INSERT 0 1
postgres=# INSERT INTO t (f1) VALUES (10003);
INSERT 0 1
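Because each change arrives as a retraction (-1) of the old value and an addition (1) of the new one, a client can rebuild the current contents of mv by summing the diffs per value. Here is a minimal sketch, assuming the rows arrive on standard input as whitespace-separated columns in the order shown above. The function name current_rows is illustrative.

```shell
# Fold SUBSCRIBE rows of the form "<timestamp> <diff> <value>" into the
# values that are still present, i.e. whose diffs add up to more than zero.
current_rows() {
  awk '{ count[$3 ""] += $2 }
       END { for (v in count) if (count[v] > 0) print v }'
}

printf '%s\n' \
  '1727715520600 1 1000400050002' \
  '1727715526000 1 1000700160012' \
  '1727715526000 -1 1000400050002' \
  '1727715528000 -1 1000700160012' \
  '1727715528000 1 1001000330036' | current_rows   # 1001000330036
```

The awk END block only runs at end of input, so this form suits a captured chunk of output. A long-running consumer would apply each diff as it arrives.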
To clean up, stop and remove the Docker containers and the network:
docker stop materialized postgres
docker rm materialized postgres
docker network rm mznet
And that’s it! This is how you launch the Docker image, and define a materialized view, using the Materialize Emulator.
Shell Script: Materialize Emulator as a Docker Compose Project
To tie things together, here is a small shell script (run.sh) that runs the Materialize Emulator as a Docker Compose project. Besides Docker, the script calls psql, mysql, curl, and jq on the host.
The shell script exercises many of Materialize’s features, including a materialized view mv that combines the data of all these sources: Postgres, MySQL, a webhook, and Redpanda (Kafka-compatible).
The script also reads mv back out of Materialize in two ways: over the Postgres wire protocol using psql, and over the HTTP API using curl.
You can copy the full shell script below:
#!/bin/bash
set -euo pipefail
PREF="${PWD##*/}"
wait_for_health() {
  echo -n "waiting for container '$PREF-$1' to be healthy"
  while [ "$(docker inspect -f '{{.State.Health.Status}}' "$PREF-$1")" != "healthy" ]; do
    echo -n "."
    sleep 1
  done
  printf "\ncontainer '%s' is healthy\n" "$PREF-$1"
}
cat > docker-compose.yml <<EOF
version: '3.8'
services:
  materialized:
    image: materialize/materialized:latest
    container_name: $PREF-materialized
    environment:
      MZ_SYSTEM_PARAMETER_DEFAULT: "enable_copy_to_expr=true"
    networks:
      - network
    ports:
      - "127.0.0.1:6875:6875"
      - "127.0.0.1:6876:6876"
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:6878/api/readyz"]
      interval: 1s
      start_period: 60s
  postgres:
    image: postgres:latest
    container_name: $PREF-postgres
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_INITDB_ARGS: "-c wal_level=logical"
    networks:
      - network
    ports:
      - "127.0.0.1:5432:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-d", "db_prod"]
      interval: 1s
      start_period: 60s
  mysql:
    image: mysql:latest
    container_name: $PREF-mysql
    environment:
      MYSQL_ROOT_PASSWORD: mysql
    networks:
      - network
    ports:
      - "127.0.0.1:3306:3306"
    command:
      - "--log-bin=mysql-bin"
      - "--gtid_mode=ON"
      - "--enforce_gtid_consistency=ON"
      - "--binlog-format=row"
      - "--binlog-row-image=full"
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "--password=mysql", "--protocol=TCP"]
      interval: 1s
      start_period: 60s
  redpanda:
    image: vectorized/redpanda:latest
    container_name: $PREF-redpanda
    networks:
      - network
    ports:
      - "127.0.0.1:9092:9092"
      - "127.0.0.1:8081:8081"
    command:
      - "redpanda"
      - "start"
      - "--overprovisioned"
      - "--smp=1"
      - "--memory=1G"
      - "--reserve-memory=0M"
      - "--node-id=0"
      - "--check=false"
      - "--set"
      - "redpanda.enable_transactions=true"
      - "--set"
      - "redpanda.enable_idempotence=true"
      - "--set"
      - "--advertise-kafka-addr=redpanda:9092"
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:9644/v1/status/ready"]
      interval: 1s
      start_period: 60s
  minio:
    image: minio/minio:latest
    container_name: $PREF-minio
    environment:
      MINIO_STORAGE_CLASS_STANDARD: "EC:0"
    networks:
      - network
    ports:
      - "127.0.0.1:9000:9000"
      - "127.0.0.1:9001:9001"
    entrypoint: ["sh", "-c"]
    command: ["mkdir -p /data/$PREF && minio server /data --console-address :9001"]
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:9000/minio/health/live"]
      interval: 1s
      start_period: 60s
networks:
  network:
    driver: bridge
EOF
docker compose down || true
docker compose up -d
wait_for_health postgres
psql postgres://postgres:postgres@127.0.0.1:5432/postgres <<EOF
CREATE PUBLICATION mz_source FOR ALL TABLES;
CREATE TABLE pg_table (f1 INTEGER);
ALTER TABLE pg_table REPLICA IDENTITY FULL;
INSERT INTO pg_table VALUES (1), (2), (3);
EOF
wait_for_health mysql
mysql --protocol=tcp --user=root --password=mysql <<EOF
CREATE DATABASE public;
USE public;
CREATE TABLE mysql_table (f1 INTEGER);
INSERT INTO mysql_table VALUES (1), (2), (3);
EOF
wait_for_health redpanda
docker compose exec -T redpanda rpk topic create redpanda_table
docker compose exec -T redpanda rpk topic produce redpanda_table <<EOF
{"f1": 1}
{"f1": 2}
{"f1": 3}
EOF
wait_for_health materialized
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
-- Create a Postgres source
CREATE SECRET pgpass AS 'postgres';
CREATE CONNECTION pg TO POSTGRES (
HOST '$PREF-postgres', DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
);
CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
PUBLICATION 'mz_source'
) FOR SCHEMAS (public);
-- Create a MySQL source
CREATE SECRET mysqlpass AS 'mysql';
CREATE CONNECTION mysql TO MYSQL (
HOST '$PREF-mysql', USER root, PASSWORD SECRET mysqlpass
);
CREATE SOURCE mysql_source FROM MYSQL CONNECTION mysql FOR ALL TABLES;
-- Create a Webhook source
CREATE SOURCE webhook_table FROM WEBHOOK BODY FORMAT TEXT;
-- Create a Redpanda (Kafka-compatible) source
CREATE CONNECTION kafka_conn TO KAFKA (
BROKER '$PREF-redpanda:9092', SECURITY PROTOCOL PLAINTEXT
);
CREATE CONNECTION csr_conn TO CONFLUENT SCHEMA REGISTRY (
URL 'http://$PREF-redpanda:8081/'
);
CREATE SOURCE redpanda_table FROM KAFKA CONNECTION kafka_conn (
TOPIC 'redpanda_table'
) FORMAT JSON;
-- Simple materialized view, incrementally updated, with data from all sources
CREATE MATERIALIZED VIEW mv AS
SELECT sum(pg_table.f1 + mysql_table.f1 + webhook_table.body::int +
(redpanda_table.data->'f1')::int)
FROM pg_table
JOIN mysql_table ON TRUE
JOIN webhook_table ON TRUE
JOIN redpanda_table ON TRUE;
-- Create a sink to Redpanda so that the topic will always be up to date
CREATE SINK sink FROM mv INTO KAFKA CONNECTION kafka_conn (TOPIC 'mv')
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
ENVELOPE DEBEZIUM;
-- One-off export of our materialized view to S3-compatible MinIO
CREATE SECRET miniopass AS 'minioadmin';
CREATE CONNECTION minio TO AWS (
ENDPOINT 'http://minio:9000',
REGION 'minio',
ACCESS KEY ID 'minioadmin',
SECRET ACCESS KEY SECRET miniopass
);
COPY (SELECT * FROM mv) TO 's3://$PREF/mv' WITH (
AWS CONNECTION = minio,
FORMAT = 'csv'
);
-- Allow HTTP API read requests without a token
CREATE ROLE anonymous_http_user;
GRANT SELECT ON TABLE mv TO anonymous_http_user;
EOF
# Write additional data into Webhook source
curl -d "1" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
curl -d "2" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
curl -d "3" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
# Read latest data from Redpanda
docker compose exec -T redpanda rpk topic consume mv --num 1
# CSV exists on S3-compatible MinIO
docker compose exec -T minio mc ls "data/$PREF/mv"
# Use the Postgres wire protocol
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
SELECT * FROM pg_table;
SELECT * FROM mysql_table;
SELECT * FROM webhook_table;
SELECT * FROM redpanda_table;
SELECT * FROM mv;
EOF
# Use HTTP API
curl -s -X POST -H "Content-Type: application/json" \
--data '{"queries": [{"query": "SELECT * FROM mv"}]}' \
http://localhost:6876/api/sql | jq -r ".results[0].rows[0][0]"
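The wait_for_health function in the script polls forever if a container never reports healthy. A minimal sketch of a bounded variant follows. The names wait_until and is_healthy are hypothetical and not part of the original script, and the 120-second budget in the usage comment is an arbitrary choice.

```shell
# Retry a command once per second until it succeeds or the timeout elapses.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Hypothetical predicate mirroring the check inside wait_for_health.
is_healthy() {
  [ "$(docker inspect -f '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Example (needs Docker): wait_until 120 is_healthy "$PREF-materialized"
```

Under set -euo pipefail, a timeout makes the whole script exit instead of hanging.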
Now you can start up a Materialize Emulator in under a minute:
$ cd mzemulator
$ cat run.sh
#!/bin/bash
[...]
$ time ./run.sh
[...]
./run.sh 0.34s user 0.36s system 1% cpu 45.462 total
$ psql postgres://materialize@127.0.0.1:6875/materialize -c "SELECT * FROM mv"
[...]
sum
-----
648
(1 row)
$ docker compose down
[+] Running 6/6
✔ Container mzemulator-redpanda Removed 1.3s
✔ Container mzemulator-mysql Removed 10.6s
✔ Container mzemulator-postgres Removed 1.0s
✔ Container mzemulator-minio Removed 0.7s
✔ Container mzemulator-materialized Removed 1.2s
✔ Network mzemulator_network Removed 0.4s
It’s that simple — just use the shell script to launch your Materialize Emulator.
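The value 648 returned by mv can be checked by hand: each of the four sources holds the values 1, 2, and 3, and the view sums all four columns over every combination of rows. A brute-force sketch of that arithmetic, with expected_mv_sum as an illustrative name:

```shell
# Brute-force the cross join of four sources that each hold 1, 2, 3,
# adding up pg + mysql + webhook + redpanda for every row combination.
expected_mv_sum() {
  local total=0 pg my wh rp
  for pg in 1 2 3; do
    for my in 1 2 3; do
      for wh in 1 2 3; do
        for rp in 1 2 3; do
          total=$(( total + pg + my + wh + rp ))
        done
      done
    done
  done
  echo "$total"
}

expected_mv_sum   # 648
```

There are 81 row combinations, and each column contributes 6 × 27 = 162 to the total, so the four columns together give 648.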
Materialize Emulator: Test Quickly During Development
While Materialize is best experienced in our cloud, the Materialize Emulator allows you to quickly test your releases in a non-production environment.
Although the Materialize Emulator lacks many critical features included in the cloud version, the ability to test rapidly is helpful during development.
Try our Materialize Emulator right now to build your apps more efficiently! And sign up for a free trial of Materialize to see what our full cloud product is like.