How to Use the Materialize Emulator

In our last blog post about our Quality Assurance (QA) team, we gave an overview of the QA process, including our software and testing methods. One of our key testing tools is the Materialize Emulator, a Docker image that lets you run Materialize locally.

There is an important caveat: the Materialize Emulator cannot support production workloads. It lacks critical features of our cloud platform, including fault tolerance and GUI support. It is, however, well suited to testing and prototyping.

In this post, we’ll walk through how to use the Materialize Emulator, step by step.

Materialize Emulator: What Is It?

The Materialize Emulator is an all-in-one Docker image available on Docker Hub for testing and evaluation purposes. The Emulator is not representative of Materialize’s performance and full feature set.

The table below compares the Materialize Emulator with the Materialize cloud platform:

Feature	Materialize Emulator	Materialize SaaS
Production deployments	❌ Not suitable due to performance and license limitations.	✔️
Performance	❌ Limited. Services are bundled in a single container.	✔️ High. Services are scaled across many machines.
Dedicated Support	❌	✔️
Sample data	✔️ Quickstart data source	✔️ Quickstart data source
Data sources	✔️ Connect using SQL configuration	✔️ Connect using a streamlined GUI
Version upgrades	✔️ Manual, with no data persistence	✔️ Automated, with data persistence
Use case isolation	❌	✔️
Fault tolerance	❌	✔️
Horizontal scalability	❌	✔️
GUI	❌	✔️ Materialize Console

We’ve always used the Materialize Emulator for our own testing, except for tests that require cloud integration with Kubernetes.

If you want to use Materialize in production scenarios, sign up for a free trial account or schedule a demo.

Step-by-Step Walkthrough: How to Use the Materialize Emulator

Let’s walk through a basic example of how to use the Materialize Emulator with a PostgreSQL source. The only requirements are Docker and the postgres-client (psql).

bash

docker network create mznet
docker pull materialize/materialized:latest
docker run --name materialized --network mznet -d -p 127.0.0.1:6875:6875 \
    -p 127.0.0.1:6876:6876 materialize/materialized:latest

We publish the ports on localhost (127.0.0.1) only, because the Emulator runs without authentication. If the ports were published on all interfaces and the machine had no NAT or firewall in front of it, anyone on the internet could connect to your Materialize instance. If you do want to allow access from other machines, drop the 127.0.0.1 prefix: -p 6875:6875 and -p 6876:6876.
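
To make the difference between the two forms concrete, here is a minimal sketch of a shell helper that classifies a -p publish spec. The function name is_loopback_publish is hypothetical and not part of Docker; it only inspects the string you would pass to -p and does not query a running container.

```shell
# Hypothetical helper (not part of Docker): succeeds only when a
# `docker run -p` publish spec binds the host port to a loopback address.
is_loopback_publish() {
  case "$1" in
    127.*:*:*|\[::1\]:*:*) return 0 ;;  # host IP given, and it is loopback
    *) return 1 ;;                      # no host IP, or a non-loopback one
  esac
}

for spec in "127.0.0.1:6875:6875" "6875:6875"; do
  if is_loopback_publish "$spec"; then
    echo "$spec: reachable from this machine only"
  else
    echo "$spec: may be reachable from other hosts"
  fi
done
```

A check like this could guard a wrapper script so that an unauthenticated Emulator is never published beyond localhost by accident.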

Now Materialize is running locally and we can connect to it:

$ psql postgres://materialize@127.0.0.1:6875/materialize
NOTICE:  connected to Materialize v0.118.0
  Org ID: 4b733a37-b64d-44a2-8e79-e0ebd8a177ba
  Region: docker/container
  User: materialize
  Cluster: quickstart
  Database: materialize
  Schema: public
  Session UUID: 2631437c-61d6-4984-a68b-433f5751cecf

Issue a SQL query to get started. Need help?
  View documentation: https://materialize.com/s/docs
  Join our Slack community: https://materialize.com/s/chat

psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 9.5.0)
Type "help" for help.

materialize=>

Let’s start up a Postgres server:

bash

docker run --name postgres --network mznet \
    -e POSTGRES_PASSWORD=postgres \
    -e POSTGRES_INITDB_ARGS="-c wal_level=logical" \
    -p 127.0.0.1:5432:5432 -d postgres

Connect to the Postgres server and create a simple table. We will replicate this table to Materialize.

$ psql postgres://postgres:postgres@127.0.0.1:5432/postgres
psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 16.4 (Debian 16.4-1.pgdg120+1))
WARNING: psql major version 15, server major version 16.
         Some psql features might not work.
Type "help" for help.

postgres=# CREATE PUBLICATION mz_source FOR ALL TABLES;
CREATE PUBLICATION
postgres=# CREATE TABLE t (f1 INTEGER);
CREATE TABLE
postgres=# ALTER TABLE t REPLICA IDENTITY FULL;
ALTER TABLE
postgres=# INSERT INTO t VALUES (1), (2), (3);
INSERT 0 3

Now use Materialize to connect to the Postgres instance:

materialize=> CREATE SECRET pgpass AS 'postgres';
CREATE SECRET
materialize=> CREATE CONNECTION pg TO POSTGRES (
    HOST postgres, DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
);
CREATE CONNECTION
materialize=> CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
    PUBLICATION 'mz_source'
) FOR SCHEMAS (public);
CREATE SOURCE
materialize=> SELECT * FROM t;
 f1
----
  1
  2
  3
(3 rows)
materialize=> CREATE MATERIALIZED VIEW mv AS SELECT sum(f1) FROM t;
CREATE MATERIALIZED VIEW
materialize=> SELECT * FROM mv;
 sum
-----
   6
(1 row)

That’s how you replicate a Postgres table in Materialize. Next, let’s compare a one-off query on Postgres and Materialize. To make the workload heavy, we’ll insert more rows and join the table with itself:

postgres=# \timing
Timing is on.
postgres=# INSERT INTO t (f1) SELECT * FROM generate_series(4, 10000);
INSERT 0 9997
Time: 10.137 ms
postgres=# SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
      sum
---------------
 1000100000000
(1 row)
Time: 2323.538 ms (00:02.324)
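
The result can be sanity-checked by hand. Assuming t holds exactly the integers 1 to n, the join produces n² pairs, and every value is counted n times on each side, so the sum is 2 · n · n(n+1)/2 = n²(n+1). A minimal sketch in shell arithmetic (the function name cross_join_sum is hypothetical):

```shell
# Closed form for sum(t.f1 + t2.f1) over the join of t with itself,
# assuming t holds exactly the integers 1..n.
cross_join_sum() {
  local n="$1"
  echo $(( n * n * (n + 1) ))
}

cross_join_sum 10000   # prints 1000100000000
```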

Postgres answers this one-off query in about 2 seconds. Materialize, as the transcript below shows, needs about 37 seconds for the same query, because it is not designed for one-off queries.

Materialize is optimized for materialized views that update incrementally. Read more about how materialized views work in Materialize. Let’s run the same query in Materialize, then create a materialized view from it:

materialize=> SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
      sum
---------------
 1000100000000
(1 row)
Time: 37277.756 ms (00:37.278)
materialize=> CREATE OR REPLACE MATERIALIZED VIEW mv AS
    SELECT sum(t.f1 + t2.f1) FROM t JOIN t AS t2 ON true;
CREATE MATERIALIZED VIEW
Time: 327.252 ms
materialize=> SELECT * FROM mv;
      sum
---------------
 1000100000000
(1 row)
Time: 27.919 ms

With Materialize, every change to the source table (t) in Postgres requires only a small amount of incremental work to update the mv materialized view. That work happens when the data changes (on INSERT), not when you query the view (on SELECT). And you can define the whole view in declarative SQL.

postgres=# INSERT INTO t (f1) VALUES (10001);
INSERT 0 1
Time: 5.627 ms

materialize=> SELECT * FROM mv;
      sum
---------------
 1000400050002
(1 row)
Time: 40.362 ms
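
To see why one inserted row needs only a small amount of work, consider the arithmetic. The sketch below is an illustration only and not how Materialize is implemented internally. It assumes t held the integers 1 to 10000 before the insert, and it tracks a running row count and sum (names chosen for this example) to compute the change to mv without rescanning the join.

```shell
# Illustration only, not Materialize internals.
# State before the insert, assuming t holds 1..10000:
count=10000                    # rows in t
sum=50005000                   # 1 + 2 + ... + 10000
total=$(( 2 * count * sum ))   # current value of mv: 1000100000000

# Insert v: the new row pairs with every old row on both sides of the
# join, and once with itself.
v=10001
delta=$(( 2 * (count * v + sum + v) ))
total=$(( total + delta ))
count=$(( count + 1 ))
sum=$(( sum + v ))

echo "delta=$delta total=$total"   # prints delta=300050002 total=1000400050002
```

The updated total matches the value returned by SELECT * FROM mv above.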

You can also subscribe to the materialized view and receive instant updates about all of the changes:

materialize=> COPY (SUBSCRIBE (SELECT * FROM mv)) TO STDOUT;
1727715520600	1	1000400050002
1727715526000	1	1000700160012
1727715526000	-1	1000400050002
1727715528000	-1	1000700160012
1727715528000	1	1001000330036

Each output row contains a timestamp, a diff (1 for added, -1 for removed), and the value. The output above is what you see when these commands run in Postgres:

postgres=# INSERT INTO t (f1) VALUES (10002);
INSERT 0 1
postgres=# INSERT INTO t (f1) VALUES (10003);
INSERT 0 1
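
A consumer can fold this stream back into the current contents of the view by adding up the diffs per value and keeping the values whose count stays positive. A minimal sketch with awk, fed with the five rows shown above; the function name fold_subscribe is hypothetical, and the input is assumed to be tab-separated as in the COPY output:

```shell
# Hypothetical helper: fold tab-separated SUBSCRIBE rows
# (timestamp, diff, value) into the current contents of the view.
fold_subscribe() {
  awk -F'\t' '
    { count[$3] += $2 }                                  # add each diff to its value
    END { for (v in count) if (count[v] > 0) print v }   # keep values still present
  '
}

printf '%s\t%s\t%s\n' \
  1727715520600 1 1000400050002 \
  1727715526000 1 1000700160012 \
  1727715526000 -1 1000400050002 \
  1727715528000 -1 1000700160012 \
  1727715528000 1 1001000330036 | fold_subscribe   # prints 1001000330036
```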

To clean up, stop and remove the Docker containers and the network:

bash

docker stop materialized postgres
docker rm materialized postgres
docker network rm mznet

And that’s it! This is how you launch the Docker image and define a materialized view using the Materialize Emulator.

Shell Script: Materialize Emulator as a Docker Compose Project

To tie things together, here is a small shell script (run.sh) that runs the Materialize Emulator as a Docker Compose project.

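The shell script exercises many of Materialize’s features, including a materialized view mv that combines data from four sources: Postgres, MySQL, a webhook, and Redpanda (Kafka-compatible).

The script also gets the contents of mv out of Materialize in four ways: a sink to Redpanda, a one-off export to S3-compatible MinIO, the Postgres wire protocol (psql), and the HTTP API.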

You can copy the full shell script below:

bash

#!/bin/bash
set -euo pipefail

PREF="${PWD##*/}"

wait_for_health() {
  echo -n "waiting for container '$PREF-$1' to be healthy"
  while [ "$(docker inspect -f '{{.State.Health.Status}}' "$PREF-$1")" != "healthy" ]; do
    echo -n "."
    sleep 1
  done
  printf "\ncontainer '%s' is healthy\n" "$PREF-$1"
}

cat > docker-compose.yml <<EOF
version: '3.8'
services:
  materialized:
    image: materialize/materialized:latest
    container_name: $PREF-materialized
    environment:
      MZ_SYSTEM_PARAMETER_DEFAULT: "enable_copy_to_expr=true"
    networks:
      - network
    ports:
      - "127.0.0.1:6875:6875"
      - "127.0.0.1:6876:6876"
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:6878/api/readyz"]
      interval: 1s
      start_period: 60s

  postgres:
    image: postgres:latest
    container_name: $PREF-postgres
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_INITDB_ARGS: "-c wal_level=logical"
    networks:
      - network
    ports:
      - "127.0.0.1:5432:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres", "-d", "postgres"]
      interval: 1s
      start_period: 60s

  mysql:
    image: mysql:latest
    container_name: $PREF-mysql
    environment:
      MYSQL_ROOT_PASSWORD: mysql
    networks:
      - network
    ports:
      - "127.0.0.1:3306:3306"
    command:
        - "--log-bin=mysql-bin"
        - "--gtid_mode=ON"
        - "--enforce_gtid_consistency=ON"
        - "--binlog-format=row"
        - "--binlog-row-image=full"
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "--password=mysql", "--protocol=TCP"]
      interval: 1s
      start_period: 60s

  redpanda:
    image: vectorized/redpanda:latest
    container_name: $PREF-redpanda
    networks:
      - network
    ports:
      - "127.0.0.1:9092:9092"
      - "127.0.0.1:8081:8081"
    command:
        - "redpanda"
        - "start"
        - "--overprovisioned"
        - "--smp=1"
        - "--memory=1G"
        - "--reserve-memory=0M"
        - "--node-id=0"
        - "--check=false"
        - "--set"
        - "redpanda.enable_transactions=true"
        - "--set"
        - "redpanda.enable_idempotence=true"
        - "--advertise-kafka-addr=redpanda:9092"
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:9644/v1/status/ready"]
      interval: 1s
      start_period: 60s

  minio:
    image: minio/minio:latest
    container_name: $PREF-minio
    environment:
      MINIO_STORAGE_CLASS_STANDARD: "EC:0"
    networks:
      - network
    ports:
      - "127.0.0.1:9000:9000"
      - "127.0.0.1:9001:9001"
    entrypoint: ["sh", "-c"]
    command: ["mkdir -p /data/$PREF && minio server /data --console-address :9001"]
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:9000/minio/health/live"]
      interval: 1s
      start_period: 60s

networks:
  network:
    driver: bridge
EOF
docker compose down || true
docker compose up -d

wait_for_health postgres
psql postgres://postgres:postgres@127.0.0.1:5432/postgres <<EOF
CREATE PUBLICATION mz_source FOR ALL TABLES;
CREATE TABLE pg_table (f1 INTEGER);
ALTER TABLE pg_table REPLICA IDENTITY FULL;
INSERT INTO pg_table VALUES (1), (2), (3);
EOF

wait_for_health mysql
mysql --protocol=tcp --user=root --password=mysql <<EOF
CREATE DATABASE public;
USE public;
CREATE TABLE mysql_table (f1 INTEGER);
INSERT INTO mysql_table VALUES (1), (2), (3);
EOF

wait_for_health redpanda
docker compose exec -T redpanda rpk topic create redpanda_table
docker compose exec -T redpanda rpk topic produce redpanda_table <<EOF
{"f1": 1}
{"f1": 2}
{"f1": 3}
EOF

wait_for_health materialized
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
-- Create a Postgres source
CREATE SECRET pgpass AS 'postgres';
CREATE CONNECTION pg TO POSTGRES (
  HOST '$PREF-postgres', DATABASE postgres, USER postgres, PASSWORD SECRET pgpass
);
CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg (
  PUBLICATION 'mz_source'
) FOR SCHEMAS (public);

-- Create a MySQL source
CREATE SECRET mysqlpass AS 'mysql';
CREATE CONNECTION mysql TO MYSQL (
  HOST '$PREF-mysql', USER root, PASSWORD SECRET mysqlpass
);
CREATE SOURCE mysql_source FROM MYSQL CONNECTION mysql FOR ALL TABLES;

-- Create a Webhook source
CREATE SOURCE webhook_table FROM WEBHOOK BODY FORMAT TEXT;

-- Create a Redpanda (Kafka-compatible) source
CREATE CONNECTION kafka_conn TO KAFKA (
    BROKER '$PREF-redpanda:9092', SECURITY PROTOCOL PLAINTEXT
);
CREATE CONNECTION csr_conn TO CONFLUENT SCHEMA REGISTRY (
    URL 'http://$PREF-redpanda:8081/'
);
CREATE SOURCE redpanda_table FROM KAFKA CONNECTION kafka_conn (
    TOPIC 'redpanda_table'
) FORMAT JSON;

-- Simple materialized view, incrementally updated, with data from all sources
CREATE MATERIALIZED VIEW mv AS
SELECT sum(pg_table.f1 + mysql_table.f1 + webhook_table.body::int +
           (redpanda_table.data->'f1')::int)
FROM pg_table
JOIN mysql_table ON TRUE
JOIN webhook_table ON TRUE
JOIN redpanda_table ON TRUE;

-- Create a sink to Redpanda so that the topic will always be up to date
CREATE SINK sink FROM mv INTO KAFKA CONNECTION kafka_conn (TOPIC 'mv')
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
ENVELOPE DEBEZIUM;

-- One-off export of our materialized view to S3-compatible MinIO
CREATE SECRET miniopass AS 'minioadmin';
CREATE CONNECTION minio TO AWS (
    ENDPOINT 'http://minio:9000',
    REGION 'minio',
    ACCESS KEY ID 'minioadmin',
    SECRET ACCESS KEY SECRET miniopass
);
COPY (SELECT * FROM mv) TO 's3://$PREF/mv' WITH (
    AWS CONNECTION = minio,
    FORMAT = 'csv'
);

-- Allow HTTP API read requests without a token
CREATE ROLE anonymous_http_user;
GRANT SELECT ON TABLE mv TO anonymous_http_user;
EOF

# Write additional data into Webhook source
curl -d "1" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
curl -d "2" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
curl -d "3" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table

# Read latest data from Redpanda
docker compose exec -T redpanda rpk topic consume mv --num 1

# CSV exists on S3-compatible MinIO
docker compose exec -T minio mc ls "data/$PREF/mv"

# Use the Postgres wire protocol
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
SELECT * FROM pg_table;
SELECT * FROM mysql_table;
SELECT * FROM webhook_table;
SELECT * FROM redpanda_table;
SELECT * FROM mv;
EOF

# Use HTTP API
curl -s -X POST -H "Content-Type: application/json" \
    --data '{"queries": [{"query": "SELECT * FROM mv"}]}' \
    http://localhost:6876/api/sql | jq -r ".results[0].rows[0][0]"
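
One thing to keep in mind: the script’s wait_for_health loop polls until the container reports healthy, so it never returns if a health check keeps failing. Below is a minimal sketch of a variant that gives up after a deadline. The function name wait_for_health_timeout and the 120-second default are assumptions for illustration, not part of the original script, and unlike wait_for_health it takes the full container name.

```shell
# Variant of wait_for_health that gives up after a deadline.
# The 120-second default is an assumption, not a value from the script.
wait_for_health_timeout() {
  local name="$1" timeout="${2:-120}" waited=0 status
  while :; do
    # `|| true` keeps `set -e` scripts alive while the container does not exist yet
    status="$(docker inspect -f '{{.State.Health.Status}}' "$name" 2>/dev/null || true)"
    [ "$status" = "healthy" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "container '$name' not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}
```

In run.sh it could be called as wait_for_health_timeout "$PREF-postgres" 60. Because the script runs under set -e, a timeout would then stop the script instead of leaving it hanging.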

Now you can start up a Materialize Emulator in under a minute:

$ cd mzemulator
$ cat run.sh
#!/bin/bash
[...]
$ time ./run.sh
[...]
./run.sh  0.34s user 0.36s system 1% cpu 45.462 total
$ psql postgres://materialize@127.0.0.1:6875/materialize -c "SELECT * FROM mv"
[...]
 sum
-----
 648
(1 row)
$ docker compose down
[+] Running 6/6
 ✔ Container mzemulator-redpanda      Removed                             1.3s
 ✔ Container mzemulator-mysql         Removed                            10.6s
 ✔ Container mzemulator-postgres      Removed                             1.0s
 ✔ Container mzemulator-minio         Removed                             0.7s
 ✔ Container mzemulator-materialized  Removed                             1.2s
 ✔ Network mzemulator_network         Removed                             0.4s

It’s that simple — just use the shell script to launch your Materialize Emulator.
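
The mzemulator prefix in the container names above comes from the line PREF="${PWD##*/}" in the script, which takes the name of the current directory. A short sketch of how that parameter expansion behaves, using a hypothetical path in place of $PWD:

```shell
# ${var##*/} removes the longest prefix ending in "/",
# leaving only the last path component.
dir="/home/user/mzemulator"   # hypothetical path standing in for $PWD
pref="${dir##*/}"
echo "$pref"                  # prints mzemulator
echo "$pref-materialized"     # prints mzemulator-materialized
```

Running the script from a differently named directory therefore changes the container names and the MinIO bucket name accordingly.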

Materialize Emulator: Test Quickly During Development

While Materialize is best experienced in our cloud, the Materialize Emulator allows you to quickly test your releases in a non-production environment.

Although the Materialize Emulator lacks many critical features included in the cloud version, the ability to test rapidly is helpful during development.

Try our Materialize Emulator right now to build your apps more efficiently! And sign up for a free trial of Materialize to see what our full cloud product is like.
