Get started with Materialize
This guide walks you through getting started with Materialize, covering:
-
Connecting to a streaming data source
-
Getting familiar with views, indexes and materialized views
-
Exploring common patterns like joins and time-windowing
-
Simulating a failure to see active replication in action
Connect
Open a terminal window and connect to Materialize using any Materialize-compatible CLI, like psql
. If you already have psql
installed on your machine, use the connection string provided in the UI to connect:
psql "postgres://user%40domain.com@host:6875/materialize"
Otherwise, you can install psql
using the following instructions:
brew install libpq
brew link --force libpq
sudo apt-get update
sudo apt-get install postgresql-client
Explore a streaming source
Materialize allows you to work with streaming data from multiple external sources using nothing but standard SQL. You write arbitrarily complex queries; Materialize takes care of maintaining the results automatically up to date with very low latency.
We’ll start with some real-time data produced by Materialize’s built-in load generator source, which gives you a quick way to get up and running with no dependencies.
-
Let’s use the
AUCTION
load generator source to simulate an auction house where different users are bidding on an ongoing series of auctions:CREATE SOURCE auction_house FROM LOAD GENERATOR AUCTION FOR ALL TABLES WITH (SIZE = '3xsmall');
The
CREATE SOURCE
statement is a definition of where to find and how to connect to a data source. Submitting the statement will prompt Materialize to start ingesting data into durable storage. -
The
auction_house
source will be automatically demuxed into multiple subsources, each representing a different underlying table populated by the load generator:SHOW SOURCES;
name | type | size ---------------+----------------+--------- accounts | subsource | 3xsmall auction_house | load-generator | 3xsmall auctions | subsource | 3xsmall bids | subsource | 3xsmall organizations | subsource | 3xsmall users | subsource | 3xsmall
-
Now that we have some data to play around with, let’s set up a cluster (logical compute) with one
xsmall
replica (physical compute) so we can start running some queries:CREATE CLUSTER auction_house REPLICAS (xsmall_replica (SIZE = 'xsmall')); SET CLUSTER = auction_house;
Joins
Materialize efficiently supports all types of SQL joins under all the conditions you would expect from a traditional relational database.
The first thing we might want to do with our data is find all the on-time bids: bids that arrived before their corresponding auction closed, and so are eligible to be winners. For that, we’ll enrich the bids
stream with the reference auctions
data.
-
Create a view
on_time_bids
that joinsbids
andauctions
based on the auction identifier:CREATE VIEW on_time_bids AS SELECT bids.id AS bid_id, auctions.id AS auction_id, auctions.seller, bids.buyer, auctions.item, bids.bid_time, auctions.end_time, bids.amount FROM bids JOIN auctions ON bids.auction_id = auctions.id WHERE bids.bid_time < auctions.end_time;
-
To see the results:
SELECT * FROM on_time_bids LIMIT 5;
bid_id | auction_id | seller | buyer | item | bid_time | end_time | amount --------+------------+--------+-------+------------+----------------------------+----------------------------+-------- 512 | 51 | 1389 | 3546 | Custom Art | 2022-09-16 23:29:38.694+00 | 2022-09-16 23:29:46.694+00 | 98 66560 | 6656 | 715 | 874 | Custom Art | 2022-09-17 11:24:38.198+00 | 2022-09-17 11:24:48.198+00 | 27 132352 | 13235 | 202 | 1442 | Custom Art | 2022-09-17 23:17:32.66+00 | 2022-09-17 23:17:40.66+00 | 18 198144 | 19814 | 2593 | 1510 | Custom Art | 2022-09-18 11:07:06.605+00 | 2022-09-18 11:07:12.605+00 | 52 2560 | 256 | 1074 | 1982 | Custom Art | 2022-09-16 23:51:53.236+00 | 2022-09-16 23:52:03.236+00 | 36
If you re-run the
SELECT
statement at different points in time, you can see the updated results based on the latest data.
Indexes
-
Create a view
avg_bids
that keeps track of the average bid price for on-time bids:CREATE VIEW avg_bids AS SELECT auction_id, avg(amount) AS amount FROM on_time_bids GROUP BY auction_id;
One thing to note here is that we created a non-materialized view, which doesn’t store the results of the query but simply provides an alias for the embedded
SELECT
statement. The results of a view can be incrementally maintained in memory within a cluster by creating an index. -
Create an index
avg_bids_idx
;CREATE INDEX avg_bids_idx ON avg_bids (auction_id);
Indexes assemble and incrementally maintain a query’s results updated in memory within a cluster, which speeds up query time.
-
To see the results:
SELECT * FROM avg_bids LIMIT 10;
Regardless of how complex the underlying view definition is, querying an indexed view is computationally free because the results are pre-computed and available in memory.
Materialized views
So far, we’ve built our auction house strictly using views. Depending on your setup, in some cases you might want to use a materialized view: a view that is persisted in durable storage and incrementally updated as new data arrives.
-
Create a view that takes the on-time bids and finds the highest bid for each auction:
CREATE VIEW highest_bid_per_auction AS SELECT grp.auction_id, bid_id, buyer, seller, item, amount, bid_time, end_time FROM (SELECT DISTINCT auction_id FROM on_time_bids) grp, LATERAL ( SELECT * FROM on_time_bids WHERE auction_id = grp.auction_id ORDER BY amount DESC LIMIT 1 );
In other databases, you might have used a window function (like
ROW_NUMBER()
) to implement this query pattern. In Materialize, it can be implemented in a more performant way using aLATERAL
subquery.
Temporal filters
In Materialize, temporal filters allow you to define time-windows over otherwise unbounded streams of data. This is useful to model business processes or simply to limit resource usage.
-
In this case, we need to filter out rows for auctions that are still ongoing:
CREATE MATERIALIZED VIEW winning_bids AS SELECT * FROM highest_bid_per_auction WHERE end_time < mz_now();
The
mz_now()
function is used to keep track of the logical time that your query executes (similar tonow()
in other systems, as explained more in-depth in “now and mz_now functions”). As time advances, only the records that satisfy the time constraint are used in the materialized view. -
To see the results, let’s use
SUBSCRIBE
instead of a vanillaSELECT
:COPY ( SUBSCRIBE ( SELECT auction_id, bid_id, item, amount FROM winning_bids )) TO STDOUT;
To cancel out of the stream, press CTRL+C.
Break something!
-
From a different terminal window, open a new connection to Materialize.
-
Add an additional replica to the
auction_house
cluster:CREATE CLUSTER REPLICA auction_house.bigger SIZE = 'small';
-
To simulate a failure, drop the
xsmall_replica
:DROP CLUSTER REPLICA auction_house.xsmall_replica;
If you switch to the terminal window running the
SUBSCRIBE
command, you’ll see that results are still being pushed out to the client.
Learn more
That’s it! You just created your first materialized view, tried out some common patterns enabled by SQL on streams, and tried to break Materialize! We encourage you to continue exploring the AUCTION
load generator source using the supported SQL commands, and read through “What is Materialize?" for a more comprehensive overview.