Quickstart
Materialize is the Cloud Operational Data Store that delivers the speed of streaming with the ease of a data warehouse. With Materialize, organizations can use SQL to transform, deliver, and act on fast-changing data.
This quickstart will get you up and running in a few minutes and with no dependencies, so you can experience the superpowers of an operational data store first-hand:
-
Interactivity: get immediate responses from indexed warehouse relations and derived results.
-
Freshness: watch results change immediately in response to your input changes.
-
Consistency: results are always correct; never even transiently wrong.
Prerequisite
A Materialize account. If you do not have an account, you can sign up for a free trial.
Alternatively, you can download the Materialize Emulator to test locally. However, the Materialize Emulator does not provide the full experience of using Materialize.
Step 0. Open the SQL Shell or a SQL client.
-
If you have a Materialize account, navigate to the Materialize Console and sign in. By default, you should be in the SQL Shell. If you’re already signed in, you can access the SQL Shell in the left-hand menu.
-
If you are using the Materialize Emulator, connect to the Materialize Emulator using your preferred SQL client.
Step 1. Ingest streaming data
You’ll use a sample auction house data set to build an operational use case around auctions, bidders and (gasp) fraud. 🦹
As the auction house operator, you want to detect fraudulent behavior as soon as it happens, so you can act on it immediately. Lately, you’ve been struggling with auction flippers — users that purchase items only to quickly resell them for profit.
-
Let’s start by kicking off the built-in auction load generator, so you have some data to work with.
CREATE SOURCE auction_house FROM LOAD GENERATOR AUCTION (TICK INTERVAL '1s', AS OF 100000) FOR ALL TABLES;
-
Use the
SHOW SOURCES
command to get an idea of the data being generated:SHOW SOURCES;
name | type ------------------------+----------------- accounts | subsource auction_house | load-generator auction_house_progress | progress auctions | subsource bids | subsource organizations | subsource users | subsource
For now, you’ll focus on the
auctions
andbids
data sets. Data will be continually produced as you walk through the quickstart. -
Before moving on, get a sense for the data you’ll be working with:
SELECT * FROM auctions LIMIT 1;
id | seller | item | end_time ----+--------+--------------------+---------------------------- 1 | 1824 | Best Pizza in Town | 2023-09-10 21:24:54.838+00
SELECT * FROM bids LIMIT 1;
id | buyer | auction_id | amount | bid_time ----+-------+------------+--------+---------------------------- 10 | 3844 | 1 | 59 | 2023-09-10 21:25:00.465+00
Step 2. Use indexes for speed
Operational work requires interactive access to data as soon as it’s available. To identify potential auction flippers, you need to keep track of the winning bids for each completed auction.
-
Create a view that joins data from
auctions
andbids
to get the bid with the highestamount
for each auction at itsend_time
.CREATE VIEW winning_bids AS SELECT DISTINCT ON (auctions.id) bids.*, auctions.item, auctions.seller FROM auctions, bids -- Where all bids occurred during the auction WHERE auctions.id = bids.auction_id AND bids.bid_time < auctions.end_time -- Where all auctions have completed AND mz_now() >= auctions.end_time ORDER BY auctions.id, bids.amount DESC, bids.bid_time, bids.buyer;
Like in other SQL databases, a view in Materialize is just an alias for the embedded
SELECT
statement; results are computed only when the view is called. -
You can query the view directly, but this shouldn’t be very impressive just yet! Querying the view re-runs the embedded statement, which comes at some cost on growing amounts of data.
SELECT * FROM winning_bids WHERE item = 'Best Pizza in Town' ORDER BY bid_time DESC;
Yikes! In Materialize, you use indexes to keep results incrementally updated and immediately accessible.
-
Next, try creating several indexes on the
winning_bids
view using columns that can help optimize operations like point lookups and joins.CREATE INDEX wins_by_item ON winning_bids (item); CREATE INDEX wins_by_bidder ON winning_bids (buyer); CREATE INDEX wins_by_seller ON winning_bids (seller);
These indexes will hold the results of
winning_bids
in memory, and work like a cache — except you don’t need to wire up one, or worry about the results getting stale. -
If you now try to read out of
winning_bids
while hitting one of these indexes (e.g., with a point lookup), things should be a whole lot more interactive.SELECT * FROM winning_bids WHERE item = 'Best Pizza in Town' ORDER BY bid_time DESC;
But to detect and act upon fraud, you can’t rely on manual checks, right? You want to keep a running tab on these flippers. Luckily, the indexes you created in the previous step also make joins more interactive (as in other databases)!
-
Create a view that detects when a user wins an auction as a bidder, and then is identified as a seller for an item at a higher price.
CREATE VIEW fraud_activity AS SELECT w2.seller, w2.item AS seller_item, w2.amount AS seller_amount, w1.item buyer_item, w1.amount buyer_amount FROM winning_bids w1, winning_bids w2 -- Identified as a buyer and seller for any two auctions WHERE w1.buyer = w2.seller -- For the same item AND w1.item = w2.item -- Tries to sell at a higher price AND w2.amount > w1.amount;
Aha! You can now catch any auction flippers in real time, based on the results of this view.
SELECT * FROM fraud_activity LIMIT 100;
Step 3. See results change!
Operational work needs to surface and act on the most recent data. The moment your data changes, Materialize reacts. Let’s verify that this really happens by manually flagging some accounts as fraudulent, and observing results change in real time!
-
Create a table that allows you to manually flag fraudulent accounts.
CREATE TABLE fraud_accounts (id bigint);
-
Open another SQL Shell or SQL client.
-
If using the Materialize Console, in a new browser window, side-by-side with this one, navigate to the Materialize console, and pop open another SQL Shell.
-
If running against the Materialize Emulator, open another instance of your preferred SQL client and connect to the Materialize Emulator.
-
-
To see results change over time, let’s
SUBSCRIBE
to a query that returns the Top 5 auction winners, overall.SUBSCRIBE TO ( SELECT buyer, count(*) FROM winning_bids WHERE buyer NOT IN (SELECT id FROM fraud_accounts) GROUP BY buyer ORDER BY 2 DESC LIMIT 5 );
If running against the Materialize Emulator, run the following command in your preferred SQL client:
COPY (SUBSCRIBE ( SELECT buyer, count(*) FROM winning_bids WHERE buyer NOT IN (SELECT id FROM fraud_accounts) GROUP BY buyer ORDER BY 2 DESC LIMIT 5 )) TO STDOUT;
You can keep an eye on the results, but these may not change much at the moment. You’ll fix that in the next step!
-
Pick one of the buyers from the list maintained in the window that is running the
SUBSCRIBE
, and mark them as fraudulent by adding them to thefraud_accounts
table.INSERT INTO fraud_accounts VALUES (<id>);
This should cause the flagged buyer to immediately drop out of the Top 5! If you click Show diffs, you’ll notice that the picked buyer was kicked out, and the next non-fraudulent buyer in line automatically entered the Top 5.
When you’re done, cancel out of the
SUBSCRIBE
using Stop streaming, and close the secondary browser window.
Step 4. Serve correct results
With fraud out of the way, you can now shift your focus to a different operational use case: profit & loss alerts.
Operational work needs to act on correct and consistent data. Before you warn a user that they’ve spent much more than they’ve earned, you want to be sure your results are trustworthy — it’s real money we’re talking about, after all!
-
Create a view to track the sales and purchases of each auction house user.
CREATE VIEW funds_movement AS SELECT id, SUM(credits) as credits, SUM(debits) as debits FROM ( SELECT seller as id, amount as credits, 0 as debits FROM winning_bids UNION ALL SELECT buyer as id, 0 as credits, amount as debits FROM winning_bids ) GROUP BY id;
If you
SELECT
from this view, you’ll get many results, and results that are changing as new data is generated. This makes it hard to eyeball whether user funds really add up, in the first place. -
To double check, you can write a diagnostic query that makes it easier to spot that results are correct and consistent. As an example, the total credit and total debit amounts should always add up.
SELECT SUM(credits), SUM(debits) FROM funds_movement;
You can also
SUBSCRIBE
to this query, and watch the sums change in lock step as auctions close.SUBSCRIBE TO ( SELECT SUM(credits), SUM(debits) FROM funds_movement );
If running against the Materialize Emulator, run the following command in your preferred SQL client:
COPY (SUBSCRIBE TO ( SELECT SUM(credits), SUM(debits) FROM funds_movement )) TO STDOUT;
It is never wrong, is it?
Step 5. Clean up
As the auction house operator, you should now have a high degree of confidence that Materialize can help you implement and automate operational use cases that depend on fresh, consistent results served in an interactive way.
Once you’re done exploring the auction house source, remember to clean up your environment:
DROP SOURCE auction_house CASCADE;
DROP TABLE fraud_accounts;
What’s next?
To get started ingesting your own data from an external system like Kafka, MySQL or PostgreSQL, check the documentation for sources, and navigate to Data > Sources > New source in the Materialize Console to create your first source.
For help getting started with your data or other questions about Materialize, you can schedule a free guided trial.