Materialize is a new kind of data warehouse built for operational workloads: the instant your data changes, Materialize reacts. This quickstart will get you up and running in a few minutes and with no dependencies, so you can experience the superpowers of an operational data warehouse first-hand:
Interactivity: get immediate responses from indexed warehouse relations and derived results.
Freshness: watch results change immediately in response to your input changes.
Consistency: results are always correct; never even transiently wrong.
Before you begin
All you need is a Materialize account. If you already have one — great! If not, sign up for a playground account first.
When you’re ready, head over to the Materialize console, and pop open the SQL Shell.
Step 1. Ingest streaming data
You’ll use a sample auction house data set to build an operational use case around auctions, bidders and (gasp) fraud. 🦹
As the auction house operator, you want to detect fraudulent behavior as soon as it happens, so you can act on it immediately. Lately, you’ve been struggling with auction flippers — users that purchase items only to quickly resell them for profit.
Let’s start by kicking off the built-in auction load generator, so you have some data to work with.
CREATE SOURCE auction_house FROM LOAD GENERATOR AUCTION (TICK INTERVAL '1s') FOR ALL TABLES WITH (SIZE = '3xsmall');
SHOW SOURCEScommand to get an idea of the data being generated:
name | type | size ------------------------+----------------+------ accounts | subsource | null auction_house | load-generator | 3xsmall auction_house_progress | progress | null auctions | subsource | null bids | subsource | null organizations | subsource | null users | subsource | null
For now, you’ll focus on the
bidsdata sets. Data will be continually produced as you walk through the quickstart.
Before moving on, get a sense for the data you’ll be working with:
SELECT * FROM auctions LIMIT 1;
id | seller | item | end_time ----+--------+--------------------+---------------------------- 1 | 1824 | Best Pizza in Town | 2023-09-10 21:24:54.838+00
SELECT * FROM bids LIMIT 1;
id | buyer | auction_id | amount | bid_time ----+-------+------------+--------+---------------------------- 10 | 3844 | 1 | 59 | 2023-09-10 21:25:00.465+00
Step 2. Use indexes for speed
Operational work requires interactive access to data as soon as it’s available. To identify potential auction flippers, you need to keep track of the winning bids for each completed auction.
Create a view that joins data from
bidsto get the bid with the highest
amountfor each auction at its
CREATE VIEW winning_bids AS SELECT DISTINCT ON (auctions.id) bids.*, auctions.item, auctions.seller FROM auctions, bids WHERE auctions.id = bids.auction_id AND bids.bid_time < auctions.end_time AND mz_now() >= auctions.end_time ORDER BY auctions.id, bids.bid_time DESC, bids.amount, bids.buyer;
Like in other SQL databases, a view in Materialize is just an alias for the embedded
SELECTstatement; results are computed only when the view is called.
You can query the view directly, but this shouldn’t be very impressive just yet! Querying the view re-runs the embedded statament, which comes at some cost on growing amounts of data.
SELECT * FROM winning_bids;
Yikes! In Materialize, you use indexes to keep results incrementally updated and immediately accessible.
Next, try creating several indexes on the
winning_bidsview using columns that can help optimize operations like point lookups and joins.
CREATE INDEX wins_by_item ON winning_bids (item); CREATE INDEX wins_by_bidder ON winning_bids (buyer); CREATE INDEX wins_by_seller ON winning_bids (seller);
These indexes will hold the results of
winning_bidsin memory, and work like a cache — except you don’t need to wire up one, or worry about the results getting stale.
If you now try to read out of
winning_bidswhile hitting one of these indexes (e.g., with a point lookup), things should be a whole lot more interactive.
SELECT * FROM winning_bids WHERE item = 'Best Pizza in Town' ORDER BY bid_time DESC;
But to detect and act upon fraud, you can’t rely on manual checks, right? You want to keep a running tab on these flippers. Luckily, the indexes you created in the previous step also make joins more interactive (as in other databases)!
Create a view that detects when a user wins an auction as a bidder, and then is identified as a seller for an item at a higher price.
CREATE VIEW fraud_activity AS SELECT w2.seller, w2.item AS seller_item, w2.amount AS seller_amount, w1.item buyer_item, w1.amount buyer_amount FROM winning_bids w1, winning_bids w2 WHERE w1.buyer = w2.seller AND w2.amount > w1.amount;
Aha! You can now catch any auction flippers in real time, based on the results of this view.
SELECT * FROM fraud_activity;
Step 3. See results change!
Operational work needs to surface and act on the most recent data. The moment your data changes, Materialize reacts. Let’s verify that this really happens by manually flagging some accounts as fraudulent, and observing results change in real time!
Create a table that allows you to manually flag fraudulent accounts.
CREATE TABLE fraud_accounts (id bigint);
In a new browser window, side-by-side with this one, navigate to the [Materialize console] (https://console.materialize.com/), and pop open another SQL Shell.
To see results change over time, let’s
SUBSCRIBEto a query that returns the Top 5 auction winners, overall.
SUBSCRIBE TO ( SELECT buyer, count(*) FROM winning_bids WHERE buyer NOT IN (SELECT id FROM fraud_accounts) GROUP BY buyer ORDER BY 2 DESC LIMIT 5 );
You can keep an eye on the results, but these may not change much at the moment. You’ll fix that in the next step!
Pick one of the buyers from the list maintained in the window that is running the
SUBSCRIBE, and mark them as fraudulent by adding them to the
INSERT INTO fraud_accounts VALUES (<id>);
This should cause the flagged buyer to immediately drop out of the Top 5! If you click Show diffs, you’ll notice that the picked buyer was kicked out, and the next non-fraudulent buyer in line automatically entered the Top 5.
When you’re done, cancel out of the
SUBSCRIBEusing Stop streaming, and close the secondary browser window.
Step 4. Serve correct results
With fraud out of the way, you can now shift your focus to a different operational use case: profit & loss alerts.
Operational work needs to act on correct and consistent data. Before you warn a user that they’ve spent much more than they’ve earned, you want to be sure your results are trustworthy — it’s real money we’re talking about, after all!
Create a view to track the sales and purchases of each auction house user.
CREATE VIEW funds_movement AS SELECT id, SUM(credits) as credits, SUM(debits) as debits FROM ( SELECT seller as id, amount as credits, 0 as debits FROM winning_bids UNION ALL SELECT buyer as id, 0 as credits, amount as debits FROM winning_bids ) GROUP BY id;
SELECTfrom this view, you’ll get many results, and results that are changing as new data is generated. This makes it hard to eyeball whether user funds really add up, in the first place.
To double check, you can write a diagnostic query that makes it easier to spot that results are correct and consistent. As an example, the total credit and total debit amounts should always add up.
SELECT SUM(credits), SUM(debits) FROM funds_movement;
You can also
SUBSCRIBEto this query, and watch the sums change in lock step as auctions close.
SUBSCRIBE TO ( SELECT SUM(credits), SUM(debits) FROM funds_movement );
It is never wrong, is it?
Step 5. Clean up
As the auction house operator, you should now have a high degree of confidence that Materialize can help you implement and automate operational use cases that depend on fresh, consistent results served in an interactive way.
Once you’re done exploring the auction house source, remember to clean up your environment:
DROP SOURCE auction_house CASCADE; DROP TABLE fraud_accounts;
To get started with your own data, upgrade your playground to a trial account.