Quickstart

Materialize provides always-fresh results while also providing strong consistency guarantees. In Materialize, both indexes and materialized views incrementally update results as Materialize ingests new data; that is, work is performed on writes. As a result, reads from these objects return up-to-date results while being computationally free.

In this quickstart, you will continuously ingest a sample auction data set to build an operational use case around finding auction winners and auction flippers. Specifically, you will:

  • Create and query various views on sample auction data. The data is continually generated at 1-second intervals to mimic a data-intensive workload.

  • Create an index to compute and store view results in memory. As new auction data arrives, the index incrementally updates view results instead of recalculating the results from scratch, making fresh results immediately available for reads.

  • Create and query views to verify that Materialize always serves consistent results.

[Image: Quickstart in the Materialize Console]

Prerequisite

A Materialize account. If you do not have an account, you can sign up for a free trial.

Alternatively, you can download the Materialize Emulator to test locally. However, the Materialize Emulator does not provide the full experience of using Materialize.

Step 0. Open the SQL Shell or a SQL client.

  • If you have a Materialize account, navigate to the Materialize Console and sign in. By default, you should be in the SQL Shell. If you’re already signed in, you can access the SQL Shell in the left-hand menu.

  • If you are using the Materialize Emulator, connect to the Materialize Emulator using your preferred SQL client.

Step 1. Create a schema.

By default, you are using the quickstart cluster, working in the materialize.public namespace, where:

  • A cluster is an isolated pool of compute resources (CPU, memory, and scratch disk space) for running your workloads,

  • materialize is the database name, and

  • public is the schema name.

Create a separate schema for this quickstart. For a schema name to be valid:

  • The first character must be either: an ASCII letter (a-z and A-Z), an underscore (_), or a non-ASCII character.

  • The remaining characters can be: an ASCII letter (a-z and A-Z), ASCII numbers (0-9), an underscore (_), dollar signs ($), or a non-ASCII character.

Alternatively, you can bypass these constraints by double-quoting the name, with one exception: schema names, whether double-quoted or not, cannot contain a dot (.).

See also Naming restrictions.
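For example, a schema name containing a hyphen is invalid unless double-quoted (the name my-quickstart below is illustrative):

    CREATE SCHEMA materialize."my-quickstart";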

In the Materialize Console:

  1. Enter a schema name in the text field and click the Create button.

  2. Switch to the new schema. From the top of the SQL Shell, select your schema from the namespace dropdown.

In a SQL client:

  1. Use CREATE SCHEMA to create your schema.

    -- Replace <schema> with the name for your schema
    CREATE SCHEMA materialize.<schema>;
    
  2. Select your schema.

    -- Replace <schema> with the name for your schema
    SET SCHEMA <schema>;
    
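    To confirm that the switch took effect, you can inspect the session's search path (Materialize follows PostgreSQL conventions here); the output should show the schema you created:

    SHOW search_path;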

Step 2. Create the source.

Sources are external systems from which Materialize reads data. This tutorial uses Materialize’s sample Auction load generator to create the source.

  1. Create the source using the CREATE SOURCE command.

    For the sample Auction load generator, the quickstart uses CREATE SOURCE with the FROM LOAD GENERATOR clause, which works specifically with Materialize’s sample data generators. The tutorial specifies that the generator should emit new data every 1s.

    CREATE SOURCE auction_house
    FROM LOAD GENERATOR AUCTION
    (TICK INTERVAL '1s', AS OF 100000)
    FOR ALL TABLES;
    

    CREATE SOURCE can create multiple tables (referred to as subsources in Materialize) when ingesting data from multiple upstream tables. For each upstream table that is selected for ingestion, Materialize creates a subsource.

  2. Use the SHOW SOURCES command to see the results of the previous step.

    SHOW SOURCES;
    

    The output should resemble the following:

    | name                   | type           | cluster    | comment |
    | ---------------------- | -------------- | ---------- | ------- |
    | accounts               | subsource      | quickstart |         |
    | auction_house          | load-generator | quickstart |         |
    | auction_house_progress | progress       | null       |         |
    | auctions               | subsource      | quickstart |         |
    | bids                   | subsource      | quickstart |         |
    | organizations          | subsource      | quickstart |         |
    | users                  | subsource      | quickstart |         |
    

    A subsource is how Materialize refers to a table with the following properties:

    • A subsource can only be written to by its source; in this case, the load generator.

    • Users can read from subsources.

  3. Use the SELECT statement to query auctions and bids.

    • View a sample row in auctions:

      SELECT * FROM auctions LIMIT 1;
      

      The output should return a single row (your results may differ):

       id    | seller | item               | end_time
      -------+--------+--------------------+---------------------------
       29550 | 2468   | Best Pizza in Town | 2024-07-25 18:24:25.805+00
      
    • View a sample row in bids:

      SELECT * FROM bids LIMIT 1;
      

      The output should return a single row (your results may differ):

       id     | buyer | auction_id | amount | bid_time
      --------+-------+------------+--------+---------------------------
       295641 | 737   | 29564      | 72     | 2024-07-25 18:25:42.911+00
      
    • To view the relationship between auctions and bids, you can join by the auction id:

      SELECT a.*, b.*
      FROM auctions AS a
      JOIN bids AS b
        ON a.id = b.auction_id
      LIMIT 3;
      

      The output should return (at most) 3 rows (your results may differ):

      | id    | seller | item               | end_time                   | id     | buyer | auction_id | amount | bid_time                   |
      | ----- | ------ | ------------------ | -------------------------- | ------ | ----- | ---------- | ------ | -------------------------- |
      | 15575 | 158    | Signed Memorabilia | 2024-07-25 20:30:25.085+00 | 155751 | 215   | 15575      | 27     | 2024-07-25 20:30:16.085+00 |
      | 15575 | 158    | Signed Memorabilia | 2024-07-25 20:30:25.085+00 | 155750 | 871   | 15575      | 63     | 2024-07-25 20:30:15.085+00 |
      | 15575 | 158    | Signed Memorabilia | 2024-07-25 20:30:25.085+00 | 155752 | 2608  | 15575      | 16     | 2024-07-25 20:30:17.085+00 |
      

      Subsequent steps in this quickstart use a query that finds winning bids for auctions to show how Materialize uses views and indexes to provide immediately available, up-to-date results for various queries.

Step 3. Create a view to find winning bids.

A view is a saved name for an underlying SELECT statement, providing a shorthand for referencing the query. The underlying query is not executed when the view is created; instead, it is executed each time the view is referenced.

Assume you want to find the winning bids for auctions that have ended. The winning bid for an auction is the highest bid entered for an auction before the auction ended. As new auction and bid data appears, the query must be rerun to get up-to-date results.

  1. Using the CREATE VIEW command, create a view winning_bids for the query that finds winning bids for auctions that have ended.

    CREATE VIEW winning_bids AS
    SELECT DISTINCT ON (a.id) b.*, a.item, a.seller
    FROM auctions AS a
     JOIN bids AS b
       ON a.id = b.auction_id
    WHERE b.bid_time < a.end_time
      AND mz_now() >= a.end_time
    ORDER BY a.id,
      b.amount DESC,
      b.bid_time,
      b.buyer;
    

    SELECT DISTINCT ON (...) is PostgreSQL-compatible syntax that keeps only the first row of each group of rows with the same values for the listed expressions; here, the highest eligible bid per auction.
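    As a sketch of the semantics, this simplified query keeps only the highest bid per auction from the bids subsource (without the end-time conditions used in the view above):

    SELECT DISTINCT ON (auction_id) *
    FROM bids
    ORDER BY auction_id, amount DESC;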

  2. SELECT from the view to execute the underlying query. For example:

    SELECT * FROM winning_bids
    ORDER BY bid_time DESC
    LIMIT 10;
    
    SELECT * FROM winning_bids
    WHERE item = 'Best Pizza in Town'
    ORDER BY bid_time DESC
    LIMIT 10;
    

    Since new data is continually being ingested, you must rerun the query to get the up-to-date results. Each time you query the view, you are re-running the underlying statement, which becomes less performant as the amount of data grows.

    In Materialize, to make the queries more performant even as data continues to grow, you can create indexes on views. Indexes provide always fresh view results in memory within a cluster by performing incremental updates as new data arrives. Queries can then read from the in-memory, already up-to-date results instead of re-running the underlying statement, making queries computationally free and more performant.

    In the next step, you will create an index on winning_bids.

Step 4. Create an index to provide up-to-date results.

Indexes in Materialize represent query results stored in memory within a cluster. You can create indexes on views to provide always-fresh, up-to-date view results in memory within a cluster. Queries can then read from the in-memory, already up-to-date results instead of re-running the underlying statement.

To provide up-to-date results, indexes perform incremental updates as inputs change instead of recalculating the results from scratch. Indexes can also help optimize operations like point lookups and joins.

  1. Use the CREATE INDEX command to create the following index on the winning_bids view.

    CREATE INDEX wins_by_item ON winning_bids (item);
    

    During the index creation, the underlying winning_bids query is executed, and the view results are stored in memory within the cluster. As new data arrives, the index incrementally updates the view results in memory. Because incremental work is performed on writes, reads from indexes return up-to-date results and are computationally free.

    This index can also help optimize operations like point lookups and delta joins on the index column(s) as well as support ad-hoc queries.
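    To check whether a query is served from the index, you can inspect its plan with EXPLAIN; if the index is used, the plan should reference wins_by_item (the exact output varies by Materialize version):

    EXPLAIN SELECT * FROM winning_bids WHERE item = 'Best Pizza in Town';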

  2. Rerun the previous queries on winning_bids.

    SELECT * FROM winning_bids
    ORDER BY bid_time DESC
    LIMIT 10;
    
    SELECT * FROM winning_bids
    WHERE item = 'Best Pizza in Town'
    ORDER BY bid_time DESC
    LIMIT 10;
    

    The queries should be faster since they use the in-memory, already up-to-date results computed by the index.

Step 5. Create views and a table to find flippers in real time.

For this quickstart, auction flipping is defined as buying an item in one auction and reselling it at a higher price within an 8-day period. This step finds auction flippers in real time, based on auction flipping activity data and known flippers data. Specifically, this step creates:

  • A view to find auction flipping activities. Results are updated as new data comes in (at 1-second intervals) from the data generator.

  • A table that maintains known auction flippers. You will manually enter new data to this table.

  • A view to immediately see auction flippers based on both the flipping activities view and the known auction flippers table.

  1. Create a view to detect auction flipping activities.

    CREATE VIEW flip_activities AS
    SELECT w2.seller AS flipper_id,
           w2.item AS item,
           w2.amount AS sold_amount,
           w1.amount AS purchased_amount,
           w2.amount - w1.amount AS diff_amount,
           datediff('days', w2.bid_time, w1.bid_time) AS timeframe_days
    FROM winning_bids AS w1
      JOIN winning_bids AS w2
        ON w1.buyer = w2.seller    -- Buyer and seller are the same
           AND w1.item = w2.item   -- Item is the same
    WHERE w2.amount > w1.amount    -- But sold at a higher price
      AND datediff('days', w2.bid_time, w1.bid_time) < 8;
    

    The flip_activities view can use the index created on winning_bids view to provide up-to-date data.

    To view a sample row in flip_activities, run the following SELECT command:

    SELECT * FROM flip_activities LIMIT 10;
    
  2. Use CREATE TABLE to create a known_flippers table that you can manually populate with known flippers. That is, assume that, separately from your auction activity data, you receive independent data identifying certain users as flippers.

    CREATE TABLE known_flippers (flipper_id bigint);
    
  3. Create a view flippers that flags a user as a flipper if the user has at least 2 flipping activities or is listed in the known_flippers table.

     CREATE VIEW flippers AS
     SELECT flipper_id
     FROM (
         SELECT flipper_id
         FROM flip_activities
         GROUP BY flipper_id
         HAVING count(*) >= 2
    
         UNION ALL
    
         SELECT flipper_id
         FROM known_flippers
     );
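    Before subscribing to changes in the next step, you can spot-check the view with an ordinary query:

    SELECT * FROM flippers LIMIT 10;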
    
NOTE: Both the flip_activities and flippers views can use the index created on winning_bids view to provide up-to-date data. Depending upon your query patterns and usage, an existing index may be sufficient, such as in this quickstart. In other use cases, creating an index only on the view(s) from which you will serve results may be preferred.

Step 6. Subscribe to see results change.

SUBSCRIBE to flippers to see new flippers appear as new data arrives (either from the known_flippers table or the flip_activities view).

  1. Use the SUBSCRIBE command to see new flippers as they arrive (either from the known_flippers table or the flip_activities view). SUBSCRIBE returns data from a source, table, view, or materialized view as changes occur; in this case, from the view flippers.

    SUBSCRIBE TO (
       SELECT *
       FROM flippers
    ) WITH (snapshot = false)
    ;
    

    The optional WITH (snapshot = false) option indicates that the command displays only the new flippers that come in after the start of the SUBSCRIBE operation, and not the flippers in the view at the start of the operation.
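    SUBSCRIBE adds the mz_timestamp and mz_diff columns alongside the view's columns; an insert appears with an mz_diff of 1. A new flipper row should resemble the following (values illustrative):

     mz_timestamp  | mz_diff | flipper_id
    ---------------+---------+-----------
     1721931238000 | 1       | 450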

    1. In the Materialize Console quickstart page, enter an id (for example, 450) into the text input field to insert a new user into the known_flippers table. You can specify any number for the flipper id.

      The flipper should immediately appear in the SUBSCRIBE results.

      You should also see flippers who are flagged by their flip activities. Because of the randomness of the generated auction data, it may take some time for user activity matching the definition of a flipper to appear, even though auction data is constantly being ingested. Once new matching data comes in, you will see it immediately in the SUBSCRIBE results. While waiting, you can enter additional flippers into the known_flippers table.

    2. To cancel out of the SUBSCRIBE, click the Stop streaming button.

    If running Materialize in a Docker container, run the following command in your preferred SQL client:

    COPY (SUBSCRIBE TO (
         SELECT *
         FROM flippers
    ) WITH (snapshot = false)) TO STDOUT;
    

    The optional WITH (snapshot = false) option indicates that the command displays only the new flippers that come in after the start of the SUBSCRIBE operation, and not the flippers in the view at the start of the operation.

    1. Open another SQL client and connect to your Materialize instance.

    2. Select your schema.

      -- Replace <schema> with the name for your schema
      SET SCHEMA <schema>;
      
    3. Manually insert a row into the known_flippers table. You can specify any number for the flipper id.

      INSERT INTO known_flippers values (450);
      

      In the other client, the flipper should immediately appear in the SUBSCRIBE results.

      You should also see flippers who are flagged by their flip activities. Because of the randomness of the generated auction data, it may take some time for user activity matching the definition of a flipper to appear, even though auction data is constantly being ingested. Once new matching data comes in, you will see it immediately in the SUBSCRIBE results. While waiting, you can enter additional flippers into the known_flippers table.

    4. Cancel out of the SUBSCRIBE.

Step 7. Create views to verify that Materialize returns consistent data.

To verify that Materialize serves consistent results, even as new data comes in, this step creates the following views for completed auctions:

  • A view to keep track of each seller’s credits.

  • A view to keep track of each buyer’s debits.

  • A view that sums all sellers’ credits and all buyers’ debits, and calculates the difference, which should always be 0.

  1. Create a view to track credited amounts for sellers of completed auctions.

    CREATE VIEW seller_credits AS
    SELECT seller, SUM(amount) as credits
    FROM winning_bids
    GROUP BY seller;
    
  2. Create a view to track debited amounts for the winning bidders of completed auctions.

    CREATE VIEW buyer_debits AS
    SELECT buyer, SUM(amount) as debits
    FROM winning_bids
    GROUP BY buyer;
    
  3. To verify that the total credit and total debit amounts equal for completed auctions (i.e., to verify that the results are correct and consistent even as new data comes in), create a funds_movement view that calculates the total credits across sellers, total debits across buyers, and the difference between the two.

    CREATE VIEW funds_movement AS
    SELECT SUM(credits) AS total_credits,
           SUM(debits) AS total_debits,
           SUM(credits) - SUM(debits) AS total_difference
    FROM (
      SELECT SUM(credits) AS credits, 0 AS debits
      FROM seller_credits
    
      UNION
    
      SELECT 0 AS credits, SUM(debits) AS debits
      FROM buyer_debits
    );
    

    To see that the sums remain equal even as new data comes in, you can SUBSCRIBE to this query:

    SUBSCRIBE TO (
       SELECT *
       FROM funds_movement
    );
    

    Toggle Show diffs to see changes to funds_movement.

    • As new data comes in and auctions complete, the total_credits and total_debits values should change but the total_difference should remain 0.
    • To cancel out of the SUBSCRIBE, click the Stop streaming button.

    If running Materialize in a Docker container, run the following command in your preferred SQL client:

    COPY (SUBSCRIBE TO (
       SELECT *
       FROM funds_movement
    )) TO STDOUT;
    
    • As new data comes in and auctions complete, the total_credits and total_debits values should change but the total_difference should remain 0.
    • Cancel out of the SUBSCRIBE.

Step 8. Clean up.

To clean up the quickstart environment:

  1. Use the DROP SOURCE ... CASCADE command to drop the auction_house source and its dependent objects, including the views and indexes created on the auction_house subsources.

    DROP SOURCE auction_house CASCADE;
    
  2. Use the DROP TABLE command to drop the separate known_flippers table.

    DROP TABLE known_flippers;
    
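    To confirm the cleanup, list the remaining sources and tables; the dropped objects should no longer appear:

    SHOW SOURCES;
    SHOW TABLES;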

Summary

In Materialize, indexes represent query results stored in memory within a cluster. When you create an index on a view, the index incrementally updates the view results (instead of recalculating the results from scratch) as Materialize ingests new data. These up-to-date results are then immediately available and computationally free for reads within the cluster.

General guidelines

This quickstart created an index on a view to maintain up-to-date results in memory in the cluster. In Materialize, both materialized views and indexes on views incrementally update view results. Materialized views persist the query results in durable storage and are available across clusters, while indexes maintain the view results in memory within a single cluster.

Some general guidelines for usage patterns include:

| Usage Pattern                                                 | General Guideline  |
| ------------------------------------------------------------- | ------------------ |
| View results are accessed from a single cluster only           | View with an index |
| View results are accessed across clusters                      | Materialized view  |
| Final consumer of the view is a sink or a SUBSCRIBE operation  | Materialized view  |
| View maintenance and query serving are scaled independently    | Materialized view  |

The quickstart used an index since:

  • The examples did not need to store the results in durable storage.

  • All activities were limited to the single quickstart cluster.

  • Although used, SUBSCRIBE operations were for illustrative/validation purposes and were not the final consumer of the views.

Considerations

Before creating an index (which represents query results stored in memory), consider its memory usage as well as its compute cost implications. For best practices when creating indexes, see Index Best Practices.

Additional information

Next steps

To get started ingesting your own data from an external system like Kafka, MySQL or PostgreSQL, check the documentation for sources, and navigate to Data > Sources > New source in the Materialize Console to create your first source.

For help getting started with your data or other questions about Materialize, you can schedule a free guided trial.
