Materialize Documentation
Join the Community github/materialize

Get Started with Cloud

Materialize Cloud is in beta and is not subject to Materialize's backwards compatibility guarantee.

This guide walks you through getting started with Materialize Cloud, from setting up an account to creating your first materialized view on top of streaming data. We’ll cover:

Sign up

  1. Sign up for Materialize Cloud at https://cloud.materialize.com.

  2. Once your account has been created, log in.

  3. If you’ve been invited to an existing workspace, jump directly to Deploy and Connect. Otherwise, a dialog will ask you to create a workspace.

    Enter a name and click Next.

Deploy and connect

  1. In the deployments page, click Create deployment in the upper right corner. Enter a unique Name (or use the default) and choose Extra small (XS) as the deployment size.

    Then, click Create.

    Note: The size of new deployments is set to extra small (XS) by default, which is enough to run this walkthrough. For more information on deployment sizes, check Account Limits.

  2. Once the status message reads HEALTHY, the deployment is ready for connections!

    Before you can connect, though, you need to install some TLS certificates on your local machine.

  3. Click the deployment card, then download and unzip the certificate package.

  4. To connect to the deployment, you can use any Materialize-compatible CLI, like psql or mzcli. If you have psql installed on your machine, proceed to the next step.

    Otherwise, follow the installation instructions.

  5. Open a terminal window on your machine, copy the psql command from the connection dialog and run it from the directory that contains the certificates.

    On connection success, you should see an open prompt waiting for your next commands:

    psql (13.2, server 9.5.0)
    SSL connection (protocol: TLSv1.3, cipher:
    TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.
    
    materialize=>
    

Explore a streaming source

Materialize allows you to work with streaming data from multiple external sources using nothing but standard SQL. You write arbitrarily complex queries; Materialize takes care of maintaining the results automatically up to date with very low latency.

We’ll start with some sample real-time data from a PubNub stream receiving the latest market orders for a given marketplace.

  1. Let’s create a PubNub source that connects to the market orders channel with a subscribe key:

    CREATE SOURCE market_orders_raw
    FROM PUBNUB
    SUBSCRIBE KEY 'sub-c-4377ab04-f100-11e3-bffd-02ee2ddab7fe'
    CHANNEL 'pubnub-market-orders';
    

    The CREATE SOURCE statement is a definition of where to find and how to connect to our data source — Materialize won’t start ingesting data just yet.

    To list the columns created:

    SHOW COLUMNS FROM market_orders_raw;
    
  2. The PubNub source produces data as a single text column containing JSON. To extract the JSON fields for each market order, you can use the built-in jsonb operators:

    CREATE VIEW market_orders AS
    SELECT
        ((text::jsonb)->>'bid_price')::float AS bid_price,
        (text::jsonb)->>'order_quantity' AS order_quantity,
        (text::jsonb)->>'symbol' AS symbol,
        (text::jsonb)->>'trade_type' AS trade_type,
        to_timestamp(((text::jsonb)->'timestamp')::bigint) AS ts
    FROM market_orders_raw;
    

    One thing to note here is that we created a non-materialized view, which doesn’t store the results of the query but simply provides an alias for the embedded SELECT statement.

  3. We can now use this view as a base to create a materialized view that computes the average bid price:

    CREATE MATERIALIZED VIEW avg_bid AS
    SELECT symbol,
           AVG(bid_price) AS avg
    FROM market_orders
    GROUP BY symbol;
    

    The avg_bid view is incrementally updated as new data streams in, so you get fresh and correct results with millisecond latency. Behind the scenes, Materialize is indexing the results of the embedded query in memory (i.e. materializing the view).

  4. Let’s check the results:

    SELECT * FROM avg_bid;
    
      symbol    |        avg
    ------------+--------------------
    Apple       | 199.3392717416626
    Google      | 299.40371152970334
    Elerium     | 155.04668809209852
    Bespin Gas  | 202.0260593073953
    Linen Cloth | 254.34273792647863
    

    If you re-run the SELECT statement at different points in time, you can see the updated results based on the latest data.

  5. To see the sequence of updates affecting the results over time, you can use TAIL:

    COPY (TAIL avg_bid) TO stdout;
    

    To cancel out of the stream, press CTRL+C.

Joins

Materialize efficiently supports all types of SQL joins under all the conditions you would expect from a traditional relational database. Let’s enrich the PubNub stream with some reference data as an example!

  1. Create and populate a table with static reference data:

    CREATE TABLE symbols (
        symbol text,
        ticker text
    );
    
    INSERT INTO symbols
    SELECT *
    FROM (VALUES ('Apple','AAPL'),
                 ('Google','GOOG'),
                 ('Elerium','ELER'),
                 ('Bespin Gas','BGAS'),
                 ('Linen Cloth','LCLO')
    );
    
    

    Note: We are using a table for convenience to avoid adding complexity to the guide. It’s unlikely that you’ll need to use tables in real-world scenarios.

  2. Now we can enrich our aggregated data with the ticker for each stock using a regular JOIN:

    CREATE MATERIALIZED VIEW cnt_ticker AS
    SELECT s.ticker AS ticker,
           COUNT(*) AS cnt
    FROM market_orders m
    JOIN symbols s ON m.symbol = s.symbol
    GROUP BY s.ticker;
    
  3. To see the results:

    SELECT * FROM cnt_ticker;
     ticker | cnt
    --------+-----
     AAPL   |  42
     BGAS   |  49
     ELER   |  68
     GOOG   |  51
     LCLO   |  70
    

    If you re-run the SELECT statement at different points in time, you can see the updated results based on the latest data.

Temporal filters

In Materialize, temporal filters allow you to define time-windows over otherwise unbounded streams of data. This can be useful for things like modeling business processes or limiting resource usage.

  1. If, instead of computing and maintaining the overall count of market orders, we want to get the moving count from the past minute, we’d use a temporal filter defined by the mz_logical_timestamp() function:

    CREATE MATERIALIZED VIEW cnt_sliding AS
    SELECT symbol,
           COUNT(*) AS cnt
    FROM market_orders m
    WHERE EXTRACT(EPOCH FROM (ts + INTERVAL '1 minute'))::bigint * 1000 > mz_logical_timestamp()
    GROUP BY symbol;
    

    The mz_logical_timestamp() function is used to keep track of the logical time that your query executes (similar to now() in other systems, as explained more in-depth in “now and mz_logical_timestamp functions”).

  2. To see the results:

    SELECT * FROM cnt_sliding;
    
       symbol    | cnt
    -------------+-----
     Apple       |  31
     Google      |  40
     Elerium     |  46
     Bespin Gas  |  35
     Linen Cloth |  45
    

    As it advances, only the records that satisfy the time constraint are used in the materialized view and contribute to the in-memory footprint.

Learn more

That’s it! You just got up and running with Materialize Cloud, creating your first materialized view and trying out some common queries enabled by SQL on streams. We encourage you to continue exploring the PubNub source using the supported SQL commands. You can read through the following resources for more comprehensive overviews:

Did this info help?
Yes No