Announcing the Materialize Integration with Cube

Connect headless BI tool Cube.js to the read-side of Materialize to get Rest/GraphQL API's, Authentication, metrics modelling, and more out of the box.

Andy Hattemer
Andy HattemerGTM
Igor Lukanin
Igor LukaninHead of DevRel, Cube

Materialize is a different kind of database: on the surface it presents as Postgres, but internally it maintains the results of a query as a materialized view that stays up-to-date as the data changes. While a traditional database does the evaluation when a SELECT statement is issued, Materialize asks for queries upfront and computes the results on the fly as new data arrives.

As we’ll discuss more below, this makes Materialize great for scenarios where you need the complex join and transformation capabilities of a warehouse, but you want to serve the results to applications.

What is Cube?

Cube is a headless business intelligence platform that makes data accessible and consistent across every application. Cube provides data modeling, access control, caching, and a variety of APIs (SQL, GraphQL, and REST) on top of any data warehouse, database, or query engine, including Materialize.

If you’re familiar with the concepts of “headless CMS” or “headless ecommerce”, you already have the right mental model for Cube. It’s essentially a “headless BI”—say, open-source Looker with less UI and more API—that delivers consistent data to your BI tools, data apps, and notebooks.

The Integration

Integrating Cube and Materialize was a perfect example of the benefits of building in public and using well-known protocols.

Cube is open source and has an existing PostgreSQL driver. Materialize contributed a lightly modified version of the PostgreSQL driver to help Cube read from Materialize.

Why are Materialize and Cube useful together?

Data warehouses were built with the end goal of showing people data. They work best for data exploration, analysis and reporting. But just looking at the data is no longer sufficient: now we’re connecting systems (APIs, applications, tools and automation) to the data in the warehouse.

When people point systems at data warehouses, they inevitably face performance problems:

  • Not built for lookup queries
  • Not built for low latency and high concurrency
  • Reactivity is slow and time-to-insight is high

Also, there’s always a gap between how the data is stored and how the data is analyzed. Systems that consume the data (e.g., data apps or BI tools) and people working with them are most comfortable dealing with business-level metrics definitions (e.g., “revenue growth MoM by countries”) rather than calculations over the raw data in the warehouse.

With Materialize, you get warehouse-style views of your business data at application-style availability (always up-to-date, low latency queries). Combine that with Cube’s semantic layer and a variety of APIs to get a “live read API for your entire business”.

Show me an example!

Here’s how you can make Materialize work with Cube for fun and profit. Let’s build an application that consumes a live stream of Hacker News stories but only shows the ones that ranked in top 3 so you can still have plenty of free time.

Running Materialize and Cube. Both tools have managed offerings (cloud-native Materialize and Cube Cloud) but, for the sake of simplicity, we can run them in Docker. The following docker-compose.yml file will run Materialize, seed it with data, and then run Cube:

version: '2.2'

services:
  materialize:
    image: materialize/materialized:v0.26.1
    ports:
      - 6875:6875

  seed:
    image: jbergknoff/postgresql-client
    volumes:
      - .:/seed
    entrypoint: ['sh', 'seed/seed.sh']
    depends_on:
      - materialize

  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_DEV_MODE=true
      - CUBEJS_DB_TYPE=materialize
      - CUBEJS_DB_HOST=materialize
      - CUBEJS_DB_PORT=6875
      - CUBEJS_DB_NAME=materialize
      - CUBEJS_DB_USER=materialize
      - CUBEJS_API_SECRET=SECRET
    volumes:
      - .:/cube/conf
    depends_on:
      - seed

In the seed.sh file, we’ll use a vanilla PostgreSQL client to connect to the Materialize instance and run three SQL queries: the first one will connect a live PubNub stream with Hacker News stories data, the other one will transform the stream into a normalized table view with columns, and the last one will create a materialized view of the stories:

#!/bin/bash

cat > seed.sql << EOL

CREATE SOURCE hn_raw
FROM PUBNUB
SUBSCRIBE KEY 'sub-c-c00db4fc-a1e7-11e6-8bfd-0619f8945a4f'
CHANNEL 'hacker-news';

CREATE VIEW hn AS
SELECT
 (item::jsonb)->>'link' AS link,
 (item::jsonb)->>'comments' AS comments,
 (item::jsonb)->>'title' AS title,
 ((item::jsonb)->>'rank')::int AS rank
FROM (
 SELECT jsonb_array_elements(text::jsonb) AS item
 FROM hn_raw
);

CREATE MATERIALIZED VIEW hn_top AS
SELECT link, comments, title, MIN(rank) AS rank
FROM hn
GROUP BY 1, 2, 3;

EOL

psql -U materialize -h materialize -p 6875 materialize -f ./seed.sql

The only missing piece is Cube’s data model file, placed under schema/HN.js. Cube will read from the live-updated materialized view and expose metrics via its APIs.

cube(`HN`, {
  sql: `SELECT * FROM public.hn_top`,

  refreshKey: {
    every: '1 second'
  },
  measures: {
    count: {
      type: `count`
    },

    countTop3: {
      type: `count`,
      filters: [
        {
          sql: `${rank} <= 3`
        }
      ]
    },

    bestRank: {
      sql: `rank`,
      type: `min`
    }
  },
  dimensions: {
    link: {
      sql: `link`,
      type: `string`
    },

    comments: {
      sql: `comments`,
      type: `string`
    },

    title: {
      sql: `title`,
      type: `string`
    },

    rank: {
      sql: `rank`,
      type: `number`
    }
  },

  segments: {
    show: {
      sql: `${title} LIKE 'Show HN:%'`
    }
  }
});

You copy and paste these files or pick them from GitHub. Now, we can run the whole stack with docker compose up.

Exploring the data. Now, if you navigate to Cube’s Playground at localhost:4000, you’ll instantly be able to run queries against the Materialize instance with connected stream by picking measures, dimensions, segments, and filters in the UI.

Here’s how you can check how many stories we have had in the data stream (30) and how many of them have even been ranked in top 3 (no surprise, just three):

However, if you wait a few minutes and refresh the query results, you’ll see that data changes because new stories are posted to HN and existing stories change their ranks.

Then, here’s how you can combine segments and filters to pick only “Show HN” stories ranked in top 3:

Connecting to APIs. Cube exposes SQL, GraphQL, and REST APIs, each with its own query syntax. For instance, the query directly above is expressed as the following JSON query accepted by the REST API:

{
  "dimensions": ["HN.title", "HN.rank"],
  "order": {
    "HN.rank": "asc"
  },
  "filters": [
    {
      "member": "HN.rank",
      "operator": "lte",
      "values": ["10"]
    }
  ],
  "segments": ["HN.show"]
}

Let’s run this query against the REST API and extract the result set with the amazing jq utility:

curl localhost:4000/cubejs-api/v1/load -G -s --data-urlencode "query=$(cat query.json)" | jq '.data'

jq querying cube rest api

Also, if you are a GraphQL aficionado, you can use GraphQL API and an alternative query syntax:

graphiql graphql interface in cube

And, there’s one more thing—Cube’s real-time data fetch based on WebSockets is a great match to the real-time experience of Materialize. You can use that to live-feed your data from the streams through Materialize and Cube all the way down to your end-user front-end application.

Future improvements:

The current implementation of Cube’s Materialize driver makes Cube periodically poll Materialize for updates. A more effective and native for Materialize way to get data out of Materialize is subscribing to changes using the SUBSCRIBE feature. This would require more in-depth changes which we are exploring.

Try Materialize and Cube for yourself

Feel free to check out Materialize and Cube by running the example above on your machine. The full source code is available on GitHub. Also, please don’t hesitate to join Cube’s Slack community or Materialize’s Slack community to share your feedback.

To give both tools a try in a professional setting, sign up for Early Access to cloud-native Materialize and Cube Cloud. The future is real-time, let’s build it together with Materialize and Cube.

More Articles

Technical Article

Let’s talk about Data Apps

What is a Data Application? How do they help our customers? What new challenges do we face when building Data Apps? Here's our perspective.

Joaquin Colacci

Jun 9, 2022

Product Update

Materialize's unbundled cloud architecture

The `materialized` binary is stable and performant, the time has come to break it apart into separate services to enable the next phase: unbounded scale in a cloud architecture.

Frank McSherry

May 6, 2022

Technical Article

Creating a Real-Time Feature Store with Materialize

Let's use Materialize to deliver a feature store that continuously updates dimensions as new data becomes available without compromising on correctness or speed.

Seth Wiesman

Apr 25, 2022

Join the Materialize Community

Join hundreds of other Materialize users and connect directly with our engineers.

Join the Community

© 2022 Materialize, Inc. Terms of Service