May 6, 2022

Materialize's unbundled cloud architecture

The `materialized` binary is stable and performant, so the time has come to break it apart into separate services and enable the next phase: unbounded scale in a cloud architecture.

Frank McSherry, Chief Scientist

Materialize: Phase 2

It's been a while since we last told you what we at Materialize are up to. You might have thought "oh, probably more of the same; fast database stuff". As it turns out, you aren't wrong, but we still think you'll be surprised.

For the past three years we’ve focused on building Materialize as a single binary. That binary interactively serves and incrementally maintains SQL queries really well; so well, in fact, that user demand is pushing us beyond the limits of our current architecture. That's why our entire team is working to ship our biggest change to date: unbundling the binary into a cloud-native platform built from infinitely scalable primitives.

Starting in September, Materialize is going horizontal.

Unbounded Scale

When you invest in a platform, you don't want to discover scaling barriers down the road.

  • You want it to support unbounded numbers of users and sessions.
  • You want it to support unbounded numbers of data sources, with unbounded volumes and rates.
  • You want it to support unbounded numbers of views over these data.

So we figured we'd do that.

We're doing the same thing other smart people have done: "separating storage and compute". When you decouple the storage of data from the compute acting on that data, each part can scale independently of the other. New data sources can spill into cloud storage without disrupting your existing installations. New use cases can invoke new, isolated compute resources without impacting existing workloads. If you ever need more of something, you can get it without interrupting anyone else.

What's new here is the setting: smart people have primarily done this for batch analytics, while we're doing it for interactively served, incrementally maintained SQL.

Architecture

To remove the limits mentioned above, we've restructured Materialize's internal architecture. There is a lot to say about this, but let's start with just a sketch.

Materialize is based around a data model of explicitly timestamped changelogs of collections.

  • All inputs are first turned into these changelogs, and are durably recorded.
  • All views translate these changelogs into exactly corresponding output changelogs.
  • All queries are performed against such changelogs at specific times.
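To make the data model concrete, here is a minimal sketch, assuming a changelog is a list of `(record, time, diff)` updates with integer times and `diff` of +1 (insert) or -1 (delete). That encoding, the helper names `accumulate`, `filter_view`, and `count_view`, and the example data are all hypothetical illustrations, not Materialize's internal API. It shows a query at a specific time, two views that each produce an output changelog, and the property that makes the answers trustworthy: querying a view's changelog at time `t` matches applying the view to the input as of `t`.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Assumed (hypothetical) encoding: (record, time, diff); diff = +1 inserts, -1 deletes.
Update = Tuple[Hashable, int, int]


def accumulate(changelog: Iterable[Update], as_of: int) -> Dict[Hashable, int]:
    """Query the collection at time `as_of`: sum diffs per record over times <= as_of."""
    counts: Dict[Hashable, int] = defaultdict(int)
    for record, time, diff in changelog:
        if time <= as_of:
            counts[record] += diff
    return {record: n for record, n in counts.items() if n != 0}


def filter_view(changelog: List[Update], keep: Callable[[Hashable], bool]) -> List[Update]:
    """A filter view: each input update maps to at most one output update, same time."""
    return [(record, time, diff) for record, time, diff in changelog if keep(record)]


def count_view(changelog: List[Update]) -> List[Update]:
    """A COUNT(*)-style view: whenever the input's size changes, retract the old
    count and insert the new one, so the output is itself a timestamped changelog."""
    output: List[Update] = []
    total, emitted = 0, None
    for time in sorted({t for _, t, _ in changelog}):
        delta = sum(diff for _, t, diff in changelog if t == time)
        if delta == 0:
            continue  # the count did not change at this time
        if emitted is not None:
            output.append((emitted, time, -1))  # retract the previous count
        total += delta
        output.append((total, time, +1))  # insert the new count
        emitted = total
    return output


# Example input changelog (hypothetical data).
inventory = [("apple", 1, +1), ("pear", 2, +1), ("apple", 3, -1), ("plum", 3, +1)]

# Check the correspondence at every time in the example.
for t in (1, 2, 3):
    snapshot = accumulate(inventory, t)
    assert accumulate(count_view(inventory), t) == {sum(snapshot.values()): 1}
    assert accumulate(filter_view(inventory, lambda r: r != "pear"), t) == {
        r: n for r, n in snapshot.items() if r != "pear"
    }
```

Note that the output of `count_view` is never recomputed from scratch: each change to the input becomes a small retraction and insertion at the same timestamp, which is the incremental-maintenance idea in miniature.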

This data model gives us confidence that we are producing correct answers to specific questions.

Our data model also allows us to unbundle Materialize's architecture: ingestion, computation, and querying can each be performed and scaled independently. The explicit, durable timestamps ensure we provide consistent answers even across independent components.
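One way to see why explicit timestamps allow independent components to agree is the following minimal sketch. It assumes each durably recorded input tracks a `complete_through` time (all updates at or before it are recorded), advanced by ingestion at its own pace; a query then reads every input at one shared time that all of them have completed. The `DurableLog` class, `complete_through`, `read_at`, and `choose_query_time` are hypothetical names for illustration, not Materialize's actual interfaces, and real systems track this progress in more elaborate ways.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Update = Tuple[str, int, int]  # (record, time, diff): an assumed, hypothetical encoding


@dataclass
class DurableLog:
    """Hypothetical stand-in for a durably recorded input changelog.

    `complete_through` is the latest time for which all updates are known to be
    recorded; ingestion advances it independently of any other component."""
    updates: List[Update] = field(default_factory=list)
    complete_through: int = 0

    def append(self, batch: List[Update], through: int) -> None:
        # Never move completeness backwards.
        if through < self.complete_through:
            raise ValueError("cannot regress complete_through")
        # Only accept updates for times not yet sealed, then seal through `through`.
        if any(not (self.complete_through < t <= through) for _, t, _ in batch):
            raise ValueError("update time outside the range being sealed")
        self.updates.extend(batch)
        self.complete_through = through


def read_at(log: DurableLog, as_of: int) -> Dict[str, int]:
    """Read the collection at `as_of`; refuse times that are not yet complete."""
    if as_of > log.complete_through:
        raise ValueError(f"log is only complete through {log.complete_through}")
    counts: Dict[str, int] = {}
    for record, time, diff in log.updates:
        if time <= as_of:
            counts[record] = counts.get(record, 0) + diff
    return {record: n for record, n in counts.items() if n != 0}


def choose_query_time(inputs: List[DurableLog]) -> int:
    """Latest time at which every input is complete, so all reads share one time."""
    return min(log.complete_through for log in inputs)


# Two inputs ingested independently, at different paces (hypothetical data).
orders, payments = DurableLog(), DurableLog()
orders.append([("o1", 1, +1), ("o2", 4, +1)], through=5)
payments.append([("o1", 2, +1)], through=3)

t = choose_query_time([orders, payments])  # payments lags behind orders
print(t, read_at(orders, t), read_at(payments, t))
```

Reading `orders` at time 5 and `payments` at time 3 would mix two different moments; picking one time that every input has completed is what keeps the answer consistent, no matter how far ahead any single component runs.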

There are many other great features that come online when you lean into this data model, and we are absolutely going to walk you through all of them.

Timeline

You may have a pile of technical questions, which is totally fair; we'll have a pile of technical details for you soon. The code is public, so you can follow along (and perhaps you have been, over the past months we've spent working on it).

We're not deploying or supporting the new horizontal architecture yet, but it should be available soon. The intended experience is essentially identical to the current Materialize, except that your sources and views are backed by an elastic set of resources. There is one new fundamental concept, the CLUSTER, which co-locates in-memory indexed data assets; separate clusters are isolated from one another in both performance and faults. Otherwise, you still just use SQL and get your answers back quickly.

I'm more excited than I can clearly communicate.

© 2022 Materialize, Inc.