Streaming systems consume inputs and produce outputs asyncronously: the output of a system at any moment may not reflect all of the inputs seen so far. These systems provide various guarantees about how their outputs relate to their input. Among the weaker (but not unpopular) guarantees is eventual consistency. Informally, eventual consistency means if the […]
I have some thoughts on the use of Rust for data-intensive computations. Specifically, I’ve found several of Rust’s key idioms line up very well with the performance and correctness needs of data-intensive computing. If you want a tl;dr for the post: I’ve built multiple high-performance, distributed data processing platforms in Rust, and I never learned […]
How do you build a streaming database from scratch?
This is an edited transcript and video of a talk that I gave at Carnegie Mellon’s Database Group Seminar on June 1st, 2020, hosted by Andy Pavlo. You can watch it, or read along! Introduction and Background First things first, you can download Materialize right now! For our agenda today, I’m first going to talk […]
We recently announced Materialize, a real-time streaming SQL database that powers production applications. The latest release of Materialize, version 0.3, is now available. Materialize lets you ask questions in real-time of your streaming data and get the latest answers back in milliseconds — offering the power and flexibility of a SQL database, but scaling to […]
Self-compacting dataflows Those of you familiar with dataflow processing are likely also familiar with the constant attendant anxiety: won’t my constant stream of input data accumulate and eventually overwhelm my system? That’s a great worry! In many cases tools like differential dataflow work hard to maintain a compact representation for your data. At the same […]
In-order reliable message delivery is not enough. Showing views over streams of data requires thinking through additional consistency semantics to deliver correct results.
“Upserts” are a common way to express streams of changing data, especially in relational settings with primary keys. However, they aren’t the best format for working with incremental computation. We’re about to learn why that is, how we deal with this in differential dataflow and Materialize, and what doors this opens up! This post is […]
Materialize lets you ask questions about your streaming data and get fresh answers with incredibly low latency. Since announcing Materialize’s v0.1, we’ve briefly covered its functionality and how one might use it at a high level. Today we’re going to get a little more granular and talk about Materialize’s internals and architecture. The Big Picture […]
Trying out Materialize This post will also be available at my personal blog. We all at Materialize are working from home, and while this is all a bit weird and different, it gives me some time to write a bit more and try and show off some of what we have been up to. I […]