What's new in Materialize? Vol. 2

March 1, 2022

Marta Paes

Sr. Product Manager

Nicolle Eagan

Senior Product Manager

So much is happening in parallel as we embark on a new, exciting phase of product development at Materialize. Eager to see what’s in store? Scroll all the way down to What’s next? 👀.

In the meantime, and to keep you up to speed with what’s happening right now, we're back with a second round of updates! We’ll cover Materialize Core v0.13.0 to v0.21.0, some work coming up to bring the dbt-materialize adapter to the next level and our partnership with Tailscale in Materialize Cloud. For further details on a specific version of Materialize (like breaking changes and bug fixes), check out the release notes!

Materialize Core

Sources and Sinks

Assuming roles in S3 and Kinesis Data Streams sources

To improve the integration with AWS-based sources, Materialize can now assume roles and profiles with the right permissions from credential files (v0.20.0). For an overview of the credentials provider chain, check out the documentation for the S3 and Kinesis Data Streams sources.

PostgreSQL source improvements

With the ultimate goal of moving the PostgreSQL source connector out of beta, we carry on working to harden it for production. Some recent improvements that get us closer to that goal are:

Non-materialized sources (v0.18.0) : materializing the source is no longer required, which lifts the previous limitation of having to provision enough memory in Materialize to hold all synced tables. With these changes, you can create a source that captures changes in your upstream PostgreSQL database, define any given number of intermediate (non-materialized) views to shape and transform the raw data, and then materialize only what you want to keep around in memory.
Faster snapshot loading (v0.20.0) : the step responsible for the initial sync of the tables in the publication was refactored to speed things up (#10299). If you’ve previously run into performance bottlenecks during the snapshotting step, we’d love to hear how this change improved your loading times!

For a refresher on how the source connector works, check out the updated documentation and the Change Data Capture (CDC) guide.

Confluent Schema Registry SSL options

Prior to v0.20.0, the SSL parameters for the Confluent Schema Registry (CSR) defaulted to whatever parameters were provided for the Kafka broker. There are now dedicated CSR parameters that must be provided explicitly (see Confluent Schema Registry options), allowing you to use Materialize in environments where the broker and schema registry use different SSL options.

🤟 Thanks to Alvin Khaled (@aakside) for kickstarting the conversation leading to this change!

SQL

SELECT statements in TAIL

As TAIL becomes more central to application use cases, we’ve been focusing on making its behavior more predictable as well as quality-of-life improvements. From v0.20.0, you can directly embed an arbitrary SELECT statement in the TAIL command and skip creating (and handling) intermediary objects. This allows you, for example, to dynamically apply filters server-side and spare the client some work:

1 TAIL (SELECT * FROM user_actions WHERE user_id = $1) 
 sql

First time hearing about TAIL? We’ve recently published a demo that walks you through combining its power with GraphQL subscriptions for infrastructure performance monitoring. Check it out!

jsonb subscripting

From v0.16.0, you can use array-style subscripting to extract array elements from jsonb columns as an alternative to the standard operators (like -> and ->>). This notation was introduced in PostgreSQL 14 [1] and makes it a little saner to manipulate deeply nested data from JSON sources:

SELECT ('{"1": 2, "a": ["b", "c"]}'::jsonb)['a'][1]; 
jsonb 
------- 
"c" 
 sql

If you plan to use subscripting, it’s worth noting that the output type of the subscript operation is always jsonb (or, equivalent to using the -> operator), which has some quirks around string comparison and null references.

Ecosystem

Hack Day 🎉

We recently ran our first Materialize+dbt+Redpanda Hack Day! If you missed it, you can still play around with the sample project and get a taste for what building a streaming analytics pipeline with this stack looks like.

dbt

When the first version of the dbt-materialize adapter was released, Materialize was still in its early days. We’re now picking up speed in the integration to make the experience smoother and more true to dbt best practices. Starting with the materialize-dbt-utils package, we’ve expanded the set of macros and integration tests supported. We’ve also started exploring how to evolve the adapter (#10600), so you can expect some updates soon!

🤟 Shoutout to Amy Chen (@amychen1776) and Jeremy Cohen (@jtcohen6) from dbt Labs for their feedback and support along the way.

DBeaver

As we continue working on our coverage of pg_catalog tables and psql macros, we’re unlocking integrations with more tools in the ecosystem. DBeaver is a popular open-source SQL CLI and has been a common ask in the community, so we’re glad to share that you can now use it with Materialize v0.18.0+ (through the PostgreSQL driver).

You can also connect DBeaver to a Materialize Cloud instance using the provided SSL certificates:

Materialize Cloud

Deployment

Secure networking with Tailscale

We’ve recently announced a partnership with Tailscale to bring secure networking to Materialize Cloud. All you need to do is generate and provide a one-off Tailscale auth key to your Materialize Cloud instance, and we’ll take care of installing and configuring the service in the background so that the instance can join your private network. This allows you to keep all the moving parts of your streaming pipelines nicely bundled and secure, as all traffic is encrypted and routed over trusted connections. For a deeper dive into the integration, check out Introducing: Tailscale+Materialize and the Cloud documentation.

What’s next? 👀

We have two big (and we mean BIG) development threads underway, as we enter the phase of making Materialize a true cloud-native SQL platform: seamless horizontal scalability [2] and high-availability guarantees. You can read through the initial architecture and user experience design documents to get an idea of the direction we’re taking, but we’ll be publishing an updated roadmap blogpost soon!

While we get ready, a reminder: Materialize Cloud is in open beta, so you can sign up and have a look around! If you take any of the new features for a spin, or if you’re just getting started with Materialize, we’d love to hear from you in our Slack community!

[1] Crunchy Data: Better JSON in Postgres with PostgreSQL 14 [2] Yup, decoupled storage and compute in Materialize is coming (and sooner than you might think)!

If @MaterializeInc manages to decouple storage from compute it’s going to be very, very useful. More like 2023, I think

p

— Stephan Seidt (@seidtgeist) January 22, 2022

Marta Paes

Sr. Product Manager, Materialize

Marta was previously a contributor to Apache Flink. Before finding her mojo in data streaming and community, she worked as a DWH Engineer for 4+ years at Zalando and Accenture.