We’re excited to announce the release of Materialize v0.9! This version of Materialize has been in development for nearly two months and is focused on hardening Materialize for production, in addition to quality-of-life improvements. Read on to learn about the new features, including exactly-once Kafka sinks, the ability to extract and use keys from Kafka messages, and improved decimal support.
Before we get into the details, we were glad to hear how excited you all were about Postgres sources! As a quick follow-up, this feature is now fully stable (i.e., no longer behind the experimental flag it carried in v0.8) and also available in Materialize Cloud. For a how-to demo on Postgres sources, check out this on-demand webinar, in which engineer Petros Angelatos walks you through getting up and running with Materialize for Change Data Capture (CDC). As data changes in Postgres, you can wire it directly into Materialize to keep materialized views updated in real time, which is useful for speeding up queries against an overloaded database or for building event-driven applications.

With Postgres sources, you can:

- Connect to an upstream database with simple username/password authentication or with TLS authentication
- Sync the initial state of the database and seamlessly switch to streaming
- Preserve transaction boundaries across tables
- Use most common column data types
- Try Materialize out by simply running the materialized binary and pointing it at your Postgres database, with no extra infrastructure needed
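For example, creating a Postgres source looks roughly like this (a minimal sketch: the connection string and the publication name mz_source are placeholders you’d replace with your own):

```sql
-- Ingest the tables published by an upstream Postgres publication.
-- The connection string and publication name are placeholders.
CREATE MATERIALIZED SOURCE mz_source
FROM POSTGRES
  CONNECTION 'host=localhost port=5432 user=postgres dbname=postgres'
  PUBLICATION 'mz_source';

-- Break the replicated tables out into individual views.
CREATE VIEWS FROM SOURCE mz_source;
```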
Exactly-Once Sinks
Materialize now supports exactly-once semantics for Kafka sinks, allowing you to pick up processing where you left off after a restart without sacrificing correctness or causing disruption to downstream consumers. This feature has been under development for 6 months and comes as a result of recurring conversations with our users.
How does it work in practice? When creating a sink, you can set the reuse_topic option to true:
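Here’s a sketch of what that looks like (the view, broker, topic, and schema registry URL are placeholders):

```sql
-- A Kafka sink that can resume exactly where it left off after a restart.
-- View name, broker, topic, and schema registry URL are placeholders.
CREATE SINK my_sink
FROM my_materialized_view
INTO KAFKA BROKER 'localhost:9092' TOPIC 'my-sink-topic'
WITH (reuse_topic = true)
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081';
```

On restart, Materialize reuses the same topic and picks up publishing where it stopped, so downstream consumers see each update exactly once.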
Key Change: PubNub Sources
We now support PubNub sources. PubNub is a streaming SaaS provider that offers a set of public real-time data streams, which are useful for tests and demos, like stock market updates and Twitter streams. The new Cloud Quickstart uses a PubNub source. You can now ingest these streams (and your own PubNub channels) with the CREATE MATERIALIZED SOURCE…FROM PUBNUB syntax.
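For instance, subscribing to a channel might look roughly like this (a sketch: the subscribe key and channel name are placeholders you’d replace with real values):

```sql
-- Subscribe to a PubNub channel as a streaming source.
-- The subscribe key and channel name are placeholders.
CREATE MATERIALIZED SOURCE pubnub_example
FROM PUBNUB
  SUBSCRIBE KEY 'my-subscribe-key'
  CHANNEL 'my-channel';
```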
Key Change: S3 Sources
We’ve supported S3 sources since Materialize v0.7, and as of v0.8 the experimental flag has been lifted. We expect S3 sources to be especially useful for unioning historical data when you only keep a window of data in Kafka, as well as for materializing a long tail of machine-produced data from S3.
As a refresher, with S3 sources, you can:
- Connect to Amazon S3 object storage
- Specify object name filters that ensure Materialize is only downloading and processing the objects you need
- Hook into AWS’s built-in SQS API for notifying downstream services of bucket/object changes, so Materialize can ingest new objects as soon as they appear. Views defined downstream of S3 sources with SQS notifications enabled will incrementally update as new objects are added to the bucket (see the sketch below)!
- Ingest data from S3 as raw text/bytes, CSV, or JSON
- Use gzip-compressed S3 sources
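Putting those pieces together, an S3 source definition might look roughly like this (a minimal sketch: the bucket name, SQS queue, region, and object pattern are placeholders):

```sql
-- Scan a bucket for matching objects and also listen for SQS
-- notifications so new objects are ingested as soon as they land.
-- Bucket, queue, region, and pattern are placeholders.
CREATE MATERIALIZED SOURCE s3_logs
FROM S3 DISCOVER OBJECTS MATCHING '**/*.log' USING
    BUCKET SCAN 'my-logs-bucket',
    SQS NOTIFICATIONS 'my-logs-queue'
  WITH (region = 'us-east-2')
FORMAT TEXT;
```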
Here’s an example of where an S3 source can be useful: if you only keep recent data in Kafka but have everything in an S3 data lake, you can ingest the S3 data once before starting the Kafka stream to get the full history. In other words, you can combine live Kafka streams with the complete history of events from your S3 data lake.
Once Materialize downloads an S3 object, it processes each line as an event, much like any other source. Note that S3 sources are meant for buckets where objects are append-only: Materialize will silently ignore objects that are deleted or updated. Users can specify which objects should be ingested.
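A view that stitches the history and the live stream together might look like this (a sketch, assuming hypothetical sources s3_history_events and kafka_live_events with matching columns):

```sql
-- Combine the archived history from S3 with live Kafka events.
-- Assumes both hypothetical sources expose the same columns.
CREATE MATERIALIZED VIEW all_events AS
SELECT * FROM s3_history_events
UNION ALL
SELECT * FROM kafka_live_events;
```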
Key Change: Volatility
In v0.8 we introduced a concept called volatility, which describes sources that can’t necessarily guarantee Materialize access to the exact same complete set of data between restarts. Examples of volatile sources include PubNub and Amazon Kinesis. Specifically, PubNub is a volatile source because it only provides a message-queue-like stream of live events.
While it is possible to connect to volatile sources in Materialize, the system tracks volatility internally. Features that rely on deterministic replay, like the exactly-once sinks described above, cannot be constructed atop volatile sources.
Key Change: Debezium Upsert Envelope
We now support Debezium’s upsert envelope, which lets Materialize ingest inserts, updates, and deletes from Kafka topics populated by Debezium. The envelope is also compatible with Kafka’s log-compaction feature, which makes it useful for ingesting compacted CDC topics into Materialize.
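Opting in is a matter of appending the envelope clause to a source definition, roughly like this (a sketch: the broker, topic, and schema registry URL are placeholders):

```sql
-- Ingest a log-compacted Debezium topic, keyed by the Kafka message key.
-- Broker, topic, and schema registry URL are placeholders.
CREATE MATERIALIZED SOURCE users
FROM KAFKA BROKER 'localhost:9092' TOPIC 'dbserver1.public.users'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081'
ENVELOPE DEBEZIUM UPSERT;
```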
Key Change: Temporal Filters
Temporal filters have graduated from experimental status. Temporal filters let you limit Materialize’s memory consumption by writing views that only retain data from certain time windows. We’re particularly excited about temporal filters because they enable commonly requested capabilities like sliding and tumbling windows without forcing you to break out of your SQL workflow. All you really need is SQL, and the ability to refer to time, to make your data run!
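For example, a view that retains only the last minute of data might look like this (a minimal sketch, assuming a hypothetical events source with an insert_ts column holding milliseconds since the Unix epoch):

```sql
-- Retain only events from the last 60 seconds.
-- mz_logical_timestamp() is Materialize's current logical time in ms.
CREATE MATERIALIZED VIEW last_minute AS
SELECT *
FROM events
WHERE mz_logical_timestamp() >= insert_ts
  AND mz_logical_timestamp() < insert_ts + 60000;
```

As logical time advances, rows older than 60 seconds fall out of the view and their memory is reclaimed.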
Quality-of-life improvements
- COPY FROM copies data into a table using the Postgres COPY protocol (see the sketch after this list)
- You can set offsets for Kafka partitions
- NULLs now sort last, to match the default sort order in PostgreSQL
- New operators and functions:
  - #> and #>> jsonb operators
  - New SQL functions, such as pow, repeat, and jsonb_object_agg
  - encode and decode, for converting binary data to and from several textual representations
  - Trigonometric functions and the cube root operator
  - Equality operators on array data
- Upsert envelope for Debezium sources
- The default logical-compaction-window has been changed from 60s to 1ms
- Removed CREATE SINK…AS OF, which did not have sensible behavior after Materialize restarted. We intend to reintroduce this feature with a more formal model of AS OF timestamps.
- round behavior now matches PostgreSQL, in which ties are rounded to the nearest even number rather than away from zero
- Added default support for encryption-at-rest to Materialize Cloud
- Lots of performance, memory utilization, and usability improvements plus bugfixes!
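As a quick illustration of the COPY FROM item above, loading rows into a table over the Postgres COPY protocol might look like this (a sketch; the table and data file are hypothetical):

```sql
-- A table to load data into.
CREATE TABLE t (a int, b text);

-- Stream rows in over the COPY protocol; from psql, the client-side
-- \copy meta-command wraps this:
--   \copy t FROM 'data.tsv'
COPY t FROM STDIN;
```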
For the full feed of updates, including upcoming changes, see the Materialize changelog in the docs and the Stable Releases page. You can install Materialize today here!
Looking ahead, upcoming releases will bring additional bug fixes and process improvements, in addition to key user-facing features such as SOC 2 compliance for Materialize Cloud.