important
Note: When this article was published, Materialize used TAIL
syntax for the query subscription primitive. Code snippets and references have been updated to reflect the current SUBSCRIBE
syntax. For more info, see SUBSCRIBE
Docs
Most of the internet is built on a “pull” or “request” paradigm: A user loads a page or takes an action, a backend does some work and sends a response. Job done.
Even when we broke out of the webpage model and built JavaScript applications, it was still requests and responses, albeit smaller and more frequent ones. The tooling for the “request” model is ubiquitous, well understood and mature.
On the other side, there’s a growing category of applications and features that need a “push” or “subscription” paradigm. For example: live dashboards, multi-player and collaboration tools, notifications, and automation. We use words like “magical” or “smart” to describe these kinds of applications when they’re done well, but it’s generally only the biggest players, with their massive engineering teams, that are doing it well.
Having spent the last few years researching and building products in this “push” category, it is clear that part of the challenge is that our tools are lacking:
On the one side, there are databases where the only option is to “poll” via repeated queries. Depending on the case, we might need to slow things down, or waste resources querying and processing the same data over and over again when we only need a few updates as they occur.
On the other side, we have infinite variations of message queues, stream processing frameworks and custom application logic that can get the job done, but at a high cost in terms of engineering hours and maintenance burden due to their complexity.
When I first encountered Materialize, many things stood out, but the watershed moment was finding the feature called SUBSCRIBE. It is a powerful query command that can capture updates in sources, views, tables or an arbitrary SELECT statement. As Materialize adds, removes, and updates new rows, a subscription will send us a comprehensive list of updates.
Subscription Example
As an example, let’s look at some temperature data and display the average values for the past minute:
-- Our source table
CREATE TABLE temperatures(
temperature numeric,
updated_at numeric
);
-- Windowed aggregation
CREATE MATERIALIZED VIEW avg_last_minute_temperature AS
SELECT
DATE_TRUNC('second', to_timestamp(updated_at / 1000)) as ts_second,
AVG(temperature)
FROM temperatures
WHERE (updated_at + 60000) > mz_logical_timestamp()
GROUP BY ts_second;
As new temperatures are inserted into the table, and each time a temperature update is no longer in the last minute, a change will occur in the materialized view, leaving any already executed “pull” query less reflective of what the actual data says. A subscribe command will send updates as fast as the view processes the new temperatures, keeping those who request the information up to date.
Let’s start a subscription and visualize the behavior using a table to represent the temperature updates and a chart to display the temperature in time as it is received:
-- Subscribe command
-- Shorthand for 'SUBSCRIBE (SELECT * FROM avg_last_minute_temperature) ...'
SUBSCRIBE avg_last_minute_temperature WITH (SNAPSHOT = false);
Sounds pretty simple and predictable, right? Indeed, this is not all.
Subscriptions with Snapshots
Suppose that a temperature application needs to bootstrap effectively. Using only the latest updates wouldn’t be enough to display the whole last-minute information. A query command should be run aside from a subscribe command to retrieve everything, but this could lead to possible out-of-sync data since one command can start or finish after the other within a few milliseconds difference. Here is where a subscription can turn itself into a more powerful one: subscription with (snapshot)
Since the temperature changes fast another similar scenario would be pending on the updates to display the current degree—but this makes it sensitive to disruptions and delays. Requesting the snapshot can give you a better experience retrieving all the data you already have in place and keep the subscription for the following updates.
After a while, it’s possible to establish a comparison between these two approaches (with and without using a snapshot):
-- Updated subscription command
SUBSCRIBE avg_last_minute_temperature WITH (SNAPSHOT);
While the standard subscription (left) is only receiving the new updates, the one with the snapshot (right) can also request the last minute of data available in the materialized view (grouping the last minute temperature by second) in one command.
Besides the comparison, now is the moment to notice the big difference between the traditional recurrent query and a single query with a subscription to the updates. The subscribe with a snapshot could out-perform it with low effort.
Subscriptions with custom Compaction Windows
Let’s keep going deeper; are there more ways to improve the output of our SUBSCRIBE command? Imagine getting more detailed historical updates in our data to understand behaviors, troubleshoot, or even accurately represent our information. In such cases, compaction will reduce this detailed history of updates to be more resource-efficient, but it is possible to tweak it and trade with the use case needs.
Back to our example, after creating a new view with a custom index, where the compaction tuning occurs, it is possible to know the maximum and minimum temperature at the last minute for a given point of time.
Let’s see how it works:
-- Windowed aggregation as a VIEW
CREATE VIEW avg_last_minute_temperature AS
SELECT
DATE_TRUNC('second', to_timestamp(updated_at / 1000)) as ts_second,
AVG(temperature)
FROM temperatures
WHERE (updated_at + 60000) > mz_logical_timestamp()
GROUP BY ts_second;
-- Indexing view (Materializing) with a custom compaction
CREATE INDEX avg_last_minute_temperature_idx
ON avg_last_minute_temperature (ts_second)
WITH (logical_compaction_window = '1minute');
If all the different types of subscriptions were running at the same time:
-- New subscribe command using compaction
SUBSCRIBE avg_last_minute_temperature WITH (SNAPSHOT) AS OF NOW() - INTERVAL '1 minute')
Since less compaction reveals how the average temperature fluctuates for a particular second, it can also express the maximum and minimum average value per second.
To better understand the difference between these three options, and what you’d use when, let’s imagine we wanted to trigger some alerting based on our temperature data:
- Subscribe: As a trigger for events when the temperature is greater than fifty.
- Subscribe with Snapshot: Same as the single subscription but also checks past events in the last minute where the temperature was greater than fifty.
- Subscribe with Snapshot and compaction: Same as Subscribe with Snapshot but with more details about how the temperature changed in time before the alert.
These behaviors translate to other use cases, like logs, stock, pricing, etc. As well as other more general subscribe features like: - Micro batches using PROGRESS. - Point of Recovery for subscriptions using progress and compaction.
At this point, we’ve seen how flexible subscriptions can be. It is an efficient and realistic way to represent and transmit our data and its real-time behavior, resulting in more natural and consistent architectures, services, and experiences.
For those looking for more stuff. Timely Dataflow and Differential Dataflow are Materialize underlying technologies.
More details on creating a simple and efficient real-time application using subscribe command.