For data processing systems, there is an ongoing tension between control complexity and data throughput. Control paths need coordination and correctness guarantees, while data paths need bandwidth and efficiency. When these concerns get tangled together, both suffer.
I recently worked on a feature that illustrates this tension nicely: lifting the result size limitation for `SELECT` queries in Materialize. On the surface, this is about allowing users to retrieve larger result sets. But the more interesting story is how we achieved this by better separating the control and data planes.
The Problem
Previously, all query results flowed through Materialize’s compute protocol from clusters back to `environmentd`, our coordinator process. This meant that:
- Large results would “clog up” cluster-to-controller communication
- Results had to be fully materialized in `environmentd` memory before streaming to clients
- The coordinator’s memory budget became a hard limit on query result sizes
This is a classic example of control and data concerns getting entangled. The coordination needed for query processing was forcing all data through a bottleneck designed for control messages.
The Solution: Out-of-Band Data Transfer
The solution was to create a “peek stash” system (SELECTs are internally called peeks) that routes large results through an entirely different path. When a query result exceeds a configurable threshold, instead of sending it through the compute protocol, we:
- Write the results to persist (our storage layer) as temporary batches
- Send back metadata about where to find the data
- Stream the results directly from persist to the client
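The routing decision above can be sketched roughly as follows. This is a hypothetical illustration, not Materialize’s actual code: `PeekResponse`, `route_result`, and the byte threshold are stand-ins, and a plain `Vec` stands in for persist’s blob store.

```rust
// Assumed threshold; in Materialize this is configurable.
const STASH_THRESHOLD_BYTES: usize = 1024 * 1024;

/// How a peek result travels back to the coordinator (hypothetical type).
#[derive(Debug, PartialEq)]
enum PeekResponse {
    /// Small results ride inline on the compute protocol.
    Inline(Vec<u8>),
    /// Large results are written to persist; only metadata (pointers to
    /// the temporary batches) goes over the control path.
    Stashed { batch_locations: Vec<String> },
}

/// Decide how to ship an encoded result, based on its size.
/// `stash` stands in for persist's blob store.
fn route_result(encoded: Vec<u8>, stash: &mut Vec<Vec<u8>>) -> PeekResponse {
    if encoded.len() <= STASH_THRESHOLD_BYTES {
        PeekResponse::Inline(encoded)
    } else {
        // Write the payload to the stand-in store and send back
        // only a pointer to it.
        let location = format!("batch-{}", stash.len());
        stash.push(encoded);
        PeekResponse::Stashed { batch_locations: vec![location] }
    }
}
```

The key property is that the control path sees a bounded amount of data either way: an inline payload below the threshold, or a small piece of metadata.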
This approach uses persist’s blob store for what it’s good at: efficiently storing and retrieving large amounts of data. The compute protocol continues to handle what it’s designed for: coordination and small control messages.
Implementation Details
We kept the existing code path for sending results through the control protocol and made the switch to the new system happen automatically based on result size. When a query starts returning results, we use the normal control path. But if the results grow beyond a certain threshold, we switch to out-of-band transfer on the fly.
This switch is seamless: it requires no user configuration or awareness. The system simply detects when results grow too large and reroutes them to persist.
The work of writing to persist happens in the background, so the compute thread stays free to continue processing other parts of the query. This keeps the system responsive while handling large results efficiently.
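The on-the-fly switch can be sketched as a small state machine: results accumulate on the inline path until they cross the threshold, at which point everything gathered so far is rerouted to the stash and subsequent rows follow it there. This is a minimal sketch under assumed names, not Materialize’s implementation; the `Stashed` variant stands in for batches handed off to a background persist writer.

```rust
// Tiny threshold for illustration; the real one is much larger and configurable.
const THRESHOLD_BYTES: usize = 64;

/// Where accumulated rows currently live (hypothetical type).
enum Accumulated {
    /// Still small enough to send over the compute protocol.
    Inline(Vec<Vec<u8>>),
    /// Spilled; stands in for batches written to persist in the background.
    Stashed(Vec<Vec<u8>>),
}

struct ResultAccumulator {
    state: Accumulated,
    bytes: usize,
}

impl ResultAccumulator {
    fn new() -> Self {
        ResultAccumulator { state: Accumulated::Inline(Vec::new()), bytes: 0 }
    }

    /// Add one encoded row, spilling to the stash once the running
    /// size crosses the threshold.
    fn push(&mut self, row: Vec<u8>) {
        self.bytes += row.len();
        match &mut self.state {
            Accumulated::Inline(rows) => {
                rows.push(row);
                if self.bytes > THRESHOLD_BYTES {
                    // Spill: reroute everything gathered so far.
                    let spilled = std::mem::take(rows);
                    self.state = Accumulated::Stashed(spilled);
                }
            }
            Accumulated::Stashed(batches) => batches.push(row),
        }
    }

    fn is_stashed(&self) -> bool {
        matches!(self.state, Accumulated::Stashed(_))
    }
}
```

Because the spill happens per result as it grows, small queries never touch the stash at all, while large ones pay the persist round-trip only once they have proven they need it.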
Architecture Benefits
This change represents a broader architectural principle: decouple control and data paths wherever possible. The compute protocol is designed for coordination messages that need ordering guarantees and immediate processing. Large result sets are just data that needs to get from point A to point B efficiently.
By routing these different types of traffic through appropriate channels, we get:
- Better isolation: large queries don’t interfere with cluster coordination
- Better scalability: data bandwidth is no longer limited by control path capacity
- Better resource utilization: persist is optimized for large data transfers
This follows the same decoupling principles we’ve applied elsewhere in Materialize’s architecture. Storage and compute are separated. Read and write paths are independent. Now control and data transfer are properly isolated.
Broader Implications
This pattern shows up in many distributed systems. Consider how modern object stores separate metadata operations from data transfer, or how CDNs route content delivery separately from origin coordination.
The temptation is often to route everything through a single, well-understood path. But as systems scale, the intersection of different traffic patterns becomes a bottleneck. The solution is usually not to make that single path faster, but to recognize that different types of traffic have different requirements and should use different infrastructure.
For us, this is just the beginning. The same out-of-band transfer mechanism can be used for `SUBSCRIBE` results, in our write paths, and for other high-bandwidth data flows we haven’t anticipated yet. By establishing the right abstractions, we’ve created reusable building blocks for future features.
Sometimes the most important part of a feature isn’t what it enables directly, but how it changes the underlying architecture to enable better things in the future.