Over the past year, our team at Materialize has been hard at work turning our already powerful operational data warehouse into an enterprise-grade Cloud offering. We've learned a lot along the way, but in this post we want to highlight one lesson in particular: networking in the cloud is hard.
This will probably be obvious to you, but an operational data warehouse is only useful if you can put data into it and then, later, read some data out of it. When you run Materialize as a binary, that process is simple enough. But when we tried to run Materialize in a customer’s private network, things got complicated. We suddenly needed to connect services in a way that’s secure, reliable, performant, and (ideally) easy to configure.
After exploring our options, we decided to solve this problem by integrating with Tailscale. Tailscale is a VPN solution based on the state-of-the-art WireGuard protocol. A lot of nitty-gritty details drove us to choose Tailscale, but luckily our Tailscale integration hides all that complexity from you, the user.
If you’re ready to get started with Materialize and Tailscale, check out these docs that will show you how to run Materialize Cloud in your Tailscale VPN in just a few easy steps. If you’re interested in learning more about the challenges inherent in networking in the cloud, read on.
Why is secure, reliable, performant networking hard?
Cloud products are often optimized for systems that use the request-response model. There are countless tutorials for running web servers, hosting API servers, caching content, and so on. Nearly all of them assume that the client sits outside a trust boundary and initiates connections to one or more hosted cloud services running inside it.
We’re trying to do something very different. Our customers want their Materialize instance fully integrated with their existing data pipeline. This means Materialize needs to read from the customer’s data sources, including Kafka topics, change data capture (CDC) feeds from business-critical databases, and on-disk reference data. It also needs to push processed results out for use by other parts of the pipeline.
Servers we run as part of Materialize Cloud need to reach into the customer’s private network to access these services. This kind of traffic crosses cloud accounts and trust boundaries, flows in both directions, and often moves large amounts of data over long-running, persistent connections. It also has to be authenticated and encrypted while it transits untrusted networks. That is a long way from the typical web-serving use case.
To many readers, this may sound like a natural fit for a VPN. After all, the entire purpose of a VPN is to connect people and services over an untrusted network! It is certainly an option, but running a VPN often brings a lot of complexity and operational burden. Our customers would have to set up a VPN solution, and both they and Materialize would have to monitor and manage it continuously.
And that short list ignores the inevitable nightmare fuel: was the VPN set up correctly? Were the right permissions granted? Are there mistakes in the configuration? Are the certificates in use being rotated properly? The list goes on.
How does Tailscale solve this problem for Materialize?
To let ourselves and our customers sleep well at night, we decided to let Tailscale handle all the VPN complexity for us. As mentioned earlier, Tailscale is a VPN solution based on the state-of-the-art WireGuard protocol. It supports and promotes security best practices (like automatic key rotation) out of the box. Better yet, the Tailscale team has made creating your own VPN as seamless as possible: it uses SSO for authentication, and getting started takes only minutes.
So, as a user of Materialize, this is all you need to do to run Materialize in your very own VPN:
- You use Tailscale to generate a one-off auth key.
- You give this one-off key to your Materialize Cloud instance.
… and that’s it! Behind the scenes, Materialize Cloud installs and configures Tailscale for you, and the managed database joins your tailnet (your Tailscale network). Not only can you connect to Materialize directly from your local machine, but Materialize can also read from your sources and write to your sinks securely, as if everything were running together in-house. Meanwhile, all traffic is encrypted end to end with WireGuard's modern cryptography, so it stays protected even when it crosses untrusted networks. If you’re already using Tailscale, you can use an auth key with an ACL tag to limit what Materialize can access in your tailnet.
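For teams that script their setup, the two steps above can also be driven through Tailscale's REST API instead of the admin console. The sketch below is an illustration under stated assumptions, not part of Materialize's integration: it assumes Tailscale's v2 API endpoint `POST /api/v2/tailnet/{tailnet}/keys` (where `-` selects the API token's default tailnet), and the tag `tag:materialize` and the destinations `tag:kafka` and `tag:postgres` are hypothetical names. It only builds the key request and the matching ACL policy entries; check Tailscale's API and ACL documentation before relying on any field.

```python
import json
import urllib.request

# Assumption: Tailscale API v2 key-creation endpoint; "-" selects the
# default tailnet of the API access token.
API_URL = "https://api.tailscale.com/api/v2/tailnet/-/keys"

# Hypothetical ACL tag applied to the Materialize device.
MATERIALIZE_TAG = "tag:materialize"

# Hypothetical policy-file entries: declare who may apply the tag, then let the
# tagged device reach only the listed host:port destinations. Once a policy has
# "acls" rules, access that no rule grants is denied.
EXAMPLE_POLICY = {
    "tagOwners": {MATERIALIZE_TAG: ["autogroup:admin"]},
    "acls": [
        {
            "action": "accept",
            "src": [MATERIALIZE_TAG],
            "dst": ["tag:kafka:9092", "tag:postgres:5432"],
        }
    ],
}


def build_auth_key_request(api_token: str, tag: str = MATERIALIZE_TAG,
                           expiry_seconds: int = 3600) -> urllib.request.Request:
    """Build, but do not send, a request for a one-off, tagged auth key."""
    body = {
        "capabilities": {
            "devices": {
                "create": {
                    "reusable": False,   # one-off: registers a single device
                    "ephemeral": False,  # the Materialize device is long-lived
                    "tags": [tag],       # device joins the tailnet with this tag
                }
            }
        },
        "expirySeconds": expiry_seconds,  # an unused key expires after this long
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return a JSON object that includes the new key, which you would then hand to your Materialize Cloud instance. Keeping the key one-off and tagged means a leaked key can register at most one device, and that device can only reach what the policy allows.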
Your plaintext data also never passes through Tailscale’s servers. Data flows peer to peer between your services and Materialize. Tailscale is involved only in the control plane, to broker connections, and as a failover relay when a direct connection isn’t possible; a relay only forwards traffic that is already encrypted. Because most traffic takes a direct path, this approach offers some of the best throughput you can get from a VPN solution.
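The connection strategy described above (direct peer-to-peer paths first, relays only as a fallback) can be illustrated with a toy model. This is a hypothetical sketch, not Tailscale's implementation, which relies on NAT traversal and its DERP relay network and keeps trying to upgrade relayed connections to direct ones. The names `Path` and `choose_path` and all example addresses are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Path:
    kind: str     # "direct" (peer to peer) or "relay"
    address: str  # endpoint the encrypted packets are sent to


def choose_path(
    direct_candidates: Iterable[str],
    is_reachable: Callable[[str], bool],
    relay_address: str,
) -> Path:
    """Prefer a direct peer-to-peer endpoint; fall back to a relay only if none works.

    In this model the payload is encrypted before a path is chosen, so taking
    the relay branch means the relay forwards ciphertext, never plaintext.
    """
    for address in direct_candidates:
        if is_reachable(address):
            return Path("direct", address)
    return Path("relay", relay_address)


# Usage: only the second candidate is reachable, so a direct path is chosen.
reachable = {"10.0.0.5:41641"}
path = choose_path(
    ["203.0.113.7:41641", "10.0.0.5:41641"],
    reachable.__contains__,
    "relay.example:443",
)
# path.kind == "direct"
```

The relay is deliberately the last resort: it adds a hop and shared capacity, which is why direct paths are what make high throughput possible.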
Try it out!
If secure networking has kept you from getting started with Materialize Cloud, it’s time to take another look. Get started with a free trial by signing up for a new Cloud instance today! And, as always, feel free to reach out in Slack or on GitHub if you have any thoughts or feedback.