Microservices break applications into smaller, independent services, enabling modular development, scalability, and easier maintenance. While these benefits are undeniable, microservices typically have their own isolated databases, which complicates cross-service data access: each service must independently handle challenges like combining or joining data from different sources, often sacrificing consistency and increasing complexity. But what if we challenged the widely held assumption that microservices cannot expose data through a shared database?
In this blog post, we’ll explore the trade-offs of introducing a central database for cross-service data access, addressing common concerns like coupling and scalability. We’ll dive into how technologies like materialized views can mitigate these challenges by enabling efficient, consistent data sharing across services, while offering a simpler system design and minimizing implementation effort.
Whether you’re a skeptic of shared databases or just curious about modern architectural patterns, this post delivers practical insights for anyone rethinking microservices design.
Benefits and Obstacles in Microservices Design
Microservices offer clear advantages such as modularity, scalability, and agility by breaking applications into smaller, independently deployable services. Teams can develop and deploy features faster, adopt diverse technologies, and scale specific components to handle varying load.
One key aspect of microservices design is enforcing loose coupling of services through lightweight protocols such as REST, gRPC, or message queues: each microservice exposes well-defined interfaces that standardize communication and ensure interoperability. To achieve loose coupling, microservices typically manage their own data within isolated databases, ensuring that access to data is only possible through the defined interfaces. This design lets teams adapt data storage and structure internally without affecting external consumers, as long as APIs remain backward-compatible. Teams can even switch underlying database technologies without impacting other services.
But while this isolation directly contributes to the benefits that have made microservices so popular, it comes with trade-offs. Services often need to collaborate by accessing data from other services. For example, when a user places an order on an e-commerce site, an order service may need to confirm with the inventory service that the ordered items are in stock before notifying the payment service to process the order.
In a monolithic application with a single database, this operation may be as simple as a join between the orders and inventory tables. But it’s common wisdom that microservices must avoid using a central database directly, as it increases coupling and creates a single point of failure. Instead, services must retrieve data through appropriate APIs or by consuming state changes from other services through a central message queue.
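For concreteness, here’s roughly what that check looks like in a monolith; the orders and inventory tables and their columns are hypothetical:

```sql
-- Monolith: both tables live in the same database, so availability
-- can be confirmed with a single join (schema is illustrative).
SELECT o.order_id
FROM orders o
JOIN inventory i ON i.item_id = o.item_id
WHERE o.order_id = 42
  AND i.stock_quantity >= o.quantity;
```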
In one scenario, the inventory service may publish updates on stock levels via an immutable message queue. For the order service to confirm item availability, it must consume these updates, maintain a local copy of the inventory state, and rebuild the stock levels over time. Only then can the order service query its local database to confirm item availability.
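In effect, the order service maintains a shadow copy of the inventory. A minimal sketch of applying one consumed event, assuming a hypothetical local_inventory table and Postgres-style syntax:

```sql
-- Applied once per consumed stock-level event
-- ($1 = item_id, $2 = new stock level reported by the event).
INSERT INTO local_inventory (item_id, stock_quantity)
VALUES ($1, $2)
ON CONFLICT (item_id)
DO UPDATE SET stock_quantity = EXCLUDED.stock_quantity;
```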
Even when the inventory service provides a direct API for querying current stock, this process can still become cumbersome. Sending API requests introduces latency and is often done asynchronously to improve throughput. But even with synchronous communication, a service cannot retrieve data from multiple sources at the exact same time, leading to inconsistencies or outdated results when the data is combined. For example, by the time the order service receives a response from the inventory service and combines all relevant data to make a decision, the inventory data may already be outdated.
Although these challenges are well understood, implementing patterns to address them often increases complexity. Additionally, each service must independently reimplement the capability to consume external data. For example, an analytics service that also wants to access inventory data must reimplement consumption logic the order service has already built, wasting development resources and amplifying complexity.
How Central Databases Simplify Microservice Data Integration
Using a shared database to query data across services can drastically simplify the interactions between services. Instead of relying on asynchronous API calls or rebuilding state from event logs, data from all services becomes immediately accessible through SQL queries. Even complex operations like joins and aggregations across services are streamlined into simple SQL queries over multiple tables.
However, conventional wisdom warns that this approach introduces downsides such as tight coupling and resource contention, both violations of core microservices principles. So let’s examine what actually breaks when microservices expose data through a shared database, and explore potential solutions to these issues.
Imagine each service exposes a read-only copy of its data in a shared database. Services still use an internal database for their write traffic, but the data is replicated into the shared database for other services to query. This already provides benefits, such as straightforward access to cross-service data, eliminating the need for asynchronous API calls, and avoiding rebuilding state by consuming changes from message queues. The data is available to be queried with SQL, enabling even complex aggregations or joins across service boundaries.
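For illustration, a cross-service query over such replicas might look like this, with inventory_replica and orders_replica as hypothetical read-only copies:

```sql
-- Shared database: combine two services' replicated data in one query.
SELECT i.item_id,
       i.stock_quantity,
       COUNT(o.order_id) AS open_orders
FROM inventory_replica i
LEFT JOIN orders_replica o
  ON o.item_id = i.item_id
 AND o.status = 'open'
GROUP BY i.item_id, i.stock_quantity;
```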
But this apparent simplicity comes at a cost. Directly exposing internal schemas to external teams risks breaking their queries whenever schema changes occur. Just imagine what would happen if the inventory team decided to rename the stock_quantity column to available_stock without telling the order team. To avoid such disruptions, schemas must either remain static or change only through careful coordination across teams. Both options hamper team agility, which is one of the promises of adopting microservice architectures in the first place.
But it doesn’t stop there. Shared databases also introduce performance bottlenecks. Services must compete for shared resources, and a poorly optimized query can degrade overall system performance. For instance, an analyst running a historical analysis of popular items might inadvertently execute a cross join, consuming all available memory and impacting other services.
In traditional microservice designs, services scale independently and enforce safeguards like throttling or blocking misbehaving clients. Achieving similar protections in a shared database environment is far more complex, particularly when multiple teams must agree on how resources are allocated among them.
So although querying data becomes easier with a shared database, it does lead to much tighter coupling between services, along with performance and availability challenges. Let’s see how we can mitigate these downsides.
Creating Stable Interfaces with Database Views
Microservices avoid consumer-breaking changes by using clearly defined interfaces. Services can evolve their internal data models as long as the external interface remains unchanged. Even significant structural changes that might otherwise break compatibility can be made transparent by applying a mapping layer inside the service that translates the new structure into the existing interface.
We can apply a similar principle to shared databases. Instead of exposing all their internal data directly to other services, teams can share data through carefully defined database views. A view is essentially a named query: when queried, the database substitutes the underlying query definition, which maps the structure of the internal data to the interface agreed on for data exchange.
This approach empowers teams to control exactly what data they expose, keeping schema changes internal by updating the view’s definition. For example, the inventory team can rename the internal column to available_stock while the view continues to expose it as stock_quantity, preventing the order team’s queries from breaking. This mirrors the mapping strategies used in traditional microservices.
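As a sketch, with inventory_internal standing in for the inventory team’s private table, such a view could look like:

```sql
-- Stable interface: consumers always see stock_quantity, no matter
-- how the internal schema evolves. A rename only requires updating
-- this view definition.
CREATE VIEW inventory_public AS
SELECT item_id,
       available_stock AS stock_quantity
FROM inventory_internal;
```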
But although views offer flexibility, they may introduce overhead. Queries against views are executed dynamically at query time, which can affect performance, especially when view definitions are complex.
Fortunately, these limitations can be mitigated with an established database optimization: materialized views.
Optimizing Data Access with Materialized Views and Incremental View Maintenance
Materialized views are precomputed query results stored physically in a database, offering significant performance improvements for complex and resource-intensive queries. Unlike regular views, which dynamically execute the underlying query each time they are accessed, materialized views store the query results as a persistent object, allowing for rapid data retrieval and avoiding recomputation.
The support for materialized views varies across databases. Traditional systems often require manual refreshes of materialized views or, unless very specific constraints are met, recompute the entire result from scratch on every refresh. This leads to stale results between refreshes and excessive resource usage. For instance, when an order containing a single item is fulfilled, the stock level of that item (and only that item) decreases by one, but a refresh would still recompute the stock level for all items, even though those levels did not change. At least the precomputed results can be retrieved quickly instead of being recomputed on each query execution.
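As a point of reference, this is roughly how a full-recompute refresh looks in Postgres-style SQL, with stock_events as a hypothetical event table:

```sql
-- Precompute current stock levels once.
CREATE MATERIALIZED VIEW stock_levels AS
SELECT item_id, SUM(quantity_change) AS stock_quantity
FROM stock_events
GROUP BY item_id;

-- Results are stale until the next refresh, and each refresh
-- recomputes every item's stock level, even if only one changed.
REFRESH MATERIALIZED VIEW stock_levels;
```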
Incremental view maintenance addresses these shortcomings. Instead of recomputing results from scratch, it only applies the necessary changes (inserts, updates, or deletes) from the inputs to update the result of the materialized view. As a result, it becomes feasible to apply updates continuously while they arrive in the system rather than executing refreshes on a fixed schedule. This method significantly improves the efficiency of the computation and the freshness of data, particularly in systems with frequent updates.
Together, materialization and incremental maintenance provide fast access to the results of even complicated queries in dynamic, large-scale environments. They allow teams to expose stable, predefined interfaces that serve as explicit data products, carefully designed by a service for external consumption. The precomputed and stable nature of materialized views eliminates the performance pitfalls associated with dynamic query execution, ensuring efficient and reliable data access.
But although incrementally maintained materialized views combine stability with performance, they cannot completely isolate workloads across services. For instance, an analyst running an unoptimized cross-join query could still consume excessive resources, impacting other services. Workload isolation requires additional strategies.
Workload Isolation through Shared Storage
Workload isolation through the separation of storage and compute is a design pattern often used in modern data systems. By decoupling storage and compute resources, systems can scale these components independently to meet the needs of diverse workloads. This separation allows multiple compute clusters to operate on the same underlying data while avoiding resource contention.
Systems like Snowflake and Apache Spark implement this pattern. We can apply a similar approach to incrementally maintained materialized views. Instead of storing materialized view results in a single database, they can be stored in shared object storage, enabling access across clusters. Each team can then use a physically isolated cluster, ensuring resource usage remains siloed.
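To make this concrete in a system that already implements the pattern, here is a hypothetical Snowflake sketch; the same idea carries over to compute clusters serving incrementally maintained materialized views:

```sql
-- Each team provisions its own compute over shared storage
-- (warehouse names and sizes are illustrative).
CREATE WAREHOUSE orders_wh    WITH WAREHOUSE_SIZE = 'SMALL';
CREATE WAREHOUSE analytics_wh WITH WAREHOUSE_SIZE = 'XSMALL';

-- The analyst's session uses its own warehouse, so a runaway query
-- drains only that warehouse's resources.
USE WAREHOUSE analytics_wh;
```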
In this setup, resource-hogging queries, such as the analyst’s runaway cross join, exhaust only their own cluster’s resources. Critical queries from services like inventory or order processing remain unaffected. This architecture preserves the independence and scalability of microservices while enabling centralized, simplified data access.
This approach retains the best aspects of microservices—scalability and isolation—while significantly reducing complexity in data-sharing workflows.
Microservices Data Integration with Materialize
So far, this discussion has been largely theoretical. However, the tools to realize this architecture already exist. At Materialize, we have built an operational data store that provides all the necessary building blocks: native connectors to source databases and message queues, incrementally maintained materialized views, use-case isolation via separated storage and compute layers, and strict serializability to ensure consistent, trustworthy query results.
Here’s how the architecture works when using Materialize as a central data store.
The inventory and order services stream change data capture (CDC) events directly from their databases’ replication slots into Materialize, which maintains materialized views representing the latest inventory and order information. These materialized views are exposed as data products, making them available for SQL queries across teams. For example, the order service can use the inventory data product to confirm stock levels during order processing. Analysts and other teams can consume and combine these data products to create new derived data products, like joining inventory and order data products to track trends in order fulfillment.
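A sketch of those building blocks in Materialize SQL; the connection, publication, cluster size, and object names below are illustrative and vary by setup and version:

```sql
-- Ingest CDC events straight from the inventory database's replication
-- slot (assumes a pre-created Postgres connection and publication).
CREATE SOURCE inventory_src
  FROM POSTGRES CONNECTION inventory_pg (PUBLICATION 'inventory_pub')
  FOR TABLES (inventory);

-- Give the data product its own compute, isolated from other teams.
CREATE CLUSTER inventory_compute (SIZE = '100cc');

-- The data product: an incrementally maintained view over the live table.
CREATE MATERIALIZED VIEW inventory_product
  IN CLUSTER inventory_compute AS
SELECT item_id, available_stock AS stock_quantity
FROM inventory;
```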
This architecture enables teams to focus on core business logic without worrying about the complexities of data access and sharing. They can consume live data products using SQL, a widely understood and declarative language. With Materialize, they no longer need to compensate for eventual consistency or implement workarounds to ensure correctness. And they no longer need to waste effort building bespoke services to consume and process data from different sources.
Materialize fits seamlessly into existing microservices architectures. You can start small, exposing only a few data products while keeping most incumbent services unchanged. For instance, the inventory service could keep publishing inventory updates to a message queue. But instead of having multiple services rebuild inventory levels from raw events, the inventory team would define a materialized view that consolidates these updates into current inventory levels, easily consumable by anyone interested, as sketched below. In this way, the inventory service’s data becomes queryable inside Materialize while the service itself remains unaltered.
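A minimal sketch of that incremental adoption path, assuming an existing Kafka connection kafka_conn and a JSON-encoded inventory-updates topic:

```sql
-- Keep incumbent services unchanged: consume their existing event stream.
CREATE SOURCE stock_updates
  FROM KAFKA CONNECTION kafka_conn (TOPIC 'inventory-updates')
  FORMAT JSON;

-- Consolidate raw events into current stock levels, maintained
-- incrementally as new events arrive.
CREATE MATERIALIZED VIEW current_stock AS
SELECT data->>'item_id' AS item_id,
       SUM((data->>'quantity_change')::numeric) AS stock_quantity
FROM stock_updates
GROUP BY data->>'item_id';
```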
Redefining Microservices Data Integration with Materialize
Integrating a centralized database like Materialize into your microservices architecture can simplify data sharing while maintaining the core principles of loose coupling, scalability, and fault isolation. This architecture, also referred to as an operational data mesh, leverages tools like incrementally maintained materialized views and the separation of storage and compute to not only preserve microservices’ autonomy but also enhance data accessibility and consistency.
Materialize empowers teams to streamline operations, reduce complexity, and unlock real-time insights with minimal overhead. Whether you’re struggling with cross-service data access or future-proofing your architecture for scale, Materialize provides a practical and efficient solution.
Curious to see how Materialize can transform your data architecture? Schedule a demo with our team today, or explore our detailed resources to learn more about simplifying data integration in a microservices world.