As we discussed in our first post of this series, there are a number of ways you can save money on your data warehouse bill. That post highlighted incrementally maintained views and offloading real-time use cases.
But cost reduction will depend on your use case, so let’s examine more ways you can save money. In our latest white paper — Top 6 Strategies for Reducing Data Warehouse Costs — we also highlight data mesh and normalized data models as methods for reducing data warehouse bills.
Read on to learn what data mesh and normalized data models are, and how they can save you money.
3. Data Mesh
For decades, most data architectures were designed as centralized, top-down infrastructures with a single controlling entity. This entity, typically a data team, implemented data governance for all business domains, from sales to marketing to product.

In this paradigm, costs quickly ballooned. Extracting, transforming, and delivering data to business users in their preferred formats became expensive across so many pipelines, databases, and other pieces of infrastructure. It also became difficult to produce, catalog, and find high-quality data.

The old paradigm also disempowered domain experts who know their teams best. For instance, instead of allowing the marketing team to control and govern its own attribution data, centralized data architectures outsourced this responsibility to data team members who lacked subject matter knowledge.
Recently, new paradigms such as data mesh have emerged. Data mesh is a decentralized data architecture that organizes data by business domain, rather than as a central, monolithic infrastructure.
With data mesh, domain experts can set the governance policies and protocols that work best for their teams. Team members can access data through self-service and iterate on data quickly, rather than waiting on data experts.
By giving business users direct access to their data, the data mesh framework removes costs associated with older architectures. Some of the ways you can save money with a data mesh architecture include:
Data belongs to business owners - Business owners control their own data, reducing friction between IT and business users. Teams can deliver high-quality data products in less time.
Data is easier to find - Data cataloging and data quality tools make it easier for business users to discover high-quality data.
Automated data governance - Federated computational governance automates data governance, producing high-quality data with less manual effort.
Eliminates redundancy - With a holistic view across business teams, you can eliminate duplicative data and processes.
This last point is important. If you have several different teams working in a decentralized architecture, the cost of duplicative data, queries, and other processes can rapidly grow in your data warehouse.
But with Materialize, teams can leverage a flexible operational data warehouse to power a decentralized data architecture. Teams can collaborate across different business units to build data products, all without raising costs.
Here’s an example. With Materialize, you can share materialized views across multiple clusters.

Say Team A has a materialized view that Team B wants to incorporate into a real-time data product. Both teams run their own clusters, but Team B can leverage Team A’s precomputed materialized view without spending its own compute resources.

Team A publishes its data product as a materialized view. Team B reads those results in its own cluster, then feeds them into its own real-time data product.
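Here’s a minimal sketch of what that can look like in Materialize SQL. The cluster, table, and column names are hypothetical:

```sql
-- Team A maintains the view in its own cluster, so Team A's compute
-- does the incremental work of keeping the results up to date.
CREATE MATERIALIZED VIEW daily_revenue
    IN CLUSTER team_a AS
SELECT order_date, sum(amount) AS revenue
FROM orders
GROUP BY order_date;

-- Team B reads the precomputed results from its own cluster,
-- without re-running the aggregation.
SET CLUSTER = team_b;
SELECT * FROM daily_revenue ORDER BY order_date DESC LIMIT 7;
```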
This allows Team B to avoid running duplicative queries, thus saving money on compute costs. This is what operationalizing the data mesh looks like.
If done correctly, the cost savings of data mesh are real. However, an inefficient implementation can lead to a higher data warehouse bill.
That’s why you need a malleable data warehouse such as Materialize to unlock the full potential of decentralized data architecture.
4. Normalized Data Models
Normalization organizes database tables into smaller, more manageable units to reduce data redundancy and improve data integrity. This helps eliminate data duplication, so you can avoid anomalies when you insert, update, or delete data.
Normalization is defined by normal forms: sets of rules that ensure data accuracy and consistency. The most commonly implemented normal forms include:
1NF - Eliminates repeating groups and ensures each column holds atomic values
2NF - Removes partial dependencies
3NF - Eliminates transitive dependencies
For example, let’s say you have a table that tracks customer orders. You can normalize it by breaking it into smaller tables for customer details, addresses, orders, and so on. The tables are linked together with foreign keys.
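A minimal sketch of that normalized schema, in standard SQL with hypothetical table and column names:

```sql
-- Addresses, customer details, and orders each live in their own
-- table; foreign keys link them back together.
CREATE TABLE addresses (
    address_id INT PRIMARY KEY,
    street     TEXT,
    city       TEXT
);

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        TEXT NOT NULL,
    address_id  INT REFERENCES addresses (address_id)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers (customer_id),
    order_date  DATE,
    amount      NUMERIC
);
```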
By removing redundancy and leveraging smaller tables, normalization enables you to construct more efficient and pliable data models, which makes modeling a business much easier.
However, joins are more difficult with normalized data. Joining a lot of tables all at once can negatively impact query performance. Queries that join many different tables require more time and CPU resources to calculate results. This can not only delay computation, but can also drive up compute costs.
Large joins thus become a scaling issue for normalized data models, forcing performance sacrifices. Adding a new join can mean time-consuming revisions to the model, so data models end up limited in scope and more expensive for teams to build.
To fix this performance issue, some teams denormalize their data. Denormalization, essentially the opposite of normalization, combines data back into larger tables.

Queries can then access the data without expensive joins, improving query time and performance. This, in turn, can lower costs. However, denormalization reintroduces redundant data and the anomalies that come with it, which can corrupt your data model.
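For contrast, a denormalized version of the hypothetical schema above might collapse everything into one wide table:

```sql
-- One wide table: queries need no joins, but customer name and
-- address repeat on every order row, so a single address change
-- must be written to many rows (an update anomaly).
CREATE TABLE orders_wide (
    order_id      INT PRIMARY KEY,
    order_date    DATE,
    amount        NUMERIC,
    customer_name TEXT,
    street        TEXT,
    city          TEXT
);
```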
Materialize offers the best of both normalization and denormalization. With Materialize, you can keep all the benefits of a normalized data model, while still performing large joins.
With Materialize, you can execute streaming joins. Materialize ingests real-time data and updates materialized views (containing your JOINs) every time the underlying data changes.
This enables you to incrementally join data as it streams into Materialize, instead of all at once upon query execution. By incrementally maintaining joins, Materialize eliminates the scaling issues associated with large joins.
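Reusing the hypothetical normalized tables sketched above, the join lives in an incrementally maintained materialized view:

```sql
-- Materialize maintains this three-way join incrementally: when a row
-- in orders, customers, or addresses changes, only the affected output
-- rows are updated, not the entire join.
CREATE MATERIALIZED VIEW order_facts AS
SELECT o.order_id,
       o.order_date,
       o.amount,
       c.name AS customer_name,
       a.city
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN addresses a ON a.address_id = c.address_id;
```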
Materialize allows you to maintain a disciplined, normalized data model without making performance sacrifices in terms of joins. When business requirements change, you can simply add another join to the view instead of revising the entire data model.
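For instance, if a new requirement calls for shipment status, you might extend the hypothetical view above with one more join (the shipments table is again an assumed name):

```sql
-- Extending the data product: one additional join, rather than a
-- rework of the underlying normalized model.
CREATE MATERIALIZED VIEW order_facts_with_shipping AS
SELECT o.order_id,
       o.order_date,
       o.amount,
       c.name AS customer_name,
       a.city,
       s.status AS shipping_status
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN addresses a ON a.address_id = c.address_id
JOIN shipments s ON s.order_id = o.order_id;
```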
This extensibility empowers you to build flexible, powerful data models that take your business analysis to the next level.
Save Money on Your Data Warehouse Bill with Data Mesh and Normalized Data Models
With data mesh and normalized data models, you can save money on your data warehouse bill. Leverage Materialize to decentralize your data architecture and build flexible data models while reducing costs.
To discover more strategies for reducing data warehouse costs, check out our new white paper: Top 6 Strategies for Reducing Data Warehouse Costs. Download the free white paper now to learn how our customers are saving money.