We’ll be in attendance at dbt’s upcoming Coalesce 2024 conference next week, and we look forward to seeing you there! Our very own Steffen Hausmann — Field Engineer at Materialize — will speak with Wolf Rendall — Director of Data Products at Vontive — about how to build real-time loan underwriting systems with dbt.

Vontive used PostgreSQL with dbt to power its loan underwriting operation. PostgreSQL is a reliable, versatile operational database that works well with dbt, making it a popular choice for teams handling both transactions and analytical models. By performing analytics directly on your transactional data, you ensure access to the freshest possible information.

However, as your data and queries grow more complex, you may start to notice PostgreSQL hitting performance limits. PostgreSQL is optimized for transactional workloads, so data-intensive queries involving multiple joins and aggregations can slow things down.

In this guide, we’ll explore why many teams rely on PostgreSQL for analytics, the challenges they face as queries become more complex, and how Materialize solves these problems by offering real-time access to fresh data, eliminating performance trade-offs.

PostgreSQL often ends up as the default choice for a first analytics pipeline because it’s already the backbone for transactional data—customer information, orders, and business records—critical for day-to-day operations.

By running analytics directly on PostgreSQL, teams avoid the complexity of building data pipelines to move data between different systems. Using dbt with PostgreSQL enhances this by integrating data models and transformations into a familiar and maintainable workflow.

Initially, this setup works well. PostgreSQL handles basic queries and small-scale transformations efficiently. But as your data volume increases or queries become more complex (e.g., multi-table joins, aggregations, or window functions), performance bottlenecks can arise.

The Challenges of Complex Analytics in PostgreSQL

PostgreSQL is optimized for transactional workloads—storing, retrieving, and updating rows in real time. However, performance drops when it comes to complex analytical queries.
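As a rough illustration, consider a reporting query like the one below (the table and column names are made up for this example). It joins several tables, aggregates, and applies a window function, and running it repeatedly against a busy transactional database is exactly the kind of workload that starts to strain PostgreSQL:

```sql
-- Illustrative reporting query: 30-day revenue and rank per customer segment.
-- Repeatedly scanning, joining, and sorting large tables like this competes
-- with transactional traffic for resources.
SELECT
    c.segment,
    c.customer_id,
    SUM(i.quantity * i.unit_price) AS total_sales,
    RANK() OVER (
        PARTITION BY c.segment
        ORDER BY SUM(i.quantity * i.unit_price) DESC
    ) AS segment_rank
FROM customers c
JOIN orders o      ON o.customer_id = c.customer_id
JOIN order_items i ON i.order_id = o.order_id
WHERE o.created_at >= now() - INTERVAL '30 days'
GROUP BY c.segment, c.customer_id;
```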

To avoid slowing down write traffic on the primary database, many teams offload these read-heavy analytical queries to operational data stores like PostgreSQL read replicas. While this strategy helps keep the primary node focused on transactions, it introduces its own set of challenges.

Results served from these replicas are often stale, slow, or both: either they are refreshed periodically, causing delayed updates, or the execution engine isn’t optimized for complex analytical queries like joins, aggregations, and window functions.
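You can see the freshness problem directly by checking how far a replica lags behind the primary; the gap grows whenever replay falls behind or long-running analytical queries hold it up:

```sql
-- Run on a PostgreSQL read replica: how stale is the data being served?
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```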

These limitations make real-time decision-making difficult and often require costly infrastructure, especially as data scales and query complexity increases.

A common way to reduce the cost of keeping complex query results up to date is to process only new or changed data instead of recomputing everything from scratch. Many dbt-postgres users turn to incremental models for exactly this, letting dbt manage the data models and the logic required to perform periodic incremental updates.

However, managing incremental models can quickly become complex and fragile, especially when handling late-arriving data or re-running models. Here’s a simplified example of tracking total sales by customer using an incremental model:

```sql
-- models/customer_revenue_incremental.sql
{{ config(materialized='incremental', unique_key='customer_id') }}

WITH latest_orders AS (
    -- only reprocess orders that changed since the last run
    SELECT * FROM orders
    {% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(last_updated_at) FROM {{ this }})
    {% endif %}
),

new_totals AS (
    SELECT
        customer_id,
        SUM(order_total) AS new_sales,
        MAX(updated_at)  AS last_updated_at
    FROM latest_orders
    GROUP BY customer_id
)
SELECT
    n.customer_id,
    -- add newly observed revenue to the previously materialized total
    n.new_sales {% if is_incremental() %}+ COALESCE(e.total_sales, 0){% endif %} AS total_sales,
    n.last_updated_at
FROM new_totals n
{% if is_incremental() %}
LEFT JOIN {{ this }} e ON e.customer_id = n.customer_id
{% endif %}
```

This approach works, but it pulls you away from SQL’s simplicity by introducing logic to manage row-level changes and ensure consistency. Over time, this added complexity increases maintenance overhead.

Materialize: A Modern Operational Data Store

Materialize takes a different approach. Rather than relying on periodic refreshes or struggling with stale replicas, it continuously ingests and processes data as it arrives, so your queries always run on the most up-to-date data.

This eliminates the trade-off between fresh data and query performance. Materialize ensures you can run complex queries with low latency, without affecting write traffic on your primary database.

For dbt-postgres users, this means no more need for incremental models or manually managing data freshness. Materialize’s incremental view maintenance (IVM) keeps views up to date automatically as new data arrives, allowing you to run fast, accurate queries without the complexity of maintaining row-level updates. Here’s the same revenue tracking query in Materialize:

```sql
-- models/customer_revenue.sql
{{ config(materialized='view', indexes=[{'columns': ['customer_id']}]) }}

SELECT
    customer_id,
    SUM(order_total) AS total_sales
FROM orders
GROUP BY customer_id
```

Materialize will automatically update this model as new orders come in or existing orders are changed or deleted, so there is no need for manual incremental logic. As your operational data store, Materialize delivers fast, fresh results without the added maintenance.
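Downstream, the model behaves like any other relation: you can query it ad hoc, or use Materialize’s SUBSCRIBE command to stream changes to the results as they happen (the customer_id value below is just an example):

```sql
-- Point lookups against the incrementally maintained view are fast and fresh.
SELECT total_sales
FROM customer_revenue
WHERE customer_id = 42;

-- Or stream every change to the result set as it occurs.
SUBSCRIBE TO customer_revenue;
```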

Efficient Analytics with Indexed Views

One key feature that sets Materialize apart is its ability to index views, not just tables. This means even complex queries with joins or aggregations can benefit from low-latency execution. These indexed views automatically stay up to date as the underlying data changes, ensuring fast, accurate results with minimal operational overhead.
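Outside of dbt, the same pattern is plain SQL: define a view over whatever joins and aggregations you need, then index it so Materialize keeps the results incrementally maintained (the schema below is illustrative):

```sql
-- A view over a join and an aggregation...
CREATE VIEW customer_order_summary AS
SELECT
    c.customer_id,
    c.segment,
    COUNT(o.order_id)  AS order_count,
    SUM(o.order_total) AS total_sales
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.segment;

-- ...and an index that keeps those results up to date in memory
-- for low-latency reads.
CREATE INDEX customer_order_summary_by_customer
    ON customer_order_summary (customer_id);
```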

This efficiency can translate into significant cost savings, especially for applications where real-time data is critical. In some workloads, Materialize has demonstrated up to 94% cost savings compared to PostgreSQL read replicas for data-intensive use cases.

PostgreSQL Compatibility: Seamless Integration and Real-Time Data Ingestion

Migrating from PostgreSQL to Materialize is straightforward. Since Materialize is wire-compatible with PostgreSQL and uses PostgreSQL-flavored SQL, most existing queries work without modification.

Materialize ingests data directly from PostgreSQL using native logical replication (a publication and a replication slot), ensuring that your views and queries are always up to date in real time. If you’re using dbt, you can easily update your project to use the dbt-materialize adapter with just a few minor adjustments, which are outlined in our migration guide.
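At a high level, the setup looks roughly like this; the publication, connection, and credential names below are placeholders:

```sql
-- On the PostgreSQL primary: publish the tables to replicate
-- (requires wal_level = logical).
CREATE PUBLICATION mz_publication FOR TABLE orders, customers;

-- In Materialize: store the credentials and define the connection...
CREATE SECRET pg_password AS 'replace-with-real-password';
CREATE CONNECTION pg_connection TO POSTGRES (
    HOST 'db.example.com',
    PORT 5432,
    USER 'materialize',
    PASSWORD SECRET pg_password,
    DATABASE 'production'
);

-- ...then create a source that replicates the published tables
-- through a replication slot and keeps them continuously updated.
CREATE SOURCE pg_source
    FROM POSTGRES CONNECTION pg_connection (PUBLICATION 'mz_publication')
    FOR ALL TABLES;
```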

Materialize also supports cross-database joins, enabling you to aggregate data from multiple sources—whether it’s other PostgreSQL or MySQL databases, Kafka topics, or SaaS apps—without the need for complex data pipelines.
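For instance, assuming the orders table replicated from PostgreSQL above and a crm_accounts relation ingested from another source, a single view can join them directly:

```sql
-- Join data replicated from PostgreSQL with data from another source.
-- `orders` comes from the PostgreSQL source above; `crm_accounts` is
-- assumed to be ingested separately (e.g., from MySQL or Kafka).
CREATE VIEW enriched_orders AS
SELECT
    o.order_id,
    o.order_total,
    a.account_owner,
    a.region
FROM orders o
JOIN crm_accounts a ON a.account_id = o.customer_id;
```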

The Future of Operational Analytics with Materialize

Materialize offers a way for businesses to overcome the limitations of PostgreSQL when it comes to complex queries and fresh data.

By simplifying model management and delivering fresh, low-latency analytics, Materialize helps you get more out of your data without additional infrastructure or complexity.

It’s a natural evolution for teams looking to scale their analytics without sacrificing simplicity or speed. To see why Materialize is ideal for running complex analytics, try a free trial today.

Try Materialize Free