Performance Benchmark: Aurora PostgreSQL vs. Materialize

Seth Wiesman
August 12, 2024

This blog examines the performance of Materialize vs. Aurora PostgreSQL read replicas for computationally intensive workloads. We demonstrate that Materialize outperforms Aurora for complex queries over relatively small data volumes.

Specifically, for the same on-demand cost, Materialize delivers 100x greater throughput with 1000x lower latency. And unlike other solutions that offload computation from OLTP databases, Materialize does so without sacrificing correctness or requiring external change data capture (CDC) tools to move data between systems.

Read on to learn what our benchmark tests uncovered about Materialize and Aurora PostgreSQL.

Aurora PostgreSQL: Not Designed for Complex, Read-Intensive Queries

OLTP databases like Aurora PostgreSQL are the backbone of modern applications. They excel at handling vast volumes of transactional operations, ensuring data integrity and delivering fast, atomic transactions.

However, these strengths can become limitations. Intricate joins, aggregations, and data transformations that modern applications thrive on can bog down an OLTP system, leading to performance bottlenecks and a degraded user experience.

To mitigate this load, many organizations implement read replicas to distribute read operations across multiple database copies. While this offloading can help with performance, it does not fully address the challenges posed by complex queries. Read replicas are designed for straightforward read operations and often struggle with tasks involving multiple joins, large aggregations, and otherwise non-trivial data transformations.

Materialize offers a radically different approach by providing efficient incremental computation. Unlike traditional read replicas, Materialize shifts the computational burden from the read phase to the write phase by precomputing views and incrementally materializing results.

This ensures data remains fresh and up-to-date, delivering low-latency results. Materialize transforms how we handle complex queries—making them faster while maintaining consistency—and ultimately solves the challenges that traditional read replicas and caching mechanisms cannot.
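To make the read-path/write-path tradeoff concrete, here is a toy sketch (Python; the names are ours and purely illustrative, not Materialize's actual machinery): a recompute-on-read average versus the same average maintained incrementally on each write.

```python
# Toy contrast: recompute-on-read vs. incremental maintenance.
# A real system like Materialize does this with dataflows over SQL views;
# this sketch only illustrates where the work happens.

sales = []  # the "base table"

def avg_on_read():
    """Read-replica style: scan everything at query time."""
    return sum(sales) / len(sales)

class IncrementalAvg:
    """Materialized-view style: update state on every write, O(1) reads."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def on_write(self, amount):
        self.total += amount
        self.count += 1

    def read(self):
        return self.total / self.count

view = IncrementalAvg()
for amount in [10.0, 20.0, 30.0]:
    sales.append(amount)   # write path of the OLTP table
    view.on_write(amount)  # incremental view maintenance

assert avg_on_read() == view.read() == 20.0
```

The read-replica version pays the scan cost on every query; the incremental version pays a small constant cost per write and answers reads from precomputed state, which is why the gap widens as queries get more complex.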

Benchmarking Use Case: Dynamic Pricing for an Online Retailer

Now that we’ve explained the conceptual differences, let’s perform a quantitative comparison of Materialize and Aurora PostgreSQL. We’ll use dynamic pricing for an online retailer as a sample use case.

In this use case, the price of any given item fluctuates based on available inventory, snap promotions, popularity, and other factors. The goal is a scalable solution that offers low latency and always displays the current price for an item.

The logic for dynamic pricing is encapsulated in a standard SQL view definition, which references six tables [1]. The entire database contains only 1GB of data across all tables. You can see the code for the view below:

```sql
CREATE VIEW dynamic_pricing AS
WITH recent_prices AS (
    SELECT grp.product_id, AVG(price) AS avg_price
    FROM (SELECT DISTINCT product_id FROM sales) grp,
    LATERAL (
        SELECT product_id, price
        FROM sales
        WHERE sales.product_id = grp.product_id
        ORDER BY sale_date DESC LIMIT 10
    ) sub
    GROUP BY grp.product_id
),

promotion_effect AS (
    SELECT
        p.product_id,
        MIN(pr.promotion_discount) AS promotion_discount
    FROM promotions pr
    JOIN products p ON pr.product_id = p.product_id
    WHERE pr.active = TRUE
    GROUP BY p.product_id
),

popularity_score AS (
    SELECT
        s.product_id,
        RANK() OVER (PARTITION BY p.category_id ORDER BY COUNT(s.sale_id) DESC) AS popularity_rank,
        COUNT(s.sale_id) AS sale_count
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    GROUP BY s.product_id, p.category_id
),

inventory_status AS (
    SELECT
        i.product_id,
        SUM(i.stock) AS total_stock,
        RANK() OVER (ORDER BY SUM(i.stock) DESC) AS stock_rank
    FROM inventory i
    GROUP BY i.product_id
),

high_demand_products AS (
    SELECT
        p.product_id,
        AVG(s.sale_price) AS avg_sale_price,
        COUNT(s.sale_id) AS total_sales
    FROM products p
    JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.product_id
    HAVING COUNT(s.sale_id) > (SELECT AVG(total_sales) FROM (SELECT COUNT(*) AS total_sales FROM sales GROUP BY product_id) subquery)
),

dynamic_pricing AS (
    SELECT
        p.product_id,
        p.base_price,
        CASE
            WHEN pop.popularity_rank <= 3 THEN 1.2
            WHEN pop.popularity_rank BETWEEN 4 AND 10 THEN 1.1
            ELSE 0.9
        END AS popularity_adjustment,
        rp.avg_price,
        COALESCE(1.0 - (pe.promotion_discount / 100), 1) AS promotion_discount,
        CASE
            WHEN inv.stock_rank <= 3 THEN 1.1
            WHEN inv.stock_rank BETWEEN 4 AND 10 THEN 1.05
            ELSE 1
        END AS stock_adjustment,
        CASE
            WHEN p.base_price > rp.avg_price THEN 1 + (p.base_price - rp.avg_price) / rp.avg_price
            ELSE 1 - (rp.avg_price - p.base_price) / rp.avg_price
        END AS demand_multiplier,
        hd.avg_sale_price,
        CASE
            WHEN p.product_name ILIKE '%cheap%' THEN 0.8
            ELSE 1.0
        END AS additional_discount
    FROM products p
    LEFT JOIN recent_prices rp ON p.product_id = rp.product_id
    LEFT JOIN promotion_effect pe ON p.product_id = pe.product_id
    JOIN popularity_score pop ON p.product_id = pop.product_id
    LEFT JOIN inventory_status inv ON p.product_id = inv.product_id
    LEFT JOIN high_demand_products hd ON p.product_id = hd.product_id
)

SELECT
    dp.product_id,
    dp.base_price * dp.popularity_adjustment * dp.promotion_discount * dp.stock_adjustment * dp.demand_multiplier * dp.additional_discount AS adjusted_price
FROM dynamic_pricing dp;
```

The benchmark test is to query dynamic_pricing for the current price of a single product at different scales. All measurements are taken via dbbench on an otherwise idle c5a.4xlarge EC2 instance:

```sql
SELECT * FROM dynamic_pricing WHERE product_id = $1
```

Configurations for Benchmark Testing

Aurora PostgreSQL Configuration

To set up Aurora PostgreSQL for benchmarking, we connected to a db.r6gd.xlarge read replica instance with read optimizations enabled. All queries in PostgreSQL were executed against this instance.

The primary keys for each table were set, along with secondary indexes based on the view's needs. Foreign keys were also established to provide the query optimizer with as much information as possible to enhance performance. You can view the code below:

```sql
ALTER TABLE products
ADD CONSTRAINT products_pkey PRIMARY KEY (product_id);

ALTER TABLE categories
ADD CONSTRAINT categories_pkey PRIMARY KEY (category_id);

ALTER TABLE suppliers
ADD CONSTRAINT suppliers_pkey PRIMARY KEY (supplier_id);

ALTER TABLE sales
ADD CONSTRAINT sales_pkey PRIMARY KEY (sale_id);

ALTER TABLE inventory
ADD CONSTRAINT inventory_pkey PRIMARY KEY (inventory_id);

ALTER TABLE promotions
ADD CONSTRAINT promotions_pkey PRIMARY KEY (promotion_id);

ALTER TABLE public.inventory
ADD CONSTRAINT inventory_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

ALTER TABLE public.promotions
ADD CONSTRAINT promotions_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

ALTER TABLE public.sales
ADD CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

CREATE INDEX idx_products_product_name ON products(product_name);
CREATE INDEX idx_sales_product_id ON sales(product_id);
CREATE INDEX idx_sales_sale_date ON sales(sale_date);
CREATE INDEX idx_sales_product_id_sale_date ON sales(product_id, sale_date);
CREATE INDEX idx_promotions_product_id ON promotions(product_id);
CREATE INDEX idx_promotions_active ON promotions(active);
CREATE INDEX idx_promotions_product_id_active ON promotions(product_id, active);
CREATE INDEX idx_inventory_product_id ON inventory(product_id);
```

Materialize Configuration

We configured a 50cc cluster on the Materialize side, which has approximately the same on-demand cost as the Aurora read replica. Materialize also connects directly to the primary writer instance to consume changes using the built-in Postgres source [2].
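As a rough sketch of that setup (the endpoint, credentials, and publication name below are placeholders, not our actual configuration; consult the Materialize Postgres source documentation for the full set of options), wiring Materialize to Aurora looks like:

```sql
-- Illustrative only; host, user, and publication are placeholders.
CREATE SECRET pg_password AS '<password>';

CREATE CONNECTION pg_connection TO POSTGRES (
    HOST '<aurora-writer-endpoint>',
    DATABASE 'postgres',
    USER 'materialize',
    PASSWORD SECRET pg_password
);

-- Consumes the logical-replication stream from the primary writer instance.
CREATE SOURCE pg_source
    FROM POSTGRES CONNECTION pg_connection (PUBLICATION 'mz_source')
    FOR ALL TABLES;
```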

The system maintains consistency by applying all changes within an upstream transaction atomically. Unlike other CDC tools, Materialize will never show partial writes. Within Materialize, the same view is defined and indexed directly.

Indexing a view in Materialize initiates the pre-computation of results and ensures they are kept up-to-date as the upstream database is modified. Queries executed against this view leverage the index to short-circuit any work that is already completed, significantly improving query performance. The code is as follows:

```sql
CREATE INDEX ON dynamic_pricing (product_id);
```

Overview of Test Scenarios

We designed three test scenarios to comprehensively evaluate the performance of Materialize versus read-optimized Aurora PostgreSQL read replicas. These scenarios simulate common real-world use cases and stress-test both systems under varying conditions.

Scenario 1: Single Database Connection with Continuous Writes

In this scenario, a single database connection runs one SELECT query at a time as fast as possible. Simultaneously, a continuous stream of writes (inserts, updates, and deletes) is performed on the database. This tests the ability of each system to maintain read performance under write-heavy conditions.

Scenario 2: Ten Database Connections with Continuous Writes

Here, ten database connections issue SELECT queries concurrently while a continuous stream of writes is performed on the database. This scenario tests each system’s overall robustness and concurrency handling under mixed read/write workloads.

Scenario 3: 100 Database Connections with Continuous Writes

In this final scenario, 100 database connections issue SELECT queries concurrently while the database undergoes a continuous stream of writes. This scenario assesses the robustness and scalability of each system under a read-heavy workload with simultaneous writes.
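dbbench drives the actual runs; as a rough illustration of what each scenario measures (the structure and names here are ours, not dbbench's), a minimal harness that scales the number of connections might look like:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_scenario(execute_query, connections, duration_s=1.0):
    """Each 'connection' issues queries back-to-back for duration_s seconds."""
    def worker():
        latencies = []
        deadline = time.perf_counter() + duration_s
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            execute_query()  # stand-in for: SELECT * FROM dynamic_pricing WHERE product_id = $1
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(worker) for _ in range(connections)]
        latencies = [lat for f in futures for lat in f.result()]

    tps = len(latencies) / duration_s
    p99 = sorted(latencies)[int(0.99 * (len(latencies) - 1))]
    return tps, p99

# Stub query: a 10ms sleep emulates a fixed per-query cost.
tps, p99 = run_scenario(lambda: time.sleep(0.01), connections=10, duration_s=0.5)
print(f"{tps:.0f} TPS, p99 {p99 * 1000:.1f} ms")
```

Scenarios 1, 2, and 3 correspond to `connections` values of 1, 10, and 100, with the write stream applied to the database out-of-band.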

Test Scenarios: Results for Materialize and Aurora PostgreSQL

Scenario 1

Aurora PostgreSQL

  • Performance: 80 transactions (0.131 TPS)
  • Mean Latency: 7.639s ± 13.38ms
  • P99 Latency: 7.757s
  • Max Latency: 7.892s
```text
4.294967296s - 8.589934592s [    80]: ██████████████████████████████████████████████████
```

Materialize

  • Performance: 46,254 transactions (77.088 TPS)
  • Mean Latency: 12.953ms ± 176.35µs
  • P99 Latency: 56.124ms
  • Max Latency: 268.435ms
```text
Transactions:
4.194304ms -   8.388608ms [ 28845]: ██████████████████████████████████████████████████
8.388608ms -  16.777216ms [  9252]: ████████████████
16.777216ms -  33.554432ms [  6623]: ███████████▍
33.554432ms -  67.108864ms [  1195]: ██
67.108864ms - 134.217728ms [   244]: ▍
```

Scenario 2

Aurora PostgreSQL

  • Performance: 277 transactions (0.451 TPS)
  • Mean Latency: 21.983s ± 225.77ms
  • P99 Latency: 24.516s
  • Max Latency: 25.076s
```text
Transactions:
8.589934592s - 17.179869184s [     3]: ▌
17.179869184s - 34.359738368s [   274]: ██████████████████████████████████████████████████
```

Materialize

  • Performance: 96,221 transactions (160.365 TPS)
  • Mean Latency: 29.35ms ± 114.129µs
  • P99 Latency: 644.438ms
  • Max Latency: 825.438ms
```text
Transactions:
4.194304ms -   8.388608ms [     5]: ▏
8.388608ms -  16.777216ms [  7445]: ██████▎
16.777216ms -  33.554432ms [ 78498]: ██████████████████████████████████████████████████
33.554432ms -  67.108864ms [  3777]: ███████████████████▍
67.108864ms - 134.217728ms [   662]: █▍
134.217728ms - 268.435456ms [    44]: ▏
268.435456ms - 536.870912ms [   687]: ▌
536.870912ms - 825.438664ms [  5103]: ████▎
```

Scenario 3

Aurora PostgreSQL

The Aurora read replica crashed under the load, and we were unable to complete the benchmark.

Materialize

  • Performance: 276,486 transactions (460.719 TPS)
  • Mean Latency: 216.996ms ± 583.073µs
  • P99 Latency: 713.052ms
  • Max Latency: 3.975s
```text
Transactions:
16.777216ms -  33.554432ms [    52]: ▏
33.554432ms -  67.108864ms [   949]: ▏
67.108864ms - 134.217728ms [ 20931]: █████
134.217728ms - 268.435456ms [207898]: ██████████████████████████████████████████████████
268.435456ms - 536.870912ms [ 36042]: ████████▋
536.870912ms - 1.073741824s [ 10397]: ██▌
1.073741824s - 2.147483648s [   216]: ▏
2.147483648s - 3.975392663s [     1]: ▏
```

Conclusion: Materialize Outperforms Aurora PostgreSQL

In modern applications, computationally intensive workloads are the norm. And our benchmarking demonstrated that for complex queries, Materialize significantly outperforms Aurora PostgreSQL read replicas. As an operational data platform, Materialize is ideal for offloading queries from OLTP systems.

Some of the most critical findings of our benchmarking tests include:

  • Higher throughput and lower latency: Materialize achieves 100x greater throughput and 1000x lower latency than Aurora PostgreSQL for complex queries over small data volumes.
  • Efficient resource utilization: Materialize’s CPU usage never exceeded 13%, compared to Aurora’s frequent CPU alerts and eventual crash under high load.
  • Consistent fresh data: Materialize maintains a lag of less than 1 second, ensuring near-real-time data accuracy without the risk of stale results. Additionally, Materialize has a feature in private preview called real-time recency that guarantees users can always read their writes across systems, making it ideal for use cases that can never tolerate stale results.

Stay tuned for more benchmarking tests from us! And feel free to post on our Twitter or LinkedIn to request benchmarking tests.

Appendix

[1] Table Definitions

```sql
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    base_price NUMERIC(10, 2) NOT NULL,
    category_id INTEGER NOT NULL,
    supplier_id INTEGER NOT NULL,
    available BOOLEAN NOT NULL
);

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(255) NOT NULL
);

CREATE TABLE suppliers (
    supplier_id SERIAL PRIMARY KEY,
    supplier_name VARCHAR(255) NOT NULL
);

CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    sale_price NUMERIC(10, 2) NOT NULL,
    sale_date TIMESTAMP NOT NULL,
    price NUMERIC(10, 2) NOT NULL
);

CREATE TABLE inventory (
    inventory_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    stock INTEGER NOT NULL,
    warehouse_id INTEGER NOT NULL,
    restock_date TIMESTAMP NOT NULL
);

CREATE TABLE promotions (
    promotion_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    promotion_discount NUMERIC(10, 2) NOT NULL,
    start_date TIMESTAMP NOT NULL,
    end_date TIMESTAMP NOT NULL,
    active BOOLEAN NOT NULL
);
```

[2] Materialize also offers a built-in MySQL source that offers comparable functionality for MySQL databases.

Seth Wiesman

Materialize

Seth Wiesman joined Materialize in 2021 and now serves as Field CTO, where he leads technical strategy and works closely with customers to address their most challenging operational data needs. As a long-time contributor and committer to Apache Flink, Seth brings over a decade of experience in data and streaming technologies, including work on some of the world's largest and most complex data projects.