Last value in group

Overview

The “last value in each group” query pattern returns the last value, according to some ordering, in each group.

Materialize and window functions

For window functions, when an input record in a partition (as determined by the PARTITION BY clause of your window function) is added/removed/changed, Materialize recomputes the results for the entire window partition. This means that when a new batch of input data arrives (that is, every second), the amount of computation performed is proportional to the total size of the touched partitions.

For example, assume that in a given second, 20 input records change, and these records belong to 10 different partitions, where the average size of each partition is 100. Then, amount of work to perform is proportional to computing the window function results for 10*100=1000 rows.

As a rule of thumb, if the total size of all touched window partitions is at most 1000000 rows per second, then the system should be able to keep up with the input data as it arrives. However, if your use case has higher performance requirements, consider rewriting your query to not use window functions. If your query cannot be rewritten without the window functions and the performance of window functions is insufficient for your use case, please contact our team.

Idiomatic Materialize SQL

Idiomatic Materialize SQL: To find the last value in each group, use the MIN() or MAX() aggregate function in a subquery.

Idiomatic Materialize SQL

Use a subquery that uses the MIN() or MAX() aggregate function.


SELECT tableA.fieldA, tableA.fieldB, minmax.Z
 FROM tableA,
 (SELECT fieldA,
    MAX(fieldZ),
    MIN(fieldZ)
 FROM tableA
 GROUP BY fieldA) minmax
WHERE tableA.fieldA = minmax.fieldA
ORDER BY fieldA ... ;
Anti-pattern

Do not use LAST_VALUE() OVER (PARTITION BY ... ORDER BY ... RANGE ...) window function for last value in each group queries.

NOTE: Materialize does not support RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

-- Unsupported --
SELECT fieldA, fieldB,
  LAST_VALUE(fieldZ)
    OVER (PARTITION BY fieldA ORDER BY fieldZ
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING),
  LAST_VALUE(fieldZ)
    OVER (PARTITION BY fieldA ORDER BY fieldZ DESC
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING)
FROM tableA
ORDER BY fieldA, ...;

Query hints

To further improve the memory usage of the idiomatic Materialize SQL, you can specify a AGGREGATE INPUT GROUP SIZE query hint in the idiomatic Materialize SQL.

SELECT tableA.fieldA, tableA.fieldB, minmax.Z
 FROM tableA,
 (SELECT fieldA,
    MAX(fieldZ),
    MIN(fieldZ)
 FROM tableA
 OPTIONS (AGGREGATE INPUT GROUP SIZE = ...)
 GROUP BY fieldA) minmax
WHERE tableA.fieldA = minmax.fieldA
ORDER BY fieldA ... ;

For more information on setting AGGREGATE INPUT GROUP SIZE, see Optimization.

Examples

NOTE: The example data can be found in the Appendix.

Use MAX() to find the last value

Using idiomatic Materialize SQL, the following example finds the highest item price in each order and calculates the difference between the price of each item in the order and the highest price. The example uses a subquery that groups by the order_id and selects MAX(price) to find the highest price (i.e., the last price if ordered by ascending price values):

Idiomatic Materialize SQL
SELECT o.order_id, minmax.highest_price, o.item, o.price,
  o.price - minmax.highest_price AS diff_highest_price
FROM orders_view o,
     (SELECT order_id,
        MAX(price) AS highest_price
     FROM orders_view
     GROUP BY order_id) minmax
WHERE o.order_id = minmax.order_id
ORDER BY o.order_id, o.item;
Anti-pattern

Do not use of LAST_VALUE() OVER (PARTITION BY ... ORDER BY ... RANGE ...) for last value in each group queries.

NOTE: Materialize does not support RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
-- Unsupported --
SELECT order_id,
  LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS highest_price,
  item,
  price,
  price - LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price
           RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS diff_highest_price
FROM orders_view
ORDER BY order_id, item;

Use MIN() to find the last values

Using idiomatic Materialize SQL, the following example finds the lowest item price in each order and calculates the difference between the price of each item in the order and the lowest price. That is, use a subquery that groups by the order_id and selects MIN(price) as the lowest price (i.e., last price if ordered by descending price value)

Idiomatic Materialize SQL
SELECT o.order_id, minmax.lowest_price, o.item, o.price,
  o.price - minmax.lowest_price AS diff_lowest_price
FROM orders_view o,
     (SELECT order_id,
        MIN(price) AS lowest_price
     FROM orders_view
     GROUP BY order_id) minmax
WHERE o.order_id = minmax.order_id
ORDER BY o.order_id, o.item;
Anti-pattern

Do not use LAST_VALUE() OVER (PARTITION BY ... ORDER BY ... RANGE ... ) for last value in each group queries.

NOTE: Materialize does not support RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
-- Unsupported --
SELECT order_id,
  LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price DESC
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS lowest_price,
  item,
  price,
  price - LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price DESC
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS diff_lowest_price
FROM orders_view
ORDER BY order_id, item;

Use MIN() and MAX() to find the last values

Using idiomatic Materialize SQL, the following example finds the lowest and highest item price in each order and calculate the difference for each item in the order from these prices. That is, use a subquery that groups by the order_id and selects MIN(price) as the lowest price (i.e., last value if ordered by descending price values) and MAX(price) as the highest price (i.e., last value if ordered by ascending price values).

Idiomatic Materialize SQL
SELECT o.order_id, minmax.lowest_price, minmax.highest_price, o.item, o.price,
  o.price - minmax.lowest_price AS diff_lowest_price,
  o.price - minmax.highest_price AS diff_highest_price
FROM orders_view o,
      (SELECT order_id,
         MIN(price) AS lowest_price,
         MAX(price) AS highest_price
      FROM orders_view
      GROUP BY order_id) minmax
WHERE o.order_id = minmax.order_id
ORDER BY o.order_id, o.item;
Anti-pattern

Do not use LAST_VALUE() OVER (PARTITION BY ... ORDER BY ) for last value within groups queries.

NOTE: Materialize does not support RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
-- Unsupported --
SELECT order_id,
  LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price DESC
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS lowest_price,
  LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS highest_price,
  item,
  price,
  price - LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price DESC
          RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS diff_lowest_price,
  price - LAST_VALUE(price)
    OVER (PARTITION BY order_id ORDER BY price
           RANGE BETWEEN
            UNBOUNDED PRECEDING AND
            UNBOUNDED FOLLOWING) AS diff_highest_price
FROM orders_view
ORDER BY order_id, item;

See also

Back to top ↑