Top-K in group
Overview
The “Top-K in group” query pattern groups by some key and return the first K elements within each group according to some ordering.
Materialize and window functions
For window functions, when an input record
in a partition (as determined by the PARTITION BY
clause of your window
function) is added/removed/changed, Materialize recomputes the results for the
entire window partition. This means that when a new batch of input data arrives
(that is, every second), the amount of computation performed is proportional
to the total size of the touched partitions.
For example, assume that in a given second, 20 input records change, and these records belong to 10 different partitions, where the average size of each partition is 100. Then, amount of work to perform is proportional to computing the window function results for 10*100=1000 rows.
As a rule of thumb, if the total size of all touched window partitions is at most 1000000 rows per second, then the system should be able to keep up with the input data as it arrives. However, if your use case has higher performance requirements, consider rewriting your query to not use window functions. If your query cannot be rewritten without the window functions and the performance of window functions is insufficient for your use case, please contact our team.
Idiomatic Materialize SQL
For K >= 1
Idiomatic Materialize SQL: For Top-K queries where K >= 1, use a subquery to SELECT DISTINCT on the grouping key and perform a LATERAL join (by the grouping key) with another subquery that specifies the ordering and the limit K.
Use a subquery to
SELECT DISTINCT on the grouping key (e.g.,
|
|
|
Query hints
To further improve the memory usage of the idiomatic Materialize SQL, you can
specify a LIMIT INPUT GROUP SIZE
query hint in the
idiomatic Materialize SQL.
SELECT fieldA, fieldB, ...
FROM (SELECT DISTINCT fieldA FROM tableA) grp,
LATERAL (SELECT fieldB, ... , fieldZ FROM tableA
WHERE fieldA = grp.fieldA
OPTIONS (LIMIT INPUT GROUP SIZE = ...)
ORDER BY fieldZ ... LIMIT K) -- K is a number >= 1
ORDER BY fieldA, fieldZ ... ;
For more information on setting LIMIT INPUT GROUP SIZE
, see
Optimization.
For K = 1
Idiomatic Materialize SQL: For K = 1, use a SELECT DISTINCT
ON() on the grouping key (e.g., fieldA
) and
order the results first by the DISTINCT ON
key and then the Top-K ordering
key (e.g., fieldA, fieldZ [ASC|DESC]
).
Alternatively, you can also use the more general Top-K where K >= 1 pattern, specifying 1 as the limit.
|
|
|
Query hints
To further improve the memory usage of the idiomatic Materialize SQL, you can
specify a DISTINCT ON INPUT GROUP SIZE
query hint
in the idiomatic Materialize SQL.
SELECT DISTINCT ON(fieldA) fieldA, fieldB, ...
FROM tableA
OPTIONS (DISTINCT ON INPUT GROUP SIZE = ...)
ORDER BY fieldA, fieldZ ... ;
For more information on setting DISTINCT ON INPUT GROUP SIZE
, see
Optimization.
Examples
Select Top-3 items
Using idiomatic Materialize SQL, the following example finds the top 3 items (by
descending subtotal) in each order. The example uses a subquery to SELECT
DISTINCT on the grouping key (order_id
), and
performs a LATERAL join (by the grouping
key) with another subquery that specifies the ordering (ORDER BY subtotal DESC
) and limits its results to 3 (LIMIT 3
).
|
|
|
Select Top-1 item
Using idiomatic Materialize SQL, the following example finds the top 1 item (by
descending subtotal) in each order. The example uses a query to SELECT DISTINCT
ON() on the grouping key (order_id
) with an
ORDER BY order_id, subtotal DESC
(i.e., ordering first by the DISTINCT ON
/grouping key, then the descending subtotal). 1
|
|
|
See also
-
Alternatively, you can also use the idiomatic Materialize SQL for the more general Top K query, specifying 1 as the limit. ↩︎