Top-K in group
View as MarkdownOverview
The “Top-K in group” query pattern groups by some key and return the first K elements within each group according to some ordering.
Materialize and window functions
For indexed views and materialized views that contain window
functions (including aggregate functions used
with an OVER clause), when an input record in a partition is
added/removed/changed, Materialize recomputes the results from scratch for
that partition (instead of using incremental computation).
The PARTITION BY clause of your window function determines your partitions. If
PARTITION BY is omitted, all records belong to a single partition (i.e., any
record change results in a recomputation from scratch over the whole input).
To avoid performance issues that may arise as the number of records grows, consider rewriting your indexed views and materialized views to use idiomatic Materialize SQL instead of window functions. If your view definitions cannot be rewritten without the window functions and the performance of window functions is insufficient for your use case, please contact our team.
Idiomatic Materialize SQL
For K >= 1
Idiomatic Materialize SQL: For Top-K queries where K >= 1, use a subquery to SELECT DISTINCT on the grouping key and perform a LATERAL join (by the grouping key) with another subquery that specifies the ordering and the limit K.
|
Use a subquery to
SELECT DISTINCT on the grouping key (e.g.,
|
|
|
|
Query hints
To further improve the memory usage of the idiomatic Materialize SQL, you can
specify a LIMIT INPUT GROUP SIZE query hint in the
idiomatic Materialize SQL.
SELECT fieldA, fieldB, ...
FROM (SELECT DISTINCT fieldA FROM tableA) grp,
LATERAL (SELECT fieldB, ... , fieldZ FROM tableA
WHERE fieldA = grp.fieldA
OPTIONS (LIMIT INPUT GROUP SIZE = ...)
ORDER BY fieldZ ... LIMIT K) -- K is a number >= 1
ORDER BY fieldA, fieldZ ... ;
For more information on setting LIMIT INPUT GROUP SIZE, see
Optimization.
For K = 1
Idiomatic Materialize SQL: For K = 1, use a SELECT DISTINCT
ON() on the grouping key (e.g., fieldA) and
order the results first by the DISTINCT ON key and then the Top-K ordering
key (e.g., fieldA, fieldZ [ASC|DESC]).
Alternatively, you can also use the more general Top-K where K >= 1 pattern, specifying 1 as the limit.
|
|
|
|
Query hints
To further improve the memory usage of the idiomatic Materialize SQL, you can
specify a DISTINCT ON INPUT GROUP SIZE query hint
in the idiomatic Materialize SQL.
SELECT DISTINCT ON(fieldA) fieldA, fieldB, ...
FROM tableA
OPTIONS (DISTINCT ON INPUT GROUP SIZE = ...)
ORDER BY fieldA, fieldZ ... ;
For more information on setting DISTINCT ON INPUT GROUP SIZE, see
EXPLAIN ANALYZE HINTS.
Examples
Select Top-3 items
Using idiomatic Materialize SQL, the following example finds the top 3 items (by
descending subtotal) in each order. The example uses a subquery to SELECT
DISTINCT on the grouping key (order_id), and
performs a LATERAL join (by the grouping
key) with another subquery that specifies the ordering (ORDER BY subtotal DESC) and limits its results to 3 (LIMIT 3).
|
|
|
|
Select Top-1 item
Using idiomatic Materialize SQL, the following example finds the top 1 item (by
descending subtotal) in each order. The example uses a query to SELECT DISTINCT
ON() on the grouping key (order_id) with an
ORDER BY order_id, subtotal DESC (i.e., ordering first by the DISTINCT ON/grouping key, then the descending subtotal). 1
|
|
|
|
See also
-
Alternatively, you can also use the idiomatic Materialize SQL for the more general Top K query, specifying 1 as the limit. ↩︎