Lag over
Overview
The “lag over (order by )” query pattern accesses the field value of the previous row as determined by some ordering.
For “lag over (order by)” queries whose ordering can be represented by some equality condition (such as when ordering by a field that increases at a regular interval), Materialize provides an idiomatic SQL as an alternative to the window function.
Materialize and window functions
For window functions, when an input record
in a partition (as determined by the PARTITION BY
clause of your window
function) is added/removed/changed, Materialize recomputes the results for the
entire window partition. This means that when a new batch of input data arrives
(that is, every second), the amount of computation performed is proportional
to the total size of the touched partitions.
For example, assume that in a given second, 20 input records change, and these records belong to 10 different partitions, where the average size of each partition is 100. Then, amount of work to perform is proportional to computing the window function results for 10*100=1000 rows.
As a rule of thumb, if the total size of all touched window partitions is at most 1000000 rows per second, then the system should be able to keep up with the input data as it arrives. However, if your use case has higher performance requirements, consider rewriting your query to not use window functions. If your query cannot be rewritten without the window functions and the performance of window functions is insufficient for your use case, please contact our team.
Idiomatic Materialize SQL
Exclude the first row in results
Idiomatic Materialize SQL: To access the lag (previous row’s field value)
ordered by some field that increases in a regular pattern, use a self join
that specifies an equality condition on the order by field (e.g., WHERE t1.order_field = t2.order_field + 1
, WHERE t1.order_field = t2.order_field * 2
, etc.). The query excludes the first row since it does not have a previous
row.
Use a self join that specifies an equality match on the lag’s order by field
(e.g.,
! Important: The idiomatic Materialize SQL applies only to those “lag over” queries whose
ordering can be represented by some equality condition.
|
|
Avoid the use of
|
Include the first row in results
Idiomatic Materialize SQL: To access the lag (previous row’s field value)
ordered by some field that increases in a regular pattern, use a self
LEFT JOIN/LEFT OUTER JOIN
that specifies
an equality condition on the order by field (e.g., ON t1.order_field = t2.order_field + 1
, ON t1.order_field = t2.order_field * 2
, etc.). The LEFT JOIN/LEFT OUTER JOIN
query includes the first row, returning null
as its
lag value.
Use a self
! Important: The idiomatic Materialize SQL applies only to those “lag over” queries whose
ordering can be represented by some equality condition.
|
|
Avoid the use of
|
Examples
Find previous row’s value (exclude the first row in results)
Using idiomatic Materialize SQL, the following example finds the previous day’s
order total. That is, the example uses a self join on orders_daily_totals
. The
row ordering on the order_date
field is represented by an equality
condition using an interval of 1 DAY
. The
query excludes the first row in the results since the first row does not have a
previous row.
! Important: The idiomatic Materialize SQL applies only to those “lag over” queries whose
ordering can be represented by some equality condition.
|
|
|
Find previous row’s value (include the first row in results)
Using idiomatic Materialize SQL, the following example finds the previous day’s
order total. The example uses a self LEFT JOIN/LEFT OUTER JOIN
on orders_daily_totals
. The
row ordering on the order_date
field is represented by an equality
condition using an interval of 1 DAY
. The
query includes the first row in the results, using null
as the previous value.
! Important: The idiomatic Materialize SQL applies only to those “lag over” queries whose
ordering can be represented by some equality condition.
|
|
|