Window function is actually quite similar like groupBy
, but I think it’s more powerful and flexible. It’s purpose to group a set of data and apply aggregate operations, such as AVG
, MAX
, etc.
Based on the results, the difference of window functions operation is evaluated in each row of data, while groupBy evaluated become one row of results.
Group By:
λ> selecta.id as "ID",a.code as "Code",count(a.id) as "Count"from order_items oiinner join assets a on a.id = oi."assetId"group by a.id+-------------------------------+| ID | Code | Count |+-------------------------------+| 15 | 1231212 | 2 || 19 | 12341928 | 2 |+-----+-------------------------+
Group By Asset Id
Window Functions
λ> selecta.id as "ID",a.code as "Code",count(a.id) OVER(partition by oi."assetId") as "Count"from order_items oiinner join assets a on a.id = oi."assetId"+-------------------------------+| ID | Code | Count |+-------------------------------+| 15 | 1231212 | 2 || 15 | 1231212 | 2 || 19 | 12341928 | 2 || 19 | 12341928 | 2 |+-----+-------------------------+
Inside window functions, as you can see the operations is performed without the GROUP BY
clause. But, the interesting parts is in this section:
count(a.id) OVER(partition by oi."assetId") as "Count"
Agregate function
It’s called window functions, because of the aggregate functions is operated with the OVER
clause.
- The pattern look like this:
<function_name>(<argument(s)>) OVER(PARTITION BY <column(s)> ORDER BY <column(s)>) <alias>
Over Clause
Over clause, defines the size of window you want to create. Example:
OVER()
if the parameter’s empty, then the partition is created based on the calculations of the whole data. If referring to the case above:
λ> select...count(a.id) OVER() as "Count"...+-------------------------------+| ID | Code | Count |+-------------------------------+| 15 | 1231212 | 4 || 15 | 1231212 | 4 || 19 | 12341928 | 4 || 19 | 12341928 | 4 |+-----+-------------------------+
- Create window partitions, based on grouping columns:
count(a.id) OVER(partition by oi."assetId")
So, in the case above, it means count based on the existing Asset Id , within Asset ID 15 with a total of 2, and Asset ID 19
- Create window + sorting partitions:
λ> select...oi.price as "Price",oi.id as "Item ID",...DENSE_RANK() OVER(partition by oi."assetId" order by oi.price desc) as "Rank"...+---------------------------------+------------+---------------+| ID | Code | Price | Item ID | Rank+---------------------------------+------------+---------------+| 15 | 1231212 | 200000 | 303 | 1 || 15 | 1231212 | 100000 | 302 | 2 || 19 | 12341928 | 813984 | 301 | 1 || 19 | 12341928 | 813984 | 304 | 1 |+-----+---------------------------+------------+---------------+
DENSE_RANK()
is used to view rank by largest price in order
Why Window Functions
So far I think, GROUP BY
is less flexible for query operations that require data from other tables:
λ> selectoi.price, -- non agregatesum(oi.price) -- agregate,......from order_items oiinner join assets a on a.id = oi."assetId"inner join asset_events ae on ae."assetId" = a.id...group by 1, ... ? ..?---- Error:-- column "oi.price" must appear in the GROUP BY clause-- or be used in an aggregate function
The problem’s, I have to define group by value above per each non aggregate data that will be shown, besides that the result might not be valid. Maybe there’s another way and approach with group by, but it seems more complex.
Well, that’s why window function can solve the complexity problem above. Just by adding:
λ> selecta.id as "ID",a.code as "Code",oi.price,count(a.id) OVER(partition by oi."assetId") as "Count",sum(oi.price) OVER(partition by oi."assetId") as "Total"from order_items oiinner join assets a on a.id = oi."assetId"
TIPS: If the problem statement does not require combining both aggregated and non-aggregated columns in a single query, then your best approach is to use a GROUP BY
clause instead.
CTE + Window Functions + Sub Query
In this case, get last asset events of each asset booked by the customer
with latest_asset_events as (select *,DENSE_RANK() OVER(PARTITION BY ae."assetId" order by ae."createdAt" desc) AS rankfrom asset_events ae) select * from latest_asset_events laewhere lae."assetId" in (select a.idfrom orders oinner join order_items oi on oi."orderId" = o.idinner join assets a on a.id = oi."assetId")and rank=1
rank=1
is used to get first rank from latest asset event dateorder by ae."createdAt" desc
Additional reading:
- SQL Window Functions performance https://selectfrom.dev/sql-performance-of-window-functions-vs-subqueries-on-big-data-3fa533fc64bf