Saturday, March 20, 2010

I love windows functions

... or whatever they call those functions that allow ranking and anyway working on "windows" of data (or sets inside sets or ...).

A couple of questions on Devshed forums I recently answered using those functions.

First one is about recordset paging (well, there are hundred of ways to do recordset paging, just google, but ayway) LIMIT being one of them, stored procedures another one (and ugly IMHO).

Code:
select
bb.code
from
(
select
aa.code,
rank() over( partition by 1 order by aa.code) row_num,
trunc(((rank() over( partition by 1 order by aa.code))-1)/10)+1 pages
from
paging_test aa
) bb
where
bb.pages = 2
where 10 is the number of rows per page and 2 is the page you want to retrieve.

In this case the rank() is over the entire set (note the partition clause) basically turning it into a row number.

The other one about finding two (top, random, whatever) elements inside sets (i.e. two employees per department).
In this case the OP has already got the fact that LIMIT is not enough for solving the problem cleanly, so I gave him a little example:

Code:
select
emp_rk.emp_name,
emp_rk.emp_dept
from
(
select
rank() over (partition by emp_dept order by emp_name) rk,
emp_name,
emp_dept
from
employee
) emp_rk
where
emp_rk.rk <>
Of course the trick is to partition by the set definer and order by whatever makes sense to you, even random ordering can be the solution.

Unfortunately only PostgreSQL currently supports those functions, Firebird should when v3.0 will be out, MySQL doesn't seem too interested, they already gave us the LIMIT clause.


EDIT: another one here.
This time is about accessing the ever elusive "previous row", the solution is using the lag() function, note that a lead() function exists and that both can be defined over the entire ordered table or over sets defined by user through the partition by clause (as the functions described above).

Code:
select
tts.temp_date_time,
tts.temp_value,
lag(tts.temp_value) over (partition by 1 order by tts.temp_date_time),
tts.temp_value - coalesce(
lag(tts.temp_value) over (partition by 1 order by tts.temp_date_time), 0
) delta_previous
from
test_temp_seq tts;
Partition by 1 actually means no partition at all, I'm partitioning by a constant ;)