SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
Windowing Functions
          Hitoshi Harada
     (umi.tanuki@gmail.com)
    Head of Engineering Dept.
           FORCIA, Inc.
           David Fetter
  (david.fetter@pgexperts.com)
         PGCon2009, 21-22 May 2009 Ottawa
What are Windowing Functions?




        PGCon2009, 21-22 May 2009 Ottawa
What are the Windowing Functions?

• Similar to classical aggregates but does more!
• Provides access to set of rows from the current row
• Introduced SQL:2003 and more detail in SQL:2008
   – SQL:2008 (http://wiscorp.com/sql200n.zip) 02: SQL/Foundation
      • 4.14.9 Window tables
      • 4.15.3 Window functions
      • 6.10 <window function>
      • 7.8 <window clause>
• Supported by Oracle, SQL Server, Sybase and DB2
   – No open source RDBMS so far except PostgreSQL (Firebird trying)
• Used in OLAP mainly but also useful in OLTP
   – Analysis and reporting by rankings, cumulative aggregates




                        PGCon2009, 21-22 May 2009 Ottawa
How they work and what you get




        PGCon2009, 21-22 May 2009 Ottawa
How they work and what do you get




• Windowed table
  – Operates on a windowed table
  – Returns a value for each row
  – Returned value is calculated from the rows in the window




                         PGCon2009, 21-22 May 2009 Ottawa
How they work and what do you get




     • You can use…
        – New window functions
        – Existing aggregate functions
        – User-defined window functions
        – User-defined aggregate functions




            PGCon2009, 21-22 May 2009 Ottawa
How they work and what do you get




• Completely new concept!
  – With Windowing Functions, you can reach outside the current row




                      PGCon2009, 21-22 May 2009 Ottawa
How they work and what you get
[Aggregates]   SELECT key, SUM(val) FROM tbl GROUP BY key;




                    PGCon2009, 21-22 May 2009 Ottawa
How they work and what you get
[Windowing Functions]   SELECT key, SUM(val) OVER (PARTITION BY key) FROM tbl;




                             PGCon2009, 21-22 May 2009 Ottawa
How they work and what you get
Who is the highest paid relatively compared with the department average?


        depname             empno                    salary
        develop             10                       5200
        sales               1                        5000
        personnel           5                        3500
        sales               4                        4800
        sales               6                        550
        personnel           2                        3900
        develop             7                        4200
        develop             9                        4500
        sales               3                        4800
        develop             8                        6000
        develop             11                       5200

                        PGCon2009, 21-22 May 2009 Ottawa
How they work and what you get
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname)::int,

        salary - avg(salary) OVER (PARTITION BY depname)::int AS diff
FROM empsalary ORDER BY diff DESC


          depname      empno         salary         avg        diff
          develop      8             6000           5020       980
          sales        6             5500           5025       475
          personnel    2             3900           3700       200
          develop      11            5200           5020       180
          develop      10            5200           5020       180
          sales        1             5000           5025       -25
          personnel    5             3500           3700       -200
          sales        4             4800           5025       -225
          sales        3             4800           5025       -225
          develop      9             4500           5020       -520
          develop      7             4200           5020       -820


                            PGCon2009, 21-22 May 2009 Ottawa
Anatomy of a Window
• Represents set of rows abstractly as:
• A partition
  – Specified by PARTITION BY clause
  – Never moves
  – Can contain:
• A frame
  – Specified by ORDER BY clause and frame clause
  – Defined in a partition
  – Moves within a partition
  – Never goes across two partitions
• Some functions take values from a partition.
  Others take them from a frame.
                    PGCon2009, 21-22 May 2009 Ottawa
A partition
• Specified by PARTITION BY clause in OVER()
• Allows to subdivide the table, much like
  GROUP BY clause
• Without a PARTITION BY clause, the whole
  table is in a single partition

func
(args)
             OVER   (                                                 )
                        partition-clause    order-
                                            clause
                                                             frame-
                                                             clause


         PARTITION BY expr, …
                          PGCon2009, 21-22 May 2009 Ottawa
A frame
• Specified by ORDER BY clause and frame
  clause in OVER()
• Allows to tell how far the set is applied
• Also defines ordering of the set
• Without order and frame clauses, the whole of
  partition is a single frame
func
(args)
            OVER   (                                          )
                       partition-clause   order-
                                          clause
                                                   frame-
                                                   clause

ORDER BY expr [ASC|DESC]             (ROWS|RANGE) BETWEEN UNBOUNDED
[NULLS FIRST|LAST], …                (PRECEDING|FOLLOWING) AND CURRENT
                                     ROW
                         PGCon2009, 21-22 May 2009 Ottawa
The WINDOW clause
• You can specify common window definitions
  in one place and name it, for convenience
• You can use the window at the same query
  level
SELECT target_list, … wfunc() OVER w
FROM table_list, …
WHERE qual_list, ….
GROUP BY groupkey_list, ….
HAVING groupqual_list, …
WINDOW w      AS   (                                               )
                       partition-clause     order-
                                            clause
                                                          frame-
                                                          clause




                       PGCon2009, 21-22 May 2009 Ottawa
Built-in Windowing Functions




         PGCon2009, 21-22 May 2009 Ottawa
List of built-in Windowing Functions

•   row_number()                 •   lag()
•   rank()                       •   lead()
•   dense_rank()                 •   first_value()
•   percent_rank()               •   last_value()
•   cume_dist()                  •   nth_value()
•   ntile()


              PGCon2009, 21-22 May 2009 Ottawa
row_number()
     • Returns number of the current row

   SELECT val, row_number() OVER (ORDER BY val DESC) FROM tbl;

     val                             row_number()
     5                               1
     5                               2
     3                               3
     1                               4
Note: row_number() always incremented values independent of frame



                    PGCon2009, 21-22 May 2009 Ottawa
rank()
• Returns rank of the current row with gap

    SELECT val, rank() OVER (ORDER BY val DESC) FROM tbl;


    val                             rank()
    5                               1
                                                                   gap
    5                               1
    3                               3
    1                               4
    Note: rank() OVER(*empty*) returns 1 for all rows, since all rows
    are peers to each other


                       PGCon2009, 21-22 May 2009 Ottawa
dense_rank()
• Returns rank of the current row without gap

       SELECT val, dense_rank() OVER (ORDER BY val DESC) FROM tbl;


       val                             dense_rank()
       5                               1
                                                                     no gap
       5                               1
       3                               2
       1                               3
Note: dense_rank() OVER(*empty*) returns 1 for all rows, since all rows
are peers to each other

                          PGCon2009, 21-22 May 2009 Ottawa
percent_rank()
• Returns relative rank; (rank() – 1) / (total row – 1)
   SELECT val, percent_rank() OVER (ORDER BY val DESC) FROM tbl;

       val                             percent_rank()
       5                               0
       5                               0
       3                               0.666666666666667
       1                               1

      values are always between 0 and 1 inclusive.

Note: percent_rank() OVER(*empty*) returns 0 for all rows, since all rows
are peers to each other



                         PGCon2009, 21-22 May 2009 Ottawa
cume_dist()
• Returns relative rank; (# of preced. or peers) / (total row)

 SELECT val, cume_dist() OVER (ORDER BY val DESC) FROM tbl;


   val                             cume_dist()
   5                               0.5                          =2/4

   5                               0.5                          =2/4

   3                               0.75                         =3/4

   1                               1                            =4/4
       The result can be emulated by
       “count(*) OVER (ORDER BY val DESC) / count(*) OVER ()”
       Note: cume_dist() OVER(*empty*) returns 1 for all rows, since all rows
       are peers to each other
                          PGCon2009, 21-22 May 2009 Ottawa
ntile()
• Returns dividing bucket number

   SELECT val, ntile(3) OVER (ORDER BY val DESC) FROM tbl;


   val                               ntile(3)
   5                                 1                                    4%3=1

   5                                 1
   3                                 2
   1                                 3
    The results are the divided positions, but if there’s remainder add
    row from the head
    Note: ntile() OVER (*empty*) returns same values as above, since
    ntile() doesn’t care the frame but works against the partition
                        PGCon2009, 21-22 May 2009 Ottawa
lag()
      • Returns value of row above

SELECT val, lag(val) OVER (ORDER BY val DESC) FROM tbl;


val                            lag(val)
5                              NULL
5                              5
3                              5
1                              3
            Note: lag() only acts on a partition.



                  PGCon2009, 21-22 May 2009 Ottawa
lead()
• Returns value of the row below

SELECT val, lead(val) OVER (ORDER BY val DESC) FROM tbl;


val                            lead(val)
5                              5
5                              3
3                              1
1                              NULL

         Note: lead() acts against a partition.


                  PGCon2009, 21-22 May 2009 Ottawa
first_value()
• Returns the first value of the frame

    SELECT val, first_value(val) OVER (ORDER BY val DESC) FROM tbl;


    val                            first_value(val)
    5                              5
    5                              5
    3                              5
    1                              5



                      PGCon2009, 21-22 May 2009 Ottawa
last_value()
• Returns the last value of the frame
    SELECT val, last_value(val) OVER
     (ORDER BY val DESC ROWS BETWEEN UNBOUNDED PRECEEDING
    AND UNBOUNDED FOLLOWING) FROM tbl;

    val                            last_value(val)
    5                              1
    5                              1
    3                              1
    1                              1
    Note: frame clause is necessary since you have a frame between
    the first row and the current row by only the order clause


                      PGCon2009, 21-22 May 2009 Ottawa
nth_value()
• Returns the n-th value of the frame
    SELECT val, nth_value(val, val) OVER
     (ORDER BY val DESC ROWS BETWEEN UNBOUNDED PRECEEDING
    AND UNBOUNDED FOLLOWING) FROM tbl;

    val                               nth_value(val, val)
    5                                 NULL
    5                                 NULL
    3                                 3
    1                                 5
   Note: frame clause is necessary since you have a frame between
   the first row and the current row by only the order clause
                      PGCon2009, 21-22 May 2009 Ottawa
aggregates(all peers)
• Returns the same values along the frame

    SELECT val, sum(val) OVER () FROM tbl;


    val                            sum(val)
    5                              14
    5                              14
    3                              14
    1                              14
   Note: all rows are the peers to each other



                      PGCon2009, 21-22 May 2009 Ottawa
cumulative aggregates
• Returns different values along the frame

    SELECT val, sum(val) OVER (ORDER BY val DESC) FROM tbl;


    val                            sum(val)
    5                              10
    5                              10
    3                              13
    1                              14
    Note: row#1 and row#2 return the same value since they are the peers.
    the result of row#3 is sum(val of row#1…#3)


                      PGCon2009, 21-22 May 2009 Ottawa
Inside the Implementation




      PGCon2009, 21-22 May 2009 Ottawa
Where are they?
• Windowing Functions are identified by a flag
  in pg_proc, which means they are very similar
  to plain functions
  – pg_proc.proiswindow : boolean
• All the aggregates including user-defined
  ones in any language can be called in
  WindowAgg, too
  – Your old code gets new behaviour!
• src/include/catalog/pg_proc.h
  – pg_proc catalog definitions
• src/backend/utils/adt/windowfuncs.c
  – implementations of built-in window functions
                   PGCon2009, 21-22 May 2009 Ottawa
Added relevant nodes
• WindowFunc
  – primitive node for function itself “wfunc(avg1, …)”
• WindowDef
  – parse node for window definition “over (partition by …
    order by …)”
• WindowClause
  – parse node for window clause “window (partition by …
    order by …) as w”
• WindowAgg
  – plan node for window aggregate
• WindowAggState
  – executor node for window aggregate

                    PGCon2009, 21-22 May 2009 Ottawa
Hacking the planner
      SELECT depname, empno, salary,
      
      avg(salary) over (partition by depname) AS a,
      
      rank() over (partition by depname order by salary desc) AS r
      FROM empsalary ORDER BY r

Sort (cost=215.75..218.35 rows=1040 width=44)
 Output: depname, empno, salary, (avg(salary) OVER (?)), (rank() OVER (?))
 Sort Key: (rank() OVER (?))
 -> WindowAgg (cost=142.83..163.63 rows=1040 width=44)
     Output: depname, empno, salary, (avg(salary) OVER (?)), rank() OVER (?)
     -> Sort (cost=142.83..145.43 rows=1040 width=44)
         Output: depname, empno, salary, (avg(salary) OVER (?))
         Sort Key: depname, salary
         -> WindowAgg (cost=72.52..90.72 rows=1040 width=44)
              Output: depname, empno, salary, avg(salary) OVER (?)
              -> Sort (cost=72.52..75.12 rows=1040 width=44)
                 Output: depname, empno, salary
                 Sort Key: depname
                 -> Seq Scan on empsalary (cost=0.00..20.40 rows=1040 width=44)
                     Output: depname, empno, salary



                              PGCon2009, 21-22 May 2009 Ottawa
Hacking the planner
                                                 Final output
                             TargetEntry1:depname
                             TargetEntry2:empno
                             TargetEntry3:salary
                             TargetEntry4:avg
                             TargetEntry5:rank




   PGCon2009, 21-22 May 2009 Ottawa
Hacking the planner
                                                 Final output
                             TargetEntry1:depname
                             TargetEntry2:empno
                             TargetEntry3:salary
                             TargetEntry4:avg
                             TargetEntry5:rank




                        SeqScan
                              Var1:depname
                              Var2:empno
                              Var3:salary

   PGCon2009, 21-22 May 2009 Ottawa
Hacking the planner
                                                   Final output
                               TargetEntry1:depname
                               TargetEntry2:empno
                               TargetEntry3:salary
                               TargetEntry4:avg
                               TargetEntry5:rank




   WindowAgg1
      Var1:depname
      Var2:empno
      Var3:salary
      WindowFunc1:avg

                    Sort1
                            SeqScan
                                Var1:depname
                                Var2:empno
                                Var3:salary

   PGCon2009, 21-22 May 2009 Ottawa
Hacking the planner
                                                                               Final output
Sort3
        WindowAgg2                                         TargetEntry1:depname

          Var1:depname                                     TargetEntry2:empno

          Var2:empno                                       TargetEntry3:salary

          Var3:salary                                      TargetEntry4:avg

          Var4:avg                                         TargetEntry5:rank

          WindowFunc:rank

                         Sort2
                             WindowAgg1
                                 Var1:depname
                                 Var2:empno
                                 Var3:salary
                                 WindowFunc1:avg

                                               Sort1
                                                       SeqScan
                                                           Var1:depname
                                                           Var2:empno
                                                           Var3:salary

                             PGCon2009, 21-22 May 2009 Ottawa
Hacking the planner
• We could optimize it by relocating WindowAgg
   SELECT depname, empno, salary,
   
       rank() over (partition by depname order by salary desc) AS r,
   
        avg(salary) over (partition by depname) AS a
     FROM empsalary ORDER BY r

 Sort (cost=161.03..163.63 rows=1040 width=44)
  Output: depname, empno, salary, (rank() OVER (?)), (avg(salary) OVER (?))
  Sort Key: (rank() OVER (?))
  -> WindowAgg (cost=72.52..108.92 rows=1040 width=44)
      Output: depname, empno, salary, (rank() OVER (?)), avg(salary) OVER (?) No   Sort!
      -> WindowAgg (cost=72.52..93.32 rows=1040 width=44)
          Output: depname, empno, salary, rank() OVER (?)
          -> Sort (cost=72.52..75.12 rows=1040 width=44)
               Output: depname, empno, salary
               Sort Key: depname, salary
               -> Seq Scan on empsalary (cost=0.00..20.40 rows=1040 width=44)
                   Output: depname, empno, salary


                           PGCon2009, 21-22 May 2009 Ottawa
How the executor creates a window
Normal Scan




              row#1

              row#2
                                                         Destination
              row#3
 Table
              row#4

               Values are never shared among
               returned rows




                      PGCon2009, 21-22 May 2009 Ottawa
How the executor creates a window
WindowAgg




         row#1                             row#1

         row#2        Window               row#2
                       Object
         row#3        powered              row#3
                        by                           Destination
         row#4       Tuplestore            row#4
 Table
            Buffering rows in the Window
            Object, so the functions can
            fetch values from another row.



            This kills performance if the
            Window Object holds millions of
            rows.
                  PGCon2009, 21-22 May 2009 Ottawa
How the executor creates a window
WindowAgg                       winobj->markpos


         row#1                             row#1
         row#2                             row#2

         row#3                             row#3
                      Window
         row#4         Object              row#4
                      powered                        Destination
         row#5          by                 row#5
 Table               Tuplestore
         row#6                             row#6




            If the rest of rows don’t need
            buffered rows any more, trim the
            Tuplestore from its head to “mark
            pos”

                  PGCon2009, 21-22 May 2009 Ottawa
How the Windowing Functions work
• Functions workflow is very similar to plain functions
• Each function has its own seek position

WindowAgg calls the wfunc1()



wfunc1(fcinfo){                                           mark pos
  WindowObject winobj = PG_WINDOW_OBJECT();
                                                                        Window
    /* do something using winobj */                       fetch value   Object
}
                                                          fetch value




Return a value


                           PGCon2009, 21-22 May 2009 Ottawa
How the Windowing Functions work
                     • What about aggregates?

                           aggregate
                                                 Initialize aggregate
                          trans value



                  Final    aggregate          Trans
result1                                       func
                  func    trans value          Trans
                                               func
                                                           Window
                                                           Object
                  Final    aggregate          Trans
result2
                  func    trans value         func


          Trans value is cached and reused after final function call.
          -> BE CAREFUL of anything that will break if the final function is called
          more than once.
          http://archives.postgresql.org//pgsql-hackers/2008-12/msg01820.php
                               PGCon2009, 21-22 May 2009 Ottawa
Write Your Own Windowing Function




          PGCon2009, 21-22 May 2009 Ottawa
Write your own Windowing Function
• NOT documented for now, but you can do it
• The normal argument API like PG_GETARG_XXX()
  does not work for window functions
• STRICT-ness is meaningless
• The function must use the V1 calling convention
• Get WindowObject using PG_WINDOW_OBJECT()
• Check window-ness by WindowObjectIsValid()
• Call APIs to fetch values
• More details in src/include/windowapi.h, see
  also:
  – src/executor/nodeWindowAgg.c
  – src/utils/adt/windowfuncs.c
• These may be changed in the future releases
                   PGCon2009, 21-22 May 2009 Ottawa
Windowing Function APIs
• PG_WINDOW_OBJECT(fcinfo)
  – Retrieve WindowObject, which is an interface of the
    window, for this call.
• WinGetPartitionLocalMemory(winobj, sz)
  – Store its own memory. It is used in rank() to save the
    current rank, for example.
• WinGetCurrentPosition(winobj)
  – Get current position in the partition (not in the frame).
    Same as row_number()
• WinGetPartitionRowCount(winobj)
  – Count up how many rows in the partition. Used in ntile() for
    example.


                     PGCon2009, 21-22 May 2009 Ottawa
Windowing Function APIs


• WinSetMarkPosition(winobj, markpos)
  – Give to winobj a hint that you don’t want rows preceding to
    markpos anymore
• WinRowsArePeers(winobj, pos1, pos2)
  – Rows at pos1 and at pos2 are peers? “Peers” means both
    rows have the same value in the meaning of ORDER BY
    columns




                    PGCon2009, 21-22 May 2009 Ottawa
Windowing Function APIs
• WinGetFuncArgInPartition(winobj, argno,
  relpos, seektype, set_mark, isnull, isout)
  – Fetch argument from another row in the partition. “isout”
    will be set to true if “relpos” is out of range of the partition.
• WinGetFuncArgInFrame(winobj, argno, relpos,
  seektype, set_mark, isnull, isout)
  – Fetch argument from another row in the frame. “isout” will
    be set to true if “relpos” is out of range of the frame (may
    be within the partition.)
• WinGetFuncArgCurrent(winobj, argno, isnull)
  – Same as PG_GETARG_XXX(argno). Fetch argument from the
    current row.


                      PGCon2009, 21-22 May 2009 Ottawa
Moving average example


• http://archives.postgresql.org/pgsql-
  hackers/2009-01/msg00562.php
• With standard SQL, you have to wait for
  extended frame clause support like ROWS n
  (PRECEDING|FOLLOWING) to calculate curve
  smoothing, but this sample does it now.




               PGCon2009, 21-22 May 2009 Ottawa
Future work




PGCon2009, 21-22 May 2009 Ottawa
Future work for later versions
• More support for frame clause
  – ROWS n (PRECEDING|FOLLOWING)
  – RANGE val (PRECEDING|FOLLOWING)
  – EXCLUDE (CURRENT ROW|GROUP|TIES|NO OTHERS)
  – Not so difficult for window functions but integration with
    classical aggregates is hard
• Performance improvements
  – Reduce tuplestore overhead
  – Relocate WindowAgg node in the plan
• Support PLs for writing Windowing Functions
  – Let PLs call Windowing Function APIs



                    PGCon2009, 21-22 May 2009 Ottawa
Finally…




PGCon2009, 21-22 May 2009 Ottawa
Thanks all the hackers!!
• Simon Riggs
   – discussion over my first ugly and poor design
• Heikki Linnakangas
   – improvement of window object and its buffering strategy
• Tom Lane
   – code rework overall and deep reading in the SQL spec
• David Fetter
   – help miscellaneous things like git repository and this session
• David Rowley
   – many tests to catch early bugs, as well as spec investigations
• Takahiro Itagaki, Pavel Stehule, and All the Hackers
• I might have missed your name here, but really
  appreciate your help.


                         PGCon2009, 21-22 May 2009 Ottawa

Contenu connexe

Similaire à Introducing Windowing Functions (pgCon 2009)

Difference between group by and order by in sql server
Difference between group by and  order by in sql serverDifference between group by and  order by in sql server
Difference between group by and order by in sql serverUmar Ali
 
Enabling Applications with Informix' new OLAP functionality
 Enabling Applications with Informix' new OLAP functionality Enabling Applications with Informix' new OLAP functionality
Enabling Applications with Informix' new OLAP functionalityAjay Gupte
 
Olap Functions Suport in Informix
Olap Functions Suport in InformixOlap Functions Suport in Informix
Olap Functions Suport in InformixBingjie Miao
 
Fortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLASFortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLASJongsu "Liam" Kim
 
Firebird 3 Windows Functions
Firebird 3 Windows  FunctionsFirebird 3 Windows  Functions
Firebird 3 Windows FunctionsMind The Firebird
 
R07_Senegacnik_CBO.pdf
R07_Senegacnik_CBO.pdfR07_Senegacnik_CBO.pdf
R07_Senegacnik_CBO.pdfcookie1969
 
Programmable logic devices
Programmable logic devicesProgrammable logic devices
Programmable logic devicesAmmara Javed
 
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012Eduserv
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 

Similaire à Introducing Windowing Functions (pgCon 2009) (13)

Difference between group by and order by in sql server
Difference between group by and  order by in sql serverDifference between group by and  order by in sql server
Difference between group by and order by in sql server
 
Enabling Applications with Informix' new OLAP functionality
 Enabling Applications with Informix' new OLAP functionality Enabling Applications with Informix' new OLAP functionality
Enabling Applications with Informix' new OLAP functionality
 
Olap Functions Suport in Informix
Olap Functions Suport in InformixOlap Functions Suport in Informix
Olap Functions Suport in Informix
 
SQL Windowing
SQL WindowingSQL Windowing
SQL Windowing
 
MySQL 5.1 and beyond
MySQL 5.1 and beyondMySQL 5.1 and beyond
MySQL 5.1 and beyond
 
Fortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLASFortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLAS
 
Firebird 3 Windows Functions
Firebird 3 Windows  FunctionsFirebird 3 Windows  Functions
Firebird 3 Windows Functions
 
Mutant Tests Too: The SQL
Mutant Tests Too: The SQLMutant Tests Too: The SQL
Mutant Tests Too: The SQL
 
Table functions - Planboard Symposium 2013
Table functions - Planboard Symposium 2013Table functions - Planboard Symposium 2013
Table functions - Planboard Symposium 2013
 
R07_Senegacnik_CBO.pdf
R07_Senegacnik_CBO.pdfR07_Senegacnik_CBO.pdf
R07_Senegacnik_CBO.pdf
 
Programmable logic devices
Programmable logic devicesProgrammable logic devices
Programmable logic devices
 
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
Big Science, Big Data: Simon Metson at Eduserv Symposium 2012
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 

Plus de PostgreSQL Experts, Inc.

PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALEPostgreSQL Experts, Inc.
 
Elephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsElephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsPostgreSQL Experts, Inc.
 

Plus de PostgreSQL Experts, Inc. (20)

Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
 
HowTo DR
HowTo DRHowTo DR
HowTo DR
 
Give A Great Tech Talk 2013
Give A Great Tech Talk 2013Give A Great Tech Talk 2013
Give A Great Tech Talk 2013
 
Pg py-and-squid-pypgday
Pg py-and-squid-pypgdayPg py-and-squid-pypgday
Pg py-and-squid-pypgday
 
92 grand prix_2013
92 grand prix_201392 grand prix_2013
92 grand prix_2013
 
Five steps perform_2013
Five steps perform_2013Five steps perform_2013
Five steps perform_2013
 
7 Ways To Crash Postgres
7 Ways To Crash Postgres7 Ways To Crash Postgres
7 Ways To Crash Postgres
 
PWNage: Producing a newsletter with Perl
PWNage: Producing a newsletter with PerlPWNage: Producing a newsletter with Perl
PWNage: Producing a newsletter with Perl
 
10 Ways to Destroy Your Community
10 Ways to Destroy Your Community10 Ways to Destroy Your Community
10 Ways to Destroy Your Community
 
Open Source Press Relations
Open Source Press RelationsOpen Source Press Relations
Open Source Press Relations
 
5 (more) Ways To Destroy Your Community
5 (more) Ways To Destroy Your Community5 (more) Ways To Destroy Your Community
5 (more) Ways To Destroy Your Community
 
Preventing Community (from Linux Collab)
Preventing Community (from Linux Collab)Preventing Community (from Linux Collab)
Preventing Community (from Linux Collab)
 
Development of 8.3 In India
Development of 8.3 In IndiaDevelopment of 8.3 In India
Development of 8.3 In India
 
PostgreSQL and MySQL
PostgreSQL and MySQLPostgreSQL and MySQL
PostgreSQL and MySQL
 
50 Ways To Love Your Project
50 Ways To Love Your Project50 Ways To Love Your Project
50 Ways To Love Your Project
 
8.4 Upcoming Features
8.4 Upcoming Features 8.4 Upcoming Features
8.4 Upcoming Features
 
Elephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsElephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and Variants
 

Dernier

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Dernier (20)

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

Introducing Windowing Functions (pgCon 2009)

  • 1. Windowing Functions Hitoshi Harada (umi.tanuki@gmail.com) Head of Engineering Dept. FORCIA, Inc. David Fetter (david.fetter@pgexperts.com) PGCon2009, 21-22 May 2009 Ottawa
  • 2. What are Windowing Functions? PGCon2009, 21-22 May 2009 Ottawa
  • 3. What are the Windowing Functions? • Similar to classical aggregates but does more! • Provides access to set of rows from the current row • Introduced SQL:2003 and more detail in SQL:2008 – SQL:2008 (http://wiscorp.com/sql200n.zip) 02: SQL/Foundation • 4.14.9 Window tables • 4.15.3 Window functions • 6.10 <window function> • 7.8 <window clause> • Supported by Oracle, SQL Server, Sybase and DB2 – No open source RDBMS so far except PostgreSQL (Firebird trying) • Used in OLAP mainly but also useful in OLTP – Analysis and reporting by rankings, cumulative aggregates PGCon2009, 21-22 May 2009 Ottawa
  • 4. How they work and what you get PGCon2009, 21-22 May 2009 Ottawa
  • 5. How they work and what do you get • Windowed table – Operates on a windowed table – Returns a value for each row – Returned value is calculated from the rows in the window PGCon2009, 21-22 May 2009 Ottawa
  • 6. How they work and what do you get • You can use… – New window functions – Existing aggregate functions – User-defined window functions – User-defined aggregate functions PGCon2009, 21-22 May 2009 Ottawa
  • 7. How they work and what do you get • Completely new concept! – With Windowing Functions, you can reach outside the current row PGCon2009, 21-22 May 2009 Ottawa
  • 8. How they work and what you get [Aggregates] SELECT key, SUM(val) FROM tbl GROUP BY key; PGCon2009, 21-22 May 2009 Ottawa
  • 9. How they work and what you get [Windowing Functions] SELECT key, SUM(val) OVER (PARTITION BY key) FROM tbl; PGCon2009, 21-22 May 2009 Ottawa
  • 10. How they work and what you get Who is the highest paid relatively compared with the department average? depname empno salary develop 10 5200 sales 1 5000 personnel 5 3500 sales 4 4800 sales 6 550 personnel 2 3900 develop 7 4200 develop 9 4500 sales 3 4800 develop 8 6000 develop 11 5200 PGCon2009, 21-22 May 2009 Ottawa
  • 11. How they work and what you get SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname)::int, salary - avg(salary) OVER (PARTITION BY depname)::int AS diff FROM empsalary ORDER BY diff DESC depname empno salary avg diff develop 8 6000 5020 980 sales 6 5500 5025 475 personnel 2 3900 3700 200 develop 11 5200 5020 180 develop 10 5200 5020 180 sales 1 5000 5025 -25 personnel 5 3500 3700 -200 sales 4 4800 5025 -225 sales 3 4800 5025 -225 develop 9 4500 5020 -520 develop 7 4200 5020 -820 PGCon2009, 21-22 May 2009 Ottawa
  • 12. Anatomy of a Window • Represents set of rows abstractly as: • A partition – Specified by PARTITION BY clause – Never moves – Can contain: • A frame – Specified by ORDER BY clause and frame clause – Defined in a partition – Moves within a partition – Never goes across two partitions • Some functions take values from a partition. Others take them from a frame. PGCon2009, 21-22 May 2009 Ottawa
  • 13. A partition • Specified by PARTITION BY clause in OVER() • Allows to subdivide the table, much like GROUP BY clause • Without a PARTITION BY clause, the whole table is in a single partition func (args) OVER ( ) partition-clause order- clause frame- clause PARTITION BY expr, … PGCon2009, 21-22 May 2009 Ottawa
  • 14. A frame • Specified by ORDER BY clause and frame clause in OVER() • Allows to tell how far the set is applied • Also defines ordering of the set • Without order and frame clauses, the whole of partition is a single frame func (args) OVER ( ) partition-clause order- clause frame- clause ORDER BY expr [ASC|DESC] (ROWS|RANGE) BETWEEN UNBOUNDED [NULLS FIRST|LAST], … (PRECEDING|FOLLOWING) AND CURRENT ROW PGCon2009, 21-22 May 2009 Ottawa
  • 15. The WINDOW clause • You can specify common window definitions in one place and name it, for convenience • You can use the window at the same query level SELECT target_list, … wfunc() OVER w FROM table_list, … WHERE qual_list, …. GROUP BY groupkey_list, …. HAVING groupqual_list, … WINDOW w AS ( ) partition-clause order- clause frame- clause PGCon2009, 21-22 May 2009 Ottawa
  • 16. Built-in Windowing Functions PGCon2009, 21-22 May 2009 Ottawa
  • 17. List of built-in Windowing Functions • row_number() • lag() • rank() • lead() • dense_rank() • first_value() • percent_rank() • last_value() • cume_dist() • nth_value() • ntile() PGCon2009, 21-22 May 2009 Ottawa
  • 18. row_number() • Returns number of the current row SELECT val, row_number() OVER (ORDER BY val DESC) FROM tbl; val row_number() 5 1 5 2 3 3 1 4 Note: row_number() always incremented values independent of frame PGCon2009, 21-22 May 2009 Ottawa
  • 19. rank() • Returns rank of the current row with gap SELECT val, rank() OVER (ORDER BY val DESC) FROM tbl; val rank() 5 1 gap 5 1 3 3 1 4 Note: rank() OVER(*empty*) returns 1 for all rows, since all rows are peers to each other PGCon2009, 21-22 May 2009 Ottawa
  • 20. dense_rank() • Returns rank of the current row without gap SELECT val, dense_rank() OVER (ORDER BY val DESC) FROM tbl; val dense_rank() 5 1 no gap 5 1 3 2 1 3 Note: dense_rank() OVER(*empty*) returns 1 for all rows, since all rows are peers to each other PGCon2009, 21-22 May 2009 Ottawa
  • 21. percent_rank() • Returns relative rank; (rank() – 1) / (total row – 1) SELECT val, percent_rank() OVER (ORDER BY val DESC) FROM tbl; val percent_rank() 5 0 5 0 3 0.666666666666667 1 1 values are always between 0 and 1 inclusive. Note: percent_rank() OVER(*empty*) returns 0 for all rows, since all rows are peers to each other PGCon2009, 21-22 May 2009 Ottawa
  • 22. cume_dist() • Returns relative rank; (# of preced. or peers) / (total row) SELECT val, cume_dist() OVER (ORDER BY val DESC) FROM tbl; val cume_dist() 5 0.5 =2/4 5 0.5 =2/4 3 0.75 =3/4 1 1 =4/4 The result can be emulated by “count(*) OVER (ORDER BY val DESC) / count(*) OVER ()” Note: cume_dist() OVER(*empty*) returns 1 for all rows, since all rows are peers to each other PGCon2009, 21-22 May 2009 Ottawa
  • 23. ntile() • Returns dividing bucket number SELECT val, ntile(3) OVER (ORDER BY val DESC) FROM tbl; val ntile(3) 5 1 4%3=1 5 1 3 2 1 3 The results are the divided positions, but if there’s remainder add row from the head Note: ntile() OVER (*empty*) returns same values as above, since ntile() doesn’t care the frame but works against the partition PGCon2009, 21-22 May 2009 Ottawa
  • 24. lag() • Returns value of row above SELECT val, lag(val) OVER (ORDER BY val DESC) FROM tbl; val lag(val) 5 NULL 5 5 3 5 1 3 Note: lag() only acts on a partition. PGCon2009, 21-22 May 2009 Ottawa
  • 25. lead() • Returns value of the row below SELECT val, lead(val) OVER (ORDER BY val DESC) FROM tbl; val lead(val) 5 5 5 3 3 1 1 NULL Note: lead() acts against a partition. PGCon2009, 21-22 May 2009 Ottawa
  • 26. first_value() • Returns the first value of the frame SELECT val, first_value(val) OVER (ORDER BY val DESC) FROM tbl; val first_value(val) 5 5 5 5 3 5 1 5 PGCon2009, 21-22 May 2009 Ottawa
  • 27. last_value() • Returns the last value of the frame SELECT val, last_value(val) OVER (ORDER BY val DESC ROWS BETWEEN UNBOUNDED PRECEEDING AND UNBOUNDED FOLLOWING) FROM tbl; val last_value(val) 5 1 5 1 3 1 1 1 Note: frame clause is necessary since you have a frame between the first row and the current row by only the order clause PGCon2009, 21-22 May 2009 Ottawa
  • 28. nth_value() • Returns the n-th value of the frame SELECT val, nth_value(val, val) OVER (ORDER BY val DESC ROWS BETWEEN UNBOUNDED PRECEEDING AND UNBOUNDED FOLLOWING) FROM tbl; val nth_value(val, val) 5 NULL 5 NULL 3 3 1 5 Note: frame clause is necessary since you have a frame between the first row and the current row by only the order clause PGCon2009, 21-22 May 2009 Ottawa
  • 29. aggregates(all peers) • Returns the same values along the frame SELECT val, sum(val) OVER () FROM tbl; val sum(val) 5 14 5 14 3 14 1 14 Note: all rows are the peers to each other PGCon2009, 21-22 May 2009 Ottawa
  • 30. cumulative aggregates • Returns different values along the frame SELECT val, sum(val) OVER (ORDER BY val DESC) FROM tbl; val sum(val) 5 10 5 10 3 13 1 14 Note: row#1 and row#2 return the same value since they are the peers. the result of row#3 is sum(val of row#1…#3) PGCon2009, 21-22 May 2009 Ottawa
  • 31. Inside the Implementation PGCon2009, 21-22 May 2009 Ottawa
  • 32. Where are they? • Windowing Functions are identified by a flag in pg_proc, which means they are very similar to plain functions – pg_proc.proiswindow : boolean • All the aggregates including user-defined ones in any language can be called in WindowAgg, too – Your old code gets new behaviour! • src/include/catalog/pg_proc.h – pg_proc catalog definitions • src/backend/utils/adt/windowfuncs.c – implementations of built-in window functions PGCon2009, 21-22 May 2009 Ottawa
  • 33. Added relevant nodes • WindowFunc – primitive node for function itself “wfunc(avg1, …)” • WindowDef – parse node for window definition “over (partition by … order by …)” • WindowClause – parse node for window clause “window (partition by … order by …) as w” • WindowAgg – plan node for window aggregate • WindowAggState – executor node for window aggregate PGCon2009, 21-22 May 2009 Ottawa
  • 34. Hacking the planner SELECT depname, empno, salary, avg(salary) over (partition by depname) AS a, rank() over (partition by depname order by salary desc) AS r FROM empsalary ORDER BY r Sort (cost=215.75..218.35 rows=1040 width=44) Output: depname, empno, salary, (avg(salary) OVER (?)), (rank() OVER (?)) Sort Key: (rank() OVER (?)) -> WindowAgg (cost=142.83..163.63 rows=1040 width=44) Output: depname, empno, salary, (avg(salary) OVER (?)), rank() OVER (?) -> Sort (cost=142.83..145.43 rows=1040 width=44) Output: depname, empno, salary, (avg(salary) OVER (?)) Sort Key: depname, salary -> WindowAgg (cost=72.52..90.72 rows=1040 width=44) Output: depname, empno, salary, avg(salary) OVER (?) -> Sort (cost=72.52..75.12 rows=1040 width=44) Output: depname, empno, salary Sort Key: depname -> Seq Scan on empsalary (cost=0.00..20.40 rows=1040 width=44) Output: depname, empno, salary PGCon2009, 21-22 May 2009 Ottawa
  • 35. Hacking the planner Final output TargetEntry1:depname TargetEntry2:empno TargetEntry3:salary TargetEntry4:avg TargetEntry5:rank PGCon2009, 21-22 May 2009 Ottawa
  • 36. Hacking the planner Final output TargetEntry1:depname TargetEntry2:empno TargetEntry3:salary TargetEntry4:avg TargetEntry5:rank SeqScan Var1:depname Var2:empno Var3:salary PGCon2009, 21-22 May 2009 Ottawa
  • 37. Hacking the planner Final output TargetEntry1:depname TargetEntry2:empno TargetEntry3:salary TargetEntry4:avg TargetEntry5:rank WindowAgg1 Var1:depname Var2:empno Var3:salary WindowFunc1:avg Sort1 SeqScan Var1:depname Var2:empno Var3:salary PGCon2009, 21-22 May 2009 Ottawa
  • 38. Hacking the planner Final output Sort3 WindowAgg2 TargetEntry1:depname Var1:depname TargetEntry2:empno Var2:empno TargetEntry3:salary Var3:salary TargetEntry4:avg Var4:avg TargetEntry5:rank WindowFunc:rank Sort2 WindowAgg1 Var1:depname Var2:empno Var3:salary WindowFunc1:avg Sort1 SeqScan Var1:depname Var2:empno Var3:salary PGCon2009, 21-22 May 2009 Ottawa
  • 39. Hacking the planner • We could optimize it by relocating WindowAgg SELECT depname, empno, salary, rank() over (partition by depname order by salary desc) AS r, avg(salary) over (partition by depname) AS a FROM empsalary ORDER BY r Sort (cost=161.03..163.63 rows=1040 width=44) Output: depname, empno, salary, (rank() OVER (?)), (avg(salary) OVER (?)) Sort Key: (rank() OVER (?)) -> WindowAgg (cost=72.52..108.92 rows=1040 width=44) Output: depname, empno, salary, (rank() OVER (?)), avg(salary) OVER (?) No Sort! -> WindowAgg (cost=72.52..93.32 rows=1040 width=44) Output: depname, empno, salary, rank() OVER (?) -> Sort (cost=72.52..75.12 rows=1040 width=44) Output: depname, empno, salary Sort Key: depname, salary -> Seq Scan on empsalary (cost=0.00..20.40 rows=1040 width=44) Output: depname, empno, salary PGCon2009, 21-22 May 2009 Ottawa
  • 40. How the executor creates a window Normal Scan row#1 row#2 Destination row#3 Table row#4 Values are never shared among returned rows PGCon2009, 21-22 May 2009 Ottawa
  • 41. How the executor creates a window WindowAgg row#1 row#1 row#2 Window row#2 Object row#3 powered row#3 by Destination row#4 Tuplestore row#4 Table Buffering rows in the Window Object, so the functions can fetch values from another row. This kills performance if the Window Object holds millions of rows. PGCon2009, 21-22 May 2009 Ottawa
  • 42. How the executor creates a window WindowAgg winobj->markpos row#1 row#1 row#2 row#2 row#3 row#3 Window row#4 Object row#4 powered Destination row#5 by row#5 Table Tuplestore row#6 row#6 If the rest of rows don’t need buffered rows any more, trim the Tuplestore from its head to “mark pos” PGCon2009, 21-22 May 2009 Ottawa
  • 43. How the Windowing Functions work • Functions workflow is very similar to plain functions • Each function has its own seek position WindowAgg calls the wfunc1() wfunc1(fcinfo){ mark pos WindowObject winobj = PG_WINDOW_OBJECT(); Window /* do something using winobj */ fetch value Object } fetch value Return a value PGCon2009, 21-22 May 2009 Ottawa
  • 44. How the Windowing Functions work • What about aggregates? aggregate Initialize aggregate trans value Final aggregate Trans result1 func func trans value Trans func Window Object Final aggregate Trans result2 func trans value func Trans value is cached and reused after final function call. -> BE CAREFUL of anything that will break if the final function is called more than once. http://archives.postgresql.org//pgsql-hackers/2008-12/msg01820.php PGCon2009, 21-22 May 2009 Ottawa
  • 45. Write Your Own Windowing Function PGCon2009, 21-22 May 2009 Ottawa
  • 46. Write your own Windowing Function • NOT documented for now, but you can do it • The normal argument API like PG_GETARG_XXX() does not work for window functions • STRICT-ness is meaningless • The function must use the V1 calling convention • Get WindowObject using PG_WINDOW_OBJECT() • Check window-ness by WindowObjectIsValid() • Call APIs to fetch values • More details in src/include/windowapi.h, see also: – src/executor/nodeWindowAgg.c – src/utils/adt/windowfuncs.c • These may be changed in the future releases PGCon2009, 21-22 May 2009 Ottawa
  • 47. Windowing Function APIs • PG_WINDOW_OBJECT(fcinfo) – Retrieve WindowObject, which is an interface of the window, for this call. • WinGetPartitionLocalMemory(winobj, sz) – Store its own memory. It is used in rank() to save the current rank, for example. • WinGetCurrentPosition(winobj) – Get current position in the partition (not in the frame). Same as row_number() • WinGetPartitionRowCount(winobj) – Count up how many rows in the partition. Used in ntile() for example. PGCon2009, 21-22 May 2009 Ottawa
  • 48. Windowing Function APIs • WinSetMarkPosition(winobj, markpos) – Give to winobj a hint that you don’t want rows preceding to markpos anymore • WinRowsArePeers(winobj, pos1, pos2) – Rows at pos1 and at pos2 are peers? “Peers” means both rows have the same value in the meaning of ORDER BY columns PGCon2009, 21-22 May 2009 Ottawa
  • 49. Windowing Function APIs • WinGetFuncArgInPartition(winobj, argno, relpos, seektype, set_mark, isnull, isout) – Fetch argument from another row in the partition. “isout” will be set to true if “relpos” is out of range of the partition. • WinGetFuncArgInFrame(winobj, argno, relpos, seektype, set_mark, isnull, isout) – Fetch argument from another row in the frame. “isout” will be set to true if “relpos” is out of range of the frame (may be within the partition.) • WinGetFuncArgCurrent(winobj, argno, isnull) – Same as PG_GETARG_XXX(argno). Fetch argument from the current row. PGCon2009, 21-22 May 2009 Ottawa
  • 50. Moving average example • http://archives.postgresql.org/pgsql- hackers/2009-01/msg00562.php • With standard SQL, you have to wait for extended frame clause support like ROWS n (PRECEDING|FOLLOWING) to calculate curve smoothing, but this sample does it now. PGCon2009, 21-22 May 2009 Ottawa
  • 51. Future work PGCon2009, 21-22 May 2009 Ottawa
  • 52. Future work for later versions • More support for frame clause – ROWS n (PRECEDING|FOLLOWING) – RANGE val (PRECEDING|FOLLOWING) – EXCLUDE (CURRENT ROW|GROUP|TIES|NO OTHERS) – Not so difficult for window functions but integration with classical aggregates is hard • Performance improvements – Reduce tuplestore overhead – Relocate WindowAgg node in the plan • Support PLs for writing Windowing Functions – Let PLs call Windowing Function APIs PGCon2009, 21-22 May 2009 Ottawa
  • 54. Thanks all the hackers!! • Simon Riggs – discussion over my first ugly and poor design • Heikki Linnakangas – improvement of window object and its buffering strategy • Tom Lane – code rework overall and deep reading in the SQL spec • David Fetter – help miscellaneous things like git repository and this session • David Rowley – many tests to catch early bugs, as well as spec investigations • Takahiro Itagaki, Pavel Stehule, and All the Hackers • I might have missed your name here, but really appreciate your help. PGCon2009, 21-22 May 2009 Ottawa