2. Agenda
•What is OLAP
•OLAP functions in Informix
– the OVER clause
– supported OLAP functions
•Questions?
3. What is OLAP?
• On-Line Analytical Processing
• Commonly used in Business
Intelligence (BI) tools
– ranking products, salesmen, items, etc.
– exposing trends in sales from historic data
– testing business scenarios (forecast)
– sales breakdown or aggregates on multiple
dimensions (Time, Region, Demographics, etc.)
4. OLAP Functions in Informix
• Supports a subset of commonly used
OLAP functions
• Enables more efficient query
processing from BI tools such as
Cognos
5. Example query with group by
select customer_num, count(*)
from orders
where customer_num <= 110
group by customer_num;
customer_num (count(*))
101 1
104 4
106 2
110 2
4 row(s) retrieved.
6. Example query with OLAP function
select customer_num, ship_date, ship_charge,
count(*) over (partition by customer_num)
from orders
where customer_num <= 110;
customer_num ship_date ship_charge (count(*))
101 05/26/2008 $15.30 1
104 05/23/2008 $10.80 4
104 07/03/2008 $5.00 4
104 06/01/2008 $10.00 4
104 07/10/2008 $12.20 4
106 05/30/2008 $19.20 2
106 07/03/2008 $12.30 2
110 07/06/2008 $13.80 2
110 07/16/2008 $6.30 2
9 row(s) retrieved.
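The contrast between the two queries above can be tried out locally. The sketch below is only a stand-in: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (not Informix), and a cut-down orders table that holds just the customer_num values from the slide output.

```python
import sqlite3

# Cut-down stand-in for the orders table: customer_num only,
# with the same per-customer row counts as the slide output.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer_num integer)")
conn.executemany(
    "insert into orders values (?)",
    [(101,), (104,), (104,), (104,), (104,), (106,), (106,), (110,), (110,)],
)

# GROUP BY collapses each customer to a single row.
grouped = conn.execute(
    "select customer_num, count(*) from orders "
    "where customer_num <= 110 group by customer_num "
    "order by customer_num"
).fetchall()

# The OLAP form keeps every row and attaches the partition count to it.
windowed = conn.execute(
    "select customer_num, count(*) over (partition by customer_num) "
    "from orders where customer_num <= 110 order by customer_num"
).fetchall()

print(len(grouped), "grouped rows;", len(windowed), "windowed rows")
```

The grouped query returns 4 rows and the windowed query returns 9, matching the two slide outputs.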
7. Where do OLAP functions fit?
• Order of evaluation within a query block:
– first: joins, group by, having, aggregation
– then: OLAP functions
– last: final order by
8. OLAP functions as predicates
• Use a derived table query block to compute
the OLAP function first
select * from
(select customer_num, ship_date,
ship_charge,
count(*) over (partition by
customer_num) as cnt
from orders
where customer_num <= 110)
where cnt >= 3;
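Because OLAP functions are computed after the where clause of their own query block, the filter on cnt has to live in an outer block. A minimal sketch of the same pattern, assuming SQLite 3.25 or later through Python's sqlite3 module as a stand-in for Informix, and a reduced orders table with only the columns needed here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer_num integer, ship_charge real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(101, 15.30), (104, 10.80), (104, 5.00), (104, 10.00), (104, 12.20),
     (106, 19.20), (106, 12.30), (110, 13.80), (110, 6.30)],
)

# The inner query block computes cnt; the outer block can then filter on it.
rows = conn.execute(
    "select * from "
    "  (select customer_num, ship_charge, "
    "          count(*) over (partition by customer_num) as cnt "
    "   from orders where customer_num <= 110) "
    "where cnt >= 3 order by ship_charge"
).fetchall()

for row in rows:
    print(row)
```

Only the four orders of customer 104 survive the outer filter, since it is the only customer with at least three orders.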
9. OLAP function example
• 3-month moving average of sales (previous, current
and next month) for a particular product during a
particular period
select product_name,
avg(sales) over (
partition by region
order by year, month
rows between 1 preceding and 1 following
)
from total_sales
where product_id = 105
and year between 2001 and 2010;
10. The over() Clause
olap_func(arg) over(partition by clause
order by clause window frame clause)
• Defines the “domain” of OLAP function
calculation
– partition by: divide into partitions
– order by: ordering within each partition
– window frame: sliding window within each
partition
– all clauses optional
11. Partition By
sum(x) over (
partition by a, b
order by c, d
rows between 2 preceding and 2 following)
a=1, b=1
a=2, b=2
a=1, b=2
a=2, b=1
12. Order By
sum(x) over (
partition by a, b
order by c, d
rows between 2 preceding and 2 following)
partition a=1, b=2
c=1,d=1
c=1,d=2
c=1,d=3
c=2,d=2
c=2,d=4
c=3,d=1
c=4,d=1
c=4,d=2
14. Partition By
• Divide result set of query into partitions for
computing of an OLAP function
• If partition by clause is not specified, then
entire result set is a single partition
max(salary) over (partition by dept_id)
sum(sales) over (partition by region)
avg(price) over ()
15. Order By
• Ordering within each partition
• Required for some OLAP functions
– ranking, window frame clause
• Supports ASC/DESC, NULLS FIRST/NULLS LAST
rank() over (partition by dept
order by salary desc)
dense_rank() over(order by total_sales
nulls last)
16. Window Frame
• Defines a sliding window within a partition
• OLAP function value computed from rows in the
sliding window
• Order by clause is required
17. Physical vs. Logical Window Frame
• Physical window frame
– ROWS keyword
– count offset by position
– fixed window size
– order by one or more column expressions
• Logical window frame
– RANGE keyword
– count offset by value
– window size may vary
– order by single column (numeric, date or datetime type)
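The difference between the two frame types shows up as soon as the order-by values have gaps. The following is a plain-Python emulation, not Informix code; the function names and the month/sales data are hypothetical, and the frame in both cases is "between 1 preceding and current row".

```python
def rows_frame_sum(values, preceding, following):
    """Sum over a physical (ROWS) frame: offsets count row positions."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - preceding)
        hi = min(len(values), i + following + 1)
        out.append(sum(values[lo:hi]))
    return out


def range_frame_sum(keys, values, preceding, following):
    """Sum over a logical (RANGE) frame: offsets apply to the order-by value."""
    out = []
    for k in keys:
        out.append(sum(v for kk, v in zip(keys, values)
                       if k - preceding <= kk <= k + following))
    return out


# Hypothetical data, already sorted by month; months 3 and 6-8 are missing.
months = [1, 2, 4, 5, 9]
sales = [10, 20, 30, 40, 50]

by_rows = rows_frame_sum(sales, preceding=1, following=0)
by_range = range_frame_sum(months, sales, preceding=1, following=0)
print(by_rows)   # [10, 30, 50, 70, 90]
print(by_range)  # [10, 30, 30, 70, 50]
```

The ROWS frame always reaches back one row, whatever month it holds. The RANGE frame reaches back one month in value, so for months 4 and 9 it finds no earlier row and the window shrinks to the current row.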
18. Window Frame Examples
avg(price) over (order by year, day
rows between 6 preceding and current row)
count(*) over (order by ship_date
range between 2 preceding and 2 following)
• Current row can be physically outside the window
avg(sales) over (order by month
range between 3 preceding and 1 preceding)
sum(sales) over (order by month
rows between 2 following and 5 following)
19. Order By – Special Semantics
• “cumulative” semantics in absence of window
frame clause
– for OLAP function that allows window frame clause
– equivalent to “ROWS between unbounded preceding
and current row”
select sales, sum(sales) over (order by quarter)
from sales where year = 2012
sales (sum)
120 120
135 255
127 382
153 535
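The cumulative behaviour can be checked by hand. A small plain-Python sketch (not Informix code) that reproduces the running totals for the four quarterly values shown above:

```python
from itertools import accumulate

# Quarterly sales from the slide, already in quarter order.
sales = [120, 135, 127, 153]

# Each output row sums every row from the start of the partition
# up to and including the current row.
running = list(accumulate(sales))

for s, total in zip(sales, running):
    print(s, total)
```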
21. Ranking Functions
• Partition by clause is optional
• Order by clause is required
• Window frame clause is NOT allowed
• Duplicate value handling is different between
rank() and dense_rank()
– same rank given to all duplicates
– after duplicates, rank() skips the ranks that the
duplicates covered, while dense_rank() continues
with the next consecutive rank
22. RANK vs DENSE_RANK
select emp_num, sales,
rank() over (order by sales) as rank,
dense_rank() over (order by sales) as dense_rank
from sales;
emp_num sales rank dense_rank
101 2,000 1 1
102 2,400 2 2
103 2,400 2 2
104 2,500 4 3
105 2,500 4 3
106 2,650 6 4
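The two columns above follow from simple counting rules. This is a plain-Python emulation for an ascending order by, not Informix code; the function names mirror the SQL functions but are otherwise hypothetical helpers.

```python
def rank(values):
    """1 + number of rows that sort strictly before the current row."""
    return [1 + sum(1 for other in values if other < v) for v in values]


def dense_rank(values):
    """1 + number of distinct values that sort strictly before the current row."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


sales = [2000, 2400, 2400, 2500, 2500, 2650]
print(rank(sales))        # [1, 2, 2, 4, 4, 6]
print(dense_rank(sales))  # [1, 2, 2, 3, 3, 4]
```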
23. PERCENT_RANK and CUME_DIST
• Calculates ranking information as a percentile
• Returns value between 0 and 1
select emp_num, sales,
percent_rank() over (order by sales) as per_rank,
cume_dist() over (order by sales) as cume_dist
from sales;
emp_num sales per_rank cume_dist
101 2,000 0 0.166666667
102 2,400 0.2 0.500000000
103 2,400 0.2 0.500000000
104 2,500 0.6 0.833333333
105 2,500 0.6 0.833333333
106 2,650 1.0 1.000000000
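The slide does not spell out the formulas. The usual SQL definitions, which reproduce the output above, are (rank - 1) / (rows - 1) for PERCENT_RANK and (rows at or before the current value) / rows for CUME_DIST. A plain-Python sketch under that assumption, for an ascending order by:

```python
def percent_rank(values):
    """(rank - 1) / (n - 1); defined as 0 for a single-row partition."""
    n = len(values)
    if n == 1:
        return [0.0]
    return [sum(1 for other in values if other < v) / (n - 1) for v in values]


def cume_dist(values):
    """Fraction of rows whose value is less than or equal to the current one."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


sales = [2000, 2400, 2400, 2500, 2500, 2650]
for s, p, c in zip(sales, percent_rank(sales), cume_dist(sales)):
    print(s, round(p, 1), round(c, 9))
```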
24. NTILE
• Divides the ordered data set into N
tiles, where N is given by the
expression
• The number of tiles must be an exact
numeric value with scale zero
25. NTILE Example
select name, salary,
ntile(5) over (partition by dept order by salary)
from employee;
name salary (ntile)
John 35,000 1
Jack 38,400 1
Julie 41,200 2
Manny 45,600 2
Nancy 47,300 3
Pat 49,500 4
Ray 51,300 5
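Seven rows do not divide evenly into five tiles, so the first tiles each take one extra row. The sketch below applies that standard distribution rule in plain Python; it matches the output above but is an emulation, not Informix code, and the helper name is hypothetical.

```python
def ntile(n_rows, n_tiles):
    """Tile number for each of n_rows ordered rows.

    Rows are spread as evenly as possible; when they do not divide
    evenly, the earlier tiles each receive one extra row.
    """
    base, extra = divmod(n_rows, n_tiles)
    tiles = []
    for tile in range(1, n_tiles + 1):
        size = base + (1 if tile <= extra else 0)
        tiles.extend([tile] * size)
    return tiles


print(ntile(7, 5))  # [1, 1, 2, 2, 3, 4, 5]
```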
26. LEAD and LAG
LEAD(expr, offset, default)
LAG(expr, offset, default)
• LAG gives the value of the expression from the row
offset rows before the current row; LEAD gives it
from the row offset rows after the current row
• offset is optional and defaults to 1 if not specified
• default is optional and is NULL if not specified
– default is used when the offset goes beyond the current
partition boundary
• NULL handling
– RESPECT NULLS (default)
– IGNORE NULLS
27. LEAD/LAG Example
select name, salary, lag(salary)
over (partition by dept order by salary),
lead(salary, 1, 0)
over (partition by dept order by salary)
from employee;
name   salary  (lag)   (lead)
John   35,000          38,400
Jack   38,400  35,000  41,200
Julie  41,200  38,400  45,600
Manny  45,600  41,200  47,300
Nancy  47,300  45,600  49,500
Pat    49,500  47,300  51,300
Ray    51,300  49,500  0
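The offset and default arguments can be emulated in a few lines. This is a plain-Python sketch over one already-ordered partition, not Informix code; the helper names are hypothetical and offset is assumed to be non-negative.

```python
def lag(values, offset=1, default=None):
    """Value from the row `offset` rows before the current row."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value from the row `offset` rows after the current row."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


# Salaries from the slide, ordered by salary within one dept.
salaries = [35000, 38400, 41200, 45600, 47300, 49500, 51300]

print(lag(salaries))         # first row has no predecessor: None
print(lead(salaries, 1, 0))  # last row has no successor: default 0
```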
28. LEAD/LAG NULL handling
select price,
lag(price ignore nulls, 1) over (order by day),
lead(price, 1) ignore nulls over (order by day)
from stock_price;
price  (lag)  (lead)
18.25         18.37
18.37  18.25  19.03
       18.37  19.03
       18.37  19.03
19.03  18.37  18.59
18.59  19.03  18.21
18.21  18.59
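With IGNORE NULLS, the offset counts only non-NULL rows, which is why the rows with a NULL price still receive lag and lead values. A plain-Python emulation of that rule (not Informix code; helper names are hypothetical, offset is assumed to be at least 1, and None stands for NULL):

```python
def lag_ignore_nulls(values, offset=1):
    """Value of the offset-th non-NULL row before the current row."""
    out = []
    for i in range(len(values)):
        prior = [v for v in values[:i] if v is not None]
        out.append(prior[-offset] if len(prior) >= offset else None)
    return out


def lead_ignore_nulls(values, offset=1):
    """Value of the offset-th non-NULL row after the current row."""
    out = []
    for i in range(len(values)):
        later = [v for v in values[i + 1:] if v is not None]
        out.append(later[offset - 1] if len(later) >= offset else None)
    return out


# Prices from the slide in day order; two days have no price.
prices = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]

for p, lg, ld in zip(prices, lag_ignore_nulls(prices), lead_ignore_nulls(prices)):
    print(p, lg, ld)
```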
29. Numbering Functions
• Partition by clause and order by clause are
optional
• Window frame clause is NOT allowed
• Provides sequential row number to result set
– regardless of duplicates when order by is specified
30. ROW_NUMBER Example
select row_number() over (order by sales),
emp_num, sales
from sales;
(row_number) emp_num sales
1 101 2,000
2 102 2,400
3 103 2,400
4 104 2,500
5 105 2,500
6 106 2,650
31. Aggregate Functions
• Partition by, order by and window frame
clauses are all optional
– window frame clause requires order by clause
• All currently supported aggregate functions
– SUM, COUNT, MIN, MAX, AVG, STDEV, RANGE,
VARIANCE
• New aggregate functions
– FIRST_VALUE/LAST_VALUE
– RATIO_TO_REPORT
32. Aggregate Function Example
select price,
avg(price) over (order by day
rows between 1 preceding and 1 following)
from stock_price;
price  (avg)
18.25  18.31
18.37  18.31
       18.37
       19.03
19.03  18.81
18.59  18.61
18.21  18.40
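The averages above follow from two rules: the frame is clipped at the partition edges, and AVG skips NULL values inside the frame. A plain-Python sketch of that calculation (an emulation, not Informix code; the helper name is hypothetical and None stands for NULL):

```python
def moving_avg(values, preceding, following):
    """Average over a ROWS frame, skipping NULLs (None) like SQL AVG."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - preceding)
        window = [v for v in values[lo:i + following + 1] if v is not None]
        out.append(round(sum(window) / len(window), 2) if window else None)
    return out


# Prices from the slide in day order; two days have no price.
prices = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]

print(moving_avg(prices, preceding=1, following=1))
```

The two rows with a NULL price still get an average, because each of their frames contains exactly one non-NULL neighbour.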
33. DISTINCT handling
• DISTINCT is supported, but it cannot be combined
with an order by clause or a window frame
clause
select emp_id, manager_id,
count(distinct manager_id)
over (partition by department)
from employee;
emp_id manager_id (count)
101 103 3
102 103 3
103 100 3
104 110 3
105 110 3
34. FIRST_VALUE and LAST_VALUE
• Gives FIRST/LAST value of current
partition
• NULL handling
– RESPECT NULLS (default)
– IGNORE NULLS
35. FIRST_VALUE/LAST_VALUE Example
select price, price - first_value(price)
over (partition by year order by day)
as diff_price
from stock_price;
price diff_price
18.25 0
18.37 0.12
19.03 0.78
18.59 0.34
18.21 -0.04
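The diff_price column is each price minus the first price of its partition. A plain-Python check of the figures above, assuming all five rows belong to a single year and are already in day order:

```python
# Prices from the slide, one year, ordered by day.
prices = [18.25, 18.37, 19.03, 18.59, 18.21]

# first_value() over the ordered partition is the first row's price.
first = prices[0]
diff_price = [round(p - first, 2) for p in prices]

print(diff_price)
```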
36. RATIO_TO_REPORT
• Computes the ratio of current value to
sum of all values in current partition or
window frame.
select emp_num, sales,
ratio_to_report(sales) over (partition by
year order by sales)
from sales;
37. RATIO_TO_REPORT Example
select year, sales, ratio_to_report(sales)
over (partition by year)
from sales;
year sales (ratio_to_report)
1998 2400 0.2308
1998 2550 0.2452
1998 2650 0.2548
1998 2800 0.2692
1999 2450 0.2311
1999 2575 0.2429
1999 2725 0.2571
1999 2850 0.2689
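Each ratio above is the row's sales divided by the total sales of its year. A plain-Python emulation of that calculation (not Informix code; the helper name is hypothetical and the rounding to four places only mirrors the slide's display):

```python
from collections import defaultdict


def ratio_to_report(rows):
    """Attach sales / (partition total) to each (year, sales) row."""
    totals = defaultdict(int)
    for year, sales in rows:
        totals[year] += sales
    return [(year, sales, round(sales / totals[year], 4))
            for year, sales in rows]


rows = [(1998, 2400), (1998, 2550), (1998, 2650), (1998, 2800),
        (1999, 2450), (1999, 2575), (1999, 2725), (1999, 2850)]

for row in ratio_to_report(rows):
    print(row)
```

Within each year the ratios sum to 1, up to rounding.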
38. Nested OLAP Functions
• OLAP function can be nested inside another
OLAP function
select emp_id, salary, salary - first_value(salary)
over (order by rank() over (order by salary))
as diff_salary
from employee;
select sum(ntile(10) over (order by salary))
over (partition by department)
from employee;
39. OLAP functions and IWA
• Queries containing OLAP functions can be
accelerated by Informix Warehouse
Accelerator (IWA)
• IWA processes majority of the query block
– scan, join, group by, having, aggregation
• Informix server processes OLAP functions
based on query result from IWA
40. For more information
• Links to OLAP function in Informix 12.1
documentation
http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2583.htm
http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.acc.doc%2Fids_acc_queries1.htm