1. October 25–29, 2009 • Mandalay Bay • Las Vegas, Nevada
DB2 Best Practices for Optimal Performance
Sunil Kamath
Senior Technical Staff Member
IBM Toronto Labs
sunil.kamath@ca.ibm.com
3. Agenda
Basics
– Sizing workloads
– Best Practices for Physical Design
Benchmarks
DB2 9.7 Performance Improvements
– Scan Sharing
– XML in DPF
– Statement Concentrator
– Currently Committed
– LOB Inlining
– Compression
– Index Compression
– Temp Table Compression
– XML Compression
– Range Partitioning with local indexes
Summary
4. Performance “Truisms”
There is always a bottleneck!
Remember the 5 fundamental bottleneck areas:
1. Application
2. CPU
3. Memory
4. Disk
5. Network
Balance is key!
5. Sizing a Configuration
Ideally one should understand:
– The application
– Load process requirements
– Number of concurrent users/jobs
– Largest tables' sizes
– Typical query scenarios
– Size of answer sets being generated
– Response time objectives for loads and queries
– Availability requirements
– …
6. Sizing “Rules of Thumb”
Platform choice
CPU
Memory
Disk
– Space
– Spindles
7. Platform Selection
DB2 is highly optimized for all major platforms
– AIX, Linux, Windows, Solaris, HP-UX
– 64 bit is strongly recommended
Much more than a performance question
– Integration with other systems
– Skills / Ease of Use
– $$$
Often more than 1 “good” choice
8. Selecting DB2 With and Without Data Partitioning (InfoSphere Warehouse)
Differences are becoming smaller
– Function and manageability gaps
Data partitioning is less common for
– OLTP, ERP, CRM
Data partitioning is most common for
– Data warehousing
9. Memory! How Much Do I Need?
Highly dependent on many factors
– Depends on the number of users (connections)
– Depends on the query workload
– Depends on whether or not other software is sharing the machines being measured
Advisable to allocate 5% of active data for bufferpool sizing
New systems use 64-bit processors
– If using 32-bit Windows/Linux/DB2, just use 4GB
10. Disk! How Many GB Do I Need?
More than you think!
Don’t forget about
– Working storage
– Tempspace
– Indexes, MQT’s etc.
But big drives tend to give lots of space
– 146/300GB drives now standard
Raw data x 4 (unmirrored)*
Raw data x 5 (RAID5)*
Raw data x 8 (RAID10)*
* Assumes no compression
11. Disk! How Many Spindles Do I Need?
Need to define a balanced system
– Don't want too few large disks
• Causes I/O bottleneck
Different kinds of requirements
– IOPS
• Latency
– MB/sec
• Throughput
Don’t share disks for table/indexes with logs
Don’t know how many disks in the SAN?
– Make friends with storage Admin!
12. Basic Rules of Thumb (RoT)
Meant to be approximate guidelines:
– 150-200 GB active data per core
– 50 concurrent connections per core
– 8 GB RAM per core
– 1500-2000 IOPS per core
The above guidelines work for most virtualization environments as well
These RoT are NOT meant to be a replacement or alternative to real workload sizing
13. Additional Considerations for Virtualized environments
Performance overhead with Hypervisor
– Varies with type of hypervisor and environment
Effect of over committing CPU at “system” level
Effect of over committing memory at “system” level
Effects of sharing same disks for multiple workloads
15. Physical Database Design
Create 1 database for each DB2 instance
Issue “create database” with
– Unicode codeset
• Default starting with DB2 9.5
– Automatic Storage
• Storage paths for tables/indexes etc
• DBPATH for log etc.
– Suitable pagesize
Example
– CREATE DB <DBNAME> AUTOMATIC STORAGE YES
ON /fs1/mdmdb, /fs2/mdmdb, /fs3/mdmdb, /fs4/mdmdb
DBPATH ON /fs0/mdmdb
USING CODESET UTF-8 TERRITORY <TERRITORY>
COLLATE USING UCA400_NO PAGESIZE 8K;
Suggestion: Make everything explicit to facilitate understanding
16. Selecting a Page Size
Use a single page size if possible
– For example, 8K or 16K
With LARGE tablespaces there is ample capacity for growth
OLTP
– Smaller page sizes may be better (e.g. 8K)
Warehouse
– Larger page sizes often beneficial (e.g. 16K)
XML
– Use 32K page size
Choosing an appropriate pagesize should depend on the access pattern of rows (sequential vs. random)
With DB2 9.7, the tablespace limits have increased by 4x; for example, with a 4K page size, the max tablespace size is now 8 TB
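As a sketch of the above, a 16K page size could be set up as follows; the bufferpool and tablespace names (BP16K, TS_DATA16K) are illustrative, not from the presentation:

```sql
-- Create a bufferpool with a 16K page size
CREATE BUFFERPOOL BP16K SIZE AUTOMATIC PAGESIZE 16 K;

-- Create a LARGE tablespace that uses it; the tablespace page size
-- must match the page size of its bufferpool
CREATE LARGE TABLESPACE TS_DATA16K PAGESIZE 16 K BUFFERPOOL BP16K;
```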
17. Tablespace Design
Use automatic storage
– Significant enhancements in DB2 9.7
Use LARGE tablespaces
– Default since DB2 9.5
Disable file system caching via DDL as appropriate
Ensure temp tablespaces exist
– 1 for each page size, ideally just 1
Keep the number of tablespaces reasonably small
– 1 for lookup tables in a single-node nodegroup
– 1 for each fact table (the largest tables)
– 1 for all others
Create separate tablespaces for indexes, LOBs
Large tablespaces further help exploit table/index/temp compression
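The points above can be sketched roughly as follows; the names (TS_FACT, TEMPSP1, BP8K) are hypothetical:

```sql
-- Large automatic-storage tablespace, bypassing file system caching
CREATE LARGE TABLESPACE TS_FACT
  PAGESIZE 8 K BUFFERPOOL BP8K
  NO FILE SYSTEM CACHING;

-- One system temporary tablespace for the single page size in use
CREATE SYSTEM TEMPORARY TABLESPACE TEMPSP1
  PAGESIZE 8 K BUFFERPOOL BP8K;
```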
18. Choosing DMS vs. SMS
Goal:
– Performance of RAW
– Simplicity/usability of SMS
DMS FILE is the preferred choice
– Performance is near DMS RAW
• Especially when bypassing filesystem caching
– Ease of use/management is similar to SMS
• Can gradually extend the size
– Flexible
• Can add/drop containers
• Can separate data/index/long objects into their own table space
– Potential to transition to Automatic Storage
Automatic storage is built on top of DMS FILE
– But it automates container specification / management
19. Choosing DMS FILE vs. Automatic Storage
Goal:
– To maximize simplicity/usability
Automatic Storage is the preferred choice with DB2 9.5
– Strategic direction
• Receives bulk of development investment
– Key enabler/prerequisite for future availability/scalability
enhancements
– Performance is equivalent to DMS FILE
– Ease of use/management is superior
• No need to specify any containers
• Makes it easy to have many table spaces
– Flexible
• Can add/drop storage paths
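With automatic storage, adding or dropping a storage path is a one-line operation; a sketch (paths hypothetical):

```sql
-- Grow the database by adding a new storage path
ALTER DATABASE ADD STORAGE ON '/fs5/mdmdb';

-- DB2 9.7 also allows dropping a storage path again
ALTER DATABASE DROP STORAGE ON '/fs5/mdmdb';
```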
20. Consider Schema optimizations
Decide on how to structure your data
– Consider distributing your data across nodes
• Using DPF hash-partitioning
– Consider partitioning your data by ranges
• Using table range partitioning
– Consider organizing your data
• Using MDC (multi dimensional clustering)
Auxiliary data structures
– Do the right indexes exist?
• Clustered, clustering, include columns for unique indexes
– Would materialized query tables (MQTs) help?
You can feed a dynamic SQL snapshot into the Design Advisor
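One way to do this from the command line is the db2advis tool; assuming a database named MYDB, -g reads the statements from the dynamic SQL snapshot and -m selects the recommendation types (MQTs, Indexes, Clustering/MDC, Partitioning):

```shell
db2advis -d MYDB -g -m MICP
```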
21. Table Design
OK to have multiple tables in a tablespace
Once defined, use ALTER TABLE to select options
– APPEND MODE – use for tables where inserts are at the end of the table (ALTER TABLE ... APPEND ON)
• This also enables concurrent append points for high concurrent INSERT activity
– LOCKSIZE – use to select table-level locking (ALTER TABLE ... LOCKSIZE TABLE)
– PCTFREE – use to reserve space during load/reorg (ALTER TABLE ... PCTFREE 10)
Add PK/FK constraints after index creation
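A minimal sketch of those ALTER TABLE options, using hypothetical tables SALES and LOOKUP:

```sql
-- New rows go at the end of the table (good for insert-heavy tables)
ALTER TABLE SALES APPEND ON;

-- Table-level locking for a small, read-mostly lookup table
ALTER TABLE LOOKUP LOCKSIZE TABLE;

-- Reserve 10% free space per page during LOAD/REORG
ALTER TABLE SALES PCTFREE 10;
```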
22. Table Design - Compression
Compress base table data at row level
– Build a static dictionary, one per table
On-disk and in-memory image is smaller
Need to uncompress data before processing
Classic tradeoff: more CPU for less disk I/O
– Great for IO-bound systems that have spare CPU cycles
Large, rarely referenced tables are ideal
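Since classic row compression uses a static dictionary, enabling it is a two-step affair: flag the table, then build the dictionary with an offline REORG. A sketch with a hypothetical table name:

```sql
-- Mark the table as compression-eligible
ALTER TABLE SALES_HISTORY COMPRESS YES;

-- Build (or rebuild) the compression dictionary and compress existing rows
REORG TABLE SALES_HISTORY RESETDICTIONARY;
```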
23. Index Design
In general, every table should have at least 1 index
– Ideally a unique index / primary key index
Choose appropriate options
– PCTFREE – should be 0 for read-only tables
– PAGE SPLIT HIGH/LOW – especially for ascending inserts
– CLUSTER – define a clustering index
– INCLUDE columns – extra columns in a unique index for index-only access
– COLLECT STATISTICS while creating an index
With DB2 9.7, indexes can be compressed too!
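A sketch combining several of the options above (table and index names hypothetical):

```sql
-- Unique index with an INCLUDE column for index-only access,
-- no free space (read-only table), stats collected at build time
CREATE UNIQUE INDEX IX_ORDERS_PK ON ORDERS (ORDER_ID)
  INCLUDE (CUST_ID)
  PCTFREE 0
  COLLECT STATISTICS;
```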
25. World Record Performance With TPC-C
Results (tpmC; higher is better):
– DB2 8.2 on IBM System p5 595 (64-core POWER5 1.9GHz, 2 TB RAM, 6,400 disks): 3,210,540 tpmC @ $5.07/tpmC, available May 14, 2005
– DB2 9.1 on IBM System p5 595 (64-core POWER5+ 2.3GHz, 2 TB RAM, 6,400 disks): 4,033,378 tpmC @ $2.97/tpmC, available January 22, 2007
– DB2 9.5 on IBM Power 595 (64-core POWER6 5.0GHz, 4 TB RAM, 10,900 disks): 6,085,166 tpmC @ $2.81/tpmC, available December 10, 2008
TPC Benchmark, TPC-C, and tpmC are trademarks of the Transaction Processing Performance Council.
Results current as of June 24, 2009. Check http://www.tpc.org for latest results.
26. World Record TPC-C Performance on x64 with Red Hat Linux
Results (tpmC; higher is better):
– DB2 9.5 on IBM System x3950 M2 (8-processor, 48-core Intel Xeon 7460 2.66GHz, RHEL 5.2): 1,200,632 tpmC @ $1.99/tpmC, available December 10, 2008
– SQL Server 2005 on HP DL580 G5 (8-processor, 32-core Intel Xeon 7350 2.93GHz, Windows Server 2003): 841,809 tpmC @ $3.46/tpmC, available April 1, 2008
TPC Benchmark, TPC-C, and tpmC are trademarks of the Transaction Processing Performance Council.
Results current as of June 24, 2009. Check http://www.tpc.org for latest results.
27. World Record 10 TB TPC-H Result on IBM Balanced Warehouse E7100
IBM System p6 570 & DB2 9.5 create the top 10TB TPC-H performance result
Results (QphH@10000GB; higher is better):
– DB2 Warehouse 9.5 on IBM System p6 570 (128-core POWER6 4.7GHz): 343,551 QphH @ $32.89 USD per QphH, available April 15, 2008
– Oracle 10g Enterprise Ed. R2 w/ Partitioning on HP Integrity Superdome-DC (128-core Intel Dual-Core Itanium 2 9140 1.6GHz): 208,457 QphH @ $27.97 USD per QphH, available September 10, 2008
– Oracle 10g Enterprise Ed. R2 w/ Partitioning on Sun Fire E25K (144-core Sun UltraSPARC IV+ 1.5GHz): 108,099 QphH @ $53.80 USD per QphH, available January 23, 2006
Key points:
• Significant proof-point for the IBM Balanced Warehouse E7100
• DB2 Warehouse 9.5 takes DB2 performance on AIX to new levels
• 65% faster than the best Oracle result
• Loaded 10TB of data @ 6 TB/hour (incl. data load, index creation, runstats)
TPC Benchmark, TPC-H, and QphH are trademarks of the Transaction Processing Performance Council.
Results current as of June 24, 2009. Check http://www.tpc.org for latest results.
28. World Record SAP 3-Tier SD Benchmark
This benchmark represents a 3-tier SAP R/3 environment in which the database resides on its own server, where database performance is the critical factor.
DB2 outperforms Oracle by 68% and SQL Server by 80%
– DB2 running on 32-way p5 595
– Oracle and SQL Server 2000 running on 64-way HP
Top SAP SD 3-tier results by DBMS vendor (SD users; higher is better):
– DB2 8.2 on 32-way p5 595: 168,300
– Oracle 10g on 64-way HP Integrity: 100,000
– SQL Server on 64-way HP Integrity: 93,000
Results current as of June 24, 2009. Check http://www.sap.com/benchmark for latest results.
29. More SAP Performance Than Any 8-Socket Server
15,600 SAP SD 2-Tier users on the IBM Power 750 Express with DB2 9.7 on AIX 6.1
Result comparable to a 32-socket, 128-core Sun M9000
Systems compared:
– 4 sockets: 32-core Sun T5440; 24-core Opteron; 32-core Power 750 Express
– 8 sockets: 48-core Opteron; 48-core Opteron
– 32 sockets: 128-core Sun M9000
Results current as of March 03, 2010. Check http://www.sap.com/benchmark for latest results.
30. Best SAP SD 2-Tier Performance with SAP ERP 6 EHP4
37,000 SAP users on SAP SD 2-Tier on the Power 780 with DB2: 20% more performance with 1/4 the number of cores vs. the Sun M9000
All results are with SAP ERP 6 EHP4 (SAP SD users; higher is better)
Systems compared:
– 4 sockets: Sun T5440 SPARC (4p/32c/256t); IBM x3850 X5 Nehalem-EX (4p/32c/64t) – #1 4-socket Windows result, with DB2; Power 750 with DB2 – #1 4-socket result
– 8 sockets: Sun X4640 Opteron (8p/48c/48t); Fujitsu 1800E Nehalem-EX (8p/64c/128t); IBM Power 780 with DB2 – #1 overall
– 32/64 sockets: Sun M9000 SPARC (32p/128c/256t); Sun M9000 SPARC (64p/256c/512t)
Details:
– IBM Power System 780, 8p/64c/256t, POWER7 3.8 GHz, 1024 GB memory, 37,000 SD users, dialog resp.: 0.98s, line items/hour: 4,043,670, dialog steps/hour: 12,131,000, SAPS: 202,180, DB time (dialog/update): 0.013s/0.031s, CPU utilization: 99%, OS: AIX 6.1, DB2 9.7, cert# 2010013
– Sun M9000, 64p/256c/512t, SPARC64 VII 2.88 GHz, 1156 GB memory, 32,000 SD users, Solaris 10, Oracle 10g, cert# 2009046
Results current as of April 07, 2010. Check http://www.sap.com/benchmark for latest results.
31. First to Publish the SPECjEnterprise2010 Benchmark
Multi-tier end-to-end performance benchmark for Java EE 5
Single-node result: 1,014.40 EjOPS
8-node cluster result: 7,903.16 EjOPS
– Approx. 38,500 tx/sec, 135,000 SQL/sec
– WAS 7 on 8x HS22 blades (Intel Xeon X5570, 2-socket/8-core)
– DB2 9.7 FP1 on x3850 M2 (Intel Xeon X7460, 4-socket/24-core), SLES 10 SP2
Result published on January 7, 2010. Results as of January 7, 2010.
32. More Efficient Performance Than Ever
3,000 Infor Baan ERP 2-Tier users on the IBM Power 750 Express using DB2 9.7
More performance, with less space and far less energy consumption than ever
Infor ERP LN benchmark results on POWER6 vs. POWER7:
                          P6 (p 570)    P7 (p 750)
Processor speed           5 GHz         3.55 GHz
No. of chips or sockets   8             2
Cores / chip              2             8
Total number of cores     16            16
Total memory              256 GB        256 GB
AIX version               6.1           6.1
DB2 version               9.7 GA        9.7 GA
# Infor Baan users        2,800         3,000
# users / core            175           187.5
# users / chip            350           1,500
33. Performance Improvements
DB2 9.7 has tremendous new capabilities that can
substantially improve performance
When you think about the new features …
– “It depends”
– We don’t know everything (yet)
– Your mileage will vary
– Please provide feedback!
35. Performance Advantages of the Threaded Architecture
Context switching between threads is generally faster than between
processes
– No need to switch address space
– Less cache “pollution”
Operating system threads require less context than processes
– Share address space, context information (such as uid, file handle table,
etc)
– Memory savings
Significantly fewer system file descriptors used
– All threads in a process can share the same file descriptors
– No need to have each agent maintain its own file descriptor table
36. From the Existing DB2 9 Deep Compression …
Reduce storage costs
Improve performance
Easy to implement
Reported compression advantage of DB2 9 vs. other databases: 1.5x, 2.0x, 3.3x, and 8.7x better
"With DB2 9, we're seeing compression rates up to 83% on the Data Warehouse. The projected cost savings are more than $2 million initially with ongoing savings of $500,000 a year." – Michael Henson
"We achieved a 43 per cent saving in total storage requirements when using DB2 with Deep Compression for its SAP NetWeaver BI application, when compared with the former Oracle database. The total size of the database shrank from 8TB to 4.5TB, and response times were improved by 15 per cent. Some batch applications and change runs were reduced by a factor of ten when using IBM DB2." – Markus Dellermann
37. Index Compression
What is index compression?
– The ability to decrease the storage requirements of indexes through compression.
– By default, if the table is compressed, the indexes created for the table are also compressed, including the XML indexes.
– Index compression can be explicitly enabled/disabled when creating or altering an index.
Why do we need index compression?
– Index compression reduces disk cost and TCO (total cost of ownership).
– Index compression can improve the runtime performance of queries that are I/O bound.
When does index compression work best?
– Indexes on tables in large-RID DMS tablespaces (the default since DB2 9).
– Indexes that have low key cardinality and a high cluster ratio.
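Controlling index compression explicitly in DB2 9.7 can be sketched as follows (index and table names hypothetical):

```sql
-- Enable compression when creating an index
CREATE INDEX IX_SALES_DATE ON SALES (SALE_DATE) COMPRESS YES;

-- Or toggle it on an existing index; an index REORG then applies the change
ALTER INDEX IX_SALES_DATE COMPRESS NO;
```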
38. Index Compression
How does index compression work?
– DB2 considers multiple compression algorithms to attain maximum index space savings through index compression.
[Diagram: a pre-DB2 9.7 index page, with a page header, a fixed slot directory (maximum size reserved), and index keys (e.g. "AAAB, 1, CCC") each followed by an uncompressed RID list.]
39. Index Compression
Variable slot directory
– In 9.7, the slot directory is dynamically adjusted in order to fit as many keys into an index page as possible.
[Diagram: a DB2 9.7 index page showing the space saved by the variable slot directory.]
40. Index Compression
RID list compression
– Instead of storing the full version of each RID, DB2 saves space by storing the delta between two RIDs.
– RID list compression is enabled when there are 3 or more RIDs in an index page.
[Diagram: a DB2 9.7 index page where each RID list stores a first RID followed by compressed deltas (e.g. "3011, 14, 1, 1, 2, 4, 2, 1, 1"), with space saved from both the variable slot directory and the RID lists.]
41. Index Compression
Prefix compression
– Instead of storing all key values in full, DB2 saves space by storing a common prefix plus suffix records.
– During index creation or insertion, DB2 compares the new key with adjacent index keys and finds the longest common prefix between them.
[Diagram: a DB2 9.7 index page where keys such as "AAAB, 1, CC" and "BBBZ, 1, ZZ" are stored as common prefixes with suffix records, plus compressed RID lists and space saved from the variable slot directory.]
42. Index Compression – Performance Results
Simple index compression tests – elapsed time (seconds; lower is better): the simple select ran as fast with compression as without (49.1 vs. 49.2 seconds), while the simple insert, update, and delete tests ran 16%, 18%, and 19% faster with index compression.
Estimated index compression savings across 7 warehouse databases (percentage of indexes compressed): 16%, 20%, 24%, 31%, 50%, 55%, 57% – an average of 36%.
Results in a nutshell:
– Index compression uses idle CPU cycles, and cycles otherwise spent waiting for I/O, to compress and decompress index data.
– When not CPU bound, all inserts, deletes, and updates achieve better performance.
43. Temp Table Compression
What is temp table compression?
– The ability to decrease storage requirements by compressing temp table data.
– Temp tables created as a result of the following operations are compressed by default:
• Temps from sorts
• Created global temp tables
• Declared global temp tables
• Table queues (TQs)
Why do we need temp table compression on relational databases?
– Temp table spaces can account for up to 1/3 of the overall tablespace storage in some database environments.
– Temp compression reduces disk cost and TCO (total cost of ownership).
44. Temp Table Compression
How does temp table compression work?
– It extends the existing row-level (Lempel-Ziv) compression mechanism for permanent tables to temp tables.
– A dictionary is created from sample data: repeated strings across rows (e.g. "Canada|Ontario|Toronto|Matthew") are replaced by short dictionary symbols, and the compressed rows are what get stored.
45. Temp Table Compression – Performance Results
Space savings for complex warehouse queries with temp compression (lower is better): 78.3 GB stored without temp compression vs. 50.2 GB with it – a 35% space saving.
Elapsed time for complex warehouse queries with temp compression (lower is better): 183.98 minutes without vs. 175.56 minutes with – 5% faster.
CPU analysis: with temp compression, cycles previously lost to idle/iowait become effective CPU usage.
Results in a nutshell:
– For affected temp-compression-enabled complex queries, an average of 35% temp tablespace space savings was observed. For the 100GB warehouse database setup, this sums up to over 28GB of saved temp space.
46. XML Data Compression
What is XML data compression?
– The ability to decrease the storage requirements of XML data through compression.
– XML compression extends row compression support to XML documents.
– If row compression is enabled for the table, the XML data is also compressed; if row compression is not enabled, the XML data is not compressed either.
Why do we need XML data compression?
– Compressing XML data can improve storage efficiency and the runtime performance of queries that are I/O bound.
– XML compression reduces disk cost and TCO (total cost of ownership) for databases with XML data.
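Since XML compression rides on row compression, enabling it looks the same as for relational data; a sketch with a hypothetical table:

```sql
-- XML column in a table with row compression enabled;
-- documents stored in the XDA are compressed with their own dictionary
CREATE TABLE CUSTOMER_DOCS (
  ID  INTEGER NOT NULL,
  DOC XML
) COMPRESS YES;
```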
47. XML Data Compression
How does XML data compression work?
– Small XML documents (< 32KB) can be inlined with the relational data in the row, and the entire row is compressed (available since DB2 9.5).
– Larger XML documents that reside in a data area separate from the relational data can also be compressed. By default, DB2 places XML data in the XDA to handle documents up to 2GB in size.
– XML compression relies on a separate dictionary from the one used for row compression.
[Diagram: uncompressed vs. compressed storage – inlined XML (< 32KB) compressed with the row using dictionary #1; larger XML (32KB–2GB) in the XDA compressed using dictionary #2.]
48. XML Data Compression – Performance Results
XML compression savings across 7 XML customer databases (percentage compressed; higher is better): 43%, 61%, 63%, 63%, 74%, 77%, 77% – an average of 67%.
Average elapsed time for SQL/XML and XQuery queries over an XML-and-relational database using XDA compression (lower is better): 31.1 seconds without XML compression vs. 19.7 seconds with it – 37% faster.
Results in a nutshell:
– Significantly improved query performance for I/O-bound workloads.
– Achieved 30% faster maintenance operations such as RUNSTATS, index creation, and import.
– Average compression savings of about ⅔ across the 7 XML customer databases, and about ¾ space savings for 3 of those 7 databases.
49. Range Partitioning with Local Indexes
What does range partitioning with local indexes mean?
– A partitioned index is an index which is divided across multiple storage objects, one per data partition, and is partitioned in the same manner as the table data.
– Local indexes can be created using the PARTITIONED keyword when creating an index on a partitioned table. (Note: MDC block indexes are partitioned by default.)
Why do we need range partitioning with local indexes?
– Improved ATTACH and DETACH partition operations
– More efficient access plans
– More efficient REORGs
When does range partitioning with local indexes work best?
– When frequent roll-in and roll-out of data are performed
– When one tablespace is defined per range
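A sketch of a range-partitioned table with a local (partitioned) index; all names and ranges are hypothetical:

```sql
-- Monthly ranges; each data partition gets its own index object
CREATE TABLE SALES (
  SALE_DATE DATE NOT NULL,
  AMOUNT    DECIMAL(10,2)
)
PARTITION BY RANGE (SALE_DATE)
  (STARTING '2009-01-01' ENDING '2009-12-31' EVERY 1 MONTH);

-- The PARTITIONED keyword requests a local index
CREATE INDEX IX_SALES_DATE ON SALES (SALE_DATE) PARTITIONED;
```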
50. Range Partitioning with Local Indexes – Performance Results
Index size comparison – leaf page count (lower is better): 18,409 leaf pages for a global index on a range-partitioned table vs. 13,476 for a local index – 25% space savings.
Total time and log space required to ATTACH 1.2 million rows (lower is better): V9.5 global indexes required 651.84 MB of log space, vs. well under 1 MB for V9.7 local indexes (whether built during or before the ATTACH) and for the no-indexes baseline.
Results in a nutshell:
Partition maintenance with ATTACH:
– 20x speedup compared to DB2 9.5 global indexes because of reduced index maintenance.
– 3000x less log space used than with DB2 9.5 global indexes.
Asynchronous index maintenance on DETACH is eliminated.
Local indexes occupy fewer disk pages than 9.5 global indexes:
– 25% space savings is typical.
– 12% query speedup over global indexes for index queries – fewer page reads.
51. Scan Sharing
What is scan sharing?
– The ability of one scan to exploit the work done by another scan. This feature targets heavy scans such as table scans or MDC block index scans of large tables.
– Scan sharing is enabled by default in DB2 9.7.
Why do we need scan sharing?
– Improved concurrency
– Faster query response times
– Increased throughput
When does scan sharing work best?
– On workloads where several clients run similar queries (simple or complex) that involve the same heavy scanning mechanism (table scans or MDC block index scans).
52. Scan Sharing
How does scan sharing work?
– When applying scan sharing, a scan may start somewhere other than the usual beginning, to take advantage of pages that are already in the buffer pool from scans that are already running.
– When a sharing scan reaches the end of the file, it starts over at the beginning and finishes when it reaches the point where it started.
– Eligibility for scan sharing and for wrapping is determined automatically in the SQL compiler.
– In DB2 9.7, scan sharing is supported for table scans and block index scans.
[Diagram: in an unshared scan, scans A and B each read pages 1–8 independently, re-reading pages and causing extra I/O; in a shared scan, B joins A mid-scan, the two share pages, and B wraps around to read the pages it missed.]
53. Scan Sharing – Block Index Scan and Table Scan Tests
Block index scan test – Q1 (CPU intensive) and Q6 (I/O intensive) interleaved, staggered every 10 seconds (elapsed time; lower is better):
– MDC block index scan sharing shows a 47% average query improvement.
– The fastest query shows up to a 56% runtime gain with scan sharing.
Scan sharing tests on table scans – average of running 100 instances of Q1 (seconds; lower is better): 1,284.6 without scan sharing vs. 90.3 with it.
– 100 concurrent table scans now run 14 times faster with scan sharing!
54. Scan Sharing – Warehouse Throughput
Complex queries per hour for a 10GB warehouse database, 16 parallel streams (higher is better): 381.92 with scan sharing OFF vs. 636.43 with scan sharing ON.
Results in a nutshell:
– When running 16 concurrent streams of complex queries in parallel, a 67% increase in throughput is attained when using scan sharing.
– Scan sharing works fully under the UR and CS isolation levels and, by design, has limited applicability under the RR and RS isolation levels.
55. XML Scalability on InfoSphere Warehouse (a.k.a. DPF)
What does it mean?
– Tables containing XML column definitions can now be stored and distributed on any partition.
– XML data processing is optimized based on the partitions.
Why do we need XML in database-partitioned environments?
– As customers adopt the XML datatype in their warehouses, XML data needs to scale just as relational data does.
– XML data also benefits from the same performance improvements attained through parallelization in DPF environments.
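Distributing a table with an XML column across partitions can be sketched as follows (names hypothetical):

```sql
-- XML columns are allowed in hash-distributed tables in DB2 9.7;
-- the distribution key itself must be a relational column
CREATE TABLE ORDERS_XML (
  ORDER_ID INTEGER NOT NULL,
  DETAILS  XML
) DISTRIBUTE BY HASH (ORDER_ID);
```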
56. XML Scalability on InfoSphere Warehouse (a.k.a. DPF)
Simple and complex query tests measured the elapsed-time speedup from a 4-partition setup to an 8-partition setup (elapsed time 4P/8P) for operations such as count (with and without index), grouped aggregation, update, and collocated/non-collocated joins.
Results in a nutshell:
– The results show the elapsed-time speedup of complex queries from a 4-partition setup to an 8-partition setup. Queries tested have a similar star-schema balance for relational and XML.
– Each query was run in 2 or 3 equivalent variants:
• Completely relational ("rel")
• Completely XML ("xml")
• XML extraction/predicates with relational joins ("xmlrel") (join queries only)
– XML queries/updates/deletes scale as well as relational ones.
– Average XML query speedup is 96% of relational.
57. Statement Concentrator
What is the statement concentrator?
– A technology that allows dynamic SQL statements that are identical except for the values of their literals to share the same access plan.
– The statement concentrator is disabled by default, and can be enabled either through the database configuration parameter (STMT_CONC) or via the prepare attribute.
Why do we need the statement concentrator?
– This feature is aimed at OLTP workloads where simple statements are repeatedly generated with different literal values. In these workloads, the cost of recompiling the statements many times adds significant overhead.
– The statement concentrator avoids this compilation overhead by allowing the compiled statement to be reused, regardless of the values of the literals.
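Enabling it at the database level, for a hypothetical database MYDB:

```sql
-- LITERALS tells DB2 to reuse access plans across statements
-- that differ only in their literal values
UPDATE DB CFG FOR MYDB USING STMT_CONC LITERALS;
```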
58. Statement Concentrator – Performance Results
Effect of the statement concentrator on prepare times for 20,000 statements using 20 users (seconds; lower is better): 436 with the concentrator off vs. 23 with it on – a 19x reduction in prepare time.
Effect of the statement concentrator on an OLTP workload (throughput; higher is better): 133 with the concentrator off vs. 180 with it on – a 35% improvement.
Results in a nutshell:
– The statement concentrator allows prepares to run up to 25x faster for a single user and 19x faster for 20 users.
– The statement concentrator improved throughput by 35% in a typical OLTP workload using 25 users.
59. Currently Committed
What is Currently Committed?
– Currently Committed semantics were introduced in DB2 9.7 to improve concurrency: under Cursor Stability (CS) isolation, readers are no longer blocked waiting for writers to release row locks.
– Readers are given the last committed version of the data, that is, the version prior to the start of the write operation.
– Currently Committed is controlled with the CUR_COMMIT database configuration parameter.
Why do we need the Currently Committed feature?
– Customers running high-throughput database applications cannot tolerate waiting on locks during transaction processing and require non-blocking behavior for read transactions.
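Enabling the behavior via the configuration parameter named above, for a hypothetical database MYDB:

```sql
-- ON is the default for new DB2 9.7 databases; databases upgraded
-- from earlier releases may need it enabled explicitly
UPDATE DB CFG FOR MYDB USING CUR_COMMIT ON;
```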
60. Currently Committed – Performance Results
Throughput of an OLTP workload using Currently Committed (transactions per second; higher is better): 981.25 with Currently Committed disabled vs. 1,260.89 with it enabled – 28% more throughput.
CPU analysis: with Currently Committed enabled, idle CPU drops sharply and user CPU rises – effective CPU usage increases.
Results in a nutshell:
– By enabling Currently Committed, we use CPU that was previously idle (18%), leading to an increase of over 28% in throughput.
– With Currently Committed enabled, LOCK WAIT time is reduced by nearly 20%.
– We observe expected increases in LSN GAP cleaners and increased logging.
61. LOB Inlining
Why do we need the LOB Inlining feature?
Performance improves for queries that access inlined LOB data, as no additional I/O is required to fetch the LOB data.
LOBs are prime candidates for compression given their size and the type of data they represent. By inlining LOBs, the data becomes eligible for compression, allowing further space and I/O savings.
What is LOB Inlining?
LOB inlining allows customers to store LOB data within the formatted data row in a data page instead of creating a separate LOB object.
Once the LOB data is inlined into the base table row, it is eligible to be compressed.
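As an illustrative sketch (the table and column names are hypothetical), the INLINE LENGTH clause controls how much LOB data is kept in the row:

```sql
-- CLOB values up to 4000 bytes are stored in the base table row
-- (and thus become compressible); larger values are stored in the
-- separate LOB object as before
CREATE TABLE documents (
    id  INTEGER NOT NULL PRIMARY KEY,
    doc CLOB(1M) INLINE LENGTH 4000
) COMPRESS YES;
```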
59
62. LOB Inlining
[Chart: % improvement of inlined vs. non-inlined LOBs by LOB size (8 KB, 16 KB, 32 KB), higher is better. Insert performance: 75%, 75%, 64%; Select performance: 55%, 70%, 65%; Update performance: 7%, 22%, 30%.]
Results in a Nutshell
INSERT and SELECT operations benefit the most; the smaller the LOB, the bigger the benefit of inlining.
For UPDATE operations, the larger the LOB, the better the improvement.
We can expect inlined LOBs to perform the same as a VARCHAR(N+4).
60
63. Summary of Key DB2 9.7 Performance Features
Compression for indexes, temp tablespaces and XML data results in space savings and better performance.
Range Partitioning with local indexes results in space savings and better performance, including increased concurrency for certain operations such as REORG and SET INTEGRITY. It also makes roll-in and roll-out of data more efficient.
Scan Sharing improves workloads that have multiple heavy scans on the same table.
XML scalability allows customers to exploit the same benefits in data warehouses as exist for relational data.
Statement Concentrator improves the performance of queries that use literals by reducing their prepare times.
Currently Committed increases throughput and reduces contention on locks.
LOB Inlining allows this type of data to be eligible for compression.
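The partitioning and index compression features above can be sketched in DDL (the table and index names are illustrative):

```sql
-- Range-partitioned table, one partition per quarter of 2009
CREATE TABLE sales (
    sale_date DATE NOT NULL,
    amount    DECIMAL(10,2)
)
PARTITION BY RANGE (sale_date)
(STARTING '2009-01-01' ENDING '2009-12-31' EVERY 3 MONTHS);

-- PARTITIONED creates a local index per data partition (making
-- roll-in/roll-out efficient); COMPRESS YES enables the DB2 9.7
-- index compression feature
CREATE INDEX sales_date_ix ON sales (sale_date) PARTITIONED COMPRESS YES;
```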
61
64. A glimpse at the Future
Expect more leadership benchmark results on POWER7
and Nehalem EX
Preparing for new workloads
– Combined OLTP and Analytics
Preparing for new operating environments
– Virtualization
– Cloud
– Power-aware
Preparing for new hardware
– SSD storage
– POWER7
– Nehalem EX
62
65. Conclusion
DB2 is the performance benchmark leader
New features in DB2 9.7 that further boost performance
– For BOTH the OLTP and Data warehouse areas
Performance is a critical and integral part of DB2!
– Maintaining excellent performance
• On current hardware
• Over the course of DB2 maintenance
– Preparing for future hardware/OS technology
63
66. Appendix – Mandatory SAP publication data
Required SAP Information
For more information regarding these results and SAP benchmarks, visit www.sap.com/benchmark.
These benchmarks fully comply with the SAP Benchmark Council regulations and have been audited and certified by SAP AG
SAP 3-tier SD Benchmark:
168,300 SD benchmark users. SAP R/3 4.7. 3-tier with database server: IBM eServer p5 Model 595, 32-way SMP, POWER5 1.9 GHz, 32 KB(D) + 64 KB(I)
L1 cache per processor, 1.92 MB L2 cache and 36 MB L3 cache per 2 processors. DB2 v8.2.2, AIX 5.3 (cert # 2005021)
100,000 SD benchmark users. SAP R/3 4.7. 3-tier with database server: HP Integrity Model SD64A, 64-way SMP, Intel Itanium 2 1.6 GHz, 32 KB L1 cache,
256 KB L2 cache, 9 MB L3 cache. Oracle 10g, HP-UX11i (cert # 2004068)
93,000 SD benchmark users. SAP R/3 4.7. 3-tier with database server: HP Integrity Superdome 64P Server, 64-way SMP, Intel Itanium 2 1.6 GHz, 32 KB L1
cache, 256 KB L2 cache, 9 MB L3 cache . SQL Server 2005, Windows 2003 (cert # 2005045)
SAP 3-tier BW Benchmark:
311,004 query navigation steps/hour. SAP BW 3.5. Cluster of 32 servers, each an IBM x346 Model 884041U, 1 processor / 1 core / 2 threads,
Intel XEON 3.6 GHz, L1 Execution Trace Cache, 2 MB L2 cache, 2 GB main memory. DB2 8.2.3, SLES 9. (cert # 2005043)
SAP TRBK Benchmark:
15,519,000. Day processing no. of postings to bank accounts/hour. SAP Deposit Management 4.0. IBM System p570, 4 core, POWER6, 64GB RAM. DB2 9
on AIX 5.3. (cert # 2007050)
10,012,000 Day processing no. of postings to bank accounts/hour. SAP Account Management 3.0. Sun Fire E6900, 16 core, UltraSPARC IV, 56 GB RAM,
Oracle 10g on Solaris 10. (cert # 2006018)
8,279,000 Day processing no. of postings to bank accounts/hour. SAP Account Management 3.0. HP rx8620, 16 core, HP mx2 DC, 64 GB RAM, SQL Server
on Windows Server (cert # 2005052)
SAP 2-tier SD Benchmark:
39,100 SD benchmark users, SAP ECC 6.0. Sun SPARC Enterprise Server M9000, 64 processors / 256 cores / 512 threads, SPARC64 VII, 2.52 GHz, 64
KB(D) + 64 KB(I) L1 cache per core, 6 MB L2 cache per processor, 1024 GB main memory, Oracle 10g on Solaris 10. (cert # 2008-042-1)
35,400 SD benchmark users, SAP ECC 6.0. IBM Power 595, 32 processors / 64 cores / 128 threads, POWER6 5.0 GHz, 128 KB L1 cache and 4 MB L2
cache per core, 32 MB L3 cache per processor, 512 GB main memory. DB2 9.5, AIX 6.1. (Cert# 2008019).
30,000 SD benchmark users. SAP ECC 6.0. HP Integrity SD64B , 64 processors/128 cores/256 threads, Dual-Core Intel Itanium 2 9050 1.6 GHz, 32 KB(I) +
32 KB(D) L1 cache, 2 MB(I) + 512 KB(D) L2 cache, 24 MB L3 cache, 512 GB main memory. Oracle 10g on HP-UX 11iV3. (cert # 2006089)
23,456 SD benchmark users. SAP ECC 5.0. Central server: IBM System p5 Model 595, 64-way SMP, POWER5+ 2.3GHz, 32 KB(D) + 64 KB(I) L1 cache per
processor, 1.92 MB L2 cache and 36 MB L3 cache per 2 processors. DB2 9, AIX 5.3 (cert # 2006045)
20,000 SD benchmark users. SAP ECC 4.7. IBM eServer p5 Model 595, 64-way SMP, POWER5, 1.9 GHz, 32 KB(D) + 64 KB(I) L1 cache per processor, 1.92
MB L2 cache and 36 MB L3 cache per 2 processors, 512 GB main memory. (cert # 2004062)
64