IBM Informix indexing techniques:
which one to use when?
Eric Vercelletto Session A12
Begooden IT Consulting 4/23/2013 3:35 PM
Agenda / methodology
• Introduction to Response Time measuring
• Identify the relevant indexing techniques
• Describe the implementation method
• Confirm/recognize its use by accurate monitoring
• Measure its efficiency as response time and
effective use in the database (sqltrace, sqexplain)
• Identify pros and cons
Introduction
• Begooden IT Consulting is an IBM ISV company, mainly
focused on Informix technology services.
• Our 15+ years of experience within Informix Software
France and Portugal helped us to acquire in-depth
product knowledge as well as solid field experience.
• Our services include Informix implementation auditing,
performance tuning, issue management, administration
mentoring …
• We also happen to be the Querix reseller for France and
French-speaking countries (except Québec and Louisiana)
• The company is based in Pont l’Abbé, Finistère, France
Some basics not to forget about
There are 2 ways to measure response times
• The « cold » measure: the response time is measured just after
starting the engine, when data and index pages are not yet loaded
into Shared Memory IFMX buffers. Disk IO must be performed to
read the data and index pages, which will increase the RT.
• The « hot » measure: RT is measured when data and index pages
are loaded into SHMEM. Little or no disk IO => RT is much shorter.
• This point can often explain surprising RT differences, depending on
how the data is accessed.
• Broad range or DS queries most often access data and/or indexes in
disk pages
• OLTP queries mostly access data and indexes in SHMEM pages
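The cold/hot distinction can be pictured with a toy buffer pool. This is only a sketch: the `BufferPool` class and its counters are hypothetical names, and the pool is assumed large enough to hold the whole table (no page eviction), which is not always true on a real server.

```python
class BufferPool:
    """Toy buffer cache: a page comes from 'disk' only the first time it is requested."""

    def __init__(self):
        self.cached_pages = set()
        self.disk_reads = 0
        self.buffer_reads = 0

    def read_page(self, page_no):
        self.buffer_reads += 1  # every access goes through the buffer pool
        if page_no not in self.cached_pages:
            self.disk_reads += 1  # cache miss: the page must be read from disk
            self.cached_pages.add(page_no)

    def scan(self, page_count):
        """Scan pages 0..page_count-1; return (disk_reads, buffer_reads) for this scan only."""
        disk_before, buffer_before = self.disk_reads, self.buffer_reads
        for page_no in range(page_count):
            self.read_page(page_no)
        return self.disk_reads - disk_before, self.buffer_reads - buffer_before


if __name__ == "__main__":
    pool = BufferPool()
    print("cold scan (disk reads, buffer reads):", pool.scan(1000))
    print("hot scan  (disk reads, buffer reads):", pool.scan(1000))
```

The second scan performs the same number of buffer reads but no disk reads, which is the whole difference between the two measures.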
Derived thoughts and facts
• Reading data pages and/or index pages on disk always takes
longer than in SHMEM. Full table scans can take minutes or
more, depending on table size
• Reading data pages in SHMEM is very fast. A full scan of a
table in SHMEM takes fractions of a second or seconds, rarely
more.
• Reading index pages in SHMEM is also very fast. In addition,
due to the B-TREE structure, an index page generally holds
more entries than a data page.
• This often makes it difficult to compare the efficiency
of 2 different indexes on the same table when reading in
SHMEM.
Derived thoughts and facts (continued)
• When running hot measures on indexes, the differences
can be as low as milliseconds BUT …
• Repeating 3 unnecessary milliseconds millions of times can
make a difference!
• When Response Times get to such a low level, sqltrace
is the tool you need to understand the query behaviour.
• In certain situations, saving milliseconds on a query will
make the difference. In other situations, saving seconds will
not make the difference.
• A bad response time can be caused by inappropriate
indexing, but can also be caused by some « unusual »
logic that adds unnecessary work for the
applications and the server.
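To put numbers on the « 3 milliseconds repeated millions of times » remark, here is a small arithmetic sketch. The execution counts are illustrative assumptions, not measurements from this session.

```python
def wasted_time_seconds(extra_ms_per_query, executions):
    """Total time lost when each execution carries `extra_ms_per_query` of avoidable work."""
    return extra_ms_per_query * executions / 1000.0


def format_duration(seconds):
    """Render a number of seconds as 'Xh Ym Zs'."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Hypothetical workloads: the same 3 ms overhead at three different volumes.
    for executions in (1_000, 1_000_000, 10_000_000):
        lost = wasted_time_seconds(3, executions)
        print(f"{executions:>10} executions x 3 ms = {format_duration(lost)}")
```

At one million executions the 3 ms overhead already adds up to 50 minutes of cumulated server time.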
Comparing cold measure with hot measure (1)
• Full scan of a mid-sized table, tpcc:order_line,
containing 24 million rows
select * from order_line
• onstat -g his output:
« Cold » read, performed just after oninit -v:
many disk pages read, 47.4 secs
« Hot » read, performed after the first scan:
zero disk pages read, all buffer reads, 19.4 secs
Comparing cold measure with hot measure (2)
• Use of a poor selectivity index, cold and hot
select * from order_line where ol_w_id = 10
(duplicate index on w_id, 50 distinct values)
Cold read: many disk reads, execution time 5.9 secs
Hot read: few disk reads, execution time 1.1 secs
BATCHEDREAD_INDEX: description
• This feature has been taken from XPS and
introduced in 11.50xC5.
• The purpose is to optimize index key access
by grouping the reading of many index keys into
large buffers, then fetching the rows associated
with those keys
• This technique brings strong savings in terms of
CPU and IO, therefore reducing Response Time.
• This technique is suitable and efficient for
massive index reads (DS/OLAP), not for
pinpoint-type (OLTP) index access.
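The grouping idea can be sketched as follows. This is a conceptual model only, not the engine's implementation: `batched` and `count_fetch_calls` are hypothetical helpers, and the batch size is an arbitrary assumption since the real buffer size is not stated in this session.

```python
from itertools import islice


def batched(keys, batch_size):
    """Yield successive lists of at most `batch_size` keys."""
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_fetch_calls(key_count, batch_size):
    """Number of 'fetch rows for these keys' round trips needed for key_count keys."""
    return sum(1 for _ in batched(range(key_count), batch_size))


if __name__ == "__main__":
    print("one key at a time :", count_fetch_calls(1000, 1), "fetch calls")
    print("batches of 100    :", count_fetch_calls(1000, 100), "fetch calls")
```

The saving only matters when many keys are read in a row, which matches the remark that the technique suits massive index reads rather than pinpoint access.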
BATCHEDREAD_INDEX: the test
• We will run the following query against a
30-million-row clients table. The table has an
index on ‘lastname’. Row size is 328 bytes
output to /dev/null
select lastname,count(*)
from clients
group by 1
• This query returns 2,188,286 rows
BATCHEDREAD_INDEX: facts
• All those response times are measured as « cold »
• AUTO_READAHEAD 0
BATCHEDREAD_INDEX 0
• AUTO_READAHEAD 0
BATCHEDREAD_INDEX 1
• AUTO_READAHEAD 1
BATCHEDREAD_INDEX 1
See the difference
BATCHEDREAD_INDEX: how ?
• BATCHEDREAD_INDEX, like BATCHEDREAD_TABLE, can be
set in the onconfig file
• Or as an environment variable before
launching the application
export IFX_BATCHEDREAD_INDEX=1
• Or as an SQL statement
SET ENVIRONMENT IFX_BATCHEDREAD_INDEX '1';
• Monitor index scan activity with onstat -g scn
Attached or Detached Index?
• The « Antique Informix Disk Layout » used to create the index pages in the same
extents as the data pages for attached indexes. The expected result was
reduced disk IO.
• This layout became a problem because the data pages were often
located far from the index pages, causing the opposite effect of increasing disk IO.
The official recommendation at that time was to create detached indexes for this
reason.
• Nowadays, index pages are created in a different partition than the data pages,
so attached indexes have the same level of performance as
detached indexes.
• But... if you can create the data dbspaces and the index
dbspaces on independent disks and channels, you will increase your disk IO
performance by reducing disk contention.
• This gain will be observed mainly during intensive sessions doing massive data
changes.
• Watch the output of onstat -g iof and look for low IO throughput per second.
Few columns or many columns in the same index?
Key points to consider
• Remember about « cold » reads and « hot » reads when
testing the efficiency of an index. Results can be
dramatically different between cold and hot.
• As often, the choice is a hard-to-reach trade-off, and
definitely a long subject to discuss!
• Many columns in an index can make it more selective, but it
will also consume more CPU/disk resources when updating
keys (see b-tree cleaner tuning)
• Few columns in an index can make it less selective, but it
will consume fewer CPU/disk resources when updating keys
• Integrity constraints are not negotiable, but some integrity
constraint indexes can be negotiated…
Few columns or many columns?
Techniques to evaluate efficiency
• time dbaccess dbname queryfile gives an
indication of the efficiency of an index, but can be
misleading due to the huge differences between cold
and hot measures.
• onmode -Y sessnum 1 will identify which
index(es) are used, and will also report how many rows
have been scanned against how many rows have been
returned
• onstat -g his (sqltrace) will give fine detail
about response time, buffer and disk access, lock waits
etc…
• A complete diagnosis is obtained by combining the 3 tools.
Few columns or many columns?
Let’s analyze a real case: one column
• 1-column index
• Rows scanned: 4913
• Buffer reads: 5900
• Response time: 0.0368 secs
Few columns or many columns?
Same case, index with 2 columns
• 2-column index
• Rows scanned: 106
• Buffer reads: 122
• Response time: 0.0047 secs
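The two measurements above can be compared metric by metric. The figures are the ones shown on the slides; the dictionary keys and the `improvement_ratios` helper are hypothetical names used only for this sketch.

```python
# Figures taken from the two measurements above.
one_column = {"rows_scanned": 4913, "buffer_reads": 5900, "response_time_s": 0.0368}
two_columns = {"rows_scanned": 106, "buffer_reads": 122, "response_time_s": 0.0047}


def improvement_ratios(before, after):
    """Return the before/after ratio for every metric present in both measurements."""
    return {name: before[name] / after[name] for name in before if name in after}


if __name__ == "__main__":
    for name, ratio in improvement_ratios(one_column, two_columns).items():
        print(f"{name}: {ratio:.1f} times lower with the 2-column index")
```

Rows scanned and buffer reads drop by a much larger factor than the response time, which is typical of a hot measure: the work saved is real, but each buffer read is cheap.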
Indexes with highly duplicated lead columns:
how was life before?
• The Antique Informix Rule stated to avoid
multi-column indexes with low selectivity for the
leading keys, due to poor efficiency.
Ex: warehouse_id,district_id,order_id,order_line
• Querying on order_line required specifying the
lead columns in the query predicate, or creating
another index with order_line as lead column
• Restructuring indexes following those rules was a
complex, long and risky task, not to mention the
fact that any downtime due to index rebuilding
was poorly accepted by Operations Managers…
Index key first & self join : it’s magic!
• The key-first scan was introduced in 7.3. It has been enhanced so
that an index can be used even if the lead columns are not specified
in the where clause
• The index self join technique was introduced in IDS 11.10,
although many DBAs didn’t even notice it!
• By scanning subsets of the poorly selective composite index, the
engine manages to use the non-leading index keys as index
filters, transforming the index into a highly selective index.
• Hierarchical-like indexes with highly duplicated lead columns now
need no redefinition to be efficient.
• You need not build new indexes with highly selective lead
columns. This saves optimizer work and disk space.
• Index self join is enabled by default. If you insist on not
using it, you can disable it either by setting INDEX_SELFJOIN 0 in onconfig or
with an optimizer directive {+AVOID_INDEX_SJ}
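The difference between a plain key-first scan and a self join can be pictured with a sorted list standing in for the composite index. This is a simplified model under stated assumptions, not the engine's algorithm: the function names are hypothetical, the list of lead-key prefixes is handed in ready-made (the engine finds them by reading the index itself), and the page reads needed for each probe are not counted.

```python
from bisect import bisect_left


def build_index(warehouses, districts, orders):
    """Sorted list of (w_id, d_id, o_id) keys, standing in for a composite b-tree."""
    return [(w, d, o)
            for w in range(1, warehouses + 1)
            for d in range(1, districts + 1)
            for o in range(1, orders + 1)]


def key_first_scan(index, o_id):
    """Walk every key and filter on o_id: all keys are touched."""
    touched, matches = 0, []
    for key in index:
        touched += 1
        if key[2] == o_id:
            matches.append(key)
    return matches, touched


def self_join_scan(index, lead_prefixes, o_id):
    """Probe the index once per (w_id, d_id) prefix and read only the matching keys."""
    touched, matches = 0, []
    for w, d in lead_prefixes:
        pos = bisect_left(index, (w, d, o_id))  # pinpoint descent for this prefix
        while pos < len(index) and index[pos] == (w, d, o_id):
            matches.append(index[pos])
            touched += 1
            pos += 1
    return matches, touched


if __name__ == "__main__":
    index = build_index(warehouses=5, districts=10, orders=100)
    prefixes = [(w, d) for w in range(1, 6) for d in range(1, 11)]
    print("key-first keys touched:", key_first_scan(index, 42)[1])
    print("self join keys touched:", self_join_scan(index, prefixes, 42)[1])
```

Both scans return the same keys; the self join simply replaces one long walk through poorly selective lead keys with many short, highly selective probes.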
Index self-join: the test
• We will use the order_line TPC-C table, which contains
23,735,211 rows
• The index follows the hierarchy, which was formerly
considered a poor implementation:
ol_w_id: warehouse id (50 distinct values)
ol_d_id: district id (10 distinct values)
ol_o_id: order number (9279 distinct values)
ol_number: order line number (14 distinct values)
• The challenging query is
SELECT ol_d_id,ol_o_id,avg(ol_quantity),avg(ol_amount)
FROM order_line
GROUP BY 1,2
ORDER BY 2,3
No self join
• Use onmode -wm INDEX_SELFJOIN=0 to disable self join
• Index is taken, but only key-first
• Many rows scanned
• Response time: 11.258 secs
Self join: find the differences!
• Key-first + self join access
• Rows scanned: about 100 times fewer
• RT: 3.313 secs
The Antique Informix Rule (A.I.R.) says:
“you will use only one index per table”
• The Antique Informix Rule stated that only one
index per table could be used
• The optimizer had to choose only one index
among several indexes for the same table,
even when several indexes were needed.
• Many not-so-unrealistic query cases had to be
drastically rewritten in order to provide
acceptable response times
• The trick was generally to use a UNION or a
nested query, but the query code readability and
maintainability suffered from that.
What A.I.R. obliged you to do
• Generally, the best way to work around the RT
issue was to use either UNION or nested queries
• The trick could be efficient in terms of Response
Time, but the code got more complex to read and
to maintain
• This workaround required heavy changes to the
application code, and needed detailed and
accurate tests to obtain the same results as with
the initial query
The optimizer constantly getting
smarter across releases
• An optimizer enhancement introduced the use
of several indexes on the same table, but only
if the where clauses were linked with the ‘OR’
operator.
• The query path is like a usual INDEX PATH, the
difference being the use of several indexes
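The idea of answering OR-linked conditions with several indexes on one table can be sketched as below. It is a conceptual model only: `build_value_index` and `multi_index_or` are hypothetical helpers, the sample rows are invented, and nothing here claims to reproduce how the engine stores or merges row ids.

```python
def build_value_index(rows, column):
    """Map each value of `column` to the list of row ids holding it."""
    index = {}
    for rowid, row in enumerate(rows):
        index.setdefault(row[column], []).append(rowid)
    return index


def multi_index_or(indexes, predicates):
    """Evaluate col1 = v1 OR col2 = v2 ... using one index per predicate.

    Row ids coming from each index are merged into one sorted, duplicate-free
    list, so each qualifying row is fetched once, in row id order.
    """
    rowids = set()
    for column, value in predicates:
        rowids.update(indexes[column].get(value, []))
    return sorted(rowids)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    rows = [
        {"lastname": "Martin", "city": "Brest", "zipcode": "29200"},
        {"lastname": "Durand", "city": "Quimper", "zipcode": "29000"},
        {"lastname": "Martin", "city": "Quimper", "zipcode": "29000"},
        {"lastname": "Le Gall", "city": "Paris", "zipcode": "75001"},
        {"lastname": "Petit", "city": "Lyon", "zipcode": "69001"},
    ]
    indexes = {col: build_value_index(rows, col) for col in ("lastname", "city", "zipcode")}
    wanted = [("lastname", "Martin"), ("city", "Quimper"), ("zipcode", "75001")]
    print(multi_index_or(indexes, wanted))
```

Row 2 satisfies two of the conditions but appears once in the result, which is what a hand-written UNION had to guarantee before this enhancement.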
Measure with INDEX PATH
• Simple INDEX PATH, use of 3 indexes!
• Scanned rows: 376,000
• Disk reads: 34136
• RT: 2.489 secs
Multi index: different path
• Multi-index / skip scan enabled, 3 indexes used
• Disk reads: 1984
• Response Time is shorter: 33% gain in RT
Multiple indexes:
what should be done?
• Generally, the optimizer decides correctly which is the best path
• You can compare the results with the use of UNION, then decide
whether or not to keep hard-to-maintain code
• You can nonetheless use optimizer directives to force the access
method, like
{+ AVOID_MULTI_INDEX (clients)}
to force INDEX PATH
• Or
{+ MULTI_INDEX (clients)}
to force the multi-index SKIP SCAN path
• It can get tricky to make the choice yourself if AND and OR conditions are
set on the involved indexes
• The difference is almost invisible with hot measures
• Statistics on indexes are very important, the access method can
change according to them!
Star join
• Star join is an extension of the MULTI INDEX concept
• It combines this technique with DYNAMIC HASH JOINS
• The technique has been ported from XPS to IDS 11.70
• It is used exclusively for DS/OLAP queries where a FACT
table is the center point of many dimension tables
• Requires PDQPRIORITY ( Ultimate Edition or Enterprise
Edition )
• If you consider using Star Join, you are an excellent
candidate to see a demo of Informix Warehouse
Accelerator!
The A.I.R says:
« you will avoid indexes with too many tree levels »
• Ok, but what could I do to solve that ?
My indexes are built with the data they
have inside, and nothing or almost
nothing can be done
• Databases and tables are getting
bigger and bigger, and
splitting/archiving part of the data is
not always an acceptable solution
FOREST OF TREES INDEXES
• The forest of trees index type was
introduced in 11.70 xC1
• It replicates the model of a traditional
B-TREE, but has several root nodes instead of
only one root node
• The forest of trees brings benefits when
contention against the root node is observed
Reducing the number of b-tree levels
on index « lastname,firstname »
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree
=> The initial number of b-tree levels is 6
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 10 buckets
=> The number of b-tree levels decreased to 5
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 100 buckets
=> The number of b-tree levels decreased to 4
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 1000 buckets
=> The number of b-tree levels decreased to 3
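Why more buckets give shallower trees can be sketched with a rough capacity model. The fanout (keys per node) is an assumption, as is the even spread of keys over the buckets; real level counts depend on key size, page size and data distribution, so this sketch is not expected to reproduce the 6/5/4/3 figures above.

```python
def btree_levels(key_count, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) holds key_count keys."""
    levels, capacity = 1, fanout
    while capacity < key_count:
        capacity *= fanout
        levels += 1
    return levels


def forest_levels(key_count, fanout, buckets):
    """Levels of each subtree when the keys are spread evenly over `buckets` trees."""
    keys_per_tree = -(-key_count // buckets)  # ceiling division
    return btree_levels(keys_per_tree, fanout)


if __name__ == "__main__":
    # Hypothetical figures: 1 million keys, 100 keys per node.
    for buckets in (1, 100, 10_000):
        print(f"{buckets:>6} buckets -> {forest_levels(1_000_000, 100, buckets)} levels per subtree")
```

Each subtree indexes only its share of the keys, so it needs fewer levels than a single tree holding everything.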
Tpcc with regular b-tree indexes
• Index iu_stock_01 has 4 levels
• Tpcc result is 14093 tpmC
• High contention on iu_stock_01: 8,704,052 spins
in 4 min
Tpcc with FOT on iu_stock_01
• create unique index iu_stock_01 on stock (s_w_id,s_i_id)
using btree in data03 HASH on (s_w_id) with 50 buckets;
• Index iu_stock_01 now has 3 levels
• Result grew to 16413 tpmC
• Contention on iu_stock_01 decreased from 8,704,000
to 149,600 spins in 4 min
• iu_oorder_01 is now a good candidate for FOT!
Main facts on FOT indexes
• FOT is very efficient at reducing contention on index
access => Better RT in OLTP context
• FOT is very efficient at reducing the number of B-TREE levels => Better
overall RT
• Ideal for primary keys and foreign keys in a high
concurrency OLTP context
• Implementation is easy and fast
• Supports main index functionality: ER, PK, FK, b-tree
cleaning…
• Does not support aggregate queries or range scans on HASH
ON columns
• Also does not support index clustering, index fillfactor and
functional (UDR-based) indexes
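The trade-off listed above (fast equality access, no range scan on the HASH ON column) can be pictured with a toy forest. This is a conceptual sketch, not the server's structure: the `ForestOfTrees` class is hypothetical, each « subtree » is a plain sorted list, and integer lead values are assumed so that Python's `hash` is deterministic.

```python
from bisect import bisect_left, insort


class ForestOfTrees:
    """Toy forest-of-trees index: one sorted key list ('subtree') per hash bucket."""

    def __init__(self, buckets):
        self.subtrees = [[] for _ in range(buckets)]

    def _bucket(self, lead_value):
        return hash(lead_value) % len(self.subtrees)

    def insert(self, key):
        # key is a tuple; its first element plays the role of the HASH ON column
        insort(self.subtrees[self._bucket(key[0])], key)

    def lookup(self, key):
        """Equality search: exactly one subtree is visited."""
        subtree = self.subtrees[self._bucket(key[0])]
        pos = bisect_left(subtree, key)
        return pos < len(subtree) and subtree[pos] == key

    def subtrees_for_lead_range(self, low, high):
        """Buckets that may hold lead values in [low, high]: hashing scatters
        neighbouring values, so no single ordered walk can serve this range."""
        return {self._bucket(value) for value in range(low, high + 1)}


if __name__ == "__main__":
    forest = ForestOfTrees(buckets=50)
    for w_id in range(1, 51):
        for i_id in range(1, 21):
            forest.insert((w_id, i_id))
    print("lookup (10, 5):", forest.lookup((10, 5)))
    print("subtrees needed for w_id 1..50:", len(forest.subtrees_for_lead_range(1, 50)))
```

Concurrent sessions working on different lead values land on different roots, which is where the reduction in contention comes from.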
Optimizing big index creation:
PSORT_NPROCS
• The PSORT_NPROCS environment variable is used to allocate more
threads to the sort package, which is also used for parallel
index creation.
• Significant performance improvements on index creation
can be obtained on multi-core/multi-processor servers
• It can be used even with non PDQPRIORITY-enabled
editions if the server has more than one core/CPU.
• PSORT_NPROCS can sharply increase memory consumption:
please check for available memory on the server.
• The onconfig parameter DS_NONPDQ_QUERY_MEM has to
be checked if using PSORT_NPROCS.
Optimizing big index creation
DBSPACETEMP or PSORT_DBTEMP
• The environment variable DBSPACETEMP overrides the
onconfig parameter of the same name.
• Generally, raw-device based temp dbspaces offer
more performance than file-system based files.
• PSORT_DBTEMP writes temporary sort files in the
specified file-system based directories instead of
DBSPACETEMP.
• It is useful to spread the temporary sort files over a
wider list of directories mounted on different
spindles
PSORT_NPROCS/PSORT_DBTEMP:
facts
• create index id_clients_02 on clients(lastname,firstname)
• unset PSORT_NPROCS
unset PSORT_DBTEMP
=> 13m28.709s
• export PSORT_NPROCS=3
export PSORT_DBTEMP=/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/ids_chunks/ids_space03
=> 6m19
• A ram disk, or even an SSD drive, can improve performance a lot:
export PSORT_NPROCS=3
export PSORT_DBTEMP=/mnt/myramdisk
=> 4m22.030s
• To check the environment of the session:
onstat -g env SessionNumber
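The three timings above can be turned into speedup factors with a few lines. The `parse_elapsed` and `speedups` helpers and the labels are hypothetical, and « 6m19 » is read as 6 minutes 19 seconds, which is an assumption since the slide gives no decimals.

```python
import re


def parse_elapsed(text):
    """Convert a `time`-style string such as '13m28.709s' or '6m19' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised elapsed time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Elapsed times as shown above for the same create index statement.
timings = {
    "no PSORT settings": "13m28.709s",
    "PSORT_NPROCS=3, 4 sort directories": "6m19",
    "PSORT_NPROCS=3, ram disk": "4m22.030s",
}


def speedups(measured, baseline):
    """Speedup factor of every run relative to the `baseline` run."""
    base = parse_elapsed(measured[baseline])
    return {name: base / parse_elapsed(value) for name, value in measured.items()}


if __name__ == "__main__":
    for name, factor in speedups(timings, "no PSORT settings").items():
        print(f"{name}: {factor:.2f}x")
```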
Index disable: What happens?
• Disabling an existing index will prevent the server from using this
index, but it will « remember » the index schema.
• This technique can be applied before executing massive data inserts
or updates, since it removes the index key update workload.
• Heavy side effects can be expected: loss of key uniqueness, loss of
performance…
• If you run a query that relied on the disabled index, the optimizer will probably
choose a sequential scan unless a better path is found.
• The index will be seen as ‘disabled’ in dbschema, but will not be
seen in oncheck -pT nor in oncheck -pe
• Disabling an index will make its former disk space available in the
dbspace
• Disabling an index is immediate
• Syntax is: set indexes IndexName disabled
Index enable: what happens?
• Enabling an index will rebuild the index physically,
with the same definition as before
• Enabling an index takes as much time as creating
the same index
• But the enable statement is simpler to type than the
create index statement :-)
• + you do not have to remember the initial create
index statement :-)
• Syntax is: set indexes IndexName enabled
Digging for more performance:
Disable foreign key indexes
• Many times, foreign key index columns are part of the same table’s primary
key.
• order_line primary key (ol_w_id,ol_d_id,ol_o_id,ol_number)
order_line foreign key (ol_w_id,ol_d_id,ol_o_id)
• Using ‘INDEX DISABLED’ in the add constraint statement avoids
creating an unnecessary index, because its structure already exists
in the primary key.
• ALTER TABLE order_line ADD CONSTRAINT(FOREIGN KEY (ol_w_id,ol_d_id,ol_o_id)
REFERENCES oorder(o_w_id,o_d_id,o_id) CONSTRAINT ol2 INDEX DISABLED);
• This implementation saves disk space by avoiding one index
• CPU resources will be saved when updating/deleting/creating index keys,
• and consequently disk IO will also be saved.
• Check that disabling the constraint index has no hidden side effects: a
mistake can have expensive consequences!
I need to create a new index,
but users are always connected to the table!
• Sometimes a new index needs to be created, but
the tables are accessed by users or batches.
• IDS 11.10 introduced the possibility to create an
index without putting an exclusive lock on the table,
called index online.
• Users can SELECT, INSERT, UPDATE or DELETE rows
in the table while the index is being created
• Syntax is:
create index id_clients_01 on clients(lastname,firstname) ONLINE
• Drop index online is also available under the same
conditions
Create index online:
precautions & restrictions
• Create index online is a complex operation, involving a
table snapshot, base index build, catch-up and more.
• It requires additional resources, such as disk space, CPU
and memory, in order to make the operation safe and as
fast as possible.
• Long transactions may happen: check logical log size
before diving in
• The index pre-image pool memory size is managed with the
onconfig parameter ONLIDX_MAXMEM, updatable with
onmode -wm
• Not applicable to cluster indexes, UDT columns or UDR
indexes
• Only one create index online per table at a time
Index compression
• IDS introduced table compression in 11.50 xC4. This technology is now
used successfully in large database implementations.
• Index compression is a new feature of IDS 12.10. It is based on the
same technology as table compression.
• The principle is to compress the key column values at b-tree leaf level,
but not the rowids attached to these key values
• Index compression is very effective for indexes having large key values:
names, item names etc…
• The compression dictionary must contain at least 2000 unique key
values
• Index compression is an excellent way to save disk space, and …
• Since more key values fit in an index page, more key values can be read
in one IO cycle => IO is more efficient
• Reducing IO should improve index access performance in large queries
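The « more key values fit in an index page » argument can be put into rough numbers. Every figure below is an assumption chosen for illustration (usable page size, key size, per-entry overhead for the rowid, compression ratio), and page headers and b-tree slack are ignored.

```python
def keys_per_page(page_bytes, key_bytes, rowid_bytes, compression_ratio=0.0):
    """Keys fitting in one leaf page when only the key part shrinks by `compression_ratio`."""
    compressed_key = key_bytes * (1.0 - compression_ratio)
    return int(page_bytes // (compressed_key + rowid_bytes))


def leaf_pages_needed(key_count, per_page):
    """Leaf pages required to store key_count keys (ceiling division)."""
    return -(-key_count // per_page)


if __name__ == "__main__":
    # Hypothetical sizes: 2048 usable bytes per page, 60-byte keys, 5 bytes per rowid entry.
    plain = keys_per_page(2048, 60, 5)
    packed = keys_per_page(2048, 60, 5, compression_ratio=0.5)
    for label, per_page in (("uncompressed", plain), ("compressed", packed)):
        pages = leaf_pages_needed(30_000_000, per_page)
        print(f"{label}: {per_page} keys per page, {pages} leaf pages")
```

Because the rowids are not compressed, halving the key size does not quite halve the number of leaf pages.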
Index compression:
Disk space gained
• Execute function task ("index compress", "id_clients_01", "staging");
• Or
execute function task("index compress", "j", "testdb");
• Or
create index id_clients_01 on clients(lastname,firstname) compressed
More than 50% compression rate
Cluster index
• Creating an index as cluster, or altering it to cluster, physically sorts
the table data by the first column of this index at creation
time
• Accessing table data through a cluster index reads already
sorted data pages.
• Generally makes IO on data pages easier because they are
contiguous => Decreased RT
• The cluster level decreases as new rows are
inserted
• High cost of administration: re-clustering this index will
rewrite the table data pages
• A cluster index can be good for stable tables accessed in an
ordered sequential way
Statistics on indexes
• Introduced in 11.70: when one creates an index,
the distributions for this index are automatically
created
• High mode statistics are generated for the lead
column
• Index level statistics are also generated in low
mode
• This does not exempt you from regularly updating
statistics for those indexes, but it is no longer
required to do it just after the index creation
Questions?
Indexing techniques: which one to use when
Eric Vercelletto Begooden IT Consulting eric.vercelletto@begooden-it.com

MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

A12 vercelletto indexing_techniques

  • 1. IBM Informix indexing techniques: which one to use when? Eric Vercelletto, Begooden IT Consulting. Session A12, 4/23/2013 3:35 PM
  • 2. Agenda / methodology • Introduction to response time measuring • Identify the relevant indexing techniques • Describe the implementation method • Confirm/recognize its use by accurate monitoring • Measure its efficiency as response time and effective use in the database (sqltrace, sqexplain) • Identify pros and cons 4/24/2013 Session F12
  • 3. Introduction • Begooden IT Consulting is an IBM ISV company, mainly focused on Informix technology services. • Our 15+ years of experience within Informix Software France and Portugal helped us acquire in-depth product knowledge as well as solid field experience. • Our services include Informix implementation auditing, performance tuning, issue management, administration mentoring … • We also happen to be the Querix reseller for France and French-speaking countries (except Québec and Louisiana) • The company is based in Pont l’Abbé, Finistère, France
  • 4. Some basics not to forget about: there are 2 ways to measure response time (RT) • The « cold » measure: RT is measured just after starting the engine, when data and index pages are not yet loaded into the Informix shared memory (SHMEM) buffers. Disk IO must be performed to read the data and index pages, which increases the RT. • The « hot » measure: RT is measured when data and index pages are already loaded into SHMEM. Little or no disk IO => RT is much shorter. • This point often explains surprising RT differences, depending on how the data is accessed. • Broad-range or DS queries most often access data and/or indexes in disk pages • OLTP queries mostly access data and indexes in SHMEM pages
  • 5. Derived thoughts and facts • Reading data pages and/or index pages on disk always takes longer than in SHMEM. Full table scans can take minutes or more, depending on table size • Reading data pages in SHMEM is very fast. A full scan of a table in SHMEM takes fractions of a second or seconds, rarely more. • Reading index pages in SHMEM is also very fast. In addition, due to the B-tree structure, reading an index page generally covers more content than reading a data page. • This often makes it difficult to compare the efficiency of 2 different indexes on the same table when reading in SHMEM.
  • 6. Derived thoughts and facts (continued) • When running hot measures on indexes, the differences can be as low as milliseconds BUT … • Repeating 3 useless milliseconds millions of times can make a difference! • When response times get to such a low level, sqltrace is the tool you need to understand the query behaviour. • In certain situations, saving milliseconds on a query will make the difference. In other situations, saving seconds will not. • A bad response time can be caused by inappropriate indexing, but can also be caused by some « unusual » logic that adds useless work for the applications and the server.
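The "3 useless milliseconds repeated millions of times" remark can be made concrete with a quick calculation. This is only an illustration: the 3 ms figure comes from the slide, while the count of one million executions is an assumed value chosen for the example.

```shell
#!/bin/sh
# 3 ms wasted per execution is the figure quoted on the slide.
# 1,000,000 executions is an illustrative assumption, not a measured value.
wasted_ms_per_exec=3
executions=1000000

total_ms=$(( wasted_ms_per_exec * executions ))
total_s=$(( total_ms / 1000 ))
total_min=$(( total_s / 60 ))

echo "wasted: ${total_s} s (${total_min} min)"
```

Under these assumptions the script prints `wasted: 3000 s (50 min)`, which is why a per-query saving that looks negligible in a hot measure can still matter for a high-volume workload.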
  • 7. Comparing cold measure with hot measure (1) • Full scan of a mid-sized table, tpcc:order_line, containing 24 million rows: select * from order_line • onstat -g his output • « Cold » read, performed just after oninit -v: many disk pages read, 47.4 secs • « Hot » read, performed after the first scan: zero disk pages read, all buffer reads, 19.4 secs
  • 8. Comparing cold measure with hot measure (2) • Use of a poorly selective index: select * from order_line where ol_w_id = 10 (duplicate index on w_id, 50 distinct values) • Cold read: many disk reads, execution time 5.9 secs • Hot read: few disk reads, execution time 1.1 secs
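The cold-then-hot procedure behind these two slides could be scripted roughly as follows. This is a sketch under assumptions, not the author's test harness: the database name and query file are placeholders, SQL tracing is assumed to be enabled already so that `onstat -g his` has something to show, and a real script would also wait for the server to be fully online after `oninit`. By default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (or run) the steps of a cold and a hot measure for one query.
# DRY_RUN=1 (the default here) only prints the commands; nothing is executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = database name, $2 = file holding the query (both hypothetical)
cold_then_hot() {
    db="$1"
    query_file="$2"
    run onmode -ky                     # stop the engine: buffer contents are lost
    run oninit -v                      # restart: the buffer pool starts empty
    run dbaccess "$db" "$query_file"   # cold run: pages must be read from disk
    run onstat -g his                  # sqltrace detail for the cold run
    run dbaccess "$db" "$query_file"   # hot run: pages are already in SHMEM
    run onstat -g his                  # sqltrace detail for the hot run
}

cold_then_hot tpcc scan_order_line.sql
```

Setting `DRY_RUN=0` would actually restart the instance, so this is only suitable for a test server.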
  • 9. BATCHEDREAD_INDEX: description • This feature was taken from XPS and introduced in 11.50.xC5. • The purpose is to speed up index key access by grouping the reading of many index keys into large buffers, then fetching the rows associated with those keys • This technique brings strong savings in CPU and IO, and therefore reduces response time. • This technique is suitable and efficient for massive index reads (DS/OLAP), not for pinpoint-type (OLTP) index access.
  • 10. BATCHEDREAD_INDEX: the test • We will run the following query against a clients table of 30 million rows. The table has an index on ‘lastname’. Row size is 328 bytes. Output goes to /dev/null: select lastname, count(*) from clients group by 1 • This query returns 2,188,286 rows
  • 11. BATCHEDREAD_INDEX: facts • All these response times are measured « cold », for three configurations: • AUTO_READAHEAD 0, BATCHEDREAD_INDEX 0 • AUTO_READAHEAD 0, BATCHEDREAD_INDEX 1 • AUTO_READAHEAD 1, BATCHEDREAD_INDEX 1 • See the difference (the measured values appear only in the slide screenshots)
  • 12. BATCHEDREAD_INDEX: how? • BATCHEDREAD_INDEX can be set, like BATCHEDREAD_TABLE, in the onconfig file • Or as an environment variable before launching the application: export IFX_BATCHEDREAD_INDEX=1 • Or with an SQL statement: SET ENVIRONMENT IFX_BATCHEDREAD_INDEX '1'; • Monitor index scan activity with onstat -g scn
  • 13. Attached or detached index? • The « Antique Informix Disk Layout » created the index pages of attached indexes in the same extents as the data pages. The expected result was reduced disk IO. • This layout became a problem because the data pages were often located far from the index pages, which had the opposite effect of increasing disk IO. For this reason the official recommendation at the time was to create detached indexes. • Nowadays, index pages are created in a different partition from the data pages, so attached indexes perform at the same level as detached indexes. • But if you can place the data dbspaces and the index dbspaces on independent disks and channels, you will improve disk IO performance by reducing disk contention. • This gain will be observed mainly during intensive sessions doing massive data changes. • Watch the output of onstat -g iof and look for low IO throughput per second.
  • 14. Few columns or many columns in the same index? Key points to consider • Remember « cold » reads and « hot » reads when testing the efficiency of an index. Results can be dramatically different between cold and hot. • The choice is, as often, a hard-won trade-off, and definitely a long subject to discuss! • Many columns in an index can make it more selective, but it will also consume more CPU/disk resources when updating keys (see b-tree cleaner tuning) • Few columns in an index can make it less selective, but it will consume fewer CPU/disk resources when updating keys • Integrity constraints are not negotiable, but some integrity constraint indexes can be negotiated…
  • 15. Few columns or many columns? Techniques to evaluate efficiency • time dbaccess dbname queryfile gives an indication of the efficiency of an index, but can be misleading because of the huge differences between cold and hot measures. • onmode -Y sessnum 1 will identify which index(es) are used, and will also report how many rows were scanned against how many rows were returned • onstat -g his (sqltrace) will give fine detail about response time, buffer and disk access, lock waits etc… • A complete diagnosis is made with all 3 tools.
  • 16. Few columns or many columns? Let’s analyze a real case: one-column index • Rows scanned: 4913 • Buffer reads: 5900 • Response time: 0.0368 s
  • 17. Few columns or many columns? Same case, two-column index • Rows scanned: 106 • Buffer reads: 122 • Response time: 0.0047 s
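To put the two measurements side by side, the ratios can be worked out from the figures quoted on slides 16 and 17. The helper name `ratio` is introduced here for illustration only.

```shell
#!/bin/sh
# Figures read from the two slides above (one-column vs two-column index).
ratio() {
    # $1 = value with the one-column index, $2 = value with the two-column index
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "rows scanned : $(ratio 4913 106)x fewer"
echo "buffer reads : $(ratio 5900 122)x fewer"
echo "response time: $(ratio 0.0368 0.0047)x shorter"
```

The work done drops by a factor of roughly 46 to 48, while the response time improves by a factor of about 7.8. In a hot measure the reads are buffer reads, so the gain in elapsed time is smaller than the reduction in pages touched.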
  • 18. Indexes with highly duplicated lead columns: how was life before? • The Antique Informix Rule stated that multi-column indexes with low selectivity on the leading keys should be avoided, due to poor efficiency. Ex: warehouse_id, district_id, order_id, order_line • Querying on order_line required either specifying the lead columns in the query predicate, or creating another index with order_line as the lead column • Restructuring indexes to follow those rules was a complex, long and risky task, not to mention that any downtime due to index rebuilding was poorly accepted by Operations Managers…
  • 19. Index key first & self join: it’s magic! • The key-first scan was introduced in 7.3. It has been enhanced so that an index can be used even if the lead columns are not specified in the where clause • The index self join technique was introduced in IDS 11.10, although many DBAs didn’t even notice it! • By scanning subsets of the poorly selective composite index, the engine manages to use the non-leading index keys as index filters, transforming the index into a highly selective one. • Hierarchical-like indexes with highly duplicated lead columns now need no redefinition to be efficient. • You do not need to build new indexes with highly selective lead columns. This saves optimizer work and disk space. • Index self join is enabled by default. If you insist on not using it, you can disable it either by setting INDEX_SELFJOIN 0 in onconfig or with the optimizer directive {+AVOID_INDEX_SJ}
  • 20. Index self-join: the test • We will use the order_line TPC-C table, which contains 23,735,211 rows • The index follows the hierarchy, which was formerly considered a poor implementation: ol_w_id: warehouse id (50 distinct values); ol_d_id: district id (10 distinct values); ol_o_id: order number (9279 distinct values); ol_number: order line number (14 distinct values) • The challenging query is: SELECT ol_d_id, ol_o_id, avg(ol_quantity), avg(ol_amount) FROM order_line GROUP BY 1,2 ORDER BY 2,3
  • 21. No self join • Use onmode -wm INDEX_SELFJOIN=0 to disable self join • The index is used, but only with key-first access • Many rows scanned • Response time: 11.258 s
  • 22. Self join: find the differences! • Key-first + self join access • Rows scanned: about 100 times fewer • RT: 3.313 s
  • 23. The Antique Informix Rule says: “you will use only one index per table”
  • 24. The AIR says: “you will use only one index per table” • The Antique Informix Rule stated that only one index per table could be used • The optimizer had to choose a single index among several on the same table, even when several indexes were needed. • Many fairly realistic queries had to be drastically rewritten in order to provide acceptable response times • The trick was generally to use a UNION or a nested query, but the readability and maintainability of the query code suffered from that.
  • 25. What the A.I.R. obliged you to do • Generally, the best way to work around the RT issue was to use either UNION or nested queries • The trick could be efficient in terms of response time, but the code became more complex to read and to maintain • This workaround required heavy changes to the application code, and detailed and accurate tests to check that it returned the same results as the initial query
  • 26. The optimizer keeps getting smarter across releases • An optimizer enhancement introduced the use of several indexes on the same table, but only if the where clauses were linked with the ‘OR’ operator. • The query path is like a usual INDEX PATH, the difference being the use of several indexes
  • 27. Measure with INDEX PATH • Simple INDEX PATH, use of 3 indexes! • Scanned rows: 376,000 • Disk reads: 34136 • RT: 2.489 s
  • 28. Multi index: different path • Multi-index / skip scan enabled • 3 indexes used • Disk reads: 1984 • Response time is shorter: 33% gain in RT
  • 29. Multiple indexes: what should be done? • Generally, the optimizer correctly decides which path is best • You can compare the results with the use of UNION, then decide whether or not to keep hard-to-maintain code • You can nonetheless use optimizer directives to force the access method, like {+ AVOID_MULTI_INDEX (clients)} to force INDEX PATH • Or {+ MULTI_INDEX (clients)} to force the multi-index SKIP SCAN path • Choosing by yourself can get tricky if both AND and OR conditions apply to the indexes involved • The difference is almost invisible in a hot measure • Statistics on indexes are very important: the access method can change according to them!
  • 30. Star join • Star join is an extension of the MULTI INDEX concept • It combines this technique with DYNAMIC HASH JOINS • The technique was ported from XPS to IDS 11.70 • It is used exclusively for DS/OLAP queries where a FACT table is the center point of many dimension tables • Requires PDQPRIORITY (Ultimate Edition or Enterprise Edition) • If you consider using Star Join, you are an excellent candidate to see a demo of Informix Warehouse Accelerator!
  • 31. The A.I.R. says: « you will avoid indexes with too many tree levels » • OK, but what can I do to solve that? My indexes are built from the data they contain, and nothing or almost nothing can be done • Databases and tables are getting bigger and bigger, and splitting/archiving part of the data is not always an acceptable solution
  • 32. FOREST OF TREES INDEXES • The forest of trees index type was introduced in 11.70.xC1 • It replicates the model of a traditional B-tree, but with several root nodes instead of only one • The forest of trees brings benefits when contention on the root node is observed
  • 33. Reducing the number of b-tree levels on index « lastname,firstname » • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree => the initial number of b-tree levels is 6 • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 10 buckets => the number of b-tree levels decreased to 5 • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 100 buckets => the number of b-tree levels decreased to 4 • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 1000 buckets => the number of b-tree levels decreased to 3
  • 34. TPC-C with regular b-tree indexes • Index iu_stock_01 has 4 levels • TPC-C result is 14093 tpmC • High contention on iu_stock_01: 8,704,052 spins in 4 min
  • 35. TPC-C with FOT on iu_stock_01 • create unique index iu_stock_01 on stock (s_w_id,s_i_id) using btree in data03 HASH on (s_w_id) with 50 buckets; • Index iu_stock_01 now has 3 levels • Result grew to 16413 tpmC • Contention on iu_stock_01 decreased from 8,704,000 to 149,600 spins in 4 min • iu_oorder_01 is now a good candidate for FOT!
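The before/after figures from slides 34 and 35 translate into the following gains. The calculation uses the exact spin count from slide 34 (8,704,052) rather than the rounded value repeated on slide 35; the variable names are chosen for this example.

```shell
#!/bin/sh
# Figures quoted on slides 34 (regular b-tree) and 35 (forest of trees).
tpmc_before=14093
tpmc_after=16413
spins_before=8704052
spins_after=149600

gain_pct=$(awk -v a="$tpmc_before" -v b="$tpmc_after" \
    'BEGIN { printf "%.1f", (b - a) / a * 100 }')
spin_ratio=$(awk -v a="$spins_before" -v b="$spins_after" \
    'BEGIN { printf "%.0f", a / b }')

echo "throughput gain: ${gain_pct}%"
echo "spins divided by about ${spin_ratio}"
```

That is a throughput gain of about 16.5% for a reduction in spins by a factor of about 58, from converting a single index.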
  • 36. Main facts on FOT indexes • FOT is very efficient at reducing contention on index access => better RT in an OLTP context • FOT is very efficient at reducing the number of B-tree levels => better overall RT • Ideal for primary keys and foreign keys in a high-concurrency OLTP context • Implementation is easy and fast • Supports the main index functionality: ER, PK, FK, b-tree cleaning… • Does not support aggregate queries or range scans on the HASH ON columns • Also does not support index clustering, index fillfactor or functional (UDR-based) indexes
  • 37. Optimizing big index creation: PSORT_NPROCS • The PSORT_NPROCS environment variable is used to allocate more threads to the sort package, which is also used for parallel index creation. • Significant performance improvements on index creation can be obtained on multi-core/multi-processor servers • It can be used even with editions that do not enable PDQPRIORITY, provided the server has more than one core/CPU. • PSORT_NPROCS can sharply increase memory consumption: please check the available memory on the server. • The onconfig parameter DS_NONPDQ_QUERY_MEM has to be checked when using PSORT_NPROCS.
  • 38. Optimizing big index creation: DBSPACETEMP or PSORT_DBTEMP • The DBSPACETEMP environment variable overrides the onconfig parameter of the same name. • Generally, temp dbspaces based on raw devices perform better than those based on file-system files. • PSORT_DBTEMP writes temporary sort files to the specified file-system directories instead of DBSPACETEMP. • It is useful for spreading the temporary sort files over a wider list of directories mounted on different spindles
  • 39. PSORT_NPROCS/PSORT_DBTEMP: facts • create index id_clients_02 on clients(lastname,firstname) • unset PSORT_NPROCS; unset PSORT_DBTEMP => 13m28.709s • export PSORT_NPROCS=3; export PSORT_DBTEMP=/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/ids_chunks/ids_space03 => 6m19 • A RAM disk, or even an SSD drive, can improve performance a lot: export PSORT_NPROCS=3; export PSORT_DBTEMP=/mnt/myramdisk => 4m22.030s • To check the environment of the session: onstat -g env SessionNumber
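One way to keep such test runs repeatable is to generate the two export lines from a thread count and a list of sort directories. The sketch below only prints the settings; the helper name `psort_env` and the directory list are illustrative, and the directories must exist and be writable on the server before a real run.

```shell
#!/bin/sh
# Sketch: print the PSORT_* settings for a parallel index build.
# $1 = number of sort threads, remaining arguments = sort directories.
psort_env() {
    nprocs="$1"
    shift
    # Join the directory arguments with ':' as PSORT_DBTEMP expects
    dirs=$(IFS=:; echo "$*")
    echo "export PSORT_NPROCS=$nprocs"
    echo "export PSORT_DBTEMP=$dirs"
}

psort_env 3 /tmp /ids_chunks/ids_space01 /ids_chunks/ids_space02
```

A session could then apply the settings with `eval "$(psort_env 3 /mnt/myramdisk)"` before running the create index statement through `time dbaccess`, and confirm them with the `onstat -g env` check mentioned on the slide.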
  • 40. Index disable: what happens? • Disabling an existing index prevents the server from using this index, but the server « remembers » the index schema. • This technique can be applied before executing massive data inserts or updates, since it removes the workload of updating index keys. • Heavy side effects can be expected: loss of key uniqueness, loss of performance… • If you run a query that would use a disabled index, the optimizer will probably choose a sequential scan unless a better path is found. • The index will be shown as ‘disabled’ in dbschema, but will not appear in oncheck -pT nor in oncheck -pe • Disabling an index makes its former disk space available in the dbspace • Disabling an index is immediate • Syntax is: set indexes IndexName disabled
  • 41. Index enable: what happens? • Enabling an index rebuilds the index physically, with the same definition as before • Enabling an index takes as much time as creating the same index • But the enable statement is simpler to type than the create index statement • And you do not have to remember the initial create index statement • Syntax is: set indexes IndexName enabled
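The disable / load / enable sequence described on slides 40 and 41 might be scripted as below. This is a sketch under assumptions: the index, table and unload file names are placeholders, the load uses the dbaccess LOAD statement, and the function only writes SQL to standard output so that it can be reviewed before being executed.

```shell
#!/bin/sh
# Sketch: emit the SQL for a bulk load performed with one index disabled.
# $1 = index name, $2 = table name, $3 = unload file (all hypothetical).
bulk_load_sql() {
    index="$1"
    table="$2"
    unl_file="$3"
    cat <<EOF
SET INDEXES $index DISABLED;
LOAD FROM '$unl_file' INSERT INTO $table;
SET INDEXES $index ENABLED;
EOF
}

bulk_load_sql id_clients_02 clients clients.unl
```

The output could be piped to `dbaccess dbname -`. As the slides point out, the final statement rebuilds the index and therefore takes as long as creating it, and uniqueness is not enforced while the index is disabled.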
  • 42. Digging for more performance: disable foreign key indexes • Often, foreign key columns are part of the same table’s primary key. • order_line primary key (ol_w_id,ol_d_id,ol_o_id,ol_number); order_line foreign key (ol_w_id,ol_d_id,ol_o_id) • Using ‘INDEX DISABLED’ in the add constraint statement avoids creating a redundant index, because its structure already exists in the primary key. • ALTER TABLE order_line ADD CONSTRAINT(FOREIGN KEY (ol_w_id,ol_d_id,ol_o_id) REFERENCES oorder(o_w_id,o_d_id,o_id) CONSTRAINT ol2 INDEX DISABLED); • This implementation saves disk space by dropping an index • CPU resources are saved when updating/deleting/creating index keys, • and consequently disk IO is also saved. • Check that disabling the constraint index has no hidden side effects: a mistake can have expensive consequences!
  • 43. I need to create a new index, but users are always connected to the table! • Sometimes a new index needs to be created while the tables are being accessed by users or batches. • IDS 11.10 introduced the possibility of creating an index without placing an exclusive lock on the table, called index online. • Users can SELECT, INSERT, UPDATE or DELETE rows in the table while the index is being created • Syntax is: create index id_clients_01 on clients(lastname,firstname) ONLINE • Drop index online is also available under the same conditions
  • 44. Create index online: precautions & restrictions • Create index online is a complex operation, involving a table snapshot, the base index build, a catch-up phase and more. • It requires additional resources, such as disk space, CPU and memory, in order to make the operation safe and as fast as possible. • Long transactions may happen: check the logical log size before diving in • The memory size of the index pre-image pool is managed with the onconfig parameter ONLIDX_MAXMEM, which can be updated with onmode -wm • Not applicable to cluster indexes, UDT columns or UDR-based indexes • Only one create index online per table at a time
  • 45. Index compression • IDS introduced table compression in 11.50.xC4. This technology is now used successfully in large database implementations. • Index compression is a new feature of IDS 12.10. It is based on the same technology as table compression. • The principle is to compress the key column values at the b-tree leaf level, but not the rowids attached to these key values • Index compression is very effective for indexes with large key values: names, item names etc… • The index must contain at least 2000 unique key values to build the compression dictionary • Index compression is an excellent way to save disk space, and … • Since more key values fit in an index page, more key values can be read in one IO cycle => IO is more efficient • Reducing IO should improve index access performance in large queries
  • 46. Index compression: disk space gained • execute function task("index compress", "id_clients_01", "staging"); • Or execute function task("index compress", "j", "testdb"); • Or create index id_clients_01 on clients(lastname,firstname) compressed • More than 50% compression rate
  • 47. Cluster index • Creating an index as a cluster index, or altering it to cluster, physically sorts the table data by the first column of this index at that time • Accessing table data through a cluster index reads data pages that are already sorted. • This generally makes IO on data pages easier because they are contiguous => decreased RT • The cluster level decreases as new rows are inserted • High administration cost: re-clustering the index rewrites the table data pages • A cluster index can be good for stable tables accessed in an ordered, sequential way
  • 48. Statistics on indexes • Introduced in 11.70: when an index is created, the distributions for this index are created automatically • High mode statistics are generated for the lead column • Index level statistics are also generated in low mode • This does not exempt you from regularly updating statistics for those indexes, but it is no longer required to do so right after the index creation
  • 49. Questions? Indexing techniques: which one to use when Eric Vercelletto Begooden IT Consulting eric.vercelletto@begooden-it.com