This presentation covers the index-related techniques available (as of April 2013) in IBM Informix, including the indexing techniques introduced with IBM Informix 12.10.
All IIUG presentations are available in the member area at http://www.iiug.com
1. IBM Informix indexing techniques:
which one to use when?
Eric Vercelletto, Session A12
Begooden IT Consulting, 4/23/2013 3:35 PM
2. Agenda / methodology
• Introduction to response time measuring
• Identify the relevant indexing techniques
• Describe the implementation method
• Confirm/recognize its use by accurate monitoring
• Measure its efficiency in terms of response time and
effective use in the database (sqltrace, sqexplain)
• Identify pros and cons
3. Introduction
• Begooden IT Consulting is an IBM ISV company, mainly
focused on Informix technology services.
• Our 15+ years of experience within Informix Software
France and Portugal helped us acquire in-depth
product knowledge as well as solid field experience.
• Our services include Informix implementation auditing,
performance tuning, issue management, administration
mentoring …
• We also happen to be the Querix reseller for France and
French speaking countries (except Québec and Louisiana)
• The company is based in Pont l’Abbé, Finistère, France
4. Some basics not to forget about
There are 2 ways to measure response times
• The « cold » measure: the response time is measured just after
starting the engine, when data and index pages are not yet loaded
into Shared Memory IFMX buffers. Disk IO must be performed to
read the data and index pages, which will increase the RT.
• The « hot » measure: RT is measured when data and index pages
are loaded into SHMEM. No or few disk IO => RT is much shorter.
• This point can often explain surprising RT differences depending on
how the data is accessed.
• Broad range or DS queries most often access data and/or indexes in
disk pages
• OLTP queries mostly access data and indexes in SHMEM pages
5. Derived thoughts and facts
• Reading data pages and/or index pages from disk always takes
longer than from SHMEM. Full table scans can take minutes or
more, depending on table size.
• Reading data pages in SHMEM is very fast. A full scan of a
table in SHMEM takes fractions of a second or seconds, rarely
more.
• Reading index pages in SHMEM is also very fast. In addition,
due to the B-TREE structure, reading index pages
generally covers more content than reading data pages.
• This often makes it difficult to compare the efficiency
of 2 different indexes on the same table when reading in
SHMEM.
6. Derived thoughts and facts (continued)
• When running hot measures on indexes, the differences
can be as low as milliseconds BUT …
• Repeating 3 wasted milliseconds millions of times can
make a difference!
• When response times get down to such a low level, sqltrace
is the tool you need to understand the query behaviour.
• In certain situations, saving milliseconds on a query will
make the difference. In other situations, saving seconds will
not.
• A bad response time can be caused by inappropriate
indexing, but it can also be caused by some « unusual »
logic that adds needless work for the
applications and the server.
7. Comparing cold measure with hot measure (1)
• Full scan of a mid-sized table, tpcc:order_line,
containing 24 million rows
select * from order_line
onstat -g his output
« Cold » read: performed just after oninit -v
« Hot » read: performed after the first scan
Cold read: many disk page reads, 47.4 secs
Hot read: zero disk page reads, all buffer reads, 19.4 secs
8. Comparing cold measure with hot measure (2)
• Cold use of a poor-selectivity index
select * from order_line where ol_w_id = 10 (duplicate index on ol_w_id, 50 distinct values)
Cold read: many disk reads, execution time 5.9 secs
Hot read: few disk reads, execution time 1.1 secs
9. BATCHEDREAD_INDEX: description
• This feature has been taken from XPS and
introduced in 11.50xC5.
• The purpose is to make index key access more efficient
by reading many index keys at a time into
large buffers, then fetching the rows associated
with those keys
• This technique brings strong savings in terms of
CPU and IO, therefore reducing Response Time.
• This technique is suitable and efficient for
massive index reads (DS/OLAP), not for pinpoint-
type (OLTP) index access.
10. BATCHEDREAD_INDEX: the test
• We will run the following query against a 30
million row clients table. The table has an
index on ‘lastname’. Row size is 328 bytes
output to /dev/null
select lastname,count(*)
from clients
group by 1
• This query returns 2,188,286 rows
11. BATCHEDREAD_INDEX: facts
• All those response times are measured as « cold »
• AUTO_READAHEAD 0, BATCHEDREAD_INDEX 0
• AUTO_READAHEAD 0, BATCHEDREAD_INDEX 1
• AUTO_READAHEAD 1, BATCHEDREAD_INDEX 1
See the difference!
12. BATCHEDREAD_INDEX: how?
• BATCHEDREAD_INDEX, like BATCHEDREAD_TABLE, can be
set either in the onconfig file,
• or used as an environment variable before
launching the application:
export IFX_BATCHEDREAD_INDEX=1
• or with an SQL statement:
SET ENVIRONMENT IFX_BATCHEDREAD_INDEX '1';
• Monitor index scan activity with onstat -g scn
13. Attached or detached index?
• The « Antique Informix Disk Layout » used to create the index pages in the same
extents as the data pages for attached indexes. The expected result was
reduced disk IO.
• This layout turned out to be a problem because the data pages were often
located far from the index pages, causing the opposite effect of increasing disk IO.
The official recommendation at the time was to create detached indexes for this
reason.
• Nowadays, index pages are created in a different partition than the data pages,
so attached indexes have the same level of performance as
detached indexes.
• But… if you have the possibility to create the data dbspaces and the index
dbspaces on independent disks and channels, you will increase your disk IO
performance by reducing disk contention.
• This gain will be observed mainly during intensive sessions doing massive data
changes.
• Watch the output of onstat -g iof and look for low IO throughput per second.
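As a sketch, a detached index can be placed in its own dbspace at creation time (the table, index and dbspace names below are examples, not from the presentation):

```sql
-- sketch: place the index in a dedicated dbspace, ideally on
-- disks/channels independent from the data dbspace
-- (clients, ix_clients_name and idx_dbs1 are example names)
create index ix_clients_name on clients(lastname, firstname) in idx_dbs1;
```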
14. Few columns or many columns in the same index?
Key points to consider
• Remember about « cold » reads and « hot » reads when
testing the efficiency of an index. Results can be
dramatically different between cold and hot.
• The choice is often a hard-to-obtain trade-off, and
definitely a long subject to discuss!
• Many columns in an index can make it more selective, but it
will also consume more CPU/disk resources when updating
keys (see b-tree cleaner tuning)
• Few columns in an index can make it less selective, but it
will consume less CPU/disk resource when updating keys
• Integrity constraints are not negotiable, but some integrity
constraints indexes can be negotiated…
15. Few columns or many columns?
Techniques to evaluate efficiency
• time dbaccess dbname queryfile gives an
indication of the efficiency of an index, but can be
misleading due to the huge differences between cold
and hot measures.
• onmode -Y sessnum 1 will identify which
index(es) are used, and will also report how many rows
were scanned versus how many rows were
returned
• onstat -g his (sqltrace) will give fine detail
about response time, buffer and disk access, lock waits
etc.
• A complete diagnostic is done with the 3 tools combined.
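sqltrace has to be enabled before onstat -g his shows anything. A minimal sketch using the SQL admin API; the number of trace buffers, trace size, level and scope below are example values:

```sql
-- sketch: turn on SQL tracing from the sysadmin database
-- (1000 trace buffers of 2 KB, "low" level, "global" scope -- example values)
database sysadmin;
execute function task("set sql tracing on", 1000, 2, "low", "global");
```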
16. Few columns or many columns?
Let's analyze a real case: one column
1-column index: rows scanned 4913, buffer reads 5900
Response time: 0.0368’’
17. Few columns or many columns?
Same case, index with 2 columns
2-column index: rows scanned 106, buffer reads 122
Response time: 0.0047’’
18. Indexes with highly duplicated lead
columns: how was life before?
• The Antique Informix Rule stated that multi-
column indexes with low selectivity on the
leading keys should be avoided, due to poor efficiency.
Ex: warehouse_id, district_id, order_id, order_line
• Querying on order_line required specifying the
lead columns in the query predicate, or creating
another index with order_line as the lead column
• Restructuring indexes to follow those rules was a
complex, long and risky task, not to mention the
fact that any downtime due to index rebuilding
was poorly accepted by Operations Managers…
19. Index key-first & self-join: it's magic!
• The key-first scan was introduced in 7.3. It has been enhanced so
that an index can be used even if the lead columns are not specified
in the where clause
• The index self-join technique was introduced in IDS 11.10,
although many DBAs didn't even notice it!
• By scanning subsets of the poorly selective composite index, the
engine manages to use the non-leading index keys as index
filters, transforming the index into a highly selective one.
• Hierarchical-like indexes with highly duplicated lead columns now
need no redefinition to be efficient.
• You no longer need to build new indexes with highly selective lead
columns. This saves optimizer work and disk space.
• Index self-join is enabled by default. You can, if you persist in not
using it, disable it either by setting INDEX_SELFJOIN 0 in onconfig or
with the optimizer directive {+AVOID_INDEX_SJ}
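A sketch of the per-query directive form, reusing the order_line table from the test on the next slide (the predicate value is an example):

```sql
-- sketch: opt a single query out of index self-join with a directive
select {+AVOID_INDEX_SJ(order_line)} *
from order_line
where ol_o_id = 1234;
```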
20. Index self-join: the test
• We will use the order_line TPC-C table, which contains
23,735,211 rows
• The index follows the hierarchy, which was formerly
considered a poor implementation:
ol_w_id: warehouse id (50 distinct values)
ol_d_id: district id (10 distinct values)
ol_o_id: order number (9279 distinct values)
ol_number: order line number (14 distinct values)
• The challenging query is
SELECT ol_d_id,ol_o_id,avg(ol_quantity),avg(ol_amount)
FROM order_line
GROUP BY 1,2
ORDER BY 2,3
21. No self-join
• Use onmode -wm INDEX_SELFJOIN=0 to disable self-join
Index is used, but key-first only; many rows scanned
Response time: 11.258’’
22. Self-join: find the differences!
Key-first + self-join access; rows scanned: ~100 times fewer
RT: 3.313’’
24. The A.I.R. says:
“you will use only one index per table”
• The Antique Informix Rule stated that only one
index per table could be used
• The optimizer had to choose only one index
among several indexes on the same table,
even when several indexes were needed.
• Many not-so-unrealistic query cases had to be
drastically rewritten in order to provide
acceptable response times
• The trick was generally to use a UNION or a
nested query, but query code readability and
maintainability suffered from that.
25. What the A.I.R. obliged you to do
• Generally, the best way to work around the RT
issue was to use either UNION or nested queries
• The trick could be efficient in terms of response
time, but the code got more complex to read and
to maintain
• This workaround required heavy modification of the
application code, and needed detailed and
accurate tests to obtain the same results as with
the initial query
26. The optimizer constantly getting
smarter across releases
• An optimizer enhancement introduced the use
of several indexes on the same table, but only
if the where clauses were linked with the ‘OR’
operator.
• The query path is like a usual INDEX PATH, the
difference being the use of several indexes
27. Measure with INDEX PATH
Use of 3 indexes!
Simple INDEX PATH
Scanned rows: 376,000
Disk reads: 34,136
RT: 2.489’’
28. Multi-index: a different path
Multi-index / skip scan enabled
3 indexes used
Disk reads: 1,984
Response time is shorter: 33% gain in RT
29. Multiple indexes:
what should be done?
• Generally, the optimizer decides correctly which is the best path
• You can compare the results with the UNION approach, then decide
whether or not to keep hard-to-maintain code
• You can nonetheless use optimizer directives to force the access
method, like
{+ AVOID_MULTI_INDEX (clients)}
to force an INDEX PATH
• Or
{+ MULTI_INDEX (clients)}
to force the multi-index SKIP SCAN path
• The choice can get tricky when both AND and OR conditions are
set on the involved indexed columns
• The difference is almost invisible in the case of a hot measure
• Statistics on indexes are very important: the access method can
change according to them!
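A sketch of a complete query using the directive (the columns and predicate values are examples; both columns are assumed to carry their own index):

```sql
-- sketch: force the multi-index (skip scan) path on an OR query
select {+MULTI_INDEX(clients)} *
from clients
where lastname = 'Martin' or zipcode = '29120';
```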
30. Star join
• Star join is an extension of the MULTI INDEX concept
• It combines this technique with DYNAMIC HASH JOINS
• The technique has been ported from XPS to IDS 11.70
• It is used exclusively for DS/OLAP queries where a FACT
table is the center point of many dimension tables
• Requires PDQPRIORITY (Ultimate Edition or Enterprise
Edition)
• If you are considering Star Join, you are an excellent
candidate for a demo of Informix Warehouse
Accelerator!
31. The A.I.R says:
« you will avoid indexes with too many tree levels »
• OK, but what could I do to solve that?
My indexes are built with the data they
contain, and nothing or almost
nothing can be done
• Databases and tables are getting
bigger and bigger, and
splitting/archiving part of the data is
not always an acceptable solution
32. FOREST OF TREES INDEXES
• The forest-of-trees index type was
introduced in 11.70 xC1
• It replicates the model of a traditional B-
TREE, but with several root nodes instead of
only one
• A forest of trees brings benefits when
contention on the root node is observed
33. Reducing the number of b-tree levels
on index « lastname, firstname »
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree
=> The initial number of b-tree levels is 6
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 10 buckets
=> The number of b-tree levels decreased to 5
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 100 buckets
=> The number of b-tree levels decreased to 4
• create index "informix".id_clients_02 on "informix".clients (lastname,
firstname) using btree hash on (lastname) with 1000 buckets
=> The number of b-tree levels decreased to 3
34. TPC-C with regular b-tree indexes
• Index iu_stock_01 has 4 levels
TPC-C result is 14,093 tpmC
High contention on
iu_stock_01: 8,704,052 spins
in 4 min
35. TPC-C with FOT on iu_stock_01
• create unique index iu_stock_01 on stock (s_w_id,s_i_id)
using btree in data03 HASH on (s_w_id) with 50 buckets;
• Index iu_stock_01 now has 3 levels
Result grew to 16,413 tpmC
Contention on iu_stock_01
decreased from 8,704,000
to 149,600 spins in 4 min
iu_oorder_01 is now a good
candidate for FOT!
36. Main facts on FOT indexes
• FOT is very efficient at reducing contention on index
access => better RT in an OLTP context
• FOT is very efficient at reducing the number of B-TREE levels
=> better overall RT
• Ideal for primary keys and foreign keys in a high-
concurrency OLTP context
• Implementation is easy and fast
• Supports the main index functionality: ER, PK, FK, b-tree
cleaning…
• Does not support aggregate queries or range scans on HASH
ON columns
• Also does not support index clustering, index fillfactor or
functional (UDR-based) indexes
37. Optimizing big index creation:
PSORT_NPROCS
• The PSORT_NPROCS env variable is used to allocate more
threads to the sort package, which is also used for parallel
index creation.
• Significant performance improvements on index creation
can be obtained on multi-core/multi-processor servers
• It can be used even with non-PDQPRIORITY-enabled
editions if the server has more than one core/CPU.
• PSORT_NPROCS can drive memory consumption up:
please check the available memory on the server.
• The onconfig parameter DS_NONPDQ_QUERY_MEM has to
be checked when using PSORT_NPROCS.
38. Optimizing big index creation:
DBSPACETEMP or PSORT_DBTEMP
• The DBSPACETEMP env variable overrides the
onconfig parameter of the same name.
• Generally, raw-device-based temp dbspaces offer
more performance than file-system-based files.
• PSORT_DBTEMP writes temporary sort files in the
specified file-system-based directories instead of
DBSPACETEMP.
• It is useful to spread the temporary sort files across a
wider list of directories mounted on different
spindles
39. PSORT_NPROCS/PSORT_DBTEMP:
facts
• create index id_clients_02 on clients(lastname,firstname)
• unset PSORT_NPROCS
unset PSORT_DBTEMP
=> 13m28.709s
• export PSORT_NPROCS=3
export PSORT_DBTEMP=/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/ids_chunks/ids_space03
=> 6m19
• A ram disk, or even a SSD drive can improve performance a lot:
export PSORT_NPROCS=3
export PSORT_DBTEMP=/mnt/myramdisk
=> 4m22.030s
• To check the environment of the session:
onstat –g env SessionNumber
40. Index disable: what happens?
• Disabling an existing index will prevent the server from using this
index, but the server will « remember » the index schema.
• This technique can be applied before executing a massive data insert
or update, since it alleviates the index key update workload.
• Heavy side effects can be expected: loss of key uniqueness, loss of
performance…
• If you run a query that would use a disabled index, the optimizer will probably
choose a sequential scan unless a better path is found.
• The index will be shown as ‘disabled’ in dbschema, but will not be
seen in oncheck -pT nor in oncheck -pe
• Disabling an index will make its former disk space available in the
dbspace
• Disabling an index is immediate
• Syntax is: set indexes IndexName disabled
41. Index enable: what happens?
• Enabling an index will rebuild the index physically,
with the same definition as before
• Enabling an index takes as much time as creating
the same index
• But the enable statement is simpler to type than the
create index statement
• + you do not have to remember the initial create
index statement
• Syntax is: set indexes IndexName enabled
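The two statements are typically used as a pair around a bulk load; a minimal sketch (the index name and the load step are examples):

```sql
-- sketch: suspend index maintenance during a massive load
set indexes ix_clients_lastname disabled;
-- ... run the massive insert/update here ...
set indexes ix_clients_lastname enabled;  -- physically rebuilds the index
```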
42. Digging for more performance:
disable foreign key indexes
• Many times, foreign key indexes are part of the same table's primary
key.
• order_line primary key (ol_w_id,ol_d_id,ol_o_id,ol_number)
order_line foreign key (ol_w_id,ol_d_id,ol_o_id)
• Using ‘INDEX DISABLED’ in the add constraint statement will save the
creation of a ‘useless’ index, because its structure already exists
in the primary key.
• ALTER TABLE order_line ADD CONSTRAINT(FOREIGN KEY (ol_w_id,ol_d_id,ol_o_id)
REFERENCES oorder(o_w_id,o_d_id,o_id) CONSTRAINT ol2 INDEX DISABLED);
• This implementation will save disk space by dropping an index
• CPU resources will be saved when updating/deleting/creating index keys,
• and consequently disk IO will also be saved.
• Check that disabling the constraint index has no hidden side effects: a
mistake can have expensive consequences!
43. I need to create a new index,
but users are always connected to the table!
• Sometimes a new index needs to be created, but
the tables are being accessed by users or batches.
• IDS 11.10 introduced the possibility of creating an
index without putting an exclusive lock on the table,
called create index online.
• Users can SELECT, INSERT, UPDATE or DELETE rows
in the table while the index is being created
• Syntax is:
create index id_clients_01 on clients(lastname,firstname) ONLINE
• Drop index online is also available under the same
conditions
44. Create index online:
precautions & restrictions
• Create index online is a complex operation, involving
a table snapshot, a base index build, a catch-up phase and more.
• It will request additional resources, such as disk space, CPU
and memory, in order to make the operation safe and as
fast as possible.
• Long transactions may happen: check the logical log size
before diving in
• The index pre-image pool memory size is managed with the
onconfig parameter ONLIDX_MAXMEM, updatable with
onmode -wm
• Not applicable to cluster indexes, UDT columns or UDR-based
indexes
• Only one create index online per table at a time
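For instance, the pre-image pool can be raised on the fly before a big online build; a sketch (the value, in KB, is an example):

```shell
# sketch: raise the online-index pre-image pool dynamically
onmode -wm ONLIDX_MAXMEM=10240
```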
45. Index compression
• IDS introduced table compression in 11.50 xC4. This technology is now
used successfully in large database implementations.
• Index compression is a new feature of IDS 12.10. It is based on the
same technology as table compression.
• The principle is to compress the key column values at the b-tree leaf level,
but not the rowids attached to these key values
• Index compression is very effective for indexes having large key values:
names, item names etc.
• The compression dictionary must contain at least 2000 unique key
values
• Index compression is an excellent way to save disk space, and …
• Since more key values fit in an index page, more key values can be read
in one IO cycle => IO is more efficient
• Reduced IO should enhance index access performance in large queries
46. Index compression:
disk space gained
• execute function task("index compress", "id_clients_01", "staging");
• Or
execute function task("index compress", "j", "testdb");
• Or
create index id_clients_01 on clients(lastname,firstname) compressed
More than 50% compression rate
47. Cluster index
• Creating or altering a cluster index will physically sort
the table data by the first column of this index at creation
time
• Accessing table data through a cluster index will read already
sorted data pages.
• Generally makes IO on data pages easier because they are
contiguous => decreased RT
• The cluster level will degrade as new rows are
inserted
• High cost of administration: re-clustering this index will
rewrite the table's data pages
• A cluster index can be good for stable tables accessed in an
ordered sequential way
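A sketch of both operations (table and index names are examples):

```sql
-- sketch: create a cluster index, then re-cluster it later as rows accumulate
create cluster index ix_orders_date on orders(order_date);
alter index ix_orders_date to cluster;  -- rewrites the data pages in key order
```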
48. Statistics on indexes
• Introduced in 11.70: when one creates an index,
the distributions for this index are automatically
created
• High-mode statistics are generated for the lead
column
• Index-level statistics are also generated in low
mode
• This does not relieve you from regularly updating
statistics for those indexes, but it is no longer
required to do so just after index creation
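Those regular refreshes can be scripted; a minimal sketch (table and column names are examples):

```sql
-- sketch: periodically refresh distributions for an indexed lead column
update statistics high for table clients(lastname) distributions only;
update statistics low for table clients;
```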