Applying a Blockcentric Approach to Oracle Tuning
Overview
In recent years Oracle tuning methodology has shifted away from ratios and percentages to waits and events.
The new methods focus on the key component to user satisfaction, response time. By determining elapsed time
for each set of operations and identifying the bottlenecks, system performance can be improved, with a
corresponding increase in user satisfaction.
This paper presents a new method for application tuning that shifts the focus away from rows and towards
blocks. Simply stated, tune by reducing the number of blocks that a statement must read. It is built upon a
foundation of solid research from the leaders (see references) in Oracle tuning and troubleshooting. The
presentation is very high level and will not provide all the answers. In fact, the intent is to make you begin to
ask questions, and provide some guidance in finding the answers that are right for your database. “Trust, But
Verify.”
Your current perceptions of tuning may be challenged. Be patient, kind reader, and all shall be made clear (I
hope). In the meantime, indulge my tuning heresy.
“Burn him. Burn him at the stake. He speaks not the truth”
Oracle’s I/O Model
The cornerstone of the Blockcentric Approach is the Oracle I/O subsystem, how it reads and writes data. Most
of the work that the database does is the storage and retrieval of data. Let's set the foundation (of nice dry,
burnable wood) for the approach.
It’s The Block
When a query requests a single-byte column from a single row in a table, the block is read in its entirety. In
order to make large read operations more efficient, Oracle can request multiple blocks from disk with a single
operation. These multiblock reads are most often associated with full table scans, but are also used with full
index scans. Full table scans read ALL blocks from the segment header up to the high water mark, the last block
ever formatted to hold data.
Logical Reads
A logical read is one where the block is already resident in memory. However, it is not as simple as looking up
a memory address. The data block address must be determined and then hashed to a lookup value. Latches must
be acquired, hash chains must be walked, and additional latches acquired where needed. If a read consistent
version must be created from undo, the undo block(s) must be read using the same process and the undo applied.
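The lookup just described can be pictured as a hash table of chains. The following Python sketch is a deliberately simplified conceptual model, not Oracle's actual implementation; the bucket count, structures, and names are illustrative only.

```python
# Conceptual model of a buffer-cache lookup: hash the data block address
# (DBA) to a bucket, then walk that bucket's chain of cached blocks.
# Illustration only -- NOT Oracle's actual data structures.

N_BUCKETS = 8  # illustrative; a real cache uses many thousands of buckets

buckets = [[] for _ in range(N_BUCKETS)]  # each entry: list of (dba, block)

def bucket_for(dba):
    return buckets[hash(dba) % N_BUCKETS]

def cache_block(dba, block):
    # Simulates a block being brought into the cache.
    bucket_for(dba).append((dba, block))

def logical_read(dba):
    # In real life this walk happens under a latch; a miss here is what
    # triggers a physical read from the storage subsystem.
    for cached_dba, block in bucket_for(dba):
        if cached_dba == dba:
            return block
    return None
```

Even on a hit, work is done: hashing, chain walking, and (in the real system) latching, which is why a logical read is far from free.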
Physical Reads
A physical read is where the block is not found in memory and must be requested from the storage subsystem.
Many of the current disk storage systems do not always issue a disk read request for each block read request
that Oracle issues. Blocks may be stored in hardware caches or prefetched while Oracle is processing the block
and preparing the next request.
Block Sizes
As databases have grown, so have standard block sizes. Not too many versions ago, many databases used 2k for
a block size. Now 8k is the de facto minimum, with some braving the waters of 32k. Larger block sizes may
mean more data per block and a higher density of rows. This allows for more rows to be read with a single I/O
request. While this has many benefits, it also has implications for tuning.
Mythconceptions
Buffer Cache Hit Ratio
The higher the buffer cache hit ratio the better the statement
The Buffer Cache Hit Ratio (BCHR) has long been a keystone of the tuning foundation. Although many
performance specialists regard this measure as something less than useful, it still persists among texts,
documentation, papers and other sources. When faced with a less than optimal BCHR, the usual response was to
increase the block buffers. If the ratio was still not good enough, add more memory so you could increase the
amount devoted to block buffers.
Unfortunately, there is little correlation between this ratio and the user’s satisfaction with system performance.
This ratio does not measure the response time. When faced with a slower than desired query, a user is rarely
satisfied with the response that it is "as good as it can be since the statement’s BCHR is 99.999%". In reality,
this does not address the user issue.
A statement with a lower BCHR may perform better
If you compare two statements that return the same data and one has a higher BCHR, is this the better
performer? Not necessarily! (“Burn him…”) The key is to compare the amount of resource that each statement
consumes, as measured by I/O operations, both logical and physical. By calculating the comparative costs, the
less resource-intensive statement can be identified. In order to determine statement cost, we need to assign costs
to each of two components: logical I/Os and physical I/Os. Please note that we are addressing neither the parse
nor the execute phase of SQL.
Testing and research have shown that a read from memory is not nearly as cheap, relative to a read from disk, as
has been accepted. How could this be? A logical I/O is not simply reading from memory; it requires CPU
processing, latch acquisition, consistent read generation, etc. A physical I/O does not necessarily require a disk
read, as the block may already reside in the disk cache.
In reality, physical I/O operations are only 40 times as expensive as logical I/Os, not 10,000 times as is
commonly accepted. Where does 40 come from? This is based upon data gathered and analyzed by Hotsos,
et al. The data is not tied to a particular Oracle release or server platform. Several independent sources have
similar numbers, so the data can reasonably be presumed accurate.
The intent is not to provide a 'HERE IS THE MAGIC NUMBER' solution, but rather a simple method for
calculating relative costs. There is a relatively simple formula for determining statement cost in terms of I/O.
Statement_cost = logical_ios + (physical_ios * 40).
Let’s apply this formula to two statements that return the exact same data. If the first statement has a 99%
BCHR and does 6000 logical and 60 physical reads, while the second statement has a 75% BCHR and 400
logical and 100 physical reads, the second statement will require less resource and should run faster. To
calculate the cost of each statement, multiply the number of physical reads by 40 and add to the number of
logical reads. Using this method, the first statement will cost 8400 (6000 + (60*40)) while the second costs
4400 (400 + (100*40)), thus indicating that the second statement is actually better performing.
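The formula and the worked example above translate directly into code. This sketch simply restates the paper's arithmetic; the function names are mine:

```python
PIO_WEIGHT = 40  # relative cost of a physical vs. a logical I/O

def statement_cost(logical_ios, physical_ios):
    # Statement_cost = logical_ios + (physical_ios * 40)
    return logical_ios + physical_ios * PIO_WEIGHT

def bchr(logical_ios, physical_ios):
    # Classic Buffer Cache Hit Ratio, shown only for contrast.
    return 100.0 * (logical_ios - physical_ios) / logical_ios

# Statement 1: 99% BCHR, but the higher overall cost (8400).
print(bchr(6000, 60), statement_cost(6000, 60))
# Statement 2: "worse" 75% BCHR, yet roughly half the cost (4400).
print(bchr(400, 100), statement_cost(400, 100))
```

The statement with the lower hit ratio is the cheaper one, which is the whole point.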
Identifying Tuning Candidates
There are two ways to determine the statements most in need of tuning. You can wait until the line of disgruntled
users is beginning to form outside your cubicle or you can look at what is currently being executed in the
database. Many tuning tools will allow you to see the current sql statements with statistical information.
V$SQLAREA contains the columns that will provide you the information you need. If you sort according to the
sum of DISK_READS and BUFFER_GETS (in descending order), the most 'expensive' statements will be
listed first. In the example below, the first statement listed is consuming almost 20 times the resources of the
second 'worst' statement. Also notice that the top four most resource-intensive statements have a 100% Buffer
Cache Hit Ratio…all buffer gets and no disk reads!
select address, substr(sql_text,1,50), disk_reads disk,
buffer_gets buffer, executions execs
from v$sqlarea
where (disk_reads + buffer_gets) > 100000
order by (disk_reads+buffer_gets) desc;
ADDRESS SUBSTR(SQL_TEXT,1,50) DISK BUFFER EXECS
---------------- -------------------------------------------------- ----- -------- -------
0000000387DFD0B0 DELETE FROM USER_INFORMATION_V WHERE CONNECTED_USE 0 99857399 85968
00000003889FC7C0 DELETE FROM CONNECTED_SESSION WHERE ID = :1 0 5059551 4324
0000000384327720 DELETE FROM USER_INFORMATION_V WHERE CONNECTED_USE 0 5049585 4324
00000003855230D8 select count(1) from "VIRT"."USER_INFORMATION" whe 0 5020164 4324
0000000389C1A098 SELECT di.db_name from global_name gn, db_name 8 3012275 501844
0000000387E59AE0 BEGIN CreateNewUserAndConnection_v3(:1, :2, :3, : 69 2825583 30550
0000000389816890 INSERT INTO ACTIVE_USER_V (ID, CON 50 2235420 30550
0000000387E3AFF0 UPDATE concurrent_users_v set running_co 22 1760499 33152
0000000389C62170 select count(*) from dual 4 945086 315018
0000000389B22088 SELECT user_session_seq.NEXTVAL FROM DUAL 0 165386 54359
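The same 40x weighting can also be applied when ranking statements, instead of the raw sum of reads and gets used in the query above. A hypothetical post-processing sketch, using three (address, disk_reads, buffer_gets) rows from the output; the helper name is mine:

```python
# Rank statements by weighted cost (buffer_gets + disk_reads * 40)
# rather than by disk_reads + buffer_gets.
rows = [
    ("0000000387DFD0B0", 0, 99857399),
    ("0000000389C1A098", 8, 3012275),
    ("0000000387E59AE0", 69, 2825583),
]

def weighted_cost(disk_reads, buffer_gets):
    return buffer_gets + disk_reads * 40

for addr, disk, buf in sorted(rows, key=lambda r: weighted_cost(r[1], r[2]),
                              reverse=True):
    print(f"{weighted_cost(disk, buf):>12,}  {addr}")
```

With physical reads this low, the weighted ranking matches the raw sort; on a system doing heavy disk I/O the two orderings can differ substantially.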
Index Usage
If more than 15% of the rows are returned use an index
This ratio has long been one of the most important considerations in deciding whether to create and use an
index. If the number of rows to be returned by the query was less than 15% (5% for parallel query), then it was
‘more efficient’ to use an index. This rule derives from the cost of reading several index blocks to locate the
requested value, versus the number of blocks that a full scan can read with a single multiblock read.
The number of rows to be read is not a valid criterion, as Oracle does not read rows. The rows of interest could
be clustered together, scattered amongst many blocks, or chained/migrated so that each row requires several
blocks to be read.
The percentage of blocks read is the criterion
Remember, when a query retrieves rows, it reads blocks. If the same number of rows are clustered in a few
blocks or scattered amongst most of the blocks, different access paths are indicated. If the table is badly
fragmented with many empty blocks below the high water mark, index access paths may produce better
performing statements. It may be better to use an index to return 100% of the rows and a full table scan to return
less than 1% of the rows. (“Sniff, sniff…is that burning wood I smell?”)
The more telling criterion is the number of blocks to be read. Ask the question, "How many blocks must this
statement read in order to satisfy the query?" Answer the question in proportion to the number of blocks in the
table that are below the high water mark, not the number of blocks that contain data. There are three areas
that highlight this issue. The examples below are from a combination of tests, using a 1 million row table.
Table Block Fragmentation
Table Block Fragmentation occurs when there are empty blocks in the table below the high watermark. The
usual culprits are massive deletes and failed transactions. When rows are deleted and blocks become 'empty'
they are still allocated to the table so that they can be reused.
If the table contains a large proportion of empty blocks below the high water mark, an index may be the best
choice, even when all of the rows are to be returned by the query. If a full table scan is performed, many of the
blocks read will be empty of data. An index lookup will only read those blocks that actually contain data.
Retrieve 100% of the Rows using     I/Os
Full Table Scan                   14,769
Using an Index                     8,102
As you can see from the above example, with a badly fragmented table, using an index instead of a full table
scan consumes 45% less I/O. The table had 1 million rows inserted; then all but 30,000 rows were deleted in a
pattern that emptied some blocks entirely, leaving completely empty blocks while others remained filled with
data.
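The 45% figure can be checked with a one-line calculation (the helper name is mine):

```python
def io_savings_pct(baseline_ios, alternative_ios):
    # Percentage of I/O avoided by taking the cheaper access path.
    return 100.0 * (baseline_ios - alternative_ios) / baseline_ios

# Fragmented table: full scan 14,769 I/Os vs. index 8,102 I/Os.
print(round(io_savings_pct(14769, 8102), 1))  # about 45% less I/O
```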
When a table’s high water mark is set far above the blocks that actually contain data, the I/O savings are
significantly higher. Even though a small fraction of the blocks actually contain data, the high water mark is set
as if the table still held all 1 million rows.

Retrieve 100% of the Rows using     I/Os
Full Table Scan                   13,406
Using an Index                     1,560
In this example, the difference in I/O is very significant, with an index read consuming about 1/10 the amount
of I/O that a full table scan uses. This table had 1 million rows inserted and then all but 10,000 rows were
deleted. All the remaining rows were in the blocks at the ‘beginning’ of the segment, in the same extent as the
segment header.
Scattered Data
If the data of interest is scattered among a high proportion of the blocks, a full table scan may be a better
performing access path even when a very small fraction of the rows are to be returned. One key factor to
consider is the density of the rows within blocks. For a very small row size with a large block size, scattering
becomes more common.
Retrieve 1% of the Rows using     I/Os
Using an Index                  13,451
Full Table Scan                 13,180
In this example, the full table scan performed fewer I/Os. The I/O savings are not nearly as dramatic as in the
other examples, but it does indicate that the accepted indexing rule does not provide correct answers at all times.
Because the data is scattered, all of the table blocks (and a significant number of index blocks) must be read.
Clustered Data
If the data of interest is clustered, index scans may produce better results. Depending on the health of the index,
an index may be more efficient with over 15% of the rows. Clustering can occur naturally, as in the case of
daily sales records, or it may be artificial, as in the case of partitioned tables.
Retrieve 7% of the Rows using an Index     I/Os
Data Scattered                            7,282
Data Clustered                            7,122
Data Partitioned (Scan of Partition)      3,751
In this example, the clustering of data made a small (< 5%) improvement in the number of I/Os. Oracle provides
partitioning as a method to enforce data clustering and enable the SQL engine to take advantage of it. Here, the
number of I/Os drops by almost 50% by scanning only the table partition.
Determining Block Selectivity
To determine the block selectivity for either a table or a data value, you need to determine how many blocks are
below the high watermark and how many blocks are used by data. The tables must be analyzed and you must be
able to execute the dbms_rowid package. In the user_tables view, the blocks column indicates the number of
blocks below the high watermark. This is the number of blocks read by a full table scan.
There is one condition that makes it very difficult to determine the block selectivity of individual data values. If
a row is chained or migrated, the dbms_rowid package only reports the block of the head row piece. If the value
of chain_cnt in the user_tables view is high, it may cause the calculations to be incorrect.
Identifying Table Block Fragmentation
For each table, use the following query to determine the number of blocks containing data and those that are
below the high watermark. Replace sequential_data with the table name that you want to examine.
select count(distinct dbms_rowid.rowid_block_number(s.rowid)),
       blocks
from sequential_data s,
     user_tables
where table_name = 'SEQUENTIAL_DATA'
group by blocks;
In the example below, the query was executed for 3 different tables. It indicates that a full table scan of
hwm_data will read 6483 blocks even though only 76 of them actually contain data, while a full table scan of
sequential_data will access very few empty blocks.
Table Blocks In Use Blocks below HWM % Of Used Blocks
sequential_data 6348 6406 99%
frag_data 1967 6483 30%
hwm_data 76 6483 1%
Identifying Block Selectivity for Data Values
In the past, indexing was based upon the % of Rows rule. However, this is not a reliable indicator as we have
seen. It is better to identify the number of blocks to be read for a given data value. Using the num_rows and
blocks columns from the user_tables view, we can determine the block selectivity for a given value. In the
statement below, we are checking the block selectivity for the column small_random_num in the
sequential_data table.
select t.small_random_num data_value,
count(t.small_random_num) data_rows,
num_rows,
round(((count(t.small_random_num)/num_rows)*100),2) pct_rows,
count(distinct dbms_rowid.rowid_block_number(t.rowid)) data_blocks,
blocks,
round(((count(distinct dbms_rowid.rowid_block_number(t.rowid))/blocks)*100)
,2) pct_blocks
from sequential_data t,
user_tables
where table_name = 'SEQUENTIAL_DATA'
group by blocks, num_rows, t.small_random_num;
The query provides the following data, which shows a discrepancy between the row-level selectivity of a value
and its block-level selectivity. The data value 0 accounts for about 15% of the rows, yet exists in almost 25% of
the data blocks.
DATA_VALUE DATA_ROWS NUM_ROWS PCT_ROWS DATA_BLOCKS BLOCKS PCT_BLOCKS
---------- --------- -------- -------- ----------- ------ ----------
0 152495 1000000 15.25 1564 6406 24.41
1 148751 1000000 14.88 1572 6406 24.54
2 147019 1000000 14.70 1584 6406 24.73
3 149802 1000000 14.98 1567 6406 24.46
4 132798 1000000 13.28 1396 6406 21.79
5 133659 1000000 13.37 1409 6406 22.00
6 135476 1000000 13.55 1424 6406 22.23
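The pct_rows and pct_blocks arithmetic performed by the query can be reproduced outside the database; a small sketch (the function name is mine) using the value-1 row from the output above:

```python
def selectivity(data_rows, num_rows, data_blocks, blocks):
    # Row-level vs. block-level selectivity, as the query computes them.
    pct_rows = round(100.0 * data_rows / num_rows, 2)
    pct_blocks = round(100.0 * data_blocks / blocks, 2)
    return pct_rows, pct_blocks

# Data value 1 in sequential_data: ~15% of the rows, but ~25% of the blocks.
print(selectivity(148751, 1000000, 1572, 6406))  # (14.88, 24.54)
```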
If we execute the same query (modified for the new table) against the frag_data table, the results are quite
different. In this case, the data value of 0 exists in 5% of the blocks.
DATA_VALUE DATA_ROWS NUM_ROWS PCT_ROWS DATA_BLOCKS BLOCKS PCT_BLOCKS
---------- --------- -------- -------- ----------- ------ ----------
         0      4431    30480    14.54         322   6483       4.97
1 4410 30480 14.47 325 6483 5.01
2 4753 30480 15.59 350 6483 5.40
3 4755 30480 15.60 347 6483 5.35
4 3887 30480 12.75 284 6483 4.38
5 4105 30480 13.47 297 6483 4.58
6 4139 30480 13.58 291 6483 4.49
Examining a different data value in the sequential_data table, where the data value is very, very scattered, the
results are quite dramatic. Even though the individual values (not all shown) compose an extremely small
percentage of the rows, the value exists in almost every block in the table!
DATA_VALUE DATA_ROWS NUM_ROWS PCT_ROWS DATA_BLOCKS BLOCKS PCT_BLOCKS
---------- --------- -------- -------- ----------- ------ ----------
0 6348 1000000 .63 6348 6406 99.09
1 6348 1000000 .63 6348 6406 99.09
2 6348 1000000 .63 6348 6406 99.09
3 6348 1000000 .63 6348 6406 99.09
4 6348 1000000 .63 6348 6406 99.09
5 6348 1000000 .63 6348 6406 99.09
6 6348 1000000 .63 6348 6406 99.09
7 6348 1000000 .63 6348 6406 99.09
8 6348 1000000 .63 6348 6406 99.09
9 6348 1000000 .63 6348 6406 99.09
10 6348 1000000 .63 6348 6406 99.09
Conclusion
I realize that I have presented no definitive answers to tuning and probably generated far more doubt and
questions. If I have, then I have succeeded. At some point, you have probably raised your eyebrows and said,
“That is WRONG!” My response, “Prove it!” No, seriously, set up a test case, run it, send me the results. Of
course, if your test case proves me right, send it to me as well. Any and all information related to this approach
will help the Oracle community. I can be reached at optimaldba@yahoo.com and the ongoing results of the
research will be posted at www.optimaldba.com.
References
The papers, presentations and books that informed this approach are too numerous to list individually. Instead,
here is a list of websites and one book that have provided valuable information and research. I am deeply
indebted to the authors for their hard work that I have synthesized within this article.
Oracle Performance Tuning 101 – Vaidyanatha, Deshpande & Kostelac. THE performance tuning manual.
“Learn it, Know it, Live it.”
www.hotsos.com - Cary Millsap’s website. His Oracle Internals presentation at IOUG-A Live 2002 provided
me with the ‘Ah-Ha’ moment that inspired this approach.
www.evdbt.com - Tim Gorman’s site, with great contributions by Jeff Maresh. Excellent research, papers and
presentations.
www.orapub.com - Craig Shallahamer’s site. Another of the gurus that consistently provide new ideas backed
with solid research.