2. Top 10 Oracle SQL tuning tips
1. Design and develop with performance in mind
2. Establish a tuning environment
3. Index wisely
4. Reduce parsing
5. Take advantage of Cost Based Optimizer
6. Avoid accidental table scans
7. Optimize necessary table scans
8. Optimize joins
9. Use array processing
10. Consider PL/SQL for “tricky” SQL
3. Hint #1: Design and develop with performance in mind
Explicitly identify performance targets
Focus on critical transactions
– Test the SQL for these transactions against simulations of production data
Measure performance as early as possible
Consider prototyping critical portions of the applications
Consider de-normalization and other performance-by-design features early on
4. Hint #2: Establish a tuning and development environment
A significant portion of SQL that performs poorly in production was originally crafted against empty or nearly empty tables.
Make sure you establish a reasonable sub-set of production data for use during development and tuning of SQL.
Make sure your developers understand EXPLAIN PLAN and
tkprof, or equip them with commercial tuning tools.
5. Understanding SQL tuning tools
The foundation tools for SQL tuning are:
– The EXPLAIN PLAN command
– The SQL Trace facility
– The tkprof trace file formatter
Effective SQL tuning requires either familiarity with
these tools or the use of commercial alternatives such
as SQLab
6. EXPLAIN PLAN
The EXPLAIN PLAN command reveals the execution plan for an SQL statement.
The execution plan reveals the exact sequence of steps that
the Oracle optimizer has chosen to employ to process the SQL.
The execution plan is stored in an Oracle table called the “plan
table”
Suitably formatted queries can be used to extract the execution
plan from the plan table.
7. A simple EXPLAIN PLAN
SQL> EXPLAIN PLAN FOR select count(*) from sales
where product_id=1;
Explained.
SQL> SELECT RTRIM (LPAD (' ', 2 * LEVEL) || RTRIM (operation)
||' '||RTRIM (options) || ' ' || object_name) query_plan
2 FROM plan_table
3 CONNECT BY PRIOR id = parent_id
4* START WITH id = 0
QUERY_PLAN
--------------------------------------------
SELECT STATEMENT
SORT AGGREGATE
TABLE ACCESS FULL SALES
8. Interpreting EXPLAIN PLAN
The more heavily indented an access path is, the
earlier it is executed.
If two steps are indented at the same level, the
uppermost statement is executed first.
Some access paths are “joined” – such as an index
access that is followed by a table lookup.
9. A more complex EXPLAIN PLAN
SELECT STATEMENT
VIEW SYS_DBA_SEGS
UNION-ALL
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
VIEW SYS_OBJECTS
UNION-ALL
TABLE ACCESS FULL TAB$
TABLE ACCESS FULL TABPART$
TABLE ACCESS FULL CLU$
TABLE ACCESS FULL IND$
TABLE ACCESS FULL INDPART$
TABLE ACCESS FULL LOB$
TABLE ACCESS FULL TABSUBPART$
TABLE ACCESS FULL INDSUBPART$
TABLE ACCESS FULL LOBFRAG$
TABLE ACCESS BY INDEX ROWID OBJ$
INDEX UNIQUE SCAN I_OBJ1
10. SQL_TRACE and tkprof
ALTER SESSION SET SQL_TRACE=TRUE causes a trace of SQL execution to be generated.
The TKPROF utility formats the resulting output.
Tkprof output contains a breakdown of execution statistics, the execution plan and rows returned for each step. These statistics are not available from any other source.
Tkprof is the most powerful tool, but requires a significant
learning curve.
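A minimal sketch of the workflow (the trace file name and the scott/tiger account are illustrative):

```sql
-- Enable tracing for the current session
ALTER SESSION SET SQL_TRACE = TRUE;

-- ... run the SQL statements to be tuned ...

-- Then, from the operating system prompt, format the resulting
-- trace file (file and user names here are illustrative):
--   tkprof ora_1234.trc tuning_report.prf explain=scott/tiger sys=no
```

The explain= option adds execution plans to the report; sys=no suppresses the recursive SQL that Oracle issues on your behalf.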
12. Using SQLab
Because EXPLAIN PLAN and tkprof are unwieldy and hard to
interpret, third party tools that automate the process and
provide expert advice improve SQL tuning efficiency.
The Quest SQLab product:
– Identifies SQL in your database that could benefit from tuning
– Provides a sophisticated tuning environment to examine, compare and
evaluate execution plans.
– Incorporates an expert system to advise on indexing and SQL
statement changes
13. SQLab SQL tuning lab
– Display execution plan in a variety of intuitive ways
– Provide easy access to statistics and other useful data
– Model changes to SQL and immediately see the results
14. SQLab Expert Advice
– SQLab provides specific advice on how to tune an SQL
statement
15. SQLab SQL trace integration
– SQLab can also retrieve the execution statistics that are otherwise only
available through tkprof
16. Hint #3: Index wisely
Index to support selective WHERE clauses and join conditions
Use concatenated indexes where appropriate
Consider overindexing to avoid table lookups
Consider advanced indexing options
– Hash Clusters
– Bit mapped indexes
– Index-organized tables
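As a sketch of concatenated indexing and overindexing (the customers table and column names are hypothetical):

```sql
-- One concatenated index supports searches on surname alone
-- or on surname+firstname, since leading columns can be used
CREATE INDEX cust_name_ix
    ON customers (surname, firstname);

-- "Overindexing": appending customer_id lets this query be
-- satisfied from the index alone, with no table lookup
CREATE INDEX cust_name_id_ix
    ON customers (surname, firstname, customer_id);

SELECT customer_id
  FROM customers
 WHERE surname = 'SMITH'
   AND firstname = 'JOHN';
```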
17. Effect of adding columns to a concatenated index
– Novice SQL programmers are often satisfied if the execution plan shows an index
– Make sure the index has all the columns required
Logical I/O by index configuration:
  Index on surname+firstname+dob+phoneno      4
  Index on surname+firstname+dob              6
  Index on surname+firstname                 20
  Merge of 3 single-column indexes           40
  Surname index only                        700
18. Bit-map indexes
– Contrary to widespread belief, bit-map indexes can be effective even when there are many distinct column values
– They are not suitable for OLTP, however
[Chart: elapsed time in seconds (logarithmic scale, 0.01 to 100) against number of distinct values (1 to 1,000,000), comparing bitmap index, B*-tree index and full table scan access.]
19. Hint #4: Reduce parsing
Use bind variables
– Bind variables are key to application scalability
– If necessary in 8.1.6+, set the CURSOR_SHARING parameter to FORCE
Reuse cursors in your application code
– How to do this depends on your development language
Use a cursor cache
– Setting SESSION_CACHED_CURSORS (to 20 or so) can
help applications that are not re-using cursors
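A sketch of the difference bind variables make, in SQL*Plus syntax (the customers table and cust_id variable are illustrative):

```sql
-- Without a bind variable, each distinct literal is a new statement
-- in the shared pool and forces a fresh parse:
SELECT * FROM customers WHERE customer_id = 1234;

-- With a bind variable, the statement text never changes, so one
-- shared-pool entry serves every execution:
VARIABLE cust_id NUMBER
EXECUTE :cust_id := 1234
SELECT * FROM customers WHERE customer_id = :cust_id;

-- A per-session cursor cache for applications that don't re-use cursors
ALTER SESSION SET SESSION_CACHED_CURSORS = 20;
```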
20. Hint #5: Take advantage of the Cost Based Optimizer
The older rule based optimizer is inferior in almost
every respect to the modern cost based optimizer
Using the cost based optimizer effectively involves:
– Regularly collecting table statistics using the ANALYZE command or DBMS_STATS package
– Understanding hints and how they can be used to influence SQL statement execution
– Choosing the appropriate optimizer mode: FIRST_ROWS is best for OLTP applications; ALL_ROWS suits reporting and OLAP jobs
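The points above can be sketched as follows (table names and the cust_i2 index are illustrative; the INDEX hint example follows the pattern given in the notes):

```sql
-- Gather statistics: COMPUTE for small-to-medium tables,
-- a sample for large ones
ANALYZE TABLE customers COMPUTE STATISTICS;
ANALYZE TABLE sales ESTIMATE STATISTICS SAMPLE 20 PERCENT;

-- Choose the optimizer goal for the session
ALTER SESSION SET OPTIMIZER_GOAL = FIRST_ROWS;

-- A hint directing the optimizer to a specific index
SELECT /*+ INDEX(customers cust_i2) */ *
  FROM customers
 WHERE name = :cust_name;
```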
21. Hint #6: Avoid accidental table scans
Tablescans that occur unintentionally are a major
source of poorly performing SQL. Causes include:
– Missing Index
– Using “!=”, “<>” or NOT
• Use inclusive range conditions or IN lists
– Looking for values that are NULL
• Use NOT NULL columns with a DEFAULT value
– Using functions on indexed columns
• Use “functional” indexes in Oracle8i
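A sketch of the rewrites involved (the orders and customers tables are hypothetical; the SUBSTR/LIKE rewrite mirrors the example in the notes):

```sql
-- "!=" cannot use an index; an inclusive IN list can
-- (bad)  SELECT * FROM orders WHERE status != 'OPEN';
SELECT * FROM orders WHERE status IN ('CLOSED', 'CANCELLED');

-- A function on an indexed column defeats the index...
-- (bad)  WHERE SUBSTR(surname, 1, 4) = 'SMIT'
SELECT * FROM customers WHERE surname LIKE 'SMIT%';

-- ...or, in Oracle8i, index the function itself
CREATE INDEX cust_upper_ix ON customers (UPPER(surname));
```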
22. Hint #7: Optimize necessary table scans
There are many occasions where a table scan is the only option.
If so:
– Consider parallel query option
– Try to reduce size of the table
• Adjust PCTFREE and PCTUSED
• Relocate infrequently used long columns or BLOBs
• Rebuild when necessary to reduce the high water mark
– Improve the caching of the table
• Use the CACHE hint or table property
• Implement KEEP and RECYCLE pools
– Partition the table (if you really seek a large subset of data)
– Consider the fast full index scan
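Two of these options sketched in SQL (table names and the parallel degree of 4 are illustrative):

```sql
-- Parallelize an unavoidable scan
SELECT /*+ FULL(s) PARALLEL(s, 4) */ SUM(amount)
  FROM sales s;

-- Encourage caching of a small, frequently scanned table
ALTER TABLE status_codes CACHE;
```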
23. Fast full index scan performance
– Use when you must read every row, but not every column
– Counting the rows in a table is a perfect example
Elapsed time (s) by access path:
  Parallel fast full index scan    2.44
  Fast full index scan             4.94
  Parallel table scan              5.23
  Full table scan                 12.53
  Full index scan                 17.76
  Index range scan (RULE)         19.83
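Counting rows is the classic case; a sketch, assuming a sales table with a primary key index named sales_pk:

```sql
-- Fast full scan of the (smaller) index instead of the table
SELECT /*+ INDEX_FFS(s sales_pk) */ COUNT(*)
  FROM sales s;
```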
24. Hint #8: Optimize joins
Pick the best join method
– Nested loops joins are best for indexed joins of subsets
– Hash joins are usually the best choice for “big” joins
Pick the best join order
– Pick the best “driving” table
– Eliminate rows as early as possible in the join order
Optimize “special” joins when appropriate
– STAR joins for data-warehousing applications
– STAR_TRANSFORMATION if you have bitmap indexes
– ANTI-JOIN methods for NOT IN sub-queries
– SEMI-JOIN methods for EXISTS sub-queries
– Properly index CONNECT BY hierarchical queries
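A sketch of controlling join order and method with hints (schema is illustrative): ORDERED joins the tables in FROM-clause order, so the most restrictive table should be listed first.

```sql
-- Drive from sales, hash-join to customers
SELECT /*+ ORDERED USE_HASH(c) */ c.name, SUM(s.amount)
  FROM sales s, customers c
 WHERE s.customer_id = c.customer_id
 GROUP BY c.name;
```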
25. Oracle 8 semi-joins
Semi-joins optimize queries using EXISTS where there is no supporting index on the inner table:
select *
from customers c
where exists
  (select 1 from employees e
    where e.surname=c.contact_surname
      and e.firstname=c.contact_firstname
      and e.date_of_birth=c.date_of_birth)
(no index on employees)
26. Oracle 8 semi-joins
Without the semi-join or supporting index, queries like
the one on the preceding slide will perform very badly.
Oracle will perform a tablescan of the inner table for each row retrieved from the outer table.
If customers has 100,000 rows and employees 800 rows, then 80 MILLION rows will be processed!
In Oracle7, you should create the index or use an IN-based subquery
In Oracle8, the semi-join facility allows the query to be
resolved by a sort-merge or hash join.
27. To use semi-joins
Set ALWAYS_SEMI_JOIN=HASH or MERGE in
INIT.ORA, OR
Use a MERGE_SJ or HASH_SJ hint in the subquery
of the SQL statement
SELECT *
FROM customers c
WHERE EXISTS
  (SELECT /*+ MERGE_SJ */ 1
     FROM employees e
    WHERE ….)
28. Oracle8 semi-joins
The performance improvements are impressive (note the logarithmic scale):
  IN-based subquery                        6.69 s
  EXISTS, merge semi-join                  6.83 s
  EXISTS, no semi-join but with index     31.01 s
  EXISTS, no semi-join or indexes      1,343.19 s
29. Star Join improvements
A STAR join involves a large “FACT” table being
joined to a number of smaller “dimension” tables
30. Star Join improvements
The Oracle7 Star join algorithm works well when there is a
concatenated index on all the FACT table columns
But when there are a large number of dimensions, creating
concatenated indexes for all possible queries is impossible.
Oracle8’s “Star transformation” involves re-wording the query
so that it can be supported by combinations of bitmap indexes.
Since bitmap indexes can be efficiently combined, a single
bitmap index on each column can support all possible queries.
31. To enable the star transformation
Create bitmap indexes on each of the FACT table
columns which are used in star queries
Make sure that
STAR_TRANSFORMATION_ENABLED is TRUE,
either by changing init.ora or using an ALTER
SESSION statement.
Use the STAR_TRANSFORMATION hint if necessary.
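The steps above sketched in SQL (the sales fact table, its dimension keys and the products dimension are illustrative):

```sql
-- One bitmap index per fact-table dimension key
CREATE BITMAP INDEX sales_product_bx ON sales (product_id);
CREATE BITMAP INDEX sales_region_bx  ON sales (region_id);

ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;

-- Hint the transformation if the optimizer doesn't choose it
SELECT /*+ STAR_TRANSFORMATION */ p.product_name, SUM(s.amount)
  FROM sales s, products p
 WHERE s.product_id = p.product_id
 GROUP BY p.product_name;
```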
32. Drawback of Star transformation
Bitmap indexes reduce concurrency (row-level locking
may break down).
But remember that a large number of distinct column values does not necessarily rule out a bitmap index
33. Star transformation performance
When there is no suitable concatenated index, the
Star transformation results in a significant
improvement
Elapsed time (s):
  Star join, concatenated index                        0.01
  Star transformation, concatenated index              0.24
  Star join, no suitable concatenated index            9.94
  Star transformation, no suitable concatenated index  0.35
34. Hint #9: Use ARRAY processing
– Retrieve or insert rows in batches, rather than one at a time.
– Methods of doing this are language specific
[Chart: elapsed time against array size (1 to 300). Elapsed time falls steeply as the array size increases towards 20, with diminishing returns thereafter.]
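In Oracle8i PL/SQL, batching can be sketched with BULK COLLECT (the sales table is illustrative; earlier releases rely on the array interfaces of the host language or precompiler):

```sql
DECLARE
  TYPE amount_tab IS TABLE OF sales.amount%TYPE;
  l_amounts amount_tab;
BEGIN
  -- Fetch all amounts in one array operation rather than row by row
  SELECT amount BULK COLLECT INTO l_amounts
    FROM sales;
  DBMS_OUTPUT.PUT_LINE('Fetched ' || l_amounts.COUNT || ' rows');
END;
/
```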
35. Hint #10: Consider PL/SQL for “tricky” SQL
With SQL you specify the data you want, not how to get it. Sometimes you need to specifically dictate your retrieval algorithm.
For example:
– Getting the second highest value
– Doing lookups on a low-high lookup table
– Correlated updates
– SQL with multiple complex correlated subqueries
– SQL that seems too hard to optimize unless it is broken into multiple queries linked in PL/SQL
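The first example, getting the second-highest value, sketched procedurally (the employees table and salary column are hypothetical):

```sql
DECLARE
  CURSOR sal_csr IS
    SELECT DISTINCT salary FROM employees ORDER BY salary DESC;
  l_salary employees.salary%TYPE;
BEGIN
  OPEN sal_csr;
  FETCH sal_csr INTO l_salary;   -- highest
  FETCH sal_csr INTO l_salary;   -- second highest
  CLOSE sal_csr;
  DBMS_OUTPUT.PUT_LINE('Second highest: ' || l_salary);
END;
/
```

The cursor stops after two fetches, where a pure SQL formulation would have to rank or self-join the whole table.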
37. Bonus hint: When your SQL is tuned, look to your Oracle configuration
When SQL is inefficient there is limited benefit in investing in
Oracle server or operating system tuning.
However, once SQL is tuned, the limiting factor for
performance will be Oracle and operating system
configuration.
In particular, check for internal Oracle contention that
typically shows up as latch contention or unusual wait
conditions (buffer busy, free buffer, etc)
Editor's Notes
All too often, SQL tuning is performed on an application designed and developed with virtually no consideration given to performance requirements. SQL tuning is often the best option for tuning such applications, but it is both more efficient and effective to design an application with performance in mind. Producing good performance by design requires that the following activities occur within the development life-cycle: Explicit specification of the performance characteristics of the system at an early stage of development. Focus on critical transactions and queries during logical and physical modeling. Simulation of these SQL statements on prototype designs can often reveal key performance weaknesses. Effective simulation of the target environment. This can include simulating concurrent user loads and/or acquiring representative data volumes. Continual measurement of critical transaction performance. If possible, this should be integrated with other Quality Assurance measures.
It’s not uncommon for SQL that seemed to be working well in the development environment to exhibit poor performance once it is released to a production system. A primary cause of these unpleasant surprises is an inadequate development or volume testing environment. In particular, environments without realistic or representative data volumes are bound to lead to unrealistic SQL performance. The ideal tuning or development environment is one in which: Data volumes are realistic or at least proportional. With today’s increasingly large production databases it’s often not possible to duplicate production data volumes exactly. However, it should always be possible to construct a reasonable sub-set of the data. For instance, a 20% sample of a 1GB database may be adequate for performance tuning. At all costs, avoid the situation in which SQL developers are constructing and testing code against empty or almost empty tables - even the proverbial “SQL query from hell” will appear efficient in these environments. Tuning facilities are available. Supply your volume and development environments with as many tuning tools as you have available. This could involve third party SQL tuning tools but will at least involve enabling the default Oracle tools. Make sure developers know how to use EXPLAIN PLAN, SQL trace and tkprof. Make sure that relevant database options are set to enable their effective use (see Chapter 7). Documentation is available. This documentation should include the database design, index details, performance requirements and volume metrics. The SQL programmer needs all this information in order to effectively produce efficient SQL.
Indexes exist primarily to improve the performance of SQL statements. In many cases, establishing the “best” indexes is easiest path to high performance. Use concatenated indexes Try not to use two indexes when one would do. If searching for SURNAME and FIRSTNAME, don’t necessarily create separate indexes for each column. Instead create a concatenated index on both SURNAME and FIRSTNAME. You can use the “leading” portions of a concatenated index on their own, so if you sometimes query on the SURNAME column without supplying the FIRSTNAME then SURNAME should come first in the index. Overindex to avoid a table lookup You can sometimes improve SQL execution by “overindexing”. Overindexing involves concatenating columns which appear in the SELECT clause, but not in the WHERE clause to the index. For instance, imagine we are searching on SURNAME and FIRSTNAME in order to find EMPLOYEE_ID. Our concatenated index on SURNAME and FIRSTNAME will allow us to quickly locate the row containing the appropriate EMPLOYEE_ID, but we will need to access both the index and the table. If there is an index on SURNAME, FIRSTNAME and EMPLOYEE_ID, then the query can be satisfied using the index alone. This technique can be particularly useful when optimizing joins, since intermediate tables in a join are sometimes queried merely to obtain the join key for the next table. Consider advanced indexing options Oracle default B*-tree indexes are flexible and efficient and are suitable for the majority of situations. However, Oracle offers a number of alternative indexing schemes that can improve performance in specific situations. Index clusters allow rows from one or more tables to be located in cluster key order. Clustering tables can result in a substantial improvement to join performance. However, table scans of an individual tables in the cluster can be severely degraded. Index clusters are usually only recommended for tables which are always accessed together. 
Even then, alternatives such as denormalization should be considered. In hash clusters, the key values are translated mathematically to a hash value. Rows are stored in the hash cluster based on this hash value. Locating a row when the hash key is known may require only a single I/O, rather than the 2-3 I/Os required by an index lookup. However, range scans of the hash key cannot be performed. Furthermore, if the cluster is poorly configured, or if the size of the cluster changes, then overflows on the hash keys can occur or the cluster can become sparsely populated. In the first case, hash key retrieval can degrade and in the second case table scans will be less efficient. Bit-mapped indexes suit queries made against multiple columns which each have relatively few distinct values. They are more compact than a corresponding concatenated index and, unlike the concatenated index, they can support queries in which the columns appear in any combination. However, bit-mapped indexes are not suitable for tables with high modification rates, since locking of bit-mapped indexes occurs at the block rather than the row level. In an Index-organized table, all table data is stored within a B*-tree index structure. This can improve access to data via the primary key, and reduces the redundancy of storing key values both in the index and in the table. Index-organized tables can be configured with infrequently accessed columns located in an overflow segment. This keeps the index structure relatively small and efficient. Make sure your query uses the best index. Novice SQL programmers are often satisfied providing that the execution plan for their SQL statement uses any index. However, there will sometimes be a choice of indexed retrievals and the Oracle optimizer - especially the older Rule Based Optimizer - will not always choose the best index.
Make sure that the indexes being selected by Oracle are the most appropriate and use hints (discussed below) to change the index if necessary
“Parsing” an SQL statement is the process of validating the SQL and determining the optimal execution plan. For SQL which has low I/O requirements but is frequently executed (for example, the SQL generated by OLTP-type applications) reducing the overhead of SQL parsing is very important. When an Oracle session needs to parse an SQL statement, it first looks for an identical SQL statement in the Oracle shared pool. If a matching statement cannot be found then Oracle will determine the optimal execution plan for the statement and store the parsed representation in the shared pool. The process of parsing SQL is CPU intensive. When I/O is well-tuned, the overhead of parsing the SQL can be a significant portion of the total overhead of executing the SQL. There are a number of ways of reducing the overhead of SQL statement parsing: Use bind variables. Bind variables allow the variable part of a query to be represented by “pointers” to program variables. If you use bind variables, the text of the SQL statement will not change from execution to execution and Oracle will usually find a match in the shared pool, dramatically reducing parse overhead. Re-use cursors. Cursors (or context areas) are areas of memory that store the parsed representation of SQL statements. If you re-execute the same SQL statement more than once, then you can re-open an existing cursor and avoid issuing a parse call altogether. The mechanism of re-using cursors varies across development tools and programming languages. Appendix C contains some guidelines for specific tools. Use a cursor cache. If your development tool makes it hard or impossible to re-use cursors, you can instruct Oracle to create a cursor “cache” for each session using the SESSION_CACHED_CURSORS init.ora parameter. If SESSION_CACHED_CURSORS is greater than 0, then Oracle will store that number of recently re-executed cursors in a cache. If an SQL statement is re-executed, it may be found in the cache and a parse call avoided.
The component of the Oracle software that determines the execution plan for an SQL statement is known as the optimizer. Oracle supports two approaches to query optimization. The rule based optimizer determines the execution plan based on a set of rules which rank various access paths. For instance, an index-based retrieval has a lower rank than a full table scan and so the rule based optimizer will use indexes wherever possible. The cost based optimizer determines the execution plan based on an estimate of the computer resources (“the cost”) required to satisfy various access methods. The cost based optimizer uses statistics including the number of rows in a table and the number of distinct values in indexes to determine this optimum plan. Early experiences with the cost based optimizer in Oracle7 were often disappointing and gave the cost based optimizer a poor reputation in some quarters. However, the cost based optimizer has been improving in each release while the rule based optimizer is virtually unchanged since Oracle 7.0. Many advanced SQL access methods, such as star and hash joins, are only available when you use the cost based optimizer. The cost based optimizer is the best choice for almost all new projects and converting from rule to cost based optimization will be worthwhile for many existing projects. Consider the following guidelines for getting the most from the cost based optimizer: Optimizer_mode. The default mode of the cost based optimizer (optimizer_mode=CHOOSE) will attempt to optimize the throughput - time taken to retrieve all rows - of SQL statements and will often favor full table scans over index lookups. When converting to cost based optimization, many users are disappointed to find that previously well-tuned index lookups change to long-running table scans. To avoid this, set OPTIMIZER_MODE=FIRST_ROWS in init.ora or ALTER SESSION SET OPTIMIZER_GOAL=FIRST_ROWS in your code.
This instructs the cost based optimizer to minimize the time taken to retrieve the first row in your result set and encourages the use of indexes. Hints. No matter how sophisticated the cost based optimizer becomes, there will still be occasions when you need to modify its execution plan. SQL “hints” are usually the best way of doing this. Using hints, you can instruct the optimizer to pursue your preferred access paths (such as a preferred index), use the parallel query option, select a join order and so on. Hints are entered as comments following the first word in an SQL statement. The plus sign “+” in the comment lets Oracle know that the comment contains a hint. Hints are fully documented in Appendix A. In the following example, a hint is instructing the optimizer to use the CUST_I2 index: SELECT /*+ INDEX(CUSTOMERS CUST_I2) */ * FROM CUSTOMERS WHERE NAME=:cust_name Analyze your tables. The Cost Based Optimizer’s execution plans are calculated using table statistics collected by the analyze command. Make sure you analyze your tables regularly, that you analyze all your tables and that you analyze them at peak volumes (for instance, don’t analyze a table just before it is about to be loaded by a batch job). For small-medium tables, use ANALYZE TABLE table_name COMPUTE STATISTICS; for larger tables, take a sample such as ANALYZE TABLE table_name ESTIMATE STATISTICS SAMPLE 20 PERCENT. Use histograms. Prior to Oracle 7.3, the Cost Based Optimizer would have available the number of distinct values in a column but not the distribution of data within the column. This meant that it might decline to use an index on a column with only a few values even if the particular value in question was very rare and would benefit from an index lookup. Histograms, introduced in Oracle 7.3, allow column distribution data to be collected and allow the Cost Based Optimizer to make better decisions.
You create histograms with the FOR COLUMNS clause of the analyze command (for instance ANALYZE TABLE table_name COMPUTE STATISTICS FOR ALL INDEXED COLUMNS). Note that you can’t take advantage of histograms if you are using bind variables (which we discussed earlier).
One of the most fundamental SQL tuning problems is the “accidental” table scan. Accidental table scans usually occur when the SQL programmer tries to perform a search on an indexed column that can’t be supported by an index. This can occur when: Using != (not equal to ). Even if the not equals condition satisfies only a small number of rows, Oracle will not use an index to satisfy such a condition. Often, you can re-code these queries using > or IN conditions, which can be supported by index lookups. Searching for NULLS . Oracle won’t use an index to find null values, since null values are not usually stored in an index (the exception is a concatenated index entry where only some of the values are NULL). If you’re planning to search for values which are logically missing, consider changing the column to NOT NULL with a DEFAULT clause. For instance, you could set a default value of “UNKNOWN” and use the index to find these values. Using functions on indexed columns . Any function or operation on an indexed column will prevent Oracle from using an index on that column. For instance, Oracle can’t use an index to find SUBSTR(SURNAME,1,4)=’SMIT’. Instead of manipulating the column, try to manipulate the search condition. In the previous example, a better formulation would be SURNAME LIKE ‘SMIT%’. In Oracle8i you can create functional indexes , which are indexes created on functions, providing that the function always returns the same result when provided with the same inputs. This allows you, for instance, to create an index on UPPER(surname).
In many cases, avoiding a full table scan by using the best of all possible indexes is your aim. However, it’s often the case that a full table scan cannot be avoided. In these situations, you could consider some of the following techniques to improve table scan performance: Use the Parallel Query Option. Oracle’s Parallel Query Option is the most effective - although most resource intensive - way of improving the performance of full table scans. Parallel Query allocates multiple processes to an SQL statement that is based at least partially on a full table scan. The table is partitioned into distinct sets of blocks, and each process works on a different set of data. Further processes may be allocated - or the original processes recycled - to perform joins, sorts and other operations. The approach of allocating multiple processes to the table scan can reduce execution time dramatically if the hardware and database layout is suitable. In particular, the host computer should have multiple CPUs and/or the database should be spread across more than one disk device. You can enable the Parallel Query option with a PARALLEL hint or make it the default for a table with the PARALLEL table clause. Reduce the size of the table. The performance of a full table scan will generally be proportional to the size of the table to be scanned. There are ways of reducing the size of the table quite substantially and thereby improving full table scan performance. Reduce PCTFREE. The PCTFREE table setting reserves a certain percentage of space in each block to allow for updates that increase the length of a row. By default, PCTFREE is set to 10%. If your table is rarely updated, or if the updates rarely increase the length of the row, you can reduce PCTFREE and hence reduce the overall size of the table. Increase PCTUSED. The PCTUSED table setting determines at what point blocks that have previously hit PCTFREE will again become eligible for inserts.
The default value is 40%, which means that after hitting PCTFREE, the block will only become eligible for new rows when deletes reduce the amount of used space to 40%. If you increase PCTUSED, rows will be inserted into the table at an earlier time, blocks will be fuller on average and the table will be smaller. There may be a negative effect on INSERT performance - you’ll have to assess the trade-off between scan and insert performance. Relocate long or infrequently used columns. If you have LONG (or big VARCHAR2) columns in the table which are not frequently accessed and never accessed via a full table scan (perhaps a bitmap image or embedded document) you should consider relocating these to a separate table. By relocating these columns you can substantially reduce the table’s size and hence improve full table scan performance. Note that Oracle8 LOB types will almost always be stored outside of the core table data anyway. The CACHE hint. Normally, rows retrieved by most full table scans are flushed almost immediately from Oracle’s cache. This is sensible since otherwise full table scans could completely saturate the cache and push out rows retrieved from index retrievals. However, this does mean that subsequent table scans of the same table are unlikely to find a match in the cache and will therefore incur a high physical IO rate. You can encourage Oracle to keep these rows within the cache by using the CACHE hint or the CACHE table setting. Oracle will then place the rows retrieved at the Most Recently Used end of the LRU chain and they will persist in the cache for a much longer period of time. Use partitioning. If the number of rows you want to retrieve from a table is greater than an index lookup could effectively retrieve, but still only a fraction of the table itself (say between 10% and 40% of total), you could consider partitioning the table.
For instance, suppose that a SALES table contains all sales records for the past 4 years and you frequently need to scan all sales records for the current financial year in order to calculate year-to-date totals. The proportion of rows scanned is far greater than an index lookup would comfortably support but is still only a fraction of the total table. If you partition the table by financial year you can restrict processing to only those records that match the appropriate financial year. This could potentially reduce scan time by 75% or more. Partitioning is discussed in detail in Chapter 13. Consider fast full index scan. If a query needs to access all or most of the rows in a table, but only a subset of the columns, you can consider using a fast full index scan to retrieve the rows. To do this, you need to create an index which contains all the columns included in the select and where clauses of the query. If these columns comprise only a small subset of the entire table then the index will be substantially smaller and Oracle will be able to scan the index more quickly than it could scan the table. There will, of course, be an overhead involved in maintaining the index that will affect the performance of INSERT, UPDATE and DELETE statements.
Determining the optimal join order and method is a critical consideration when optimizing the performance of SQL that involves multi-table joins. Join Method Oracle supports three join methods: In a nested loops join, Oracle performs a search of the inner table for each row found in the outer table. This type of access is most often seen when there is an index on the inner table—since, otherwise, multiple “nested” table scans may result. When performing a sort merge join, Oracle must sort each table (or result set) by the value of the join columns. Once sorted, the two sets of data are merged, much as you might merge two sorted piles of numbered pages. When performing a hash join, Oracle builds a hash table for the smaller of the two tables. This hash table is then used to find matching rows in a somewhat similar fashion to the way an index is used in a nested loops join. The Nested loops method suits SQL that joins together subsets of table data and where there is an index to support the join. When larger amounts of data must be joined and/or there is no index, use the sort-merge or hash-join method. Hash join usually out-performs sort-merge, but will only be used if Cost Based Optimization is in effect or if a hint is used. Join Order Determining the best join order can be a hit-and-miss affair. The Cost Based Optimizer will usually pick a good join order but if it doesn’t, you can use the ORDERED clause to force the tables to be joined in the exact order in which they appear in the FROM clause. In general, it is better to eliminate rows earlier rather than later in the join process, so if a table has a restrictive WHERE clause condition you should favor it earlier in the join process. Special Joins Oracle provides a number of optimization for “special” join types. 
Some of these optimizations will be performed automatically if you are using the Cost Based Optimizer, but all can be invoked under either optimizer by use of hints:
– The star join algorithm optimizes the join of a single massive “fact” table to multiple smaller “dimension” tables. The optimization can be invoked with the STAR hint. A further optimization rewrites the SQL to take advantage of bitmap indexes that might exist on the fact table; this can be invoked with the STAR_TRANSFORMATION hint.
– The “anti-join” is usually expressed as a subquery using the NOT IN clause. Queries of this type can perform badly under the rule based optimizer, but can run efficiently if the init.ora parameter ALWAYS_ANTI_JOIN is set to HASH or if a HASH_AJ hint is added to the subquery.
– The “semi-join” is usually expressed as a subquery using the EXISTS clause. Such a query may perform poorly if there is no index supporting the subquery, but can run efficiently if the init.ora parameter ALWAYS_SEMI_JOIN is set to HASH or if a HASH_SJ hint is added to the subquery.
– Hierarchical queries using the CONNECT BY operator will degrade rapidly as table volumes increase unless there is an index to support the CONNECT BY join condition.
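As an illustration of the anti-join hint, here is a sketch using hypothetical CUSTOMERS and SALES tables (names are illustrative; note also that a hash anti-join requires the subquery column to be non-nullable for NOT IN semantics to be preserved):

```sql
-- Anti-join: customers who have no sales. The HASH_AJ hint inside the
-- subquery asks Oracle to process the NOT IN as a single hash anti-join
-- rather than re-executing the subquery for every candidate row.
SELECT customer_id, customer_name
  FROM customers
 WHERE customer_id NOT IN
       (SELECT /*+ HASH_AJ */ customer_id
          FROM sales
         WHERE customer_id IS NOT NULL);
```

Alternatively, setting ALWAYS_ANTI_JOIN=HASH in init.ora makes this the default behaviour without per-statement hints.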
Semi-joins allow Oracle to efficiently optimize queries that include an EXISTS subquery but don’t have an index to support efficient execution of the subquery. Typically, these statements performed very badly in Oracle7, because the EXISTS clause causes the subquery to be executed once for each row returned by the parent query; if the subquery involves a table scan or an inefficient index scan, poor performance can be expected. In Oracle7, the solution was either to create a supporting index, or to re-code the statement as a join or as a subquery using the IN operator.
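The semi-join counterpart of the previous example might be sketched as follows, again with hypothetical table names:

```sql
-- Semi-join: customers who have at least one sale. With no index on
-- SALES(customer_id), the HASH_SJ hint lets Oracle evaluate the EXISTS
-- as one hash semi-join instead of one subquery execution per customer.
SELECT c.customer_id, c.customer_name
  FROM customers c
 WHERE EXISTS
       (SELECT /*+ HASH_SJ */ 1
          FROM sales s
         WHERE s.customer_id = c.customer_id);
```

This avoids the Oracle7-era workarounds of adding an index or rewriting the statement as a join or an IN subquery.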
Array processing refers to Oracle’s ability to insert or retrieve more than one row in a single operation. For SQL that deals with multiple rows of data, array processing usually reduces execution time by 50% or more (more still if you’re working across the network). In some application environments, array processing is implemented automatically and you won’t have to do anything to enable this feature. In other environments, array processing may be the responsibility of the programmer. Appendix C outlines how array processing can be activated in some popular development tools.

On the principle that “if some is good, more must be better”, many programmers implement huge arrays. This can be overkill, and may even reduce performance by increasing the memory requirements of the program. Most of the benefit of array processing is obtained by increasing the array size from 1 to about 20; further increases yield diminishing returns, and you won’t normally see much improvement when increasing the array size beyond 100.
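In PL/SQL, one way to apply this advice is BULK COLLECT with a LIMIT clause (available from Oracle 8i onward; this is a sketch against the hypothetical SALES table, not one of the book's worked examples):

```sql
DECLARE
    TYPE amount_tab IS TABLE OF sales.amount%TYPE;
    l_amounts  amount_tab;
    CURSOR sales_csr IS
        SELECT amount FROM sales;
BEGIN
    OPEN sales_csr;
    LOOP
        -- Fetch 20 rows per call: captures most of the array-processing
        -- gain while keeping the program's memory requirements modest.
        FETCH sales_csr BULK COLLECT INTO l_amounts LIMIT 20;
        EXIT WHEN l_amounts.COUNT = 0;
        FOR i IN 1 .. l_amounts.COUNT LOOP
            NULL;  -- process each fetched row here
        END LOOP;
    END LOOP;
    CLOSE sales_csr;
END;
```

An array size of 20 sits at the point of diminishing returns described above; raising LIMIT into the hundreds mostly adds memory cost, not speed.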
SQL is a non-procedural language that is admirably suited to most data retrieval and manipulation tasks. However, there are many circumstances in which a procedural approach will yield better results. In these circumstances, the PL/SQL language (or possibly Java) can be used in place of standard SQL. Although it’s not possible to exhaustively categorize all the situations in which PL/SQL can be used in place of standard SQL, PL/SQL is likely to be a valid alternative when:
– There is little or no requirement to return large quantities of data: for instance, UPDATE transactions, or retrievals of only a single value or row.
– Standard SQL requires more resources than seems logically necessary, and no combination of hints seems to work. This is particularly likely if there are implicit characteristics of the data that the optimizer cannot “understand”, or where the SQL is particularly complex.
– You have a clear idea of how the data should be retrieved and processed, but can’t implement your algorithm using standard SQL.
Some of the specific circumstances in which PL/SQL was found to improve performance within this book were:
– Determining 2nd highest values.
– Performing range lookups for tables that have a LOW_VALUE and HIGH_VALUE column.
– Performing correlated updates, where the same table is referenced within the WHERE and SET clauses of an UPDATE statement with a subquery within the SET clause.
PL/SQL triggers can also be invaluable when implementing de-normalization.
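The "2nd highest value" case can be sketched procedurally as below. This is an illustrative PL/SQL block against the hypothetical SALES table, not the book's own listing; the point is that the cursor loop stops as soon as the answer is known, instead of the self-join or correlated subquery that standard SQL would require:

```sql
DECLARE
    l_distinct  PLS_INTEGER := 0;           -- distinct values seen so far
    l_prev      sales.amount%TYPE := NULL;  -- previous distinct value
    l_second    sales.amount%TYPE;          -- the answer
BEGIN
    FOR r IN (SELECT amount FROM sales ORDER BY amount DESC) LOOP
        IF l_prev IS NULL OR r.amount < l_prev THEN
            l_distinct := l_distinct + 1;
            l_prev     := r.amount;
        END IF;
        IF l_distinct = 2 THEN
            l_second := r.amount;
            EXIT;   -- stop fetching: the second highest value is found
        END IF;
    END LOOP;
END;
```

With an index on AMOUNT, the loop typically reads only a handful of rows, whereas the pure-SQL formulations may scan or sort the whole table.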