3. What Are Wait Events?
Basically a section of code
Uses gettimeofday
All time is stuck in a bucket with a name
Multiple things can be stuck in the same bucket
Oracle is very well instrumented (but It’s not 100%)
There are 41 classes of wait events in 10.2.0.3
There are 878 wait events in 10.2.0.3
209 enqueue events
29 latch events
41 I/O events
4. What do you do with Wait Events?
See where Oracle is spending it’s time
Create a Resource Profile
Aggregate the time by Wait Event
Order the profile by time
Include # of occurrences and avg. time/event
Shows where to focus
Shows how much improvement can be made
5. Profiles
Basic Tool – maybe the most important
Shows up in Statspack / AWR reports
Shows up in tkprof output (10g)
Can be pulled out of trace files
Can be pulled from V$ tables (v$system_event)
Can be pulled from ash tables
Oracle is very well instrumented
6. Profiles
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 37,297 52.27
SQL*Net message from dblink 7,065,802 11,312 15.85
async disk IO 943,852 5,265 7.38
db file sequential read 13,042,432 3,493 4.90
direct path write 108,862 3,410 4.78
-------------------------------------------------------------
7. Why should developers know this stuff?
1. Because you need to know
where your code is spending it’s
time
2. Scalability is Important
8. 1. These are the events you should know by heart
2. The other 260 or so you can figure out
3. You need to know what causes them
4. You need to know what values are reasonable
5. You need to know what you can do to fix them
Top 10 List of Wait Events
select name, parameter1, parameter2, parameter3
from v$event_name
where name like nvl('&event_name',name)
order by name
9. Not a wait event – but often included in a profile
Time spent but not accounted for in other buckets
Primarily time spent doing lio (we hope)
We want it to be on top
It may be an indicator of inefficient plans
CPU
10. Single block read
Usually index block or data block via rowid
Can be done by fts to get chained rows
P1=file, P2=block, P3=# of blocks (always 1)
Always a user’s server process reading into buffer cache
Reasonable e values: <10ms
DB File Sequential Read
WAIT #14: nam='db file sequential read' ela= 36745 file#=1 block#=67745 blocks=1
obj#=84920
11. Do Less Work
tune SQL
reduce lio’s 7 / obj rule of thumb
Explain plan lies (see notes on this slide)
bigger buffer cache
Do the Work Faster
Speed up I/O
Call Jack
DB File Sequential Read – “fixes”
12. Explain plan appears to executes a separate code path
Even if it didn’t there is no guarantee TEST matches PROD
Best bet is to execute and look at V$SQL_PLAN
This is easy in 10g with dbms_xplan
select * from table(dbms_xplan.display_cursor('&sql_id','&child_no',''))
See an example in the notes section for this slide
Digression – Explain Plan Lies
13. Multi block read
Usually full table scan or index fast full scan
P1=file, P2=block, P3=# of blocks (always < MBRC)
Always a user’s server process reading into buffer cache
Reasonable ela values: <10ms
Relevant parameters:
DBFMBRC, System Stats: MBRC,
_db_file_exec_read_count, _db_file_optimizer_read_count
DB File Scattered Read
WAIT #14: nam='db file scattered read' ela= 19389 file#=139 block#=44 blocks=5
obj#=83580
14. Do Less Work (fewer multi-block reads)
MBRC
Better Indexes
System Stats
bigger buffer cache
Tune SQL
Do the Work Faster
Speed up I/O
DB File Scattered Read – “fixes”
15. Usually sorting to Temp
Can also be parallel query
Could also be insert append, etc…
Reasonable ela values: <20ms
Relevant parameters:
DBFMBRC
_db_file_direct_io_count (1M)
Direct Path Read/Write
WAIT #10: nam='direct path write' ela= 4475 p1=401 p2=1518353 p3=57
WAIT #10: nam='direct path read' ela= 16770 p1=401 p2=1482031 p3=63
17. User process wait for LGWR to flush dirty buffers on commit
Synchronous write event
P1=log buffer block
Reasonable ela values: <4ms
Log File Sync
WAIT #16: nam='log file sync' ela= 1286 buffer#=11910 p2=0 p3=0 obj#=84920
18. Do Less Work (fewer commits)
Autocommit?
Row at a time processing with commits?
Do the Work Faster
Speed Up I/O
No RAID 5
No Contention (other db, arch, ASM)
SS Disk
Improve LGWR Scheduling
Renice
Log File Sync – “fixes”
19. Wait for LGWR to write to current log files
Not necessarily synchronous write event
System I/O event
Reasonable ela values: <4ms
Occurs as follows:
Every 3 seconds
Log_buffer is 1/3 full or redo = 1M ( default _log_io_size)
Commit or rollback by user session
RAC needs to transfer a dirty block
Log File Parallel Write
20. Often fixed by fixing Log File Sync (i.e. speed up i/o, etc)
Force LGWR to write more often
Log File Parallel Write – “fixes”
21. Happens when database can’t switch to next log file
Freezes the whole database
Several Flavors
Checkpoint incomplete
Archiving needed
Reasonable ela values: 0ms
Log file switch …
WAIT #9: nam='log file switch (checkpoint incomplete)' ela= 986212 p1=0 p2=0 p3=0
22. Do less work
Make total online log space bigger
Force more frequent incremental checkpoints
Speed up i/o
Log file switch … – “fixes”
23. Contention event for the same block
Read by other session (new event in 10g)
Held by other session in incompatible mode
New event in 10g – read by other session
Multiple sections of code throw time in this bucket
P1=file, P2=block, P3=reason code
See metalink note (34405.1) for details
Buffer Busy Waits
WAIT #7: nam='read by other session' ela= 3251 file#=4 block#=188557 class#=1
obj#=53707 tim=1183096831514887
24. It’s complicated – but basically - eliminate the contention
For Read by Other Session
Do Less Work (tune SQL to do less lio)
Eliminate i/o – bigger buffer cache, etc…
Speed up i/o
For Others
Find type of hot objects and statements suffering
Address issue - ASSM, Freelist groups (RAC), etc..
Buffer Busy Waits - “fixes”
25. No place to put a new block
Free Buffer Waits
26. Do Less Work
Reduce the amount of lio
Make more buffer space available
Add memory to the buffer cache
Speed up DBWR to flush blocks more quickly
Async_io
More dbwr processes
Renice
Free Buffer Waits - “fixes”
27. Database is idle – waiting on next request from client
Often marked as an idle event which it may be
Time distribution can be skewed
Common technique to ignore anything over 1 sec
Reasonable e values: <2ms
Protocol should be tcp (or beq for local processes)
SQL*Net message from client
WAIT #1: nam='SQL*Net message from client' ela= 336 driver id=1650815232 #bytes=1 p3=0 obj#=83660
28. Speed up app server
Reduce the number of calls
Use the right protocol (tcp to beq)
Call network guy
Get the users to type faster
SQL*Net message from client – “fixes”
29. More than 1 packet required to send SQL statement
SQL*Net more data from client
30. Don’t send 50K SQL statements
Use packages
SQL*Net more data from client – “fixes”
31. Time to send a SQL*Net message to a client (sort of)
Actually time it took to write the return data from Oracle’s userland
SDU buffer to OS kernel-land TCP socket buffer (Tanel Poder)
Often marked as an idle event which it may be
Reasonable e values: <1ms (micro seconds actually)
P1=bytes (usually 1 ???)
SQL*Net message to Client
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660
32. Use correct protocol
Fix network issue
Call Andy
SQL*Net message to Client – “fixes”
33. When more than one packet required
Rows spanning block
LOB’s
Array fetches
Often marked as an idle event which it may be
Reasonable e values: <1ms
P1=bytes (usually 1 ???)
SQL*Net more data to client
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660
34. Fix chained/migrated rows
Don’t use LOBs
Nothing to fix
SQL*Net more data to client – “fixes”
35. Locks for objects
Que up in order
9i lumped them all together
10g has 209 separate events
Enqueues
36. P1 has encoded type and mode
Enqueue in 9i
SELECT
chr(bitand(p1,-16777216)/16777215)||
chr(bitand(p1, 16711680)/65535) Type,
mod(p1,16) lmode
from v$session_wait
where event=‘enqueue’;
37. Several Major Categories
TM – table modification
TX – Transaction locks
UL – user lock
CI – Cross Instance
CU – Cursor Bind
HW – High Water
RO – Reusable Object
ST – Space Transaction
TS – Temporary Space
Parameters depend on type
Enqueue in 10g
38. User can’t modify block until other user commits
Enqueue: TX row lock
39. Use ash tables to find statements waiting
Fix the code
Or do what one of our clients did:
write some code to kill any processes blocking for more
than 60 seconds – lot’s easier than changing the app
Enqueue: TX row lock – “fixes”
40. Waiting on exclusive access to a sequence
Seems minor but can actually be a big issue on some
systems
Enqueue: SQ
41. Cache the sequence
Use NOORDER in RAC
Enqueue: SQ – “fixes”
42. Locks internal memory structures
Not ordered
Very light weight
Kid Asking for Candy Model
Spins vs. Sleeps (_spin_count, _latch_class_X)
Wait events do not include time spent spinning
9i lumped them all together
10g has 28 separate events
Latches
43. All dumped in the Latch Free event
P2 has latch#
Latches in 9i
WAIT #9: nam='latch free' ela= 185 p1=-4611686003650090584 p2=157 p3=0
SQL> Select name from v$latch where latch#=157;
NAME
----------------------------------------
library cache
44. Primarily Deal with major SGA areas
Shared Pool
Latch: shared pool
Latch: library cache
Latch row cache
Buffer Cache
Latch: cache buffer chains
Latch: cache buffers lru chain
Log Buffer
Latch: redo …
Parameters depend on type (see v$event_name)
Latches in 10g
45. Each block hashes to a chain
Several chains per latch
Several latches per instance (default based on cache size)
Latch: cache buffers chains
46. Find hot blocks
Fix bad SQL
Maybe Increase number of latches
Maybe decrease rows per block
Latch: cache buffers chains –”fixes”
47. Shared Pool Contention
Due primarily to parsing
See also latch: library cache …
Latch: shared pool
WAIT #28: nam='latch: shared pool' ela= 751 address=1611562656 number=213 tries=1 j#=46024
48. Use Bind Variables
CURSOR_SHARING = FORCE
Increase SHARED_POOL
– this is where Darcy came in and sat in my office so
I didn’t get much more done –
You can thank Darcy later!
Latch: shared pool – “fixes”
49. Some tools show this (Hotsos Profiler)
Indicates lost time due to rounding (small)
Or lost time due to CPU contention (large)
Unaccounted for
50. Tracing
alter session set events '10046 trace name context
forever, level 8';
Spits out wait events, plans (real), stats
Level 12 includes bind variables – makes file very big
Puts file in user_dump_dest (ls –altr)
Can be executed from logon trigger, in-line, etc…
Scoping is important (i.e. when it’s turned on and off)
Note: bug in early versions of 9.2, fixed in 9.2.0.6
51. Trace File Contents
WAIT
FETCH
EXEC
PARSE
STAT
XCTEND
PARSING IN CURSOR