Oracle Wait Events That Everyone Should Know.ppt

Oracle Wait Events
That Everyone Should Know
Kerry Osborne
Senior Oracle Guy

What Are Wait Events?
 Basically a section of code
 Uses gettimeofday
 All time is stuck in a bucket with a name
 Multiple things can be stuck in the same bucket
 Oracle is very well instrumented (but It’s not 100%)
 There are 41 classes of wait events in 10.2.0.3
 There are 878 wait events in 10.2.0.3
 209 enqueue events
 29 latch events
 41 I/O events

What do you do with Wait Events?
 See where Oracle is spending it’s time
 Create a Resource Profile
 Aggregate the time by Wait Event
 Order the profile by time
 Include # of occurrences and avg. time/event
 Shows where to focus
 Shows how much improvement can be made

Profiles
 Basic Tool – maybe the most important
 Shows up in Statspack / AWR reports
 Shows up in tkprof output (10g)
 Can be pulled out of trace files
 Can be pulled from V$ tables (v$system_event)
 Can be pulled from ash tables
Oracle is very well instrumented

Profiles
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 37,297 52.27
SQL*Net message from dblink 7,065,802 11,312 15.85
async disk IO 943,852 5,265 7.38
db file sequential read 13,042,432 3,493 4.90
direct path write 108,862 3,410 4.78
-------------------------------------------------------------

Why should developers know this stuff?
1. Because you need to know
where your code is spending it’s
time
2. Scalability is Important

 1. These are the events you should know by heart
 2. The other 260 or so you can figure out
 3. You need to know what causes them
 4. You need to know what values are reasonable
 5. You need to know what you can do to fix them
Top 10 List of Wait Events
select name, parameter1, parameter2, parameter3
from v$event_name
where name like nvl('&event_name',name)
order by name

 Not a wait event – but often included in a profile
 Time spent but not accounted for in other buckets
 Primarily time spent doing lio (we hope)
 We want it to be on top
 It may be an indicator of inefficient plans
CPU

 Single block read
 Usually index block or data block via rowid
 Can be done by fts to get chained rows
 P1=file, P2=block, P3=# of blocks (always 1)
 Always a user’s server process reading into buffer cache
 Reasonable e values: <10ms
DB File Sequential Read
WAIT #14: nam='db file sequential read' ela= 36745 file#=1 block#=67745 blocks=1
obj#=84920

 Do Less Work
 tune SQL
 reduce lio’s 7 / obj rule of thumb
 Explain plan lies (see notes on this slide)
 bigger buffer cache
 Do the Work Faster
 Speed up I/O
 Call Jack
DB File Sequential Read – “fixes”

 Explain plan appears to executes a separate code path
 Even if it didn’t there is no guarantee TEST matches PROD
 Best bet is to execute and look at V$SQL_PLAN
 This is easy in 10g with dbms_xplan
select * from table(dbms_xplan.display_cursor('&sql_id','&child_no',''))
 See an example in the notes section for this slide
Digression – Explain Plan Lies

 Multi block read
 Usually full table scan or index fast full scan
 P1=file, P2=block, P3=# of blocks (always < MBRC)
 Always a user’s server process reading into buffer cache
 Reasonable ela values: <10ms
 Relevant parameters:
 DBFMBRC, System Stats: MBRC,
 _db_file_exec_read_count, _db_file_optimizer_read_count
DB File Scattered Read
WAIT #14: nam='db file scattered read' ela= 19389 file#=139 block#=44 blocks=5
obj#=83580

 Do Less Work (fewer multi-block reads)
 MBRC
 Better Indexes
 System Stats
 bigger buffer cache
 Tune SQL
 Speed up I/O
DB File Scattered Read – “fixes”

 Usually sorting to Temp
 Can also be parallel query
 Could also be insert append, etc…
 Relevant parameters:
 DBFMBRC
 _db_file_direct_io_count (1M)
Direct Path Read/Write
WAIT #10: nam='direct path write' ela= 4475 p1=401 p2=1518353 p3=57
WAIT #10: nam='direct path read' ela= 16770 p1=401 p2=1482031 p3=63

 Adjust PGA_AGGREGATE_TARGET
 Turn off PX query
 Call Randy
Direct Path Read/Write – “fixes”

 User process wait for LGWR to flush dirty buffers on commit
 Synchronous write event
 P1=log buffer block
Log File Sync
WAIT #16: nam='log file sync' ela= 1286 buffer#=11910 p2=0 p3=0 obj#=84920

 Do Less Work (fewer commits)
 Autocommit?
 Row at a time processing with commits?
 Speed Up I/O
 No RAID 5
 No Contention (other db, arch, ASM)
 SS Disk
 Improve LGWR Scheduling
 Renice
Log File Sync – “fixes”

 Wait for LGWR to write to current log files
 Not necessarily synchronous write event
 System I/O event
 Occurs as follows:
 Every 3 seconds
 Log_buffer is 1/3 full or redo = 1M ( default _log_io_size)
 Commit or rollback by user session
 RAC needs to transfer a dirty block
Log File Parallel Write

 Often fixed by fixing Log File Sync (i.e. speed up i/o, etc)
 Force LGWR to write more often
Log File Parallel Write – “fixes”

 Happens when database can’t switch to next log file
 Freezes the whole database
 Several Flavors
 Checkpoint incomplete
 Archiving needed
 Reasonable ela values: 0ms
Log file switch …
WAIT #9: nam='log file switch (checkpoint incomplete)' ela= 986212 p1=0 p2=0 p3=0

 Do less work
 Make total online log space bigger
 Force more frequent incremental checkpoints
 Speed up i/o
Log file switch … – “fixes”

 Contention event for the same block
 Read by other session (new event in 10g)
 Held by other session in incompatible mode
 New event in 10g – read by other session
 Multiple sections of code throw time in this bucket
 P1=file, P2=block, P3=reason code
 See metalink note (34405.1) for details
Buffer Busy Waits
WAIT #7: nam='read by other session' ela= 3251 file#=4 block#=188557 class#=1
obj#=53707 tim=1183096831514887

 It’s complicated – but basically - eliminate the contention
 For Read by Other Session
 Do Less Work (tune SQL to do less lio)
 Eliminate i/o – bigger buffer cache, etc…
 Speed up i/o
 For Others
 Find type of hot objects and statements suffering
 Address issue - ASSM, Freelist groups (RAC), etc..
Buffer Busy Waits - “fixes”

 No place to put a new block
Free Buffer Waits

 Do Less Work
 Reduce the amount of lio
 Make more buffer space available
 Add memory to the buffer cache
 Speed up DBWR to flush blocks more quickly
 Async_io
 More dbwr processes
 Renice
Free Buffer Waits - “fixes”

 Database is idle – waiting on next request from client
 Often marked as an idle event which it may be
 Time distribution can be skewed
 Common technique to ignore anything over 1 sec
 Protocol should be tcp (or beq for local processes)
SQL*Net message from client
WAIT #1: nam='SQL*Net message from client' ela= 336 driver id=1650815232 #bytes=1 p3=0 obj#=83660

 Speed up app server
 Reduce the number of calls
 Use the right protocol (tcp to beq)
 Call network guy
 Get the users to type faster
SQL*Net message from client – “fixes”

 More than 1 packet required to send SQL statement
SQL*Net more data from client

 Don’t send 50K SQL statements
 Use packages
SQL*Net more data from client – “fixes”

 Time to send a SQL*Net message to a client (sort of)
 Actually time it took to write the return data from Oracle’s userland
SDU buffer to OS kernel-land TCP socket buffer (Tanel Poder)
 Reasonable e values: <1ms (micro seconds actually)
 P1=bytes (usually 1 ???)
SQL*Net message to Client
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660

 Use correct protocol
 Fix network issue
 Call Andy
SQL*Net message to Client – “fixes”

 When more than one packet required
 Rows spanning block
 LOB’s
 Array fetches
 P1=bytes (usually 1 ???)
SQL*Net more data to client
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660

 Fix chained/migrated rows
 Don’t use LOBs
 Nothing to fix
SQL*Net more data to client – “fixes”

 Locks for objects
 Que up in order
 9i lumped them all together
 10g has 209 separate events
Enqueues

 P1 has encoded type and mode
Enqueue in 9i
SELECT
chr(bitand(p1,-16777216)/16777215)||
chr(bitand(p1, 16711680)/65535) Type,
mod(p1,16) lmode
from v$session_wait
where event=‘enqueue’;

 Several Major Categories
 TM – table modification
 TX – Transaction locks
 UL – user lock
 CI – Cross Instance
 CU – Cursor Bind
 HW – High Water
 RO – Reusable Object
 ST – Space Transaction
 TS – Temporary Space
 Parameters depend on type
Enqueue in 10g

 User can’t modify block until other user commits
Enqueue: TX row lock

 Use ash tables to find statements waiting
 Fix the code
 Or do what one of our clients did:
write some code to kill any processes blocking for more
than 60 seconds – lot’s easier than changing the app
Enqueue: TX row lock – “fixes”

 Waiting on exclusive access to a sequence
 Seems minor but can actually be a big issue on some
systems
Enqueue: SQ

 Cache the sequence
 Use NOORDER in RAC
Enqueue: SQ – “fixes”

 Locks internal memory structures
 Not ordered
 Very light weight
 Kid Asking for Candy Model
 Spins vs. Sleeps (_spin_count, _latch_class_X)
 Wait events do not include time spent spinning
 9i lumped them all together
 10g has 28 separate events
Latches

 All dumped in the Latch Free event
 P2 has latch#
Latches in 9i
WAIT #9: nam='latch free' ela= 185 p1=-4611686003650090584 p2=157 p3=0
SQL> Select name from v$latch where latch#=157;
NAME
----------------------------------------
library cache

 Primarily Deal with major SGA areas
 Shared Pool
 Latch: shared pool
 Latch: library cache
 Latch row cache
 Buffer Cache
 Latch: cache buffer chains
 Latch: cache buffers lru chain
 Log Buffer
 Latch: redo …
 Parameters depend on type (see v$event_name)
Latches in 10g

 Each block hashes to a chain
 Several chains per latch
 Several latches per instance (default based on cache size)
Latch: cache buffers chains

 Find hot blocks
 Fix bad SQL
 Maybe Increase number of latches
 Maybe decrease rows per block
Latch: cache buffers chains –”fixes”

 Shared Pool Contention
 Due primarily to parsing
 See also latch: library cache …
Latch: shared pool
WAIT #28: nam='latch: shared pool' ela= 751 address=1611562656 number=213 tries=1 j#=46024

 Use Bind Variables
 CURSOR_SHARING = FORCE
 Increase SHARED_POOL
– this is where Darcy came in and sat in my office so
I didn’t get much more done –
You can thank Darcy later!
Latch: shared pool – “fixes”

 Some tools show this (Hotsos Profiler)
 Indicates lost time due to rounding (small)
 Or lost time due to CPU contention (large)
Unaccounted for

Tracing
 alter session set events '10046 trace name context
forever, level 8';
 Spits out wait events, plans (real), stats
 Level 12 includes bind variables – makes file very big
 Puts file in user_dump_dest (ls –altr)
 Can be executed from logon trigger, in-line, etc…
 Scoping is important (i.e. when it’s turned on and off)
 Note: bug in early versions of 9.2, fixed in 9.2.0.6

Trace File Contents
 WAIT
 FETCH
 EXEC
 PARSE
 STAT
 XCTEND
 PARSING IN CURSOR

Raw Trace File
 /opt/oracle/admin/LAB102/udump/lab102_ora_4207.trc
 Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
 ORACLE_HOME = /opt/oracle/product/db/10.2.0/db1
 System name: Linux
 Node name: homer
 Release: 2.6.9-34.ELhugemem
 Version: #1 SMP Fri Feb 24 17:04:34 EST 2006
 Machine: i686
 Instance name: LAB102
 Redo thread mounted by this instance: 1
 Oracle process number: 20
 Unix process pid: 4207, image: oracle@homer (TNS V1-V3)
 *** ACTION NAME:() 2007-08-16 13:48:14.571
 *** MODULE NAME:(SQL*Plus) 2007-08-16 13:48:14.571
 *** SERVICE NAME:(SYS$USERS) 2007-08-16 13:48:14.571
 *** SESSION ID:(143.189) 2007-08-16 13:48:14.571
 =====================
 PARSING IN CURSOR #7 len=68 dep=0 uid=61 oct=42 lid=61 tim=1159462982979740 hv=740818757 ad='30663e48'
 alter session set events '10046 trace name context forever, level 8'
 END OF STMT
 EXEC #7:c=0,e=269,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1159462982979717
 WAIT #7: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1159462982980778
 WAIT #7: nam='SQL*Net message from client' ela= 119 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1159462982981008
 =====================
 PARSING IN CURSOR #8 len=44 dep=0 uid=61 oct=3 lid=61 tim=1159463023994427 hv=761757617 ad='54738434'
 select avg(col1) from skew
 where rownum < 10
 END OF STMT
 PARSE #8:c=4000,e=3904,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1159463023994411
 EXEC #8:c=0,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1159463023994844
 WAIT #8: nam='SQL*Net message to client' ela= 9 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023994980
 WAIT #8: nam='db file scattered read' ela= 218 file#=4 block#=21900 blocks=5 obj#=53707 tim=1159463023995544
 FETCH #8:c=0,e=665,p=5,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=1159463023995732
 WAIT #8: nam='SQL*Net message from client' ela= 157 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996048
 FETCH #8:c=0,e=7,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1159463023996183
 WAIT #8: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996312
 WAIT #8: nam='SQL*Net message from client' ela= 298 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996669
 XCTEND rlbk=0, rd_only=1
 STAT #8 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=4 pr=5 pw=0 time=736 us)'
 STAT #8 id=2 cnt=9 pid=1 pos=1 obj=0 op='COUNT STOPKEY (cr=4 pr=5 pw=0 time=846 us)'
 STAT #8 id=3 cnt=9 pid=2 pos=1 obj=53707 op='TABLE ACCESS FULL SKEW (cr=4 pr=5 pw=0 time=639 us)'
 WAIT #0: nam='log file sync' ela= 680 buffer#=4862 p2=0 p3=0 obj#=53707 tim=1159463039852003

Questions?
Kerry.Osborne@enkitec.com

Oracle Wait Events That Everyone Should Know.ppt

Recommandé

Recommandé

Contenu connexe

Similaire à Oracle Wait Events That Everyone Should Know.ppt

Similaire à Oracle Wait Events That Everyone Should Know.ppt (20)

Plus de TricantinoLopezPerez

Plus de TricantinoLopezPerez (11)

Dernier

Dernier (20)

Oracle Wait Events That Everyone Should Know.ppt