Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
还原Oracle中真实的cache recovery
1. 还原 Oracle 中真实的
Cache Recovery
by Maclean.liu
liu.maclean@gmail.com
www.oracledatabase12g.com
2. About Me
l Email:liu.maclean@gmail.com
l Blog:www.oracledatabase12g.com
l Oracle Certified Database Administrator Master 10g
and 11g
l Over 6 years experience with Oracle DBA technology
l Over 7 years experience with Linux technology
l Member Independent Oracle Users Group
l Member All China Users Group
l Presents for advanced Oracle topics: RAC,
DataGuard, Performance Tuning and Oracle Internal.
3. 我们在学习 Oracle 基础知识的时候会了解到实例恢复(Instance Recovery)或者说崩溃恢复
(Crash recovery)的概念,有时候甚至于这 2 个名词在我们日常的语言中表达同样的意思。实
际上 Instance Recovery 与 Crash Recovery 是存在区别的:针对单实例(single instance)或者
RAC 中所有节点全部崩溃后的恢复,我们称之为 Crash Recovery。而对于 RAC 中的某一个
节点失败,存活节点(surviving instance)试图对失败节点线程上 redo 做应用的情况,我们称之
为 Instance Recovery。
不管是 Instance Recovery 还是 Crash Recovery,都由 2 个部分组成:cache recovery 继以
transaction recovery。
根据官方文档的 介绍,Cache Recovery 也叫 Rolling Forward,就是我们常说的前滚;而
Transaction Recovery 也叫 Rolling Back,就是我们常说的回滚。前滚和回滚贯穿 Oracle 恢复
的基本概念,是我们入门必要学习的知识,在次不多介绍。
有文事者,必济之以武略。理论学得再好,不实践也无用。所幸 Crash Recovery 是很容易做
成的实验,我们不妨来看一看:
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size 2089336 bytes
Variable Size 486542984 bytes
Database Buffers 570425344 bytes
Redo Buffers 6295552 bytes
Database mounted.
SQL> alter database open;
Crash Recovery 将从 alter database open 开始,我们来观察其日志
==================alert.log====================
alter database open
Tue Jun 14 18:19:53 2011
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Tue Jun 14 18:19:53 2011
Started redo scan
Tue Jun 14 18:19:53 2011
Completed redo scan
0 redo blocks read, 0 data blocks need recovery
Tue Jun 14 18:19:53 2011
Started redo application at
Thread 1: logseq 1004, block 1124, scn 17136185
4. Tue Jun 14 18:19:53 2011
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1004 Reading mem 0
Mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_2_6v34jokt_.log
Mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_2_6v34jotq_.log
Tue Jun 14 18:19:53 2011
Completed redo application
Tue Jun 14 18:19:53 2011
Completed crash recovery at
Thread 1: logseq 1004, block 1124, scn 17156186
0 data blocks read, 0 data blocks written, 0 redo blocks read
Tue Jun 14 18:19:53 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=16, OS id=7829
Tue Jun 14 18:19:53 2011
Thread 1 advanced to log sequence 1005 (thread open)
Thread 1 opened at log sequence 1005
Current log# 3 seq# 1005 mem# 0:
/flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log
Current log# 3 seq# 1005 mem# 1:
/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log
Successful open of redo thread 1
Tue Jun 14 18:19:53 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
Tue Jun 14 18:19:53 2011
SMON: enabling cache recovery
Tue Jun 14 18:19:53 2011
db_recovery_file_dest_size of 204800 MB is 6.81% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 14 18:19:54 2011
Successfully onlined Undo Tablespace 1.
Tue Jun 14 18:19:54 2011
SMON: enabling tx recovery
Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 2
replication_dependency_tracking turned off (no async multimaster replication
found)
Starting background process QMNC
QMNC started with pid=17, OS id=7831
Tue Jun 14 18:19:55 2011
Completed: alter database open
6. 我们来看另一个演示,这个演示用以说明 cache recovery 还存在于最普通的不包含 Crash
Recovery 的数据库打开过程中:
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size 2089336 bytes
Variable Size 486542984 bytes
Database Buffers 570425344 bytes
Redo Buffers 6295552 bytes
Database mounted.
SQL> alter database open;
Database altered.
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.oracledatabase12g.com
==================alert.log====================
alter database open
Tue Jun 14 18:43:52 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=14, OS id=8133
Tue Jun 14 18:43:52 2011
Thread 1 opened at log sequence 1005
Current log# 3 seq# 1005 mem# 0:
/flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log
Current log# 3 seq# 1005 mem# 1:
/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log
Successful open of redo thread 1
Tue Jun 14 18:43:52 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
Tue Jun 14 18:43:52 2011
SMON: enabling cache recovery
Tue Jun 14 18:43:53 2011
Successfully onlined Undo Tablespace 1.
Tue Jun 14 18:43:53 2011
SMON: enabling tx recovery
7. Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 2
replication_dependency_tracking turned off (no async multimaster replication
found)
Tue Jun 14 18:43:53 2011
Incremental checkpoint up to RBA [0x3ed.624.0], current log tail at RBA
[0x3ed.944.0]
Tue Jun 14 18:43:53 2011
Starting background process QMNC
QMNC started with pid=15, OS id=8135
Tue Jun 14 18:43:53 2011
Completed: alter database open
因为是 clean shutdown,所以这里不存在 crash recovery。但这里同样出现了”SMON: enabling
cache recovery”,可见 cache recovery 是每次实例启动 instance startup 必要执行的一种恢复操
作。但问题是,这个恢复操作到底针对何种对象?
实际上 cache recovery 所要恢复的是 rowcache,也就是我们常说的字典缓存(dictionary
cache)。关于这个结论,肯定有很多人要问我这样说的依据是什么,对应于这个”cache
recovery”的问题,我们很难从 google 中得到一些启示,因为它和官方文档所描述的”cache
recovery-rolling forward”存在重名的关系。
为了证明 cache recovery 所恢复的是 rowcache,我们需要一个实证,从正式的系统中得到验
证。要做到这一点是比较困难的,我们需要 Oracle 愿意把整个 database open 的过程变成慢
动作来供我们参考,验证要用到一些调试工具,例如 gdb 或者 dbx。
我们首先将实例启动到 mount 状态,并对执行 startup 的 LOCAL 进程做 gdb 的 breakpoint 断
点调试:
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size 2089336 bytes
Variable Size 486542984 bytes
Database Buffers 570425344 bytes
Redo Buffers 6295552 bytes
Database mounted.
找出 LOCAL 进程的系统进程号 SPID
SQL> select spid from v$process
2 where addr in (
3 select paddr from v$session
4 where sid=(select distinct sid from v$mystat))
5 /
SPID
8. ------------
8326
在实例 startup nomount/mount 后共享池的 library cache 就是可用的
SQL> select namespace from v$librarycache where gets!=0;
NAMESPACE
---------------
SQL AREA
TABLE/PROCEDURE
而 rowcache 则尚未被填充,因为字典缓存来源于自举对象(bootstrap$)和字典基表
SQL> select parameter,count,gets from v$rowcache where count!=0;
no rows selected
另开一个 terminal 窗口,并执行对 LOCAL 进程 8326 的 gdb breakpoint 调试
[oracle@rh2 ~]$ gdb $ORACLE_HOME/bin/oracle 8326
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /s01/db_1/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /s01/db_1/bin/oracle, process 8326
Reading symbols from /s01/db_1/lib/libskgxp10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libskgxp10.so
Reading symbols from /s01/db_1/lib/libhasgen10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libhasgen10.so
Reading symbols from /s01/db_1/lib/libskgxn2.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libskgxn2.so
Reading symbols from /s01/db_1/lib/libocr10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libocr10.so
Reading symbols from /s01/db_1/lib/libocrb10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libocrb10.so
Reading symbols from /s01/db_1/lib/libocrutl10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libocrutl10.so
Reading symbols from /s01/db_1/lib/libjox10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libjox10.so
Reading symbols from /s01/db_1/lib/libclsra10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libclsra10.so
Reading symbols from /s01/db_1/lib/libdbcfg10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libdbcfg10.so
Reading symbols from /s01/db_1/lib/libnnz10.so...(no debugging symbols
found)...done.
Loaded symbols for /s01/db_1/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols
9. found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x0000003181a0d8e0 in __read_nocancel () from /lib64/libpthread.so.0
输入断点 kcrf_commit_force 和 kqlobjlod
(gdb) break kcrf_commit_force
Breakpoint 1 at 0x2724b6c
(gdb) break kqlobjlod
Breakpoint 2 at 0x1ac5e8c
在之前的 terminal 中执行数据库打开操作,因为 breakpoint 的关系这个 open 操作会
hang 住,
这时我们通过观察告警日志来了解恢复进度
SQL> alter database open; --这里会 hang 住
在 gdb 下输入 continue,
(gdb) c
Continuing.
Breakpoint 1, 0x0000000002724b6c in kcrf_commit_force ()
观察告警日志可以发现 redo application 已经完成,但还未进入 cache recovery 阶
段
alter database open
Tue Jun 14 19:14:33 2011
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Tue Jun 14 19:14:33 2011
Started redo scan
Tue Jun 14 19:14:33 2011
Completed redo scan
39 redo blocks read, 74 data blocks need recovery
Tue Jun 14 19:14:33 2011
Started redo application at
Thread 1: logseq 1006, block 1155
Tue Jun 14 19:14:33 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1006 Reading mem 0
Mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_1_6v34jnkn_.log
10. Mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_1_6v34jnst_.log
Tue Jun 14 19:14:33 2011
Completed redo application
Tue Jun 14 19:14:33 2011
Completed crash recovery at
Thread 1: logseq 1006, block 1194, scn 17200193
74 data blocks read, 74 data blocks written, 39 redo blocks read
Tue Jun 14 19:14:33 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=17, OS id=8656
Tue Jun 14 19:14:33 2011
Thread 1 advanced to log sequence 1007 (thread open)
Thread 1 opened at log sequence 1007
Current log# 2 seq# 1007 mem# 0:
/flashcard/oradata/G10R2/onlinelog/o1_mf_2_6v34jokt_.log
Current log# 2 seq# 1007 mem# 1:
/s01/flash_recovery_area/G10R2/onlinelog/o1_mf_2_6v34jotq_.log
Successful open of redo thread 1
Tue Jun 14 19:14:33 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
db_recovery_file_dest_size of 204800 MB is 6.81% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 14 19:14:37 2011
Incremental checkpoint up to RBA [0x3ef.3.0], current log tail at RBA
[0x3ef.3.0]
且此时 rowcache 仍未被填充
SQL> select parameter,count,gets from v$rowcache where count!=0;
no rows selected
在 gdb 界面下再次执行 continue 2 次
(gdb) c
Continuing.
Breakpoint 1, 0x0000000002724b6c in kcrf_commit_force ()
(gdb) c
Continuing.
Breakpoint 2, 0x0000000001ac5e8c in kqlobjlod ()
观察告警日志可以发现已开始 cache recovery,但也卡陷在 cache recovery 上,这保
证我们的演示不受骚扰
Tue Jun 14 19:16:44 2011
SMON: enabling cache recovery
此时 rowcache 中出现唯一的一个 dc_objects 对象
select parameter,count,gets from v$rowcache where count!=0;
PARAMETER COUNT GETS
-------------------------------- ---------- ----------
dc_objects 1 1