What’s a Complete Recovery? Backup and Recovery Tips

What’s a Complete Recovery?

A complete recovery is any recovery where you recover as much data as you are
theoretically able to. Put another way, it’s any recovery where you do not choose to
terminate the recovery process before it would otherwise naturally finish.

I stress the subtlety of that description because recoveries performed without the use of
any archives whatsoever (i.e., you don’t run in archivelog mode) are traditionally
considered to be complete recoveries. That is despite the fact that, in the absence of any
archives, the best you can do is restore the entire database as it was at the time of the
last cold backup, which obviously means you’ve lost all transactions submitted to the
database after that backup. The point is that ‘complete’ simply means ‘do as much as you
can’, even if that means, as it will in noarchivelog mode, that you end up losing data.

If you do run in archivelog mode, however, a complete recovery means applying all the
redo from the time of the start of the backup, until the time of the original failure. The
restored Data File is therefore brought completely up to date, and no committed
transactions whatsoever are lost.

For that sort of functionality, there’s only one catch: every single bit of redo generated by
the entire database since the time of the start of the backup has to be available to the
recovery session. No skipping of bits of redo you deem irrelevant is permitted. That
means there can be no gaps in your archive sequence, and all the Online Redo Logs have to
be in fully working order, too. After all, one of those online logs is the ‘current’ log, and
it contains valid redo just as much as the archives do.
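
As a rough sketch of how those two preconditions might be checked before a failure ever happens, the queries below read the log mode and look for holes in the recorded archive sequence. They assume a single, unbroken run of sequence numbers (no RESETLOGS in between) and they only inspect the records held in the Control File, so they cannot tell you whether the archive files themselves are still on disk, nor see records that have already aged out.

```sql
-- Confirm the database is running in archivelog mode.
SELECT log_mode FROM v$database;

-- List every sequence number whose successor has no archive record,
-- ignoring the most recent archive in each thread.
SELECT DISTINCT a.thread#, a.sequence# + 1 AS first_missing_sequence
FROM   v$archived_log a
WHERE  a.sequence# < (SELECT MAX(m.sequence#)
                      FROM   v$archived_log m
                      WHERE  m.thread# = a.thread#)
AND    NOT EXISTS (SELECT 1
                   FROM   v$archived_log n
                   WHERE  n.thread#   = a.thread#
                   AND    n.sequence# = a.sequence# + 1)
ORDER  BY 1, 2;
```

If the second query returns no rows, the Control File at least believes the sequence is continuous; a listing of the archive directory is still needed to confirm the files exist.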

For the purposes of the rest of this paper, I’m going to assume that you have a database
containing just 5 Data Files, each of which gets backed up “hot”, 1 per hour, starting at
1am. The database is also assumed to be running in archivelog mode, and there are no
gaps in the archive sequence (otherwise we’d be into incomplete recovery territory!).

Our Monday night backup therefore looks like this:

File 1 : SYSTEM : backup started at 1.00am
File 2 : TEMP : backup started at 2.00am
File 3 : INDX : backup started at 3.00am
File 4 : RBS : backup started at 4.00am
File 5 : USERS : backup started at 5.00am

…and this sort of pattern repeats every night, until on Thursday morning, some disasters
hit the database.

Various disasters can be imagined (and most certainly will be!).

Copyright © Howard Rogers 2001 24/10/2001 Page 1 of 8


The System Tablespace is stuffed
On Thursday morning, the database crashes for no apparent reason. The first thing you’d
do, of course, is check the Alert Log for any possible error messages, whereupon you
discover that one of the last alerts recorded was a message along the lines of ‘Corrupt
block detected in File 1’.

File 1 is our SYSTEM tablespace, so the first thing we know immediately is that there is no
possibility of getting at least part of the database up and running whilst we do the
recovery… a database simply can’t run without a SYSTEM tablespace and the data
dictionary it contains. Therefore, we know that this recovery will have to take place with
the database in the MOUNT stage.

The steps to recovery are then simple. First, we restore an uncorrupted version of File 1
from one of our backups. Hopefully, the one we took at 1am on Thursday morning will
suffice. We should have been running dbverify against all our backups, so we would
already know if that backup had included any corruption (see my tip on ‘How can I check
that my backups are “clean”?’). But if we’d neglected to do that at the time of the backup,
we could run it now before proceeding further.

Once we know that our backup is clean, we restore it. Hopefully, we restore it to exactly
the same location as the original SYSTEM Data File occupied (but if not, we can deal with
that too: see my tip on ‘How do you restore a file to a new location?’). The restore is
performed using basic operating system commands, or whatever procedures your tape
backup software permits. However you do it, all that’s really happening is that an old
version of the SYSTEM Data File is being copied back onto disk.
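
For the case where the original location is no longer usable, the usual approach is to copy the backup somewhere else and then tell the Control File about the move. The statement below is only a sketch of that idea: the target path on E: is a made-up location for illustration, and the command would be issued in the MOUNT state before recovery starts.

```sql
-- Run in the MOUNT state, after copying the backup to the new location.
-- The second path is a hypothetical new home for the file.
ALTER DATABASE RENAME FILE
  'D:\ORACLE\ORADATA\HJR9\SYSTEM01.DBF'
  TO 'E:\ORADATA\HJR9\SYSTEM01.DBF';
```

The rename only updates the Control File's record of where the Data File lives; it does not move or copy anything on disk.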

Once the restore is finished, you can try to open the database. If you issue the normal
STARTUP command from within Server Manager or SQL*Plus, the thing will fall over in the
MOUNT state, with the error message:

DATABASE MOUNTED.
ORA-01113: FILE 1 NEEDS MEDIA RECOVERY
ORA-01110: DATA FILE 1: 'D:\ORACLE\ORADATA\HJR9\SYSTEM01.DBF'

Alternatively, you could control the startup process so that the database gracefully parks
itself in the MOUNT state with no error messages: you’d simply issue the command
STARTUP MOUNT to do that.
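
Once in the MOUNT state, you don't have to wait for an ORA-01113 to learn which files need attention. A minimal sketch, assuming you are connected with SYSDBA-level privileges, is to ask the V$RECOVER_FILE view directly:

```sql
STARTUP MOUNT

-- One row per Data File that needs media recovery.
-- CHANGE# is the SCN from which redo must be applied.
SELECT file#, error, change#, time
FROM   v$recover_file;
```

In this scenario I would expect a single row for file 1; no rows at all would mean Oracle sees nothing that needs media recovery.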

Now we can begin the actual recovery phase: the application of redo to the restored file to
bring it up to date. We issue the following command: RECOVER DATAFILE 1.

Oracle then determines the SCN of the restored file, works out which Archived Redo Log
contains that SCN, and prompts for the application of that log, like this:



SQL> RECOVER DATAFILE 1
ORA-00279: CHANGE 76832 GENERATED AT 10/24/2001 09:00:40 NEEDED FOR THREAD 1
ORA-00289: SUGGESTION : D:\ORACLE\ORADATA\HJR9\ARCHIVE\ARCH_141.RDO
ORA-00280: CHANGE 76832 FOR THREAD 1 IS IN SEQUENCE #141

SPECIFY LOG: {<RET>=SUGGESTED | FILENAME | AUTO | CANCEL}

There are four options available to you at this point, as listed on that last line of the
prompt, “Specify Log”.

You can hit the [Enter] key to accept the suggestion as to what archive to apply. You’d do
that if all your archives were still sitting on disk in the place where ARCH originally put
them. If you’ve moved your archives to another part of the disk, though, this option is not
appropriate.

Instead, you could type in the actual path and filename of where your logs are
currently located. For example, an entry of E:\STAGE1\ARCH_141.RDO would suffice if
you’ve moved your archives there.

If all your logs are in the correct location, then pressing [Enter] will work, but what if
there are dozens of archives to apply? Do you really want to be sitting there pressing
[Enter] every few minutes for who knows how long? Instead, you can type in the word AUTO,
and by doing so you are asserting that all Oracle’s suggestions about what logs to apply,
and where to find them, are appropriate and correct.

Finally, you can decide that all this recovery work is far too hard, and that you need a cup
of coffee before proceeding. If you type in the one word CANCEL at this point, the recovery
process stops at the point it’s got to (in this case, before it’s even started!), ready to
resume from the same point the next time you type in the RECOVER DATAFILE 1 command.
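
The same choices can also be made up front rather than at the prompt. The lines below sketch three SQL*Plus alternatives (pick one, they are not meant to be run together); the E:\STAGE1 directory is the relocated archive location used in the example above and is an assumption about your own setup.

```sql
-- Alternative 1: apply every suggested log without prompting,
-- for all RECOVER commands in this session.
SET AUTORECOVERY ON
RECOVER DATAFILE 1

-- Alternative 2: ask for automatic application for this one command only.
RECOVER AUTOMATIC DATAFILE 1

-- Alternative 3: the archives have been moved, so point the suggestions
-- at the new directory before recovering.
SET LOGSOURCE "E:\STAGE1"
RECOVER DATAFILE 1
```

As with typing AUTO, the automatic forms assert that Oracle's suggested names and locations are right; if a log is missing, recovery stops and reports which one it could not find.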

Let’s assume that all our archives are available for use in all the right locations, so we can
use the AUTO option. Type that one word at the prompt, and press [Enter], and you’ll get
a display like this:

AUTO
ORA-00278: LOG FILE 'D:\ORACLE\ORADATA\HJR9\ARCHIVE\ARCH_141.RDO' NO LONGER NEEDED FOR THIS RECOVERY
...
LOG APPLIED.
MEDIA RECOVERY COMPLETE.

You’ll see in this example how logs 142 to 146 are sequentially applied, thus effectively
rolling our restored SYSTEM Data File forward in time. At the end of the process, Oracle
declares that ‘media recovery is complete’, meaning that the SYSTEM Data File has been
rolled as far forward in time as it is possible to go. In other words, it is now in
synchronisation with the rest of the database.

Remember that all of this has taken place with the database in the MOUNT stage, so the
only thing left to do is to get the recovered database fully open. You do that with the
usual ALTER DATABASE OPEN command, and you’ll then discover that your database has been
fully fixed.
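
If you'd like some reassurance before issuing ALTER DATABASE OPEN, one possible check (a sketch, not a required step) is to read the file headers while still in the MOUNT state:

```sql
-- RECOVER shows whether a file still needs media recovery;
-- CHECKPOINT_CHANGE# is the SCN recorded in each file header.
SELECT file#, status, recover, fuzzy, checkpoint_change#
FROM   v$datafile_header
ORDER  BY file#;
```

After a successful RECOVER DATAFILE 1, I would expect file 1 no longer to be flagged as needing recovery; if it still is, the open will fail with the same ORA-01113 as before.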

A non-SYSTEM tablespace is stuffed
Later on Thursday, the database crashes again, and the Alert Log reports nothing
particularly mysterious. So you attempt to open the database by issuing the usual STARTUP
command, at which point the startup process bombs out with the following error message:

DATABASE MOUNTED.
ORA-01157: CANNOT IDENTIFY/LOCK DATA FILE 5 - SEE DBWR TRACE FILE
ORA-01110: DATA FILE 5: 'D:\ORACLE\ORADATA\HJR9\USERS01.DBF'



Now we notice from this that it is NOT the SYSTEM tablespace that is stuffed. Neither is it
the rollback segment tablespace, RBS, which is also required for a database to operate
normally. Therefore, we have the possibility of getting what’s left of the database open,
around the problem, and fixing up this particular tablespace whilst Users go about their
business elsewhere in the database.

This is a good technique, because it minimises the impact of a failure in one part of the
database. Of course, if Users want anything out of the bit you are actually fixing, they’re
going to be disappointed. But all other tablespaces are open for business as usual, which
should keep most Users happy most of the time.

So the recovery approach in this situation is basically to persuade Oracle that what it
thinks is a problem with tablespace USERS is not really a problem at all, and you do that by
offlining it, leaving what’s left available to be opened:

SQL> ALTER DATABASE DATAFILE 5 OFFLINE;

DATABASE ALTERED.

SQL> ALTER DATABASE OPEN;

DATABASE ALTERED.

Note that you have to offline the Data File, not the tablespace. That’s because you’re in
the MOUNT state, and you can only talk tablespace SQL language when the database is
fully open. In practice, it amounts to much the same thing, though: those bits of the
database which could be fully functional are now indeed open and usable.
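
To see exactly which files are in play and which have been set aside, a simple query against V$DATAFILE will do. This is just a sketch; the view is readable in both the MOUNT and OPEN states.

```sql
-- STATUS is SYSTEM or ONLINE for usable files;
-- an offlined file typically shows OFFLINE or RECOVER.
SELECT file#, status, name
FROM   v$datafile
ORDER  BY file#;
```

Here file 5 should be the only one not reported as SYSTEM or ONLINE.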

Now we can recover the troublesome tablespace. The procedures are much as before:
restore the Data File concerned, and apply redo to it:

C:\> COPY E:\BACKUPS\USERS01.DBF D:\ORACLE\ORADATA\HJR9\USERS01.DBF
SQL> RECOVER DATAFILE 5;

SPECIFY LOG: {<RET>=SUGGESTED | FILENAME | AUTO | CANCEL}

…at which point, we are back in the business of either agreeing with the suggestions
Oracle makes for where to find the required archives, supplying our own replacements, or
selecting to apply all suggestions automatically:

AUTO
...
LOG APPLIED.
MEDIA RECOVERY COMPLETE.

…I went for the “auto” option, and once again we get the ‘media recovery is complete’
confirmation at the end.

All that we need do now is bring the tablespace involved back online:

SQL> ALTER TABLESPACE USERS ONLINE;
TABLESPACE ALTERED.

…at which point, the thing is fully available for normal use once again.



Variations on a Theme
You’ll notice that in the last scenario, the loss of the non-SYSTEM tablespace caused the
database to crash… so we had to get to the MOUNT state before starting the recovery
process, though the actual recovery phase took place after the rest of the database had
been opened.

It could be, however, that the loss of the tablespace occurred without crashing the entire
database.

You’re then in the position of a database which is already fully open, but with part of it
unavailable because of Data File loss or corruption. In such a scenario, the basic recovery
principles remain the same as before, but with a slight language variation: because you are
already in the fully open state, you can talk “tablespace” commands instead of “datafile”
ones.

So when a User suddenly reports that they are getting an error message like this one:

SQL> SELECT * FROM EMP;
SELECT * FROM EMP
*
ERROR AT LINE 1:
ORA-00376: FILE 5 CANNOT BE READ AT THIS TIME
ORA-01110: DATA FILE 5: 'D:\ORACLE\ORADATA\HJR9\USERS01.DBF'

…then you can proceed to do an “open database recovery”.

Now all recoveries require you to start by restoring copies of Data Files from backups. But
you can’t copy a Data File from a backup on top of one that Oracle is still maintaining its
grip on. So you have to release that grip first: and you do that by taking the tablespace
offline:

ALTER TABLESPACE USERS OFFLINE IMMEDIATE;

Note the “immediate” keyword there. If you missed it out, you’d get an error message,
because a ‘normal’ offline causes a checkpoint to be issued against the tablespace, and
the whole problem here is that there is clearly something wrong with the Data Files such
that a checkpoint would fail. “Immediate” means, basically, checkpoint what you can,
and don’t bother with what you can’t.

Note also the use of the ‘alter tablespace’ language. Here is the difference between this
scenario and the last one: last time, we were stuck in the mount state, and had to do an
ALTER DATABASE DATAFILE command to offline the offending Data Files. This time, we’re
already fully open, and can thus use tablespace language.



Incidentally, this step might not actually be required at all, because quite often Oracle is
smart enough to notice the problem with the tablespace and take it offline automatically.
Use SELECT FILE#, STATUS FROM V$DATAFILE_HEADER; to find out whether you need to
manually take the files offline (the report will show a status of ‘online’) or whether
Oracle’s done the deed for you already (the report will show a status of ‘offline’).

Now that the Data Files are offline, whether manually or automatically, you can restore copies
of them from a backup. As before, that means use whatever Operating System commands
or tape backup software procedures are appropriate.

Now comes the recovery phase, which proceeds pretty much exactly the same as before:

RECOVER TABLESPACE USERS;

…which brings up the usual prompts for the application of appropriate archives, followed
by the ‘media recovery complete’ message.

Once you see that message, the final thing to do is to bring the affected tablespace back
online:

SQL> ALTER TABLESPACE USERS ONLINE;

TABLESPACE ALTERED.

…and a ‘select * from emp’ now produces a functional report.
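
Pulled together, the open database recovery might look something like the following SQL*Plus session. Treat it as a sketch: the backup and Data File paths are the ones from the earlier COPY example and are assumptions about where your own files live, and HOST simply hands the rest of the line to the operating system.

```sql
-- 1. Release Oracle's grip on the damaged file(s).
ALTER TABLESPACE users OFFLINE IMMEDIATE;

-- 2. Restore the Data File from the backup (paths are assumptions).
HOST COPY E:\BACKUPS\USERS01.DBF D:\ORACLE\ORADATA\HJR9\USERS01.DBF

-- 3. Apply redo; answer the SPECIFY LOG prompt (for example with AUTO).
RECOVER TABLESPACE users

-- 4. Make the tablespace available to Users again.
ALTER TABLESPACE users ONLINE;
```

Throughout all four steps the rest of the database stays open, which is the whole attraction of this approach.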

Summary
What I’ve described here are the two basic recovery scenarios, plus one twist.

You either perform recovery in the MOUNT state, or in the OPEN state. The twist comes
with doing OPEN recoveries.

Either the database is already open (in which case you just ‘offline immediate’ the
tablespace causing the problem and then perform recovery), or you have to get it open (in
which case you first offline the Data File(s) causing the problem, followed by an ALTER
DATABASE OPEN, and then perform recovery).

In all cases, you restore troublesome file(s) from a backup, and then issue an appropriate
recovery command to apply redo to the restored file to bring it up to date. If it’s just one
or two files that are stuffed, the RECOVER DATAFILE command is appropriate. If you’re in
the OPEN state, you can use the RECOVER TABLESPACE command, although RECOVER DATAFILE
is still available for use.

In all cases, the files are brought completely up to date, and no committed data is lost.

