Subject: TECH: Internals of Recovery (Part 2)
Type: REFERENCE
Creation Date: 13-SEP-1996

4.4 "Crashed" Hot Backup

A normal shutdown of the instance that started a backup, or the last remaining instance, is not allowed while any files are in hot backup. Nor may a file in backup be taken offline normal or temporary. This is to ensure an end-backup "marker" is generated whenever possible, and to make administrators aware that they forgot to issue the END BACKUP command, and that the backup copy is unusable.

When an instance failure or shutdown abort leaves a hot backup operation incomplete (i.e. lacking termination via END BACKUP), any file that was in backup before the failure has its hotbackup-fuzzy bit set and its checkpoint "frozen" at the begin-backup checkpoint. Even though the online file's datablocks are actually current to the database checkpoint, the file's header makes it look like a restored backup that needs media recovery and is current only to the begin-backup checkpoint. Crash recovery will fail - claiming media recovery is required - if it encounters an online file in "crashed" hot backup state. The file does not actually need media recovery, however, but only an adjustment to its file header to take it out of "crashed" hot backup state.

Media recovery could be used to recover and allow normal open of a database that has files left in "crashed" hot backup state. For v7.2 however, a preferable option - because it requires no archived logs - is to use the (new in v7.2) command ALTER DATABASE DATAFILE... END BACKUP on the files left in "crashed" hot backup state (identifiable using the V$BACKUP fixed-view: see 9.6). Following execution of this command, crash recovery will suffice to open the database.

Note that the ALTER TABLESPACE ... END BACKUP format of the command cannot be used when the database is not open.
This is because the database must be open in order to translate (via the data dictionary) tablespace names into their constituent datafile names.

5 Instance Recovery

Instance recovery is used to recover from both crash failures and Parallel Server instance failures. Instance recovery refers either to crash recovery or to Parallel Server instance recovery (where a surviving instance recovers when one or more other instances fail).

The goal of instance recovery is to restore the datablock changes that were in the cache of the dead instance and to close the thread that was left open. Instance recovery uses only online redo logfiles and current online datafiles (not restored backups). It recovers one thread at a time, starting at the most recent thread checkpoint and continuing until end-of-thread.

5.1 Detection of the Need for Instance Recovery

The kernel performs instance recovery automatically upon detecting that an instance died leaving its thread-open flag set in the controlfile. Instance recovery is performed automatically on two occasions:

1. at the first database open after a crash (crash recovery);
2. when some but not all instances of a Parallel Server fail.

In the case of Parallel Server, a surviving instance detects the need to perform instance recovery for one or more failed instances by the following means:

1. A foreground process in a surviving instance detects an "invalid block lock" condition when it attempts to bring a datablock into the buffer cache. This is an indication that another instance died while a block covered by that lock was in a potentially "dirty" state in its buffer cache.
2. The foreground process sends a notification to its instance's SMON process, which begins a search for dead instances.
3. The death of another instance is detected if the current instance is able to acquire that instance's thread-opened locks (see 3.9).

SMON in the surviving instance obtains a stable list of dead instances, together with a list of "invalid" block locks.

Note: After instance recovery is complete, locks in this list will undergo "lock cleanup" (i.e. they will have their "invalid" condition cleared, making the underlying blocks accessible again).

5.2 Thread-at-a-Time Redo Application

Instance recovery operates by processing one thread at a time, thereby recovering one instance at a time. It applies all redo (from the thread checkpoint through the end-of-thread) from each thread before starting on the next thread. This algorithm depends on the fact that only one instance at a time can have a given block modified in its cache. Between changes to the block by different instances, the block is written to disk. Thus, a given block (as read from disk during instance recovery) can need redo applied from at most one thread - the thread containing the most recent modification.

Instance recovery can always be accomplished using the online redo logs for the thread being recovered. Crash recovery operates on the thread with the lowest checkpoint SCN first. It proceeds to recover the threads in the order of increasing thread checkpoint SCNs. This ensures that the database checkpoint is advanced by each thread recovered.

5.3 Current Online Datafiles Only

The checkpoint counters are used to ensure that the datafiles are the current online files rather than restored backups. If a backup copy of a datafile is restored, then media recovery is required.

Media recovery is required for a restored backup even if recovery can be accomplished using the online logs. The reason is that crash recovery applies all post-thread-checkpoint redo from each thread before starting on the next thread. Crash recovery can use this thread-at-a-time redo application algorithm because a given datablock can need redo application from at most one thread.
However, starting recovery from a restored backup enables no such assumption about the number of threads that have relevant redo. Thus, the thread-at-a-time algorithm would not work. Recovering a backup requires thread-merged redo application: i.e. application of all post-file-checkpoint redo, simultaneously merging redo from all threads in SCN order. This thread-merged redo application algorithm is the one used by media recovery (see Section 6).

Crash recovery would not suffice - even with thread-merged redo application - to recover a backup datafile, even if it were checkpointed at the current database checkpoint. The reason is that in all but the database checkpoint thread, crash recovery would miss applying redo between the database checkpoint and the (higher) thread checkpoint. By contrast, media recovery would start redo application at the file checkpoint in all threads. Furthermore, crash recovery might fail even if it started redo application at the file checkpoint in all threads. The reason is that crash recovery assumes that it will need only online logfiles. All but the database checkpoint thread might have already archived and re-used a needed log.

If the STARTUP RECOVER command is used (in place of simple STARTUP), and crash recovery fails due to datafiles needing media recovery (e.g. they are restored backups), then media recovery via RECOVER DATABASE (see 6.4.1) is automatically executed prior to database open.

5.4 Checkpoints

Instance recovery does not attempt to apply redo that is before the checkpoint SCN of a datafile. (The datafile header checkpoint SCNs are not used to decide where to start recovery, however.)

The redo from the thread checkpoint through the end-of-thread must be read to find the end-of-thread and the highest SCN allocated by the thread. These are then used to close the thread and advance the thread checkpoint.
The end of an instance recovery almost always advances the datafile checkpoints, and always advances the checkpoint counters.

5.5 Crash Recovery Completion

At the termination of crash recovery, the "fuzzy bits" - online-fuzzy, hotbackup-fuzzy, media-recovery-fuzzy - of all online datafiles are cleared. A special redo record, the end-crash-recovery "marker," is generated. This record is interpreted by media recovery to know when it is permissible to clear the online-fuzzy and hotbackup-fuzzy bits of the datafiles undergoing recovery (see 6.6).

6 Media Recovery

Media recovery is used to recover from a lost or damaged datafile, or from a lost current controlfile. It is used to transform a restored datafile backup into a "current" datafile. It is also used to restore changes that were lost when a datafile went offline without a checkpoint. Media recovery can apply archived logs as well as online logs. Unlike instance or crash recovery, media recovery is invoked only via explicit command.

6.1 When to Do Media Recovery

As was seen in 5.3, a restored datafile backup always needs media recovery, even if its recovery can be accomplished using only online logs. The same is true of a datafile that went offline without a checkpoint. The database cannot be opened if any of the online datafiles needs media recovery. A datafile that needs media recovery cannot be brought online until media recovery has been executed. If the database is open in any instance, media recovery can operate only on offline files.

Media recovery may be explicitly invoked to recover a database prior to open even when crash recovery would have sufficed. If so, crash recovery - though it may find nothing to do - will still be invoked automatically at database open. Note that media recovery may be run - and, in cases such as restored backups or datafiles that went offline immediate, must be run - even if recovery can be accomplished using only the online logs.
Media recovery may find nothing to do - and signal the "no recovery required" error - if invoked for files that do not need recovery.

If the current controlfile is lost and a backup controlfile is restored in its place, media recovery must be done. This is the case even if all of the datafiles are current.

6.2 Thread-Merged Redo Application

Media recovery uses a thread-merged redo application algorithm: i.e. it applies redo from all threads simultaneously, merging redo records in increasing SCN order. The process of media-recovering a backup datafile differs from the process of crash-recovering a current online datafile in the following fundamental way: Crash recovery applies redo from one thread at a time because any block of a current online file can need redo from at most one thread (one instance at a time can dirty a block in cache). With a restored backup, however, no assumption can be made about the number of threads that have redo relevant to a particular block. In general, recovering a backup requires simultaneous application of redo from all threads, with merging of redo records across threads in SCN order. Note that this algorithm depends on a redo-generation-time guarantee that changes for a given block occur in increasing SCN order across threads (case of Parallel Server).

6.3 Restoring Backups

The administrator may copy backup versions of datafiles to the current datafile while the database is shut down or the file is offline. There is a strong assumption that backups are never copied to files that are currently accessible. Every file header read verifies that this has not been done by comparing the checkpoint counter in the file header with the checkpoint counter in the datafile's controlfile record.

6.4 Media Recovery Commands

There are three media recovery commands:

- RECOVER DATABASE
- RECOVER TABLESPACE
- RECOVER DATAFILE

The only essential difference in these commands is in how the set of files to recover is determined.
They all use the same criteria for determining if the files can be recovered. There is a lock per datafile that is held exclusive by a process doing media recovery on a file, and is held shared by an instance that has the database open with the file online. Media recovery signals an error if it cannot get the lock for a file it is asked to recover. This prevents two recovery sessions from recovering the same file, and prevents media recovery of a file that is in use.

6.4.1 RECOVER DATABASE

This command does media recovery on all online datafiles that need any redo applied. If all instances were cleanly shut down, and no backups were restored, this command will signal the "no recovery required" error. It will also fail if any instances have the database open, since they will have the datafile locks.

6.4.2 RECOVER TABLESPACE

This command does media recovery on all datafiles in the tablespaces specified. In order to translate (i.e. via the data dictionary) the tablespace names into datafile names, the database must be open. This means that the tablespaces and their constituent datafiles must be offline in order to do the recovery. An error is signalled if none of the tablespace's constituent files needs recovery.

6.4.3 RECOVER DATAFILE

This command specifies the datafiles to be recovered. The database may be open; or it may be closed, as long as the media recovery locks can be acquired. If the database is open in any instance, then datafile recovery can only recover offline files.

6.5 Starting Media Recovery

Media recovery starts by finding the media-recovery-start SCN: i.e. the lowest SCN of the datafile header checkpoints of the files being recovered.

Note: An exception occurs if a file's checkpoint is in its offline range (see 2.18). In that case, the file's offline-end checkpoint is used in place of its datafile header checkpoint in computing the media-recovery-start SCN.

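
The rule for choosing the media-recovery-start SCN can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the DatafileInfo record, its field names, and the inclusive treatment of the offline range boundaries are all hypothetical and are not taken from the kernel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DatafileInfo:
    """Hypothetical, simplified view of one datafile being recovered."""
    name: str
    header_checkpoint_scn: int
    # (offline_begin_scn, offline_end_scn) if the file has an offline range.
    offline_range: Optional[Tuple[int, int]] = None


def effective_checkpoint_scn(f: DatafileInfo) -> int:
    """Return the SCN this file contributes to the start-SCN computation."""
    if f.offline_range is not None:
        begin, end = f.offline_range
        # Assumption: the range boundaries are treated as inclusive.
        if begin <= f.header_checkpoint_scn <= end:
            # Checkpoint lies in the offline range: use the offline-end SCN.
            return end
    return f.header_checkpoint_scn


def media_recovery_start_scn(files: List[DatafileInfo]) -> int:
    """Lowest effective checkpoint SCN over the files being recovered."""
    if not files:
        raise ValueError("no files to recover")
    return min(effective_checkpoint_scn(f) for f in files)
```

For example, with hypothetical files checkpointed at SCNs 500, 300 and 450, where the second file has an offline range of (250, 400), the second file contributes 400 rather than 300, so the start SCN is 400.
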
A buffer for reading redo is allocated for each thread in the enabled thread bitvec of the media-recovery-start checkpoint (i.e. the datafile checkpoint with the lowest SCN).

The initial file header checkpoint SCN of every file is saved to ensure that no redo from a previous use of the file number is applied, as well as to eliminate needlessly attempting to apply redo to a file from before its checkpoint. The stop SCNs (from the datafiles' controlfile records) are also saved. If finite, the highest stop SCN can be used to allow recovery to terminate without needlessly searching for redo beyond that SCN to apply (see 6.10). At recovery completion, any datafile initially found to have a finite stop SCN will be left checkpointed at that stop SCN (rather than at the recovery end-point). This allows an offline-clean or read-only datafile to be left checkpointed at an SCN that matches the tablespace-clean-stop-SCN of its tablespace.

6.6 Applying Redo, Media Recovery Checkpoints

A log is opened for each thread of redo that was enabled at the time the media-recovery-start SCN was allocated (i.e. for each thread in the enabled thread bitvec of the media-recovery-start checkpoint). If the log is online, then it is automatically opened. If the log was archived, then the user is prompted to enter the name of the log (unless automatic recovery is being used). The redo is applied from all the threads in the order it was generated, switching threads as needed. The order of application of redo records without an SCN is not precise, but it is good enough for rollback to make the database consistent.

Except in the case of cancel-based incomplete recovery (see 6.12.1) and backup controlfile recovery (see 6.13), the next online log in sequence is accessed automatically, if it is on disk. If not, the user is prompted for the next log.

At log boundaries, media recovery executes a "checkpoint."
As part of a media recovery checkpoint, the dirty recovery buffers are written to disk and the datafile header checkpoints of the files undergoing recovery are advanced, so that the redo does not need to be reapplied. Another type of media recovery "checkpoint" occurs when a datafile initially found to have a finite stop SCN reaches that stop SCN. At such a stop SCN boundary, all dirty recovery buffers are written to disk, and the datafiles that have been made current have their datafile header checkpoints advanced to their stop SCN values.

6.7 Media Recovery and Fuzzy Bits

6.7.1 Media-Recovery-Fuzzy

The media-recovery-fuzzy bit is a flag in the datafile header that is used to indicate that - due to ongoing redo application by media recovery - the file may contain changes in the future of (at SCNs beyond) the current header checkpoint SCN. The media-recovery-fuzzy bit is set at the start of media recovery for each file undergoing recovery. Generally the media-recovery-fuzzy bits can be cleared when a media recovery checkpoint advances the checkpoints in the datafile headers. They are left clear when a media recovery session completes successfully or is cancelled. As will be seen in 8.1, open with resetlogs following incomplete media recovery will fail if any online datafile has the media-recovery-fuzzy bit (or any fuzzy bit) set.

6.7.2 Online-Fuzzy

Upon encountering an end-crash-recovery "marker" (or a file-specific offline-immediate "marker": generated when a datafile goes offline without a checkpoint), media recovery can (at the next media recovery checkpoint) clear (if set) the online-fuzzy and hotbackup-fuzzy bits in the appropriate datafile header(s).

6.7.3 Hotbackup-Fuzzy

Upon encountering an end-backup "marker" (or an end-crash-recovery "marker"), media recovery can (at the next media recovery checkpoint) clear the hotbackup-fuzzy bit.

Open with resetlogs following incomplete media recovery will fail if any online datafile has the hotbackup-fuzzy bit (or any fuzzy bit) set. This prevents a successful RESETLOGS open following an incomplete recovery that terminated before all redo generated between BEGIN BACKUP and END BACKUP had been applied. Ending incomplete recovery at such a point would generally result in an inconsistent file, since the backup copy may already have contained changes between this endpoint and the END BACKUP.

6.8 Thread Enables

A special thread-enable redo record is written in the thread of an instance enabling a new thread. If media recovery encounters a thread-enable redo record, it allocates a new redo buffer, opens the appropriate log in the new thread, and prepares to start applying redo from the new thread.

6.9 Thread Disables

When a thread is disabled, its current log is marked as the end of a disabled thread. After media recovery finishes applying redo from such a log, it deallocates the thread's redo buffer and stops looking for redo from the thread.

6.10 Ending Media Recovery (Case of Complete Media Recovery)

The current (i.e. last) log in every enabled thread has the end-of-thread flag set in its header. Complete (as opposed to incomplete: see 6.12) media recovery always continues redo application through the end-of-thread in all threads. The end-of-thread log can be identified without having the current controlfile, since the end-of-thread flag is in the log header rather than in the logfile's controlfile record.

Note: Backing up and later restoring copies of current online logs is dangerous, and can lead to mis-identification of the current true end-of-thread. This is because the end-of-thread flag in the backup copy will in general be out-of-date with respect to the current end-of-thread log.

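
The thread-merged redo application described in 6.2 and 6.6, carried through to the end of every thread as complete recovery requires, can be pictured with a small Python sketch. It is a toy model under stated assumptions: the (scn, thread, change) tuples and the apply_merged function are hypothetical, each thread's redo is assumed to be already in increasing SCN order, and real redo records, log switches and thread enables/disables are not modelled.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical redo record: (scn, thread number, description of the change).
RedoRecord = Tuple[int, int, str]


def apply_merged(threads: Dict[int, List[RedoRecord]],
                 start_scn: int) -> List[RedoRecord]:
    """Return the records in the order a thread-merged pass would apply them.

    Redo from all threads is merged in increasing SCN order, and redo from
    before start_scn (the recovery start point) is skipped. Every thread is
    consumed to its end, mirroring complete recovery through end-of-thread.
    """
    applied: List[RedoRecord] = []
    # heapq.merge lazily merges already-sorted inputs; key selects the SCN.
    for record in heapq.merge(*threads.values(), key=lambda r: r[0]):
        scn = record[0]
        if scn < start_scn:
            continue  # redo from before the start point is not applied
        applied.append(record)
    return applied
```

With hypothetical redo of SCNs 100 and 130 in thread 1 and SCNs 90 and 120 in thread 2, and a start SCN of 100, the merged order is 100 (thread 1), 120 (thread 2), 130 (thread 1); the record at SCN 90 is skipped.
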
If the datafiles being recovered have finite stop SCNs in their controlfile records (assuming a current controlfile), then media recovery can stop prior to the end-of-threads. Redo application for a datafile with a finite stop SCN can terminate at that SCN, since it is guaranteed that no redo for that datafile beyond that SCN was generated. As described in 2.15, the stop SCN is set when a datafile goes offline.

Note that without the optimization that allows recovery of a file with a finite stop SCN to terminate at that SCN, it could not be guaranteed that recovery of an offline datafile while the database is open would terminate.

6.11 Automatic Recovery

Automatic recovery is invoked by using the AUTOMATIC option of the media recovery command. It saves the user the trouble of entering the names of archived logfiles, provided they are on disk. If the sequence number of the log can be determined, then a name can be constructed by concatenating the current values of the initialization parameters LOG_ARCHIVE_DEST and LOG_ARCHIVE_FORMAT. The current LOG_ARCHIVE_DEST is assumed, unless the user overrides it by specifying a different archiving destination for the recovery session.

The media-recovery-start checkpoint (see 6.5) contains (in the RBA field) the initial log sequence number for one thread (i.e. the thread that generated the checkpoint). If multiple threads of redo are enabled, the log history section of the controlfile (if configured) can be used to map the media-recovery-start SCN to a log sequence number for each thread. Once the initial recovery log is found for a thread, all subsequent logs needed from the thread follow in order. If it is not possible to determine the initial log sequence number, the user will have to guess and try logs until the right one is accepted. The timestamp from the media-recovery-start checkpoint is reported to aid in this effort.

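
The name construction used by automatic recovery can be sketched in a few lines of Python. This is a sketch under assumptions, not the kernel's code: the archived_log_name function is hypothetical, only the %s (log sequence number) and %t (thread number) placeholders of LOG_ARCHIVE_FORMAT are handled, and the destination and format values in the usage note are made up for illustration.

```python
def archived_log_name(log_archive_dest: str,
                      log_archive_format: str,
                      thread: int,
                      sequence: int) -> str:
    """Build a suggested archived log name for one thread and sequence.

    The format string is expanded and then concatenated onto the
    destination, as the text describes. Only %s and %t are expanded;
    any other placeholder is left untouched (an assumption of this sketch).
    """
    expanded = (log_archive_format
                .replace("%s", str(sequence))
                .replace("%t", str(thread)))
    return log_archive_dest + expanded
```

For example, with a hypothetical destination of "/arch/arch_" and format "%t_%s.arc", the log for thread 2, sequence 1045 would be suggested as "/arch/arch_2_1045.arc". Subsequent logs for the same thread follow by incrementing the sequence number.
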
6.12 Incomplete Recovery

A RECOVER DATABASE execution can be stopped and the database opened before all the redo has been applied. This type of recovery is termed incomplete recovery. The subsequent database open is termed a RESETLOGS open.

Incomplete recovery effectively sets the entire database backwards in time to a transaction-consistent state at or near the recovery end-point. All subsequent updates to the database are lost and must be re-entered.

Use of incomplete recovery is indicated in the following circumstances:

- Media recovery is necessary (e.g. due to datafile damage or loss), but cannot be complete (i.e. all redo cannot be applied) because all copies of a needed online or archived redo log were lost.
- All copies of an active (i.e. needed for instance recovery) log were damaged or lost while the database was open. Since crash recovery is precluded, this case reduces to the previous case.
- It is necessary to reverse the effect of an erroneous user action (e.g. table drop or batch run); and it is acceptable to set the entire database - not just the affected schema objects - backwards to a point-in-time before the error.

6.12.1 Incomplete Recovery UNTIL Options

There are three types of incomplete recovery. They differ in the means used to stop the recovery:

- Cancel-Based (RECOVER DATABASE UNTIL CANCEL)
- Change-Based (RECOVER DATABASE UNTIL CHANGE)
- Time-Based (RECOVER DATABASE UNTIL TIME)

The UNTIL CANCEL option terminates recovery when the user enters "cancel" rather than the name of a log. Online logs are not automatically applied in this mode in case cancellation at the next log is desired. If multiple threads of redo are being recovered, there may be logs in other threads that are partially applied when the recovery is cancelled.

The UNTIL CHANGE option terminates redo application just before any redo associated with the specified SCN or higher. Thus the transaction that committed at that SCN will be rolled back.
If you want to recover through a transaction that committed at a specific SCN, then add one to the specified SCN.

The UNTIL TIME option works similarly to the UNTIL CHANGE option, except that a time rather than an SCN is specified. Recovery uses the timestamps in the redo block headers to convert the specified time into an SCN. Then recovery is stopped when that SCN is reached.

6.12.2 Incomplete Recovery and Consistency

In order to avoid database corruption when running incomplete recovery, all datafiles must be recovered to the exact same point. Furthermore, no datafile may have any changes in the future of this point. This requires that incomplete media recovery must start from datafiles restored from backups whose copies completed prior to the intended stop time. The system uses file header fuzzy bits (see 8.1) to ensure that the datafiles contain no changes in the future of the stop time.

6.12.3 Incomplete Recovery and Datafiles Known to the Controlfile

If recovering to a time before a datafile was dropped, the dropped file must appear in the controlfile used for recovery. Otherwise it would not be recovered. One alternative for achieving this is to recover using a backup controlfile made before the datafile was dropped. Another alternative is to use the CREATE CONTROLFILE command to construct a controlfile that lists the dropped datafile.

Recovering to a time before a file was added is not a problem. The extra datafile will be eliminated from the controlfile after the database is open. The unwanted file may be taken offline before the recovery to avoid accessing it.

6.12.4 Resetlogs Open after Incomplete Recovery

The next database open after an incomplete recovery must specify the RESETLOGS option. Amongst other effects (see Section 7), resetlogs throws away the redo that was not applied during the incomplete recovery, and marks the database so that the skipped redo can never be accidentally applied by a subsequent recovery.

If the incomplete recovery was a mistake (e.g. the lost log was found), the next open can specify the NORESETLOGS option. However, for the open with NORESETLOGS to succeed, it must be preceded by a successful execution of complete recovery (i.e. one in which all redo is applied).

6.12.5 Files Offline during Incomplete Recovery

If a file is offline during incomplete recovery, it will not be recovered. This is acceptable if the file is part of a tablespace that was taken offline normal, and that is still offline normal at the recovery end-point. Otherwise, if the file is still offline when the resetlogs is done, the tablespace containing the file will have to be dropped. This is because it will need media recovery with logs from before the resetlogs. In general V$DATAFILE should be checked to ensure that files are online before running an incomplete recovery. Only files that will be dropped and files that are part of offline normal (or read-only) tablespaces should be offline (see Section 8.6).

6.13 Backup Controlfile Recovery

If recovery is done with a controlfile other than the current one, then backup controlfile recovery (RECOVER DATABASE...USING BACKUP CONTROLFILE) must be used. This applies both to the case of a restored controlfile backup, and to the case of a "backup" controlfile created via CREATE CONTROLFILE...RESETLOGS.

Use of CREATE CONTROLFILE...RESETLOGS makes a controlfile that is a "backup." Only a backup controlfile recovery can be run after executing CREATE CONTROLFILE...RESETLOGS. Only a RESETLOGS open can be used after executing CREATE CONTROLFILE...RESETLOGS. Use of CREATE CONTROLFILE...RESETLOGS is indicated if (all copies of) an online redo log were lost in addition to (all copies of) the control file.

By contrast, CREATE CONTROLFILE...NORESETLOGS makes a controlfile that is "current"; i.e. it has knowledge of the current state of the online logfiles and log sequence numbers.
A backup controlfile recovery is not necessary following CREATE CONTROLFILE...NORESETLOGS. Indeed, no recovery at all is required if there was a clean shutdown, and if no datafile backups have been restored. A normal or NORESETLOGS open may follow CREATE CONTROLFILE...NORESETLOGS.

A backup controlfile lacks valid information about the current online logs and datafile stop SCNs. Hence, recovery cannot look for online logs to automatically apply. Moreover, recovery must assume infinite stop SCNs. A RESETLOGS open corrects this information. The backup controlfile may have a different set of threads enabled than did the original controlfile. That set will be the effective enabled thread set following RESETLOGS open.

The BACKUP CONTROLFILE option may be used either alone or in conjunction with an incomplete recovery option. Unless an incomplete recovery option is included, all threads must be applied to the end-of-thread. This is validated at open resetlogs time.

It is currently required that a RESETLOGS open follow execution of backup controlfile recovery, even if no incomplete recovery option was used. The following procedure could be used to avoid a backup controlfile recovery and resetlogs in case the only problem is a lost current controlfile (and a backup controlfile exists):

1. Copy the backup controlfile to the current control file and do a STARTUP MOUNT.
2. Issue ALTER DATABASE BACKUP CONTROLFILE TO TRACE NORESETLOGS.
3. Issue the CREATE CONTROLFILE...NORESETLOGS command from the SQL script output by Step 2.

It is important to ensure that the CREATE CONTROLFILE command issued in Step 3 creates a controlfile reflecting a database structure equivalent to that of the lost current controlfile. For example, if a datafile was added since the backup controlfile was saved, then the CREATE CONTROLFILE command should be modified to declare the added datafile.

Failure to specify the BACKUP CONTROLFILE option on the RECOVER DATABASE command when the controlfile is indeed a backup can frequently be detected. One indication of a restored backup controlfile would be a datafile header checkpoint count that is greater than the checkpoint count in the datafile's controlfile record. However, this test may not catch the backup controlfile if the datafiles are also backups. Another test validates the online logfile headers against their corresponding controlfile records, but this too may not always catch an old controlfile.

6.14 CREATE DATAFILE: Recover a Datafile Without a Backup

If a datafile is lost or damaged and no backup of the file is available, it can be recovered using only information in the redo logs and control file. The following conditions must be met:

1. All redo logs written since the datafile was originally created must be available.
2. A control file in which the datafile is declared (i.e. name and size information) must be available or re-creatable.

The CREATE DATAFILE clause of the ALTER DATABASE command is first used to create a new, empty replacement for the lost datafile. RECOVER DATAFILE is then used to apply all redo generated for the file from the time of its original creation until the time it was lost. After all redo logs written since the datafile was originally created have been applied, the file will have been restored to its state at the time it was lost. This mechanism is useful for recovering a recently-created datafile for which no backup has yet been taken. The original datafiles of the SYSTEM tablespace cannot be recovered by this means, however, since relevant redo data is not saved at database creation time.

6.15 Point-in-Time Recovery Using Export/Import

Occasionally, it may become necessary to reverse the effect of an erroneous user action (e.g. table drop or batch run).
One approach would be to perform an incomplete media recovery to a point-in-time before the corruption, then open the database with the RESETLOGS option. Using this approach, the entire database - not just the affected schema objects - would be set backwards in time.

This approach has an undesirable side-effect: it discards committed transactions. Any updates that occurred subsequent to the resetlogs SCN are lost and must be re-entered. Resetlogs has another undesirable side-effect: it renders all pre-existing backups unusable for future recovery.

Setting a mission-critical database globally back in time is often not an acceptable solution. The following procedure is an alternative whose effect on the mission-critical database is to set just the affected schema objects - termed the recovery-objects - backwards in time.

Point-in-time incomplete media recovery is run against a side-copy of the production database, called the recovery-database. The initial version of the recovery-database is created using backups of the production database that were taken before the corruption occurred. Non-relevant objects in the recovery-database can be taken offline in order to avoid unnecessarily recovering them. However, the SYSTEM tablespace and all tablespaces containing rollback segments must participate in the media recovery in order to allow a clean open. (Note that this is a good reason to place rollback segments and data segments into separate tablespaces.)

After it has undergone point-in-time incomplete media recovery, the recovery-database is opened with the RESETLOGS option. The recovery-database is now set backwards to a point-in-time before the recovery-objects were corrupted. This effectively creates pre-corruption versions of the recovery-objects in the recovery-database. These objects can then be exported from the recovery-database and imported back into the production database.

Prior to importing the recovery-objects, the production database is prepared as follows:

- In the case of recovering an erroneously updated schema object, the copy of the object in the production database is prepared by discarding just the data; e.g. the table is truncated.
- In the case of recovering an erroneously dropped schema object, the object is re-created (empty) in the production database.

The import operation is then executed, using the data-only option as appropriate. Since export/import can be a lengthy process, it may be desirable to postpone it until a time when recovery-object unavailability can be tolerated. In the meantime, the recovery-objects can be made available, albeit at degraded performance, via a database link between the production database and the recovery-database.

An undesirable side-effect of this approach is that transaction consistency across objects is lost. This side-effect can be avoided by widening the recovery-object set to include all objects that must be kept transaction-consistent.

7 Block Recovery

Block recovery is the simplest type of recovery. It is performed automatically by the system during normal operation of the database, and is transparent to the user.

7.1 Block Recovery Initiation and Operation

Block recovery is used to clean up the state of a buffer whose modification by a foreground process (in the middle of invoking a redo application callback to apply a change vector to the buffer) was interrupted by the foreground process dying or signalling an error. Recovery involves (i) reading the block from disk; (ii) using the current thread's online redo logs to reconstruct the buffer to a state consistent with the redo already generated; and (iii) writing the recovered block back to disk. If block recovery fails, then after a second attempt, the block is marked logically corrupt (by setting the block sequence number to zero) and a corrupt block error is signalled.
Block recovery is guaranteed to be doable using only the current thread's online redo logs, since:

1. Block recovery cannot require redo from another thread or from before the last thread checkpoint.
2. Online logs are not reused until the current thread checkpoint is beyond the log.
3. No buffer currently in the cache can need recovery from before the last thread checkpoint.

7.2 Buffer Header RBA Fields

The buffer header (an in-memory data structure) contains the following fields pertaining to block recovery:

Low-RBA and High-RBA: Delineate the range of redo (from the current thread) that needs to be applied to the disk version of the block in order to make it consistent with redo already generated.

Recovery-RBA: A place marker for recording progress in case the invoker of block recovery is PMON and complete recovery in one invocation would take too long (see next section).

7.3 PMON vs. Foreground Invocation

If an error is signalled while a foreground process is in a redo application callback, then the process itself executes block recovery. If foreground process death is detected during a redo application callback, on the other hand, PMON executes block recovery.

Block recovery may require an unbounded amount of time and I/O. However, PMON cannot be allowed to spend an inordinate amount of time working on the recovery of one block while neglecting other necessary time-critical tasks. Therefore, a limit is placed on the amount of redo applied by one PMON call to block recovery. (A port-specific constant specifies the maximum number of redo log blocks applied per invocation.) As PMON applies redo during invocations of block recovery, it updates the recovery-RBA in the buffer header to record its progress. When a PMON call to block recovery causes the recovery-RBA to reach the high-RBA, then block recovery for that block is complete.
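
The bounded PMON invocation described in 7.3 can be illustrated with a small model. This is only a sketch under simplifying assumptions: RBAs are treated as plain integers counting redo log blocks, the high-RBA is treated as an exclusive end of range, and the names BufferHeader, pmon_block_recovery_call and max_redo_blocks are hypothetical, not kernel identifiers. The actual redo application is left as a comment.

```python
from dataclasses import dataclass


@dataclass
class BufferHeader:
    low_rba: int       # start of the redo range to apply to the disk block
    high_rba: int      # end of the redo range (exclusive in this model)
    recovery_rba: int  # progress marker; starts at low_rba


def pmon_block_recovery_call(buf: BufferHeader, max_redo_blocks: int) -> int:
    """One bounded PMON invocation; returns the number of redo blocks applied."""
    stop = min(buf.recovery_rba + max_redo_blocks, buf.high_rba)
    applied = stop - buf.recovery_rba
    # A real implementation would apply the change vectors found in
    # redo blocks [recovery_rba, stop) to the buffer here.
    buf.recovery_rba = stop  # record progress for the next invocation
    return applied


def block_recovery_complete(buf: BufferHeader) -> bool:
    """Recovery is complete once the recovery-RBA reaches the high-RBA."""
    return buf.recovery_rba >= buf.high_rba


buf = BufferHeader(low_rba=100, high_rba=125, recovery_rba=100)
calls = 0
while not block_recovery_complete(buf):
    pmon_block_recovery_call(buf, max_redo_blocks=10)
    calls += 1  # between calls PMON is free to attend to other tasks
print(calls, buf.recovery_rba)  # prints: 3 125
```

With a limit of 10 redo blocks per call, the 25-block range is covered in three invocations, and the recovery-RBA carries the progress from one invocation to the next.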
8 Resetlogs

The RESETLOGS option is needed on the first database open following:

- Incomplete recovery
- Backup controlfile recovery
- CREATE CONTROLFILE...RESETLOGS.

The primary function of resetlogs is to discard the redo that was not applied during incomplete recovery, ensuring that the skipped redo can never be accidentally applied by a subsequent recovery. To accomplish this, resetlogs effectively invalidates all existing redo in all online and archived redo logfiles. This has the side effect of making any existing datafile backups unusable for future recovery operations.

Resetlogs also reinitializes the controlfile information about online logs and redo threads, clears the contents of any existing online redo log files, creates the online redo log files if they do not currently exist, and resets the log sequence number in all threads to one.

8.1 Fuzzy Files

The most important requirement when doing a RESETLOGS open is that all datafiles be validated as recovered to the same point-in-time. This is what ensures that all the changes in a single redo record are done atomically. It is also important for other consistency reasons. If all threads of redo have been applied through end-of-thread to all online datafiles, then we can be sure that the database is consistent.

If incomplete recovery was done, there is the possibility that a file was not restored from a sufficiently old backup. In the general case, this is detectable if the file has a different checkpoint than the other files (exceptions: offline or read-only files). The other possibility is that the file is fuzzy - i.e. it may contain changes in the future of its checkpoint.
As seen earlier, the following "fuzzy bits" are maintained in the file header to determine if a file is fuzzy:

- online-fuzzy bit (see 3.5, 6.7.2)
- hotbackup-fuzzy bit (see 4, 6.7.3)
- media-recovery-fuzzy bit (see 6.7.1)

Open with resetlogs following incomplete media recovery will fail if any online datafile has any of the three fuzzy bits set. Redo records are created at the end of a hot backup (the end-backup "marker") and after crash recovery (the end-crash-recovery "marker") to enable media recovery to determine when it can clear the fuzzy bits.

Resetlogs signals an error if any of the datafiles has any of the fuzzy bits set. Except in the following special circumstances, resetlogs also signals an error if any of the datafiles is recovered to a checkpoint SCN different from the one at which the other files are checkpointed (i.e. the resetlogs SCN: see 8.2):

1. A file recovered to an SCN earlier than the resetlogs SCN is tolerated if no redo was generated for the file between its checkpoint SCN and the resetlogs SCN. For example, such would be the case if the file were read-only, and its offline range spanned the checkpoint SCN and resetlogs SCN. In this case, resetlogs would allow the file but set it offline.
2. A file checkpointed at an SCN later than the resetlogs SCN is tolerated if its creation SCN (allocated at file creation time and stored in the file header) shows it to have been created after the resetlogs SCN. During the data dictionary vs. controlfile check performed by RESETLOGS open (see 8.7), such a file would be found to be missing from the data dictionary but present in the controlfile. As a consequence, it would be eliminated from the controlfile.

8.2 Resetlogs SCN and Counter

A resetlogs SCN and resetlogs timestamp - known together as the resetlogs data - are kept in the database info record of the controlfile. The resetlogs data is intended to uniquely identify each execution of a RESETLOGS open.
The resetlogs data is also stored in each datafile header and in each logfile header. A redo log cannot be applied by recovery if its resetlogs data does not match that in the database info record of the controlfile. Except for some very special circumstances (e.g. offline normal or read-only tablespaces), a datafile cannot be recovered or accessed if its resetlogs data does not match that of the database info record of the controlfile. This ensures that changes discarded by resetlogs do not get back into the database. It also renders previous backups unusable for future recovery operations, making it prudent to take a database backup immediately after a resetlogs.

8.3 Effect of Resetlogs on Threads

Each thread's controlfile record is updated to clear the thread-open flag and to set the thread-checkpoint SCN to the resetlogs SCN. Thus, the thread appears to have been closed at the resetlogs SCN. The set of enabled threads from the enabled thread bitvec of the database info controlfile record is used as is. It does not matter which threads were enabled at the end of recovery, since none of the old redo can ever be applied to the database again. The log sequence numbers in all threads are also reset to one. One of the enabled threads is picked as the database checkpoint.

8.4 Effect of Resetlogs on Redo Logs

The redo is thrown away by zeroing all the online logs. Note that this means that redo in the online logs would be lost forever - and there would be no way to undo the resetlogs in an emergency - if the online logs were not backed up prior to executing resetlogs. Note that ensuring the ability to undo an erroneous resetlogs is the only valid rationale for making backups of online logs. Undoing an erroneous resetlogs requires re-running the entire recovery operation from the beginning, after restoring backups of all datafiles, controlfile, and online logs.

One log is picked to be the current log for every enabled thread.
That log header is written as log sequence number one. Note that the set of logs and their thread association is picked up from the controlfile (i.e. using the thread number and log list fields of the logfile records). If it is a backup controlfile, this may be different from what was current the last time the database was open.

8.5 Effect of Resetlogs on Online Datafiles

The headers of all the online datafiles are updated to be checkpointed at the new database checkpoint. The new resetlogs data is also written to the header.

8.6 Effect of Resetlogs on Offline Datafiles

The controlfile record for an offline file is set to indicate the file needs media recovery. However, that will not be possible because it would be necessary to apply redo from logs with the wrong resetlogs data. This means that the tablespace containing the file will have to be dropped.

There is one important exception to this rule. When a tablespace is taken offline normal or set read-only, the checkpoint SCN written to the headers of the tablespace's constituent datafiles is saved in the data dictionary TS$ table as the tablespace-clean-stop SCN (see 2.17). No recovery is ever needed to bring a tablespace and its files online if the files are not fuzzy and are checkpointed at exactly the tablespace-clean-stop SCN. Even the resetlogs data in the offline file header is ignored in this case. Thus a tablespace that is offline normal is unaffected by any resetlogs that leaves the database at a time when the tablespace is offline.

8.7 Checking Dictionary vs. Controlfile on Resetlogs Open

After the rollback phase of RESETLOGS open, the datafiles listed in the data dictionary FILE$ table are compared with the datafiles listed in the controlfile. This is also done on the first open after a CREATE CONTROLFILE. There is the possibility that incomplete recovery ended at a time when the files in the database were different from those in the controlfile used for the recovery.
Using a backup controlfile or creating one can have the same problem. Checking the dictionary does not do any harm, so it could be done on every database open; however there is no point in wasting the time under normal circumstances.

The entry in FILE$ is compared with the entry in the controlfile for every file number. Since FILE$ reflects the space allocation information in the database, it is correct, and the controlfile might be wrong. If the file does not exist in FILE$ but the controlfile record says the file exists, then the file is simply dropped from the controlfile.

If a file exists in FILE$ but not in the controlfile, a placeholder entry is created in the control file under the name MISSINGnnnn (where nnnn is the file number in decimal). MISSINGnnnn is flagged in the control file as being offline and needing media recovery. The actual file corresponding (with respect to the file header contents as opposed to the file name) to MISSINGnnnn can be made accessible by renaming MISSINGnnnn to point to it.

In the RESETLOGS open case however, rename can succeed in making the file usable only if the file was read-only or offline normal. If, on the other hand, MISSINGnnnn corresponds to a file that was not read-only or offline normal, then the rename operation cannot be used to make it accessible, since bringing it online would require media recovery with redo from before the resetlogs. In this case, the tablespace containing the datafile must be dropped.

When the dictionary check is due to open after CREATE CONTROLFILE...NORESETLOGS rather than to open resetlogs, media recovery may be used to make the file current.

Another option is to repeat the entire operation that led up to the dictionary check with a controlfile that lists the same datafiles as the data dictionary. For incomplete recovery, this would involve restoring all backups and repeating the recovery.
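
The reconciliation rules of 8.7 can be summarized in a short model. This is a sketch only: FILE$ is reduced to a set of file numbers, the controlfile to a dictionary keyed by file number, and the function and field names (reconcile_controlfile, online, needs_media_recovery) are hypothetical. The four-digit zero-padded placeholder name is an assumption based on the MISSINGnnnn pattern given above.

```python
from typing import Dict, Set


def reconcile_controlfile(file_dollar: Set[int],
                          controlfile: Dict[int, dict]) -> Dict[int, dict]:
    """Return a controlfile file list made consistent with FILE$."""
    result: Dict[int, dict] = {}
    for file_no, record in controlfile.items():
        if file_no in file_dollar:
            result[file_no] = dict(record)  # known to both: keep as is
        # Otherwise the file is in the controlfile only: FILE$ is taken
        # as correct, so the record is dropped from the controlfile.
    for file_no in sorted(file_dollar - set(controlfile)):
        # In FILE$ only: create an offline placeholder needing recovery.
        result[file_no] = {
            "name": f"MISSING{file_no:04d}",  # padding width is assumed
            "online": False,
            "needs_media_recovery": True,
        }
    return result


controlfile = {
    1: {"name": "system01.dbf", "online": True, "needs_media_recovery": False},
    3: {"name": "late_add.dbf", "online": True, "needs_media_recovery": False},
}
reconciled = reconcile_controlfile({1, 4}, controlfile)
print(sorted(reconciled))     # prints: [1, 4]
print(reconciled[4]["name"])  # prints: MISSING0004
```

File 3 stands for a datafile that the controlfile knows about but the recovered dictionary does not; file 4 is the opposite case and ends up as a placeholder that must be renamed to point at the real file.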
9 Recovery-Related V$ Fixed-Views

The V$ fixed-views contain columns that extract information from data structures dynamically maintained in memory by the kernel. These "views" make this information accessible to the DBA under SYS. The following is a summary of recovery-related information that is viewable via V$ views:

9.1 V$LOG

Contains log group information from the controlfile:

    GROUP#
    THREAD#
    SEQUENCE#
    SIZE_IN_BYTES
    MEMBERS_IN_GROUP
    ARCHIVED_FLAG
    STATUS_OF_GROUP (unused, current, active, inactive)
    LOW_SCN
    LOW_SCN_TIME

9.2 V$LOGFILE

Contains log file (i.e. group member) information from the controlfile:

    GROUP#
    STATUS_OF_MEMBER (invalid, stale, deleted)
    NAME_OF_MEMBER

9.3 V$LOG_HISTORY

Contains log history information from the controlfile:

    THREAD#
    SEQUENCE#
    LOW_SCN
    LOW_SCN_TIME
    NEXT_SCN

9.4 V$RECOVERY_LOG

Contains information (from the controlfile log history) about archived logs needed to complete media recovery:

    THREAD#
    SEQUENCE#
    LOW_SCN_TIME
    ARCHIVED_NAME

9.5 V$RECOVER_FILE

Contains information on the status of files needing media recovery:

    FILE#
    ONLINE_FLAG
    REASON_MEDIA_RECOVERY_NEEDED
    RECOVERY_START_SCN
    RECOVERY_START_SCN_TIME

9.6 V$BACKUP

Contains status information relative to datafiles in hot backup:

    FILE#
    FILE_STATUS (no-backup-active, backup-active, offline-normal, error)
    BEGIN_BACKUP_SCN
    BEGIN_BACKUP_TIME

10 Miscellaneous Recovery Features

10.1 Parallel Recovery (v7.1)

The goal of the parallel recovery feature is to use compute and I/O parallelism to reduce the elapsed time required to perform crash recovery, single-instance recovery, or media recovery. Parallel recovery is most effective at reducing recovery time when several datafiles on several disks are being recovered concurrently.

10.1.1 Parallel Recovery Architecture

Parallel recovery partitions recovery processing into two operations:

1. Reading the redo log.
2. Applying the change vectors.

Operation #1 does not easily lend itself to parallelization.
The redo log(s) must be read in sequentially, and merged in the case of media recovery. Thus, this task is assigned to one process: the redo-reading-process.

Operation #2, on the other hand, easily lends itself to parallelization. Thus, the task of change vector application is delegated to some number of redo-application-slave-processes.

The redo-reading-process sends change vectors to the redo-application-slave-processes using the same IPC (inter-process-communication) mechanism used by parallel query. The change vectors are distributed based on a hash function that takes the block address as argument (i.e. DBA modulo the number of redo-application-slave-processes). Thus, each redo-application-slave-process handles only change vectors for blocks whose DBAs hash to its "bucket" number. The redo-application-slave-processes are responsible for reading the datablocks into cache, checking whether or not the change vectors need to be applied, and applying the change vectors if needed.

This architecture achieves parallelism in log read I/O, datablock read I/O, and change vector processing. It allows overlap of log read I/Os with datablock read I/Os. Moreover, it allows overlap of datablock read I/Os for different hash "buckets." Recovery elapsed time is reduced as long as the benefits of compute and I/O parallelism outweigh the costs of process management and inter-process-communication.

10.1.2 Parallel Recovery System Initialization Parameters

PARALLEL_RECOVERY_MAX_THREADS
PARALLEL_RECOVERY_MIN_THREADS

These initialization parameters control the number of redo-application-slave-processes used during crash recovery or media recovery of all datafiles.

PARALLEL_INSTANCE_RECOVERY_THREADS

This initialization parameter controls the number of redo-application-slave-processes used during instance recovery.

10.1.3 Media Recovery Command Syntax Changes

RECOVER DATABASE has a new optional parameter for specifying the number of redo-application-slave-processes.
If specified, it overrides PARALLEL_RECOVERY_MAX_THREADS.

RECOVER TABLESPACE has a new optional parameter for specifying the number of redo-application-slave-processes. If specified, it overrides PARALLEL_RECOVERY_MIN_THREADS.

RECOVER DATAFILE has a new optional parameter for specifying the number of redo-application-slave-processes. If specified, it overrides PARALLEL_RECOVERY_MIN_THREADS.

10.2 Redo Log Checksums (v7.2)

The log checksum feature allows a potential corruption in an online redo log to be detected when the log is read for archiving. The goal is to prevent the corruption from being propagated, undetected, to the archive log copy. This feature is intended to be used in conjunction with a new command, CLEAR LOGFILE, that allows a corrupted online redo log to be discarded without having to archive it.

A new initialization parameter, LOG_BLOCK_CHECKSUM, controls activation of log checksums. If it is set, a log block checksum is computed and placed in the header of each log block as it is written out of the redo log buffer. If present, checksums are validated whenever log blocks are read for archiving or recovery. If a checksum is detected as invalid, an attempt is made to read another member of the log group (if any). If an irrecoverable checksum error is detected - i.e. the checksum is invalid in all members - then the log read operation fails.

Note that a rudimentary mechanism for detecting log block header corruption was added, along with log group support, in v7.1. The log checksum feature extends corruption detection to the whole block.

If an irrecoverable checksum error prevents a log from being read for archiving, then the log cannot be reused. Eventually log switch - and redo generation - will stall. If no action is taken, the database will hang. The CLEAR LOGFILE command provides a way to obviate the requirement that the log be archived before it can be reused.
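
The member-fallback behaviour of the checksum validation in 10.2 can be sketched as follows. The checksum algorithm used by the kernel is not stated here, so CRC-32 from Python's zlib module is used purely as a stand-in; the names write_log_block, read_log_block and LogReadError are hypothetical, and a log block is reduced to a (payload, stored checksum) pair.

```python
import zlib
from typing import List, Tuple

LogBlock = Tuple[bytes, int]  # (payload, checksum stored in the block header)


class LogReadError(Exception):
    """Raised when the block is invalid in every member of the group."""


def write_log_block(payload: bytes) -> LogBlock:
    # Checksum is computed as the block leaves the redo log buffer.
    return payload, zlib.crc32(payload)


def read_log_block(copies: List[LogBlock]) -> bytes:
    """Read one block, given its copy from each member of the log group."""
    for payload, stored in copies:
        if zlib.crc32(payload) == stored:
            return payload  # first member with a valid checksum wins
        # Invalid checksum: fall through and try the next member.
    raise LogReadError("checksum invalid in all members of the log group")


good = write_log_block(b"change vectors")
damaged = (b"chXnge vectors", good[1])  # payload corrupted on disk

print(read_log_block([damaged, good]))  # prints: b'change vectors'
```

Passing only damaged copies raises LogReadError, which corresponds to the irrecoverable case in which the log can be neither archived nor reused.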
10.3 Clear Logfile (v7.2)

If all members of an online redo log group are "lost" or "corrupted" (e.g. due to checksum error, media error, etc.), redo generation may proceed normally until it becomes necessary to reuse the logfile. Once the thread checkpoints of all threads are beyond the log, it is a potential candidate for reuse. Possible scenarios preventing reuse are the following:

1. The log cannot be archived due to a checksum error; it cannot be reused because it needs archiving.
2. A log switch attempt fails because the log is inaccessible (e.g. due to a media error). The log may or may not have been archived.

The ALTER DATABASE CLEAR LOGFILE command is provided as an aid to recovering from such scenarios involving an inactive online redo log group (i.e. one that is not needed for crash recovery). CLEAR LOGFILE allows an inactive online logfile to be "cleared": i.e. discarded and reinitialized, in a manner analogous to DROP LOGFILE followed by ADD LOGFILE. In many cases, use of this command obviates the need for database shutdown or resetlogs.

Note: CLEAR LOGFILE cannot be used to clear a log needed for crash recovery (i.e. a "current" or "active" log of an open thread). Instead, if such a log becomes lost or corrupted, shutdown abort followed by incomplete recovery and open resetlogs will be necessary.

Use of the UNARCHIVED option allows the log clear operation to proceed even if the log needs archiving: an operation that would be disallowed by DROP LOGFILE. Furthermore, CLEAR LOGFILE allows the log clear operation to proceed in the following cases:

- There are only two logfile groups in the thread.
- All log group members have been lost through media failure.
- The logfile being cleared is the current log of a closed thread.

All of these operations would be disallowed in the case of DROP LOGFILE.

Clearing an unarchived log makes unusable any existing backup whose recovery would require applying redo from the cleared log.
Therefore, it is recommended that the database be immediately backed up following use of CLEAR LOGFILE with the UNARCHIVED option. Furthermore, the UNRECOVERABLE DATAFILE option must be used if there is a datafile that is offline, and whose recovery prior to onlining requires application of redo from the cleared logfile. Following use of CLEAR LOGFILE with the UNRECOVERABLE DATAFILE option, the offline datafile, together with its entire tablespace, will have to be dropped from the database. This is due to the fact that redo necessary to bring it online has been cleared, and there is no other copy of it.

The foreground process executing CLEAR LOGFILE processes the command in several steps:

- It checks that the logfile is not needed for crash recovery and is clearable.
- It sets the "being cleared" and "archiving not needed" flags in the logfile controlfile record. While the "being cleared" flag is set, the logfile is ineligible for reuse by log switch.
- It creates a new logfile, and performs multiple writes to clear it to zeroes (a lengthy process).
- It resets the "being cleared" flag.

If the foreground process executing CLEAR LOGFILE dies while execution is in progress, the log will not be usable as the current log. Redo generation may stall and the database may hang, much as would happen if log switch had to wait for checkpoint completion, or for log archive completion. Should the process executing CLEAR LOGFILE die, the operation should be completed by reissuing the same command. Another option would be to drop the partially-cleared log.

CLEAR LOGFILE could also fail due to an I/O error encountered while writing zeros to a log group member. An option for recovering would be to drop that member and add another to replace it.
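
The sequence of steps performed by CLEAR LOGFILE can be modelled roughly as below. This is a toy sketch, not the kernel's logic: the LogGroup record, its field names and the ClearLogfileError exception are all hypothetical, and the refusal to clear an unarchived log without the UNARCHIVED option is this sketch's reading of the text above.

```python
from dataclasses import dataclass


class ClearLogfileError(Exception):
    pass


@dataclass
class LogGroup:
    status: str                  # "unused", "current", "active" or "inactive"
    thread_open: bool            # is the owning redo thread open?
    needs_archiving: bool
    contents: bytes = b""
    being_cleared: bool = False  # while set, log switch may not reuse the log
    archiving_not_needed: bool = False


def clear_logfile(log: LogGroup, unarchived: bool = False) -> None:
    # Step 1: the log must not be needed for crash recovery.
    if log.thread_open and log.status in ("current", "active"):
        raise ClearLogfileError("log is needed for crash recovery")
    if log.needs_archiving and not unarchived:
        raise ClearLogfileError("log needs archiving; UNARCHIVED not given")
    # Step 2: set both flags in the (modelled) controlfile record.
    log.being_cleared = True
    log.archiving_not_needed = True
    log.needs_archiving = False
    # Step 3: re-create the log and clear it to zeroes.
    log.contents = bytes(len(log.contents))
    # Step 4: the log becomes eligible for reuse again.
    log.being_cleared = False


log = LogGroup(status="inactive", thread_open=True,
               needs_archiving=True, contents=b"\x01\x02\x03")
clear_logfile(log, unarchived=True)
print(log.contents, log.being_cleared)  # prints: b'\x00\x00\x00' False
```

If the process died between steps 2 and 4, the record would be left with the "being cleared" flag set, which is why reissuing the same command (or dropping the log) is needed to finish the operation.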