Subject: TECH: Internals of Recovery
Type: REFERENCE
Creation Date: 13-SEP-1996

Oracle7 v7.2 Recovery Outline

Authors: Andrea Borr & Bill Bridge
Version: 1
May 3, 1995

Abstract

This document gives an overview of how database recovery works in Oracle7 version 7.2. It is assumed that the reader is familiar with the Database Administrator's Guide for Oracle7 version 7.2. The intention of this document is to describe the recovery algorithms and data structures, providing more details than the Administrator's Guide.

Table of Contents

1 Introduction
  1.1 Instance Recovery and Media Recovery: Common Mechanisms
  1.2 Instance Failure and Recovery, Crash Failure and Recovery
  1.3 Media Failure and Recovery
2 Fundamental Data Structures
  2.1 Controlfile
    2.1.1 Database Info Record (Controlfile)
    2.1.2 Datafile Record (Controlfile)
    2.1.3 Thread Record (Controlfile)
    2.1.4 Logfile Record (Controlfile)
    2.1.5 Filename Record (Controlfile)
    2.1.6 Log-History Record (Controlfile)
  2.2 Datafile Header
  2.3 Logfile Header
  2.4 Change Vector
  2.5 Redo Record
  2.6 System Change Number (SCN)
  2.7 Redo Logs
  2.8 Thread of Redo
  2.9 Redo Byte Address (RBA)
  2.10 Checkpoint Structure
  2.11 Log History
  2.12 Thread Checkpoint Structure
  2.13 Database Checkpoint Structure
  2.14 Datafile Checkpoint Structure
  2.15 Stop SCN
  2.16 Checkpoint Counter
  2.17 Tablespace-Clean-Stop SCN
  2.18 Datafile Offline Range
3 Redo Generation
  3.1 Atomic Changes
  3.2 Write-Ahead Log
  3.3 Transaction Commit
  3.4 Thread Checkpoint
  3.5 Online-Fuzzy Bit
  3.6 Datafile Checkpoint
  3.7 Log Switch
  3.8 Archiving Log Switches
  3.9 Thread Open
  3.10 Thread Close
  3.11 Thread Enable
  3.12 Thread Disable
4 Hot Backup
  4.1 BEGIN BACKUP
  4.2 File Copy
  4.3 END BACKUP
  4.4 "Crashed" Hot Backup
5 Instance Recovery
  5.1 Detection of the Need for Instance Recovery
  5.2 Thread-at-a-Time Redo Application
  5.3 Current Online Datafiles Only
  5.4 Checkpoints
  5.5 Crash Recovery Completion
6 Media Recovery
  6.1 When to Do Media Recovery
  6.2 Thread-Merged Redo Application
  6.3 Restoring Backups
  6.4 Media Recovery Commands
    6.4.1 RECOVER DATABASE
    6.4.2 RECOVER TABLESPACE
    6.4.3 RECOVER DATAFILE
  6.5 Starting Media Recovery
  6.6 Applying Redo, Media Recovery Checkpoints
  6.7 Media Recovery and Fuzzy Bits
    6.7.1 Media-Recovery-Fuzzy
    6.7.2 Online-Fuzzy
    6.7.3 Hotbackup-Fuzzy
  6.8 Thread Enables
  6.9 Thread Disables
  6.10 Ending Media Recovery (Case of Complete Media Recovery)
  6.11 Automatic Recovery
  6.12 Incomplete Recovery
    6.12.1 Incomplete Recovery UNTIL Options
    6.12.2 Incomplete Recovery and Consistency
    6.12.3 Incomplete Recovery and Datafiles Known to the Controlfile
    6.12.4 Resetlogs Open after Incomplete Recovery
    6.12.5 Files Offline during Incomplete Recovery
  6.13 Backup Controlfile Recovery
  6.14 CREATE DATAFILE: Recover a Datafile Without a Backup
  6.15 Point-in-Time Recovery Using Export/Import
7 Block Recovery
  7.1 Block Recovery Initiation and Operation
  7.2 Buffer Header RBA Fields
  7.3 PMON vs. Foreground Invocation
8 Resetlogs
  8.1 Fuzzy Files
  8.2 Resetlogs SCN and Counter
  8.3 Effect of Resetlogs on Threads
  8.4 Effect of Resetlogs on Redo Logs
  8.5 Effect of Resetlogs on Online Datafiles
  8.6 Effect of Resetlogs on Offline Datafiles
  8.7 Checking Dictionary vs. Controlfile on Resetlogs Open
9 Recovery-Related V$ Fixed-Views
  9.1 V$LOG
  9.2 V$LOGFILE
  9.3 V$LOG_HISTORY
  9.4 V$RECOVERY_LOG
  9.5 V$RECOVER_FILE
  9.6 V$BACKUP
10 Miscellaneous Recovery Features
  10.1 Parallel Recovery (v7.1)
    10.1.1 Parallel Recovery Architecture
    10.1.2 Parallel Recovery System Initialization Parameters
    10.1.3 Media Recovery Command Syntax Changes
  10.2 Redo Log Checksums (v7.2)
  10.3 Clear Logfile (v7.2)

1 Introduction

The Oracle RDBMS provides database recovery facilities capable of preserving database integrity in the face of two major failure modes:

1. Instance failure: loss of the contents of a buffer cache, or data residing in memory.
2. Media failure: loss of database file storage on disk.

Each of these two major failure modes raises its own set of challenges for database integrity. For each, there is a set of requirements that a recovery utility addressing that failure mode must satisfy. Although recovery processing for the two failure modes has much in common, the requirements differ enough to motivate the implementation of two different recovery facilities:

1. Instance recovery: recovers data lost from the buffer cache due to instance failure.
2. Media recovery: recovers data lost from disk storage.

1.1 Instance Recovery and Media Recovery: Common Mechanisms

Both instance recovery and media recovery depend for their operation on the redo log. The redo log is organized into redo threads, referred to hereafter simply as threads. The redo log of a single-instance (non-Parallel Server option) database consists of a single thread. A Parallel Server redo log has a thread per instance.

A redo log thread is a set of operating system files in which an instance records all changes it makes - committed and uncommitted - to memory buffers containing datafile blocks. Since this includes changes made to rollback segment blocks, it follows that rollback data is also (indirectly) recorded in the redo log.

The first phase of both instance and media recovery processing is roll-forward. Roll-forward is the task of the RDBMS recovery layer. During roll-forward, changes recorded in the redo log are re-applied (as needed) to the datafiles. Because changes to rollback segment blocks are recorded in the redo log, roll-forward also regenerates the corresponding rollback data. When the recovery layer finishes its task, all changes recorded in the redo log have been restored by roll-forward. At this point, the datafile blocks contain not only all committed changes, but also any uncommitted changes recorded in the redo log.

The second phase of both instance and media recovery processing is roll-back. Roll-back is the task of the RDBMS transaction layer. During roll-back, undo information from rollback segments (as well as from save-undo/deferred rollback segments, if appropriate) is used to undo uncommitted changes that were applied during the roll-forward phase.

1.2 Instance Failure and Recovery, Crash Failure and Recovery

Instance failure, a failure resulting in the loss of the instance's buffer cache, occurs when an instance is aborted, either unexpectedly or expectedly. Examples of reasons for unexpected instance aborts are operating system crash, power failure, or background process failure. Examples of reasons for expected instance aborts are use of the commands SHUTDOWN ABORT and STARTUP FORCE.

Crash failure is the failure of all instances accessing a database. In the case of a single-instance (non-Parallel Server option) database, the terms crash failure and instance failure are used interchangeably. Crash recovery (equivalent to instance recovery in this case) is the process of recovering all online datafiles to a consistent state following a crash. This is done automatically in response to the ALTER DATABASE OPEN command.

In the case of the Parallel Server option, the term crash failure is used to refer to the simultaneous failures of all open instances. Parallel Server crash recovery is the process of recovering all online datafiles to a consistent state after all instances accessing the database have failed. This is done automatically in response to the ALTER DATABASE OPEN command. Parallel Server instance failure refers to the failure of an instance while a surviving instance continues in operation. Parallel Server instance recovery is the automatic recovery by a surviving instance of a failed instance.

Instance failure impairs database integrity because it results in loss of the instance's dirty buffer cache. A "dirty" buffer is one whose memory version differs from its disk version.
An instance that aborts has no opportunity to write out "dirty" buffers so as to prevent database integrity breakage on disk following a crash. Loss of the dirty buffer cache is a problem because the cache manager uses algorithms optimized for OLTP performance rather than for crash-tolerance. Examples of performance-optimizing cache management algorithms that make the task of instance recovery more difficult are as follows:

- LRU (least recently used) based buffer replacement
- no-datablock-force-at-commit (see 3.3)

As a consequence of the performance-oriented cache management algorithms, instance failure can cause database integrity breakage as follows:

A. At crash time, the datafiles on disk might contain some but not all of a set of datablock changes that constitute a single atomic change to the database with respect to structural integrity (see 2.5).
B. At crash time, the datafiles on disk might contain some datablocks modified by uncommitted transactions.
C. At crash time, the datafiles on disk might contain some datablocks missing changes from committed transactions.

During instance recovery, the RDBMS recovery layer repairs database integrity breakages A and C. It also enables subsequent repair - by the RDBMS transaction layer - of database integrity breakage B.

In addition to the requirement that it repair any integrity breakages resulting from the crash, instance recovery must meet the following requirements:

1. Instance recovery must accomplish the repair using the current online datafiles (as left on disk after the crash).
2. Instance recovery must use only the online redo logs. It must not require use of the archived logs. Although instance recovery could work successfully from archived logs (except for a database running in NOARCHIVELOG mode), it could not work autonomously (requirement 4) if an operator were required to restore archived logs.
3. The invocation of instance recovery must be automatic, implicit at the next database startup.
4. Detection of the need for repair and the repair itself must proceed autonomously, without operator intervention.
5. The duration of the roll-forward phase of instance recovery is governed by both RDBMS internal mechanisms (checkpoint) and user-configurable parameters (e.g. number and sizes of logfiles, checkpoint-frequency tuning parameters, parallel recovery parameters).

As seen above, Oracle's buffer cache component is optimized for OLTP performance rather than for crash-tolerance. This document describes some of the mechanisms used by the cache and recovery components to solve the problems posed by use of performance-optimizing cache algorithms such as LRU buffer replacement and no-datablock-force-at-commit. These mechanisms enable instance recovery to meet its requirements while allowing optimal OLTP performance. These mechanisms include:

- Log-Force-at-Commit: see 3.3. Facilitates repair of breakage type C by guaranteeing that, at transaction commit time, all of the transaction's redo records, including its "commit record," are stored on disk in the online redo log.
- Checkpointing: see 3.4, 3.6. Bounds the amount of transaction redo that instance recovery must potentially apply. Works in conjunction with online-log switch management to ensure that instance recovery can be accomplished using only online logs and current online datafiles.
- Online-Log Switch Management: see 3.7. Works in conjunction with checkpointing to ensure that instance recovery can be accomplished using only online logs and current online datafiles. It guarantees that the current checkpoint is beyond an online logfile before that logfile is reused.
- Write-Ahead-Log: see 3.2. Facilitates repair of breakage types A and B by guaranteeing that: (i) at crash time there are no changes in the datafiles that are not in the redo log; (ii) no datablock change was written to disk without first writing to the log sufficient information to enable undo of the change should a crash intervene before commit.
- Atomic Redo Record Generation: see 3.1. Facilitates repair of breakage types A and B.
- Thread-Open Flag: see 5.1. Enables detection at startup time of the need for crash recovery.

1.3 Media Failure and Recovery

Instance failure affects logical database integrity. Because instance failure leaves a recoverable version of the online datafiles on the post-crash disk, instance recovery can use the online datafiles as a starting point.

Media failure, on the other hand, affects physical storage media integrity or accessibility. Because the original datafile copies are damaged, media recovery uses restored backup copies of the datafiles as a starting point. Media recovery then uses the redo log to roll-forward these files, either to a consistent present state or to a consistent past state. Media recovery is run by issuing one of the following commands: RECOVER DATABASE, RECOVER TABLESPACE, RECOVER DATAFILE.

Depending on the failure scenario, a media failure has the potential for causing database integrity breakages similar to those caused by an instance failure. For example, an integrity breakage of type A, B, or C could result if I/O accessibility to a datablock were lost between the time the block was read into the buffer cache and the time DBWR attempted to write out an updated version of the block. More typical, however, is the case of a media failure that results in the permanent loss of the current version of a datafile, and hence of all updates to that datafile that occurred since the last time the file was backed up.

Before media recovery is invoked, backup copies of the damaged datafiles are restored.
Media recovery then applies relevant portions of the redo log to roll-forward the datafile backups, making them current. Current implies a pre-failure state consistent with the rest of the database.

Media recovery and instance recovery have in common the requirement to repair database integrity breakages A-C. However, media recovery and instance recovery differ with respect to requirements 1-5. The requirements for media recovery are as follows:

1. Media recovery must accomplish the repair using restored backups of damaged datafiles.
2. Media recovery can use archived logs as well as the online logs.
3. Invocation of media recovery is explicit, by operator command.
4. Detection of media failure (i.e. the need to restore a backup) is not automatic. Once a backup has been restored, however, detection of the need to recover it via media recovery is automatic.
5. The duration of the roll-forward phase of media recovery is governed solely by user policy (e.g. frequency of backups, parallel recovery parameters) rather than by RDBMS internal mechanisms.

2 Fundamental Data Structures

2.1 Controlfile

The controlfile contains records that describe and keep state information about all the other files of the database.
The controlfile contains the following categories of records:

- Database Info Record (1)
- Datafile Records (1 per datafile)
- Thread Records (1 per thread)
- Logfile Records (1 per logfile)
- Filename Records (1 per datafile or logfile group member)
- Log-History Records (1 per completed logfile)

Fields of the controlfile records referenced in the remainder of this document are listed below, together with the number(s) of the section(s) describing their use:

2.1.1 Database Info Record (Controlfile)

- resetlogs timestamp: 8.2
- resetlogs SCN: 8.2
- enabled thread bitvec: 8.3
- force archiving SCN: 3.8
- database checkpoint thread (thread record index): 2.13, 3.10

2.1.2 Datafile Record (Controlfile)

- checkpoint SCN: 2.14, 3.4
- checkpoint counter: 2.16, 5.3, 6.2
- stop SCN: 2.15, 6.5, 6.10, 6.13
- offline range (offline-start SCN, offline-end checkpoint): 2.18
- online flag
- read-enabled, write-enabled flags (1-1: read/write, 1-0: read-only)
- filename record index

2.1.3 Thread Record (Controlfile)

- thread checkpoint structure: 2.12, 3.4, 8.3
- thread-open flag: 3.9, 3.11, 8.3
- current log (logfile record index)
- head and tail (logfile record indices) of list of logfiles in thread: 2.8

2.1.4 Logfile Record (Controlfile)

- log sequence number: 2.7
- thread number: 8.4
- next and previous (logfile record indices) of list of logfiles in thread: 2.8
- count of files in group: 2.8
- low SCN: 2.7
- next SCN: 2.7
- head and tail (filename record indices) of list of filenames in group: 2.8
- "being cleared" flag: 10.3
- "archiving not needed" flag: 10.3

2.1.5 Filename Record (Controlfile)

- filename
- filetype
- next and previous (filename record indices) of list of filenames in group: 2.8

2.1.6 Log-History Record (Controlfile)

- thread number: 2.11
- log sequence number: 2.11
- low SCN: 2.11
- low SCN timestamp: 2.11
- next SCN: 2.11

2.2 Datafile Header

Fields of the datafile header referenced in the remainder of this document are listed below, together with the number(s) of the section(s) describing their use:

- datafile checkpoint structure: 2.14
- backup checkpoint structure: 4.1
- checkpoint counter: 2.16, 3.4, 5.3, 6.2
- resetlogs timestamp: 8.2
- resetlogs SCN: 8.2
- creation SCN: 8.1
- online-fuzzy bit: 3.5, 6.7.1, 8.1
- hotbackup-fuzzy bit: 4.1, 4.4, 6.7.1, 8.1
- media-recovery-fuzzy bit: 6.7.1, 8.1

2.3 Logfile Header

Fields of the logfile header referenced in the remainder of this document are listed below, together with the number(s) of the section(s) describing their use:

- thread number: 2.7
- sequence number: 2.7
- low SCN: 2.7
- next SCN: 2.7
- end-of-thread flag: 6.10
- resetlogs timestamp: 8.2
- resetlogs SCN: 8.2

2.4 Change Vector

A change vector describes a single change to a single datablock. It has a header that gives the Data Block Address (DBA) of the block, the incarnation number, the sequence number, and the operation. After the header is information that depends on the operation. The incarnation number and sequence number are copied from the block header when the change vector is constructed. When a block is made "new," the incarnation number is set to a value that is greater than its previous incarnation number and the sequence number is set to one. The sequence number on the block is incremented after every change is applied.

2.5 Redo Record

A redo record is a group of change vectors describing a single atomic change to the database. For example, a transaction's first redo record might group a change vector for the transaction table (rollback segment header), a change vector for the undo block (rollback segment), and a change vector for the datablock. A transaction can generate multiple redo records. The grouping of change vectors into a redo record allows multiple database blocks to be changed so that either all changes occur or no changes occur, despite arbitrary intervening failures. This atomicity guarantee is one of the fundamental jobs of the cache layer.
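As an illustration only, the relationship between change vectors (2.4) and redo records (2.5) can be sketched in Python. The class and function names below are hypothetical and do not reflect Oracle's internal layout. The version test in apply_vector (skip a vector when the block's incarnation/sequence pair is already past the one recorded in the vector) is an assumption about how "re-applied as needed" might work; the text above states only that the numbers are copied into the vector and that the sequence number is incremented after each applied change.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Block:
    """A datablock with the two header fields used for versioning."""
    incarnation: int = 1
    sequence: int = 1
    data: Dict[str, int] = field(default_factory=dict)


@dataclass
class ChangeVector:
    """One change to one block; header fields as listed in 2.4."""
    dba: int                             # Data Block Address of the target block
    incarnation: int                     # copied from the block header
    sequence: int                        # copied from the block header
    operation: Callable[[Block], None]   # operation-dependent payload


@dataclass
class RedoRecord:
    """A group of change vectors forming one atomic change (2.5)."""
    vectors: List[ChangeVector]


def make_vector(dba: int, block: Block,
                operation: Callable[[Block], None]) -> ChangeVector:
    # Copy the version numbers from the block header at construction time.
    return ChangeVector(dba, block.incarnation, block.sequence, operation)


def apply_vector(blocks: Dict[int, Block], cv: ChangeVector) -> bool:
    """Apply cv during roll-forward; return True if the block was changed."""
    block = blocks[cv.dba]
    have = (block.incarnation, block.sequence)
    want = (cv.incarnation, cv.sequence)
    if have > want:
        return False                     # assumed rule: change already in block
    if have < want:
        raise RuntimeError("earlier redo missing for block %d" % cv.dba)
    cv.operation(block)
    block.sequence += 1                  # incremented after every applied change
    return True


def apply_record(blocks: Dict[int, Block], record: RedoRecord) -> List[bool]:
    return [apply_vector(blocks, cv) for cv in record.vectors]


def set_value(key: str, value: int) -> Callable[[Block], None]:
    def op(block: Block) -> None:
        block.data[key] = value
    return op


# One atomic change touching two blocks, built against the cached versions.
cache = {10: Block(), 20: Block()}
record = RedoRecord([
    make_vector(10, cache[10], set_value("row", 42)),
    make_vector(20, cache[20], set_value("undo", 7)),
])

# Simulated crash: only block 10 reached disk with its change (breakage A).
disk = {10: Block(sequence=2, data={"row": 42}), 20: Block()}
applied = apply_record(disk, record)
```

In this sketch, roll-forward skips the vector for block 10, applies the vector for block 20, and leaves both blocks reflecting the whole redo record.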
Recovery preserves redo record atomicity across failures.

2.6 System Change Number (SCN)

An SCN defines a committed version of the database. A query reports the contents of the database as it looked at some specific SCN. An SCN is allocated and saved in the header of a redo record that commits a transaction. An SCN may also be saved in a record when it is necessary to mark the redo as being allocated after a specific SCN. SCNs are also allocated and stored in other data structures such as the controlfile or datafile headers.

An SCN is at least 48 bits long. Thus SCNs can be allocated at a rate of 16,384 per second for over 534 years without running out: 2^48 / 16,384 = 2^34 seconds, which is about 534.5 years when a year is counted as twelve 31-day months. We will run out of SCNs in June, 2522 AD (we use 31 day months for time stamps).

2.7 Redo Logs

All changes to database blocks are made by constructing a redo record for the change, saving this record in a redo log, then applying the change vectors to the datablocks. Recovery is the process of applying redo to old versions of datablocks to make them current. This is necessary when the current version has been lost.

When a redo log becomes full it is closed and a log switch occurs. Each log is identified by its thread number (see below), sequence number (within thread), and the range of SCNs spanned by its redo records. This information is stored in the thread number, sequence number, low SCN, and next SCN fields of the logfile header.

The redo records in a log are ordered by SCN. Moreover, redo records containing change vectors for a given block occur in increasing SCN order across threads (case of Parallel Server). Only some records have SCNs in their header, but every record is applied after the allocation of the SCN appearing with or before it in the log.

The header of the log contains the low SCN and the next SCN. The low SCN is the SCN associated with the first redo record (unless there is an SCN in its header).
The next SCN is the low SCN of the log with the next higher sequence number for the same thread. The current log of an enabled thread has an infinite next SCN, since there is no log with a higher sequence number.

2.8 Thread of Redo

The redo generated by an instance - by each instance in the Parallel Server case - is called a thread of redo. A thread consists of an online portion and (in ARCHIVELOG mode) an archived portion. The online portion of a thread consists of two or more online logfile groups. Each group consists of one or more replicated members. The set of members in a group is referred to variously as a logfile group, group, redo log, online log, or simply log.

A redo log contains only redo generated by one thread. Log sequence numbers are independently allocated for each thread. Each thread switches logs independently.

For each logfile, there is a controlfile record that describes it. The index of a log's controlfile record is referred to as its log number. Note that log numbers are equivalent to log group numbers, and are globally unique (across all threads). The list of a thread's logfile records is anchored in the thread record (i.e. via head and tail logfile record indices), and linked through the logfile records, each of which stores the thread number. The logfile record also has fields identifying the number of group members, as well as the head and tail (i.e. filename record indices) of the list (linked through filename records) of filenames in the group.

2.9 Redo Byte Address (RBA)

An RBA points to a specific location in a particular redo thread. It is ten bytes long and has three components: log sequence number, block number within log, and byte number within block.

2.10 Checkpoint Structure

The checkpoint structure is a data structure that defines a point in all the redo ever generated for a database. Checkpoint structures are stored in datafile headers and in the per-thread records of the controlfile.
They are used by recovery to know where to start reading the log thread(s) for redo application.

The key fields of the checkpoint structure are the checkpoint SCN and the enabled thread bitvec.

The checkpoint SCN effectively demarcates a specific location in each enabled thread (for a definition of enabled see 3.11). For each thread, this location is where redo was being generated at some point in time within the resolution of one commit. The redo record headers in the log can be scanned to find the first redo record that was allocated at the checkpoint SCN or higher.

The enabled thread bitvec is a mask defining which threads were enabled at the time the checkpoint SCN was allocated. Note that a bit is set for each thread that was enabled, regardless of whether it was open or closed. Every thread that was enabled has a redo log that contains the checkpoint SCN. A log containing this SCN is guaranteed to exist (either online or archived).

The checkpoint structure also stores the time that the checkpoint SCN was allocated. This timestamp is only used to print a message to aid a person looking for a log.

In addition, the checkpoint structure stores the number of the thread that allocated the checkpoint SCN and the current RBA in that thread when the checkpoint SCN was allocated. Having an explicitly-stored thread RBA (as opposed to only having the checkpoint SCN as an implicit thread location "pointer") makes the log sequence number (part of the RBA) and archived log name readily available for the single-instance (i.e. single-thread, non-Parallel Server) case.

A checkpoint structure for a port that supports up to 1023 threads of redo is 150 bytes long. A VMS checkpoint is 30 bytes and supports up to 63 threads of redo.

2.11 Log History

The controlfile can be configured (using the MAXLOGHISTORY clause of the CREATE DATABASE or CREATE CONTROLFILE command) to contain a history record for every logfile that is completed. Log history records are small (24 bytes on VMS).
They are overwritten in a circular fashion so that the oldest information is lost.

For each logfile, the log-history controlfile record contains the thread number, log sequence number, low SCN, low SCN timestamp, and next SCN (i.e. low SCN of the next log in sequence).

The purpose of the log history is to reconstruct archived logfile names from an SCN and thread number. Since a log sequence number is contained in the checkpoint structure (part of the RBA), single-thread (i.e. non-Parallel Server) databases do not need log history to construct archived log names.

The fields of the log history records are viewable via the V$LOG_HISTORY "fixed-view" (see Section 9 for a description of the recovery-related "fixed-views"). Additionally, V$RECOVERY_LOG, which displays information about archived logs needed to complete media recovery, is derived from information in the log history records. Although log history is not strictly needed for easy administration of single-instance (non-Parallel Server) databases, enabling use of V$LOG_HISTORY and V$RECOVERY_LOG might be a reason to configure it.

2.12 Thread Checkpoint Structure

Each enabled thread's controlfile record contains a checkpoint structure called the thread checkpoint. The SCN field in this structure is known as the thread checkpoint SCN. The thread number and RBA fields in this structure refer to the associated thread.

The thread checkpoint structure is updated each time an instance checkpoints its thread (see 3.4). During such thread checkpoint events, the instance associated with the thread writes to disk in the online datafiles all dirty buffers modified by redo generated before the thread checkpoint SCN.

A thread checkpoint event guarantees that all pre-thread-checkpoint-SCN redo generated in that thread for all online datafiles has been written to disk. (Note that if the thread is closed, then there is no redo beyond the thread checkpoint SCN; i.e. the RBA points just past the last redo record in the current log.)

It is the job of instance recovery to ensure that all of the thread's redo for all online datafiles is applied. Because of the guarantee that all of the thread's redo prior to the thread checkpoint SCN has already been applied, instance recovery can make the guarantee that, by starting redo application at the thread checkpoint SCN, and continuing through end-of-thread, all of the thread's redo will have been applied.

2.13 Database Checkpoint Structure

The database checkpoint structure is the thread checkpoint of the thread that has the lowest checkpoint SCN of all the open threads. The number of the database checkpoint thread - the number of the thread whose thread checkpoint is the current database checkpoint - is recorded in the database info record of the controlfile. If there are no open threads, then the database checkpoint is the thread checkpoint that contains the highest checkpoint SCN of all the enabled threads.

Since each instance guarantees that all redo generated before its own thread checkpoint SCN has been written, and since the database checkpoint SCN is the lowest of the thread checkpoint SCNs, it follows that all pre-database-checkpoint-SCN redo in all instances has been written to all online datafiles.

Thus, all pre-database-checkpoint-SCN redo generated in all threads for all online datafiles is guaranteed to be in the files on disk already. This is described by saying that the online datafiles are checkpointed at the database checkpoint. This is the rationale for using the database checkpoint to update the online datafile checkpoints (see below) when an instance checkpoints its thread (see 3.4).

2.14 Datafile Checkpoint Structure

The header of each datafile contains a checkpoint structure known as the datafile checkpoint. The SCN field in this structure is known as the datafile checkpoint SCN.
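The datafile checkpoint uses the same checkpoint structure as the thread and database checkpoints. As a rough sketch only, the Python below models that structure with the fields named in 2.10 and shows the database checkpoint selection rule from 2.13. All names are hypothetical, the bit assignment in enabled_threads is an arbitrary choice for the example, and the sketch assumes at least one enabled thread exists.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RBA:
    """Redo Byte Address (2.9): a location within one redo thread."""
    log_sequence: int
    block: int
    byte: int


@dataclass(frozen=True)
class Checkpoint:
    """Checkpoint structure with the fields described in 2.10."""
    scn: int               # checkpoint SCN
    enabled_threads: int   # enabled thread bitvec (bit layout is hypothetical)
    thread: int            # thread that allocated the checkpoint SCN
    rba: RBA               # current RBA in that thread at allocation time
    timestamp: str = ""    # used only for operator messages


@dataclass
class ThreadRecord:
    """Subset of a controlfile thread record (2.1.3)."""
    checkpoint: Checkpoint
    enabled: bool = True
    is_open: bool = False


def database_checkpoint(threads: Dict[int, ThreadRecord]) -> Checkpoint:
    """Select the database checkpoint as described in 2.13."""
    enabled = [t.checkpoint for t in threads.values() if t.enabled]
    open_cps = [t.checkpoint for t in threads.values()
                if t.enabled and t.is_open]
    if open_cps:
        # Lowest thread checkpoint SCN among the open threads.
        return min(open_cps, key=lambda cp: cp.scn)
    # No open threads: highest checkpoint SCN among the enabled threads.
    return max(enabled, key=lambda cp: cp.scn)


def make_thread(number: int, scn: int, is_open: bool) -> ThreadRecord:
    cp = Checkpoint(scn=scn, enabled_threads=0b111, thread=number,
                    rba=RBA(log_sequence=12, block=0, byte=0))
    return ThreadRecord(checkpoint=cp, is_open=is_open)


threads = {
    1: make_thread(1, 500, True),
    2: make_thread(2, 480, True),
    3: make_thread(3, 300, False),   # enabled but closed
}
while_open = database_checkpoint(threads)

for record in threads.values():
    record.is_open = False
all_closed = database_checkpoint(threads)
```

With threads 1 and 2 open, the sketch selects thread 2 (SCN 480) and ignores the closed thread 3. Once every thread is closed, it selects thread 1 (SCN 500).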
All pre-checkpoint-SCN redo generated in all threads for a given datafile is guaranteed to be in the file on disk already. An online datafile has its checkpoint SCN replicated in its controlfile record.

Note: Oracle's recovery layer code is designed to "tolerate" a discrepancy in checkpoint SCN between the file header and the controlfile record. These values could get out of sync should an instance failure occur between the time the file header was updated and the time the controlfile "transaction" committed. (Note: A controlfile "transaction" is an RDBMS internal mechanism, independent of the Oracle transaction layer, that allows an arbitrarily large update to the controlfile to be "committed" atomically.)

The execution of a datafile checkpoint (see 3.6) for a given datafile updates the checkpoint structure in the file header, and guarantees that all pre-checkpoint-SCN redo generated in all threads for that datafile is on disk already.

A thread checkpoint event (see 3.4) guarantees that all pre-database-checkpoint-SCN redo generated in all threads for all online datafiles has been written to disk. The execution of a thread checkpoint may advance the database checkpoint (e.g. in the single-instance case; or if the thread having the oldest checkpoint changed from being the current thread to another thread). If the database checkpoint does advance, then the new database checkpoint is used to update the datafile checkpoints of all the online datafiles (except those in hot backup: see Section 4).

It is the job of media recovery (see Section 6) to ensure that all redo for a recovery-datafile (i.e. a datafile being media-recovered) generated in any thread through the recovery end-point is applied.
Because of the guarantee that all recovery-datafile-redo generated in any enabled thread prior to that datafile's checkpoint SCN has already been applied, media recovery can make the guarantee that, by starting redo application in each enabled thread with the datafile checkpoint SCN and continuing through the recovery end-point (e.g. end-of-thread on all threads in the case of complete media recovery), all redo for the recovery-datafile from all threads will have been applied.

Since the datafile checkpoint is stored in the header of the datafile itself, it is also present in backup copies of the datafile. It is the job of hot backup (see Section 4) to ensure that - despite the occurrence of ongoing updates to the datafile during the backup copy operation - the version of the datafile's checkpoint captured in the backup copy satisfies the checkpoint-SCN guarantee with respect to the versions of the datafile's datablocks captured in the backup copy.

2.15 Stop SCN

Each datafile's controlfile record has a field called the stop SCN. If the file is offline or read-only, the stop SCN is the SCN beyond which no further redo exists for that datafile. If the file is online and any instance has the database open, the stop SCN is set to "infinity." The stop SCN is used during media recovery to determine when redo application for a particular datafile can stop. This ensures that media recovery will terminate when recovering an offline file while the database is open.

The stop SCN is set whenever a datafile is taken offline or set read-only. This is true whether the offline was "immediate" (due to an I/O error, or due to taking the file's tablespace offline "immediate"), "temporary" (due to taking the file's tablespace offline "temporary"), or "normal" (due to taking the file's tablespace offline "normal"). However, in the case of a datafile taken offline "immediate," there is no file checkpoint (see 3.6), and dirty buffers are discarded.
Hence, media recovery may need to apply redo from before the stop SCN in order to bring the datafile online. However, media recovery does not need to look for redo after the stop SCN, since it does not exist. If the stop SCN is equal to the datafile checkpoint SCN, then the file does not need recovery.

2.16 Checkpoint Counter

There is a checkpoint counter kept in both the datafile header and in the datafile's controlfile record. Its purpose is to allow detection of the fact that a datafile or controlfile is a restored backup. The checkpoint counter is incremented every time checkpoints of online files are being advanced (e.g. by thread checkpoint). Thus the datafile's checkpoint counter is incremented even though the datafile's checkpoint is not being advanced because the file is in hot backup (see Section 4), or because its checkpoint SCN is already beyond that of the intended checkpoint (e.g. the file is new or has undergone a recent datafile checkpoint). The old value of the checkpoint counter - matching the checkpoint counter in the datafile's controlfile record - is also remembered in the file header. It is usually one less than the current counter in the header, but may differ from the current counter by more than one if the previous file header update failed after the header was written but before the controlfile "transaction" committed. A mismatch in checkpoint counters between the datafile header and the datafile's controlfile record is used to detect when a backup datafile (or a backup controlfile) has been restored.

2.17 Tablespace-Clean-Stop SCN

TS$, a data dictionary table that describes tablespaces, has a column called the tablespace-clean-stop-SCN. It identifies an SCN at which a tablespace was taken offline or set read-only "cleanly": i.e. after checkpointing its datafiles (see 3.6). The SCN at which the datafiles are checkpointed is recorded in TS$ as the tablespace-clean-stop SCN. It allows such a "clean-stopped" tablespace to survive (i.e.
not need to be dropped after) a RESETLOGS open (see 8.6). During media recovery, prior to resetlogs, the "clean-stopped" tablespace would be set offline. After resetlogs, the tablespace - which needs no recovery - is permitted to be brought online and/or set read-write. (An immediate backup of the tablespace is recommended). The tablespace-clean-stop SCN is set to zero (after being set momentarily to "infinity" during datafile state transition) when bringing an offline-clean tablespace online, or setting a read-only tablespace read-write. The tablespace-clean-stop SCN is also zeroed when taking a tablespace offline "immediate" or "temporary." A tablespace that has a non-zero tablespace-clean-stop SCN in TS$ is clean at that SCN: the tablespace currently contains all redo up through that SCN, and no redo for the tablespace beyond that SCN exists. If the tablespace's datafiles are still in the state they had when the tablespace was taken offline "normal" or set read-only - i.e. they are not restored backups, are not fuzzy, and are checkpointed at the clean-stop SCN - then the tablespace can be brought online without recovery. Note that the semantics of the tablespace-clean-stop SCN differ from those of a constituent datafile's stop SCN in the datafile's controlfile record. The controlfile stop SCN designates an SCN beyond which no redo for the datafile exists. This does not imply that the datafile currently contains all redo up through that SCN. The tablespace-clean-stop SCN is stored in TS$ rather than in the controlfile so that it is covered by redo and will finish in the correct state - i.e. reflecting the correct online/offline state of the tablespace - following an incomplete recovery (see 6.12). Its value will not be lost if a backup controlfile is restored, or if a new controlfile is created. 
Furthermore, the presence of the tablespace-clean-stop SCN in TS$ allows an offline normal (or read-only) tablespace to survive (not need to be dropped after) a RESETLOGS open, since it is known that no redo application is needed to bring it online (see 8.6 for more detail). Thus, for example, an offline normal (or read-only) tablespace that was offline during an incomplete recovery can be brought online (or set read-write) subsequent to a RESETLOGS open. Without the tablespace-clean-stop SCN, there would be no way of knowing that the tablespace does not need recovery using redo that was discarded by the resetlogs. The only alternative would have been to force the tablespace to be dropped.

2.18 Datafile Offline Range

The offline-start SCN and offline-end checkpoint fields of the controlfile datafile record describe the offline range. If valid, they delimit a log range guaranteed not to contain any redo for the datafile. Thus, media recovery can skip this log range when recovering the datafile, obviating the need to access old archived log data (which may be unavailable or unusable due to resetlogs: see Section 8). This optimization aids in recovering a datafile that is presently online (or read-write), but that was offline-clean (or read-only) for a long time, and whose last backup dates from that time. For example, this would be the case if, after a RESETLOGS open, an offline normal (or read-only) tablespace had been brought online (or set read-write), but not yet backed up. When a datafile transitions from offline-clean to online (or from read-only to read-write), the offline range is set as follows: The offline-start SCN is set from the tablespace-clean-stop SCN saved when setting the file offline (or read-only). The offline-end checkpoint is set from the file checkpoint taken when setting the file online (or read-write).

3 Redo Generation

Redo is generated to describe all changes made to database blocks.
This section describes the various operations that occur while the database is open and generating redo.

3.1 Atomic Changes

The most fundamental operation is to atomically change a set of datablocks. A foreground process intending to change one or more datablocks first acquires exclusive access to cache buffers containing those blocks. It then constructs the change vectors describing the changes. Space is allocated in the redo log buffer to hold the redo record. The redo log buffer - the buffer from which LGWR writes the redo log - is located in the SGA (System Global Area). It may be necessary to ask LGWR to write the buffer to the redo log in order to make space. If the log is full, LGWR may need to do a log switch in order to make the space available. Note that allocating space in the redo buffer also allocates space in the logfile. Thus, even though the redo buffer has been written, it may not be possible to allocate redo log space. After the space is allocated, the foreground process builds the redo record in the redo buffer. Only after the redo record has been built in the redo buffer may the datablock buffers be changed. Writing the redo to disk is the real change to the database. Recovery ensures that all changes that make it into the redo log make it into the datablocks (except in the case of incomplete recovery).

3.2 Write-Ahead Log

Write-ahead log is a cache-enforced protocol governing the order in which dirty datablock buffers are written vs. when the redo log buffer is written. According to write-ahead log protocol, before DBWR can write out a cache buffer containing a modified datablock, LGWR must write out the redo log buffer containing redo records describing changes to that datablock. Note that write-ahead log is independent of log-force-at-commit (see 3.3). Note also that write-ahead log protocol only applies to datafile writes that originate from the buffer cache.
In particular, write-ahead log does not apply to so-called direct path writes (e.g. originating from direct path load, table create via subquery, or index create). Direct path writes (targeted above the segment high-water mark) originate not as writes out of the buffer cache, but as bulk-writes out of the foreground process' data space. Indeed, correct handling of direct path writes by media recovery dictates a write-behind-log protocol. (The basic reason is that, because the bulk-writes do not go through the buffer cache, there is no mechanism to guarantee their completion at checkpoint). One guarantee made by write-ahead log protocol is that there are no changes in the datafiles that are not in the redo log, regardless of intervening failure. This is what enables recovery to preserve the guarantee of redo record atomicity despite intervening failure. Another guarantee made by write-ahead log protocol is that no datablock change can be written to disk without first writing to the redo log sufficient information to enable the change to be undone should the transaction fail to commit. That undo-enabling information is written to the redo log in the form of "redo" for the rollback segment. Write-ahead log protocol plays a key role in enabling the transaction layer to preserve the guarantee of transaction atomicity despite intervening failure.

3.3 Transaction Commit

Transaction commit allocates an SCN and builds a commit redo record containing that SCN. The commit is complete when all of the transaction's redo (including commit redo record) is on disk in the log. Thus, commit forces the redo log to disk - at least up to and including the transaction's commit record. This is termed log-force-at-commit. Recovery is designed such that it is sufficient to write only the redo log at commit time - rather than all datablocks changed by the transaction - in order to guarantee transaction durability despite intervening failure. This is termed no-datablock-force-at-commit.
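The ordering rules of 3.1 through 3.3 can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names (RedoLog, BufferCache, flush_to, and so on) are hypothetical, a plain integer stands in for the RBA, and nothing here reflects how the Oracle kernel is actually structured. It shows redo being built before a buffer is changed, redo being forced to disk before a dirty block is written (write-ahead log), and commit forcing only the log (log-force-at-commit, no-datablock-force-at-commit).

```python
# Toy model only: names such as RedoLog, BufferCache and flush_to are
# hypothetical and do not correspond to Oracle internals.
from dataclasses import dataclass, field


@dataclass
class RedoLog:
    buffer: list = field(default_factory=list)   # redo records not yet on disk
    on_disk: list = field(default_factory=list)  # redo records already written
    next_rba: int = 0                            # simplified redo address

    def append(self, record):
        """Build a redo record in the log buffer and return its address."""
        rba = self.next_rba
        self.buffer.append((rba, record))
        self.next_rba += 1
        return rba

    def flush_to(self, rba):
        """Write buffered redo up to and including `rba` to disk."""
        while self.buffer and self.buffer[0][0] <= rba:
            self.on_disk.append(self.buffer.pop(0))

    def flushed_rba(self):
        """Address of the last redo record on disk, or -1 if none."""
        return self.on_disk[-1][0] if self.on_disk else -1


@dataclass
class BufferCache:
    log: RedoLog
    dirty: dict = field(default_factory=dict)     # block -> (value, redo address)
    datafile: dict = field(default_factory=dict)  # block -> value on disk

    def change(self, block, value):
        # The redo record is built before the datablock buffer is changed (3.1).
        rba = self.log.append(("change", block, value))
        self.dirty[block] = (value, rba)
        return rba

    def write_block(self, block):
        # Write-ahead log: redo describing the block reaches disk first (3.2).
        value, rba = self.dirty.pop(block)
        if self.log.flushed_rba() < rba:
            self.log.flush_to(rba)
        self.datafile[block] = value

    def commit(self, scn):
        # Log-force-at-commit: only the log is forced, not the datablocks (3.3).
        rba = self.log.append(("commit", scn))
        self.log.flush_to(rba)
        return rba


log = RedoLog()
cache = BufferCache(log)
cache.change("block-1", "new row")
cache.commit(scn=100)
```

After the commit in this example, both redo records are in `log.on_disk` while `cache.datafile` is still empty: the changed block remains a dirty buffer, and durability rests on the log alone.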
3.4 Thread Checkpoint

A thread checkpoint event, executed by the instance associated with the redo thread being checkpointed, forces to disk all dirty buffers in that instance that contain changes to any online datafile before a designated SCN - the thread checkpoint SCN. Once all redo in the thread prior to the checkpoint SCN has been written to disk, the thread checkpoint structure in the thread's controlfile record is updated in a controlfile transaction. When a thread checkpoint begins, an SCN is captured and a checkpoint structure is initialized. Then all the dirty buffers in the instance's cache are marked for checkpointing. DBWR proceeds to write out the marked buffers in a staged manner. Once all the marked buffers have been written, the SCN in the checkpoint structure is set to the captured SCN, and the thread checkpoint structure in the thread's controlfile record is updated in a controlfile transaction. A thread checkpoint might or might not advance the database checkpoint. If only one thread is open, the new checkpoint is the new database checkpoint. If multiple threads are open, the database checkpoint will advance if the local thread's checkpoint is the current database checkpoint. Since the new checkpoint SCN was allocated recently, it is most likely greater than the thread checkpoint SCN in some other open thread. If it advances, the database checkpoint becomes the new lowest-SCN open thread checkpoint. If the old checkpoint SCN for the local thread was higher than the current checkpoint SCN of some other open thread, then the database checkpoint does not change. If the database checkpoint is advanced, then the checkpoint counter is advanced in every online datafile header. Furthermore, for each online datafile that is not in hot backup (see Section 4), and not already checkpointed at a higher SCN (e.g.
as would be the case for a recently added or recovered file), the datafile header checkpoint is advanced to the new database checkpoint, and the file header is written to disk. Also, the checkpoint SCN in the datafile's controlfile record is advanced to the new database checkpoint SCN.

3.5 Online-Fuzzy Bit

Note that more changes - beyond those already in the marked buffers - may be generated after the start of checkpoint. Such changes would be generated at SCNs higher than the SCN that will be recorded in the file header. They could either be changes to marked buffers that were added since checkpoint start, or else changes to unmarked buffers. Buffers containing these changes could be written out for a variety of reasons. Thus, the online files are online-fuzzy; that is, they generally contain changes in the future of (i.e. generated at higher SCNs than) their header checkpoint SCN. A datafile is virtually always online-fuzzy while it is online and the database is open. Online-fuzzy state is indicated by setting the so-called online-fuzzy bit in the datafile header. The online-fuzzy bits of all online datafiles are set at database open time. Also, when a datafile is brought online while the database is open, its online-fuzzy bit is set. The online-fuzzy bits are cleared after the last instance does a shutdown "normal" or "immediate." Other occasions for clearing the online-fuzzy bits are: (i) the finish of crash recovery; (ii) when media recovery "checkpoints" (flushes its buffers) after encountering an end-crash-recovery redo record (see 5.5); (iii) when taking a datafile offline "temporary" or "normal" (i.e. an offline operation that is preceded by a file checkpoint); (iv) when BEGIN BACKUP is issued (see 4.1). As will be seen in 8.1, open with resetlogs will fail if any online datafile has the online-fuzzy bit (or any fuzzy bit) set.
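The rule in 3.4 for when a thread checkpoint advances the database checkpoint, and what then happens to the online datafile headers, can be sketched as follows. This is a simplified illustration, not Oracle's implementation: the Datafile class, its field names, and the thread_checkpoint function are hypothetical, SCNs are plain integers, and controlfile updates are omitted.

```python
# Hypothetical model of the database-checkpoint rule described in 3.4.
from dataclasses import dataclass


@dataclass
class Datafile:
    name: str
    checkpoint_scn: int
    checkpoint_counter: int = 0
    online: bool = True
    in_hot_backup: bool = False


def database_checkpoint(open_thread_checkpoints):
    """Database checkpoint SCN: the lowest checkpoint SCN among open threads."""
    return min(open_thread_checkpoints.values())


def thread_checkpoint(open_thread_checkpoints, thread, new_scn, datafiles):
    """Record a completed thread checkpoint.

    Returns True if the database checkpoint advanced as a result.
    """
    old_db = database_checkpoint(open_thread_checkpoints)
    open_thread_checkpoints[thread] = new_scn
    new_db = database_checkpoint(open_thread_checkpoints)
    if new_db <= old_db:
        return False  # some other open thread still holds the lowest SCN
    for f in datafiles:
        if not f.online:
            continue
        f.checkpoint_counter += 1  # counter advances in every online file
        if f.in_hot_backup or f.checkpoint_scn > new_db:
            continue  # header checkpoint is frozen or already higher
        f.checkpoint_scn = new_db
    return True


threads = {1: 100, 2: 150}
files = [
    Datafile("a.dbf", 100),
    Datafile("b.dbf", 100, in_hot_backup=True),
    Datafile("c.dbf", 500),  # e.g. a recently added file
]
advanced = thread_checkpoint(threads, 1, 200, files)
```

In this example thread 1 held the database checkpoint, so checkpointing it at SCN 200 moves the database checkpoint to 150 (now held by thread 2). Every online file has its checkpoint counter incremented, but only a.dbf has its header checkpoint advanced.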
3.6 Datafile Checkpoint

A datafile checkpoint event, executed by all open instances (for all open threads), forces to disk all dirty buffers in any instance that contain changes to a particular datafile (or set of datafiles) before a designated SCN - the datafile checkpoint SCN. Once all datafile-related redo from all open threads prior to the checkpoint SCN has been written to disk, the datafile checkpoint structure in the file header is updated and written to disk. Datafile checkpoints occur as part of operations such as beginning hot backup (see Section 4) and offlining datafiles as part of taking a tablespace offline normal (see 2.17).

3.7 Log Switch

When an instance needs to generate more redo but cannot allocate enough blocks in the current log, it does a log switch. The first step in a log switch is to find an online log that is a candidate for reuse. The first requirement for the candidate log is that it must not be active: i.e. it must not be needed for crash/instance recovery. In other words, it must be overwritable without losing redo data needed for instance recovery. The principle enforced is that a logfile cannot be reused until the current thread checkpoint is beyond that logfile. Since instance recovery starts at the current thread checkpoint SCN/RBA (and expects to find that RBA in an online redo log), the ability to do instance recovery using only online logs translates into the requirement that the current thread checkpoint SCN be beyond the highest SCN associated with redo in the candidate log. If this is not the case, then the thread checkpoint currently in progress - e.g. the one started when the candidate log was originally switched into (see below) - is hurried up to complete. The other requirement for the candidate log is that it does not need archiving. Of course, this requirement only applies to a database running in ARCHIVELOG mode. If archiving is required, the archiver is posted.
As soon as the log switch completes, a new thread checkpoint is started in the new log. Hopefully, the checkpoint will complete before the next log switch is needed.

3.8 Archiving Log Switches

Each thread switches logs independently. Thus, when running Parallel Server, an SCN is almost never at the beginning of a log in all threads. However, it is desirable to have roughly the same range of SCNs in the archived logs of all enabled threads. This ensures that the last log archived in each thread is reasonably current. If an unarchived log for an enabled thread contained a very old SCN (as would occur in the case of a relatively idle instance), it would not be possible to use archived logs from a primary site to do recovery to a higher SCN at a standby site. This would be true even if the log with the low SCN contained no redo. This problem is solved by forcing log switches in other threads when their current log is significantly behind the log just archived. For the case of an open thread, a lock is used to "kick" the laggard instance into switching logs and archiving when it can. For the case of a closed thread, the archiving process in the active instance does the closed thread's log switch and archiving for it. Note that this can result in a thread that is enabled but never used having a bunch of archived logs with only a file header. A force archiving SCN is maintained in the database info controlfile record to implement this feature. The system strives to archive any log that contains that SCN or less. In general, the log with the lowest SCN is archived first. The command ALTER SYSTEM ARCHIVE LOG CURRENT can be used to manually archive the current logs of all enabled threads. It forces all threads, open and closed, to switch to a new log. It archives what is necessary to ensure all the old logs are archived. It does not return until all redo generated before the command was entered is archived.
This command is useful for ensuring all redo logs necessary for the recovery of a hot backup are archived. It is also useful for ensuring the potential currency of a standby site in a configuration in which archived logs from a primary site are shipped to a standby site for application by recovery in case of disaster (i.e. "standby database").

3.9 Thread Open

When an instance opens the database, it needs to open a thread for redo generation. The thread is chosen at mount time. A system initialization parameter can be used to specify the thread to mount by number. Otherwise, any available publicly-enabled thread can be chosen by the instance at mount time. A thread-mounted lock is used to prevent two instances from mounting the same thread. When an instance opens a thread, it sets the thread-open flag in the thread's controlfile record. While the instance is alive, it holds a set of thread-opened locks (one held by each of LGWR, DBWR, LCK0, LCK1, ...). (These are released at instance death, enabling one instance to detect the death of another in the Parallel Server environment: see 5.1). Also at thread open time, a new checkpoint is captured and used for the thread checkpoint. If this is the first database open, this becomes the new database checkpoint, ensuring all online files have their header checkpoints advanced at open time. Note that a log switch may be forced at thread open time.

3.10 Thread Close

When an instance closes the database, or when a thread is recovered by instance/crash recovery, the thread is closed. The first step in closing a thread is to ensure that no more redo is generated in it. The next step is to ensure that all changes described by existing redo records are in the online datafiles on disk. In the case of normal database close, this is accomplished by doing a thread checkpoint. The SCN from this final thread checkpoint is said to be the "SCN at which the thread was closed."
Finally, the thread's controlfile record is updated to clear the thread-open flag. In the case of thread close by instance recovery, the presence in the online datafiles of all changes described by thread redo records is ensured by starting redo application at the most recent thread checkpoint and continuing through end-of-thread. Once all changes described by thread redo records are in the online datafiles, the thread checkpoint is advanced to the end-of-thread. Just as in the case of a normal thread checkpoint, this checkpoint may advance the database checkpoint. If this is the last thread close, the database checkpoint thread field in the database info controlfile record - which normally points to an open thread - will be left pointing at this thread, even though it is closed.

3.11 Thread Enable

In order for a thread to be opened, it must be enabled. This ensures that its redo will be found during media recovery. A thread may be enabled in either public or private mode. A private thread can only be mounted by an instance that specifies it in the THREAD system initialization parameter. This is analogous to rollback segments. A thread must have at least two online redo log groups while it is enabled. An enabled thread always has one online log that is its current log. The next SCN of the current log is infinite, so that any new SCN allocated will be within the current log. A special thread-enable redo record is written in the thread of an instance enabling a new thread (i.e. via ALTER DATABASE ENABLE THREAD). The thread-enable redo record is used by media recovery to start applying redo from the new thread. Note that this means it takes an open thread to enable another thread. This chicken-and-egg problem is resolved by having thread one automatically enabled publicly at database creation. This also means that databases that do not run in Parallel Server mode do not need to enable a thread.
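The mount-time thread selection rules from 3.9 and 3.11 can be sketched as a small function. This is an illustrative model only: choose_thread and its arguments are hypothetical names, a Python set stands in for the thread-mounted lock, and picking the lowest-numbered public thread is an assumption made for determinism (the text only says that any available publicly-enabled thread can be chosen).

```python
# Hypothetical model: thread numbers map to "public", "private",
# or None (disabled).
def choose_thread(enabled_mode, mounted, requested=None):
    """Pick the redo thread an instance mounts, or raise if none is usable."""
    if requested is not None:
        # The THREAD parameter names the thread; it may be public or private.
        if enabled_mode.get(requested) is None:
            raise ValueError(f"thread {requested} is not enabled")
        if requested in mounted:
            raise ValueError(f"thread {requested} is already mounted")
        chosen = requested
    else:
        # Otherwise any available publicly-enabled thread will do.
        # Assumption: take the lowest-numbered one.
        candidates = [
            t for t, mode in sorted(enabled_mode.items())
            if mode == "public" and t not in mounted
        ]
        if not candidates:
            raise ValueError("no publicly-enabled thread is available")
        chosen = candidates[0]
    mounted.add(chosen)  # stands in for the thread-mounted lock
    return chosen
```

For example, with threads `{1: "public", 2: "private", 3: None}` and nothing mounted, an instance with no THREAD setting gets thread 1, and an instance can obtain the private thread 2 only by requesting it explicitly.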
3.12 Thread Disable

If a thread is not going to be used for a long while, it is best to disable it. This means that media recovery will not expect any redo to be found in the thread. Once a thread is disabled, its logs may be dropped. A thread must be closed before it can be disabled. This ensures all its changes have been written to the datafiles. A new SCN is allocated to save as the next SCN for the current log. The log header is marked with this SCN and flags saying it is the end of a disabled thread. It is important that a new current SCN is allocated. This ensures the SCN in any checkpoint with this thread enabled will appear in one of the logs from the thread. Note that this means a thread must be open in order to disable another thread. Thus, it is not possible to disable all threads.

4 Hot Backup

A hot backup is a copy of a datafile that is taken while the file is in active use. Datafile writes (by DBWR) go on as usual during the time the backup is being copied. Thus, the backup gets a "fuzzy" copy of the datafile:

- Some blocks may be ahead in time versus other blocks of the copy.
- Some blocks of the copy may be ahead of the checkpoint SCN in the file header of the copy.
- Some blocks may contain updates that constitute breakage of the redo record atomicity guarantee with respect to other blocks in this or other datafiles.
- Some block copies may be "fractured" (due to front and back halves being copied at different times, with an intervening update to the block on disk).

The "hotbackup-fuzzy" copy is unusable without "focusing" (via the redo log) that occurs when the backup is restored and undergoes media recovery. Media recovery applies redo (from all threads) from the begin-backup checkpoint SCN (see Step 2 in Section 4.1) through the end-point of the recovery operation (either complete or incomplete). The result is a transaction-consistent "focused" version of the datafile.
There are three steps to taking a hot backup:

- Execute the ALTER TABLESPACE ... BEGIN BACKUP command.
- Use an operating system copy utility to copy the constituent datafiles of the tablespace(s).
- Execute the ALTER TABLESPACE ... END BACKUP command.

4.1 BEGIN BACKUP

The BEGIN BACKUP command takes the following actions (not necessarily in the listed order) for each datafile of the tablespace:

1. It sets a flag in the datafile header - the hotbackup-fuzzy bit - to indicate that the file is in hot backup. The header with this flag set (copied by the copy utility) enables the copy to be recognized as a hot backup. A further purpose of this flag in the online file header is to cause the checkpoint in the file header to be "frozen" at the begin-backup checkpoint value that will be set in Step 4. This is the value that it must have in the backup copy in order to ensure that, when the backup is recovered, media recovery will start redo application at a sufficiently early checkpoint SCN so as to cover all changes to the file in all threads since the execution of BEGIN BACKUP (see 6.5). Since we cannot guarantee that the file header will be the first block to be written out by the copy utility, it is important that the file header checkpoint structure remain "frozen" until END BACKUP time. This flag keeps the datafile checkpoint structure "frozen" during hot backup, preventing it (and the checkpoint SCN in the datafile's controlfile record) from being updated during thread checkpoint events that advance the database checkpoint. New in v7.2: While the file is in hot backup, a new "backup" checkpoint structure in the datafile header receives the updates that the "frozen" checkpoint would have received.

2. It executes a datafile checkpoint, capturing the resultant "begin-backup" checkpoint information, including the begin-backup checkpoint SCN. When the file is checkpointed, all instances are requested to write out all dirty buffers they have for the file.
If the need for instance recovery is detected at this time, the file checkpoint operation waits until it is completed before proceeding. Checkpointing the file at begin-backup time ensures that only file blocks changed after begin-backup time might have been written to disk during the course of the file copy. This guarantee is crucial to enabling block before-image logging to cope with the fractured block problem, as described in Step 3.

3. [Platform-dependent option]: It starts block before-image logging for the file. During block before-image logging, all instances log a full block before-image to the redo log prior to the first change to each block of the file (since the backup started, or since the block was read anew into the buffer cache). This is to forestall a recovery problem that would arise if the backup were to contain a fractured block copy (mismatched halves). This could happen if (the database block size is greater than the operating system block size, and) the front and back halves of the block were copied to the backup at different times - with an intervening update to the block on disk. In this eventuality, recovery can reconstruct the block using the logged block before-image.

4. It sets the checkpoint in the file header equal to the begin-backup checkpoint captured in Step 2. This file header checkpoint will be "frozen" until END BACKUP is executed.

5. It clears the file's online-fuzzy bit. The online-fuzzy bit remains clear during the course of the file copy operation, thus ensuring a cleared online-fuzzy bit in the file copy. Note that the online-fuzzy bit is set again by the execution of END BACKUP.

4.2 File Copy

The file copy is done by utilities that are not part of Oracle. The presumption is that the platform vendor will have backup facilities that are superior to any portable facility that we could develop.
It is the responsibility of the administrator to ensure that copies are only taken between the BEGIN BACKUP and END BACKUP commands, or when the file is not in use.

4.3 END BACKUP

The END BACKUP command takes the following actions for each datafile of the tablespace:

1. It restores (i.e. sets) the file's online-fuzzy bit.

2. It creates an end-backup redo record (end-backup "marker") for the datafile. This record, interpreted only by media recovery, contains the begin-backup checkpoint SCN (i.e. the SCN matching that in the "frozen" checkpoint in the backup's header). This record serves to mark the end of the redo generated during the backup. The end-backup "marker" is used by media recovery to determine when all redo generated between BEGIN BACKUP and END BACKUP has been applied to the datafile. Upon encountering the end-backup "marker", media recovery can (at the next media recovery checkpoint: see 6.7.1) clear the hotbackup-fuzzy bit. This is only important in preventing an incomplete recovery that might erroneously attempt to end before all redo generated between BEGIN BACKUP and END BACKUP has been applied. Ending incomplete recovery at such a point may result in an inconsistent file, since the backup copy may already have contained changes beyond this endpoint. As will be seen in 8.1, open with resetlogs following incomplete media recovery will fail if any online datafile has the hotbackup-fuzzy bit (or any other fuzzy bit) set.

3. It clears the file's hotbackup-fuzzy bit.

4. It stops block before-image logging for the file.

5. It advances the file checkpoint to the current database checkpoint. This compensates for any file header update(s) missed during thread checkpoints that may have advanced the database checkpoint while the file was in hot backup state, with its checkpoint "frozen".
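The effect of BEGIN BACKUP and END BACKUP on a datafile header (4.1 and 4.3) can be summarized with a small state sketch. This is a simplified illustration under stated assumptions: FileHeader, begin_backup, on_database_checkpoint and end_backup are hypothetical names, SCNs are plain integers, the end-backup redo record is represented by a dictionary, and block before-image logging and the controlfile are left out.

```python
# Hypothetical model of the header state changes described in 4.1 and 4.3.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileHeader:
    checkpoint_scn: int
    online_fuzzy: bool = True
    hotbackup_fuzzy: bool = False
    backup_checkpoint_scn: Optional[int] = None  # v7.2 "backup" checkpoint


def begin_backup(header, file_checkpoint_scn):
    """Apply BEGIN BACKUP steps 1, 4 and 5 to the header."""
    header.hotbackup_fuzzy = True                 # step 1: mark file in hot backup
    header.checkpoint_scn = file_checkpoint_scn   # step 4: begin-backup checkpoint
    header.online_fuzzy = False                   # step 5: clear online-fuzzy bit


def on_database_checkpoint(header, new_scn):
    """A thread checkpoint advanced the database checkpoint to new_scn."""
    if header.hotbackup_fuzzy:
        # The "frozen" checkpoint is left alone; the backup checkpoint
        # structure receives the update instead.
        header.backup_checkpoint_scn = new_scn
    elif header.checkpoint_scn < new_scn:
        header.checkpoint_scn = new_scn


def end_backup(header, database_checkpoint_scn):
    """Apply END BACKUP steps 1, 2, 3 and 5; return the end-backup marker."""
    header.online_fuzzy = True                                # step 1
    marker = {"begin_backup_scn": header.checkpoint_scn}      # step 2
    header.hotbackup_fuzzy = False                            # step 3
    header.checkpoint_scn = database_checkpoint_scn           # step 5
    return marker


header = FileHeader(checkpoint_scn=100)
begin_backup(header, file_checkpoint_scn=120)
on_database_checkpoint(header, 150)   # header checkpoint stays frozen at 120
marker = end_backup(header, database_checkpoint_scn=150)
```

In this example a copy taken between the two commands would carry a header checkpoint of 120 with the hotbackup-fuzzy bit set, and the marker produced at END BACKUP records that same SCN, which is what lets media recovery recognize when the backup's redo range has been fully applied.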