Subject:            SQL*DBA Monitor, A Powerful Tool for the DBA
Last Revision Date: 11 September 1994
Author:             Moe Fardoost                                                                     
Abstract


Through the use of the Monitor utility, much of database and some of the
machine's behavior can be supervised and therefore tuned. This kind of
supervision can be performed at global, process or application level.
Through Monitor, the user can access both cumulative and instantaneous
statistics, IO and locking information. All this plus the base views
are made available to the user. Through the use of the base views,
users can define queries which match their specific tuning criteria or
areas of interest.

Theory

The monitor utility is an interactive and real time utility. It displays
statistical and other information regarding a running V6 Oracle database.
Its real time functionality is implemented using a set of dynamic views.
These views are identified with prefix V_$. There are public synonyms
created on these views. They are identified with prefix V$. Users should
use the V$ objects as opposed to the V_$ objects. The  base objects on
which the V_$ views were created are identified with prefix X$. These are
database data structures maintained by the Oracle kernel. The X$ objects
are also referred to as the X$ virtual tables.

Due to the dynamic nature of some of these virtual tables, they are defined
when the database is started. Through the use of V$ objects one can access
much of the data  stored in the base virtual tables. The X$ objects store
information about dynamic and static aspects of the database.

Methods of Invoking Monitor and System Privileges

In order to be able to invoke the Monitor utility, the user is required to
run the SQLDBA utility and connect to an Oracle account. To be able to
'connect internal' requires extra system privileges which are operating
system dependant. However, the user can connect to any valid username /
password. If this account has been granted the privilege to select from
the Monitor base views, the user can display the Monitor screens; otherwise
the SYS user must grant access to the user wishing to run Monitor. This is
achieved by running the monitor.sql script.

In order to provide real time functionality, most screens provide a cycling
facility using which the display is updated periodically. The length of the
cycle is determined by the user by issuing the SQLDBA command SET CYCLE nnn;
where nnn is in seconds.

The default window size is 80 characters wide and 24 lines high. It is
possible to use a wide screen, often useful when the number of processes
to be monitored is large ( e.g. greater than 20). Depending on the type of
display and the hardware, it is possible to make a SQLDBA window as wide as
180 characters and as high as 66 lines. Please refer to reference 3 for
details of how this can be done.

The options which can be dynamically monitored are; Processes, Users, Table
accesses, Locks, Latches, Statistics, File IO and Rollback segments.
Depending on the amount of display information, the option may provide a
finer degree of monitoring. Notably, Users, Table accesses, Locks, Processes,
statistics and file IO options provide per process, user or class monitoring.
For example:

SQLDBA> monitor st user 6

Due to the overhead involved in compiling information, some of the entries
for the Statistics option are turned off and can only be enabled by adding
the corresponding init.ora parameters in that file. These are as follows:

TIMED_STATISTICS enables the database to gather time related statistics
regarding various statistical variables. It is normally set to FALSE. It
can be enabled by setting it to TRUE.

DB_BLOCK_LRU_STATISTICS enables compilation of data about how the database
would behave if fewer DB_BLOCK_BUFFERS were used. This parameter should
only be set to TRUE during the monitoring period as it imposes extra
processing on the database.

DB_BLOCK_LRU_EXTENDED_STATISTICS enables compilation of data if more
DB_BLOCK_BUFFERS were used. This parameter can be set to an integer by
which the value of DB_BLOCK_BUFFERS is to be increased. This parameter
should only be set to a non-zero value during the monitoring period as it
imposes extra processing on the database.

The following sections describe the different options and their usage in
database tuning.

The Monitor Options and Database Tuning

File IO Option

This option enables the user to obtain disk activity information on a per
file basis. Most operating systems do not provide a direct way to  examine
IO to different devices interactively. Several tuning issues exist here.
An important tuning issue is sorting which may occur within sqlplus when
an index is generated or a select statement is executed. This kind of
sorting often results in creation of temporary objects. Inspecting the IO
on the file belonging to the temporary tablespace can give the user an
indication as whether there is excessive IO to that file and as a result
should the SORT_AREA_SIZE be increased  or different temporary tablespaces
be assigned to groups of users.

In general, the IO screen should be used to strike a balance in the IO
distribution for devices available to the database.

The read/second and write/second figures should be checked against the disk
and controller manufacturers figures to ensure that the hardware is capable
of handling these rates. As the database grows in size and usage the IO
rates may increase.

IO Option

This option provides the user with an IO characteristics of the database as
opposed to a per file representation as in the case of the File IO option.

The data is displayed in two formats. Rates and cumulative values. Processes
associated with the IO are displayed and a series of '=' characters display
the percentage of the actual IO for individual processes.

Tuning using this screen is divided into two categories. The reads and the
writes. The 'Interval' figures are not as informative as the 'Cumulative'
figures. Consider the %Logical Reads and the %Physical Reads for the latter.
It is fairly obvious that maximizing the ratio of Logical Reads to Total
reads would have a performance advantage. All logical figures identify access
through the System Global Area as opposed to disk access. Displaying this
option per process enables the user to observe the behavior of different
applications.

Dynamic space allocation can also result in excessive physical reads as
well as writes. If the base table for an  application has a large number
of extents, the data dictionary will be needed to perform recursive
calls to obtain new blocks in case of writes and the allocated extents
in case of reads. It would be helpful if user objects had few extents.

Excessive physical reads can also be caused by block chaining. If the
user object was created with poor storage parameters, it may result in
chaining during inserts or updates. Subsequent reads will require physical
reads to obtain the other parts of the chain.

There are several init.ora parameters which can have an impact on IO
related issues. These parameters are directly related to the size of the
SGA and the value of DB_BLOCK_BUFFERS. These parameters also affect the
way the DBWR (database writer) process behaves. The objective in DBWR
related IO tuning is to make the blocks required by processes exist in
the SGA and perform the writes more efficiently. Two important parameters
are DB_FILE_MULTIBLOCK_READ_COUNT and DB_FILE_WRITE_BATCH.
Applications that perform full table scans benefit from the former whereas
applications which perform global updates etc. benefit from the latter. If
the value of the DB_BLOCK_MAX_SCAN_CNT is set too low, it can also cause
excessive physical IO.

Note that the 'Hit Ratio' figures at the bottom of the display corresponds
to the ratio of (the cumulative logical reads - physical reads) to the
cumulative logical reads.

Latch Option

Latches are short lived locks used to protect the system's internal
resources. When a user process needs to access a resource, it acquires a
latch which is released after the operation has completed. The resource
( or data structure ) is displayed under the column 'Latch Name'. Some
processes are willing to wait for a lock, others are not. The implementation
is very much operating system dependant. The most important column on this
screen is the Timeouts. They are measured against the Total figures. The
ratio should be about 10% or less. Generally speaking, most of the latches
displayed here do not require tuning and the user is recommended to pay more
attention to IO and memory tuning.

The most commonly tuned latches are CACHE BUFFERS CHAINS, CACHE BUFFERS LRU
CHAIN,  REDO COPY and ROW CACHE OBJECTS.

The CACHE BUFFERS CHAINS latch gives an indication of waits when a user
process scans the SGA to access blocks. This latch  is associated with the
init.ora parameter DB_BLOCK_HASH_BUCKETS. Each bucket is a hash table
containing a list of blocks in the DB_BLOCK_BUFFERS. The value of
DB_BLOCK_HASH_BUCKETS should not be increased but increasing
DB_BLOCK_BUFFERS will result in an increase in this parameter.

The CACHE BUFFERS LRU CHAIN latch gives an indication of waits when a user
process tries to access the LRU chain containing all the used blocks in the
SGA. In order to reduce latch waits, the user is advised to increase the
size of DB_BLOCK_BUFFERS.

The REDO COPY latch gives an indication of waits when the user processes
attempts to write to the redo buffer. This wait can be reduced by either
'prebuilding' the redo information in the user  address space or enabling
simultaneous writes to the log buffer. The former is achieved by Increasing
the size of the LOG_ENTRY_PREBUILD_THRESHOLD parameter. Only redo pieces
smaller in size than the value of this parameter can be prebuilt. The latter
is achieved by increasing the LOG_SIMULTANEOUS_COPIES to about twice the
number of cpu's utilized by the system.

In general redo tuning is not very common and is rarely required.

ROW CACHE OBJECTS latch gives an indication of waits when user processes
attempt to access data dictionary in the cache. To reduce this contention,
the user can use V$ROWCACHE to see which parameters need increasing.
Reference 2 page 3-13 has a comprehensive example. The value of USAGE
should be less than COUNT for all parameters and at the same time, the
ratio of ( GETS - GETMISSES ) to GETS should be over 90%. The user should
be concerned only with those parameters which have large values for GETS.
These statistics should not be used right after the database is started.
Ideally, they should be used when the database is in its peak activity.

Lock Option

This option displays locks held by processes on user or data dictionary
tables. Valid lock types are as follows:

TD Locks are DDL locks placed on data dictionary or user objects in either
share or exclusive mode. The middle figure in the left most column is the
object identifier of the object.

TM Locks are DML locks placed on user objects. The middle figure in the
left most column is the object identifier of the user table.

TS locks are placed on temporary objects.

TX locks are transaction locks. The middle figure in the left most column
is the sequence number of the rollback segment used. If this number is
missing, the SYSTEM rollback segment is being used. The other number is
not useful to the user.

RW locks are row wait locks, used for processes waiting for a row level
lock. The generic format of the left most column in this  screen is
RWffbbbbbbttrrrrrr. Where:

ff represents the file id.
bbbbbb is the block id.
tt is the table sequence number (0 for non clustered).
rrrrrr is the row within the block.

All numbers are in hexadecimal format.

Lock Types:

SS is Row Share. SX is row exclusive. S is share. SSX is share row exclusive.
X is exclusive.

The locks screen is primarily used to observe how processes lock the
database resources. The user may be able to use this screen for application
tuning. Combination of certain locks may cause processes to wait (lower
case s,x,l,r or c). Often the user can tell at what stage a sql script is
by observing the locks it is holding. Therefore, this option can be used
to monitor the progress of sql applications or scripts.

Process Option

This screen enables the user to investigate the processes connected to
Oracle. This option relates Oracle information to the corresponding
Operating System information. Interactive processes have a terminal and
for all processes the executable is displayed. The user can use the System
pid column to obtain more information about the process from the operating
system. The Oracle pid is referenced in other Monitor screens.

Rollback Option

This option allows the user to investigate the rates of read and write for
 the segment header as well as other rollback blocks. The user processes
need to access the header of a rollback segment. This is needed to locate
the address of the rollback data within the rollback segment in order to
construct a consistent version of a data block. therefore 'waits' is only
quoted for the 'header' columns. In fact the two columns under 'header' are
the most important to the user. The ratio of waits to gets should be well
below 1%. As far as the 'Writes rate' is concerned, the user can apply the
same criteria as was described for File IO option. If the 'Waits/sec' is
large, then more rollback segments may be created to reduce this contention.
Beginning with version 6.0.33 of the database, the user has the option to
choose a specific rollback segment. By looking at this screen, the user
can choose a rollback segment which is inactive or one which would cause
the IO to the disks be more balanced.

Spool Option

Use of this option allows the user to obtain hard copies of the monitor
output.

Statistics Option

This screen is commonly used to identify the most important tuning needed.
Due to the volume of the data available to the user, it is easier to
display this screen for 'CLASSES'es of statistics. The valid 'CLASSES's
are as follows:

USER class provides statistical information about a user process. Although
a user can have more than one session, only the interactive session is
displayed and monitored. No information is compiled about the
recursive sessions. The most important 'Statistic Name' values are:

CURRENT OPENED CURSORS. This is the total number of opened cursors since
the screen was displayed. This information should be looked at on a per
process basis as the OPEN_CURSORS init.ora parameter is a process specific
parameter.

RECURSIVE CALLS. This statistic indicates how many times Oracle had to call
itself in order to complete the user's command. Large 'TOT' values indicate
that the data dictionary cache is not sufficiently large or that the user
objects have poor storage parameters or the database is experiencing
fragmentation. The user should be able to identify which is the driving
factor using the information described earlier.

BACKGROUND TIMEOUTS. This statistic may have non-zero values for the database
writer process. If a non zero value for 'dbwr_free_needed' (for the CACHE
class) accompanies a large value for this statistic for the user processes,
contact the Oracle Customer Support. DBWR process may display timeouts when
the database is inactive.

ENQUEUE class provides statistical information about processes which attempt
to obtain locks on database objects. The values of 'enqueue deadlocks' for
the 'TOT' column are the most useful statistics. A non-zero value indicates
that a deadlock occured and as a result a process was rolled back. The other
information on this screen can be ignored.

CACHE class provides statistical information about utilization of memory
allocated to the SGA. This option enables the user to perform significant
tuning of memory. The objective in using this class of statistics is to
reduce the number of times Oracle and user processes make physical IO
operations. This is achieved by keeping blocks in memory.

Cache hits and waits are useful statistics which can be calculated as follows:

Total number of block reads = db block gets + consistent gets + physical reads
Total number of blocks read within SGA = db block gets + consistent gets

                    db block gets + consistent gets
Hit Ratio = ------------------------------------------------ * 100
            db block gets + consistent gets + physical reads

This value should ideally be in the 90%'s. Another useful ratio is the waits
to total read ratio:

                              buffer busy waits
wait contention = -----------------------------------------
                       db block gets + consistent gets

A large ratio may mean that the user needs to increase the number of
rollback segments or if the rollback segments have acquired too many
extents ( > about 15 or 20) recreate them with better storage parameters.

The view V$WAITSTAT is the base view from which all the cache and buffer
wait information is extracted. The query on page 5-6 of reference 2
provides a useful tuning tool. The useful classes of operations reported in
this table are undo segment_header, undo block and data block. This query
enables the user to decide whether waiting is happening on data blocks or
undo blocks. Large waits for undo segment_header, mean there are not enough
rollback segments. The other are really related to caching blocks more
efficiently as described above.

The statistics 'free buffers inspected', 'free buffer requested',
'free buffer scans', 'dbwr free low' and 'dbwr free needed' parameters can
be used to identify if a user process is doing more than it should to find
a block in the SGA . They can also be used to identify how tuned the DBWR
process is in meeting the requests of the user processes. Typically if a
non-zero value for 'dbwr free needed' is seen, then the user should either
increase the value of DB_BLOCK_BUFFERS or tell the DBWR process to perform
more frequent checkpoints by reducing the size of CHECKPOINT_INTERVAL. The
latter will force DBWR to perform more IO. Excessive reduction in this
parameter can result in a negative performance impact. Increasing the value
of DB_BLOCK_WRITE_BATCH causes DBWR to clean more dirty blocks every time it
is signalled. The statistic 'dbwr free low' also indicates lack of block
availability and high values of it may require the above action. In general
this statistic is less severe than 'dbwr free needed'. The statistic
'free buffer scans' indicates the number of times, a process had to search
the LRU chain hash table. Large values indicate lack of block availability.
Some guidelines were provided in this respect for the Latch option. Also
the user can try enabling the DB_BLOCK_LRU_EXTENDED_STATISTICS parameter to
see what effect an increase in the parameter DB_BLOCK_BUFFERS will have.

REDO class provides statistical information about how redo information is
handled from the point of creation in user buffer area to the point it is
written to the redo log files. Some tuning issues were
discussed for the Latch option. The most important statistics are
'redo blocks written', 'redo entries linearized', 'redo small copies' and
'redo writes'. As it was discussed before, redo operations do not usually
require tuning but some guidance is included here.

Large values associated with any of the above statistics are meaningless
unless compared with the corresponding Redo Latch parameters. Poorly tuned
redo parameters often result in latches being held for longer periods and
hence resulting in poor performance. 'redo blocks written' is useful in
finding how much  redo is generated for an operation or time interval. It
may however not have much tuning use. A useful query will be given in a
later section.

'redo entries linearized' statistic is related to the
LOG_ENTRY_PREBUILD_THRESHOLD init.ora parameter mentioned in the Latch
option. It reports how many times redo information was prebuilt. Large
values for this statistic compared to the redo entries statistics is a
sign of effective prebuilding. If LOG_ENTRY_PREBUILD_THRESHOLD is
non-zero, then the user should check 'redo entries linearized' values
to ensure that prebuilding is effective. 'redo small copies' parameter
indicates the number of times that the user process managed to copy redo
information in one operation (into the redo buffer) without having to
obtain a copy latch. This statistic should be compared with 'redo entries'.
This statistic is directly related to LOG_SMALL_ENTRY_MAX_SIZE. This
parameter determines the size of memory in the user address space used to
copy redo. If generated redo is larger than the value of this parameter, the
process needs to hold a wait latch on the redo buffer to perform the copy
operation in several attempts. This statistic should be compared to REDO
COPY in the latch display. Large waits indicate redo chunks are indeed
larger than LOG_SMALL_ENTRY_MAX_SIZE.

'redo writes' is the total number of redo writes to the redo buffer. It may
be useful to determine whether or not it is too large compared with
'redo entries' statistics.

Table Option

This option is used in determining the tables accessed by processes. It
is very useful when used in conjunction with the Locks option. The Obj#
column is the same as the middle figure in the left most column in the
Locks screen for TM and TD locks.

User Option

This option provides information regarding the user session and is closely
related to the Processes option. Each user process can have several
sessions depending on whether or not recursive or other sessions are
invoked. However, the Session information only displays the interactive
session to which the user is connected to. The use of this screen is more
for the user's information than tuning aid. The user can relate this
screen to the processes screen by the System pid number or the Oracle pid
number. The lower part of this screen also provides the last statement
which was run by the user.

Monitor Base Tables and Useful Scripts

V$SESSTAT and V$SYSSTAT Base views

These two views store the session wide and system wide wait figures
respectively. The view V$STATNAME which has the name and id of the
statistics is used in conjunction with the other two views to extract the
waits figures. The following simple query is used to compile the total
wait figures for the system.

SELECT a.name, b.value
FROM v$statname a, v$sysstat b           Q1
WHERE a.statistic# = b.statistic#
ORDER BY a.class, b.name;

In order to perform the same query for a user session, we can run a similar
query and specify the user sid. To get the user sid, run the following:

SELECT s.sid
FROM v$process p, v$session s            Q2
WHERE p.addr = s.paddr
and terminal = userenv('terminal');

Note that this query may not work where the user is accessing the database
remotely, as the terminal may be null.

The advantage of running Q1 for a single process is that by running the
query once before and once after an application, the waits figures
pertaining to that application are derived. Now the user can run the
following to see the statistics for the session.

SELECT a.name, b.value
FROM v$statname a, v$sesstat b
WHERE b.sid = &sid                       Q3
AND a.statistic# = b.statistic#
ORDER BY a.class, a.name;

No shutdowns should occur in between running these queries.

Relevant Statistics in V$STATNAME
----------------------------------

SORTS (MEMORY) User sorts which occur in memory (sort_area_size).

SORTS (DISK) Sorts resulting disk IO.

                        SORTS (MEMORY)
Hit ratio = ------------------------------------- * 100
                SORTS (MEMORY) + SORTS (DISK)

If this ratio is low, increase the value of SORT_AREA_SIZE.

SORTS (ROWS) Rows retrieved in sorts

TABLE SCANS (SHORT TABLES)  Tables with size 4 blocks or less scanned (full).

TABLE SCANS (LONG TABLES)  Tables with size 5 blocks or more scanned (full).

Small tables should use more full table scans than indexed scans. Large
tables which return more than 15% of rows should be scanned fully rather
than indexed. Where full tablescans or full table updates are anticipated
increase DB_FILE_MULTIBLOCK_READ_COUNT and DB_BLOCK_WRITE_BATCH values
respectively.

TABLE SCAN BLOCKS GOTTEN blocks retrieved on full table scans.

TABLE SCAN ROWS GOTTEN rows retrieved on full table scans. This includes
rows which did not satisfy the select criteria.

TABLE FETCH CONTINUED ROW The number of chained blocks retrieved. The
following script gives a fairly accurate number for the current session.
Any inaccuracy would come about since the kernel may actually move the
body of a block to a different block other than the header. This will
occur if an update results in chaining. This is still chaining but this
query does not account for it.

Run the query once, run your application and run the query again and work
out the difference in the value of 'table fetch continued row'.Indeed this
query returns many useful statistics which the user can use in the same way.

SET NUMWIDTH 5
COLUMN ORCLNAME FORMAT a12 WORD_WRAP
COLUMN SYSNAME FORMAT a8 WORD_WRAP
COLUMN NAME FORMAT a40 WORD_WRAP
SET RECSEP OFF
SET PAGESIZE 60                                                Q4
SELECT st.sid sid, s.username orclname, p.username sysname,
          n.name name, st.value value
FROM v$sesstat st, v$statname n, v$session s, v$process p
WHERE st.statistic# = n.statistic#
AND st.sid = s.sid
AND s.paddr = p.addr
AND p.terminal = userenv('terminal');

Replacing the last AND clause above with:       AND p.spid = '&OS_PID'

enables the statistics to be displayed for any process if the operating
system pid is known. System pid can be obtained from Monitor Process
option, or a simple modification to the above script.

V$PROCESS and V$SESSION

This is a script to find the Oracle Session Id if the operating system pid
of the process is known.

SELECT s.sid sess_id
FROM v$process p, v$session s
WHERE p.addr = s.paddr                                Q5
AND p.spid = '&OS_PID';

The 'command' column in V$SESSION gives the last command executed by the
user, match these numbers with Monitor users screen.

V$ROLLSTAT

To work out the size of redo and undo generated for a transaction, save
the required transaction script to a file called test.sql. Then run this
script from sql.

COLUMN NAME FORMAT a40
COLUMN undo_i NEW_VALUE undo
COLUMN redo_i NEW_VALUE redo
SET TERMOUT OFF                              Q6
SELECT SUM(writes) undo_i
FROM v$rollstat
/
SELECT VALUE redo_i
FROM v$sysstat
WHERE statistic# = 63
/
SET TERMOUT ON
@test
SELECT (SUM(writes) - &undo) undo
FROM v$rollstat
/
SELECT (value - &redo) redo
FROM v$sysstat
WHERE statistic# = 63
/

More useful scripts are included in reference 2.

References:

1. Oracle RDBMS Database Administrator's Guide Version 6.0
Revised October 1990.

2. Oracle RDBMS Performance Tuning Guide Version 6.0 August 1989.

3. Bulletin by Saleem Haque; Wide SQLDBA Screens Doc # 99418.912

Acknowledgements:
The author would like to thank Walter Lindsay, Tuomas Pystynen, Saleem Haque,
Steve Hoffman and Mike Kennedy of Oracle Corp. for their invaluable
contribution in preparing this paper.