Document ID: 10585.1
Subject: QUERY AND APPLICATION TUNING USING EXPLAIN AND TKPROF UTILITY
Author: Darryl L. Presley
Last Modified : 11 October 1994
Abstract:
Every SQL statement processed by ORACLE RDBMS must be evaluated
by the ORACLE optimizer. Therefore, knowing how the optimizer
chooses its path and how indexes are being used, the SQL
statement can be tuned. This paper discusses in detail the use
of EXPLAIN and TKPROF as performance diagnostic tools to tune
applications.
INTRODUCTION
Application tuning requires more than just knowing what the
application does. How the application works and what kind of
underlying database design supports it are also necessary.
Knowledge of what types of SQL statements are used, the
underlying table and view structures and the various indexes
associated with these tables is a must. In addition, it may not
be necessary to tune the entire application. Understanding the
various parts of the application will lead to understanding what
really needs tuning. We will focus our attention to the SQL
level tuning using two performance tuning tools provided with
version 6.0 of the Oracle RDBMS.
The EXPLAIN facility can be used quickly and easily to
determine how the data is accessed (known as the access path) for
any given SQL statement, namely queries. The ability to modify
this access path can yield an incredible performance benefit. We
will discuss the various kinds of access paths and performance
gains and loss by providing scenarios.
The TKPROF facility provides timed statistics information
for the length of time it takes to execute each SQL statement
your application passes to the ORACLE RDBMS. TKPROF can
additionally call the EXPLAIN facility to provide a complete
report of timed statistics and the EXPLAIN PLAN to get a better
picture of what the application is doing.
USING EXPLAIN
Using the EXPLAIN facility requires that you build the
EXPLAIN_PLAN table in the Oracle account which owns the
application tables, views and indexes. Your Oracle distribution
media contains the SQL script for doing this. For example, it
can be found in:
ORA_RDBMS:XPLAINPL.SQL (VMS)
$ORACLE_HOME/rdbms/admin/xplainpl.sql (Unix)
You will find it where CATALOG.SQL is found. It will build a
table called PLAN_TABLE which looks like this:
Name Null? Type
------------------------------- -------- ----
STATEMENT_ID CHAR(30)
TIMESTAMP DATE
REMARKS CHAR(80)
OPERATION CHAR(30)
OPTIONS CHAR(30)
OBJECT_NODE CHAR(30)
OBJECT_OWNER CHAR(30)
OBJECT_NAME CHAR(30)
OBJECT_INSTANCE NUMBER(38)
OBJECT_TYPE CHAR(30)
SEARCH_COLUMNS NUMBER(38)
ID NUMBER(38)
PARENT_ID NUMBER(38)
POSITION NUMBER(38)
OTHER LONG
Here is a brief description of the columns we will use in our
discussion. You may find a full description of each column in
the file named EXPLAIN.DOC on your distribution media.
STATEMENT_ID: Identifies the name of a specific execution plan
for a given SQL statement. It is set to NULL if
the SET STATEMENT_ID is not used in the EXPLAIN
PLAN ... syntax.
OPERATION: The name of the operation being performed in this
step of the plan. Various operations performed
might include Filters, Index, Table, Merge Joins
and Table, to name a few.
OPTION: An option that will modify the operation. For
example, on a table access, the operation may be
TABLE ACCESS but the option may say BY ROWID or
FULL.
OBJECT_OWNER: The name of the schema or Oracle account that owns
the database object.
OBJECT_NAME: The name of the database object.
OBJECT_TYPE: The object type i.e., TABLE, VIEW, INDEX, etc.
ID: A number assigned to this step of the execution
plan tree.
PARENT_ID: The number assigned to the previous operation that
received information from this operation. Using
the CONNECT BY with ID and PARENT_ID, a tree walk
through the execution plan can be queried.
Once this PLAN table has been built, the user is ready to
use EXPLAIN for their application. The general syntax is:
EXPLAIN PLAN [SET STATEMENT_ID [=] <string literal>]
[INTO <table_name>]
FOR <sql_statement>
Where:
STATEMENT_ID is a unique string identifer that specifies
this execution plan from others which may be stored in the
same plan table.
TABLE_NAME is the name of the plan table that has the
structure given above. You may give the table any name you
wish.
SQL_STATEMENT is the actual SQL query.
For example:
EXPLAIN PLAN
SET STATEMENT_ID = 'QUERY1'
FOR
SELECT *
FROM EMP
WHERE DEPTNO = 10;
will return the following message:
Operation 50 succeeded.
Note that if the INTO clause is omitted in the EXPLAIN syntax,
EXPLAIN will try to use a table called PLAN_TABLE. By querying
the plan table, we can see what the execution plan looks like:
SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, ID,
PARENT_ID,
FROM PLAN_TABLE
WHERE STATEMENT_ID = 'QUERY1'
ORDER BY ID;
will return:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE ID PARENT_ID
------------- -------- ------------ ----------- -- ---------
TABLE ACCESS FULL EMP TABLE 1
1 row selected.
This indicates that a full table scan will occur for this
particular query. Because no indexes have yet to be created on
the table EMP, all queries against EMP will result in full table
scans. But suppose a non-unique index on the DEPTNO column of
EMP is created:
CREATE INDEX EMP_IDX ON EMP (DEPTNO);
Now, if we re-explain the query:
EXPLAIN PLAN
SET STATEMENT_ID = 'QUERY2'
FOR
SELECT *
FROM EMP
WHERE DEPTNO = 10;
and examine the plan table:
SELECT OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, ID,
PARENT_ID,
FROM PLAN_TABLE
WHERE STATEMENT_ID = 'QUERY2'
ORDER BY ID;
will return:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE ID PARENT_ID
------------- -------- ------------ ----------- -- ---------
TABLE ACCESS BY ROWID EMP TABLE 1
INDEX RANGE-
SCAN EMP_IDX NON-UNIQUE 2 1
2 rows selected.
Here, we see that the index EMP_IDX is used to retrieve all rows
whose DEPTNO is 10 then access the table BY ROWID. The index
stores each row's ROWID of the table. Once a row is found in
the index, a fetch BY ROWID for the remainder of the row is done.
On large tables, this access path may be preferred over the
previous one as this access path will result in fewer disk I/O
operations. If however, the index has "low selectivity", a full
table scan may be more efficient.
Consider this query and its execution plan:
EXPLAIN PLAN
SET STATEMENT_ID = 'QUERY3'
FOR
SELECT DEPTNO
FROM EMP
WHERE DEPTNO = 10;
The plan for the execution path is:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE ID PARENT_ID
------------- -------- ------------ ----------- -- ---------
INDEX RANGE-
SCAN EMP_IDX NON-UNIQUE 1
The above execution plan shows that no data was needed from the
table, the query was resolved in the index only.
EXPLAIN ACCESS PATH EXPLAINED
The execution plan of any SQL statement is governed by a
set of optimization rules discussed on page 19-17 of the ORACLE
RDBMS Database Administrator's Guide Version 6.0. A revised
listing of these rules can be found on the On-Line bulletin board
document number 100040.163 entitled Query path Ranked in order
of Speed dated February 4, 1991. These rules are evaluated to
determine the best access path when retrieving data from the
database. Oracle V6.0 does not support a statistical optimizer
so table or index sizes will not have any affect on how the
optimizer determines which access path it will choose.
The optimizer will build an execution plan tree once it has
evaluated the query and determined the access path for the
query. This execution tree (or plan tree) can be shown visually
from the plan table itself with the following query used in
SQL*Plus:
COLUMN plan FORMAT a70
SELECT lpad(' ', 3*level)||operation||'('||options||')
'||object_name||
' ' ||object_type
FROM plan_table
CONNECT BY PRIOR id = parent_id and statement_id='&stmt_id'
START WITH id = 1 and statement_id = '&stmt_id';
For example, consider this query:
SELECT ENAME
FROM EMP
WHERE DEPTNO = 10
ORDER BY ENAME;
Using the above alternative query to examine the plan table:
SORT(ORDER BY)
TABLE ACCESS(BY ROWID) EMP
INDEX(RANGE SCAN) EMP_IDX NON-UNIQUE
The tree is read from the most indented part in, or in this case,
from the bottom up. This plan tree illustrates that an index
range scan was performed on the EMP_IDX index and the ENAME data
fetched from the table by rowid. These rows were then sorted
using the ORDER BY option. This last part of the plan tree will
take the longest time to complete if this were a large table.
Suppose we explained this query:
SELECT DEPTNO,ENANE
FROM EMP
WHERE DEPTNO BETWEEN 10 AND 30
ORDER BY DEPTNO;
The plan tree looks like:
TABLE ACCESS(BY ROWID) EMP
INDEX(RANGE SCAN) EMP_IDX NON-UNIQUE
Note that the SORT(ORDER BY) is not in this plan tree yet the
query has an ORDER BY DEPTNO. Why? The sort will not be
necessary for 2 reasons: 1) the DEPTNO column is indexed and
hence, already sorted and 2) the DEPTNO is defined as NOT NULL
(i.e., DEPTNO NOT NULL NUMBER). Note that these must be in
place before this feature can be utilized.
Consider the following common join query:
SELECT *
FROM EMP,DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO
AND SAL > 5000;
The plan tree is:
NESTED LOOPS()
TABLE ACCESS(FULL) DEPT
TABLE ACCESS(BY ROWID) EMP
INDEX(RANGE SCAN) EMP_IDX NON-UNIQUE
NESTED LOOPS means that a sequential scan is done on one
table (DEPT) and each DEPTNO retrieved is looked up in the index
EMP_IDX on EMP. This query has what is called a driving table.
In this case, the driving table is DEPT. In these type of joins,
the driving table is determined by which table is listed last.
This is because both tables have equal rank access paths (both
have non-unique indexes on DEPTNO) and since all rows must be
searched in at least one table, it is more efficient to perform a
full table scan on one table and look up the join column in the
index of the other to meet the join predicate. In such cases,
ensure that the driving table with the fewest number of rows is
the last one listed in the FROM clause. Note that the order of
tables in the FROM clause currently influences how the ORACLE
RDBMS chooses the access path for this type of join.
The driving table in the above query can be overridden by
using a join-query where one of the WHERE predicates returns a
single row. Suppose a unique index is constructed on the EMPNO
column on the EMP table:
SELECT *
FROM EMP,DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO
AND EMPNO = 7735;
The plan tree is:
NESTED LOOPS()
TABLE ACCESS(BY ROWID) EMP
INDEX(UNIQUE SCAN) EMP_UIDX UNIQUE
TABLE ACCESS(BY ROWID) DEPT
INDEX(RANGE SCAN) DEPT_IDX NON-UNIQUE
This execution plan illustrates that the NESTED LOOPS will still
be done but only once because an INDEX UNIQUE SCAN operation is
performed on EMP (the unique index EMP_UIDX) which is the driving
table. This value is looked up via an INDEX RANGE SCAN on the
DEPT_IDX and if the row is found, the table is access BY ROWID.
Other operations and their modifying options may also show
up in an EXPLAIN output. CONNECT BY for example, is an operation
used when the query contains a CONNECT BY PRIOR ... clause
causing the RDBMS to perform a tree walk through the table based
on the connect by condition. The query which gives us the
execution plan tree is one such example. Use the EXPLAIN on the
query and see what you get.
Still, other operations such as FILTERS, PROJECT, UNION may
also be encountered. For instance, the following query will
cause a FILTER operation to be performed:
SELECT DEPTNO,AVG(SAL),SUM(COM)
FROM EMP
GROUP BY DEPTNO
HAVING SUM(COM) > 500;
Will have the following execution plan:
FILTER()
SORT(GROUP BY)
TABLE ACCESS(FULL) EMP
Once the table has been sorted, the filter operation will collect
the rows having COM greater than 500.
PROJECT is not as frequently encountered. It indicates that
a subset of columns were selected of all columns when MINUS and
UNIONS are used.
UNION and MINUS will be seen in the execution plan tree if
the user explicitly uses them in their SQL statements.
VIEWS
Virtual tables or VIEWS may also be explained. It is of
good practice to do so before applications are to make use of
them. This is done by just taking the query that the view is
based on and use EXPLAIN on the query. The access path should be
transparent to the query utilizing the view. Consider this view
for example:
CREATE VIEW EMP_VIEW AS
SELECT *
FROM EMP
WHERE DEPTNO > 20;
The execution plan tree is:
TABLE ACCESS(BY ROWID) EMP
INDEX(RANGE SCAN) EMP_IDX NON-UNIQUE
if you were to just select from the view. If we were to join
this view with DEPT sa in:
SELECT ENAME,LOC
FROM EMP_VIEW,DEPT;
would yield this execution plan:
NESTED LOOPS()
TABLE ACCESS(FULL) DEPT
TABLE ACCESS(BY ROWID) EMP
INDEX(RANGE SCAN) EMP_IDX NON-UNIQUE
which is identical to an earlier example we gave. If the
operation in a given step of the execution plan is "VIEW", this
indicates that this is a separate operation. This step would be
performed to complete its parent operation. You may find the
VIEW operation when views are defined based on queries joining
two or more tables. There may be other situations where the VIEW
operation is performed.
Ensure that if you are going to utilize views in your
application, that they are written for maximum performance yield.
ORACLE Support often finds many customer applications where views
are used with no regards to the performance they can cost if not
written properly.
WHAT CAN AND CAN'T BE EXPLAINED
Just about any SQL statement can be explained but this tool
is not very useful for INSERT, UPDATE and DELETE statements
without subqueries. INSERT statements without a subquery will
not create an execution plan tree. UPDATE and DELETE will create
an execution plan as they must fetch rows even without a WHERE
clause or subqueries.
In addition, EXPLAIN will reveal the use of other object
types such as SEQUENCE number generators if used in your SQL
statements.
The only real restriction on EXPLAIN is that the user cannot
explain tables, views, indexes or other objects which are owned
by another user. The user must own all of the objects to be
explained. If there are objects for which the user has SELECT
privileges on but does not own the objects, EXPLAIN will return
an error indicating that objects owned by another user cannot be
explained.
TKPROF AND EXPLAIN COMBINED
The TKPROF facility bundled with the ORACLE RDBMS V6.0.27 or
later provides a wider picture for application tuning as a whole.
It provides timed statistics on various parameters such as SQL
parse times, execute and fetch times. These parameters are
important as indicators of what needs tuning. A breakdown of
each SQL statement your application passes to the RDBMS and their
timed statistics is given in a report format. In addition, the
EXPLAIN facility can be invoked with in the TKPROF tool. The
execution plan tree becomes part of the final TKPROF report.
To use TKPROF, ensure the following has been done:
1. TIMED_STATISTICS = TRUE is placed in your init.ora
parameter file.
2. SQL_TRACE = TRUE must be in your init.ora parameter
file or you may enable SQL trace with the ALTER
SESSION SET SQL TRACE TRUE; command.
3. Ensure that the ORACLE account which owns the
objects has a plan table built as described above.
Once the above steps have been carried out, your ORACLE session
will generate a SQL trace file. The location of this trace file
is port specific. Check your ORACLE Installation and User's
Guide for where these trace files are located. You may also
check the parameter FOREGROUND_DUMP_DEST with the SHOW PARAMETER
FOREGROUND_DUMP_DEST from with in SQL*DBA.
To process this trace file, issue the following command:
TKPROF <input-tracefile> <output-file> EXPLAIN=user/password
You may also just type TKPROF with no command line arguments to
get a usage and parameter listing. Check your ORACLE
Installation and User's Guide on how to invoke TKPROF.
The output file is a report laid out with the SQL statement
given first, then the timed statistics for PARSE, EXECUTE and
FETCH followed by the execution plan if the EXPLAIN option was
used. There may be phrases like "Recursive SQL statement" in
between other SQL statement issued by the application. Such
recursive SQL statements are not explained though their
statistics are given. If recursive SQL statements follow one of
the SQL statements of your application, this indicates that the
necessary information needed to parse your SQL statement was not
in the data dictionary cache. The ORACLE RDBMS will generate
these recursive SQL statements to perform data dictionary
operations on behalf of the user's process.
In viewing all of the SQL statements, you can see which ones
are taking the longest to execute and perhaps the execution plan
might explain why. There are at least two phases to every SQL
statement and for those SQL statements with queries, there is a
third phase. PARSE, EXECUTE and FETCH are these three phases
respectively. For each of these phases, TKPROF reports timing
and other statistical information as follows:
COUNT The number of times that the SQL statement was
parsed, executed or fetched to complete an
operation. One call to fetch can cause multiple
blocks to be read.
CPU The total amount of cpu seconds taken to perform
each phase of the given SQL statement. The time
is measured in hundredths of seconds.
ELAP The total amount of time (in hundredths of
seconds) from start to finish for each phase to be
performed. It may require several cpu time slices
to complete any of these phases. This time
statistic can be viewed as "wall clock" time.
PHYS The number of Oracle blocks read from disk for
each phase. If FETCH has high values check for
possible full table scans in the execution plan
tree.
CR Total number of blocks fetched for consistent
reads. If there are lots of INSERT, UPDATE and
DELETE transactions occurring when this trace file
was generated, then this value can be high.
CUR The number of blocks acquired for modifying
transactions such as INSERTS, UPDATES and DELETES.
ROWS The number of rows operated on by either the
EXECUTE or FETCH phases.
When examining these statistics to determine the amount of
time for each SQL statement, look at the statistics in terms of
how long it takes to complete each phase. It might take an
excess amount of time to parse the SQL statement but execution
and fetch is rather quick. Tuning to reduce parse time is quite
different than tuning to reduce execute or fetch times. Parse
time tuning may require tuning the data dictionary cache while
tuning execution or fetch time may range from increasing data-
base buffers to just simply adding an index.
If you find that execution or fetch times are quite large,
look at the execution plan (if the EXPLAIN option was used) to
determine how the data was accessed. Perhaps rewriting the
query might make a significant difference in performance. If
full table scans are done, one would expect that the statistic
for PHYS will be fairly high as well. You may be able to make
use of an index to reduce the amount of physical reads from disk
provided that the selectivity of the index is good. Non-unique
index selectivity should be such that no more than 15% of the
total number of rows for a given table is returned based on an
arbitrary key. On the other hand, it may be possible that some
queries inherently take large amounts of time to complete despite
the amount of tuning one can do to them.
SUMMARY
Tuning applications requires knowledge of the underlying
database structures as well as all of the various SQL statements.
You may use the EXPLAIN facility to ensure that all queries
access the data using the fastest possible execution path.
Ensure that SQL statements are tuned before trying to tune other
parts of the application including the RDBMS or the operating
system.
The TKPROF combined with the EXPLAIN facility can allow
application development staff to enhance application performance
by tuning SQL statements which may be poorly written and take the
most time. Understanding which phases SQL statements consume the
most amount of time will aid in knowing what needs tuning and
which tuning techniques can be applied. Evaluation of the
execution plan tree can be used to determine if the statistics
from TKPROF can be justified especially for queries. After
attempting to tune all of the SQL statements as best as possible,
it is entirely possible that some SQL statements inherently take
a long time to complete. At this point checking other ORACLE
RDBMS or operating system parameters would be your next step.