Document ID: 35934.1
Subject: Cost Based Optimizer - Common Misconceptions and Problems
Author: RPOWELL
Modified: 15 Jan 97 03:37:29
Introduction
~~~~~~~~~~~~
This short article is intended to dispell a few myths about the
cost based optimizer (refered to as the CBO from here on) and to
highlight common mistakes and problem areas.
Background
~~~~~~~~~~
To execute any SQL statement Oracle has to derive an 'execution plan'.
This is basically the plan of how Oracle will implement the retrieval
of data to satisfy a given SQL statement.
Oracle7 has two optimizers which can derive 'execution plans' from
a SQL statement:
- The RULE based optimizer (referred to as RBO from here on)
Inherited from Oracle 6 this uses a rigid set of rules
to determine an execution plan for any SQL statement.
If you know the rules you can construct a SQL query to
access data in a desired manner. The data content of a
table has no effect on the execution plan.
- The COST based optimizer (CBO)
Introduced in Oracle7 this tries to find the lowest 'cost'
method of accessing the data either for maximum throughput
OR quickest initial response time.
The 'cost' of using different execution plans is calculated
and the lowest 'cost' option chosen.
Statistics on the data content of tables are used to determine
an execution plan.
Fundamental Points
~~~~~~~~~~~~~~~~~~
For any SQL statement there are a finite number of possible
'execution plans'.
The 'best plan' will always be the 'best plan' regardless of how it was
arrived at.
The term 'best plan' may mean either:
- The plan which uses up minimal resource to process all rows
affected by the statement. Referred to as ALL_ROWS.
- The plan which returns the first row of a statement as
quickly as possible. Referred to as FIRST_ROWS.
The CBO does *NOT* understand characteristics associated with an
application, nor can it fully understand the impact of complex
relationships between joined tables. It only has limited information
available to determine the 'best' plan.
The CBO determines the best plan by calculating an estimated COST
for various execution plans and uses the plan with the lowest cost.
As this inherently involves assumptions about relative costs the
chosen plan may not necessarily be the true best plan. Such occurences
are often reported to Oracle support as being "bugs" because the CBO
did not choose the best plan for a given scenario. One can generally
prove that, given the input statistics available and the default
'costs' involved, the chosen plan was calculated to be the best
plan - even if it was not. However advanced the CBO becomes there will
always be situations where the plan chosen is not optimal. Hence you
should always be prepared to 'tune' such statements.
The RBO is no longer being enhanced. This means that certain execution
plans are ONLY available to the CBO. However, the RBO will continue
to exist and is expected to be available into Oracle 8.
Before you Continue
~~~~~~~~~~~~~~~~~~~
It is NOT advisable to use the CBO in Oracle releases 7.0.X.
You should upgrade to at least 7.1.3.
Base Statistics
~~~~~~~~~~~~~~~
To give CBO the most information (and the best chance of choosing a
good execution plan) you should ANALYZE all tables to be queried.
ANALYZE with the ESTIMATE option can produce inaccurate results
for some tables, especially for small sample sizes. This is not a
bug but a feature of any statistical sampling method. If the sample
chosen is NOT representative of the full data set you cannot expect
accurate statistics to be produced.
In Oracle 7.1 and 7.2 column values are assumed to be evenly
distributed. This is an important limitation in these Oracle releases
as completely accurate statistics give no indication of the
distribution of the actual data. This limitation is partially
addressed in Oracle release 7.3 where it is possible to keep
information on column value distributions - but see the notes on Bind
Variables in the sections below as these extra statistics may not
actually help with certain types of query.
Some important points regarding the use of ANALYZE include:
- ANALYZE on a table with indexes cascades to all related indexes
on that table. In Oracle 7.3 it is possible to analyze a table
without this cascading to the indexes.
- If you ANALYZE ... ESTIMATE on a table it is sensible to ANALYZE
COMPUTE on the related indexes afterwards.
- Analyzing indexes uses no temporary storage space.
- If you analyze an index but DO NOT analyze the base table then the
cost based optimizer will not be chosen on this basis alone.
Obviously other criteria may force the use of the cost based
optimizer - See the table below.
- If you need to use ESTIMATE (eg: due to time restrictions)
it is advisable to perform ANALYZE ... ESTIMATE at several different
sample sizes to determine the optimal sample size for each object.
The general goal is to find a sample size that yields accurate
statistics in the shortest time. A good starting point is 10% - 15%.
- ANALYZE with an ESTIMATE above 50% results in an ANALYZE ... COMPUTE.
- There are problems with ANALYZE ... ESTIMATE prior to 7.1.6 which can
result in spurious ORA 600 errors on the ANALYZE command.
- You should NEVER analyze the data dictionary tables directly.
- You can analyze the dictionary in Oracle 7.2 using the
dbms_utility.analyze_schema() procedure.
Optimizer Goal / Mode
~~~~~~~~~~~~~~~~~~~~~
Which optimizer is used and its mode of operation are determined
by:
Parallel Degree > 1 on a table
- If any table in a query has a degree of parallelism set
greater than ONE then CBO will be used regardless of
any 'RULE' setting for hints, OPTIMIZER_MODE or
OPTIMIZER_GOAL.
Hints
- Any hint except RULE causes CBO to be used.
It is very important to note that a HINT cannot be 'turned
off' by any parameter settings.
Session level OPTIMIZER_GOAL
- Unless the HINT specifies otherwise the default optimizer
is determined by the session level parameter
"OPTIMIZER_GOAL". If OPTIMIZER_GOAL is set to RULE this is
used regardless of any table statistics.
Init.Ora OPTIMIZER_MODE parameter
- Unless a HINT or OPTIMIZER_GOAL has been specified the
OPTIMIZER_MODE is used. If this is set to CHOOSE then
ALL_ROWS is used if ANY table in the query is analyzed.
PLSQL blocks within stored procedures should use explicit HINTs
to determine the actual optimization goal.
Summary Optimizer Mode:
~~~~~~~~~~~~~~~~~~~~~~~
Assuming no tables have a parallel degree greater than 1 then the
following table shows the optimizer used for various scenarios:
----------+---------------+---------------+---------------+------------
Hint | A table has |Optimizer_Goal |Optimizer_Mode |Actual Mode
* | Statistics | | |
----------+---------------+---------------+---------------+------------
RULE Irrelevant Irrelevant Irrelevant RULE
ALL_ROWS Irrelevant Irrelevant Irrelevant ALL_ROWS
FIRST_ROWS Irrelevant Irrelevant Irrelevant FIRST_ROWS
None Irrelevant RULE Irrelevant RULE
Other Irrelevant RULE Irrelevant ALL_ROWS
None Irrelevant ALL_ROWS Irrelevant ALL_ROWS
Other Irrelevant ALL_ROWS Irrelevant ALL_ROWS
None Irrelevant FIRST_ROWS Irrelevant FIRST_ROWS
Other Irrelevant FIRST_ROWS Irrelevant FIRST_ROWS
None Irrelevant Not Set RULE RULE
Other Irrelevant Not Set RULE ALL_ROWS
None Irrelevant Not Set ALL_ROWS ALL_ROWS
Other Irrelevant Not Set ALL_ROWS ALL_ROWS
None Irrelevant Not Set FIRST_ROWS FIRST_ROWS
Other Irrelevant Not Set FIRST_ROWS FIRST_ROWS
None No Not Set CHOOSE RULE
None Yes Not Set CHOOSE ALL_ROWS
Other Irrelevant Not Set CHOOSE ALL_ROWS
----------+---------------+---------------+---------------+------------
* Other implies a hint other than 'RULE', 'ALL_ROWS' or 'FIRST_ROWS'
General Optimizer Notes
~~~~~~~~~~~~~~~~~~~~~~~
These are general points to be bourne in mind when looking
at optimizer issues:
- ALL_ROWS tends to favour full table scans.
- FIRST_ROWS tends to favour index access.
- By default CBO uses 'ALL_ROWS' costing.
- CBO does not adjust costs to cater for Parallel Queries until
Oracle 7.3.
- CBO assumes column values are evenly distributed between the
highest and lowest values until Oracle 7.3 when histogram statistics
CAN be stored if requested.
- Other things being equal (Eg: Unanalyzed tables) the CBO chooses
the driving order of tables from LEFT to RIGHT in the FROM clause.
This is OPPOSITE to the RBO which chooses the driving order from
RIGHT to LEFT.
- CBO uses ANALYZE information in conjunction with current table high
water-mark information. Hence the query plan for a statement *CAN*
change over time.
- Note that TRUNCATE resets the table 'High Water Mark; and can thus
affect query plans. DELETE from a table does NOT reset the high
water mark.
- When performing many way joins some join combinations will be
'eliminated' to reduce the overall time spent determining an
execution plan. Generally each join order is compared with the best
so far and obviously sub-optimal plans are eliminated.
Problem SQL Statements
~~~~~~~~~~~~~~~~~~~~~~
Rather than fighting the CBO if it comes up with a sub-optimal plan
the best course of action for any problem SQL statement should be to:
a) Check you really have ANALYZED the tables involved so the
CBO has useful information to work on.
b) Check the ANALYZED information is accurate.
Eg: Use the COMPUTE option and compare the statistics
both before and after a re-analysis of the table/s.
c) Check if you are referencing dictionary tables. Dictionary
tables are NOT analyzed by default so SQL accessing them
can produce very bad plans under CBO.
d) Determine if you are expecting to use RBO, ALL_ROWS or
FIRST_ROWS as your 'OPTIMIZER_GOAL'.
e) Use hints to help guide the optimization. It is perfectly
valid to use 'RULE' as a hint if gives you the best plan.
If you do have a problem SQL statement and attempts to guide the
CBO are not yielding results the first action should be to isolate
the problem statement in a SQLPLUS session in its simplest form.
It is then possible to use the EXPLAIN command or TKPROF to determine
what actual execution plan is in use and to try and improve it.
If things suddenly run well in SQLPLUS then note the common mistakes
highlighted in the following sections.
The EXPLAIN PLAN SQL statement and the TKPROF trace file analysis
tool are useful to determine the execution plan for a given
statement but note these limitations:
a) They use the CURRENT information to derive an execution plan and
do *NOT* necessarily show the plan used originally.
Eg: If you have ANALYZEd any table or CREATEd / DROPPed an index
the plan may differ.
If OPTIMIZER_MODE is different a different plan may be produced.
b) They do *NOT* know the 'type' of any bind variables and so assume
all bind variables to be of character type. Hence you can get
errors or misleading results.
Eg:
SELECT 1 FROM DUAL WHERE sysdate < :b2 + 1
this will raise an error if explained as ':b2' is assumed to
be a character type.
To avoid this sort of problem you should enclose any such bind
variables in a relevant type conversion function.
Eg:
SELECT 1 FROM dual WHERE sysdate < to_date(:b2) + 1;
This is good practice anyway as it avoids implicit type conversions.
Making Comparisons
~~~~~~~~~~~~~~~~~~
Comparing SQL statements from PLSQL:
PL/SQL uses 'bind variables' in SQL statements when you reference a
PL/SQL variable. This is very important as it affects the
determination of an execution plan as explained in 'Bind Variables'
below.
Eg: In PLSQL the part-statement:
SELECT ename FROM emp WHERE empno > myempno;
'myempno' is a PLSQL variable, so this statement is effectively
equivalent to:
SELECT ename FROM emp WHERE empno > :bind1;
Bind Variables:
It is common to take a problem SQL statement and try it in SQLPLUS
using literals in place of bind values. In most cases this is an
invalid comparison.
Eg: Compare:
SELECT ename FROM emp WHERE empno > 9999;
with
SELECT ename FROM emp WHERE empno > :bind1;
Assuming the table has been analysed CBO "knows" the HIGH and LOW
values of 'EMPNO'. For the first of these statements CBO can
determine a 'selectivity' of the clause 'WHERE empno > 9999'
but for the second it does not know if ':bind1' is '1' or '9999'
and thus uses a default selectivity - Eg: It assumes that the
clause will return one quarter of all rows in the table. This is
true even when using histograms in Oracle 7.3 as CBO does NOT KNOW
the value of the bind variable. Hence the resulting execution plans
can be very different.
Generally all RANGE SCANS and LIKE comparisons use default
selectivities if you are using bind variables so form an invalid
comparison. You should actually use a BIND variable within SQLPLUS
also.
Eg:
VARIABLE bind1 varchar2(10)
EXPLAIN PLAN FOR SELECT ename FROM emp WHERE empno > :bind1;
as opposed to:
EXPLAIN PLAN FOR SELECT ename FROM emp WHERE empno > 9999;
Comparing Databases:
If you have two separate databases then the base statistics for
the tables are likely to be different. You *CANNOT* transfer
statistics from one database to another. The MAJOR factors
affecting database comparisons are:
- The database block size (DB_BLOCK_SIZE). This is fundamental
to CBO calculations so you cannot compare databases with
different values of DB_BLOCK_SIZE.
- DB_FILE_MULTIBLOCK_READ_COUNT. This init.ora parameter
is used in determining the cost of full table scans in
Oracle 7.1 onwards.
- Table / Index statistics. Export then import of table /
indexes is likely to result in a different population of
table and index blocks. This affects computed OR estimated
statistics and thus can affect the CBO execution plan.
Comparing Versions:
There are subtle changes to the costing algorithm and the default
costs between releases of Oracle. Hence execution plans CAN change
when upgrading. A good example of this is the change to the
default selectivity of a LIKE clause with bind variables.
Comparing to the RBO:
There is little point comparing an execution plan to the RBO plan
and being surprised if you get a different execution plan. Hopefully
it is fairly clear by now that the CBO uses relative 'costs' of
execution plans to determine an execution plan.
Eg:
For RBO the fact that an INDEX exists is sufficient for it
to be used (assuming it is relevant)
For CBO the fact that an INDEX exists is NOT sufficient for
it to be used. The "cost" of access to the EXPECTED amount
of data via the index is compared to the "cost" of alternative
access paths (Eg: Full Table Scan).
If RBO already gives the 'best' plan then you can either use a 'RULE'
hint or other forms of hint to guide the access path.
The most common problem with RBO versus CBO comparisons is the non-use
of an INDEX by CBO. The main contributors to NOT using an index where
it is expected to be used are:
a) The index is 'not selective'. As CBO computes a cost for
access to the EXPECTED amount of data it determines that
the cost of individual index block look-ups will cost more
than scanning the entire table which it can do with multi-
block reads.
b) The 'range of values' and hence the number of rows CBO is
EXPECTING to return is innaccurate due to either inaccurate
statistics OR the use of BIND values limiting a range.
Note: If RBO and CBO give physically different RESULTANT DATA this
is serious and should be reported as a bug.
Comparing PARSE times:
If the parse time is a significant proportion of the execution time
it is advisable to check if you are performing a join between many
tables. If so it would be wise to use the ORDERED hint to set the
order of table access to reduce the number of join options that have
to be considered. Once you have a good execution plan you can re-order
the FROM clause and use the ordered hint to arrive at the same plan
more quickly.
Hints
~~~~~
Hints allow you to provide some input to the CBO as to how to access
data. Unfortunately invalid hints cause the hint to be ignored with
no warning so one has to be very careful with the syntax. The main
points to remember when using hints are:
- Hints must start in a comment of the strict format: /*+ ... */
- All hints (except RULE) cause you to use the CBO. Hence it is not
a good idea to use hints unless the tables are analyzed..
- Hints must reference the table alias if aliases are in use.
Eg:
Wrong:
SELECT /*+ FULL ( emp ) */ empno FROM emp myalias
WHERE empno > 10;
Correct:
SELECT /*+ FULL ( myalias ) */ empno FROM emp myalias
WHERE empno > 10;
- Hints should not reference the schema name,
Eg: SELECT /*+ index(scott.emp emp1) */ ... should not be used.
Alias the tablename instead and use the alias in the hint
- The access path to be HINTed must be an available access path.
For example, if you supply an index hint but the optimizer
chooses to drive the query from a full table scan on the table
that this index is based on then the indexed column(s) may not
be filled at this point (perhaps because they are join
predicates that are not satisfied until another table is
involved).
Eg:
Tables A & B both have indexes on the ind_col column.
A hint has been supplied suggesting that these indexes should
be used.
The value column in both tables is unindexed.
SELECT /*+ index(A) index(b) */ *
FROM A,B
WHERE A.ind_col = B.ind_col
AND A.value = 1
AND B.value = 2
Query Plan
----------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost = 83
NESTED LOOPS
TABLE ACCESS FULL A
TABLE ACCESS BY ROWID B
INDEX RANGE SCAN B1
If we drive the query from A, then the value of A.ind_col at
this stage is unknown.
We could use an index to drive this, but we would have to scan
the whole index.
As the cost of a full scan of the index and subsequent table
lookup (for the unindexed columns) scan is prohibitive when
compared to a full table scan the optimizer is likely to
decide not to take the index hint.
It can use the index on B as when the access to B is made, the
value of ind_col is known.
- Invalid hints may not be immediately obvious.
Eg: FIRST_ROWS as a hint when the statement has an ORDER BY clause.
- PLSQL V1 (which is the PL/SQL engine linked with most tools) does not
contain all the logic to support CBO. In particular it does NOT
understand HINTS. However, you CAN embed hints into a VIEW for use
from PLSQL V1 to tackle particular problem statements.
Eg: For:
SELECT /*+ FULL( mytab ) */ col1, col2
FROM mytab
WHERE col1 > var1;
you can create a view:
CREATE or REPLACE VIEW myview AS
SELECT /*+ FULL( mytab ) */ *
FROM mytab;
and change the SQL to:
SELECT col1, col2 FROM myview WHERE col1 > var1;
Summary
~~~~~~~
To make effective use of the CBO you should:
- Analyze all tables regularly.
- Set the required OPTIMIZER_GOAL (FIRST_ROWS or ALL_ROWS).
- Use hints to help direct the CBO where required.
- Be careful with the use of bind variables.
CBO works fairly well for ad-hoc queries. For hard coded, repeated
SQL statements these should be tuned to obtain a repeatable optimal
plan.
You should ALWAYS monitor performance over time, and should always
test upgrades of the RDBMS version carefully.