Document ID:        35934.1
Subject:            Cost Based Optimizer - Common Misconceptions and Problems
Author:            RPOWELL
Modified:           15 Jan 97 03:37:29           


Introduction
~~~~~~~~~~~~
	This short article is intended to dispell a few myths about the
	cost based optimizer (refered to as the CBO from here on) and to 
	highlight common mistakes and problem areas.

	
Background
~~~~~~~~~~
	To execute any SQL statement Oracle has to derive an 'execution plan'.
	This is basically the plan of how Oracle will implement the retrieval 
	of data to satisfy a given SQL statement.

	Oracle7 has two optimizers which can derive 'execution plans' from 
	a SQL statement:

	- The RULE based optimizer (referred to as RBO from here on)
		Inherited from Oracle 6 this uses a rigid set of rules
		to determine an execution plan for any SQL statement.
		If you know the rules you can construct a SQL query to
		access data in a desired manner. The data content of a 
		table has no effect on the execution plan.

	- The COST based optimizer (CBO)
		Introduced in Oracle7 this tries to find the lowest 'cost'
		method of accessing the data either for maximum throughput
		OR quickest initial response time.
		The 'cost' of using different execution plans is calculated
		and the lowest 'cost' option chosen.
		Statistics on the data content of tables are used to determine
		an execution plan.

Fundamental Points
~~~~~~~~~~~~~~~~~~
	For any SQL statement there are a finite number of possible 
	'execution plans'. 

	The 'best plan' will always be the 'best plan' regardless of how it was
	arrived at.

	The term 'best plan' may mean either:

		- The plan which uses up minimal resource to process all rows 
		  affected by the statement. Referred to as ALL_ROWS.

		- The plan which returns the first row of a statement as
		  quickly as possible. Referred to as FIRST_ROWS.

	The CBO does *NOT* understand characteristics associated with an 
	application, nor can it fully understand the impact of complex 
	relationships between joined tables.  It only has limited information 
	available to determine the 'best' plan. 

	The CBO determines the best plan by calculating an estimated COST
	for various execution plans and uses the plan with the lowest cost.
	As this inherently involves assumptions about relative costs the
	chosen plan may not necessarily be the true best plan. Such occurences
	are often reported to Oracle support as being "bugs" because the CBO
	did not choose the best plan for a given scenario. One can generally 
	prove that, given the input statistics available and the default 
	'costs' involved, the chosen plan was calculated to be the best
	plan - even if it was not. However advanced the CBO becomes there will
	always be situations where the plan chosen is not optimal.  Hence you 
	should always be prepared to 'tune' such statements.

	The RBO is no longer being enhanced. This means that certain execution
	plans are ONLY available to the CBO. However, the RBO will continue
	to exist and is expected to be available into Oracle 8. 
 

Before you Continue
~~~~~~~~~~~~~~~~~~~
	It is NOT advisable to use the CBO in Oracle releases 7.0.X.
	You should upgrade to at least 7.1.3.

Base Statistics
~~~~~~~~~~~~~~~
	To give CBO the most information (and the best chance of choosing a 
	good execution plan) you should ANALYZE all tables to be queried. 

	ANALYZE with the ESTIMATE option can produce inaccurate results
	for some tables, especially for small sample sizes. This is not a 
	bug but a feature of any statistical sampling method. If the sample 
	chosen is NOT representative of the full data set you cannot expect
	accurate statistics to be produced.

	In Oracle 7.1 and 7.2 column values are assumed to be evenly 
	distributed.  This is an important limitation in these Oracle releases
	as completely accurate statistics give no indication of the 
	distribution of the actual data.  This limitation is partially 
	addressed in Oracle release 7.3 where it is possible to keep 
	information on column value distributions - but see the notes on Bind 
	Variables in the sections below as these extra statistics may not 
	actually help with certain types of query.
	
	Some important points regarding the use of ANALYZE include:

	- ANALYZE on a table with indexes cascades to all related indexes 
	  on that table.  In Oracle 7.3 it is possible to analyze a table 
	  without this cascading to the indexes.

	- If you ANALYZE ... ESTIMATE on a table it is sensible to ANALYZE
	  COMPUTE on the related indexes afterwards. 

	- Analyzing indexes uses no temporary storage space.

        - If you analyze an index but DO NOT analyze the base table then the
          cost based optimizer will not be chosen on this basis alone. 
          Obviously other criteria may force the use of the cost based 
          optimizer - See the table below.

	- If you need to use ESTIMATE (eg: due to time restrictions)
	  it is advisable to perform ANALYZE ... ESTIMATE at several different
	  sample sizes to determine the optimal sample size for each object.
	  The general goal is to find a sample size that yields accurate
	  statistics in the shortest time. A good starting point is 10% - 15%.
  
	- ANALYZE with an ESTIMATE above 50% results in an ANALYZE ... COMPUTE.

 	- There are problems with ANALYZE ... ESTIMATE prior to 7.1.6 which can
 	  result in spurious ORA 600 errors on the ANALYZE command. 

	- You should NEVER analyze the data dictionary tables directly.

	- You can analyze the dictionary in Oracle 7.2 using the
	  dbms_utility.analyze_schema() procedure.


Optimizer Goal / Mode
~~~~~~~~~~~~~~~~~~~~~
	Which optimizer is used and its mode of operation are determined
	by:

	  Parallel Degree > 1 on a table 
		- If any table in a query has a degree of parallelism set
		  greater than ONE then CBO will be used regardless of 
		  any 'RULE' setting for hints, OPTIMIZER_MODE or 
		  OPTIMIZER_GOAL.

	  Hints		
		- Any hint except RULE causes CBO to be used.
		  It is very important to note that a HINT cannot be 'turned
		  off' by any parameter settings.

	  Session level OPTIMIZER_GOAL
		- Unless the HINT specifies otherwise the default optimizer 
		  is determined by the session level parameter 
		  "OPTIMIZER_GOAL".  If OPTIMIZER_GOAL is set to RULE this is 
		  used regardless of any table statistics.

	  Init.Ora OPTIMIZER_MODE parameter
		- Unless a HINT or OPTIMIZER_GOAL has been specified the 
		  OPTIMIZER_MODE is used. If this is set to CHOOSE then 
		  ALL_ROWS is used if ANY table in the query is analyzed. 
			  

	PLSQL blocks within stored procedures should use explicit HINTs
	to determine the actual optimization goal.

  Summary Optimizer Mode:
  ~~~~~~~~~~~~~~~~~~~~~~~
	Assuming no tables have a parallel degree greater than 1 then the
	following table shows the optimizer used for various scenarios:

	----------+---------------+---------------+---------------+------------
	Hint      | A table has   |Optimizer_Goal |Optimizer_Mode |Actual Mode
	*         | Statistics    |               |               |
	----------+---------------+---------------+---------------+------------
	RULE       Irrelevant	   Irrelevant	   Irrelevant	   RULE
	ALL_ROWS   Irrelevant	   Irrelevant	   Irrelevant	   ALL_ROWS
	FIRST_ROWS Irrelevant      Irrelevant      Irrelevant      FIRST_ROWS
	None	   Irrelevant	   RULE		   Irrelevant      RULE
	Other	   Irrelevant	   RULE		   Irrelevant      ALL_ROWS
	None	   Irrelevant	   ALL_ROWS	   Irrelevant      ALL_ROWS
	Other	   Irrelevant	   ALL_ROWS	   Irrelevant      ALL_ROWS
	None	   Irrelevant	   FIRST_ROWS	   Irrelevant      FIRST_ROWS
	Other	   Irrelevant	   FIRST_ROWS	   Irrelevant      FIRST_ROWS
	None	   Irrelevant	   Not Set	   RULE		   RULE
	Other	   Irrelevant	   Not Set	   RULE	   	   ALL_ROWS
	None	   Irrelevant	   Not Set	   ALL_ROWS	   ALL_ROWS
	Other	   Irrelevant	   Not Set	   ALL_ROWS	   ALL_ROWS
	None	   Irrelevant	   Not Set	   FIRST_ROWS	   FIRST_ROWS
	Other	   Irrelevant	   Not Set	   FIRST_ROWS	   FIRST_ROWS
	None	   No		   Not Set	   CHOOSE	   RULE
	None	   Yes		   Not Set	   CHOOSE	   ALL_ROWS
	Other	   Irrelevant	   Not Set	   CHOOSE	   ALL_ROWS
	----------+---------------+---------------+---------------+------------

	* Other implies a hint other than 'RULE', 'ALL_ROWS' or 'FIRST_ROWS'
	

General Optimizer Notes
~~~~~~~~~~~~~~~~~~~~~~~
    	These are general points to be bourne in mind when looking
	at optimizer issues:

	- ALL_ROWS tends to favour full table scans.

	- FIRST_ROWS tends to favour index access.

	- By default CBO uses 'ALL_ROWS' costing. 

	- CBO does not adjust costs to cater for Parallel Queries until
	  Oracle 7.3.

	- CBO assumes column values are evenly distributed between the
	  highest and lowest values until Oracle 7.3 when histogram statistics 
	  CAN be stored if requested.
	
	- Other things being equal (Eg: Unanalyzed tables) the CBO chooses
	  the driving order of tables from LEFT to RIGHT in the FROM clause.
	  This is OPPOSITE to the RBO which chooses the driving order from 
	  RIGHT to LEFT.

	- CBO uses ANALYZE information in conjunction with current table high
	  water-mark information. Hence the query plan for a statement *CAN* 
	  change over time.

	- Note that TRUNCATE resets the table 'High Water Mark; and can thus
	  affect query plans. DELETE from a table does NOT reset the high
	  water mark.

	- When performing many way joins some join combinations will be 
	  'eliminated' to reduce the overall time spent determining an 
	  execution plan. Generally each join order is compared with the best 
	  so far and obviously sub-optimal plans are eliminated.

Problem SQL Statements
~~~~~~~~~~~~~~~~~~~~~~
	Rather than fighting the CBO if it comes up with a sub-optimal plan 
	the best course of action for any problem SQL statement should be to:

		a) Check you really have ANALYZED the tables involved so the
		   CBO has useful information to work on.

		b) Check the ANALYZED information is accurate. 
		   Eg: Use the COMPUTE option and compare the statistics
		       both before and after a re-analysis of the table/s.

		c) Check if you are referencing dictionary tables. Dictionary
		   tables are NOT analyzed by default so SQL accessing them 
		   can produce very bad plans under CBO.

		d) Determine if you are expecting to use RBO, ALL_ROWS or
		   FIRST_ROWS as your 'OPTIMIZER_GOAL'.

		e) Use hints to help guide the optimization. It is perfectly
		   valid to use 'RULE' as a hint if gives you the best plan.

	If you do have a problem SQL statement and attempts to guide the
	CBO are not yielding results the first action should be to isolate
	the problem statement in a SQLPLUS session in its simplest form.
	It is then possible to use the EXPLAIN command or TKPROF to determine
	what actual execution plan is in use and to try and improve it.

	If things suddenly run well in SQLPLUS then note the common mistakes 
	highlighted in the following sections.

	The EXPLAIN PLAN SQL statement and the TKPROF trace file analysis
	tool are useful to determine the execution plan for a given 
	statement but note these limitations:

	a) They use the CURRENT information to derive an execution plan and
	   do *NOT* necessarily show the plan used originally.
	   Eg: If you have ANALYZEd any table or CREATEd / DROPPed an index
	       the plan may differ.

	       If OPTIMIZER_MODE is different a different plan may be produced.

	b) They do *NOT* know the 'type' of any bind variables and so assume
	   all bind variables to be of character type. Hence you can get 
	   errors or misleading results.
	   Eg: 
		SELECT 1 FROM DUAL WHERE sysdate < :b2 + 1

		this will raise an error if explained as ':b2' is assumed to
		be a character type.

	   To avoid this sort of problem you should enclose any such bind 
	   variables in a relevant type conversion function. 
	   Eg: 
		SELECT 1 FROM dual WHERE sysdate < to_date(:b2) + 1;

	   This is good practice anyway as it avoids implicit type conversions.

Making Comparisons
~~~~~~~~~~~~~~~~~~
    Comparing SQL statements from PLSQL:
	PL/SQL uses 'bind variables' in SQL statements when you reference a 
	PL/SQL variable.  This is very important as it affects the 
	determination of an execution plan as explained in 'Bind Variables' 
	below.
	Eg: In PLSQL the part-statement:

		SELECT ename FROM emp WHERE empno > myempno;

	    'myempno' is a PLSQL variable, so this statement is effectively 
	    equivalent to:

		SELECT ename FROM emp WHERE empno > :bind1;

	    
    Bind Variables:
	It is common to take a problem SQL statement and try it in SQLPLUS
	using literals in place of bind values. In most cases this is an
	invalid comparison.
	Eg: Compare:
		
		SELECT ename FROM emp WHERE empno > 9999;
	    with
		SELECT ename FROM emp WHERE empno > :bind1;
		
	    Assuming the table has been analysed CBO "knows" the HIGH and LOW
	    values of 'EMPNO'. For the first of these statements CBO can 
	    determine a 'selectivity' of the clause 'WHERE empno > 9999'
	    but for the second it does not know if ':bind1' is '1' or '9999'
	    and thus uses a default selectivity - Eg: It assumes that the
	    clause will return one quarter of all rows in the table. This is 
	    true even when using histograms in Oracle 7.3 as CBO does NOT KNOW
	    the value of the bind variable. Hence the resulting execution plans
	    can be very different.
	
	Generally all RANGE SCANS and LIKE comparisons use default 
	selectivities if you are using bind variables so form an invalid
	comparison. You should actually use a BIND variable within SQLPLUS
	also. 
	Eg:
	    VARIABLE bind1 varchar2(10)
	    EXPLAIN PLAN FOR SELECT ename FROM emp WHERE empno > :bind1;

	as opposed to:
	    EXPLAIN PLAN FOR SELECT ename FROM emp WHERE empno > 9999;
		

    Comparing Databases:
	If you have two separate databases then the base statistics for
	the tables are likely to be different. You *CANNOT* transfer 
	statistics from one database to another. The MAJOR factors 
	affecting database comparisons are:

		- The database block size (DB_BLOCK_SIZE). This is fundamental 
		  to CBO calculations so you cannot compare databases with
		  different values of DB_BLOCK_SIZE.

		- DB_FILE_MULTIBLOCK_READ_COUNT. This init.ora parameter 
		  is used in determining the cost of full table scans in 
		  Oracle 7.1 onwards.

		- Table / Index statistics. Export then import of table / 
		  indexes is likely to result in a different population of
		  table and index blocks. This affects computed OR estimated
		  statistics and thus can affect the CBO execution plan.

	    
    Comparing Versions:
	There are subtle changes to the costing algorithm and the default
	costs between releases of Oracle. Hence execution plans CAN change
	when upgrading. A good example of this is the change to the 
	default selectivity of a LIKE clause with bind variables.

    Comparing to the RBO:
	There is little point comparing an execution plan to the RBO plan
	and being surprised if you get a different execution plan. Hopefully
	it is fairly clear by now that the CBO uses relative 'costs' of 
	execution plans to determine an execution plan. 
	Eg: 	
		For RBO the fact that an INDEX exists is sufficient for it
		to be used (assuming it is relevant)

		For CBO the fact that an INDEX exists is NOT sufficient for 
		it to be used. The "cost" of access to the EXPECTED amount
		of data via the index is compared to the "cost" of alternative 
		access paths (Eg: Full Table Scan).

	If RBO already gives the 'best' plan then you can either use a 'RULE'
	hint or other forms of hint to guide the access path.

	The most common problem with RBO versus CBO comparisons is the non-use
	of an INDEX by CBO.  The main contributors to NOT using an index where 
	it is expected to be used are:

		a) The index is 'not selective'. As CBO computes a cost for
		   access to the EXPECTED amount of data it determines that 
		   the cost of individual index block look-ups will cost more
		   than scanning the entire table which it can do with multi-
		   block reads.

		b) The 'range of values' and hence the number of rows CBO is 
		   EXPECTING to return is innaccurate due to either inaccurate
		   statistics OR the use of BIND values limiting a range.

	Note: If RBO and CBO give physically different RESULTANT DATA this 
	      is serious and should be reported as a bug.

    Comparing PARSE times:
	If the parse time is a significant proportion of the execution time
	it is advisable to check if you are performing a join between many 
	tables. If so it would be wise to use the ORDERED hint to set the 
	order of table access to reduce the number of join options that have 
	to be considered. Once you have a good execution plan you can re-order 
	the FROM clause and use the ordered hint to arrive at the same plan 
	more quickly.
	

Hints
~~~~~
	Hints allow you to provide some input to the CBO as to how to access
	data. Unfortunately invalid hints cause the hint to be ignored with
	no warning so one has to be very careful with the syntax. The main 
	points to remember when using hints are:

	- Hints must start in a comment of the strict format: /*+ ... */

	- All hints (except RULE) cause you to use the CBO. Hence it is not 
	  a good idea to use hints unless the tables are analyzed..

	- Hints must reference the table alias if aliases are in use.
	  Eg: 
	    Wrong:
		SELECT /*+ FULL ( emp ) */ empno FROM emp myalias 
		 WHERE empno > 10;

	    Correct:
		SELECT /*+ FULL ( myalias ) */ empno FROM emp myalias 
		 WHERE empno > 10;

	- Hints should not reference the schema name,
	  Eg: 	SELECT /*+ index(scott.emp emp1) */ ... should not be used.
	  	Alias the tablename instead and use the alias in the hint

	- The access path to be HINTed must be an available access path. 

                For example, if you supply an index hint but the optimizer 
                chooses to drive the query from a full table scan on the table 
                that this index is based on then the indexed column(s) may not 
                be filled at this point (perhaps because they are join 
                predicates that are not satisfied until another table is 
                involved).
 

	    Eg:
		Tables A & B both have indexes on the ind_col column. 
		A hint has been supplied suggesting that these indexes should 
		  be used.
		The value column in both tables is unindexed.

		SELECT /*+ index(A)  index(b) */ * 
		FROM   A,B 
		WHERE  A.ind_col = B.ind_col
		AND    A.value   = 1
		AND    B.value   = 2

		Query Plan
		----------------------------------------------------------------
		SELECT STATEMENT   [CHOOSE] Cost = 83
 		 NESTED LOOPS
 		   TABLE ACCESS FULL A
 		   TABLE ACCESS BY ROWID B
 		     INDEX RANGE SCAN B1

		If we drive the query from A, then the value of A.ind_col at 
		this stage is unknown.
		We could use an index to drive this, but we would have to scan 
		the whole index.
		As the cost of a full scan of the index and subsequent table 
		lookup (for the unindexed columns) scan is prohibitive when 
		compared to a full table scan the optimizer is likely to 
		decide not to take the index hint.
		It can use the index on B as when the access to B is made, the 
		value of ind_col is known.

	- Invalid hints may not be immediately obvious.
	  Eg: FIRST_ROWS as a hint when the statement has an ORDER BY clause.

	- PLSQL V1 (which is the PL/SQL engine linked with most tools) does not 
	  contain all the logic to support CBO.  In particular it does NOT 
	  understand HINTS. However, you CAN embed hints into a VIEW for use 
	  from PLSQL V1 to tackle particular problem statements.
	  Eg: For:
		SELECT /*+ FULL( mytab ) */ col1, col2 
	          FROM mytab
		 WHERE col1 > var1;
 
	    you can create a view:
		CREATE or REPLACE VIEW myview AS
		 SELECT /*+ FULL( mytab ) */ * 
		   FROM mytab;

	    and change the SQL to:

		SELECT col1, col2 FROM myview WHERE col1 > var1;

Summary
~~~~~~~
	To make effective use of the CBO you should:

	- Analyze all tables regularly.

	- Set the required OPTIMIZER_GOAL (FIRST_ROWS or ALL_ROWS).

	- Use hints to help direct the CBO where required.

	- Be careful with the use of bind variables.

	CBO works fairly well for ad-hoc queries. For hard coded, repeated
	SQL statements these should be tuned to obtain a repeatable optimal
	plan. 

	You should ALWAYS monitor performance over time, and should always
	test upgrades of the RDBMS version carefully.