Document ID:        33838.1
Subject:            Determining the execution plan for a distributed query
Last Revision Date: 06 September 1996
Author:             rbeard


Introduction
------------

This bulletin discusses how to determine the execution plan for a
distributed query.  Information in this bulletin is only relevant to Oracle7, 
version 7.2 or earlier. The functionality documented may change with 
later versions of Oracle.

How Oracle Processes Queries
----------------------------

Before processing any query, Oracle uses the optimizer to determine the best
execution plan. Oracle7 posesses two optimizers, the cost based optimizer and 
the older rule based optimizer. It is recommended that distributed queries are 
run using the cost based optimizer as this contains added functionality
which means that it processes distributed queries more efficiently.

A distributed query is a sql statement where one or more objects are 
accessed remotely using a database link. These remote objects can be any 
valid object eg. tables or views.

For further information about some of the commands and utilities referenced 
in this article, see the following manuals:

Oracle7 Server Application Developers Guide 
        Appendix B      Performance Diagnostic Tools
Oracle 7.2 Tuning Guide
        Appendix A      Performance Diagnostic Tools

How To Determine A Query's Execution Plan
-----------------------------------------

The EXPLAIN PLAN or TKPROF command can be used to determine a query's 
execution plan. In both cases, an intermediate table is populated with 
the query's execution plan.  

Using EXPLAIN PLAN, this table is called PLAN_TABLE and must be created 
in your schema before running the command.  'Explaining' a statement 
loads rows into the plan table, which can be queried later. Once you 
have finished examining the plan a query has used, the rows 
must be deleted manually. To create the PLAN_TABLE in your schema, run 
the sql script UTLXPLAN.SQL.

Using TKPROF, the table is called PROF$PLAN_TABLE, and is created as a 
temporary table, populated, the output produced and then the table dropped. 
The format of the output created using TKPROF cannot be changed by the 
user, so for the purposes of determining the sql statements being sent
to remote database(s), the EXPLAIN PLAN command should be used.

How To Determine The SQL Statement(s) Sent To The Remote Database
-----------------------------------------------------------------

The PLAN_TABLE has a column OTHER, which for a specific execution step
contains additional information that a user may find useful. In the case,
where the OPERATION is REMOTE, the OTHER column contains the text of the SQL
statement sent to the remote database.

The following is an example script that could be used to format the
output from the PLAN_TABLE:

REM remote_plan.sql

set long 2000
set arraysize 1

col operation format a22
col options format a8
col object_name format a10
col object_node format a5
col other format a20
col position format 99999
col optimizer format a10

select lpad(' ',2*(level-1))||operation operation,options,object_name,
optimizer,object_node,other
from plan_table
start with id=0 and statement_id='A'
connect by prior id=parent_id and statement_id='A';
set echo on

Note: Be careful because sometimes the output from the OTHER column is 
truncated.

How to Determine the Execution Plan on the Remote Database
----------------------------------------------------------

As stated above, the two methods of determining the execution plan for
a sql statement are TKPROF and EXPLAIN PLAN. In the case of the remote 
query, we need to trace the access path of the sql sent to the remote 
database by producing a trace file and using TKPROF to format it. The 
method of producing the trace file is dependant on the Oracle7 release.

Prior to Oracle 7.1, to produce a trace file on the remote database, sql 
trace must be enabled at the instance level by setting SQL_TRACE=TRUE. The 
resulting trace file(s) formatted using TKPROF with the EXPLAIN option.

From Oracle 7.1, a new package is available which allows a user to enable
sql trace on the remote database from the local database. The package 
is called:

             dbms_session.set_sql_trace(sql_trace boolean)

To enable sql trace on the remote database from within a local session,
a remote procedure should be created which calls dbms_session.set_sql_trace. 
From the local database, the remote procedure is executed, which opens the 
database link and enables sql trace. To enable sql trace on the remote 
database from the local database, the following commands can be followed:

On the local database:

1.     Create a database link to remote database:

        SQLPLUS <localuser>/<localpassword>

        CREATE DATABASE LINK <mydblink>
        CONNECT TO <remoteuser> 
        IDENTIFIED BY <remotepassword>
        USING '<connect string>';

On the remote database:

1.      Grant 'alter session' privilege to the <remoteuser>.  From any 
        DBA user account:

        GRANT ALTER SESSION TO <remoteuser>;

2.      Create the procedure to execute dbms_session under the schema 
        defined by <remoteuser>:

        CONNECT <remoteuser>/<remotepassword> 

        CREATE OR REPLACE PROCEDURE set_trace
        as
        begin
           dbms_session.set_sql_trace(TRUE);
        end;
        /

3.      Validate that the procedure executes correctly:

        set serveroutput on
        EXECUTE set_trace

On the local database:

1.      Execute the procedure on the remote database:

        SQLPLUS <localuser>/<localpassword>

        EXECUTE set_trace@<mydblink>

2.      Execute distributed query

        SELECT e.empno, e.ename, d.dname
        FROM emp e,dept@<mydblink> d
        WHERE d.deptno = e.deptno
        AND e.empno = 7934;

On the remote database:

1.      Determine location of trace files (normally identified by the 
        initialization parameter user_dump_dest). Trace files will normally 
        be named <sid>_ora_<pid>.trc where <sid> is the value of ORACLE_SID and 
        <pid> is the process id of the open dblink.

2.      Use TKPROF with the EXPLAIN option to format the trace file:

        TKPROF <sid>_ora_<pid>.trc explain.out 
                         explain=<remoteuser>/<remotepassword>

        This will produce a formatted trace file called explain.out 

Alternatively sql trace can be enabled on the remote database by setting
the parameter SQL_TRACE=TRUE in the initialisation parameter file. Restart 
the database with this parameter, execute the distributed query. A trace
file will be produced in the normal trace file directory. Format using
TKPROF with the EXPLAIN option. The resulting output file will contain 
the execution path taken by the remote query. But remember that setting
SQL_TRACE=TRUE at the instance level means that every session will produce
a trace file.

From Oracle 7.2, a second package is available which can be used to
enable sql trace on the remote database. This package must be issued on
the remote database having first opened a session from the local database.
The package is called:
 
       dbms_system.set_sql_trace_in_session(sid number, serial# number,
                                     sql_trace boolean)

The default installation of the package DBMS_SYSTEM only allows the user SYS
to execute the package. To enable another user to execute the package, SYS 
must grant execute permission on the package to the other user. 

On the remote database:

1.      From sqlplus as the user SYS:

        GRANT EXECUTE ON dbms_system TO <remoteuser>;

To initiate sql trace on the remote database, the following commands can 
be used:

On the local database:

1.     Create a database link to remote database:

        SQLPLUS <localuser>/<localpassword>

        CREATE DATABASE LINK <mydblink>
        CONNECT TO <remoteuser> 
        IDENTIFIED BY <remotepassword>
        USING '<connect string>';

2.      Open the dblink by using it within a query:

        SELECT * FROM <remote table>@<mydblink>;

On the remote database:

1.      Determine the sid, serial# of the open dblink.  Issue the 
        following query from within sqlplus as a dba user:

        SQLPLUS <dbauser>/<dbapassword>

        COLUMN machine FORMAT a15
        SELECT sid, serial#, username, machine
        FROM v$session
        WHERE machine='<node name of local machine>';

        The USERNAME returned should be the name of the <localuser>.

2.      Using the sid, serial# from the above query:    

        SQLPLUS <dba user>/<dba user password>
        
        EXECUTE sys.dbms_system.set_sql_trace_in_session(<sid>,<serial#>,TRUE)

On the local database:

1.     Execute distributed query

On the remote database:

1.     Determine location of trace files (normally identified by the 
       initialization parameter user_dump_dest). Trace files will normally 
       be named ora_<sid>.trc.
        
2.     Use TKPROF with the EXPLAIN option to format the trace file:

       tkprof ora_<sid>.trc explain.out explain=<remoteuser>/<remotepassword>
 
       This will produce a formatted trace file called explain.out 

How Oracle Processes Distributed Queries
----------------------------------------

A distributed query is a query that references one or more objects on 
one or more remote databases. 

If, within a query, all the objects are remote and use the same database
link, then oracle will attempt to send the complete query to the remote
database for processing. 

If, within a query, some of the objects are remote and some are local, then 
oracle will break up the query into individual sql statements and pass sql to 
the remote database for each remote onject. Relevant predicates are sent where 
possible. Remote joins are not performed by default. 
If you want to join the remote tables, the best way to achieve this is by 
building a view of these tables on the remote site. This view can then be 
referenced by your query. Note that the predicates do not have to be created 
within the view as they should be passed within the sql statement sent from the 
local to remote node.

Examples
--------
The following examples use tables which can be created by running the
supplied script DEMOBLD.SQL. In addition a 3rd table, LOCATION is used. 
To create this table, run the following from sqlplus:

      CREATE TABLE location
      (loc)
      AS
      SELECT loc FROM dept;

Example 1 - select using 2 remote tables and 1 local table
---------
         
         EXPLAIN PLAN
         SET statement_id='A'
         FOR
         SELECT e.empno, e.ename, d.dname, l.loc, d.deptno
         FROM   emp@<mydblink> e, dept@<mydblink> d, location l
         WHERE  d.deptno = e.deptno
         AND    d.loc = l.loc
         AND    e.empno = 7934
         /

After running remote_plan.sql, the formatted explain plan output is as follows:

OPERATION              OPTIONS  OBJECT_NAM OPTIMIZER  OBJEC OTHER
---------------------- -------- ---------- ---------- ----- --------------------
SELECT STATEMENT                           CHOOSE
  NESTED LOOPS
    NESTED LOOPS
      REMOTE                                          ROSE. SELECT "EMPNO","ENAM
                                                      WORLD E","DEPTNO" FROM 
                                                            "EMP" E WHERE 
                                                            "EMPNO"=7934

      REMOTE                                          ROSE. SELECT "DEPTNO","DNA
                                                      WORLD ME","LOC" FROM "DEPT
                                                            " D WHERE "DEPTNO"=:
                                                            1

    TABLE ACCESS       FULL     LOCATION   ANALYZED

With remote tables, the row with OPERATION='REMOTE' stores the  sql sent the 
remote node in the OTHER column.

Example 2 - select using 1 remote view and 1 local table
---------

          Where the remote view is:

          CREATE OR REPLACE VIEW remote_view 
          (empno, ename, dname, loc)
          AS
          SELECT e.empno, e.ename, d.dname, d.loc
          FROM emp e, dept d
          WHERE e.deptno = d.deptno;

         EXPLAIN PLAN
         SET statement_id='A'
         FOR
         SELECT rv.empno, rv.ename, rv.dname, l.loc
         FROM   remote_view@<mydblink> rv, location l 
         WHERE  rv.loc = l.loc
         AND    rv.empno = 7934
         /

The explain plan output for this query is as follows:

OPERATION              OPTIONS  OBJECT_NAM OPTIMIZER  OBJEC OTHER
---------------------- -------- ---------- ---------- ----- --------------------
SELECT STATEMENT                           CHOOSE
  NESTED LOOPS
    REMOTE                                            ROSE. SELECT "EMPNO","ENAM
                                                      WORLD E","DNAME","LOC" FRO
                                                            M "REMOTE_VIEW" RV W
                                                            HERE "EMPNO"=7934

    TABLE ACCESS       FULL     LOCATION   ANALYZED

From this you can see that for the OPERATION='REMOTE' row, the OTHER column
shows the additional where predicate (ie. 'WHERE empno=7934') that has been 
sent to the remote site.

Example 3 - All remote tables
---------
   
         EXPLAIN PLAN
         SET statement_id='A'
         FOR
         SELECT e.empno, e.ename, d.dname, l.loc, d.deptno
         FROM   emp@<mydblink> e, dept@<mydblink> d, location@<mydblink> l
         WHERE  d.deptno = e.deptno
         AND    d.loc = l.loc
         AND    e.empno = 7934
         /

produced the following explain plan output:

OPERATION              OPTIONS  OBJECT_NAM OPTIMIZER  OBJEC OTHER
---------------------- -------- ---------- ---------- ----- --------------------
SELECT STATEMENT       REMOTE              CHOOSE
  NESTED LOOPS
    NESTED LOOPS
      TABLE ACCESS     FULL     EMP        ANALYZED   V722.
                                                      WORLD

      TABLE ACCESS     FULL     DEPT       ANALYZED   V722.
                                                      WORLD

    TABLE ACCESS       FULL     LOCATION              V722.

With all tables located on the remote site, the OPTIONS column is set to REMOTE 
rather 
then the OPERATION column. In this case the complete query was sent to the 
remote 
node and processed remotely.

On the remote node, from the tkprof output:

SELECT A3."EMPNO",A3."ENAME",A2."DNAME",A1."LOC",A2."DEPTNO"
FROM "EMP" A3,"DEPT" A2,"LOCATION" A1 
WHERE A2."DEPTNO"=A3."DEPTNO" 
AND A2."LOC"= A1."LOC" 
AND A3."EMPNO"=7934

and the execution plan from tkprof:

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   HINT: CHOOSE
      0   NESTED LOOPS
      0    NESTED LOOPS
      0     TABLE ACCESS   HINT: ANALYZED (FULL) OF 'EMP'
      0     TABLE ACCESS   HINT: ANALYZED (FULL) OF 'DEPT'
      0    TABLE ACCESS (FULL) OF 'LOCATION'