Home > Technical Articles > Oracle Optimizer and Nested tables, Varrays

Oracle Optimizer and Nested tables, Varrays

by Paul Tabashov


With introduction of object relational features in Oracle 8 many new types of objects and collection are used today in many applications. The collection types such as nested tables and varrays can even be used in place of tables in select statements, like this:

SQL> CREATE OR REPLACE TYPE coll_test_type is TABLE of NUMBER;
2 /

Type created.

SQL>
SQL> select * from table(coll_test_type(1,2,3));

COLUMN_VALUE
------------
           1
           2
           3

This led to the technique where in some applications some session-specific data is actually stored in the collections that are declared in the packages. It can be something like the list of the work orders that user works right now or some other information that is needed only for the lifetime of the session.
By this the developers achieve some improvements in speed of accessing the session specific data, but does the optimizer know how to handle these new structures? This is what I want to tell you about.
First, let's look at how the plan for the statement we executed above looks like:

SQL> select * from table(coll_test_type(1,2,3));

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE
1 0    COLLECTION ITERATOR (CONSTRUCTOR FETCH)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          0 consistent gets
          0 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        503 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          3 rows processed


As you see from the plan, there are no logical reads at all. No surprise, we select from a collection which is not a database object at this point, but a memory object. This is where the improvements in speed might come from. But on the other hand, how does the optimizer get any statistics about what's in this collection? And the answer is that it doesn't have information on what's there and so it just assumes some defaults.

Let's model a simple situation where we have 2 table and we want to get some data from them and also filter it on the contents of the collection that we have in memory. Below is the creation of the tables:

create table table1 (id1 number, id2 number, data varchar2(1000));

declare
i number;
begin
for i in 1..50000 loop
insert into table1 values(round(i/7), round(i/100), 'data '||mod(i,1000));
end loop;
end;
/

create index idx_tab1_id1 on table1(id1);
create index idx_tab1_id2 on table1(id2);

create table table2 (id2 number, data2 varchar2(1000));
declare
i number;
begin
for i in 1..2000 loop
insert into table2 values( mod(i,100), 'data '||mod(i,100));
end loop;
end;
/
commit;

create index idx_tab2_id2 on table2(id2);

execute dbms_stats.gather_table_stats(user,'TABLE1',cascade=>true);
execute dbms_stats.gather_table_stats(user,'TABLE2',cascade=>true)


As you see, table1 has 50,000 records and table2 has 2,000 records.
Let's try to execute the following statement:

SQL> set autotrace traceonly
SQL> SELECT a.data, b.data2
2 FROM table1 a,
3 table2 b,
4 (SELECT /*+ NO_MERGE */ column_value
5 FROM TABLE (CAST (coll_test_type (1, 2, 3) AS coll_test_type))) c
6 WHERE a.id1 = c.column_value and a.id2=b.id2;

420 rows selected.

Execution Plan
----------------------------------------------------------
0   SELECT STATEMENT Optimizer=CHOOSE (Cost=110 Card=113770 Bytes=4437030)

1 0    HASH JOIN (Cost=110 Card=113770 Bytes=4437030)
2 1      TABLE ACCESS (FULL) OF 'TABLE2' (Cost=3 Card=2000 Bytes=20000)
3 1      HASH JOIN (Cost=95 Card=28499 Bytes=826471)
4 3        VIEW (Cost=17 Card=4072 Bytes=52936)
5 4          COLLECTION ITERATOR (CONSTRUCTOR FETCH)
6 3        TABLE ACCESS (FULL) OF 'TABLE1' (Cost=48 Card=50000 Bytes=800000)



Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
        342 consistent gets
          0 physical reads
          0 redo size
       6010 bytes sent via SQL*Net to client
        800 bytes received via SQL*Net from client
         29 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
        420 rows processed


Here we just join our two tables and inline view selecting from our collection. I used NO_MERGE hint so that optimizer processed our subselect as inline view - this way we can see what are the estimates of cardinality and cost.
As you see from the plan, the cardinality that optimizer assumes for our collection is 4072, which is not even near what we have - 3 rows. This assumption lead optimizer to use hash join for table1 and our collection. It results in 342 logical reads. Let's try to hint optimizer to use nested loops instead:

SELECT /*+ ORDERED_PREDICATES USE_NL(c,a) */ a.data, b.data2
2 FROM table1 a,
3 table2 b,
4 (SELECT /*+ NO_MERGE */ column_value
5 FROM TABLE (CAST (coll_test_type (1, 2, 3) AS coll_test_type))) c
6 WHERE a.id1 = c.column_value and a.id2=b.id2;

420 rows selected.

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=8176 Card=113770 Bytes=4437030)

1 0    HASH JOIN (Cost=8176 Card=113770 Bytes=4437030)
2 1      TABLE ACCESS (FULL) OF 'TABLE2' (Cost=3 Card=2000 Bytes=20000)
3 1      TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=2 Card=7 Bytes=112)
4 3        NESTED LOOPS (Cost=8161 Card=28499 Bytes=826471)
5 4          VIEW (Cost=17 Card=4072 Bytes=52936)
6 5            COLLECTION ITERATOR (CONSTRUCTOR FETCH)
7 4            INDEX (RANGE SCAN) OF 'IDX_TAB1_ID1' (NON-UNIQUE) (Cost=1 Card=7)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         60 consistent gets
          0 physical reads
          0 redo size
       6010 bytes sent via SQL*Net to client
        800 bytes received via SQL*Net from client
         29 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)


As we can see, the plan with nested loops brings us only 60 logical reads - almost 6 times less than with HASH JOIN.This shows how the optimiz
er assumptions are not always perfect. And the problem is that with these assumption the whole picture can become skewed for optimizer. The same as when the statistics are stale. But the problem is that you can't fix it by recollecting statistics - our collection is a memory object and not a database table. I do not know any methods so far as to how change optimizer's assumptions on the selectivity of collections like this.
What are the possible problems? In our case optimizer can eventually choose to join table1 and table2 first in a hash join. I've seen such thing in real life and it was quite ugly. I did model example above after what I came across. As you see, the join condition that table1 and table2 are joined on is not selective - it's more of reference table lookup join and even worse - there is many-to-many relationship there. So, as our data (table1) table grows, the hash join between then can become worse and will require disk sorts if your hash_area_size is not big enough. Another problem is that you will not see these disk sorts in 'sorts (disk)' statistic. (see my other article about this).
Probably th best thing to do, especially if you know that your select FROM collection will return pretty consistent number of rows, is to use CARDINALITY hint:

SQL> SELECT a.data, b.data2
2 FROM table1 a,
3 table2 b,
4 (SELECT /*+ CARDINALITY(d 3) NO_MERGE */ column_value
5 FROM TABLE (CAST (coll_test_type (1, 2, 3) AS coll_test_type)) d) c
6 WHERE a.id1 = c.column_value and a.id2=b.id2;

420 rows selected.

Execution Plan
----------------------------------------------------------
0    SELECT STATEMENT Optimizer=CHOOSE (Cost=27 Card=84 Bytes=3276)

1 0   HASH JOIN (Cost=27 Card=84 Bytes=3276)
2 1    TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=2 Card=100 Bytes=1600)
3 2      NESTED LOOPS (Cost=23 Card=21 Bytes=609)
4 3        VIEW (Cost=17 Card=3 Bytes=39)
5 4          COLLECTION ITERATOR (CONSTRUCTOR FETCH)
6 3        INDEX (RANGE SCAN) OF 'IDX_TAB1_ID1' (NON-UNIQUE) (Cost=1 Card=7)
7 1    TABLE ACCESS (FULL) OF 'TABLE2' (Cost=3 Card=2000 Bytes=20000)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         35 consistent gets
          0 physical reads
          0 redo size
       9088 bytes sent via SQL*Net to client
        800 bytes received via SQL*Net from client
         29 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
        420 rows processed

The CARDINALITY hints works out even better (this is due to the fact that optimizer chagned the driving table for the last HASH JOIN)!

So what options have we got so far? It's great if you can modify the select statement with hints as i did in the second and third statement. Generally, knowing how big your collection can be, you can do following:
if your collection is usually small (less than 100 rows) and you join it with a table that has a good selectivity on the join column, then you should prefer to have nested loops with your collection as driving table. In our case the id1 field has good selectivity (for each value of id1 there are 7 records).
If your collection can be both large and small in different situations, you better stick with hash join as it will be more scalable when collection grows. Nested loops will become too expensive.
You can control the behavior of optimizer with hints and here are the ones that are useful:

  • ORDERED - forces optimizer to join tables in the order specified in FROM clause
  • ORDERED_PREDICATES - forces optimizer to evaluate predicates in order of appearance
  • USE_NL(table1, table2) - force optimizer to use nested lops when joining table1 and table2. Please note that tablte1 and table2 should be the aliases used in FROM clause. Please see Listing 4 for example
  • USE_HASH(table1,table2) - similar to USE_NL - forces use of HASH JOIN.
  • CARDINALITY(table rows) - tells optimizer how many rows to expect FROM a row source.

But what if you do not have access to SQL code but need to tune statement like above? You might use optimizer stability (stored outlines). You will need to tweak them, but this is topic of another article.

View Responses (685) Post Response

Poll


Did you find this article useful?

Very
So-so
Not at all