Selecting the right strategy and Analysis
Introduction
Here we present the query optimization techniques that we followed in our project. We support seven kinds of queries in all, but we analyzed only three of them thoroughly. For the sake of brevity we present only one of the three queries that were analyzed completely; for the other two we present only the time charts showing the statistics we collected. For the remaining four queries, the strategy was chosen based on our experience with the three that we analyzed.
The query that is analyzed is given here; let us name it Q1.
Oracle supports both cost based and rule based optimization. The Oracle reference manual suggests that cost based optimization should be used. However, we made a comparative study of the cost of executing the query before arriving at the right execution strategy, and we present that analysis here. The time required for the query to execute was measured and the appropriate strategy was chosen. We chose only one query and performed an exhaustive analysis of it; the analysis for all the other queries follows the same pattern and hence is not discussed.
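One simple way to collect such timings, assuming the queries are run from SQL*Plus (an assumption on our part, since the client used is not stated here), is to enable the elapsed-time display:

SET TIMING ON   -- SQL*Plus setting: print the elapsed time after each statement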
We have categorized what we did into the following sections:
1. Cost based optimization
2. Rule based optimization (without indexes)
3. Rule based optimization (with indexes)
4. Hitting two birds with one stone! (generating indexes efficiently)
1. COST based optimization
This is the default strategy chosen by the optimizer. However, if it is not set, you can select the optimizer mode by issuing the command
ALTER SESSION SET OPTIMIZER_MODE = CHOOSE
Histograms were then built on the join attributes, with the following numbers of buckets:
Attribute | Relation | Number of buckets
TID | Part | 50
TID | Parttype | 50
SID | Company | 75 (default)
The number of buckets chosen is higher for the SID attribute of the Company relation because its data distribution is highly skewed.
The commands used were
ANALYZE TABLE parttype ESTIMATE STATISTICS FOR COLUMNS TID SIZE 50
ANALYZE TABLE Company ESTIMATE STATISTICS FOR COLUMNS SID SIZE 75
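The corresponding statement for the Part relation takes the same form; the following is a sketch assuming the 50 buckets listed in the table above:

ANALYZE TABLE Part ESTIMATE STATISTICS FOR COLUMNS TID SIZE 50   -- sketch: 50 buckets as per the table above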
Query | Time (in seconds) |
Q1 | 3.50 |
Q1 | 3.32 |
Q1 | 3.44 |
Q1 | 3.98 |
Q1 | 3.32 |
Fig 1: The time required for Query Q1 to run (Cost based optimization mode)
2. RULE based optimization (Without INDEXES)
This is not the default mode in which the optimizer operates, so we need to change to it using the command
ALTER SESSION SET OPTIMIZER_MODE = RULE
We can then supply the optimizer with hints for choosing the method used to perform the join. There are 3 options available, and each was analyzed in detail for the query presented above:
USE_NL
USE_MERGE
USE_HASH
USE_NL
This hint specifies that the optimizer should use the nested loop join method. You can specify the inner table, i.e. the one that will be used in the inner loop; the other tables then become the driving tables. We chose the Company relation as the inner table as it has the largest number of tuples, so the query is issued with the USE_NL hint naming Company.
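Because the full text of Q1 is not reproduced here, the following is only a minimal sketch of how the hint is attached, assuming Q1 joins Part and Parttype on TID and Company on SID; the select list, aliases and join predicates are illustrative, not the actual query.

-- illustrative stand-in for Q1: USE_NL(c) makes Company the inner table of a nested loop join
SELECT /*+ USE_NL(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID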
Query | Time (in seconds) |
Q1 | 6.56 |
Q1 | 6.42 |
Q1 | 6.12 |
Q1 | 6.87 |
Q1 | 6.11 |
Fig 2: The time required for Query Q1 to run (Rule based optimization mode with nested loop)
USE_MERGE
This hint is for using the sort merge join method. You can specify the table that is to be joined, using a sort merge join, to the row source resulting from joining the previous tables in the join order.
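Applied to the same illustrative stand-in for Q1 used above (again a sketch, not the actual query), the hint takes the form:

-- illustrative stand-in for Q1: USE_MERGE(c) asks for a sort merge join with Company
SELECT /*+ USE_MERGE(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID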
Query | Time (in seconds) |
Q1 | 6.00 |
Q1 | 6.21 |
Q1 | 6.34 |
Q1 | 6.45 |
Q1 | 6.55 |
Fig 3: The time required for Query Q1 to run (Rule based optimization mode with sort merge join)
USE_HASH
This hint is for using the hash join method. You can specify the table that is to be joined, using a hash join, to the row source resulting from joining the previous tables in the join order.
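As before, a sketch on the same illustrative stand-in for Q1:

-- illustrative stand-in for Q1: USE_HASH(c) asks for a hash join with Company
SELECT /*+ USE_HASH(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID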
Query | Time (in seconds) |
Q1 | 6.07 |
Q1 | 6.11 |
Q1 | 6.24 |
Q1 | 6.33 |
Q1 | 6.43 |
Fig 4: The time required for Query Q1 to run (Rule based optimization mode with hash join)
3. RULE based optimization (With INDEXES)
Change to the rule based mode using the command.
ALTER SESSION SET OPTIMIZER_MODE = RULE
Here there are many strategies we can choose from. We can supply the optimizer with hints for choosing the method used to perform the join. There are 3 options available, and each was analyzed in detail for the query that we presented.
USE_NL
USE_MERGE
USE_HASH
The SQL query is the same as the one specified for the RULE based method without indexes and hence is not repeated here. However, the timing statistics for each method are given below in the figures.
Query | Time (in seconds) |
Q1 | 4.00 |
Q1 | 4.21 |
Q1 | 4.34 |
Q1 | 4.45 |
Q1 | 4.55 |
Fig 5: The time required for Query Q1 to run (Rule based optimization mode with nested loop, with indexes)
Query | Time (in seconds) |
Q1 | 3.73 |
Q1 | 4.11 |
Q1 | 4.245 |
Q1 | 4.34 |
Q1 | 4.123 |
Fig 6: The time required for Query Q1 to run (Rule based mode with USE_MERGE, with indexes)
Query | Time (in seconds) |
Q1 | 4.00 |
Q1 | 4.21 |
Q1 | 4.34 |
Q1 | 4.45 |
Q1 | 4.55 |
Fig 7: The time required for Query Q1 to run (Rule based mode with USE_HASH, with indexes)
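To check which join method and access path the optimizer actually picked in each of these runs, the execution plan can be inspected with Oracle's EXPLAIN PLAN facility. The following is a minimal sketch, assuming a PLAN_TABLE created with the standard utlxplan.sql script and using the same illustrative stand-in for Q1 as above:

-- record the plan for the hinted statement under the id 'Q1'
EXPLAIN PLAN SET STATEMENT_ID = 'Q1' FOR
SELECT /*+ USE_NL(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID;

-- list the recorded plan steps (join method, index usage, etc.)
SELECT id, operation, options, object_name
FROM plan_table
WHERE statement_id = 'Q1'
ORDER BY id;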
Hitting two birds with one stone!
We chose to create indexes on the join attributes because we wanted the join output to be produced fast. No index was created on the Part table, as the performance (response time) of the query degraded very badly when an index was created on it. Hence the only two indexes created were on the TID column of Parttype and the SID column of Company. The DDL statements used were as follows.
CREATE INDEX comp_index ON Company (SID) COMPUTE STATISTICS
This enables us to collect statistics at relatively little cost during the creation of the index. These statistics are stored in the data dictionary for ongoing use by the optimizer in choosing an execution plan for SQL statements, which improves the response time drastically. We know that computing statistics is very expensive, but doing it while the index is being created is the right step, as we are eventually going to use cost based optimization.
The other DDL statement was
CREATE INDEX ptype_index ON parttype (TID) NOSORT
The TID attribute is already sorted and hence we use the NOSORT option. This tells Oracle that the rows are already stored in sorted order, so it does not need to sort them while building the index. This index is created immediately after the initial load of rows into the parttype table, which allows us to save sort time and space.