Selecting the right strategy and Analysis
Introduction
Here we present the query optimization techniques that we followed in our project. We support seven kinds of queries in all, but we analyzed only three of them thoroughly. For the sake of brevity we present only one of the three queries that were analyzed completely; for the other two we present only the time charts showing the statistics we collected. For the remaining four queries, the strategy was chosen based on our experience with the three that we analyzed.
The query that is analyzed is given here; let us name it Q1.
Oracle supports both cost based and rule based optimization. The Oracle reference manual suggests that cost based optimization should be used. However, we made a comparative study of the cost of executing the query before arriving at the right execution strategy, and we present that analysis here. The time required for the query to execute was measured and the appropriate strategy was chosen. We chose only one query and performed an exhaustive analysis of it; the analysis for all the other queries follows the same pattern and hence is not discussed.
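One simple way to collect such timings, assuming the queries are run from SQL*Plus (an assumption on our part, since the client used is not stated here), is to enable the elapsed-time display:

SET TIMING ON   -- SQL*Plus setting: print the elapsed time after each statement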
We have categorized what we did into the following sections:
1. Cost based optimization
2. Rule based optimization (without indexes)
3. Rule based optimization (with indexes)
4. Hitting two birds with one stone! (generating indexes efficiently)
1. COST based optimization
This is the default strategy chosen by the optimizer. However, if it is not set, you can select the optimizer mode by issuing the command
ALTER SESSION SET OPTIMIZER_MODE = CHOOSE
Histograms were then built on the join attributes, with the following numbers of buckets:
Attribute | Relation | Number of buckets
TID | Part | 50
TID | Parttype | 50
SID | Company | 75 (default)
The number of buckets chosen is higher for the SID attribute of the Company relation because its data distribution is highly skewed.
The commands used were
ANALYZE TABLE parttype ESTIMATE STATISTICS FOR COLUMNS TID SIZE 50
ANALYZE TABLE Company ESTIMATE STATISTICS FOR COLUMNS SID SIZE 75
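The corresponding statement for the Part relation takes the same form; the following is a sketch assuming the 50 buckets listed in the table above:

ANALYZE TABLE Part ESTIMATE STATISTICS FOR COLUMNS TID SIZE 50   -- sketch: 50 buckets as per the table above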
Query | Time (in seconds) |
Q1 | 3.50 |
Q1 | 3.32 |
Q1 | 3.44 |
Q1 | 3.98 |
Q1 | 3.32 |
Fig 1: The time required for Query Q1 to run (Cost based optimization mode)
2. RULE based optimization (Without INDEXES)
This is not the default mode in which the optimizer operates, so we need to change to it using the command
ALTER SESSION SET OPTIMIZER_MODE = RULE
We can then supply the optimizer with hints for choosing the method used to perform the join. There are 3 options available, and each was analyzed in detail for the query presented above:
USE_NL
USE_MERGE
USE_HASH
USE_NL
This hint specifies that the optimizer should use the nested loop join method. You can specify the inner table, i.e. the one that will be used in the inner loop; the other tables then become the driving tables. We chose the Company relation as the inner table as it has the largest number of tuples, so the query is issued with the USE_NL hint naming Company.
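Because the full text of Q1 is not reproduced here, the following is only a minimal sketch of how the hint is attached, assuming Q1 joins Part and Parttype on TID and Company on SID; the select list, aliases and join predicates are illustrative, not the actual query.

-- illustrative stand-in for Q1: USE_NL(c) makes Company the inner table of a nested loop join
SELECT /*+ USE_NL(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID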
Query | Time (in seconds) |
Q1 | 6.56 |
Q1 | 6.42 |
Q1 | 6.12 |
Q1 | 6.87 |
Q1 | 6.11 |
Fig 2: The time required for Query Q1 to run (Rule based optimization mode with nested loop)
USE_MERGE
This hint is for using the sort merge join method. You can specify the table that is to be joined, using a sort merge join, to the row source resulting from joining the previous tables in the join order.
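Applied to the same illustrative stand-in for Q1 used above (again a sketch, not the actual query), the hint takes the form:

-- illustrative stand-in for Q1: USE_MERGE(c) asks for a sort merge join with Company
SELECT /*+ USE_MERGE(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID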
Query | Time (in seconds) |
Q1 | 6.00 |
Q1 | 6.21 |
Q1 | 6.34 |
Q1 | 6.45 |
Q1 | 6.55 |
Fig 3: The time required for Query Q1 to run (Rule based optimization mode with sort merge join)
USE_HASH
This hint is for using the hash join method. You can specify the table that is to be joined, using a hash join, to the row source resulting from joining the previous tables in the join order.
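As before, a sketch on the same illustrative stand-in for Q1:

-- illustrative stand-in for Q1: USE_HASH(c) asks for a hash join with Company
SELECT /*+ USE_HASH(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID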
Query | Time (in seconds) |
Q1 | 6.07 |
Q1 | 6.11 |
Q1 | 6.24 |
Q1 | 6.33 |
Q1 | 6.43 |
Fig 4: The time required for Query Q1 to run (Rule based optimization mode with hash join)
3. RULE based optimization (With INDEXES)
Change to the rule based mode using the command.
ALTER SESSION SET OPTIMIZER_MODE = RULE
Here there are many strategies we can choose from. We can supply the optimizer with hints for choosing the method used to perform the join. There are 3 options available, and each was analyzed in detail for the query that we presented.
USE_NL
USE_MERGE
USE_HASH
The SQL query is the same as the one specified for the RULE based method without indexes and hence is not repeated here. However, the timing statistics for each method are given below in the figures.
Query | Time (in seconds) |
Q1 | 4.00 |
Q1 | 4.21 |
Q1 | 4.34 |
Q1 | 4.45 |
Q1 | 4.55 |
Fig 5: The time required for Query Q1 to run (Rule based optimization mode with nested loop, with indexes)
Query | Time (in seconds) |
Q1 | 3.73 |
Q1 | 4.11 |
Q1 | 4.245 |
Q1 | 4.34 |
Q1 | 4.123 |
Fig 6: The time required for Query Q1 to run (Rule based mode with USE_MERGE, with indexes)
Query | Time (in seconds) |
Q1 | 4.00 |
Q1 | 4.21 |
Q1 | 4.34 |
Q1 | 4.45 |
Q1 | 4.55 |
Fig 7: The time required for Query Q1 to run (Rule based mode with USE_HASH, with indexes)
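To check which join method and access path the optimizer actually picked in each of these runs, the execution plan can be inspected with Oracle's EXPLAIN PLAN facility. The following is a minimal sketch, assuming a PLAN_TABLE created with the standard utlxplan.sql script and using the same illustrative stand-in for Q1 as above:

-- record the plan for the hinted statement under the id 'Q1'
EXPLAIN PLAN SET STATEMENT_ID = 'Q1' FOR
SELECT /*+ USE_NL(c) */ p.TID, t.TID, c.SID
FROM part p, parttype t, company c
WHERE p.TID = t.TID
AND p.SID = c.SID;

-- list the recorded plan steps (join method, index usage, etc.)
SELECT id, operation, options, object_name
FROM plan_table
WHERE statement_id = 'Q1'
ORDER BY id;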
Hitting two birds with one stone!
We chose to create indexes on the join attributes because we wanted the join output to be produced fast. No index was created on the Part table, as the performance (response time) of the query degraded very badly when an index was created on it. Hence the only two indexes created were on the TID column of Parttype and the SID column of Company. The DDL statements used were as follows.
CREATE INDEX comp_index ON Company (SID) COMPUTE STATISTICS
This enables us to collect statistics at relatively little cost during the creation of the index. These statistics are stored in the data dictionary for ongoing use by the optimizer in choosing an execution plan for SQL statements, which improves the response time drastically. We know that computing statistics is very expensive, but doing it while the index is being created is the right step, as we are eventually going to use cost based optimization.
The other DDL statement was
CREATE INDEX ptype_index ON parttype (TID) NOSORT
The TID attribute is already sorted and hence we use the NOSORT option. This tells Oracle that the rows are already stored in sorted order, so it does not need to sort them while building the index. This index is created immediately after the initial load of rows into the parttype table, which allows us to save sort time and space.