Analyzing the Information
Analyzing the Information
Statistics Timing
If you are using statistics timing, RUNTIMESTATISTICS provides information about how long each stage of the statement took. An SQL statement has two basic stages within Derby: compilation and execution. Compilation is the work done while the statement is prepared. Compilation is composed of the following stages: parsing, binding, optimization, and code generation. Execution is the actual evaluation of the statement.
Statement Execution Plan
RUNTIMESTATISTICS also provides information about the statement execution plan. The statement execution plan shows how long each node took to evaluate, how many rows were retrieved, whether an index was used, and so on. If an index was used, it shows the start and stop positions for the matching index scan. Looking at the plan can help you determine whether to add an index or to rewrite the query.
A statement execution plan is composed of a tree of result set nodes. A result set node represents the evaluation of one portion of the statement; it returns rows to a calling (or parent) node and can receive rows from a child node. A node can have one or more children. Starting from the top, if a node has children, it requests rows from the children. Usually only the execution plans of DML statements (queries, inserts, updates, and deletes, not dictionary object creation) are composed of more than one node.
For example, consider the following query:
SELECT * FROM Countries
This simple query involves one node only--reading all the data out of the Countries table. It involves a single node with no children. This result set node is called a Table Scan ResultSet. RUNTIMESTATISTICS text for this node looks something like this:
Statement Name: null Statement Text: select * from countries Parse Time: 20 Bind Time: 10 Optimize Time: 50 Generate Time: 20 Compile Time: 100 Execute Time: 10 Begin Compilation Timestamp : 2004-05-25 09:16:21.24 End Compilation Timestamp : 2004-05-25 09:16:21.34 Begin Execution Timestamp : 2004-05-25 09:16:21.35 End Execution Timestamp : 2004-05-25 09:16:21.4 Statement Execution Plan Text: Table Scan ResultSet for COUNTRIES at read committed isolation level using instntaneous share row locking chosen by the optimizer Number of opens = 1 Rows seen = 114 Rows filtered = 0 Fetch Size = 16 constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 10 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched=All Number of columns fetched=3 Number of pages visited=3 Number of rows qualified=114 Number of rows visited=114 Scan type=heap start position: null stop position: null qualifiers: None optimizer estimated row count: 119.00 optimizer estimated cost: 69.35
Consider this second, more complex query:
SELECT Country FROM Countries WHERE Region = 'Central America'
When executed, this query involves two nodes--one to retrieve qualifying rows (the restriction is done at this node) and one to project the requested columns. So, at bottom, there is a TableScanResultSet for scanning the table. The qualifier (Region = 'Central America') is evaluated in this node. These data are passed up to the parent node, called a Project-Restrict ResultSet, in which the rows are projected--only the country column is needed (the first column in the table). RUNTIMESTATISTICS text for these two nodes looks something like this:
Statement Name: null Statement Text: SELECT Country FROM Countries WHERE Region = 'Central America' Parse Time: 10 Bind Time: 0 Optimize Time: 370 Generate Time: 10 Compile Time: 390 Execute Time: 0 Begin Compilation Timestamp : 2004-05-25 09:20:41.274 End Compilation Timestamp : 2004-05-25 09:20:41.664 Begin Execution Timestamp : 2004-05-25 09:20:41.674 End Execution Timestamp : 2004-05-25 09:20:41.674 Statement Execution Plan Text: Project-Restrict ResultSet (2): Number of opens = 1 Rows seen = 6 Rows filtered = 0 restriction = false projection = true constructor time (milliseconds) = 0 open time (milliseconds) = 0 next time (milliseconds) = 0 close time (milliseconds) = 0 restriction time (milliseconds) = 0 projection time (milliseconds) = 0 optimizer estimated row count: 11.90 optimizer estimated cost: 69.35 Source result set: Table Scan ResultSet for COUNTRIES at read committed isolation level using instantaneous share row locking chosen by the optimizer Number of opens = 1 Rows seen = 6 Rows filtered = 0 Fetch Size = 16 constructor time (milliseconds) = 0 open time (milliseconds) = 10 next time (milliseconds) = 0 close time (milliseconds) = 0 next time in milliseconds/row = 0 scan information: Bit set of columns fetched={0, 2} Number of columns fetched=2 Number of pages visited=3 Number of rows qualified=6 Number of rows visited=114 Scan type=heap start position: null stop position: null qualifiers: Column[0][0] Id: 2 Operator: = Ordered nulls: false Unknown return value: false Negate comparison result: false optimizer estimated row count: 11.90 optimizer estimated cost: 69.35
Other, more complex queries such as joins and unions have other types of result set nodes.
For inserts, updates, and deletes, rows flow out of the top, where they are inserted, updated, or deleted. For selects (queries), rows flow out of the top into a result set that is returned to the user.
The Derby Reference Manual shows the many possible ResultSet nodes that might appear in an execution plan.
In addition, read "DML statements and performance", for more information about some of the ways in which Derby executes statements.
Optimizer estimates
RUNTIMESTATISTICS show the optimizer estimates for a particular node. They show the optimizer's estimated row count and the optimizer's "estimated cost."
The estimated row count is the query optimizer's estimate of the number of qualifying rows for the table or index for the entire life of the query. If the table is the inner table of a join, the estimated row count will be for all the scans of the table, not just for a single scan of the table.
The estimated cost consists of a number, which is a relative number; it does not correspond directly to any time estimate. It is not, for example, the number of milliseconds or rows. Instead, the optimizer constructs this number for each possible access path. It compares the numbers and chooses the access path with the smallest number.
Previous Page
Next Page
Table of Contents
Index