Tutorial :Confused about Itzik Ben-Gan's Logical Query Processing order in his SQL Server 2005 book and SQL Server 2008 book



Question:

In the book Inside Microsoft SQL Serverâ„¢ 2005 T-SQL Querying, the author Itzik Ben-Gan tell us the order or SQL Server 2005's logical query processing is:

(8)  SELECT (9) DISTINCT (11) <TOP_specification> <select_list>  (1)  FROM <left_table>  (3)       <join_type> JOIN <right_table>  (2)       ON <join_condition>  (4)  WHERE <where_condition>  (5)  GROUP BY <group_by_list>  (6)  WITH {CUBE | ROLLUP}  (7)  HAVING <having_condition>  (10) ORDER BY <order_by_list>    1.  FROM  2.  ON  3.  OUTER (join)  4.  WHERE  5.  GROUP BY  6.  CUBE | ROLLUP  7.  HAVING  8.  SELECT  9.  DISTINCT  10. ORDER BY  <---------------------- NOTE  11. TOP       <---------------------- NOTE  

In his book Inside Microsoft SQL Server 2008 : T-SQL Querying, he tell us the following logical query processing order:

(1) FROM  (1-J1) Cartesian Product  (1-J2) ON Filter  (1-J3) Add Outer Rows  (2) WHERE  (3) GROUP BY  (4) HAVING  (5) SELECT  (5-1) Evaluate Expressions  (5-2) DISTINCT  (5-3) TOP       <---------------------- NOTE  (6) ORDER BY    <---------------------- NOTE  

Note the order of TOP and ORDER BY in the upper excerpts from these books. They are just opposite. I think these two step is very important, and will give a totally different result with different order. I want to know whether SQL Server 2008 changed something in it's storage engine from SQL Server 2005 or something else reason cause this?

Thanks.


Solution:1

The logical processing order is also documented in this Books Online entry. Be careful to distinguish logical processing order from physical processing order. As the BOL entry notes:

The following steps show the logical processing order, or binding order, for a SELECT statement. This order determines when the objects defined in one step are made available to the clauses in subsequent steps. For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps. Conversely, because the SELECT clause is step 8, any column aliases or derived columns defined in that clause cannot be referenced by preceding clauses. However, they can be referenced by subsequent clauses such as the ORDER BY clause. Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list.

The query optimizer is free to translate the logical requirement specified by the query into any physical execution plan that produces the correct results. Generally, there are many physical alternatives for a given logical query, so it is quite usual for a physical plan to differ fundamentally from the logical processing order (for binding purposes) described above.


Solution:2

Check this out - it's an expose on this issue - and Itzik's book is mentioned. The second order above is correct.


Solution:3

In his 2015 release of T-SQL Querying, Itzik Ben-Gan updated the logical query processing in "LISTING 1-1 Logical query-processing step numbers" as this:

(5) SELECT (5-2) DISTINCT (7) TOP(<top_specification>) (5-1) <select_list>  (1) FROM (1-J) <left_table> <join_type> JOIN <right_table> ON <on_predicate>          | (1-A) <left_table> <apply_type> APPLY <right_input_table> AS <alias>          | (1-P) <left_table> PIVOT(<pivot_specification>) AS <alias>          | (1-U) <left_table> UNPIVOT(<unpivot_specification>) AS <alias>   (2) WHERE <where_predicate>   (3) GROUP BY <group_by_specification>   (4) HAVING <having_predicate>   (6) ORDER BY <order_by_list>   (7) OFFSET <offset_specification> ROWS FETCH NEXT <fetch_specification> ROWS ONLY;  

Unlike the previous 2008 version of the book, ORDER BY (step 6) now occurs before TOP (step 7). Of course, as stated elsewhere, the two are related. As Itzik explains on page 6 of T-SQL Querying:

[TOP] filters the specified number of rows based on the ordering in the ORDER BY clause, or based on arbitrary order if an ORDER BY clause is absent. With OFFSET-FETCH, this phase skips the specified number of rows, and then filters the next specified number of rows, based on the ordering in the ORDER BY clause.

Understanding the above logical query processing steps is important because it explains otherwise unintuitive quirks in SQL.

For example, you cannot use an alias from the SELECT clause (step 5-2) in the WHERE clause (step 2) because the expression engine evaluates the WHEREclause before the SELECT clause.

-- This won't work!  SELECT Cost = Quantity * Price  FROM Orders  WHERE Cost > 500.00;  

As Itzik says in his book, "The phases in the logical processing of a query have a specific order. In contrast, the optimizer can often make shortcuts in the physical execution plan that it generates. Of course, it will make shortcuts only if the result set is guaranteed to be the correct one."


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »