Continuing my blog post series after 24HOP Russia “Query Processor Internals – Joins”.
In this (and the next one) blog post, we will talk about the Nested Loops Post Optimization Rewrite optimizations.
Some of you may know that Nested Loops join algorithm preserves order of the outer table.
Here is Craig’s Freedman (well known Query Optimizer Team Member) post:
Let’s first setup some test data.
create database nldb; go use nldb; go create table dbo.SalesOrder(SalesOrderID int identity primary key, CustomerID int not null, SomeData char(30) not null); go with nums as ( select top(1000000) rn = row_number() over(order by (select null)) from master..spt_values v1, master..spt_values v2, master..spt_values v3 ) insert dbo.SalesOrder(CustomerID, SomeData) select rn%500000, str(rn,30) from nums; go create index ix_CustomerID on dbo.Salesorder(CustomerID); go
Now we will issue two queries. I use an index hint for demonstration purposes only, there might be no hint needed in real life for such a situation.
select * from dbo.SalesOrder with(index(ix_CustomerID)) where CustomerID < 500; select * from dbo.SalesOrder with(index(ix_CustomerID)) where CustomerID < 1000;
The plans are almost the same (except the second one has a missing index warning):
But the results:
Obviously, the sort order is different. That is one of the examples showing, that you should never rely on the “default” sort order, as there is no such thing. The only way to get a desired sort order is to add an explicit outer order by in your query.
Ok, but still, why the sort order is different.
Implicit Batch Sort
Let’s examine the Nested Loops operator details.
You may notice the Optimized property set to true. That is not very informative, what does it mean – “optimized”. Luckily, we have an explanation in the Craig Freedman’s blog:
What I love in Craig’s posts, that every insignificant (from the first glance) word has a solid meaning.
The catch, even for an experienced users, is that both plans has an “Optimized=true” keyword in the Loops Join plan operator, but only in the second query the reordering is done.
Why do the reordering?
The purpose is to minimize random access impact. If we perform an Index Seek (with a partial scan, probably) we read the entries in the index order, in our case, in the order of CustomerID, which is clearly seen on the first result set. The index on CustomerID does not cover our query, so we have to ask the clustered index for the column SomeData, and actually, we perform one another seek, seeking by the SalesOrderID column. This is a random seek, so what if, before searching by the SalesOrderID we will sort by that key, and then issue an ordered sequence of Index Seeks, turning the random acces into the sequential one, wouldn’t it be more effective?
Yes, it would in some cases, and that is what “optimized” property tells us about. However, we remember, that it is not necessarily leads to the real reordering. As for comparing the real impact, I will refer you to the actual Craig’s post or leave it as a homework.
How it is implemented inside SQL Server?
Let’s issue the query with some extra trace flags, showing the output physical tree (the tree of the physical operators) and the converted tree (the tree of the physical operators converted to a plan tree and ready to be compiled into the executable plan by the Query Executor component).
select * from dbo.SalesOrder with(index(ix_CustomerID)) where CustomerID < 1000 option( recompile , querytraceon 3604 , querytraceon 8607 , querytraceon 7352 );
The first one yellow is the output tree of the physical operators, the result of the optimization. The second yellow one is the converted tree ready to become a plan, for the execution compilation.
Notice the in the first one we have no mysterious nodes, like “??? ???”. But we have two of them in the second one. That is the result of Post Optimization Rewrites phase introducing some extra optimizations for the Nested Loops.
The first one red is for another optimization – called “Prefetching” and displayed as “WithUnorderedPrefetch = true” in the plan properties. That is not the topic of this blog post, but in case you are interested, please, refer to the Paul White’s great article Nested Loops Prefetching.
The second one red is for the Batch Sort – our case.
You may be interested to know if there is a special iterator for that mysterious node in the executable plan, and there is one. It is called CQScanBatchSortNew. Here how it looks like in the WinDbg call stack.
The marked area represents a call to the iterator CQScanBatchSortNew which is responsible for the “OPTIMIZED” property and sorting optimization.
Below you may also see the Prefetch iterator, mentioned above, and well described in the Paul’s article.
Not pretending to the Paul’s drawing skills, but to be clear, the artificial plan, with those artificially drawn operators might look like this.
Explicit Batch Sort
The implicit batch sort optimization is a lightweight optimization, that is done on the Post Optimization Rewrite phase, however, that kind of optimization might be done before, by explicitly introducing the Sort operator. The decision to do one or another is a cost based decision.
First let’s fool the optimizer to pretend there is much more rows (just to not create a huge test database):
update statistics dbo.SalesOrder with rowcount = 10000000, pagecount = 400000 Now consider this example: select * from dbo.SalesOrder with(index(ix_CustomerID)) where CustomerID < 10000 option( recompile , maxdop 1 , querytraceon 3604 , querytraceon 8607 );
I’m using the hints again only for the easy demonstration, it is not necessary in the real life.
The plan is now:
The sort operator properties clearly shows that the order is a clustered index order.
Unlikely the OPTIMZED property and implicit Batch Sort this kind of decision is done before the Post Optimization Rewrite, and we may see the explicit sort operator in the output physical tree.
The difference between those two sorting optimizations
Though they are doing the similar function and serve similar goals, they are slightly different
- The implicit sort is done before the Post Optimization Rewrite
- The implicit batch sort may not spill to disk and is net compromise between the explicit sort and not sorted requests
- Different TF to manage those=)
- TF 2340 – Disable Nested Loops Implicit Batch Sort on the Post Optimization Rewrite Phase
- TF 8744 – Disable Nested Loops Prefetching on the Post Optimization Rewrite Phase
- TF 9115 – Disable both, and not only on the Post Optimization, but the explicit Sort also
All of them, except 2340 are undocumented, so not should be used in production, be careful.
That is all for that post.
There are some other batch sort optimization applications, but not to do the re-write, I’d rather provide the links to further reading and some additional relevant and interesting material.