In this post, I continue exploration of SQL Server vNext and we will look at the nonclustered columnstore index updates.
Columnstore index has some internal structures to support updates. In 2014 it was a Delta Store – to accept new inserted rows (when there will be enough rows in delta store, server compresses it and switchs to Columnstore row groups) and a Deleted Bitmap to handle deleted rows. In 2016 there are more internal structures, Mapping Index for a clustered Columnstore index to maintain secondary nonclustered indexes and a deleted buffer to speed up deletes from a nonclustered Columnstore index.
Updates were always split into insert + delete. But that is now changed, if a row locates in a delta store, now inplace updates are possible. Another change is that it is now possible to have a per row (narrow) plan instead of per index (wide) plan.
Let’s make some experiments.
I will create a table, insert 100 000 rows into it and then create a nonclustered Columnstore index. All the rows will go to the compressed row group. After that, I insert one more portion of 100 000 rows, that is not enough to go directly to a compressed row group, so a delta store will be created and the new rows will go to it.
use opt; go -- Creating a table drop table if exists dbo.Orders; create table dbo.Orders(OrderID int, CustomerID int, OrderDate datetime, Note varchar(max)); go -- Inserting 100 000 rows with cte as (select top(100000) rn = row_number() over(order by (select null)) from sys.all_columns c1, sys.all_columns c2) insert dbo.Orders(OrderID, CustomerID, OrderDate, Note) select rn, rn%10000, dateadd(dd, rn%10000, '20161201'), replicate('a', 8001) from cte ; go -- Creating nonclustered columnstore index, and all the rows are compressed and goes to compressed row groups create nonclustered columnstore index ncci on dbo.Orders(OrderID, CustomerID, OrderDate); go -- Inserting 100 000 rows, all these goes go to delta store with cte as (select top(100000) rn = row_number() over(order by (select null)) from sys.all_columns c1, sys.all_columns c2) insert dbo.Orders(OrderID, CustomerID, OrderDate, Note) select rn+100000, rn%10000, dateadd(dd, rn%10000, '20161201'), replicate('a', 8001) from cte ; go -- Check that we have two row groups, compressed and delta store select * from sys.column_store_row_groups where [object_id] = object_id('dbo.Orders')
The result is:
Note: Notice, that I created a varchar(max) column in a table, that was done deliberately, because I wanted to test if LOB columns are supported in vNext. Unfortunatelly, when I tried to create a nonclustered Columnstore index, I got an error:
The statement failed. Column ‘Note’ has a data type that cannot participate in a columnstore index. Omit column ‘Note’.
However, if you create a clustered Columnstore, you will succeed! Though, you will get the following warning in CTP 1.2:
Warning: Using Lob types (NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX)) with Clustered Columnstore Index is in public preview. Do not use it on production data without backup during public preview period.
I think it is great that we can now have a LOB varchar columns in Columnstore, because it was a blocker for some scenarios, at least I have seen some of those.
Now, let’s look at the update plans under compatibility level of SQL Server 2016 (130) and SQL Server vNext (140):
-- SQL Server 2016 alter database opt set compatibility_level = 130; go set showplan_xml on; go update o set o.OrderDate = getdate() from dbo.Orders o where o.OrderID = 100001; go set showplan_xml off; go -- SQL Server vNext alter database opt set compatibility_level = 140; go set showplan_xml on; go update o set o.OrderDate = getdate() from dbo.Orders o where o.OrderID = 100001; go set showplan_xml off; go
The plans are:
The first plan is a per index plan, that means that a server will apply changes in each index one after another separately. The second plan is a per row plan, and the changes are done in one iterator. Usually, if a small portion of rows is updated, the per row plan is cheaper and more convenient. If you want to know more details about wide and narrow plans, I’ll refer you to a great article from Paul White Optimizing T-SQL queries that change data.
We can see, that in vNext the per-row narrow plan is now possible.
Next, we will check the in-place updates for delta stores. That means that an update operator should not be split as insert + delete, but rather done in place as a single update.
We will issue an update of the delta store row (OrderID > 100000) and then query transaction log, to see, what is done, was it an insert + delete or update. We do this first under compatibility level 2016 and then under vNext.
-- Switch to simple recovery mode alter database [opt] set recovery simple with no_wait; go -- SQL Server 2016 alter database opt set compatibility_level = 130; go checkpoint; go update o set o.OrderDate = getdate() from dbo.Orders o where o.OrderID = 100001 option(maxdop 1); go go select Operation, Context, AllocUnitName from fn_dblog(null,null) l where AllocUnitName like 'dbo.Orders%'; go -- SQL Server vNext alter database opt set compatibility_level = 140; go checkpoint; go update o set o.OrderDate = getdate() from dbo.Orders o where o.OrderID = 100002 option(maxdop 1); go go select Operation, Context, AllocUnitName from fn_dblog(null,null) l where AllocUnitName like 'dbo.Orders%'; go
The result is:
You may see, that in case of 2016 there was first LOP_MODIFY_ROW for a base table which is a heap and then LOP_DELETE_ROWS and LOP_INSERT_ROWS for a delta store of dbo.Orders (function fn_dblog is very user friendly and even adds “(Delta)” for an allocation unit name, cool)!
In case of vNext there are just two operations, both of them are LOP_MODIFY_ROW, one for a base heap and one for a delta store.
What I like about it, is that it is one of the additions that you get out of the box simply by upgrading to a newer version (when there will be an official RTM), as Microsoft used to say – it just runs faster. You have less pressure on transaction log and you save CPU instructions doing one operation instead of two, which, I believe, is quite important for the real-time analytics.
If you change the OrderID to the value that is stored in a compressed row group and run it (two times, to exclude PFS and GAM page actions when the deleted buffer is created for the first row), you will see that for a compressed row groups nothing is changed:
That’s all for today, but stay tuned, there are some more interesting query processing related things in vNext to be covered.
Thank you for reading.