SQL Server – which Index is better – Composite index or Index with included columns – Part 2

From my previous post the questions were:

1. Based on the logic – the index seek will happen with the leading column of the index and in both the cases it’s col2. If the column exists in both select and where clause then which covering index is better

2. How composite index outperform index with included columns

3. Amidst the overhead of index fragmentation and size of composite index over index with included can help decide which one is better.

I went in depth to understand why composite indexes outperform indexes with included columns. For the basics of composite indexes, please check this blog:

Let’s get started! (Repro attached at the end of this post)

This time, I will create the table with the same definition but the data will vary. The whole answer to this question revolves around data. If you check the previous blog , it’s explained in detail about how uniqueness of data drive the efficiency of the index. That’s the answer in a nutshell.

Let’s understand how data can influence the better query plans.

Step 1:

IF EXISTS (SELECT 1 FROM sys.objects WHERE name = ‘tbl_test’ AND type = ‘U’)
DROP TABLE tbl_test;
Create table tbl_test (col1 int , col2 int, col3 int,col4 int)

Step 2:

DECLARE @count1 BIGINT;
DECLARE @count2 BIGINT;
DECLARE @count3 BIGINT;
DECLARE @count4 BIGINT;

SET @count1 = 1;
SET @count2 = 2;
SET @count3 = 3;
SET @count4 = 4;

WHILE @count2 < 100000
BEGIN
INSERT INTO tbl_test  (col1, col2,col3,col4) VALUES (@count1, @count2,@count3,@count4);
SET @count1 = @count1;
SET @count2 = @count2+1;
SET @count3 = @count3 + 1;
SET @count4 = @count4 +1;
END;

Now, the data in col1 is duplicate , it has 99,998 records with value just 1.

Step 3:

Let’s query the table and we will see the full table scan in the execution plan as expected:

select * from tbl_test

Execution plan snippet –
|–Table Scan(OBJECT:([tempdb].[dbo].[tbl_test]))

Now, let’s create a clustered index

create clustered index idx_col3 on tbl_test(col3)

Now we will see clustered index scan as there is no where clause in the query!

Execution Plan Snippet:

|–Clustered Index Scan(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col3]))

Step 4:

set statistics IO on  -> To get the IO statistics for the query execution
set statistics Time on –> To get the time statistics for the query execution
set statistics profile on –> To get the execution plan for the query execution

Step 5

Let’s run another query and see what’s there in the execution plan:

select col1,col2 from tbl_test where col2=2
and col4=9000 and col1=1

Execution Plan snippet:
|–Clustered Index Scan(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col1]), WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0) AND [tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0)))

It’s a clustered index scan.

Now the question is – How to convert it into a seek
1. Composite index
2. Covering index

Step 6:

Let’s delve little deeper to understand the concept:

  • Create an index with included columns:

Based on the query:

select col1,col2 from tbl_test where col2=2
and col4=9000 and col1=1

we will create index with leading column col1 and included columns col2 and col4. This way the query will have all the columns it needs and it must be a seek.

create index idx_col2 on tbl_test(col1) include (col2,col4)

  • Lets run the same query:
    select col1,col2 from tbl_test where col2=2  –23B is row size
    and col4=9000 and col1=1

Now it’s the time to look into execution plan in detail:

  • I will extract the Index seek iterator and split it into two parts:

Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0)), WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

lets split this execution plan in parts.

Part1 :Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0))

Part2:  WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

Part 1 – In part1 , you can simply see the seek happened on the basis of col1 which was the leading column of the index

Part 2 –  and remaining went into where clause. Now, the question how does it matter because anyways, it was a seek and scanned few rows and got our records.

For the seek operation in this example , SQL will first fetch all the rows corresponding to col1=1 and then will put a where clause for col2 and col4. Just think of a situation where you have heavy duplicate data for the leading column. e.g. in this case we have 99,998 records as duplicate.

It will simply mean, SQL will first bring all 99,998 records from the index and then again put a filter on it for col2 and col4. Therefore there will be lots of extra IOs done.

If we see the IO statistics of this query , we can see there were around 274 logical reads:

Table ‘tbl_test’. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Step 7:

Now, lets create a composite index and see the difference in the execution plan:

create index idx_col2 on tbl_test(col1,col4,col2) with (drop_existing=on)

Now lets run a query and see the execution plan:

select col1,col2 from tbl_test where col2=2
and col4=9000 and col1=1

If you notice index in the execution plan , there is a difference :

|–Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0) AND [tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0)) ORDERED FORWARD)

There is no where clause in the execution plan. It means the join was made on all three columns and filtering of row was much better. It didn’t need to bring rows of one column and then filter out. If we see the Io statistics of this execution :

Table ‘tbl_test’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Here we have just 2 logical reads which is far more efficient from the index with included columns. Imagine the scenarios where you have millions of pages and just think about the IO and time composite index can save.

In conclusion : – When the data of the index is highly unique then composite index and index with included columns may perform at the same level for both columns in select and where clause. But if the data is skewed , composite index for the columns in where clause will be a better option. Of course , to cover the columns in the select list , index with included columns is the best bet.

Reference – http://blogs.msdn.com/b/mssqlisv/archive/2008/09/26/designing-composite-indexes.aspx

HTH!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s