Index Session Scripts for SQL community Delhi NCR

I spoke about very basic indexing techniques which could help you tune your SQL environment better. When it comes to query tuning, discussion mostly revolves around indexing. I had identified the areas of discussion for Indexing based on my experience on the field. I picked up the common questions and created a PPT for that.

I talked about :

1. Index Seek/Scan and Key/RID lookup
2. Covering Indexes – Composite indexes vs Indexes with included columns
3. Duplicate/Missing/Unused Indexes
4. Read/Write ratio
5. Index Fill Factor

Please check the helpful links for each topics and scripts will be attached with the post:

1. Index Seek/Scan and Key/RID lookup http://www.mssqltips.com/sqlservertutorial/258/eliminating-bookmark-keyrid-lookups

2. Covering Indexes – Composite indexes vs Indexes with included columns:
https://dbcouncil.net/2015/06/03/which-one-is-better-composite-index-or-index-with-included-columns/
https://dbcouncil.net/2015/06/04/which-index-is-better-composite-index-or-index-with-included-columns-part-2/

3. Duplicate/Missing/Unused Indexes –  http://blogs.msdn.com/b/blogdoezequiel/archive/2013/02/19/sql-swiss-army-knife-series-is-indexed.aspx

4. Read Write ratio – http://sqlblog.com/blogs/louis_davidson/archive/2009/06/20/read-write-ratio-versus-read-write-ratio.aspx

5. Index Fill Factor – http://sqlmag.com/blog/what-fill-factor-index-fill-factor-and-performance-part-1

Please find the demo script attached with this post!

Missing_indexes Seek_Scan_lookup Read Write ratioi Duplicate_unused_indexes Composite_index_includedcolumn_index

SQL Server – which Index is better – Composite index or Index with included columns – Part 2

From my previous post the questions were:

1. Based on the logic – the index seek will happen with the leading column of the index and in both the cases it’s col2. If the column exists in both select and where clause then which covering index is better

2. How composite index outperform index with included columns

3. Amidst the overhead of index fragmentation and size of composite index over index with included can help decide which one is better.

I went in depth to understand why composite indexes outperform indexes with included columns. For the basics of composite indexes, please check this blog:

Let’s get started! (Repro attached at the end of this post)

This time, I will create the table with the same definition but the data will vary. The whole answer to this question revolves around data. If you check the previous blog , it’s explained in detail about how uniqueness of data drive the efficiency of the index. That’s the answer in a nutshell.

Let’s understand how data can influence the better query plans.

Step 1:

IF EXISTS (SELECT 1 FROM sys.objects WHERE name = ‘tbl_test’ AND type = ‘U’)
DROP TABLE tbl_test;
Create table tbl_test (col1 int , col2 int, col3 int,col4 int)

Step 2:

DECLARE @count1 BIGINT;
DECLARE @count2 BIGINT;
DECLARE @count3 BIGINT;
DECLARE @count4 BIGINT;

SET @count1 = 1;
SET @count2 = 2;
SET @count3 = 3;
SET @count4 = 4;

WHILE @count2 < 100000
BEGIN
INSERT INTO tbl_test  (col1, col2,col3,col4) VALUES (@count1, @count2,@count3,@count4);
SET @count1 = @count1;
SET @count2 = @count2+1;
SET @count3 = @count3 + 1;
SET @count4 = @count4 +1;
END;

Now, the data in col1 is duplicate , it has 99,998 records with value just 1.

Step 3:

Let’s query the table and we will see the full table scan in the execution plan as expected:

select * from tbl_test

Execution plan snippet –
|–Table Scan(OBJECT:([tempdb].[dbo].[tbl_test]))

Now, let’s create a clustered index

create clustered index idx_col3 on tbl_test(col3)

Now we will see clustered index scan as there is no where clause in the query!

Execution Plan Snippet:

|–Clustered Index Scan(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col3]))

Step 4:

set statistics IO on  -> To get the IO statistics for the query execution
set statistics Time on –> To get the time statistics for the query execution
set statistics profile on –> To get the execution plan for the query execution

Step 5

Let’s run another query and see what’s there in the execution plan:

select col1,col2 from tbl_test where col2=2
and col4=9000 and col1=1

Execution Plan snippet:
|–Clustered Index Scan(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col1]), WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0) AND [tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0)))

It’s a clustered index scan.

Now the question is – How to convert it into a seek
1. Composite index
2. Covering index

Step 6:

Let’s delve little deeper to understand the concept:

  • Create an index with included columns:

Based on the query:

select col1,col2 from tbl_test where col2=2
and col4=9000 and col1=1

we will create index with leading column col1 and included columns col2 and col4. This way the query will have all the columns it needs and it must be a seek.

create index idx_col2 on tbl_test(col1) include (col2,col4)

  • Lets run the same query:
    select col1,col2 from tbl_test where col2=2  –23B is row size
    and col4=9000 and col1=1

Now it’s the time to look into execution plan in detail:

  • I will extract the Index seek iterator and split it into two parts:

Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0)), WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

lets split this execution plan in parts.

Part1 :Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0))

Part2:  WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

Part 1 – In part1 , you can simply see the seek happened on the basis of col1 which was the leading column of the index

Part 2 –  and remaining went into where clause. Now, the question how does it matter because anyways, it was a seek and scanned few rows and got our records.

For the seek operation in this example , SQL will first fetch all the rows corresponding to col1=1 and then will put a where clause for col2 and col4. Just think of a situation where you have heavy duplicate data for the leading column. e.g. in this case we have 99,998 records as duplicate.

It will simply mean, SQL will first bring all 99,998 records from the index and then again put a filter on it for col2 and col4. Therefore there will be lots of extra IOs done.

If we see the IO statistics of this query , we can see there were around 274 logical reads:

Table ‘tbl_test’. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Step 7:

Now, lets create a composite index and see the difference in the execution plan:

create index idx_col2 on tbl_test(col1,col4,col2) with (drop_existing=on)

Now lets run a query and see the execution plan:

select col1,col2 from tbl_test where col2=2
and col4=9000 and col1=1

If you notice index in the execution plan , there is a difference :

|–Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@3],0) AND [tempdb].[dbo].[tbl_test].[col4]=CONVERT_IMPLICIT(int,[@2],0) AND [tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0)) ORDERED FORWARD)

There is no where clause in the execution plan. It means the join was made on all three columns and filtering of row was much better. It didn’t need to bring rows of one column and then filter out. If we see the Io statistics of this execution :

Table ‘tbl_test’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Here we have just 2 logical reads which is far more efficient from the index with included columns. Imagine the scenarios where you have millions of pages and just think about the IO and time composite index can save.

In conclusion : – When the data of the index is highly unique then composite index and index with included columns may perform at the same level for both columns in select and where clause. But if the data is skewed , composite index for the columns in where clause will be a better option. Of course , to cover the columns in the select list , index with included columns is the best bet.

Reference – http://blogs.msdn.com/b/mssqlisv/archive/2008/09/26/designing-composite-indexes.aspx

HTH!

SQL server – which Index is better – Composite index or Index with included columns – Part 1

Indexes are really important subject when it comes to query tuning. There has always been a question around indexes over covering indexes options:

1. Composite Indexes

2. Indexes with included column

Based on the recommendations and readings on the web , the best practice is:

1. Composite indexes should be created to cover the columns in where clause

2. Indexes with included columns should be created to cover the columns in the select
I will try to explain whether creating indexes with included columns will help in both the scenarios. As by doing so, I can still get index seek on the cost of lesser space utilization

Note -I have attached the script for the repro.
Step 1:

Create table tbl_test (col1 int , col2 int, col3 int,col4 int)

Step 2: Insert some data in the table:

DECLARE @count1 BIGINT;

DECLARE @count2 BIGINT;

DECLARE @count3 BIGINT;

DECLARE @count4 BIGINT;

SET @count1 = 1;

SET @count2 = 2;

SET @count3 = 3;

SET @count4 = 4;

WHILE @count1 < 100000000

BEGIN

INSERT INTO tbl_test (col1, col2,col3,col4) VALUES (@count1, @count2,@count3,@count4);

SET @count1 = @count1+1;

SET @count2 = @count2+1;

SET @count3 = @count3 + 1;

SET @count4 = @count4 +1;

END;

Step 3:

select * from tbl_test

Execution plan snippet –
|–Table Scan(OBJECT:([tempdb].[dbo].[tbl_test]))
Created a Clustered index to change it to index scan:

create clustered index idx_col1 on tbl_test(col1)

Execution Plan Snippet:

|–Clustered Index Scan(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col1]))
Step 4:

select col1 from tbl_test where col1=1

Execution plan – Clustered Index Seek
|–Clustered Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col1]), SEEK:([tempdb].[dbo].[tbl_test].[col1]=CONVERT_IMPLICIT(int,[@1],0)) ORDERED FORWARD)

Step 5:

select col1,col2 from tbl_test where col2=2

Execution plan will show scan on the clustered Index :

|–Clustered Index Scan(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col1]), WHERE:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0)))

The reason being there is no index on col2 therefore it goes for a scan.  Once NCI is created on Col2 , query will go for a seek

create index idx_col2 on tbl_test(col2)

 Step 6: Here starts my question:

select col1,col2 from tbl_test where col2=2

and col3=3 –-additional column

Currently, Col2 has non clustered index which covers col1 and col2 but doesn’t cover col3. To make the query go for index seek we need to make this index as a covering index by adding col3 to the index.

Here is the question , whether to go for composite index or index with included column to create a covering Index.

The obvious answer will be to create composite index to get the Index seek as we have the column to be added, in the where clause.

Case 1 :

But even If I create index with included column, I still get the seek as follows:

create index idx_col2 on tbl_test(col2) include (col3)

|–Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0)), WHERE:([tempdb].[dbo].[tbl_test].[col3]=CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

If I see the IO stats of the query:

Table ‘tbl_test’. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Case 2:

create index idx_col2 on tbl_test(col2) include (col3,col4) with (drop_existing=on)

|–Index Seek(OBJECT:([tempdb].[dbo].[tbl_test].[idx_col2]), SEEK:([tempdb].[dbo].[tbl_test].[col2]=CONVERT_IMPLICIT(int,[@1],0) AND [tempdb].[dbo].[tbl_test].[col3]=CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

Table ‘tbl_test’. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

If I see the IO stats of the query:

Table ‘tbl_test’. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In both the case I am getting a seek and with no difference in IO. There is a slight difference in the seek columns in the execution plan (if you notice closely):

1. For the execution plan in the first case, filter criteria is on the single column and remaining is in the where clause

2. In the second case, filter criteria is on both the columns of the index
Question –

1. Based on the logic – the index seek will happen with the leading column of the index and in both the cases it’s col2. If the column exists in both select and where clause then which covering index is better

2. How composite index outperform index with included columns

3. Amidst the overhead of index fragmentation and size of composite index over index with included can help me decide which one is better.

I will answer my questions in the next blog post!

Seek_Scan_lookup – demo