Indexing is one of the most debatable topics in databases. One sole purpose to create index is to have better reads. But sometimes, creating too many indexes can reverse the benefits. Eventually, the database size can un-necessarily grow huge causing backup size/disk space/resource utilization to grow. I wanted to share some inputs based on my experience with working on indexing.
There are few factors I’d like to discuss to make the indexing optimal on the tables in the databases:
1. Unused Indexes
2. Duplicate Indexes
3. Missing Indexes
Note : Optimizing the indexes is an ongoing activity, it has to be performed on regular intervals.
Let me set the context before we get into the detailed discussion. As we all know, the indexes are created to make the query run faster i.e. faster search. Intent of a DBA is mostly, to make the query go for index seeks instead of index scan. Ideally, whenever there is some heavy query – DBAs tend to convert index scan to index seek by creating additional indexes. But somewhere we tend to miss on the overhead , over indexing creates. We already know, any update or insert triggers another update in the overall indexes of a table.
Here comes the concept of index read write ratio : This is the ratio of number of reads over the writes on the indexes i.e. my reads on a particular index should always be higher than the index writes. e.g. If I am hitting on the index 10 times in a second – out of 10, if 8 times I am updating the index then it’s an overhead. Because, the intent of creating an index is to improve the reads not adding write overhead.
This whole post is all about these tiny concepts. How to improve the read write ratio so that maximum benefit can be yielded.
1. Unused Indexes : These indexes are not used by any query throughout the system i.e. indexes with no seeks or scans but just writes. These indexes just do one thing i.e. slowing down inserts and updates on the system. The reason being, even if there is no read on the index but with every insert and update these indexes still have to be updated. Therefore, it’s important to remove these shrubs from the database.
Here is the query to find Unused Indexes: http://blogs.msdn.com/b/blogdoezequiel/archive/2012/04/08/sql-swiss-army-knife-12-index-information-galore.aspx#.U1EPkWa6apo
Note: Please make sure the SQL server was up and running for at least 1-2 weeks. So that, the users have touched upon every angle of the application for significant duration of time.
2. Duplicate Indexes : These types of indexes are the other reason of slowing down the DB operations. Recently, I’d a discussion with a DBA saying, when I check the size of my table it’s 20 GB but the size of the indexes is more than 40 GB. It can be possible because of the fill factor or types of data being indexed etc. but still we need to investigate if there are any unnecessary indexes on the database.
Duplicate indexes can be of two types :
1. Completely Identical(Duplicate) – These types of indexes are same in structure (number of columns and data types) but just have different names. e.g. Index A is on column a,b and index B on the same columns a,b.
2. Partially Identical(Redundant) – These types of indexes are same in structure but with difference in included/secondary columns. These indexes are not ideal candidates for the removal straightaway but they can be tweaked to handle the selects from any one of them.
Before we proceed further, just want to make sure that we are aware of the benefits of this DMV sys.dm_db_index_usage_stats. This DMV will give you a clear cut explanation of how the indexing is performing. It keeps the record of all the seeks/scans/updates on all the indexes in the SQL server instance. Refer to the link to understand the output of this DMV: http://technet.microsoft.com/en-us/library/ms188755.aspx
Note : This DMV captures the inputs since the last instance startup time. So, ensure you have enough instance uptime before trusting the output. And any index removal/creation should be done on the test environment before implementing it on production.
Query to fetch redundant Indexes can be found here: http://blogs.msdn.com/b/blogdoezequiel/archive/2012/04/08/sql-swiss-army-knife-12-index-information-galore.aspx#.U1EPkWa6apo
Here are some examples:
Duplicate Indexes output
If we see the above screenshot and see the two pairs of duplicate indexes. The only difference is the change in names but the column names on which the index is built are same. Now the question is which index should be removed. Normally, dropping any index will do but to be on safer side, look for the usage statistics of the index and remove the index with lower usage statistics.
The query to check the usage statistics is: http://www.mssqltips.com/sqlservertip/1239/how-to-get-index-usage-information-in-sql-server/
SELECT OBJECT_NAME(S.[OBJECT_ID]) AS [OBJECT NAME],
I.[NAME] AS [INDEX NAME],
FROM SYS.DM_DB_INDEX_USAGE_STATS AS S
INNER JOIN SYS.INDEXES AS I
ON I.[OBJECT_ID] = S.[OBJECT_ID]
AND I.INDEX_ID = S.INDEX_ID
WHERE OBJECTPROPERTY(S.[OBJECT_ID],’IsUserTable’) = 1
For Redundant indexes:
If we see the above screenshot, the first pair of index is having same column but the sequence of column is different. Therefore, it’s not recommended to remove any of the index. But the second pair has primary column same but the secondary columns are different. Therefore, we need to check the usage statistics of the indexes and then remove the one with lesser seeks/scans.
Missing Indexes : These indexes are favorites of many DBAs , they simply pick up top 10 missing indexes and create them. This approach can be dangerous on the production servers with heavy transaction. The entire theory of indexing will revolve around Read/Write ratio and usage statistics.
There are two most common ways to get the missing indexes.
1. Using the below query:
DECLARE @runtime datetime
SET @runtime = GETDATE()
SELECT CONVERT (varchar, @runtime, 126) AS runtime,
CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure,
‘CREATE INDEX missing_index_’ + CONVERT (varchar, mig.index_group_handle) + ‘_’ + CONVERT (varchar, mid.index_handle)
+ ‘ ON ‘ + mid.statement
+ ‘ (‘ + ISNULL (mid.equality_columns,”)
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ‘,’ ELSE ” END + ISNULL (mid.inequality_columns, ”)
+ ISNULL (‘ INCLUDE (‘ + mid.included_columns + ‘)’, ”) AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
2. By looking at the execution plan.
In this section, I will explain how to check if you are creating the right missing index why you need to consider below factors :
1. Check for Seeks /Scans / Lookups / Avg. User impact
2. Check for read write ratio of the present indexes
1. Seeks /Scans / Lookups / Avg. User impact : Before creating any missing index, it’s important to understand if it’s worth creating this index. Seeks/scans/lookups are the counters which describe if the missing index is created, these many seeks/scans/lookup operations can be anticipated. There is no thumb rule to understand this value and is completely based on the judgment. If SQL server is Up for weeks and this counter is in millions then definitely the index is worth creating. On the other hand, if counter is in some hundreds then we can simply skip creating that particular index.
2. Read Write Ratio : Once it’s identified that index is worth creating then R/W ratio of the present indexes need to be identified. If there are too many indexes already on the target table then , you need to first find the R/W ratio and the check the usage statistics of the index. If possible, drop the eligible indexes.
For more information, please check : http://sqlperformance.com/2013/06/t-sql-queries/missing-index