Strategy for tuning a SQL Server VLDB

I have been thinking about writing this post for quite some time. I thought it would be good to share my success story of making a VLDB run super fast. When I started working on the system, the scenario was:

1. The database size – 1 TB
2. Data insertion rate per day – approx. 25 GB
3. Long-running transactions and heavy blocking throughout the day
4. Log shipping failures because of huge log backups, sometimes 100 GB
5. Replication latency due to long-running transactions
6. The biggest table had 10,000,000,000 records (the main cause of the contention)

It was total chaos. At that point I was wondering where to start. As usual, I started by understanding the environment and tried to find the issues and bottlenecks. MDW (Management Data Warehouse) was a real life saver. MDW and Perfmon helped me assess the health and risks of the overall system.
To read more about MDW, please check this post: https://dbcouncil.net/2013/06/25/amazing-sql-server-tools-every-dba-must-have-part-2/

To start with, I checked the overall SQL Server status: memory, CPU, IO, top bottlenecks and resource-intensive queries. My initial action was just to apply a band-aid solution to restore the system to a normal state. I tuned the top queries, made configuration changes and configured the hardware (disks and memory) optimally. All said and done, the system started performing better.

Then I asked myself: is that sufficient? The answer was a big no. I had to dig into the system to provide a permanent, or at least a long-term, resolution. So I looked back at my initial observations of the system, and the issue that struck me the most was the heavy scans in the queries (more than 1 billion reads). The problem was not just that there were 1 billion reads, but how often those reads were being performed. The question then was: can they be avoided or reduced?

Finally my mission started: I began delving deeper into the system. I found there were many tables with more than 1 billion records and lots of index scans. It seems really common to have more than 1 billion records in a table, but sometimes we need to check with management/project leads/DBAs:

1. Do we really need this many records in the table (data purging)?
2. Do we actively access this many records (data archival)?
3. Can the records be archived or at least partitioned?
4. Are the data types and sizes being used optimally?
5. Is the indexing strategy optimal?
6. Can compression be applied?

Believe me, many times these questions can bring you lots of work 🙂 and eventually relief from the issues mentioned at the beginning of the post. Project leads and management are so busy meeting project deadlines that these activities around huge tables are completely ignored unless there is a major issue.

Any table with more than 1 billion records needs either a partitioning or an archival strategy, in conjunction with a data type assessment, data compression and an indexing strategy. Sometimes the problem is not just badly written queries but also huge amounts of unmanaged data in the tables. In my scenario, I found the tipping point to be exactly that: huge unmanaged data in the tables.
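As a rough starting point, a query along the lines of the sketch below (run in the problem database) lists the largest tables by row count, which makes the purge/archive/partition candidates stand out.

```sql
-- A minimal sketch: list user tables by row count so that purge/archive/
-- partition candidates are obvious.
SELECT  s.name AS schema_name,
        t.name AS table_name,
        SUM(ps.row_count)                      AS total_rows,
        SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.tables  AS t ON t.object_id = ps.object_id
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
WHERE   ps.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY s.name, t.name
ORDER BY total_rows DESC;
```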

For better understanding, let's take a real-life example. Say a food market has 2 million bottles of ketchup, of which 1.8 million have passed their expiry date. If I now have to find the remaining 0.2 million bottles, how much extra time and how many extra fetches will that take? Of course it's going to be huge. It's always preferable to keep only what is required, be it bottles of ketchup in a food market, clothes in our wardrobe, or data in the database.

Again, the “band-aid” solution first: to showcase the value of the plan, it’s always good to start with the band-aid solution, i.e. the one that can show results with minimal tweaks and effort. In my scenario, the band-aid solution was:

1. Better Indexing strategy
2. Data Partitioning
3. Data compression

These three points seemed achievable as band-aid solutions. Planning the archival strategy and the data type assessment were big activities and needed a lot of involvement from busy people (management and project leads). On top of that, these two activities may require coding and design changes, which can easily get stuck.

On the indexing strategy, I will write another blog post soon, but for now I will just brief you: it’s mostly about removing duplicate/unused indexes and creating better indexes that get more seeks. There is a whole lot of dynamics around it which I will discuss in detail later.
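For a flavour of the unused-index part of that exercise, here is a minimal sketch (run per database) that flags nonclustered indexes which are being maintained but never read since the last restart; duplicate indexes need a separate comparison of index keys.

```sql
-- A minimal sketch: nonclustered indexes with no seeks/scans/lookups since the
-- last restart are candidates for review (verify over a full business cycle first).
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        us.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON  us.object_id   = i.object_id
       AND us.index_id    = i.index_id
       AND us.database_id = DB_ID()
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND   i.type_desc = 'NONCLUSTERED'
  AND   ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0
ORDER BY us.user_updates DESC;
```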

For the data partitioning strategy, please refer to my blog post: https://dbcouncil.net/2014/04/08/table-partitioning-have-you-chosen-right-partition-column/
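Just to give a flavour here, a minimal sketch of monthly partitioning on a date column is shown below; the names (pf_OrderDate, ps_OrderDate, dbo.Orders, OrderDate) are illustrative, and the boundary values would come from your own data.

```sql
-- Illustrative monthly partitioning on a date column (names/boundaries are assumptions).
CREATE PARTITION FUNCTION pf_OrderDate (datetime2(0))
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-02-01', '2014-03-01');

CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);   -- in practice, map to separate filegroups

-- Rebuild the existing clustered index (assumed here to be named cix_Orders_OrderDate)
-- onto the partition scheme, which partitions the table itself.
CREATE CLUSTERED INDEX cix_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH (DROP_EXISTING = ON)
ON ps_OrderDate (OrderDate);
```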

For data compression, please refer to this link: http://technet.microsoft.com/en-us/library/dd894051(v=SQL.100).aspx
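As a quick illustration (dbo.Orders is an assumed table name), it is worth estimating the savings first and then rebuilding with compression:

```sql
-- Estimate PAGE compression savings for an (assumed) large table, then apply it.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'Orders',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

ALTER INDEX ALL ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);  -- ONLINE rebuild needs Enterprise edition
```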

You must be thinking: how does this relate back to the 1 billion reads in a query? If we partition the big tables, scans and seeks can be limited to the partition level. Instead of searching 1 billion records, we may now be searching 25 million, which is far better. On top of that, with a good indexing strategy and data compression, the queries will perform more seeks and the number of reads will drop dramatically. When the reads during query execution go down, the queries run faster, which in turn means less load on the disks, fewer long-running transactions, smaller T-log backups and less data latency in replication. Such a big impact!
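To make the partition elimination point concrete, here is a minimal sketch, assuming dbo.Orders is partitioned on OrderDate as above: a predicate on the partition column lets SQL Server touch only the relevant partitions instead of the whole table.

```sql
-- Only the partitions covering March 2014 are touched (assumed table/columns).
SELECT  OrderId, CustomerId, Amount
FROM    dbo.Orders
WHERE   OrderDate >= '2014-03-01'
  AND   OrderDate <  '2014-04-01';

-- Rows per partition, to see how much a single-partition search saves:
SELECT  partition_number, row_count
FROM    sys.dm_db_partition_stats
WHERE   object_id = OBJECT_ID('dbo.Orders')
  AND   index_id  = 1;
```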

Now the impact will be visible and of course management will be happier. But the permanent/long-term resolution is still pending. I will discuss

1. Data Archival / Purging
2. Data Type/size assessment

in my next blog post.

HTH!


SQL Server table partitioning: Have you chosen the right partition column?

It’s been a long time since I’ve written a post, and I’ve always been thinking of writing something unique. Finally, I got something 🙂. Normally I prefer to write long posts, but this time I will try to keep it brief and precise.

I have been working with a particular customer for a long time and wanted to share my success story through a series of blog posts. In this post, I will discuss table partitioning, which was a game changer for us.

Customer Scenario:
1. The database size – 1 TB
2. Data insertion rate per day – approx. 25 GB
3. Long-running transactions and heavy blocking throughout the day
4. Log shipping failures because of huge log backups, sometimes 100 GB
5. Replication latency due to long-running transactions
6. The biggest table had 10,000,000,000 records (the main cause of the contention)

In my next posts, I will discuss how I won this battle. I will share my strategy for finding the tipping points and the plan I followed.

Staying in context: how do we normally choose a partition column? I asked many of my colleagues and read some blogs, and the answers were:

1. Check with the developers which column is most commonly used in the WHERE clause
2. Pick the most expensive queries and check their WHERE clauses
3. Check whether they want to archive the data based on financial year. If yes, then “of course” a date column.

If you ask me, all the answers are correct. The approach is correct, but there are some caveats. Sometimes it may not yield predictable results: the customer may not know what is actually in the WHERE clauses, or the most expensive queries may not be using the columns that need partitioning. As a result, we may end up partitioning a table on the wrong column. I thought about this and tried to find a more predictable approach.

Before I delve deeper, I would like to ask a question:

Let’s say, in a book library, books are searched for by book name or author. Does it make sense to organize the books based on their published dates?

Or

Let’s say I have to organize a book library: how will I decide what to keep where so that a user can find a book as fast as possible?

The answer is that I will think about the user’s search criteria. If users mostly search based on the subject and the name of the book, I will organize by subject and then by name in alphabetical order. In this particular case we already knew that we should partition based on the subject, because that’s what a user will always search for first in a library, and then find the book by its name.

In this case the most heavily used columns will be the subject of the book and then the name of the book. But how did we find this search criteria?

1. By checking with experienced people
2. By learning from other systems and experiences
3. By user feedback

All three are equally important. But let’s say I am not sure all three are 100% correct, and in the worst case no one is aware of the search criteria. In that case, I need something more predictable and reliable. Sometimes the system is very complex and it’s difficult to find the perfect search criteria. That’s what this post is all about, i.e. finding the right search column so that partitioning can be done optimally.

For this customer,  here is what I started with:

1. Checked with the developers for their inputs on the WHERE clauses
2. Checked the customer’s archival strategy

and, in conjunction with the above, something more, as mentioned below:

It’s simple for the tables where archival had to be performed based on dates: we “had” to choose the date columns. But for the tables where there is no archival strategy and the size of the table is huge, how do we choose a partitioning column that gives more predictable results? Trusting the inputs from developers/DBAs is not bad, but we need to explore a little more to be certain about the choice of the right partitioning column.

Here comes the gist:

After capturing all the above inputs, I also leveraged the DMV sys.dm_db_index_usage_stats.

Note: before using this DMV, make sure the SQL Server instance has been up and running for more than a week. The reason is that its output only covers activity since the last startup; the longer the instance has been up, the more useful the information will be. For more information on this DMV, please check the link: http://technet.microsoft.com/en-us/library/ms188755.aspx
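A quick way to check the uptime before trusting the counters (a minimal sketch):

```sql
-- How long has the instance been collecting usage stats?
SELECT  sqlserver_start_time,
        DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_up
FROM    sys.dm_os_sys_info;
```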

To set the context right, this DMV captures index usage information: how many seeks, scans, lookups and updates happened on an index since the last startup of the SQL Server instance.

This DMV is updated by every query that runs on the SQL Server instance. It gives us an overview of which indexes are hit most of the time and which indexes are not being used at all. If we think a little deeper, it tells us which columns are hit most often in WHERE clauses, and eventually which one is our search column.
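A query along the lines of the sketch below produced the sample output shown next; dbo.Dummy stands in for the real table name.

```sql
-- Index usage for one (assumed) table, heaviest-read indexes first.
SELECT  OBJECT_NAME(us.object_id) AS table_name,
        us.object_id, us.index_id,
        us.user_seeks, us.user_scans, us.user_lookups, us.user_updates,
        us.last_user_seek, us.last_user_scan, us.last_user_lookup, us.last_user_update
FROM    sys.dm_db_index_usage_stats AS us
WHERE   us.database_id = DB_ID()
  AND   us.object_id   = OBJECT_ID('dbo.Dummy')
ORDER BY us.user_seeks + us.user_scans + us.user_lookups DESC;
```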

| Table Name | object_id | index_id | user_seeks | user_scans | user_lookups | user_updates | last_user_seek | last_user_scan | last_user_lookup | last_user_update |
|------------|-----------|----------|------------|------------|--------------|--------------|----------------|----------------|------------------|------------------|
| Dummy      | 1234000   | 10       | 25295223   | 42         | 0            | 44575        | 8/8/2013       | 8/8/2013       | NULL             | 8/8/2013         |
| Dummy      | 1234000   | 2        | 12648653   | 2          | 0            | 44575        | 8/7/2013       | 8/6/2013       | NULL             | 8/8/2013         |
| Dummy      | 1234000   | 14       | 89127      | 0          | 0            | 89141        | 8/8/2013       | NULL           | NULL             | 8/8/2013         |
| Dummy      | 1234000   | 1        | 5          | 5          | 12669239     | 89141        | 8/7/2013       | 8/7/2013       | 8/8/2013         | 8/8/2013         |

The above output clearly shows that index_id 10 and index_id 2 are hit most of the time. The stats for the clustered index (index_id 1) show that it is not an optimal clustered index, because most of the time it only serves key lookups.

Now the question arises: what if there are missing indexes that would have had more user_seeks than 25295223? The answer is simple: create that index first and then do the analysis again. The end goal is to find the column with the heaviest hits. If we can find that column, and it also happens to align with the customer’s input, then we have surely got our column for partitioning.
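The missing-index DMVs can be checked with something like the sketch below (treat the suggestions as hints, not gospel):

```sql
-- Heavily requested missing indexes; a high user_seeks here could outweigh
-- anything in sys.dm_db_index_usage_stats.
SELECT  mid.statement          AS table_name,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks,
        migs.avg_user_impact
FROM    sys.dm_db_missing_index_details     AS mid
JOIN    sys.dm_db_missing_index_groups      AS mig
        ON mig.index_handle = mid.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks DESC;
```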

Relating back to the library example of subject and book name: here we can partition based on the column of index 10 and sort based on the column of index 2. That calls for a composite partitioned index, with the key column of index 2 combined with the column of index 10 as the partitioning column.
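In T-SQL terms, that idea looks roughly like the sketch below, with illustrative names: SubjectId stands for the heavily sought column of index 10 and BookName for the key column of index 2.

```sql
-- Partition on the heaviest-hit column, sort within each partition by the secondary one.
CREATE PARTITION FUNCTION pf_Subject (int)
AS RANGE RIGHT FOR VALUES (100, 200, 300);   -- boundaries are illustrative

CREATE PARTITION SCHEME ps_Subject
AS PARTITION pf_Subject ALL TO ([PRIMARY]);

CREATE INDEX ix_Books_Subject_Name
ON dbo.Books (SubjectId, BookName)
ON ps_Subject (SubjectId);
```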

Now you may think: what if the index is a composite index? In that case, an index seek happens only when the search uses the leading column (with or without the secondary columns). So pick the leading column.

In Conclusion
This is very precise and useful information that can be leveraged while partitioning, instead of blindly trusting what developers/DBAs say. The end goal of the whole exercise is to find the heaviest search column so that partitioning yields more predictable and fruitful results.

HTH!