Query Tuning Approach – For SQL Server and SQL Azure DB – Part 2

It’s really important for a DBA to identify and tune queries so that applications can perform at their best. If you are interested in learning query tuning, you should at least have an understanding of the following:

  1. The phases of the query processor/optimizer
  2. The purpose of the query optimizer
  3. Indexing
  4. Statistics
  5. Types of execution plans (estimated vs. actual)
  6. Common execution plan operators like index seek/scan, joins, conversions, compute scalar, etc.
  7. Cardinality estimation
  8. The purpose of data distribution in the table
  9. The various DMVs that expose the above information

Based on my experience, the query tuning world revolves around the above topics. If you manage to get a good understanding of them, you have already learnt why a query performs slowly. The rest is an art which one learns with experience.

We tried to cover these topics briefly during our SQL Server Delhi NCR community event as well. Though these topics are really big, we tried to share our field experience and the must-knows on each of them.

There are two approaches which we discussed in the event:
1. Proactive
2. Reactive

1. Proactive approach: This approach is really helpful to reduce the chances of major performance bottlenecks, and it helps to understand the root cause of the ones that do occur. MDW (Management Data Warehouse) is a great tool which is really undervalued. I have personally implemented it in many customers’ environments, and it’s been really helpful to baseline the workload and identify the top bottlenecks and their causes. For more info on this, read this post.

If we understand the system really well, we can identify the cause of issues quickly. Generally, when questions like ‘The application was working fine last month and suddenly performance started degrading’ arise, we have no answer on what might have gone wrong. But if we have a performance baseline of the good scenario, we can simply compare it with the current state and identify the difference.

Moreover, optimal indexing plays a great role in optimal performance. Many DBAs tend to ignore this piece and work on indexing only when missing-index warnings are seen in execution plans. But as a proactive measure, if we can chuck out unused/duplicate indexes from the database, it really helps both DDL and DML operations. This post is really helpful to find out such information from your server; a rough sketch of the idea follows.
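Assuming you run it in the user database in question, and that the instance has been up long enough for the usage counters to be meaningful, a query against sys.dm_db_index_usage_stats can highlight indexes that are maintained but never read:

-- Sketch: indexes updated but never used for seeks/scans/lookups since the last restart.
-- Verify over a full business cycle before dropping anything.
SELECT o.name AS table_name, i.name AS index_name,
s.user_updates, s.user_seeks, s.user_scans, s.user_lookups
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
JOIN sys.objects o ON o.object_id = s.object_id
WHERE s.database_id = DB_ID()
AND o.is_ms_shipped = 0
AND s.user_seeks + s.user_scans + s.user_lookups = 0
ORDER BY s.user_updates DESC;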

In my first post of this series, I discussed how to get the top resource-intensive queries using the query/plan hash. Once we have that information, we can simply start tuning a query after looking at its IO/time statistics and execution plan.

2. Reactive approach: This approach is generally followed when the issue is happening right now and we need to resolve it as quickly as possible. Here, we first try to find the cause of contention, which may be IO, CPU or memory. Once we identify the cause of contention, we pick up the queries accordingly.


If the cause of contention is CPU, the steps mentioned in this post can be followed. If it’s memory/IO, we need to focus on high logical read queries and try to tune them. In my first post of this series, I shared the queries used to identify the top logical read queries. Once we have those queries, we need their IO/time statistics along with their execution plans. As we know, the DMVs share only the compile-time plan; for the run-time execution plan, we need to run the query ourselves.

Once we have these statistics, my approach is as follows:

1. Look at the IO statistics and identify which table has the highest logical/physical reads.
2. Look at the time statistics and see how long the query took to compile. Compile time may be high for really big reporting queries.
3. Once the table with the highest logical reads is identified, look at the execution plan operators with the highest cost percentage; these will mostly be operations on that same table.
4. Look for and fix index/table scans or key lookups in the execution plan.
5. Compare the estimated and actual number of rows in the operators. If the actual rows differ significantly from the estimated rows, check the statistics of the table/columns (a quick check is sketched after this list).
6. Look for warnings in the plan for hash/sort spills, missing statistics or missing indexes.
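For step 5, a quick way to check statistics freshness is sys.dm_db_stats_properties (available on newer builds, SQL Server 2008 R2 SP2 / 2012 SP1 onwards); the table name below is a placeholder:

-- Sketch: last update time and sampling details for all statistics on a table
SELECT s.name AS stats_name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('dbo.YourTable');  -- placeholder table name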

The above steps will resolve 40-50% of cases. Generally, indexing plays a vital role, and fixing that alone will often make the query run faster. When none of the above helps, we need to go a bit deeper, into cardinality estimation, which really depends on the data distribution of the table.


A majority of query issues are caused by wrong cardinality estimation. Once we identify an issue with cardinality estimation, we need to dig in and find out what the cause could be. Always try to work out how the number of estimated rows was derived by the query optimizer, as there is a formula/reason behind every such calculation.
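To reason about where an estimate came from, the statistics histogram is the first place to look. A minimal sketch (the table and statistics names are placeholders):

-- Sketch: inspect the histogram the optimizer uses for its estimates
DBCC SHOW_STATISTICS ('dbo.YourTable', 'IX_YourIndex') WITH HISTOGRAM;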

There are two really great videos from PASS 2014 which talk about major execution plan issues and cardinality estimation. If you go through them, you will understand the logic behind this discussion.

Cardinality Estimations:

Various Query Execution plan patterns:

This post covers the summary of the Delhi NCR community December event, session 2 – Learn SQL Query Tuning.

PPT can be found here. Demo Scripts can be found here.


Delhi NCR Community December Event Session 1 – Leverage Query Store for Query Troubleshooting

On 19th December, we celebrated the anniversary of the SQL Server community for Delhi NCR. For this event, the agenda was as follows:

Session Title | Speaker | Contact Details
Leverage SQL 2016 Query Store for PTO | Pranab Mazumdar | Twitter, Blog
Tips for Query Tuning and Understanding Execution Plan Patterns | Harsh Chawla | Twitter, Blog
Tips for Optimal Query Writing | Sourabh Agarwal | Twitter, Blog

In the first session, Pranab talked about Query Store, a new feature in SQL Server 2016. The best I could make out of the session was that the tool can now capture the run-time plan for us. Previously, the plan cache DMVs gave us the compile-time plan, and to get the run-time plan we had to run the query manually. Moreover, this feature can be enabled on SQL Azure DBs as well.
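For reference, enabling Query Store is a single database-level setting; a minimal sketch, with the database name as a placeholder:

-- Sketch: turn on Query Store for a database (SQL Server 2016 / SQL Azure DB)
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);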

Moreover, it’s a really handy tool for enforcing the best execution plan out of all the execution plans already captured. Generally, we face issues after an upgrade to a newer version, i.e. our critical queries start taking longer than expected due to plan changes. If a backup of the previous plan hasn’t been kept, it’s difficult to restore the execution time back to normal.
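Once Query Store has captured multiple plans for a query, a known-good plan can be pinned. A hedged sketch using the Query Store catalog views (the IDs passed to sp_query_store_force_plan are placeholders you would read from the query output):

-- Sketch: list captured plans with their average duration, then force a good one
SELECT q.query_id, p.plan_id, rs.avg_duration
FROM sys.query_store_query q
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id;

EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 1;  -- placeholder IDs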

Now, we can enable this tool on a test environment and replay the load of the production system. It can capture the execution plans efficiently and help us pick an efficient plan if there is any performance degradation on the upgraded SQL instance due to a new execution plan.
PPT and demo scripts are attached with this post – please feel free to download and try them on SQL Server 2016 instances.

HTH!

Query Tuning Approach – For SQL Server and SQL Azure DB – Part 1

It’s been really long since I wrote a post. I have been doing lots of query tuning in the recent past and thought of sharing my approach. I have been using this approach for quite some time now and have had a really good success rate with it.

Just understand one rule: 80% of the time you will encounter queries which can be tuned with intermediate skills. It’ll be just 20% or even less where you will need some extreme level of query tuning skill. Honestly speaking, I rarely get into situations where the query is too complex to be tuned – really rare. I have been using a really basic approach to combat bad queries and it’s serving my customers really well.

I assume you know how to identify which queries should be picked up for tuning, e.g. if you are facing high CPU issues, you need to pick CPU-intensive queries; if you are facing memory pressure or slow I/O, you need to pick high logical/physical read queries. Now that you know the resource contention on your system, let’s see what to do next:

1. Query to get the top resource-intensive queries: I have picked these queries from the PSSDIAG tool mentioned here.

print '-- top 10 CPU by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash, COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_worker_time) DESC
) t

print '-- top 10 logical reads by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash,
COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_logical_reads) DESC
) t

print '-- top 10 elapsed time by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash,
sum(execution_count) as 'execution_count',
COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_elapsed_time) DESC
) t

print '-- top 10 CPU by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash,
COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_worker_time) DESC;

print '-- top 10 logical reads by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash, sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_logical_reads) DESC;

print '-- top 10 elapsed time by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash, sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_elapsed_time) DESC;

2. Check the execution plan for the query: Just remember, wherever you get the top resource-intensive queries – be it MDW, Profiler or the DMVs – you will get an option to retrieve a SQL handle, plan handle, SQL hash or plan hash. If you have any of these, you can get the execution plan from the cache.

The query to get the execution plan is below. This will give you the query’s execution statistics and its plan:

select b.*, a.*
from sys.dm_exec_query_stats a
cross apply sys.dm_exec_query_plan (a.plan_handle) b
where a.query_hash = <Query_hash>

If you want execution statistics specific to a plan, e.g. how many times the plan was reused, the above output includes those columns (such as execution_count).


There can be multiple plans as well, but you need to pick the plans with more reads/CPU/time relative to their execution count.

3. Get the query parameters for the runtime plan: Just remember, the plans you get from these DMVs are the compile-time plans; to get the actual plans you will need to run the query on your SQL instance. I have observed DBAs asking the developer team to share the parameters of the query so that they can run it on the test environment. In fact, if you have the query plan, you can get the parameters from there itself. Let’s see how to get that:

Right-click on the graphical execution plan and click on ‘Show Execution Plan XML’. Once the XML plan is visible, search for the ParameterList element, which holds the compiled parameter values.
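For illustration, the element looks roughly like this in the plan XML (the parameter name and compiled value here are made up):

<ParameterList>
  <ColumnReference Column="@CustomerId" ParameterCompiledValue="(42)" />
</ParameterList>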

4. Execute the query: Now it’s time to execute the query and analyze the actual statistics. Leverage the below commands to delve deeper into the query execution:

SET STATISTICS IO ON -> gets the IO statistics of the query:

Table 'residual_demo'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS TIME ON -> gets the time statistics of the query, i.e. how long the query took to compile and to run:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SET STATISTICS PROFILE ON -> gets the execution plan of the query in text format.

Or get the graphical execution plan by enabling ‘Include Actual Execution Plan’ in Management Studio.

Depending on the issue you are troubleshooting, the operator to be analyzed will vary. E.g. if you are tuning a query in a CPU-intensive workload, probably the first thing to look at will be sorts; for high logical reads, it will be index/table scans.
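Putting the commands together, a typical tuning session looks something like this (residual_demo is the sample table from the output above; substitute the query under investigation):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM residual_demo;  -- the query being tuned

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;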

In this post, we have got the execution plan, and now we have to start the tuning process. In the next post, I will talk about the operators you can pick for tuning.

HTH!

SQL UG meet Delhi NCR – Theme – Performance and Tuning of SQL Server

This event was held on 19th September, where we had DB professionals coming from all over the Delhi NCR region. The agenda of this event was:
1. SQL Performance Troubleshooting using PSSDIAG and SQLNexus
2. Reduce downtime on Production Server – How to Manage SQL Server Proactively
3. Increase IOPS to handle SQL workloads using Windows 2012 storage spaces

SQL Performance Troubleshooting using PSSDIAG and SQLNexus: This session was delivered by Amit Khandelwal, Support Escalation Engineer, India GTSC. He:
1. Explained PSSDIAG configuration and SQLNexus in detail.
2. Helped the audience understand the use of these tools while troubleshooting performance issues.
3. Explained IO issues and tips to troubleshoot them.
4. Shared some trace flags which can be enabled on all SQL environments as best practices.
5. Shared some tricks for triaging SQL performance before delving deeper.

Session PPT can be found here

Reduce downtime on Production Server – How to Manage SQL Server Proactively: This session was delivered by Harsh Chawla, Premier Field Engineer, Microsoft Services (myself), on how to be more proactive while working on a production server. This was a philosophical session on how a DBA can change his mindset while working on databases and be more efficient on the job. The complete discussion revolved around a production-down scenario and how it could have been avoided, built on these three rules:

1. Understand your “Present” – Just be aware of the Application, business purpose/impact, SLA/RPO/RTO and hardware configuration of SQL Server you are working on.
2. Invest for better “Future” – Follow the right practices and have the correct configuration in place to avoid any issues in future.
3. Don’t Ignore “Past” – Have right monitoring and auditing solution to track the cause of the issues in the past.

Session PPT can be found here

Increase IOPS to handle SQL workloads using Windows 2012 Storage Spaces: This session was delivered by Gaurav Srivastava, Service Engineer, Microsoft IT. He talked about how Storage Spaces can give you high disk performance even without using SANs. Storage Spaces is a feature of Windows Server 2012 and is very useful when you deploy SQL on Azure VMs. Nevertheless, the same feature can be used even for on-premise SQL Server deployed on Windows 2012. He talked about:
1. Storage Pools
2. Disk caching
3. Storage spaces types – Simple, Mirror and Parity
4. Tools like Diskspd and SQLIO to measure disk IOPS and latency
5. A demo of creating storage pools and spaces using PowerShell and the GUI

Session PPT and Scripts can be found here

Disclaimer – The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confer no rights.

HTH!

List of SQL server readiness links for Self learning!

I have been interacting with DBAs and consultants as part of my work for a long time now. I keep getting a common request for SQL learning resources, mostly for a collated list of blogs/learning materials for self-study. I thought of writing this post to gather those resources in one place; you can simply add it to your browser favorites and refer to it at leisure. If you are new to SQL Server or enhancing your skills on it, this information will help you grow further.

In this post you will see:

1. A list of Microsoft SQL Server teams’ blog websites
2. Free SQL learning video libraries
3. SQL fundamentals and troubleshooting learning links
4. The website for reporting SQL Server bugs
5. Sites to download SQL troubleshooting tools
6. Facebook pages/groups for help and the latest community event updates

Microsoft Team Blogs:
These blogs are managed by Microsoft teams directly and can be referred to for the latest updates and SQL features.

SQL Customer Advisory Team blog – http://blogs.msdn.com/b/sqlcat/
SQL Server CSS Engineers’ blog – http://blogs.msdn.com/b/psssql/
SQL server PFE blog – http://blogs.msdn.com/b/sql_pfe_blog/
Microsoft GTSC – SQL server troubleshooting blog – http://blogs.msdn.com/b/sqlserverfaq/
Microsoft SQL server Team Blog – http://blogs.technet.com/b/dataplatforminsider/

SQL Video Libraries:
You can view/download free videos from these websites for your SQL server learning –

Free videos for SQL learning – https://www.sqlskills.com/sql-server-resources/sql-server-mcm-training-videos/
Free Videos from Microsoft Team – https://channel9.msdn.com/
Free Videos from Microsoft GTSC Team – http://blogs.msdn.com/b/sqlserverfaq/archive/tags/dbvideo/

SQL learning/troubleshooting blogs:
These links can be referred to for SQL fundamentals and troubleshooting learning.

SQL fundamentals and features blog – http://sqlblog.com/
SQL Skills – https://www.sqlskills.com/sql-server-resources/
Query Processing and Optimization – http://blogs.msdn.com/b/sqlqueryprocessing/
Bart Duncan’s Query Processing and Optimization Blog – http://blogs.msdn.com/b/bartd/
Craig Freedman’s Query Processing and Optimization Blog – http://blogs.msdn.com/b/craigfr/
Microsoft SQL server Library – https://msdn.microsoft.com/en-us/library/bb545450.aspx
SQL Authority Blog – http://blog.sqlauthority.com/author/pinaldave/
SQL/.net blog – http://blogs.msdn.com/b/arvindsh/
SQL PTO Blog – http://www.brentozar.com/blog/
SQL AlwaysOn blog – http://blogs.msdn.com/b/sqlalwayson/
SQL troubleshooting blog – http://troubleshootingsql.com/
SQL & BI blog – http://www.sqlserverfaq.net/
SQL PTO blog – http://blogs.msdn.com/b/blogdoezequiel/
SQL AlwaysOn and SQL 2014 learning blog – http://sqlserver-help.com/
SQL learning blog – http://sqluninterrupted.com/
SQL/Azure learning – http://dbcouncil.net
SQL troubleshooting blog – http://mssqlwiki.com/
SQL Learning Blog – http://www.sqlservergeeks.com/

Report a bug for SQL Server:
If you suspect you have hit a bug in SQL Server, you can report it or vote for it on this forum –
Microsoft Connect website – http://connect.microsoft.com/

SQL Forums:
If you are stuck on SQL server issues, you can post your questions in these forums to get help –

Microsoft’s SQL server Forum – https://social.msdn.microsoft.com/Forums/en-US/home
Microsoft’s SQL server Forum – https://social.technet.microsoft.com/Forums/sqlserver/en-US/home
SQL server Central – http://www.sqlservercentral.com/Forums/
SQL Team Forums – http://www.sqlteam.com/forums/

SQL Facebook pages/groups:
If you want to join SQL community events for free in your region, you can join these groups for further updates. Moreover, you can also put your questions in these groups:

SQL Server Delhi NCR community – https://www.facebook.com/groups/1537669073140700/
SQLserverFAQ – https://www.facebook.com/groups/213527572040350/
SQL Bangalore Group – https://www.facebook.com/groups/SQLBangalore/
SQL Server DBA group – https://www.facebook.com/groups/sqlserverdbaindia/
MSSQLWIKI – https://www.facebook.com/groups/mssqlwiki/
Mumbai Techie Group – https://www.facebook.com/MumbaiTechieGroup?fref=ts

Troubleshooting tools site:
These can be used to get the latest troubleshooting tools.
Sysinternals – https://technet.microsoft.com/en-us/sysinternals
Codeplex – http://www.codeplex.com/

The list is huge, but I have tried to keep it as concise as possible to keep it relevant for you. I will keep updating it with more links.

HTH!

Unable to shrink the SQL Server database. Is it the same BLOB thing?

This is a very common scenario which I have seen in many customers’ environments: even shrinking the database doesn’t reduce its size. Products which use BLOBs heavily face this issue most often, e.g. Microsoft SharePoint, which deals with BLOBs most of the time.

Many times I have seen customers complaining of the database size even doubling after an upgrade. Most of the time, the issue is found to be due to BLOB storage. In this blog we will talk about resolving only the issues related to BLOB storage.

Now it’s time to build some base to understand the cause of the issue. Let’s start with some Q & A.

Q: When we issue a delete command on a table, what happens in the background? Do the records get deleted immediately?

A: Lots of people have doubts about this. Well, the answer is no, they are not deleted immediately. They are marked as ghost records in the PFS page. A process known as the ghost record cleanup task runs periodically to delete those records physically.

Q: How is this related to the database size issue?

A: When we delete a record, it’s the responsibility of the ghost record cleanup task to delete it physically and reclaim the space, which keeps the database size in check. But the above issue can happen for these reasons:

1. The ghost record cleanup task is not running.
2. The load on the system is very high, with a huge number of deletes and updates, and the ghost cleanup task is not able to catch up.
3. In certain cases, these allocated-but-empty LOB pages may accumulate, and this is by design. When the LOB value to be inserted is larger than one page, we break it up into page-size fragments. And when we insert a page-size fragment, we don’t search the existing LOB pages for free space; we instead just allocate a new page.

This is an optimization: in the typical case, the free-space fragments on the existing pages are all smaller than a page. Since we need a whole page of free space, we might as well skip the search and just allocate a new page. Thus, if we repeatedly roll back insertions of LOB data larger than one page, the database will accumulate empty LOB pages.

Now we will talk about the action steps to resolve this issue:

1. Run DBCC UPDATEUSAGE and see if this sorts out your issue.

It reports and corrects page and row count inaccuracies in the catalog views. These inaccuracies may cause incorrect space usage reports from the sp_spaceused system stored procedure.
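For example (the database name is a placeholder):

-- Sketch: correct page/row count inaccuracies, then re-check the reported usage
DBCC UPDATEUSAGE ('YourDatabase');
EXEC sp_spaceused;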

2. If step 1 doesn’t help, we need to know whether we have ghost records or not. To find out, please run this command:

SELECT object_name(object_id) as Name, record_count, GHOST_RECORD_COUNT, Version_ghost_record_count, INDEX_TYPE_DESC, ALLOC_UNIT_TYPE_DESC
FROM sys.dm_db_index_physical_stats (DB_ID(N'<dbname>'), NULL, NULL, NULL, 'DETAILED')

This may take some time to produce output for all the objects. If the database size is very large, we can first check the largest objects and then run the query for a particular table. To get the object names with the largest allocations, you could run this query:

select name=object_schema_name(object_id) + '.' + object_name(object_id)
, rows=sum(case when index_id < 2 then row_count else 0 end)
, reserved_kb=8*sum(reserved_page_count)
, data_kb=8*sum( case
when index_id<2 then in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count
else lob_used_page_count + row_overflow_used_page_count
end )
, index_kb=8*(sum(used_page_count)
- sum( case
when index_id<2 then in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count
else lob_used_page_count + row_overflow_used_page_count
end )
)
, unused_kb=8*sum(reserved_page_count-used_page_count)
from sys.dm_db_partition_stats
where object_id > 1024
group by object_id
--order by name
order by reserved_kb desc

Once we get the top space consumers, we can pick up the table names and run the query like this:

SELECT object_name(object_id) as Name, record_count, GHOST_RECORD_COUNT, Version_ghost_record_count, INDEX_TYPE_DESC, ALLOC_UNIT_TYPE_DESC
FROM sys.dm_db_index_physical_stats (DB_ID(N'<dbname>'), object_id('<TableName>'), NULL, NULL, 'DETAILED')

3. If we see ghost records, we also need to find out whether the ghost cleanup task is running or not. To check that, we need to enable trace flags with DBCC TRACEON (662, 3605, -1), which will write messages to the SQL error logs. Also, you could run this query:

if exists (select 1 from sys.dm_tran_active_transactions where name='GhostCleanupTask')
select * from sys.dm_tran_active_transactions where name='GhostCleanupTask'

Keep this query running for 10-15 seconds continuously and you should see output, as the ghost cleanup task runs every 5-10 seconds. If the ghost cleanup task is not running, check whether trace flag 661 is enabled (it disables ghost cleanup). If yes, please disable it.
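A quick sketch to check for trace flag 661 and clear it if found:

DBCC TRACESTATUS(661);   -- is it enabled?
DBCC TRACEOFF(661, -1);  -- disable it globally if it is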

4. There is a cumulative update of SQL Server which has a fix for this issue: http://support.microsoft.com/kb/2622823

5. If you can’t update the SQL instance, you could try DBCC CLEANTABLE.

6. If still nothing helps, you could try ALTER INDEX ... REORGANIZE WITH (LOB_COMPACTION = ON).
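For example (the table name is a placeholder):

-- Sketch: compact LOB pages for all indexes on a table
ALTER INDEX ALL ON dbo.YourTable REORGANIZE WITH (LOB_COMPACTION = ON);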

I hope this helps you fix the database space usage issue. Please feel free to post questions if you have any.

Analyzing PSSDIAG logs without using SQLNEXUS

In this blog we will discuss how to analyze PSSDIAG logs without using the SQLNexus tool. In this post I will refer back to my blog on troubleshooting high CPU issues; please keep it open while reading, because I follow the same steps here to analyze the PSSDIAG output.

After installing and configuring PSSDIAG as mentioned in my blog post, here are the next steps we have to follow.

After configuring PSSDIAG, you will get a cab named pssd.cab in the chosen location.

Please make sure the cab is saved in its own folder, because we will have to extract the files, and that makes them easier to manage.


Before running the PSSDIAG data collection, we have to make sure the issue is happening or is about to happen, and we have to capture the logs for at least 5-10 minutes while the issue is happening. That gives us enough data for analysis.

To start PSSDIAG, double-click PSSDIAG.cmd. Once the command prompt indicates that the collection process has started, PSSDIAG is capturing data.

Below is the high CPU scenario for which I collected the PSSDIAG logs.

Now let’s look at the collected logs in the folder named “output”, which is created in the same location from which pssdiag.cmd was run.


There will be lots of files in the output folder; I will discuss the major files which help us find the cause of high CPU. I will follow my blog on troubleshooting high CPU issues step by step.

Step 1: When you see high CPU utilization on the server, check in Task Manager which process is consuming the CPU. If it’s SQL Server, open Perfmon and add the Processor counters: % Privileged Time, % Processor Time and % User Time.

If you see % Privileged Time high, hovering around 40% or above, it could be some driver issue. To troubleshoot those kinds of issues you could try:

Using Xperf: http://blogs.msdn.com/b/ntdebugging/archive/2010/03/22/using-xperf-to-root-cause-cpu-consumption.aspx (video: http://msdn.microsoft.com/en-us/performance/cc709422)

If it’s user time which is high, then find the process which is consuming the CPU. In this case it is user time that is high, so now we have to check which process is consuming it.

Here SQL Server is consuming around 60% of the CPU and the rest is consumed by the remaining processes. SQL Server is the major contributor to the high CPU on the system and is consistently consuming the same amount, so we focus on it.

Another way to find out how much CPU SQL Server is consuming: open the file (from the output folder) whose name ends with SQL_2005_Perf_Stats_Startup.OUT.

Look for “Recent SQL Processor Utilization (Health Records)”; you will get output like this:

runtime                         system_idle_cpu sql_cpu_utilization
------------------------------ --------------- -------------------
2012-03-23T16:26:33.137        90                   1
2012-03-23T16:26:33.137        64                  20
2012-03-23T16:26:33.137        58                  25
2012-03-23T16:26:33.137        56                  25
2012-03-23T16:26:33.137        55                  25
2012-03-23T16:26:33.137        57                  26
2012-03-23T16:26:33.137        57                  26
2012-03-23T16:26:33.137        44                  35
2012-03-23T16:26:33.137        14                  53
2012-03-23T16:26:33.137         4                  60
2012-03-23T16:26:33.137         3                  59
2012-03-23T16:26:33.137         3                  59
2012-03-23T16:26:33.137         2                  58
2012-03-23T16:26:33.137         3                  58
2012-03-23T16:26:33.137         8                  54
2012-03-23T16:26:33.137        10                  53
2012-03-23T16:26:33.137         2                  56

Step 2: Let’s move to the next step, finding the architecture, which in our case is x64.

Snippet from the msinfo32 file:

Item    Value
OS Name    Microsoft Windows Server 2008 R2 Enterprise
Version    6.1.7600 Build 7600
System Name    ANMOL-PC
System Type    x64-based PC
Processor    Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz, 2933 Mhz, 4 Core(s), 8 Logical Processor(s)

Step 3: Check whether any trace flags are enabled.

To get this, we can check the SQL error log from SQL Server startup; the file name is SQL_Base_Errorlog_Shutdown.TXT. In this case we don’t have any trace flags (snippet from the error logs):

2012-03-23 11:50:55.060 Server Microsoft SQL Server 2008 (RTM) – 10.0.1600.22 (X64)
Jul  9 2008 14:17:44
Copyright (c) 1988-2008 Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )
2012-03-23 11:50:55.060 Server (c) 2005 Microsoft Corporation.


Step 4: Check the below-mentioned SQL parameters.

To check the sp_configure output, open the file whose name ends with sp_sqldiag_Shutdown.OUT.

In this case:

name                                    minimum     maximum  config_value  run_value
max degree of parallelism                     0          64            0           0
max server memory (MB)                       16  2147483647         2048        2048
min server memory (MB)                        0  2147483647            0           0
priority boost                                0           1            0           0

In this case we are safe because we have:
1. Set max server memory
2. Not set priority boost
3. Left max degree of parallelism at its default; as we have 8 logical processors, we don’t need to set any value unless required.


Step 5: Run DBCC MEMORYSTATUS and check the USERSTORE_TOKENPERM memory clerk.

To check the DBCC MEMORYSTATUS output, open the file whose name ends with DBCC_MEMORYSTATUS_Startup.OUT.

In our case, it is normal:

USERSTORE_TOKENPERM (node 0)             KB
---------------------------------------- -----------
VM Reserved                              0
VM Committed                             0
Locked Pages Allocated                   0
SM Reserved                              0
SM Committed                             0
SinglePage Allocator                     584 <= very low, so no problem here.
MultiPage Allocator                      288

CACHESTORE_SQLCP (node 0)                KB
---------------------------------------- -----------
VM Reserved                                        0
VM Committed                                       0
Locked Pages Allocated                             0
SM Reserved                                        0
SM Committed                                       0
SinglePage Allocator                           10128 <= Not very high
MultiPage Allocator                             2152

MEMORYCLERK_SQLQERESERVATIONS (node 0)   KB
---------------------------------------- -----------
VM Reserved                                        0
VM Committed                                       0
Locked Pages Allocated                             0
SM Reserved                                        0
SM Committed                                       0
SinglePage Allocator                          374256 <= High based on the allocated memory
MultiPage Allocator                                0


Step 6: If the memory consumed by CACHESTORE_SQLCP is high, we check compilations and re-compilations, which in our case is not required.

But I will still mention the file in the PSSDIAG output which you need to check for procedure cache pollution: the file name ends with SQL_2005_Mem_Stats_Startup.OUT, and you have to search for the string '-- proccache_pollution'.

The output will look like :

runtime     plan_count  total_size_in_bytes  cacheobjtype       objtype    usecounts
----------- ----------- -------------------- ------------------ ----------------------
2011-08-22        48768           2898108416 Compiled Plan      Adhoc                4
2011-08-22         5762            281034752 Compiled Plan      Adhoc                1
2011-08-22         5755           1100980224 Compiled Plan      Adhoc                1
2011-08-22         4125            103612416 Compiled Plan      Adhoc                1
2011-08-22         1380            396263424 Compiled Plan      Adhoc                1
2011-08-22         1379            200916992 Compiled Plan      Adhoc                1
2011-08-22         1379            291463168 Compiled Plan      Adhoc                1
2011-08-22         1378            149831680 Compiled Plan      Adhoc                1
2011-08-22         1166            107454464 Compiled Plan      Adhoc                4
2011-08-22         1056            158662656 Compiled Plan      Adhoc                1
2011-08-22         1056             72105984 Compiled Plan      Adhoc                1
2011-08-22         1054             48947200 Compiled Plan      Adhoc                1
2011-08-22          157             11329536 Compiled Plan      Prepared             1
2011-08-22          109              4751360 Compiled Plan      Adhoc                1

But this is just dummy data to show you how it looks. There will be more columns, e.g. short_qry_text, which will show you the query for which parameterization is required.

In the above output, look at the plan_count, size and usecounts: a high plan_count with a usecount of 1 simply means a new plan was created for every execution of the query. For more explanation, please check step 6 in troubleshooting high CPU issues.
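A quick way to gauge ad hoc plan cache pollution on a live system is to summarize the plan cache; a minimal sketch:

-- Sketch: how much cache is taken per object type? Many single-use Adhoc plans
-- with a large combined size point to a parameterization problem.
SELECT objtype, cacheobjtype, COUNT(*) AS plan_count,
SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS size_mb
FROM sys.dm_exec_cached_plans
GROUP BY objtype, cacheobjtype
ORDER BY size_mb DESC;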

Step 7: This relates to our issue, i.e. lots of hashing and sorting is happening.

The output matches what we saw earlier:

MEMORYCLERK_SQLQERESERVATIONS (node 0)   KB
---------------------------------------- -----------
VM Reserved                              0
VM Committed                             0
Locked Pages Allocated                   0
SM Reserved                              0
SM Committed                             0
SinglePage Allocator                     374256
MultiPage Allocator                      0

Now it’s time to check Worktables/sec and Workfiles/sec.

Worktables/sec is high (~15) relative to the number of batches/sec (~2), which simply means lots of sorting is spilling to tempdb.

Workfiles/sec is very low, which shows that no hashing spilled to tempdb.

In essence:

1. No compilations and re-compilations, which means no parameterization issue.
2. No configuration issues at the SQL Server end.
3. We need to find the query which might have caused the issue.

Please note: the values are low because I ran small batches to reproduce the high CPU.

Now we will move in the direction of hunting down the query causing the trouble. We will make use of SQL Server 2008’s query plan hash feature to drill down into the issue. In the file whose name ends with Perf_Stats_Snapshot_Shutdown.OUT, look for “top 10 CPU by query_hash”.

Here is the output you will get:

runtime     total_worker_time    sample_statement_text
----------  -------------------- ----------------------------------
2012-03-23            1683404272 insert into test (select * from test123 order by col1)
2012-03-23               4432252 SELECT * FROM sys.sysperfinfo
2012-03-23               2461141 SELECT   CONVERT (varchar(30), @ru
2012-03-23               1595088 select db_id() as dbid,     case
2012-03-23               1495085 SELECT CONVERT (varchar(30), @runt
2012-03-23               1062060 SELECT       CONVERT (varchar(30),
2012-03-23               1026058 SELECT CONVERT (varchar(30), @runt
2012-03-23                905051 SELECT CONVERT (varchar(30), @runt
2012-03-23                686038 SELECT      sess.session_id, req.r
2012-03-23                684039 select TOP 10000 * from sys.dm_exe

In the above output, the query causing the high CPU is: insert into test (select * from test123 order by col1)

We can tune this query to avoid excessive sorting, which will in turn help avoid the high CPU issue.

I will keep adding more content to this blog as I find more interesting high CPU scenarios.