Query Tuning Approach – For SQL Server and SQL Azure DB – Part 1

It's been really long since I last wrote a post. I have been doing a lot of query tuning in the recent past, and I thought of sharing my approach. I have been using this approach for quite some time now and have had a really good success rate with it.

Just understand one rule: 80% of the time you will encounter queries that can be tuned with intermediate skills. It'll be just 20% or even less where you will need extreme-level query tuning skills. Honestly speaking, I rarely get into situations where a query is too complex to be tuned – really rare. I have been using a really basic approach to combat bad queries, and it's serving my customers really well.

I assume you know how to identify which queries should be picked up for tuning, e.g. if you are facing high CPU issues, you need to pick CPU-intensive queries; if you are facing memory pressure or slow I/O, you need to pick high logical/physical read queries. Now that you know the resource contention on your system, let's see what to do next:

1. Query to get the top resource-intensive queries: I have picked these queries from the PSSDIAG tool mentioned here

print '-- top 10 CPU by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash, COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_worker_time) DESC
) t

print '-- top 10 logical reads by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash,
COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_logical_reads) DESC
) t

print '-- top 10 elapsed time by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash,
sum(execution_count) as 'execution_count',
COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_elapsed_time) DESC
) t

print '-- top 10 CPU by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash,
COUNT (distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_worker_time) DESC;

print '-- top 10 logical reads by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash, sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_logical_reads) DESC;

print '-- top 10 elapsed time by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash, sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM (total_logical_reads) as 'total_logical_reads',
max(REPLACE (REPLACE (SUBSTRING (st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_elapsed_time) DESC;

2. Check the execution plan for the query: Just remember, wherever you got the top resource-intensive queries – be it MDW / Profiler / DMVs – you will have an option to get the SQL handle / plan handle / SQL hash / plan hash. If you have any of these, you can get the execution plan from the cache.

The query to get the execution plan (it returns the query's execution statistics along with the plan) is:

select b.*, a.*
from sys.dm_exec_query_stats a
cross apply sys.dm_exec_query_plan (a.plan_handle) b
where a.query_hash = <Query_hash>

If you want to know execution statistics specific to the plan, e.g. how many times the plan was reused, the same output gives you:

[screenshot]
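For reference, here is a minimal sketch of pulling those execution statistics yourself; the columns are standard sys.dm_exec_query_stats columns, and <Query_hash> is the placeholder you substitute:

select qs.execution_count, qs.creation_time, qs.last_execution_time,
qs.total_worker_time, qs.total_logical_reads,
qp.query_plan
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_query_plan(qs.plan_handle) qp
where qs.query_hash = <Query_hash> -- placeholder: the hash captured in step 1

execution_count tells you how many times the cached plan has been reused since creation_time.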

There can be multiple plans as well, but you need to pick the plans with more reads/CPU/time relative to their execution count.

3. Get the query parameters for the runtime plan: Just remember, the plans you get from these DMVs are the compile-time plans; to get the actual plans, you will need to run the query on your SQL instance. I have observed that DBAs ask the developer team to share the parameters of the query so that they can run the query on a test environment. In fact, if you have the query plan, you can get the parameters from there itself. Let's see how to get that:

Right-click on the graphical execution plan and click on "Show Execution Plan XML" – once the XML plan is visible, search for ParameterList as shown below:

[screenshot]
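For reference, the ParameterList element in the showplan XML looks roughly like this (the parameter name and value below are made up for illustration):

<ParameterList>
  <ColumnReference Column="@CustomerId" ParameterCompiledValue="(42)" />
</ParameterList>

The ParameterCompiledValue attribute holds the compile-time (sniffed) value, which you can plug into your test execution.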

4. Execute the query: Now it's time to execute the query and analyze the actual statistics. Leverage the below commands to delve deeper into the query execution:

SET STATISTICS IO ON --> gets the I/O statistics of the query:

Table 'residual_demo'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS TIME ON --> gets the time statistics of the query, i.e. how long the query took to compile and run:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SET STATISTICS PROFILE ON --> gets the execution plan of the query in text format:

[screenshot]

or get the graphical plan by clicking "Include Actual Execution Plan" in Management Studio before running the query.
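As a quick usage sketch, wrap the query under tuning like this and then read the Messages tab (the table name here is just the one from the sample output above):

set statistics io on;
set statistics time on;

select * from residual_demo; -- run with the parameters captured in step 3

set statistics io off;
set statistics time off;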

Depending on the issue you are troubleshooting, the operator to analyze will vary, e.g. if you are tuning a query for a CPU-intensive workload, probably the first thing to look at will be a sort; for high logical reads, it will be index/table scans.

In this post, we got as far as the execution plan; now the tuning process has to start. In the next post, I will talk about the operators you can pick for tuning.

HTH!


SQL UG meet Delhi NCR – Theme – Performance and Tuning of SQL Server

This event was held on 19th September, and we had DB professionals coming from all over the Delhi NCR region. The agenda of this event was:
1. SQL Performance Troubleshooting using PSSDIAG and SQLNexus
2. Reduce downtime on Production Server – How to Manage SQL Server Proactively
3. Increase IOPS to handle SQL workloads using Windows 2012 storage spaces

SQL Performance Troubleshooting using PSSDIAG and SQLNexus: This session was delivered by Amit Khandelwal, Support Escalation Engineer, India GTSC. He:
1. Explained PSSDIAG configuration and SQL Nexus in detail.
2. Helped the audience understand the use of these tools while troubleshooting performance issues.
3. Explained I/O issues and tips to troubleshoot them.
4. Shared some trace flags which can be enabled on all SQL environments as best practices.
5. Shared some tricks for triaging SQL performance before delving deeper.

Session PPT can be found here

Reduce downtime on Production Server – How to Manage SQL Server Proactively: This session was delivered by Harsh Chawla, Premier Field Engineer, Microsoft Services (myself), on how to be more proactive while working on production servers. This was a philosophical session on how a DBA can change his mindset while working on the database and be more efficient on the job. The presentation was built around a production-down scenario and how it could have been avoided. The major discussion was around these three rules:

1. Understand your "Present" – just be aware of the application, business purpose/impact, SLA/RPO/RTO and hardware configuration of the SQL Server you are working on.
2. Invest for a better "Future" – follow the right practices and have the correct configuration in place to avoid issues in the future.
3. Don't ignore the "Past" – have the right monitoring and auditing solution in place to track the cause of past issues.

Session PPT can be found here

Increase IOPS to handle SQL workloads using Windows 2012 storage spaces: This session was delivered by Gaurav Srivastava, Service Engineer, Microsoft IT. He talked about how Storage Spaces can give you high disk performance even without using a SAN. Storage Spaces is a feature of Windows Server 2012 and is very useful when you deploy SQL on Azure VMs; nevertheless, the same feature can be used even for on-premises SQL Server deployed on Windows Server 2012. He talked about:
1. Storage pools
2. Disk caching
3. Storage space types – simple, mirror and parity
4. Tools like Diskspd and SQLIO to see disk IOPS and latency
5. A demo of creating storage pools and spaces using PowerShell and the GUI

Session PPT and Scripts can be found here

Disclaimer – The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided "AS IS" with no warranties and confer no rights.

HTH!

Amazing SQL server tools every DBA must have!!! – Part 2

This post is a continuation of my previous post.

In the previous post, I discussed database and OS configuration tools; in this post, I will discuss performance monitoring tools.

Performance monitoring and troubleshooting: For performance monitoring we have a great set of tools which are free of cost and easily accessible. I won't go into much detail on these tools, as there is already lots of material available. The main intent is to collate the tools list in one place. It will help you know what tools are available for free and how best they can fit your environment.

Performance monitoring and troubleshooting can be done in two ways: 1. live and 2. postmortem.

Let's take the example of a high CPU issue: if DBAs are asked to troubleshoot such issues live, they need a very deep understanding of how SQL Server works and lots of SQL scripts/perfmon counters handy to find the issue. DBAs at level 100-200 will end up either restarting SQL Server or killing the sessions with heavy queries.

Now, let's say the issue is not fixed during live troubleshooting and the management team wants to know the root cause of the issue, i.e. a postmortem analysis. A DBA who knows lots of DMVs may still be able to figure it out, though the chances are slim; for a new DBA it will be very difficult to find anything.

To cater to both the needs of live and postmortem analysis, we have enough tools available.

For postmortem performance troubleshooting (these tools can be used for performance baselining as well), the tools we have are:
1. MDW – Management Data Warehouse
2. PSSDIAG/SQLNexus
3. PAL

For live troubleshooting and monitoring, the tools we have are:
1. Activity Monitor
2. UCP
3. Standard SQL reports

Postmortem performance analysis tools

1. Management Data Warehouse (MDW): This tool was introduced in SQL Server 2008 and has gained a lot of popularity for postmortem analysis. Basically, it has a predefined set of data collectors which can be used to collect and upload data to the MDW database. It captures query statistics, disk activity and server activity. Even the Utility Control Point tool (discussed in a later section) uses it in the background.

Here are the reports screenshot which could be used to find out the cause of the issue:

Overall Server Summary:

[screenshot]

Query Statistics report, where you can see the reports based on CPU/IO/duration:

[screenshot]

Disk summary report

[screenshot]

These reports help to find the issue proactively. After studying it a little deeper, I found that there are SSIS packages built to fetch the data from the server and upload it to the data warehouse database.

To understand the packages in detail, you could fetch them from SQL Server Integration Services and open them to see what they do:

[screenshot]

Note:  Change the job frequency accordingly as the default frequency may not be appropriate for your environment.

2. PSSDIAG / SQLNexus: This is a set of tools which helps with postmortem analysis. I will briefly discuss both of them.

PSSDIAG: It's a framework in which you configure the kind of data you want to capture. The dashboard looks like:

[screenshot]

In the screenshot, you can see various sections like profiler trace, perfmon, SQLDIAG and other SQL scripts. In this blog post, I have discussed how to configure PSSDIAG for high CPU issues: https://dbcouncil.net/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/. You can refer to that post while configuring the tool.

Once the tool is configured and you save the configuration, it creates a pssd.cab/exe which you can use to capture the logs. Follow the steps below to run it at the time of the issue:

Steps to run the PSSDIAG:

a. Create a folder named PSSDIAG on your SQL Server machine. This folder should be on a drive with plenty of space, as diagnostic file collections can be quite large.
b. Download the PSSDIAG package from the FTP site.
c. Open a command prompt. Change the current directory to your PSSDIAG folder and run pssd.exe to extract its contents.
d. Run PSSDIAG.cmd at the command prompt to start the collection process.
e. Once PSSDIAG displays the message "PSSDIAG Started," attempt to reproduce your issue.
f. Stop PSSDIAG by pressing CTRL+C.

Please make sure you collect the data for a minimum of 10-15 minutes to get a complete picture of the issue.

Note: Please make sure you configure the profiler traces at a minimal level; they can cause a production outage if not configured appropriately.

Reference links:
Freely downloadable PSSDIAG tool: http://diagmanager.codeplex.com/
Configure PSSDIAG: http://sql-blogs.com/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/
More information on PSSDIAG: http://support.microsoft.com/kb/830232

After collecting the logs, you may wonder how to analyze the data. To help with that, there is another tool: SQLNexus.

SQLNexus: SQLNexus is a tool which helps to analyze the logs captured by the PSSDIAG utility. The PSSDIAG logs are uploaded into SQLNexus, and it shows the performance bottlenecks, slow queries and blocking in graphical reports. It provides an effective and detailed analysis of the issue, if any.

We have to upload the PSSDIAG data from the dashboard like this: click on Import and give the path of the PSSDIAG output folder:

[screenshot]

Reports of SQLNexus look like:

[screenshot]

Steps to install and run the SQLNEXUS tool:

Download RML – http://sqlnexus.codeplex.com/wikipage?title=ReadTrace&referringTitle=Home
Download SQLNexus –  http://sqlnexus.codeplex.com/
How to run SQLNexus – http://sqlnexus.codeplex.com/wikipage?title=GETTING_STARTED

For manual PSSDIAG analysis, you could follow this blog: http://sql-blogs.com/2012/03/23/analyzing-pssdiag-logs-without-using-sqlnexus/

The above-mentioned tools are a boon to DBAs for finding the RCA. If you are confused about when to use which tool, let me explain a little more:

MDW: This tool can be left configured and used for postmortem analysis. Its limitation is that it won't give you the missing indexes list, compilation/recompilation details, etc. It will give you enough detail to find the issue, but if you want to go to a granular level, this tool may not give you complete information.

To get more details, you will need to run PSSDIAG. To run PSSDIAG, you first have to install the dashboard where you configure the package. Once the configuration is done, you can pick up the .cab/exe file and take it to the server where you want to run it.

There you follow the steps mentioned in the PSSDIAG section above and capture the data. Once the data is captured, you feed it to SQLNexus so that it can show the reporting in a more detailed format to find the issue.


The drawback of PSSDIAG versus MDW is that you have to run it manually every time the issue occurs, whereas MDW keeps running in the background and fetches the data for you.

Live troubleshooting tools:

Activity Monitor: This is a very good tool for troubleshooting live issues. It has four sections, as shown in the below screenshot:

[screenshot]

To open this tool, right-click on the instance name and select Activity Monitor:

[screenshot]

It is best used for blocking issues, and possibly other query issues. Under Processes you get the output of sys.dm_exec_requests joined with some other DMVs; from there the session can be traced in profiler, or we can check the query text:

[screenshot]
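For reference, here is a minimal sketch of the kind of DMV query the Processes pane approximates (Activity Monitor's exact joins differ; these columns are my choice):

select r.session_id, r.status, r.command, r.wait_type, r.blocking_session_id,
s.login_name, s.host_name, t.[text] as query_text
from sys.dm_exec_requests r
join sys.dm_exec_sessions s on s.session_id = r.session_id
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.session_id > 50; -- skip most system sessions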

In the same way, we can use the other sections to get resource waits, I/O stats and expensive queries. It helps at least to find the direction you need to go in.

It will be more effective if used with the standard reports mentioned below:

Standard SQL reports: There is a great set of reports which can be leveraged for live troubleshooting of performance issues. Here is how to find them:

Right-click on the instance name as shown in the screenshot:

[screenshot]

Here is a screenshot of the Server Dashboard report:

[screenshot]

Similarly, there are lots of other reports which can help find high CPU/IO/time queries. To read more about them, please check: http://sudeeptaganguly.wordpress.com/2011/11/04/standard-reports-in-ssms-2008-r2/ and http://blogs.msdn.com/b/buckwoody/archive/2008/03/10/sql-server-management-studio-standard-reports-login-failures.aspx

UCP (Utility Control Point): I have seen very few DBAs aware of this utility, but it can be very useful for tracking CPU, DB space and disk space on your servers.

Let me explain the best use of this utility with an example. Let's say you have 10 applications hosted on a SQL environment and one of them is causing high CPU. Can you think of a method of tracking which application's query is causing the issue?

Maybe the answer is: find the query causing the issue and then find out which application the query belongs to. But if we use UCP, we can find the application-wise CPU consumption with just a few clicks. Note that the applications have to be deployed as data-tier applications (using a data-tier project in BIDS); only then will they show up in the data-tier application list.

Application-wise resource usage is shown in the below screenshot:

[screenshot]

This article explains how to configure UCP and its benefits: http://www.databasejournal.com/features/mssql/sql-server-utility-control-point-ucp.html

The dashboard of UCP looks like:

[screenshot]

I have got this question many times: if all these tools are available, why do people go for paid DB monitoring tools? The answer is that the paid tools give us one integrated framework where we can monitor all of this. Almost the same work can be accomplished using the above-mentioned tools; the only difference is that we have to hop between them to drive a conclusion. These tools are a boon for organizations which don't have the budget to spend on monitoring tools.

HTH!

Analyzing PSSDIAG logs without using SQLNEXUS

In this blog post we will discuss how to analyze PSSDIAG logs without using the SQLNexus tool. I will talk in reference to my blog post on troubleshooting high CPU issues; please keep it open while reading this post, because I have followed the same steps to analyze the PSSDIAG output.

After installing and configuring PSSDIAG as mentioned in my blog post, here are the next steps to follow.

After configuring PSSDIAG, you will get a cab file named pssd.cab in the chosen location:

[screenshot]

Please make sure the cab is saved in its own folder, because we will have to extract the files and that makes them easier to manage.

Here is how it will look when the files are extracted:

[screenshot]

Before running the PSSDIAG data collection, we have to make sure that the issue is happening or about to happen, and we have to capture the logs for at least 5-10 minutes while the issue is happening. That gives us enough data for analysis.

To start PSSDIAG, double-click PSSDIAG.cmd; it will look like:

[screenshot]

Once you see the below text in the command prompt, the PSSDIAG collection process has started:

[screenshot]

Below is the high CPU scenario for which I collected the PSSDIAG logs:

[screenshot]

Now let's look at the collected logs in the folder named "output", which is created in the same location from which pssdiag.cmd was run.

The output folder will look like:

[screenshot]

There are lots of files in the output; I will discuss the major files which help us find the cause of high CPU. I will follow my blog post on troubleshooting high CPU issues step by step.

Step 1: When you see high CPU utilization on the server, check in Task Manager which process is consuming the CPU. If it's SQL Server, then open Perfmon and add the Processor counters: % Privileged Time, % Processor Time and % User Time.

If you see % Privileged Time hovering around 40% or above, it could be a driver issue. To troubleshoot these kinds of issues you could try:

Using Xperf: http://blogs.msdn.com/b/ntdebugging/archive/2010/03/22/using-xperf-to-root-cause-cpu-consumption.aspx (video: http://msdn.microsoft.com/en-us/performance/cc709422)

If it's user time which is high, then find the process which is consuming the CPU.

[screenshot]

In this case it apparently is, so now we have to check which process is consuming the CPU:

[screenshot]

In this case SQL Server is consuming around 60% of the CPU and the rest is consumed by the remaining processes. SQL Server is the major contributor to the high CPU on the system and is consistently consuming the same amount.

Another way to find out how much CPU SQL Server is consuming: open the file (from the output folder) whose name ends with SQL_2005_Perf_Stats_Startup.OUT.

Look for: Recent SQL Processor Utilization (Health Records)  – you will get the output as follows:

runtime                         system_idle_cpu sql_cpu_utilization
——————————  ————— ——————-
2012-03-23T16:26:33.137        90                   1
2012-03-23T16:26:33.137        64                  20
2012-03-23T16:26:33.137        58                  25
2012-03-23T16:26:33.137        56                  25
2012-03-23T16:26:33.137        55                  25
2012-03-23T16:26:33.137        57                  26
2012-03-23T16:26:33.137        57                  26
2012-03-23T16:26:33.137        44                  35
2012-03-23T16:26:33.137        14                  53
2012-03-23T16:26:33.137         4                  60
2012-03-23T16:26:33.137         3                  59
2012-03-23T16:26:33.137         3                  59
2012-03-23T16:26:33.137         2                  58
2012-03-23T16:26:33.137         3                  58
2012-03-23T16:26:33.137         8                  54
2012-03-23T16:26:33.137        10                  53
2012-03-23T16:26:33.137         2                  56
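If you don't have PSSDIAG output at hand, a similar rolling history can be pulled live from the scheduler monitor ring buffer. Here is a sketch of the commonly used query (SQL 2008 onwards; the XML paths are the standard SchedulerMonitorEvent ones):

select top 10
rec.value('(./Record/@id)[1]', 'int') as record_id,
rec.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') as system_idle_cpu,
rec.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') as sql_cpu_utilization
from (select convert(xml, record) as rec
      from sys.dm_os_ring_buffers
      where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
      and record like N'%<SystemHealth>%') as rb
order by record_id desc;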

Step 2: Let's move to the next step of finding the architecture, which in our case is x64.

Snippet from the MSInfo32 file:

Item    Value
OS Name    Microsoft Windows Server 2008 R2 Enterprise
Version    6.1.7600 Build 7600
System Name    ANMOL-PC
System Type    x64-based PC
Processor    Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz, 2933 Mhz, 4 Core(s), 8 Logical Processor(s)

Step 3: Check if there are any trace flags enabled.

To get this, we can check the SQL error log from SQL Server startup; the file name is SQL_Base_Errorlog_Shutdown.TXT. In this case we don't have any trace flags (snippet from the error log):

2012-03-23 11:50:55.060 Server Microsoft SQL Server 2008 (RTM) – 10.0.1600.22 (X64)
Jul  9 2008 14:17:44
Copyright (c) 1988-2008 Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )
2012-03-23 11:50:55.060 Server (c) 2005 Microsoft Corporation.


Step 4: Check the below-mentioned SQL parameters.

To check the sp_configure output, look at the file whose name ends with sp_sqldiag_Shutdown.OUT.

In this case:

max degree of parallelism                     0          64            0           0
max server memory (MB)                       16  2147483647         2048        2048
min server memory (MB)                        0  2147483647            0           0
priority boost                                0           1            0           0

In this case we are safe because we have:
1. Set max server memory.
2. Not set priority boost.
3. Left max degree of parallelism at the default; as we have 8 logical processors, we don't need to set a value unless required.
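On a live server, the same values can be read with sp_configure ('show advanced options' has to be enabled first to see all of them):

exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max degree of parallelism';
exec sp_configure 'max server memory (MB)';
exec sp_configure 'priority boost';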


Step 5: Run DBCC MEMORYSTATUS and check the USERSTORE_TOKENPERM memory clerk.

To check the DBCC MEMORYSTATUS output, look at the file whose name ends with DBCC_MEMORYSTATUS_Startup.OUT.

In our case, it is normal:

USERSTORE_TOKENPERM (node 0)             KB
—————————————- ———–
VM Reserved                              0
VM Committed                             0
Locked Pages Allocated                   0
SM Reserved                              0
SM Committed                             0
SinglePage Allocator                     584 <= very low, so no problem here
MultiPage Allocator                      288

CACHESTORE_SQLCP (node 0)                KB
—————————————- ———–
VM Reserved                                        0
VM Committed                                       0
Locked Pages Allocated                             0
SM Reserved                                        0
SM Committed                                       0
SinglePage Allocator                           10128 <= Not very high
MultiPage Allocator                             2152

MEMORYCLERK_SQLQERESERVATIONS (node 0)   KB
—————————————- ———–
VM Reserved                                        0
VM Committed                                       0
Locked Pages Allocated                             0
SM Reserved                                        0
SM Committed                                       0
SinglePage Allocator                          374256 <= High based on the allocated memory
MultiPage Allocator                                0


Step 6: If the memory consumed by CACHESTORE_SQLCP is high, we check compilations and re-compilations; in our case that is not required.

But I will still mention the file in the PSSDIAG output which you need to check for procedure cache pollution: the file name will end with SQL_2005_Mem_Stats_Startup.OUT, and you will have to search for the string -- proccache_pollution

The output will look like :

runtime     plan_count  total_size_in_bytes  cacheobjtype       objtype    usecounts
———– ———– ——————– —————— ———————-
2011-08-22        48768           2898108416 Compiled Plan      Adhoc                4
2011-08-22         5762            281034752 Compiled Plan      Adhoc                1
2011-08-22         5755           1100980224 Compiled Plan      Adhoc                1
2011-08-22         4125            103612416 Compiled Plan      Adhoc                1
2011-08-22         1380            396263424 Compiled Plan      Adhoc                1
2011-08-22         1379            200916992 Compiled Plan      Adhoc                1
2011-08-22         1379            291463168 Compiled Plan      Adhoc                1
2011-08-22         1378            149831680 Compiled Plan      Adhoc                1
2011-08-22         1166            107454464 Compiled Plan      Adhoc                4
2011-08-22         1056            158662656 Compiled Plan      Adhoc                1
2011-08-22         1056             72105984 Compiled Plan      Adhoc                1
2011-08-22         1054             48947200 Compiled Plan      Adhoc                1
2011-08-22          157             11329536 Compiled Plan      Prepared             1
2011-08-22          109              4751360 Compiled Plan      Adhoc                1

But this is just dummy data to show you how it looks. There will be more columns, e.g. short_qry_text, which will show you the query for which parameterization is required.

In the above output, we have to look at plan_count, size and usecounts; a high plan count with a usecount of 1 simply means a new plan was created per execution of that query. For more explanation, please check step 6 in troubleshooting high CPU issues.
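On a live server, a similar picture of plan cache pollution can be sketched from sys.dm_exec_cached_plans; lots of single-use adhoc plans point to missing parameterization:

select objtype, cacheobjtype,
count(*) as plan_count,
sum(cast(size_in_bytes as bigint)) as total_size_in_bytes
from sys.dm_exec_cached_plans
where usecounts = 1
group by objtype, cacheobjtype
order by plan_count desc;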

Step 7: This relates to our issue, i.e. lots of hashing and sorting is happening:

[screenshot]

As we can see, the output exactly matches:

MEMORYCLERK_SQLQERESERVATIONS (node 0) KB
—————————————- ———–
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 374256
MultiPage Allocator 0

Now it's time to check Worktables Created/sec and Workfiles Created/sec.

Worktables Created/sec is high (~15) relative to the number of batches/sec (~2), which simply means lots of sorting is spilling to tempdb:

[screenshot]

Workfiles Created/sec is very low, which shows that no hashing spilled to tempdb:

[screenshot]
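The same counters can also be read from inside SQL Server via sys.dm_os_performance_counters; note these are cumulative values, so take two samples to derive a per-second rate:

select counter_name, cntr_value
from sys.dm_os_performance_counters
where object_name like '%Access Methods%'
and counter_name in ('Worktables Created/sec', 'Workfiles Created/sec');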

In essence:

1. No compilations and re-compilations, which means no parameterization issue.
2. No configuration issues at the SQL Server end.
3. We need to find the query which might have caused the issue.

Please note: the values are low because I ran small batches to reproduce the high CPU.

Now we will move in the direction of hunting down the query which is causing the trouble. We will make use of SQL Server 2008's query/plan hash feature to drill into the issue. In the file whose name ends with Perf_Stats_Snapshot_Shutdown.OUT, look for "top 10 CPU by query_hash".

Here is the output you will get:

runtime     total_worker_time    sample_statement_text
———-  ——————– ———————————-
2012-03-23            1683404272 insert into test (select * from test123 order by col1)
2012-03-23               4432252 SELECT * FROM sys.sysperfinfo
2012-03-23               2461141 SELECT   CONVERT (varchar(30), @ru
2012-03-23               1595088 select db_id() as dbid,     case
2012-03-23               1495085 SELECT CONVERT (varchar(30), @runt
2012-03-23               1062060 SELECT       CONVERT (varchar(30),
2012-03-23               1026058 SELECT CONVERT (varchar(30), @runt
2012-03-23                905051 SELECT CONVERT (varchar(30), @runt
2012-03-23                686038 SELECT      sess.session_id, req.r
2012-03-23                684039 select TOP 10000 * from sys.dm_exe

In the above output, we can see the query which is causing the high CPU: insert into test (select * from test123 order by col1)

We can tune this query to avoid the excessive sorting, which will in turn help avoid the high CPU issue.
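As a sketch of one possible fix (assuming col1 is a sensible clustering key for test123 and the table has no clustered index yet): an index that delivers rows already ordered by col1 lets the optimizer drop the sort operator from the plan.

create clustered index CIX_test123_col1 on test123 (col1);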

I will keep adding more content to this blog as I find more interesting high CPU scenarios.

Configure PSSDIAG to capture high CPU issues

I am sure most of the SQL community is aware of PSSDIAG and how helpful it is when it comes to troubleshooting performance issues. In this blog post, I will discuss how to configure PSSDIAG for high CPU scenarios.

Here are a few blogs I would like to share about how to download and configure PSSDIAG:

Download PSSDIAG: http://diagmanager.codeplex.com/releases/view/67449

Configure PSSDIAG: http://sqlbg.wordpress.com/2011/06/15/pssdiag-and-sqldiag-manager/
http://troubleshootingsql.com/2011/05/25/tools-tips-and-tricks-9-pssdiag-configuration-manager/

Please be aware that this PSSDIAG is only for SQL Server 2005/2008; for earlier versions you will still need to use SQLDIAG: http://support.microsoft.com/kb/830232

Here are the things which we can configure in PSSDIAG (SQL database engine):

1. Profiler Traces
2. SQL DMV outputs
3. Perfmon

Before configuring PSSDIAG, we have to be very clear about the workload on the environment, especially when it's production. When we run PSSDIAG there can be slight performance degradation, and if the workload is very high it can even cause an outage (if not configured properly). The heaviest resource taker in PSSDIAG is the profiler traces, and in busy environments they can be disabled to avoid any outage/issue.

There may be a question in your mind: how to differentiate between low and high workload? Well, this question is open-ended and different people may have different answers. Here is what I think about it (completely based on my experience):

Low workload: less than 100 batches per second
Medium workload: 100-500 (approx.) batches per second
High workload: greater than 500 batches per second

The above is just a rough idea; any load, even under 100 batches per second, can be a high load depending on the type of queries and the throughput of the system. So the configuration with profiler traces needs some thought; otherwise the configuration is very simple:

1. Select the architecture: X64, X86 or IA64.
2. Select the instance type (SQL 2005 or SQL 2008).
3. Enter the machine name and instance name.
4. Uncheck the SQL trace button.
5. Select the files shown in the screenshot (SQL 2008 Perf Stats, SQL 2005 Perf Stats, SQL memory error, SQL blocking) and leave the rest at the defaults unless there is any specific requirement.

Here you go: PSSDIAG without traces for high CPU is ready. This approach works for heavy-load environments, especially for SQL Server 2008, because in SQL 2008 we can find the high CPU queries accurately even without profiler traces. The secret behind this is the new plan hash and query hash feature introduced in SQL Server 2008.

I am sure you will be aware of plan hashes (SQL 2008 onwards only): http://blogs.msdn.com/b/bartd/archive/2008/09/03/query-fingerprints-and-plan-fingerprints_3a00_-the-best-new-sql-2008-feature-you_2700_ve-never-heard-of.aspx (must read). And yes, we capture plan hashes in the PSSDIAG for SQL 2008, which eliminates the need for profiler traces.

But for SQL 2005 we have to rely on sys.dm_exec_query_stats, and as mentioned in the above blog, it's not 100% accurate. Still, it gives us a fair idea of which query may be the culprit.

Now, what if we need to configure PSSDIAG along with the profiler traces? Here are the steps for that:

1. Select the architecture: X64, X86 or IA64.
2. Select the instance type (SQL 2005 or SQL 2008).
3. Enter the machine name and instance name.
4. In profiler traces, select the events you want to capture (specific to high CPU only).
5. Select the files shown in the screenshot (SQL 2008 Perf Stats, SQL 2005 Perf Stats, SQL memory error, SQL blocking) and leave the rest at the defaults unless there is any specific requirement.

Once this is configured, we are ready to run PSSDIAG. For various options on how PSSDIAG can be scheduled to run at specific times, please check this article: http://support.microsoft.com/kb/830232. I have created the PSSDIAG configurations for SQL Server 2008 and 2008 R2; please copy the files, save them as .xml and open them in PSSDIAG by clicking the open button at the top left (PSSDiag_2008, PSSDiag_2008R2). Please test before running on a production server.

In my next post, I will explain how we can analyze the PSSDIAG output logs without using SQL Nexus and what the main files are that we have to look at while troubleshooting high CPU issues.