This post is continuation of my post Post
In the previous post, I discussed about Database and OS configuration tools and in this post I will discuss about performance monitoring tools.
Performance monitoring and troubleshooting : In performance monitoring we have a great set of tools which are free of cost and can be easily accessible. I won’t go in much details of these tools as there is already lots of stuff available. The main intent is to collate all the tools list at one place. It will help to know what tools are available for free and how best they can fit in your environment.
Performance monitoring and troubleshooting can be done in two ways. 1. Live 2. postmortem.
lets take an example of High CPU issue, if DBAs are asked to troubleshoot live on such issues – they need a very deep understanding of how SQL server work and need lots of SQL scripts/perfmon counters handy to find out the issue. But for DBAs who have level 100-200, they will end up either restarting the SQL server or killing the sessions with heavy queries.
Now, lets say the issue is not fixed in live troubleshooting and the management team wants to know the root cause of the issue i.e. postmortem analysis. If DBA know lots of DMV’s it still can be figured out though still the chances are very less but for the new DBA it will be very difficult to find out something.
To cater to both the needs of live and postmortem analysis, we have enough tools available.
For postmortem performance troubleshooting (These tools can be used for performance baseline as well) , the tools which we have are :
1.MDW – Management Data Warehouse
For Live troubleshooting and monitoring the tools we have, are:
1. Activity monitor
3. Standard SQL reports
Postmortem performance analysis tools
1. Management Data Warehouse (MDW) : – This was the new tool introduced in SQL server 2008. It’s gained a lot of popularity for the postmortem analysis. Basically, It has predefined set of data collectors which can be used to collect and upload the data to MDW database. It captures query statistics,disk activity and server activity. Even Utility control point(will discuss in the next sections) tool uses this in the background.
Here are the reports screenshot which could be used to find out the cause of the issue:
Overall Server Summary:
Query Statistics report where you could see the reports based on CPU/IO/Duration.
Disk summary report
These reports help to find out the issue proactively. After studying it little deeper, I found that they have SSIS packages built to fetch the data from the server and upload it to the data warehouse database.
To read the packages in detail you could open and fetch the packages from SQL server integration services and understand what it does:
Note: Change the job frequency accordingly as the default frequency may not be appropriate for your environment.
2. PSSDIAG / SQLNexus : This is the set of tools which helps to do postmortem analysis. I will discuss briefly about both of them.
PSSDIAG: It’s a framework you could configure the kind of data you want to capture. The dashboard looks like:
In the screenshot, you could see various sections like profiler trace, perfmon, SQLDIAG and other SQL scripts. In this blog, I have discussed about how to configure the pssdiag for the high CPU issue: https://dbcouncil.net/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/ . you could refer this blog for reference to configure the tool.
Once the tool is configured and you save the configuration, it will create a pssd.cab/exe which could use to capture the logs. You have to follow the below steps to run it during the time of the issue:
Steps to run the PSSDIAG:
a. Create a folder named PSSDIAG on your SQL server machine. This folder should be on a drive with plenty of space as diagnostic file collections can be quite large
b. Download the PSSDIAG from FTP site.
c. Open a command prompt. Change the current directory to your PSSDIAG folder and run pssd.exe to extract its contents.
d. Run PSSDIAG.cmd at the command prompt to start the collection process.
e. Once PSSDIAG displays the message, “PSSDIAG Started,” attempt to reproduce your issue.
f. Stop PSSDIAG by pressing CTRL+C.
Please make sure you collect the data for 10-15 minutes minimum to get the complete understanding of the issue.
Note : – Please make sure you configure profiler traces to minimal level. It can cause production outage if not configured appropriately.
Freely downloadable tool, the PSSDIAG: – http://diagmanager.codeplex.com/
Configure PSSDIAG : – http://sql-blogs.com/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/
More information on PSSDIAG: – http://support.microsoft.com/kb/830232
After collecting the logs ,you may think – how to analyze the data. To help us with that, there is another tool SQLNEXUS.
SQLNexus : SQLNEXUS is tool which helps to analyze the logs captured by PSSDIAG utility. The PSSDIAG logs are uploaded to SQLNEXUS tool and it shows the performance bottlenecks, slow queries, blocking in graphical reports. It provides effective and detailed analysis of the issue, if any.
We have to upload the PSSDIAG data on the dashboard like this: Click on import and give the path of the PSSDIAG output folder:
Reports of SQLNexus look like:
Steps to install and run the SQLNEXUS tool:
– Download RML – http://sqlnexus.codeplex.com/wikipage?title=ReadTrace&referringTitle=Home
– Download SQLNexus – http://sqlnexus.codeplex.com/
– How to run SQLNexus – http://sqlnexus.codeplex.com/wikipage?title=GETTING_STARTED
For manual PSSDIAG analysis, you could follow this blog: http://sql-blogs.com/2012/03/23/analyzing-pssdiag-logs-without-using-sqlnexus/
Above mentioned tools are boon to the DBAs to find out RCA. If you are confused about when to use which tool , let me explain you little more about it:
MDW : – This tool can be left configured and can be used for post mortem analysis. The limitation of this tool is that, it won’t give you missing indexes list , complilation / recompliation details etc. It will give you enough details to find out the issue but if you want to go in granular level, then this tool may not give you complete information.
To get more details , you will need to run the PSSDIAG. To run the PSSDIAG, you will have to first install the dashboard where you will configure the package. Once the configuration is done, you could pickup that .cab/exe file and take it to the server where you want to run it.
There you will follow the steps mentioned in PSSDIAG section above and capture the data. Once the data is captured you will need to feed that to SQLNexus so that it could show the reporting in more detailed format to find out the issue.
For manual PSSDIAG analysis, you could follow this blog: https://dbcouncil.net/2012/03/23/analyzing-pssdiag-logs-without-using-sqlnexus/
The drawback of PSSDIAG over MDW is , you will have to run it manually every time when the issue occurs but MDW will keep running in the background and fetch the data for you.
Live troubleshooting tools :
Activity Monitor : This is a very good tool to troubleshoot live issues. It has four sections as shown the below screenshot:
To see this tool you need to right click on the instance name and select activity monitor:
It can be used best for blocking issues or may be other query issues. Under processes you can get the output of sys.dm_exec_requests joined with some other DMVs which can be traced in profiler trace or we can check the query text:
Same way, we could use the other options to get resource waits , IO stats and expensive queries. It can help to at least find out the direction , you need to go.
It will be more effective if used with the standard reports mentioned below:
Standard SQL reports : There is a great set of reports which could be leveraged to do live troubleshooting for performance issues. Here is how find them:
Right click on the instance name as shown in the screenshot:
here is the screenshot of Server Dashboard report :
Similarly, there are lots of other reports which can help to find out high CPU/IO/time queries. To read more about it , please check : http://sudeeptaganguly.wordpress.com/2011/11/04/standard-reports-in-ssms-2008-r2/ and http://blogs.msdn.com/b/buckwoody/archive/2008/03/10/sql-server-management-studio-standard-reports-login-failures.aspx
UCP (Utility Control point): I have seen very few DBAs aware of this utility but can be very useful to track CPU,DB space and disk space on the servers.
Let me explain the best use of this utility by giving an example. lets say you have 10 applications hosted on SQL environment and there is high CPU caused by one of them. Could you think of any method of tracking which application’s query is causing the issue.
May be the answer will be, find out the query causing the issue and then find out which application this query belongs to. But If we use UCP , we can find out the application wise CPU consumption with just few clicks. But we will have deploy the application using data tier project using BIDS then only we will get the list under data tier application.
Application wise resource usage as shown in below screenshot :
This article explains about how to configure the UCP and it’s benefits : http://www.databasejournal.com/features/mssql/sql-server-utility-control-point-ucp.html
The dashboard of UCP looks like :
I have got this question many times, We have all the tools available then why do people go for paid tools for DB monitoring. The answer is , the paid tools give us one integrated framework where we can monitor all this. But almost same work can be accomplished by using all the above mentioned tools but the only difference is we have to hop between these tools to drive a conclusion. These tools are boon for the organizations which don’t have budgets to spend on the monitoring tools.