Delhi NCR Community December Event, Session 1 – Leverage Query Store for Query Troubleshooting

On 19th December, we celebrated the anniversary of the SQL Server Community for Delhi NCR. The agenda for the event was as follows:

Session Title | Speaker | Contact Details
Leverage SQL 2016 Query Store for PTO | Pranab Mazumdar | Twitter, Blog
Tips for Query Tuning and Understanding Execution Plan Patterns | Harsh Chawla | Twitter, Blog
Tips for Optimal Query Writing | Sourabh Agarwal | Twitter, Blog

In the first session, Pranab talked about Query Store, a new feature in SQL Server 2016. The best I could make out of the session was that we can now have the runtime plan captured in the tool for us. Before SQL 2014, we used to get only the compile-time plan, and to get the runtime plan we had to run the query manually. Moreover, this feature can also be enabled on SQL Azure DBs.

Moreover, it's a really handy tool for enforcing the best execution plan out of all the execution plans already captured. Generally, we face issues after upgrading to a newer version, i.e. our critical queries start taking longer than expected due to plan changes. If a backup of the previous plan has not been taken, it's difficult to bring the execution time back to normal.

Now we can enable this tool on the test environment and replay the load of the production system. It helps capture the execution plans efficiently and pick the efficient plan if there is any performance degradation on the upgraded SQL instance due to a new execution plan.
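To give a flavour of how this looks in T-SQL, here is a minimal sketch; the database name and the query_id/plan_id values are hypothetical and would come from your own Query Store reports or sys.query_store_plan:

-- Enable Query Store on a database (YourDatabase is a placeholder)
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- If a query regresses after an upgrade, force the previously good plan
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 13;  -- hypothetical ids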
The PPT and demo scripts are attached to this post – please feel free to download them and try them on SQL Server 2016 instances.

HTH!


Query Tuning Approach – For SQL Server and SQL Azure DB – Part1

It's been really long since I wrote a post. I have been doing a lot of query tuning in the recent past and thought of sharing my approach. I have been using this approach for quite some time now and have had a really good success rate with it.

Just understand one rule: 80% of the time you will encounter queries which can be tuned with intermediate skills. It will be just 20% of the time, or even less, that you will need extreme query tuning skills. Honestly speaking, I rarely get into situations where a query is too complex to be tuned – really rare. I have been using a really basic approach to combat bad queries and it has served my customers really well.

I assume you know how to identify which queries should be picked up for tuning, e.g. if you are facing high CPU issues, you need to pick CPU-intensive queries, and if you are facing memory pressure or slow I/O, you need to pick queries with high logical/physical reads. Now that you know the resource contention on your system, let's see what to do next:

1. Query to get the top resource-intensive queries: I have picked these queries from the PSSDIAG tool mentioned here.

print '-- top 10 CPU by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash, COUNT(distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM(total_logical_reads) as 'total_logical_reads',
max(REPLACE(REPLACE(SUBSTRING(st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_worker_time) DESC
) t

print '-- top 10 logical reads by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash,
COUNT(distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM(total_logical_reads) as 'total_logical_reads',
max(REPLACE(REPLACE(SUBSTRING(st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_logical_reads) DESC
) t

print '-- top 10 elapsed time by query_hash --'

select getdate() as runtime, *
from
(
SELECT TOP 10 query_hash,
sum(execution_count) as 'execution_count',
COUNT(distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM(total_logical_reads) as 'total_logical_reads',
max(REPLACE(REPLACE(SUBSTRING(st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_hash
ORDER BY sum(total_elapsed_time) DESC
) t

print '-- top 10 CPU by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash,
COUNT(distinct query_plan_hash) as 'distinct query_plan_hash count',
sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM(total_logical_reads) as 'total_logical_reads',
max(REPLACE(REPLACE(SUBSTRING(st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_worker_time) DESC;

print '-- top 10 logical reads by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash, sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM(total_logical_reads) as 'total_logical_reads',
max(REPLACE(REPLACE(SUBSTRING(st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_logical_reads) DESC;

print '-- top 10 elapsed time by query_plan_hash and query_hash --'

SELECT TOP 10 query_plan_hash, query_hash, sum(execution_count) as 'execution_count',
sum(total_worker_time) as 'total_worker_time',
SUM(total_elapsed_time) as 'total_elapsed_time',
SUM(total_logical_reads) as 'total_logical_reads',
max(REPLACE(REPLACE(SUBSTRING(st.[text], qs.statement_start_offset/2 + 1,
CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.[text]))
ELSE qs.statement_end_offset/2 - qs.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ')) AS sample_statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
group by query_plan_hash, query_hash
ORDER BY sum(total_elapsed_time) DESC;

2. Check the execution plan for the query: Just remember, wherever you get the top resource-intensive queries – be it MDW, Profiler or the DMVs – you will get an option to retrieve the SQL handle, plan handle, SQL hash or plan hash. If you have any of these, you can get the execution plan from the cache.

The following query returns the query execution statistics along with the cached plan:

select b.*, a.*
from sys.dm_exec_query_stats a
cross apply sys.dm_exec_query_plan(a.plan_handle) b
where a.query_hash = <Query_hash>

If you want to know execution statistics specific to a plan, e.g. how many times the plan was reused, the above output gives you columns such as execution_count:

[Screenshot: execution statistics columns from sys.dm_exec_query_stats]

There can be multiple plans as well, but you need to pick the plans with more reads/CPU/time relative to their execution count.
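If the same query_hash has several plans, a sketch like the one below (using the same <Query_hash> placeholder as above) helps rank them so you can pick the heaviest one; adjust the ORDER BY to reads, CPU or elapsed time as needed:

SELECT qs.plan_handle,
       qs.query_plan_hash,
       SUM(qs.execution_count)     AS execution_count,
       SUM(qs.total_worker_time)   AS total_worker_time,
       SUM(qs.total_logical_reads) AS total_logical_reads,
       SUM(qs.total_elapsed_time)  AS total_elapsed_time
FROM sys.dm_exec_query_stats AS qs
WHERE qs.query_hash = <Query_hash>              -- the hash identified in step 1
GROUP BY qs.plan_handle, qs.query_plan_hash
ORDER BY SUM(qs.total_logical_reads) DESC;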

3. Get the query parameters for the runtime plan: Just remember, the plans you get from these DMVs are the compile-time plans, and to get the actual plans you will need to run the query on your SQL instance. I have observed that DBAs ask the developer team to share the parameters of the query so that they can run the query on the test environment. In fact, if you have the query plan, you can get the parameters from there itself. Let's see how to get that:

Right-click on the graphical execution plan and click on 'Show Execution Plan XML'. Once the XML plan is visible, search for ParameterList as shown below:

[Screenshot: ParameterList element in the XML execution plan]
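If you prefer to pull the compiled parameter values straight from the cache instead of opening the XML by hand, a sketch along these lines works against the showplan XML (it reuses the <Query_hash> placeholder and assumes the standard showplan namespace):

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT p.prm.value('@Column', 'nvarchar(128)')                 AS parameter_name,
       p.prm.value('@ParameterCompiledValue', 'nvarchar(max)') AS compiled_value
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS p(prm)
WHERE qs.query_hash = <Query_hash>;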

4. Execute the query: Now it's time to execute the query and analyze the actual statistics. Leverage the commands below to delve deeper into the query execution:

SET STATISTICS IO ON --> gets the I/O statistics of the query, e.g.:

Table 'residual_demo'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS TIME ON --> gets the time statistics of the query, i.e. how long it took to compile and run, e.g.:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SET STATISTICS PROFILE ON --> gets the execution plan of the query in text format:
[Screenshot: text execution plan output]

or get the graphical plan by enabling 'Include Actual Execution Plan' in Management Studio before running the query.
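Putting it together, a minimal pattern for capturing the runtime statistics in SSMS looks like this (the table name is the sample one from the I/O output above; substitute your own query with the parameter values taken from the plan's ParameterList):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- your query goes here, with the sniffed parameter values plugged in
SELECT COUNT(*) FROM dbo.residual_demo;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;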

Depending on the issue you are troubleshooting, the operator to be analyzed will vary, e.g. if you are tuning a query for a CPU-intensive workload, probably the first thing to look at will be sorts, and for high logical reads it will be index/table scans.

In this post, we have got the execution plan, and now we have to start the tuning process. In the next post, I will talk about the operators you can pick for tuning.

HTH!

SQL UG meet Delhi NCR – Theme – Performance and Tuning of SQL Server

This event was held on 19th September, and we had DB professionals coming from all over the Delhi NCR region. The agenda of this event was:
1. SQL Performance Troubleshooting using PSSDIAG and SQLNexus
2. Reduce downtime on Production Server – How to Manage SQL Server Proactively
3. Increase IOPS to handle SQL workloads using Windows 2012 storage spaces

SQL Performance Troubleshooting using PSSDIAG and SQLNexus : This session was delivered by Amit Khandelwal, Support Escalation Engineer, India GTSC. He:
1. Explained PSSDIAG configuration and SQL Nexus in detail.
2. Helped the audience understand the use of these tools while troubleshooting performance issues.
3. Explained I/O issues and tips to troubleshoot them.
4. Shared some trace flags which can be enabled on all SQL environments as best practices.
5. Shared some tricks for triaging SQL performance before delving deeper.

Session PPT can be found here

Reduce downtime on Production Server – How to Manage SQL Server Proactively : This session was delivered by Harsh Chawla, Premier Field Engineer, Microsoft Services (myself) on how to be more proactive while working on a production server. This was a philosophical session on how a DBA can change his mindset while working on the database and be more efficient on the job. The complete discussion was built around a production-down scenario and how it could have been avoided. There was a major discussion around these three rules:

1. Understand your “Present” – Just be aware of the application, business purpose/impact, SLA/RPO/RTO and hardware configuration of the SQL Server you are working on.
2. Invest for a better “Future” – Follow the right practices and have the correct configuration in place to avoid issues in the future.
3. Don’t Ignore the “Past” – Have the right monitoring and auditing solution in place to track the cause of past issues.

Session PPT can be found here

Increase IOPS to handle SQL workloads using Windows 2012 storage spaces : This session was delivered by Gaurav Srivastava, Service Engineer, Microsoft IT. He talked about how Storage Spaces can give you high disk performance even without using SANs. Storage Spaces is a feature of Windows Server 2012 and is very useful when you deploy SQL on Azure VMs. Nevertheless, the same feature can be used even for on-premises SQL Server deployed on Windows 2012. He talked about:
1. Storage Pools
2. Disk caching
3. Storage spaces types – Simple, Mirror and Parity
4. Tools like Diskspd and SQLIO to see the disk IOPS and latency
5. A demo of creating storage pools and spaces using PowerShell and the GUI

Session PPT and Scripts can be found here

Disclaimer – The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

HTH!

List of SQL server readiness links for Self learning!

I have been interacting with DBAs and consultants as part of my work for a long time now. I keep getting a common request for SQL learning resources, mostly for a collated list of blogs and learning materials for self-study. I thought of writing this post to list those resources, so you can simply add them to your browser favourites and refer to them at leisure. If you are new to SQL Server or enhancing your skills, this information will help you grow further.

In this post you will see :

1. List of Microsoft SQL server teams’ Blog websites
2. SQL learning free video library
3. List of blogs for SQL fundamentals and troubleshooting learning
4. Website to report SQL Server bugs
5. Sites to download SQL Troubleshooting tools
6. List of Facebook Pages/groups for help and getting latest community events updates

Microsoft Team Blogs:
These blogs are managed by Microsoft teams directly and can be referred to for the latest updates and SQL features.

SQL Customer Advisory Team blog – http://blogs.msdn.com/b/sqlcat/
SQL server CSS SQL server Engineer’s blog – http://blogs.msdn.com/b/psssql/
SQL server PFE blog – http://blogs.msdn.com/b/sql_pfe_blog/
Microsoft GTSC – SQL server troubleshooting blog – http://blogs.msdn.com/b/sqlserverfaq/
Microsoft SQL server Team Blog – http://blogs.technet.com/b/dataplatforminsider/

SQL Video Libraries:
You can view/download free videos from these websites for your SQL server learning –

Free videos for SQL learning – https://www.sqlskills.com/sql-server-resources/sql-server-mcm-training-videos/
Free Videos from Microsoft Team – https://channel9.msdn.com/
Free Videos from Microsoft GTSC Team – http://blogs.msdn.com/b/sqlserverfaq/archive/tags/dbvideo/

SQL Learning/troubleshooting blog :
These links can be referred for the SQL fundamental and troubleshooting learning.

SQL fundamentals and features blog – http://sqlblog.com/
SQL Skills – https://www.sqlskills.com/sql-server-resources/
Query Processing and Optimization – http://blogs.msdn.com/b/sqlqueryprocessing/
Bart Duncan’s Query Processing and Optimization Blog – http://blogs.msdn.com/b/bartd/
Craig Freedman’s Query Processing and Optimization Blog- http://blogs.msdn.com/b/craigfr/
Microsoft SQL server Library – https://msdn.microsoft.com/en-us/library/bb545450.aspx
SQL Authority Blog – http://blog.sqlauthority.com/author/pinaldave/
SQL/.net blog – http://blogs.msdn.com/b/arvindsh/
SQL PTO Blog – http://www.brentozar.com/blog/
SQL AlwaysON blog – http://blogs.msdn.com/b/sqlalwayson/
SQL troubleshooting blog – http://troubleshootingsql.com/
SQL & BI blog – http://www.sqlserverfaq.net/
SQL PTO blog – http://blogs.msdn.com/b/blogdoezequiel/
SQL AlwaysON and SQL 2014 learning blog – http://sqlserver-help.com/
SQL learning blog – http://sqluninterrupted.com/
SQL /Azure learning  – http://dbcouncil.net
SQL troubleshooting blog – http://mssqlwiki.com/
SQL Learning Blog – http://www.sqlservergeeks.com/

 

Report Bug for SQL server :
If you suspect you have hit a bug with SQL server, you can notify or vote for the bug on this forum –
Microsoft Connect website – http://connect.microsoft.com/

SQL Forums:
If you are stuck on SQL server issues, you can post your questions in these forums to get help –

Microsoft’s SQL server Forum – https://social.msdn.microsoft.com/Forums/en-US/home
Microsoft’s SQL server Forum – https://social.technet.microsoft.com/Forums/sqlserver/en-US/home
SQL server Central – http://www.sqlservercentral.com/Forums/
SQL Team Blog – http://www.sqlteam.com/forums/

SQL Facebook pages/groups:
If you want to join SQL community events for free in your region, you can join these groups for further updates. Moreover, you can also post your questions in these groups:

SQL Server Delhi NCR community – https://www.facebook.com/groups/1537669073140700/
SQLserverFAQ – https://www.facebook.com/groups/213527572040350/
SQL Bangalore Group – https://www.facebook.com/groups/SQLBangalore/
SQL Server DBA group – https://www.facebook.com/groups/sqlserverdbaindia/
MSSQLWIKI – https://www.facebook.com/groups/mssqlwiki/
Mumbai Techie Group – https://www.facebook.com/MumbaiTechieGroup?fref=ts

Troubleshooting tools site:
These sites can be used to get the latest troubleshooting tools.
Sysinternals – https://technet.microsoft.com/en-us/sysinternals
Codeplex – http://www.codeplex.com/

The list is huge, but I have tried to keep it as lean as possible to keep it relevant for you. I will keep updating it with more links.

HTH!

Strategy for tuning SQL server VLDB

I have been meaning to write this post for quite some time. I thought it would be good to share my success story of making a VLDB run super fast. When I started working on it, the scenario was:

1. Database size – 1 TB
2. Data insertion rate per day – approx. 25 GB
3. Long-running transactions and heavy blocking throughout the day
4. Log shipping failures because of huge log backups, sometimes 100 GB
5. Replication latency due to long-running transactions
6. The biggest table had 10,000,000,000 records (the cause of the contention)

It was total chaos, and at that point I was wondering where to start. As usual, I started by understanding the environment and tried to find the issues and bottlenecks. MDW (Management Data Warehouse) was really a life saver; MDW and perfmon helped me assess the health and risks of the overall system.
To read more about MDW, please check this post: https://dbcouncil.net/2013/06/25/amazing-sql-server-tools-every-dba-must-have-part-2/

To start with, I checked the overall SQL Server status: memory, CPU, I/O, top bottlenecks and resource-intensive queries. My initial action was just to give the system a band-aid solution to restore it to a normal state. I tuned the top queries, made configuration changes and configured the hardware optimally, e.g. disks and memory. All said and done, the system started performing better.

Then I asked myself, is that sufficient? The answer was a big no. I had to churn the system to give a permanent, or at least a long-term, solution. So I did a flashback on my initial observation of the system; the issue which struck me the most was the heavy scans in the queries (more than 1 billion reads). The problem was not the 1 billion reads themselves but the number of times that many reads were being performed. Then the question was: can it be avoided or reduced?

Finally my mission started, and I began delving deeper into the system. I found there were many tables with more than 1 billion records, lots of index scans and so on. It seems really common to have more than 1 billion records in a table, but sometimes we need to check with the management/project leads/DBAs:

1. Do we really need this many records in the table (data purging)?
2. Do we actively access this many records (data archival)?
3. Can the records be archived or at least partitioned?
4. Is the data type and size being used optimally?
5. Is the indexing strategy optimal?
6. Can the compression be done?

Believe me, many times these questions can bring you lots of work 🙂 and eventually relief from the issues mentioned at the beginning of the post. Project leads and management are so busy meeting project deadlines that these activities around huge tables are completely ignored unless there is a major issue.

Any table with more than 1 billion records needs either a partitioning or an archival strategy, in conjunction with a data type assessment, data compression and an indexing strategy. Sometimes the problem is not just badly written queries but also huge unmanaged data in the tables. In my scenario, I found the tipping point to be huge unmanaged data in the tables.

For better understanding, let's take a real-life example. Say a food market has 2 million bottles of ketchup, of which 1.8 million have passed their expiry date. If I now have to find the remaining 0.2 million bottles, how much extra time and how many extra fetches will I spend? Of course it's going to be huge. It's always preferable to keep only the required stuff, be it bottles of ketchup in a food market or clothes in our wardrobe, and similarly the data in the database.

Again, the “band aid” solution comes first: to showcase the value of the plan, it's always good to start with the band-aid solution, i.e. the solution which can show results with minimal tweaks/effort. In my scenario, the band-aid solution was:

1. Better Indexing strategy
2. Data Partitioning
3. Data compression

These three points seemed achievable as band-aid solutions. Planning the archival strategy and the data type assessment were big activities and needed a lot of intervention from busy people (management and project leads). Along with that, those two activities may need coding and design changes, which sometimes get stuck.

For a better indexing strategy, I will write another blog post soon, but for now I will just brief you on it. It's mostly about removing duplicate/unused indexes and creating better indexes which result in more seeks; a quick sketch of spotting unused indexes is shown below. There is a whole lot of dynamics around it which I will discuss in detail.
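As a starting point, a rough sketch like the one below lists non-clustered indexes in the current database along with their usage counts from sys.dm_db_index_usage_stats; indexes with many writes but few or no reads since the last restart are candidates for review (this assumes the instance has been up long enough for the stats to be meaningful):

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY us.user_updates DESC;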

For the data partitioning strategy, please refer to my blog post: https://dbcouncil.net/2014/04/08/table-partitioning-have-you-chosen-right-partition-column/
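Just to illustrate the shape of it, a minimal monthly partitioning sketch looks like the one below; the function, scheme, table and column names are hypothetical, and the real design should follow the considerations in the post linked above:

CREATE PARTITION FUNCTION pf_monthly (datetime)
AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-02-01', '2015-03-01');

CREATE PARTITION SCHEME ps_monthly
AS PARTITION pf_monthly ALL TO ([PRIMARY]);

-- Rebuilding the clustered index on the scheme partitions the table on its date column
-- CREATE CLUSTERED INDEX cix_BigTable ON dbo.BigTable (CreatedDate) ON ps_monthly (CreatedDate);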

For data compression, please refer to this link: http://technet.microsoft.com/en-us/library/dd894051(v=SQL.100).aspx
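Before enabling compression, it is worth estimating the savings; a small sketch (with hypothetical table/index names) looks like this:

-- Estimate the savings first
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo',
     @object_name = 'BigTable',
     @index_id = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Then enable PAGE compression, e.g. on all indexes of the table
-- ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE);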

You must be thinking, how does this relate back to 1 billion reads in a query? If we partition the big tables, it helps reduce the scans/seeks to the partition level. Instead of searching 1 billion records, we may now be searching 25 million records, which is much better. On top of that, if we have a good indexing strategy and data compression, the queries will perform more seeks and eventually the number of reads will be far lower. When the reads for a query execution reduce, the queries start performing faster, which eventually means less load on the disks, fewer long-running transactions, smaller T-log backups and reduced data latency in replication. Such a big impact!

Now the impact will be visible and of course management will be happier. But a permanent/long-term resolution is still pending. I will discuss

1. Data Archival / Purging
2. Data Type/size assessment

in my next blog post.

HTH!

Amazing SQL server tools every DBA must have!!! – Part 2

This post is a continuation of my previous post.

In the previous post, I discussed database and OS configuration tools, and in this post I will discuss performance monitoring tools.

Performance monitoring and troubleshooting : For performance monitoring we have a great set of tools which are free of cost and easily accessible. I won't go into much detail on these tools as there is already a lot of material available; the main intent is to collate the list of tools in one place. It will help you know which tools are available for free and how best they can fit into your environment.

Performance monitoring and troubleshooting can be done in two ways: 1. live, 2. postmortem.

Let's take the example of a high CPU issue. If DBAs are asked to troubleshoot such an issue live, they need a very deep understanding of how SQL Server works and lots of SQL scripts/perfmon counters handy to find the issue. DBAs at level 100-200 will often end up either restarting SQL Server or killing the sessions with heavy queries.

Now, let's say the issue is not fixed by live troubleshooting and the management team wants to know the root cause of the issue, i.e. postmortem analysis. If the DBA knows a lot of DMVs it can still be figured out, though the chances are slim, but for a new DBA it will be very difficult to find anything.

To cater to both needs, live and postmortem analysis, we have enough tools available.

For postmortem performance troubleshooting (these tools can be used for a performance baseline as well), the tools we have are:
1. MDW – Management Data Warehouse
2. PSSDIAG/SQLNexus
3. PAL

For Live troubleshooting and monitoring the tools we have, are:
1. Activity monitor
2. UCP
3. Standard SQL reports

Postmortem performance analysis tools

1. Management Data Warehouse (MDW) : This tool was introduced in SQL Server 2008 and has gained a lot of popularity for postmortem analysis. Basically, it has a predefined set of data collectors which can be used to collect and upload data to the MDW database. It captures query statistics, disk activity and server activity. Even the Utility Control Point tool (discussed in a later section) uses this in the background.

Here are screenshots of the reports which can be used to find the cause of an issue:

Overall Server Summary:

[Screenshot: Overall Server Summary report]

Query Statistics report, where you can see the queries ranked by CPU/IO/duration:

[Screenshot: Query Statistics report]

Disk summary report

[Screenshot: Disk summary report]

These reports help to find issues proactively. After studying the tool a little deeper, I found that it uses SSIS packages to fetch the data from the server and upload it to the data warehouse database.

To understand the packages in detail, you can open them from SQL Server Integration Services and see what they do:

[Screenshot: MDW SSIS packages in Integration Services]

Note: change the collection job frequency as appropriate, since the default frequency may not suit your environment.
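If you want a quick look at which collection sets are configured and whether they are running, a small sketch like this against msdb works (the column names are as I recall them and may vary slightly by version):

SELECT name,
       is_running,
       collection_mode,          -- 0 = cached, 1 = non-cached
       days_until_expiration
FROM msdb.dbo.syscollector_collection_sets;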

2. PSSDIAG / SQLNexus : This is a set of tools which helps with postmortem analysis. I will discuss both of them briefly.

PSSDIAG: It's a framework in which you can configure the kind of data you want to capture. The dashboard looks like this:
[Screenshot: PSSDIAG configuration dashboard]

In the screenshot, you can see various sections like profiler trace, perfmon, SQLDIAG and other SQL scripts. I have discussed how to configure PSSDIAG for a high CPU issue in this post: https://dbcouncil.net/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/ . You can refer to it when configuring the tool.

Once the tool is configured and you save the configuration, it creates a pssd.cab/exe which can be used to capture the logs. Follow the steps below to run it at the time of the issue:

Steps to run the PSSDIAG:

a. Create a folder named PSSDIAG on your SQL server machine.  This folder should be on a drive with plenty of space as diagnostic file collections can be quite large
b. Download the PSSDIAG from FTP site.
c. Open a command prompt.  Change the current directory to your PSSDIAG folder and run pssd.exe to extract its contents.
d. Run PSSDIAG.cmd at the command prompt to start the collection process.
e. Once PSSDIAG displays the message, “PSSDIAG Started,” attempt to reproduce your issue.
f. Stop PSSDIAG by pressing CTRL+C.

Please make sure you collect the data for at least 10-15 minutes to get a complete picture of the issue.

Note: please make sure you configure the profiler traces at a minimal level. They can cause a production outage if not configured appropriately.

Reference links:
Freely downloadable tool, the PSSDIAG: – http://diagmanager.codeplex.com/
Configure PSSDIAG : – http://sql-blogs.com/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/
More information on PSSDIAG: – http://support.microsoft.com/kb/830232

After collecting the logs, you may wonder how to analyze the data. To help with that, there is another tool: SQLNexus.

SQLNexus : SQLNexus is a tool which helps analyze the logs captured by the PSSDIAG utility. The PSSDIAG logs are uploaded to SQLNexus, and it shows the performance bottlenecks, slow queries and blocking in graphical reports. It provides an effective and detailed analysis of the issue, if any.

We have to upload the PSSDIAG data from the dashboard: click on Import and give the path of the PSSDIAG output folder:

[Screenshot: SQLNexus import dialog]

SQLNexus reports look like this:

[Screenshot: SQLNexus report]

Steps to install and run the SQLNEXUS tool:

Download RML – http://sqlnexus.codeplex.com/wikipage?title=ReadTrace&referringTitle=Home
Download SQLNexus –  http://sqlnexus.codeplex.com/
How to run SQLNexus – http://sqlnexus.codeplex.com/wikipage?title=GETTING_STARTED

For manual PSSDIAG analysis, you could follow this blog: http://sql-blogs.com/2012/03/23/analyzing-pssdiag-logs-without-using-sqlnexus/

The above-mentioned tools are a boon to DBAs for finding the RCA. If you are confused about when to use which tool, let me explain a little more:

MDW : This tool can be left configured and used for postmortem analysis. Its limitation is that it won't give you the missing index list, compilation/recompilation details etc. It gives enough detail to find the issue, but if you want to go to a granular level, it may not give you complete information.

To get more detail, you will need to run PSSDIAG. To run PSSDIAG, you first install the dashboard where you configure the package. Once the configuration is done, you can pick up that .cab/exe file and take it to the server where you want to run it.

There you follow the steps mentioned in the PSSDIAG section above and capture the data. Once the data is captured, you feed it to SQLNexus so that it can present the reports in a more detailed format to pinpoint the issue.

For manual PSSDIAG analysis, you could follow this blog: https://dbcouncil.net/2012/03/23/analyzing-pssdiag-logs-without-using-sqlnexus/

The drawback of PSSDIAG compared to MDW is that you have to run it manually every time the issue occurs, whereas MDW keeps running in the background and fetches the data for you.

Live troubleshooting tools :

Activity Monitor : This is a very good tool for troubleshooting live issues. It has four sections, as shown in the screenshot below:

[Screenshot: Activity Monitor sections]

To open this tool, right-click on the instance name and select Activity Monitor:

[Screenshot: opening Activity Monitor from Object Explorer]

It is best used for blocking issues or other query issues. Under Processes you get the output of sys.dm_exec_requests joined with some other DMVs, which can be traced in a profiler trace, or we can check the query text:

[Screenshot: Activity Monitor Processes pane]

In the same way, we can use the other sections to get resource waits, I/O stats and expensive queries. It can at least help you find the direction you need to go in.
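For reference, a rough sketch of the kind of DMV query the Processes pane is built on (not Activity Monitor's exact query) is shown below; it lists the active requests with their waits, blockers and query text:

SELECT r.session_id,
       r.status,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.cpu_time,
       r.logical_reads,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.cpu_time DESC;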

It is more effective when used along with the standard reports mentioned below:

Standard SQL reports : There is a great set of reports which can be leveraged for live troubleshooting of performance issues. Here is how to find them:

Right-click on the instance name as shown in the screenshot:
[Screenshot: Reports menu in Object Explorer]

Here is a screenshot of the Server Dashboard report:
[Screenshot: Server Dashboard report]

Similarly, there are lots of other reports which can help find high CPU/IO/time queries. To read more about them, please check: http://sudeeptaganguly.wordpress.com/2011/11/04/standard-reports-in-ssms-2008-r2/ and http://blogs.msdn.com/b/buckwoody/archive/2008/03/10/sql-server-management-studio-standard-reports-login-failures.aspx

UCP (Utility Control Point): I have seen very few DBAs who are aware of this utility, but it can be very useful to track CPU, database space and disk space on the servers.

Let me explain the best use of this utility with an example. Let's say you have 10 applications hosted on a SQL environment and one of them is causing high CPU. Can you think of a method of tracking which application's query is causing the issue?

Maybe the answer will be: find the query causing the issue and then find which application that query belongs to. But if we use UCP, we can find the application-wise CPU consumption with just a few clicks. However, we have to deploy the application as a data-tier application (using a data-tier project in BIDS); only then will it show up under data-tier applications.

Application-wise resource usage is shown in the screenshot below:

[Screenshot: UCP application-wise resource usage]

This article explains about how to configure the UCP and it’s benefits : http://www.databasejournal.com/features/mssql/sql-server-utility-control-point-ucp.html

The UCP dashboard looks like this:

[Screenshot: UCP dashboard]

I have been asked this question many times: if all these tools are available, why do people go for paid DB monitoring tools? The answer is that the paid tools give us one integrated framework where we can monitor all of this. Almost the same work can be accomplished using the tools mentioned above; the only difference is that we have to hop between them to reach a conclusion. These tools are a boon for organizations which don't have the budget to spend on monitoring tools.

HTH!

Amazing SQL server tools every DBA must have!!! – Part 1

After visiting many customer locations, I have found there is a great set of tools which many DBAs are not aware of, and they struggle a lot while troubleshooting performance issues. So I decided to blog on the subject so that it can help DBAs troubleshoot performance issues faster. This blog targets an audience at knowledge level 100-200. All the tools discussed in this article are either free or built into the SQL Server installation:

There are two major sections in this blog:

    1. Database and OS configuration
    2. Performance monitoring

Database and OS configuration : There are two major tools available:

  • BPA (Best Practice Analyzer): The BPA tool is used to find discrepancies in the system- and SQL-level configuration. It helps to unveil critical issues before they cause a serious outage. It has a great set of rules which it runs against the collected data. This tool is deprecated in SQL 2012, as it has been superseded by System Center Advisor, which we will discuss in the next section.

It shows critical errors if consistency checks are not being done on a regular basis, if backups are outdated, or if there are other critical issues on the server. Please check the demo in the links mentioned below to get in-depth familiarity:

This tool can be of great help to understand issues proactively and take the necessary actions before it's too late.

Here is the screenshot of the recommendations by BPA:

[Screenshot: BPA recommendations]

For more details check the link:

Download and Install MBCA – http://www.microsoft.com/en-us/download/details.aspx?id=16475

Download and install BPA – http://www.microsoft.com/en-us/download/details.aspx?id=15289

Steps to run BPA – http://www.microsoft.com/en-us/download/details.aspx?id=436

Demo of how BPA works  – http://www.microsoft.com/en-us/download/details.aspx?id=18882

 

System Center Advisor (SCA) : This tool, which is now released to the public, helps to identify discrepancies in the SQL configuration. Not only that, it can help find unsupported features and patch levels of SQL Server instances. It can be of great help for customers who can't purchase Microsoft Premier services.

It pretty much comes under proactive work for the SQL environment, as roughly 30% or more of performance issues in SQL environments are due to wrong configuration. It also helps to proactively find out if backups are outdated, consistency checks have not been done, and so on.

How it works: 

There are two major components of this tool :

1. Gateway
2. Agent

1. Gateway: This component has to be installed on the machine from which you will upload the logs to the Microsoft cloud service. It can be a centralized place from which the data is uploaded to the cloud service, or it can be installed on every individual machine.

2. Agent: This component must be installed on every machine from which you want to collect data.

[Screenshot: System Center Advisor gateway/agent topology]

This is how the reports look:

[Screenshots: System Center Advisor alert reports]

In most environments, the production servers can't be exposed to the internet and ports can't be opened without a critical reason. For this tool to upload the data to the cloud service, we need to open port 80. To avoid complications, there is a provision to set up a remote gateway to which all the data is sent, and it uploads the logs to the cloud service. As shown in the first screenshot under this section, multiple agents can report to one gateway, which can be given internet access to upload the logs.

To read more about it , please check this post: http://sql-blogs.com/2013/03/21/system-centre-advisor-is-now-free/

I will discuss performance monitoring in my next post. Please click here to read about it.