Thursday, 27 October 2011

T-SQL Performance Tuning

Good Evening ,

We had discussed basic incidence when query can be optimize by adding cluster/ non cluster index and  modification in select columns instead of using "*".

Get more detail by click on http://rajnish-awasthi.blogspot.com/2011/10/sql-query-running-slow.html

Here I would really like share other important resources which can be used to analyze query and actually helps to get the exact reason of slow running query.

We are going here to discuss the following key items.

    • Execution plan.
    • DMVs/DMFs.


Execution Plan   If query is running slow then definitely few reasons can be behind it, so the question is that how to find those reasons. It is difficult but not impossible to get the exact reason of slow running query, when query get executed it uses/create execution plan which can be used further to run the same query. If optimizer does not find the reusable execution plan then it create new one and it stores in sql cache .

Execution Plan

 In figure, we can see 100% cost on cluster index scan.Because this is basic execution plan, if we        consider complex query then easily  complex execution plan can be seen over the screen. We are seeing cluster index scan because there is no index in table we is covering whole query that is why scan is being used  here. Lets see what happen when following query is executed.

         select * from  HumanResources.Employee where employeeid between 10 and 50

   
Hmm, because filter has added here so now cluster scan has been replaced with cluster seek and cluster index now can cover whole query so it can execute faster than before. This way we have analyzed the query and resolved the issue by adding where clause in select statement.

When talking execution plan, DBA should caught first those queries which are taking high cost. When analyzing SP's execution plan, it could be difficult to find the worst part because plan does not gives the time duration for each query used in stored procedure. Some time print getdate() can be used in sp between sql statement to get the time duration which is taken by each query in stored procedure, this way developer/DBA can caught the worst query in the term of taking more time, special attention should given here for long running queries and again DBA should analyze execution plan for these queries.

Okay, now the question comes that what could be focus pints when analyzing execution plan ?

We are talking about graphical execution plan in our discussion however execution plan could be in form of text and XML as well. When interpreting graphical execution plan usually read it from right to left and top to bottom.

Here we can think about following key items which can included when analyzing execution plan of worst quires, you can leave items i.e. cluster index seek, non cluster index seek in execution plan because 'seek' keyword itself gives the best performance while executing query.
  • Table scan
  • Cluster index scan
  • Non cluster index scan
  • Rid lookup
  • Key lookup
  • Hash match
  • Merge join
  • Sort
Table Scan

Table scan is the worst part if you are getting this in your execution plan, DBA would be happy when tuning such queries because definitely query performance will increase if he do even little changes. Probably table scan occurs due to missing of  indexes. Also you may get surprise when query optimizer can select table scan instead of indexes which may already exist on table, why it happen !,  because table has not much records which should be there for choosing any index instead of table scan as optimizer think that table scan could be faster than using index for small size of tables.

Cluster Index Scan

Cluster scan is almost the same as table scan, optimizer finds more rows are returning than actual result, so how to resolve such issue ?
Developer should think about the columns he had mentioned in select statement and rows return by the query, optimizer is going to return the rows more than expectation so rows can be filter by adding appropriate where clause and unnecessary column should eliminate from select statement.

SELECT * FROM emp WHERE fname = 'Jeannie2'


Cluster index scan

 Because there is no another non cluster index on fname so optimizer used only cluster index and cluster index scan is being used here because index does not cover whole query , below select statement and its execution plan gives the idea that if key column is the part of where clause then optimizer can use the index and whole query would use the cluster index seek.

SELECT * FROM emp WHERE id = 167396

Cluster index seek

Non Cluster Index Scan

Non cluster index scan is also same as cluster index scan. Whenever another index is required by query optimizer, it creates an entry into DMVs and also append useful information into columns which is really useful when DBA/Developer is analyzing the worst query part. Non cluster index scan occurs because of the same reason as index does not cover whole query and need another piece of data that can retrieved from outside and that is why index scan occurs.

Index scan can be resolved by adding another filter which returns less data and can retrieve data from non cluster index only, unnecessary rows retrieval can hurt the sql performance.


Rid Lookup

lookup as well.

lookup comes in picture.

RID Lookup


Key Lookup

There is another lookuplookup except few differences.
lookup as bookmark and by using nested loop data join is required to combine the rows of index seek and key lookup. In such scenario non cluster can be modified by adding column to cover the whole query.


Key Lookup

Hash Match

When we are talking about data fetched from multiple table by using joins then situation may come that your query can use hash match internally which gives an idea to change in sql query or any missing index or modification required in existing index can be on table otherwise optimizer can continue with hash match because it is only the way to fetch the data faster from hash table.

Merge Join

Whenever data is puling from multiple tables and joined columns are  presorted, query optimizer choose merge join as a most efficient method to get the data from multiple tables, if columns are not presorted then query performance could be degrade because sort operation would be chosen by optimizer before merge join.



Sort

As operator is self-explanatory, when sort operation takes more than 50% cost in a query it means that query can be optimized by few changes in sql code otherwise data can be presorted by using index on column which is being used in order by clause. Some time when where clause is missing in select statement, it can lead sort operation.

Dynamic Management View / Function


DMVs/DMFs are new feature of  SQL Server 2005, well I think its a great feature and gift for DBAs, now administrator can easily maintain various important task by using of DMVs/DMFs. If we talk about sql 2000, first thing comes in DBA's mind that whats necessary DBCC statement are in sql server and how to use those  DBCC statement to produce necessary reports for management, various decision making report for DBAs, etc.

Many views and functions are included with sql 2005, following list of DMV/DMF can be useful for various purpose.

sys.dm_db_index_usage_stat 
Gives the count for index scan, seek, lookup.                           
sys.dm_db_index_operational_stats          
Gives a deeper idea about index used, leaf insert , leaf update, etc.
sys.dm_db_index_physical_stats
It is substitute of DBCC SHOWCONTIG, it returns fragmentation of data and indexes and takes few parameter as it is DMF.
sys.dm_db_missing_index_groups
Returns detail about what missing indexes are contains in specific missing group.
sys.dm_db_missing_index_group_stats
Returns information about group of missing indexes i.e. user_seek, user_scan, last_user_seek, etc.
sys.dm_db_missing_index_details
Returns missing index detail i.e. equality column, inequality column, included column, etc.
sys.dm_exec_connections
Returns detail about connection established to sql instance i.e. session id, connection time, protocol type,  last read, last write, connection id, etc.
sys.dm_exec_sessions
Returns useful information about all user active connection i.e. session id, login name, host name, login time, status, cpu time, memory usage, etc.
sys.dm_exec_requests
Returns detail about each request which is executing within sql server i.e. session id, start time, status, command, database id, connection id, blocked session id, wait type, read, write, etc.
sys.dm_io_virtual_file_stats
Returns I/O statistics for data and log files.
sys.dm_os_wait_stats
Returns information about the waits encountered by threads that executed i.e. wait type, waiting task count, wait time ms, max wait time ms