SQL Server Query
Tuning Guidelines
Objectives
 Know the basic topics related to query tuning
 Dispel some common myths
 Understand indexes
 Understand statistics
 Understand query plans and how plan cache is used
 Understand parameter sniffing
 Main objective: Set the basis that you need to develop
further your own SQL tuning skills.
Queries, a high level overview
 A query is submitted to SQL Server
 The query optimizer decides what
the best plan is, out of potentially
thousands of possibilities, in a
limited amount of time
 Paramount to find the best plan
are indexes and distribution
statistics
 The result is a query plan that will be stored in cache and potentially reused if possible.
The query optimizer
 The query optimizer follows several steps based on heuristic rules to generate a logical tree composed of nodes
 It simplifies as much as possible
 Several plans are attempted in parallel
 Once the “best” plan is found, the optimizer generates the plan
 One of the most important steps for this presentation is Derive Cardinality. This is where indexes and statistics come into play.
Understanding
Indexes
Types of tables in SQL
Heap tables
 Heap tables are just tables without a clustered index.
 As there is no clustered index, the rows are unordered.
 They are great for ETL processes where we want to store data really quickly.
 The main con is that every time we need to look up data in a heap we always have to scan.
 Another good use could be for log activity tables where we just insert data.
Clustered tables
 Clustered tables are tables with a clustered index
 The table is the clustered index itself and is organized based on the clustered key. That is the reason there can only be one clustered index per table.
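A minimal sketch of the difference, using a hypothetical dbo.EventLog table (names are illustrative only):

-- Without a clustered index the table is a heap: inserts are cheap,
-- but any lookup has to scan all the pages.
CREATE TABLE dbo.EventLog
(
    EventId  INT IDENTITY(1,1) NOT NULL,
    LoggedAt DATETIME2 NOT NULL,
    Message  NVARCHAR(400) NOT NULL
);

-- Adding a clustered primary key turns the table into a clustered index:
-- the rows themselves are now ordered by EventId.
ALTER TABLE dbo.EventLog
    ADD CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED (EventId);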
Types of indexes
Clustered indexes
 They are always maintained as unique regardless of whether you define them as unique.
 Think of a clustered index as the pages of a book, where the key is the page number
 If you want to look for a particular page you just go to that page number.
 The book itself is organized by the clustered index (the page number)
Non clustered indexes
 They can be unique or not
 Think of a non clustered index as a book index.
 All of the data you are looking for (the page) might not be in the index itself, but it points you to the right page (the clustered key, in case the base table is clustered) so you don't have to search blindly through the book. (Table scan operation)
Clustered index structure
Non clustered Index structure
Note that the main difference is that the leaf nodes do not contain all of the data
Investigating index structure
Included columns in non clustered
indexes
 When we define a non clustered index we can define what extra columns are going to be stored at the leaf page level.
 This is helpful because we don't have to go to the base table to get the data we want. We avoid key lookups (more in the query plan section).
 The caveat is that our indexes are going to be larger
 Nonclustered index keys have a limitation of 900 bytes. Included columns do not count towards this limit.
 You cannot use image or text data types
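A hedged example of such an index on the AdventureWorks Sales.SalesOrderDetail table used elsewhere in this deck (the index name is made up): ProductID is the key column, while OrderQty and UnitPrice are stored only at the leaf level, so a query on those three columns is covered without a key lookup.

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_Incl
    ON Sales.SalesOrderDetail (ProductID)   -- key column, subject to the key size limit
    INCLUDE (OrderQty, UnitPrice);           -- stored at the leaf level only, not part of the key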
Key differences between the two
Clustered indexes
 Leaf nodes are the actual data itself.
 The table is ordered by the clustered key, which is why there can only be one clustered index per table.
 Lookups on the clustered key can seek directly to the data instead of scanning.
Non clustered indexes
 Leaf nodes are not the actual data. They only contain the key columns (and any included columns!).
 They point to the base table via the clustered key.
 This is why the longer your clustered key is, the larger your non clustered indexes are going to be. Beware with your clustered indexes!!!!
 Any change done to the clustered index will need to be maintained in the non clustered indexes.
 The more non clustered indexes we have, the slower our system might be for data modifications.
Common errors when defining
the clustered key
► It is pretty common to create our primary keys without knowing that by
default SQL Server creates a unique clustered index behind the scenes.
► This might become a problem, because if you remember, non clustered indexes carry a pointer in the form of the clustered key at the leaf level.
 One common example is to define a GUID as our primary key. Note the impact this has on the size, maintenance and potential fragmentation of nonclustered indexes!!!!
 Make sure you know what happens behind the scenes.
So when are indexes used?
 First thing to notice is that if we don’t have an index SQL Server will scan the
whole table.
 If the table has indexes it will see whether those indexes cover the query or not. Covering indexes
 If they cover the query, selectivity will be analysed.
What is selectivity?
 A highly selective query is one that returns a small percentage of the total records. It will lead to seek operations versus scans.
 Based on this:
 Covering index on queries with a highly selective predicate  Seek
 Covering index on queries with a low selective predicate  Scan
 Queries without a covering index  Scan
 This leaves an important conclusion:
 It does not make sense to create indexes on low selective keys, because they will use the index, yes, but not in an efficient manner (Scans)
 When specifying more than one key in an index, the order is important. It should go from more selective to less selective.
Why not create indexes everywhere?
 Indexes are great for SELECT statements but not so great for DML statements.
 For instance, an UPDATE operation might modify the key values and this will mean maintaining index structures.  Possible page splits, index reorganization.
 They also have a cost in terms of space.
Can I know if my indexes are getting
used at all?
 In SQL Server we have a DMV called sys.dm_db_index_usage_stats whose most important columns include:
 Database_id: the id of the database
 Object_id: The id of the object (table or view) that owns the index being
analyzed
 Index_id: Id of the index
 User_seeks: Number of seeks performed on the index
 User_scans: Number of scans performed on the index
 User_updates: Number of times the index has been modified.
 User_lookups: Number of times the index has been accessed for lookup
operations
Can I know if my indexes are getting
used at all?
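One possible way to read this DMV for the current database, joining to sys.indexes to get the index names (just a sketch, adjust the filters as needed):

SELECT  OBJECT_NAME(us.object_id) AS table_name,
        i.name                    AS index_name,
        us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM    sys.dm_db_index_usage_stats AS us
JOIN    sys.indexes AS i
        ON i.object_id = us.object_id AND i.index_id = us.index_id
WHERE   us.database_id = DB_ID()
ORDER BY us.user_updates DESC;   -- indexes heavily written but rarely read are review candidates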
Internal index fragmentation
 Fragmentation happens when we perform more IOs than necessary when using the indexes.
 Internal fragmentation happens at the page level. It measures how fully each page is used.
 Ideally, a page should be 100% utilized, but this is never the case.
 Delete operations make pages less utilized each time.
 Update operations may provoke page splits because a record might not fit anymore in the same page. The original page is less utilized as a result.
 Consequences?
 The more fragmented the pages are, the more pages are required to read the same amount of data, and the more IOs for the same operation
 The size of the database file increases.
 Buffer cache efficiency is reduced. Less data fits in the same amount of cache.
External index fragmentation
 External fragmentation happens when the logical order of pages within an index is not the same as the physical order.
 Indexes are implemented as doubly linked lists, with each node pointing to the next and previous pages
 The more physically unordered the pages are, the slower the disk is going to be at retrieving the data.
 Page splits can be made less frequent if we set a fill factor when creating the indexes. Fill factor determines how full we want pages to be.
 By specifying some free space in each page of the index, page splits will happen less frequently.
Analyzing Internal index fragmentation
Analyzing external index fragmentation
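Both kinds of fragmentation can be inspected with sys.dm_db_index_physical_stats; a minimal sketch (the SAMPLED mode is needed to populate the page-fullness column):

SELECT  OBJECT_NAME(ips.object_id)        AS table_name,
        i.name                            AS index_name,
        ips.avg_page_space_used_in_percent,   -- internal fragmentation (page fullness)
        ips.avg_fragmentation_in_percent,     -- external (logical) fragmentation
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN    sys.indexes AS i
        ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE   ips.page_count > 100;   -- very small indexes are rarely worth worrying about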
Maintenance options
 Reorganize an index  Quicker. It just matches the logical order with the physical one. Gets rid of external fragmentation.
 Rebuild  Takes longer because it drops and recreates the index. Gets rid of both internal and external fragmentation.
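In T-SQL both options map to ALTER INDEX (index and table names here are the hypothetical ones from the earlier sketch):

-- Lighter, always online: fixes external (logical) fragmentation only
ALTER INDEX IX_SalesOrderDetail_ProductID_Incl ON Sales.SalesOrderDetail REORGANIZE;

-- Heavier: recreates the index, removing internal and external fragmentation;
-- FILLFACTOR leaves free space per page to make future page splits less frequent
ALTER INDEX IX_SalesOrderDetail_ProductID_Incl ON Sales.SalesOrderDetail
    REBUILD WITH (FILLFACTOR = 90);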
Understanding
Statistics
What are statistics in SQL Server?
 The query optimizer needs to know which operators to use: index seeks, table scans, nested loop joins, etc.
 Most of these decisions come from the statistics of the indexes on each of the tables involved in the query
 They are a histogram of how many rows the table has based on the index key.
 Each histogram can have only up to 200 steps  Inaccuracy when the number of rows is very large!!
From SQL Server Management
Studio
Via DBCC SHOW_STATISTICS
command
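The command itself is a one-liner; the index name below is the one AdventureWorks ships for ProductID, but any statistics object on the table works:

-- Returns the statistics header, the density vector and the histogram
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID);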
Statistics Header
 Name: Name of the statistics. If it starts with the _WA prefix, it is an automatically created statistics object.
 Updated: This tells us when the statistics were last updated.
 Rows: The number of rows in the index
 Rows Sampled: The number of rows that have been read to generate the statistics. If this number is less than the Rows value, it means the sample was not FULL (different sampling options exist)
 Steps: The number of steps used to represent the histogram. Remember SQL can
only use up to 200 steps.
 Average Key length: this is the average length of the key of the index.
Density
 There will be as many entries as possible combinations of the keys of the index.
 Density = 1 / Number of distinct values
 The lower the density, the more selective the index is. Used by the query optimizer to decide whether to choose an index or not.
Histogram
 RANGE_HI_KEY: Represents the highest key value for the step.
 RANGE_ROWS: The number of rows whose key value falls within the step, excluding the upper bound (RANGE_HI_KEY)
 EQ_ROWS: Represents the number of rows whose key value equals the RANGE_HI_KEY of the step.
 DISTINCT_RANGE_ROWS: Represents the number of distinct key values within the step, excluding the upper bound.
 AVG_RANGE_ROWS: This is a number representing the average number of rows for a
given key value within the step.
How are statistics used?
 If you remember, in the previous section we analyzed the histogram step that contained the key value ProductID = 831.
As we ask SQL Server to estimate the number of rows in Sales.SalesOrderDetail where ProductID equals 831, it goes and looks at the histogram and voila!, it knows exactly the number of rows thanks to the EQ_ROWS column.
Statistics Update
 Having statistics up to date is paramount to getting optimal query plans
 SQL Server can be configured to automatically update statistics.
 This automatic update takes place when roughly 20% of the rows of a table have changed. Think of large tables!!!!
 To manually update you can:
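For example (sampling options are up to you):

-- Update all statistics on one table with a full scan
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

-- Or update every statistics object in the database using the default sampling
EXEC sp_updatestats;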
Auto Created Statistics
 If enabled, auto created statistics can be created for those predicates not covered
by an index.
 These special statistics start with the prefix _WA
Limitations of the histogram
 The number of steps that can be used to create a histogram is only 200. As you may have guessed by now, this number can become quite small, especially when we are dealing with tables with millions of rows and skewed data distributions.
 This can lead to all kinds of suboptimal query plans: usage of index scans when it would be more appropriate to use index seeks, assigning too much or too little memory to a given operator generating what is known as memory spills, etc.
 This all translates into bad query plans and bad performance.
 Google Kimberly L. Tripp and her proposed alternative.
One common misconception
 Many times, the client reports that a stored procedure has suddenly started taking a long time to execute.
 Usually the developer's first attempt is to update statistics. Then everything works fine!
 After some time, the client reports the problem again and the developer does the same thing, entering an endless loop.
 What might be happening behind the scenes?
 Updating statistics that affect a given table marks execution plans associated with that table to be recompiled in the future.
 It is likely that what updating statistics really does is force SQL Server to generate a new plan, and hence hide a potential parameter sniffing problem.
 When in doubt, try executing the query with the RECOMPILE option. This will not evict plans from cache. If everything works fine then you have a parameter sniffing issue.
 Otherwise, you were right and it is a stale statistics problem.
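A sketch of that test, using a hypothetical procedure and parameter just for illustration:

-- Compiles a throw-away plan for this execution only; the cached plan is untouched
EXEC dbo.usp_GetOrdersByProduct @ProductID = 831 WITH RECOMPILE;
-- Fast now?   -> the cached plan / parameter sniffing is the likely culprit.
-- Still slow? -> look at stale statistics instead.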
Understanding query
plans
How to view the execution plan
 We can view the execution plan for a given query from SSMS.
 There are two options:
 Getting the estimated execution plan. This plan is not cached nor executed.
 Getting the actual plan. This plan gets cached and executed.
Estimated execution plan
Actual execution plan
Things to note
 Plans are read from right to left
 Each operator's subtree cost represents the cost for it and its descendants. This means that by examining the SELECT operator's subtree cost we can infer the estimated total cost for our query
 By clicking on a given node and selecting Properties we can get very detailed information.
 By clicking on the SELECT node and selecting Properties we can get very detailed information about the query itself. For instance, compiled parameter values for a stored procedure, memory used by the query, etc.
Things to note
 Within a given query plan, the estimated cost percentages give us hints on what to look at in
terms of optimization
 When multiple statements are executed, the total cost of each query plan gives us a hint as to which statement is the most expensive.
 The difference between the estimated and actual number of rows usually relates to the statistics subject we looked at and usually points us down the right tuning path.
 The cost of the operators is unitless, not a time, just an estimation. Only useful for relative comparison
Other “better” alternatives
 I highly recommend SQL Sentry Plan Explorer
 More compact and easier view.
 Many features
 Both free and professional editions.
The operators
 There are over 100 operators that a query plan can present.
 Logical operators vs physical operators. For instance, INNER JOIN vs its possible physical implementations (NESTED LOOP, HASH JOIN, etc.)
 Operator cost is unitless and is composed of IO cost and CPU cost (some operators have only one of them).
 Each operator requires some memory to perform its operation. The more rows, the more memory. If the memory estimated by SQL Server is less than what is actually needed, memory spills occur. These are indicated by a warning over the operator.
 Operators can be blocking or non blocking depending on whether they need to process all the rows or not.
 Blocking operators like Sort, Eager Spool, Hash aggregates, hash join, etc.
 Non blocking operators: nested loops, lazy spool, etc.
High memory operators
 Operators like HASH JOIN or Sort usually require a lot of memory. Beware of them.
 Queries requiring a lot of memory can be problematic, as they may wait longer than usual for execution if the system is low on memory.
 Cardinality estimate errors may lead to spills to tempdb. This means that instead of working in memory we will need to use IO against physical tempdb.
Table and Index scan operators
 Table scan happens when a table is a heap and all of its rows are fetched (no
clustered index).
 Index Scan happens when all rows from a table with a clustered index are
fetched.
Index seek operators
 The predicate was selective enough so as to be able to seek a particular index
for some specific rows
 Clustered index seek when the seek is done in a clustered index
 Non clustered index seek when the seek is done in a non clustered index
Lookup operators
 When the non clustered index does not provide all the columns that we need, then a lookup is required.
 Extra trip to the base table to get the required columns.
 They can be very costly. Consider making the non clustered index a covering index.
 They always happen together with nested loops
Join operators
 The same logical join can be carried out physically in different ways. That is neither good nor bad. It depends.
 Hints can change the physical operator chosen by the query optimizer, but this is generally not a good idea and might be revealing some other problem.
 Nested Loop
 Merge Join
 Hash Join
Nested Loops
 Nested loops can implement many logical joins (inner join, left join, outer apply, cross apply, etc.)
 For each row coming from the top table an operation is performed on the bottom table, like a loop, hence the name.
 Usually the optimizer chooses the table that outputs fewer rows as the top table, i.e. the one that determines the number of iterations of the loop
 Queries where the top table does not return many rows usually end up as nested loop joins.
 Very low memory requirements.
Merge Join
 Merge is a very efficient operator. It only works for presorted sets and for joins that use only the equality operator.
 If the sets are presorted by the equality condition, then it is a matter of fetching the first row of the top table and seeing if the bottom table has rows that match. If not, continue fetching the next row from the top table.
 It usually takes place when we have indexes that are sorted based on the join condition.
 Otherwise the query optimizer might decide to do a presort if it is not very costly.
 More costly in terms of memory requirements.
Hash Join
 Hash join is the least efficient join and is used when either there are no usable indexes or the rows are not sorted by the join key.
 Sometimes, however, hash join is the best option because even adding indexes does not help.
 The algorithm is as follows:
 The table that returns fewer rows is chosen to build a hash table, a table where groups of rows are identified by a hash key.
 The bottom table acts as the probe. For each row, a hash function is used to look for matches.
Note about joins
 Hash join is the most memory-intensive join. Plus, it is a blocking operation.
 For OLTP systems, you should ideally expect more nested loop joins than merge and hash ones
 For data warehouse systems it would be just the opposite.
Residual predicates
 There are two types of predicates for the seek operators
 Seek predicates: the ones that are SARGable and that the index can evaluate to filter out rows
 Residual predicates: the ones that need to be evaluated at a later stage. They are due to non-SARGable predicates (see the sketch after this list).
 Why do we care about them? Because having residual predicates is similar to doing scans, i.e. we are not filtering as early as we could by using index seeks. They can be hidden bottlenecks in our query plan.
 To detect them you have to go to the properties of the index seek operators in the case of nested loops, or of the join operators themselves in the case of hash and merge joins.
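A hedged sketch of the difference on Sales.SalesOrderDetail (the column choice is illustrative):

-- Non-SARGable: the function wrapped around the column cannot be matched to the
-- index order, so the filter ends up as a residual predicate applied after rows are read
SELECT SalesOrderID
FROM   Sales.SalesOrderDetail
WHERE  YEAR(ModifiedDate) = 2013;

-- SARGable rewrite: a range on the bare column can be used as a seek predicate
SELECT SalesOrderID
FROM   Sales.SalesOrderDetail
WHERE  ModifiedDate >= '20130101' AND ModifiedDate < '20140101';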
Residual predicates in nested loops
Residual predicates in hash joins
Residual predicates in hash joins
Can I influence the type of Join
chosen?
 The answer is yes, you can, via join hints. But you shouldn't!!!!
 Except for extremely isolated cases, this is not a good idea. Very rarely will we be able to outsmart the query optimizer.
 Using join hints usually just hides a bigger problem that should be fixed properly.
Stream aggregates
 This operator is used to calculate aggregates
 In the case of calculating aggregates for a GROUP BY statement, it requires the input to be sorted; if the sort order is not provided by the index itself, a sort operator will be added.
Hash aggregates
 This operator is used to calculate aggregates too but based on a hash
algorithm
 It can work with unordered inputs.
 Generally used when the work required is greater. Beware of potential tempdb
spills associated.
Potential spills to tempdb
 Indicated as a warning icon. It means that the memory estimated for the operator was lower than required and that IO operations against tempdb are needed.
By the way, did I tell you about SQL Sentry?
Spool operators
 Spool operators basically mean creating a temporary structure in tempdb
 Why is that needed?
 Maybe we need to use that structure over and over again
 Maybe we need to separate that structure from the original source data (Halloween problem)
 The optimizer thinks it is better to create a temporary structure than to perform seek and scan operations over and over again.
 Eager spool: All of the needed rows are retrieved in one step. Blocking operator.
 Lazy spool: Needed rows are retrieved as needed. Non blocking.
 Are they good or bad? It all depends on the particular scenario. They are there to
boost the performance but as in the case of a missing index they could be hiding
some other problem.
The Halloween problem
 If we force the use of the clustered index, we can get rows one by one, updating the nonclustered index as we update the amounts. That is OK, because it is not going to affect the original source of the data, i.e. the clustered index.
The Halloween problem
 In this case, it is the non clustered index we force as the source of data, and hence we have a potential problem. If we are modifying the amount we could be impacting the non clustered index and end up with an inconsistency, because an already processed row could be reprocessed. That is why an eager spool with a copy of the data is used.
Measuring time and IOs
 Use SET STATISTICS TIME ON and SET STATISTICS IO ON to evaluate exactly the elapsed time and the IOs performed by the query when tuning.
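A typical tuning session in SSMS then looks something like this (the query is just an example):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*) FROM Sales.SalesOrderDetail WHERE ProductID = 831;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
-- The Messages tab now shows logical/physical reads per table plus CPU and elapsed time.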
So if I am given a bad query plan, what should I do? I
 While there is no definite answer for this, here are some common tips:
 Go to the query within the batch that shows the highest estimated cost (though this could be misleading if using multistatement table valued functions)
 Go to the highest estimated cost operators within a query
 Compare estimated rows and actual rows. Do the estimates make sense based on statistics?
 Detect scans and contemplate the possibility of adding indexes, or detect why existing indexes are not used. Often, the plan itself will give missing index warnings. But beware, don't take them too seriously.
 Detect the thickest arrows and see whether that many rows really need to be moved through the tree.
 If you see filter operators near the end of the plan filtering out a lot of rows, that might be something to look at.
So if I am given a bad query plan, what should I do? II
 Make sure when seeing nested loops that the top table is always the smaller one. If not, you might have a cardinality estimate issue.
 If you have merge join operators preceded by sort operators, evaluate whether the sorts can be avoided. Remember sort operators can spill to tempdb.
 If you have hash join operators, make sure that the top table (the build one) is the smallest. Otherwise, you might have a cardinality estimate issue. Also look for spills to tempdb.
 Look at high-cost sort operators and determine whether they come from the query (for example from an ORDER BY in the statement) or were introduced to enable a merge operator.
 When looking at sort operators before an aggregate, evaluate whether a sorted index is worthwhile or not.
 In hash and merge operators, look for the residual predicates (in Properties).
So if I am given a bad query plan, what should I do? III
 When seeing parallelism in plans, consider whether it is ideal or not. Parallelism indicates that the query is complex; maybe that is OK, but can we simplify the query and avoid the parallelism?
One Last Note
 Beware of implicit conversions.
 Not all implicit conversions will cause an index scan. Check
https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/
 They are shown via a warning icon in the SELECT operator.
One Last Note
 By avoiding the implicit conversion, the expected index seek is used.
Different Execution
modes and plan
cache implications
Different execution methods
 Ad hoc. This corresponds to the typical queries we write directly in SSMS or execute as dynamic strings. Very unlikely to generate plans that get reused.
 sp_executesql. Accepts a preconstructed SQL string and accepts parameters. The generated plans can be reused
 EXEC command. Accepts a preconstructed SQL string. Does not accept parameters. Very unlikely that the generated plans can be reused.
 Stored procedures. The generated plans are cached and reused
Why bother about the execution
modes?
 Being unable to reuse a plan is very costly, mainly because we force the query optimizer to work every time and generating a plan takes time.
 The fewer plans we get to reuse, the more plan cache memory we waste, and the more plan cache we waste, the more already generated query plans might get evicted and forced to be regenerated  Less efficiency.
 Although reusing cached plans (parameter sniffing) is generally a good idea, it has a potential problem: some plans may not be optimal in all situations.
Inspecting the plan cache
 Plans are created and cached when a query is executed.
 sys.dm_exec_cached_plans: DMV that tracks information about all query plans in cache
 sys.dm_exec_query_plan: DMF that accepts a plan handle and returns the corresponding XML execution plan.
 sys.dm_exec_sql_text: DMF that accepts a plan handle and returns the corresponding SQL text
Inspecting the plan cache
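Joining those three objects gives a quick view of what is in cache; a minimal sketch:

SELECT  cp.usecounts,
        cp.objtype,
        st.text       AS sql_text,
        qp.query_plan
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;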
When is my query plan thrown out
of cache?
 Server level flush command: DBCC FREEPROCCACHE
 Database level flush command DBCC FLUSHPROCINDB
 procedure level: sp_recompile
 Indirect causes: memory pressure, plan cache pollution, schema changes,
index creation, statistics update, etc.
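The explicit commands look like this (the procedure name is, again, hypothetical):

-- Flush the entire plan cache for the server (use with care on production)
DBCC FREEPROCCACHE;

-- Mark one procedure so its plan is recompiled on the next execution
EXEC sp_recompile N'dbo.usp_GetOrdersByProduct';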
Ad hoc in action
 Suppose we have disabled the server-level “Optimize for ad hoc workloads” option and enabled “Simple parameterization”.
 Queries can be parameterized. A single plan is cached and reused.
 Unfortunately, it is very unlikely that SQL Server considers a plan to be safe. For instance, as soon as the query has more than one table or an IN condition, it will deem the plan unsafe and avoid parameterization. In our simple example it knows the plan is safe, because we are querying on the primary key of the table and that is always going to use an index seek no matter what parameter we pass.
Ad hoc in action II
 If I query by another field, the plan becomes “unsafe”.
 What is worse, for queries deemed unsafe, just adding spaces to the query text makes them be considered different queries!!!!!
 Notice the query hash being the same and the query plan hash being different!
 Apart from the reusability problem => potential plan cache bloat, evicting already compiled plans from cache!!
Options to consider
 If our system uses a lot of ad hoc statements, evaluate the plan cache. See if most of our queries execute only once. If that is the case, enable the “Optimize for ad hoc workloads” setting. This way, only the second time an ad hoc statement gets executed will a plan really be cached.
 If, when evaluating the plan cache, we see that many queries share the same query hash but use different query plans, change the “Parameterization” setting from SIMPLE to FORCED. That way, ad hoc statements will be parameterized  possible parameter sniffing???
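Both settings are a couple of statements (the database name is just a placeholder):

-- Server level: cache only a small stub the first time an ad hoc statement runs
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Database level: force parameterization of ad hoc statements
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED;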
Execute_Sql in action
 Queries are always parameterized
 The plan is cached and reused.
 No plan cache bloating, but possible parameter sniffing???
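A minimal sketch of a parameterized ad hoc statement on the table used throughout the deck:

EXEC sp_executesql
     N'SELECT SalesOrderID, OrderQty
       FROM   Sales.SalesOrderDetail
       WHERE  ProductID = @ProductID;',
     N'@ProductID INT',
     @ProductID = 831;   -- one cached plan, reused for any @ProductID value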
Stored procedures in action
 What are the benefits?
 Less compilation time
 Less plan cache used, less pollution
 Functional encapsulation
 Reusability of plans.
 What are the potential caveats?
 Parameter sniffing. Is the same plan good
enough for all executions?
Parameter sniffing is not the devil
(not always)
 Parameter sniffing is not a problem per se, but the behavior by which, the first time the plan is created, the first parameters passed are sniffed and a plan based on them is created and cached.
 Is this a problem? Not necessarily. It depends on whether the plan is stable.
 By stable I mean: if I execute the same procedure with different parameters, appending OPTION (RECOMPILE) at the end (to simulate a new plan generation), is my plan the same?
 If it is not the same, my plan might be unstable, meaning some parameters will make the procedure execute fast and others will make it go slow as hell.
 As you see, parameter sniffing is not bad; it is desirable as long as my plan is stable.
How can I know the sniffed values?
 Go to the execution plan, click the SELECT operator on the left and open its Properties.
CREATE STORED PROCEDURE WITH
RECOMPILE
 Less preferable option.
 The stored procedure is not cached. For every execution the plan is recompiled
CREATE STORED PROCEDURE WITH
RECOMPILE
 Less preferable option. The whole plan gets recompiled.
 The stored procedure is not cached. For every execution the plan is recompiled
STATEMENT LEVEL RECOMPILATION
 Better option because it only recompiles the particular statement.
 The stored procedure is cached but the part corresponding to the statement gets
recompiled each time.
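A sketch of the statement-level option inside a hypothetical procedure:

CREATE OR ALTER PROCEDURE dbo.usp_GetOrdersByProduct
    @ProductID INT
AS
BEGIN
    -- Only this statement is recompiled on every execution;
    -- the rest of the procedure keeps its cached plan.
    SELECT SalesOrderID, OrderQty
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @ProductID
    OPTION (RECOMPILE);
END;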
Conditional logic based on the input
parameters?
 Good idea, but the problem is that SQL Server only optimizes what can be optimized, and this only includes hardcoded values or parameters, not intermediate variables.
 The conditional logic around the plan is unknown to the optimizer.
 So if you have IF/ELSE statements in your code thinking that is going to avoid parameter sniffing, you are probably wrong.
In this case, what we are not taking into account is that the first time the plan is created and cached it will be optimized for the first parameters passed, and the conditional logic will have no effect.
Creating subprocedures
 In this case, the optimizer will only optimize the stored procedure that gets executed
each time.
OPTIMIZE FOR
 It allows you to say: for this procedure, most of the executions will benefit if I optimize for this particular value.
 Obviously, some executions will still run slower, but that might be something you can live with.
OPTIMIZE FOR UNKNOWN
 It allows you to say: I don't know which particular value would be better. Go and use the density information from the statistics and base the optimization on an average estimation. A sketch of both hints follows below.
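A hedged sketch of both hints inside a hypothetical procedure (names and the chosen value are illustrative):

CREATE OR ALTER PROCEDURE dbo.usp_GetOrdersByProduct_Hinted
    @ProductID INT
AS
BEGIN
    -- Always compile the plan as if @ProductID were 831 (a representative value)
    SELECT SalesOrderID, OrderQty
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = @ProductID
    OPTION (OPTIMIZE FOR (@ProductID = 831));

    -- Alternative: compile for an "average" value derived from the density information
    -- OPTION (OPTIMIZE FOR UNKNOWN);
END;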
DYNAMIC STRING EXECUTION
 In this case, for each execution we will obtain the optimal plan.
 But remember all the problems we talked about dynamic string execution.
STORED PROCEDURES DIFFICULT TO
OPTIMIZE
 Suppose our stored procedure filters on many fields and all of them are optional.
 Typical search-form problem. The first parameter combination passed is going to determine everything.
Possible solutions
 Building the entire string dynamically, only including the non-null parameters, and executing it with EXEC. Remember the problems associated with this.
 Building the entire string dynamically, including the non-null parameters, and using sp_executesql. This should be better because you will get a plan cached for each possible non-null combination, but would each of those plans still suffer from parameter sniffing?
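A minimal sketch of the second option, assumed to run inside a procedure that exposes the optional parameters @ProductID and @CarrierTrackingNumber (both columns exist in Sales.SalesOrderDetail):

DECLARE @sql NVARCHAR(MAX) =
    N'SELECT SalesOrderID FROM Sales.SalesOrderDetail WHERE 1 = 1';

-- Only the filters that were actually supplied end up in the statement text,
-- so each distinct combination gets its own cached, parameterized plan.
IF @ProductID IS NOT NULL
    SET @sql += N' AND ProductID = @ProductID';
IF @CarrierTrackingNumber IS NOT NULL
    SET @sql += N' AND CarrierTrackingNumber = @CarrierTrackingNumber';

EXEC sp_executesql @sql,
     N'@ProductID INT, @CarrierTrackingNumber NVARCHAR(25)',
     @ProductID, @CarrierTrackingNumber;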

Contenu connexe

Tendances

Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008
paulguerin
 
Indexing the MySQL Index: Key to performance tuning
Indexing the MySQL Index: Key to performance tuningIndexing the MySQL Index: Key to performance tuning
Indexing the MySQL Index: Key to performance tuning
OSSCube
 

Tendances (17)

Database index
Database indexDatabase index
Database index
 
Mysql Indexing
Mysql IndexingMysql Indexing
Mysql Indexing
 
Index Tuning
Index TuningIndex Tuning
Index Tuning
 
Optimize access
Optimize accessOptimize access
Optimize access
 
153680 sqlinterview
153680  sqlinterview153680  sqlinterview
153680 sqlinterview
 
Important SAS Tips and Tricks for A Grade
Important SAS Tips and Tricks for A GradeImportant SAS Tips and Tricks for A Grade
Important SAS Tips and Tricks for A Grade
 
Access 2007-Get to know Access
Access 2007-Get to know AccessAccess 2007-Get to know Access
Access 2007-Get to know Access
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performance
 
What is in reality a DAX filter context
What is in reality a DAX filter contextWhat is in reality a DAX filter context
What is in reality a DAX filter context
 
Access 2007 lesson1
Access 2007 lesson1Access 2007 lesson1
Access 2007 lesson1
 
Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008
 
Access 2010
Access 2010Access 2010
Access 2010
 
SQL_Part1
SQL_Part1SQL_Part1
SQL_Part1
 
The Key to Keys - Database Design
The Key to Keys - Database DesignThe Key to Keys - Database Design
The Key to Keys - Database Design
 
How To Automate Part 2
How To Automate Part 2How To Automate Part 2
How To Automate Part 2
 
Introduction to Databases - query optimizations for MySQL
Introduction to Databases - query optimizations for MySQLIntroduction to Databases - query optimizations for MySQL
Introduction to Databases - query optimizations for MySQL
 
Indexing the MySQL Index: Key to performance tuning
Indexing the MySQL Index: Key to performance tuningIndexing the MySQL Index: Key to performance tuning
Indexing the MySQL Index: Key to performance tuning
 

Similaire à dotnetMALAGA - Sql query tuning guidelines

Sql Interview Questions
Sql Interview QuestionsSql Interview Questions
Sql Interview Questions
arjundwh
 
Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…
Aaron Shilo
 
Sql server ___________session_17(indexes)
Sql server  ___________session_17(indexes)Sql server  ___________session_17(indexes)
Sql server ___________session_17(indexes)
Ehtisham Ali
 
NoSQL_Databases
NoSQL_DatabasesNoSQL_Databases
NoSQL_Databases
Rick Perry
 
Guidelines for indexing and tools
Guidelines for indexing and toolsGuidelines for indexing and tools
Guidelines for indexing and tools
NagaVarthini
 

Similaire à dotnetMALAGA - Sql query tuning guidelines (20)

Sql
SqlSql
Sql
 
Sql Interview Questions
Sql Interview QuestionsSql Interview Questions
Sql Interview Questions
 
Sql
SqlSql
Sql
 
Sql
SqlSql
Sql
 
Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09
 
Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09
 
Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…
 
Db performance optimization with indexing
Db performance optimization with indexingDb performance optimization with indexing
Db performance optimization with indexing
 
Query Optimization in SQL Server
Query Optimization in SQL ServerQuery Optimization in SQL Server
Query Optimization in SQL Server
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
Indexing Strategies
Indexing StrategiesIndexing Strategies
Indexing Strategies
 
Sql server ___________session_17(indexes)
Sql server  ___________session_17(indexes)Sql server  ___________session_17(indexes)
Sql server ___________session_17(indexes)
 
Database Basics
Database BasicsDatabase Basics
Database Basics
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...
 
DMBS Indexes.pptx
DMBS Indexes.pptxDMBS Indexes.pptx
DMBS Indexes.pptx
 
NoSQL_Databases
NoSQL_DatabasesNoSQL_Databases
NoSQL_Databases
 
Filtered Indexes In Sql 2008
Filtered Indexes In Sql 2008Filtered Indexes In Sql 2008
Filtered Indexes In Sql 2008
 
Guidelines for indexing and tools
Guidelines for indexing and toolsGuidelines for indexing and tools
Guidelines for indexing and tools
 
Tips for Database Performance
Tips for Database PerformanceTips for Database Performance
Tips for Database Performance
 

Dernier

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Dernier (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

dotnetMALAGA - Sql query tuning guidelines

  • 2. Objectives  Know the basic topics related to query tuning  Dispel some common myths  Understand indexes  Understand statistics  Understand query plans and how plan cache is used  Parameter sniffing.  Main objective: Set the basis that you need to develop further your own SQL tuning skills.
  • 3. Queries, a high level overview  A query is submitted to SQL Server  The query optimizer decides what the best plan is, out of potentially thousands of possibilities, in a limited amount of time  Paramount to find the best plan are indexes and distribution statistics  The result is a query plan that will be stored in cache. Potentially reused if posible.
  • 4. The query optimizer  The query optimizer follows several steps based on heuristical rules to generate a logical tree composed of nodes  It simplifies as much as posible  Several plans are attempted in parallel  Once the “best” plan is found, the optimizer generates the plan  One of the most important steps for this presentation will be Derive Cardinality. This is where indexes and Statistics come into place.
  • 6. Types of tables in SQL Heap tables  Heap tables are just tables without a clustered index.  As there are no clustered index, they are unordered.  They are great for ETL processes where we want to store data really quickly.  The main con is that every time we need to look up data in a heap we always have to scan.  Another good used could be for log activity tables where we just insert data. Clustered tables  Clustered tables are tables with a clustered index  The table is the clustered index itself and is organized basd on the clustered key. That is the reason there can only be one clustered index per table.
  • 7. Types of indexes Clustered indexes  They are always maintained as unique regardless of wether you define them as unique.  Think of clustered index as the pages of a book where the key is the page number  If you wanto look for a particular page you just go to the page number.  The book itself is organised by the clustered index ( the page number) Non clustered indexes  They can be unique or not  Think of non clustered index as a book index.  All of the data you are looking for ( the page) might not be in the index itself but it points you to the right page ( the clustered key in the case the base table is clustered) so you dont have to search blindly through the book. (Table scan operation)
  • 9. Non clustered Index structure Look that the main difference is that the Leaf nodes do not contain all of the data
  • 11. Included colums in non clustered indexes  When we define a non clustered index we can define what columns are going to be stored at the page level.  This is helpful because we dont have to go to the base table to get the data we want. We avoid key lookups (more in the query plan section).  The caveat is our indexes are going to be larger  The nonclustered keys have a limitation of 900 bytes. This can be overriden in the case of included columns.  You cannot use image or text data types
  • 12. Key differences between the two Clustered indexes  Leaf nodes are the actual data itself.  As there are no clustered index, they are unordered.  They are great for ETL processes where we want to store data really quickly  The main con is that every time we need to look up data in a heap we always have to scan. Non clustered indexes  Leaf nodes are not actually the data. They only contain included columns!).  They point to the base table via the clustered key.  This is why the longest your clustered key is the longest is your non clustered indexes are going to be. Beware with your clustered indexes!!!!  Any change done to the clustered index will need to be maintained in the non clustered index.  The more non clustered indexes we have, the slowest our system might be in data modifications.
  • 13. Common errors when defining the clustered key ► It is pretty common to create our primary keys without knowing that by default SQL Server creates a unique clustered index behind the scenes. ► This might become a problem, because if you remember, non clustered indexes have a pointer in the form a clustered key at the leaf level.  One common example is to define a GUID as our primary key. Note the impact this has in the size, maintainance and potential fragmentation of nonclustered indexes!!!!  Make sure you know what happens behind the scenes.
  • 14. So when are indexes used?  First thing to notice is that if we don’t have an index SQL Server will scan the whole table.  If the table has indexes it will see if those indexes cover the query or not. Covering indexes  If they cover the index, Selectivity will be analysed.
  • 15. What is selectivity?  A highly selective query is the one that returns a small percentage of the total records. It will lead to seek operations versus scans.  Based on this:  Covered index on queries with a high selective predicate  Seek  Covered index on queries with low selective preicate  Scans  Queries without covered index  Scan  This leaves an important conclusion:  It does not make sense to créate indexes on low selective keys, because they will use the index yes, but not in an efficient manner ( Scans)  When specifying more than one key in an index , the order is important. Should go from more selective to less selective.
  • 16. Why not create indexes everywhere?  Index have a cost for SELECT statements but not so great for DM statements.  For instance, an UPDate operation might modify the key values and this will mean maintaining index structures.  Possible page splits, index reorganization.  They also have a cost in terms of space.  index , the order is important. Should go from more selective to less selective.
  • 17. Can I know if my indexes are getting used at all?  In SQL we have a DMV called sys.dm_db_index_usage_stats with the following columns among the most important:  Database_id: the id of the database  Object_id: The id of the object (table or view) that owns the index being analyzed  Index_id: Id of the index  User_seeks: Number of seeks performed on the index  User_scans: Number of scans performed on the index  User_updates: Number of times the index has been modified.  User_lookups: Number of times the index has been accessed for lookup operations
  • 18. Can I know if my indexes are getting used at all?
  • 19. Internal index fragmentation  Fragmentation happens when we perform more IOs than neccessary when using the indexes.  Internal fragmentation happens at the page level. It measures how full the page is used.  Ideally, a page should be 100 % utilized but this is never the case.  Delete operations make pages less utilized each time.  Update operations may provoke page splits because a record might not fit anymore in the same page. The original page is less utilized as a result.  Consequences?  The more fragmenation pages have, the more number of pages required to read the same amount of data, the more IOS for the same operation  The size of the database file increases.  Buffer cache efficiency is reused. Less data in the same amount of cache.
  • 20. External index fragmentation  External fragmentation happens when the logical order of pages within an index is not the same as the physical order.  Indexes are implemented as double linked lists with each node poiting to the next and previous pages  The more physically unordered the pages are the slowest is going to be disk retrieving the data.  Page Splits can be made less frequent if we set a Fill factor when creating the indexes. Fill Factor determines how we want pages to be utilized.  By specifying some free space in each page of the index, page splits will happen less frequently.
  • 21. Analyzing Internal index fragmentation
  • 22. Analyzing external index fragmentation
  • 23. Maintainance options  Reorganize and index.  Quicker. It just matched the logical order with the physical one. Gets rid of external fragmentation.  Rebuild  Takes longer because it drops and recreates the index. Gets rid of both internal and external fragmentation.
  • 25. What are statistics in SQL Server?  The query optimizer needs to know which operators to use : Index Seeks, table scans, nested loop joins, etc.  Most ot these decisitions come from the statistics from indexes in each of the tables implied in the query  They are a histogram of how many rows the table has based on the index key.  Each histogram can have only up to 200 steps  Inaccuracy when the number of rows is very large!!
  • 26. From SQL Server Management Studio
  • 28. Statistics Header  Name: Name of the statistics. If this contains a WA_ prefix this will be automatically created statistics.  Updated: This tell us when statistics were last updated.  Rows: The number of rows in the index  Rows Sampled: The number of rows that have been read to generate the statistics. If this number is less than the Rows value, it means the sample has not been FULL Different samples options  Steps: The number of steps used to represent the histogram. Remember SQL can only use up to 200 steps.  Average Key length: this is the average length of the key of the index.
  • 29. Density  There wil be as many entries as possible combinations of the keys of the index.  Density = 1 / Number of distinct rowsRows: The number of rows in the index  The higher the density the more selective the index is. Used by the query optimizer to decide whether to choose an index or not.
  • 30. Histogram  RANGE_HI_KEY: Represents the highest key value for the step.  RANGE_ROWS: This represents the number of rows in the target table that contain the key value for the step but without including the lowest and the highest one (RANGE_HI_KEY)  EQ_ROWS: Represents the number of rows that contain the range_hi_key value within the step.  DISTINCT_RANGE_ROWS: Represents the number of rows that contain different key values within the step.  AVG_RANGE_ROWS: This is a number representing the average number of rows for a given key value within the step.
  • 31. How are statistics used?  If you remember previous section we analyzed the histogram step that contained the key value ProductID = 831. As we are asking SQL to estimate the number of rows in Sales.SalesOrderDetail where ProductID equals 831, SQL goes and looks at the histogram and voila!, it knows exactly the number of rows thanks to the colum EQ_ROWS.
  • 32. Statistics Update  Having statistics update is paramount to have optimal query plans  SQL Server can be configured to automatically update statistics.  This automatic update takes place when the 20% of rows of a table has changed. Think of large tables!!!!  To manually update you can:
  • 33. Auto Created Statistics  If enabled, auto created statistics can be created for those predicates not covered by an index.  These special statistics start with the prefix _WA
  • 34. Limitations of the histogram  The number of steps that can be used to create a histogram is only 200. As you may have guessed by now, this number can become quite small specially when we are dealing with tables with millions of rows and skewed distribution data.  This can lead to all kind of suboptimal query plans, usages of index scans when it would be more appropiate to use index seeks, assigning too much or too little memory for a given operator generating what is known as memory spills, etc.  This all translates in bad query plans and bad performance.  Google Kimberly L. Tripp and her proposed alternative.
  • 35. One common misconception  Many times, developers find that the client reports a stored procedure is taking a long time to execute suddenly.  Usually the developer first attempt is to update statistics. Then everything works fine!  After some time, the client reports back the problem and the developer does the same thing entering an endless loop.  What might be happening behind the scenes?  Updating statistics that affect a given table marks execution plans associated to that table to be recompiled in the future.  It is likely that what update statistic does in reality is forcing SQL Server to generate a new plan and hence hiding a potential parameter sniffing problem.  When in doubt try to execute the query with the RECOMPILE option. This will not evict plans from cache. If everything works fine then you have a parameter sniffing issue.  Otherwise, you are correct and then it is statistics stale problem.
  • 37. How to view the execution plan  We can view the execution plan for a given query from SSMS.  There are two options:  Getting the estimated execution plan. This plan is not cached nor executed.  Getting the actual plan. This plan gets cached and executed.
  • 40. Things to note  Plans are read from right to left  Each operator subtreecost represents the cost for it and its descendants. This means that by examining the SELECT operator subtree cost we can infer the estimated total cost for our query  By clicking on a given node and select properties we can get very detailed information.  By clicking on the SELECT node and select properties we can get very detailed information about the query itself. For instance, compiled parameter values for a stored procedure, memory used by the query, etc.
  • 41. Things to note  Within a given query plan, the estimated cost percentages give us hints on what to look at in terms of optimization  When multiple statements are executed, the total cost for each query plan gives us a hint as to which statement is being the higher in terms of cost.  The difference between the estimated and actual number of rows usually relates to the statistics subject we looked at and usually points us in the right tuning path.  The cost of the operators is unit less, no time , just an estimation. Only for relative comparison
  • 42. Other “better” alternatives  I highly recommend SQL entry plan explorer  More compact and easier view.  Many features  Both free and professional edition.
  • 43. The operators  There are over 100 operators that a query plan can present.  Logical operators vs physical operators. For instance, INNER JOIN vs its possible physical implementations (NESTED LOOP, HASH JOIN, etc.)  Operator cost is unitless and is composed of IO cost and CPU cost (some operators have only one of them)  Each operator requires some memory to perform its operation. The more rows, the more memory. If the memory estimated by SQL Server is less than the memory actually needed, memory spills occur. These are indicated by a warning over the operator.  Operators can be blocking or non-blocking depending on whether they need to consume all their rows before producing output.  Blocking operators: Sort, Eager Spool, Hash Aggregate, Hash Join, etc.  Non-blocking operators: Nested Loops, Lazy Spool, etc.
  • 44. High memory operators  Operators like Hash Join or Sort usually require a lot of memory. Beware of them.  Queries requiring large memory grants can be problematic: they may wait longer than usual for execution if the system is low on memory.  Cardinality estimate errors may lead to spills to tempdb, meaning that instead of working in memory we need to do IO against physical tempdb.
  • 45. Table and Index scan operators  A Table Scan happens when a table is a heap (no clustered index) and all of its rows are fetched.  A Clustered Index Scan happens when all rows from a table with a clustered index are fetched.
  • 46. Index seek operators  The predicate was selective enough to seek into an index for specific rows  Clustered Index Seek when the seek is done on a clustered index  Non-clustered Index Seek when the seek is done on a non-clustered index
  • 47. Lookup operators  When the non-clustered index does not provide all the columns we need, a lookup is required.  It is an extra trip to the base table to get the required columns.  Lookups can be very costly; consider making the non-clustered index a covering index, as sketched below.  They are always joined back to the base table via nested loops
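A sketch of turning a lookup-heavy non-clustered index into a covering one; all object and column names are made up:

-- Original index only covers CustomerID; a query that also selects
-- OrderDate and TotalDue needs a key lookup per row
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON Sales.Orders (CustomerID);

-- Covering version: the extra columns live in the leaf level, no lookup needed
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
    ON Sales.Orders (CustomerID)
    INCLUDE (OrderDate, TotalDue);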
  • 48. Join operators  The same logical join can be carried out physically in different ways. That is neither good nor bad; it depends.  Hints can change the physical operator chosen for the plan, but this is generally not a good idea and might be hiding some other problem.  Nested Loops  Merge Join  Hash Join
  • 49. Nested Loops  Nested loops can implement many logical joins (inner join, left join, outer apply, cross apply, etc.)  For each row coming from the top (outer) table an operation is performed against the bottom (inner) table, like a loop, hence the name.  Usually the optimizer chooses the table that outputs fewer rows as the top table, i.e. the one that determines the number of iterations of the loop  Queries where the top table does not return many rows usually end up as nested loop joins.  Very low memory requirements.
  • 50. Merge Join  Merge join is a very efficient operator. It only works on presorted inputs and for joins that use only equality predicates.  If both inputs are sorted by the join key, it is a matter of fetching the first row of the top table and checking whether the bottom table has matching rows; if not, continue with the next row from the top table.  It usually appears when we have indexes sorted on the join condition.  Otherwise the query optimizer might decide to add a presort if that is not very costly, which makes the plan more expensive in terms of memory.
  • 51. Hash Join  Hash join is the least efficient join and is used either when there are no usable indexes or when the rows are not sorted by the join key.  Sometimes, however, hash join is the best option, because even adding indexes would not help.  The algorithm is as follows:  The table that returns fewer rows is chosen to build a hash table, a structure where groups of rows are identified by a hash key.  The bottom table acts as the probe: for each of its rows the hash function is used to look for matches.
  • 52. Note about joins  Hash join is the most memory-intensive join. Plus, it is a blocking operation.  For OLTP systems you should ideally expect more nested loop joins than merge and hash joins  For data warehouse systems it is usually the opposite.
  • 53. Residual predicates  There are two types of predicates for seek operators  Seek predicates: the ones that are SARGable and that the index can evaluate to filter out rows  Residual predicates: the ones that need to be evaluated at a later stage, typically due to non-SARGable expressions.  Why do we care about them? Because having residual predicates is similar to doing scans, i.e. we are not filtering as early as we could be with index seeks. They can be hidden bottlenecks in our query plan; see the sketch below.  To detect them, go to the properties of the index seek operators in the case of nested loops, or of the join operators themselves in the case of hash and merge joins.
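A typical way such residual predicates appear is a non-SARGable expression on an indexed column; a small illustrative sketch (table and column names are assumptions):

-- Non-SARGable: the function on the column prevents a true index seek,
-- so the condition tends to be evaluated as a residual predicate
SELECT OrderID FROM Sales.Orders
WHERE YEAR(OrderDate) = 2023;

-- SARGable rewrite: an index on OrderDate can seek directly on the range
SELECT OrderID FROM Sales.Orders
WHERE OrderDate >= '20230101' AND OrderDate < '20240101';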
  • 54. Residual predicates in nested loops
  • 57. Can I influence the type of join chosen?  The answer is yes, you can, via join hints. But you shouldn't!  Except for extremely isolated cases this is not a good idea. Very rarely will we be able to outsmart the query optimizer.  Using join hints usually just hides a bigger problem that should be fixed properly.
  • 58. Stream aggregates  This operator is used to calculate aggregates  When calculating aggregates for a GROUP BY, it requires its input to be sorted; if the sort order is not provided by an index, a Sort operator is added to the plan.
  • 59. Hash aggregates  This operator is also used to calculate aggregates, but based on a hashing algorithm  It can work with unordered inputs.  Generally used when the amount of work is larger. Beware of the potential tempdb spills associated with it.
  • 60. Potential spills to tempdb  Indicated by a warning icon. It means that the memory estimated for the operator was lower than required and that IO operations against tempdb were needed.
  • 61. By the way, did I tell you about SQL Sentry?
  • 62. Spool operators  Spool operators basically mean creating a temporary structure in tempdb  Why is that needed?  Maybe we need to use that structure over and over again  Maybe we need to separate that structure from the original source data (the Halloween problem)  The optimizer may decide it is better to create a temporary structure than to repeat seek and scan operations over and over again.  Eager spool: all of the needed rows are retrieved in one step. Blocking operator.  Lazy spool: rows are retrieved as needed. Non-blocking.  Are they good or bad? It all depends on the particular scenario. They are there to boost performance, but as with a missing index they could be hiding some other problem.
  • 63. The Halloween problem  If we force the clustered index as the source, we can fetch rows one by one and update the non-clustered index as we update the amounts. That is fine, because it does not affect the original source of the data, i.e. the clustered index.
  • 64. The Halloween problem  In this case it is the non-clustered index we force as the source of the data, and hence we have a potential problem: if we modify the amount we could be changing the non-clustered index itself and end up with an inconsistency, because an already processed row could be reprocessed. That is why an Eager Spool with a copy of the data is used.
  • 65. Measuring time and IOs  Use SET STATISTICS TIME ON and SET STATISTICS IO ON to measure exactly the elapsed time and the IOs performed by the query when tuning, as sketched below.
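A minimal usage sketch; the query itself is just an example:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT CustomerID, COUNT(*) AS order_count
FROM Sales.Orders
GROUP BY CustomerID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
-- The Messages tab now shows logical reads per table plus CPU and elapsed time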
  • 66. So if I am given a bad query plan, what should I do? (I)  While there is no definitive answer, here are some common tips:  Go to the query within the batch that shows the highest estimated cost (though this can be misleading when multi-statement table-valued functions are involved)  Go to the highest estimated cost operators within that query  Compare estimated rows vs actual rows. Do the estimates make sense given the statistics?  Detect scans and consider adding indexes, or find out why existing indexes are not being used. Often the plan itself will show missing-index warnings, but beware: don't take them too literally.  Find the thickest arrows and check whether that many rows really need to be moved through the tree.  If you see Filter operators near the end of the plan removing a lot of rows, that is something to look at.
  • 67. So if I am given a bad query plan, what should I do? (II)  When you see nested loops, make sure the top table is the smaller one. If not, you may have a cardinality estimation issue.  If you have merge join operators preceded by Sort operators, evaluate whether those sorts can be avoided. Remember that Sort operators can spill to tempdb.  If you have hash join operators, make sure that the top (build) table is the smaller one. Otherwise you may have cardinality estimation issues. Also look for spills to tempdb.  Look at high-cost Sort operators and determine whether they come from the query itself (e.g. an ORDER BY in the statement) or were introduced to enable a merge join.  When looking at Sort operators before an aggregate, evaluate whether an index providing the sort order is worthwhile.  In hash and merge operators, look for residual predicates (in the Properties pane).
  • 68. So if I am given a bad query plan, what should I do? (III)  When plans go parallel, consider whether that is appropriate. Parallelism indicates that the query is expensive; maybe that is fine, but can we simplify the query and avoid the parallelism?
  • 69. One Last Note  Beware of implicit conversions.  Not all implicit conversions cause an index scan. Check https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/  They are shown via a warning icon on the SELECT operator.
  • 70. One Last Note  By avoiding the implicit conversion, the expected Index Seek is used, as in the sketch below.
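A sketch of the classic case, assuming a hypothetical Sales.Customers table whose AccountNumber column is VARCHAR:

-- Implicit conversion: the NVARCHAR literal forces CONVERT_IMPLICIT on the column,
-- which can turn the expected seek into a scan
SELECT CustomerID FROM Sales.Customers WHERE AccountNumber = N'AW00000123';

-- Matching the column's data type lets the optimizer seek the index
SELECT CustomerID FROM Sales.Customers WHERE AccountNumber = 'AW00000123';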
  • 71. Different Execution modes and plan cache implications
  • 72. Different execution methods  Ad hoc. The typical queries we write directly in SSMS  Dynamic string execution via the EXEC command. Accepts a preconstructed SQL string but no parameters. Very unlikely to generate plans that get reused.  sp_executesql. Accepts a preconstructed SQL string plus parameters. The generated plans can be reused.  Stored procedures. The generated plans are cached and reused. A sketch contrasting EXEC and sp_executesql follows.
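A side-by-side sketch of the two dynamic options; table, column and parameter names are illustrative:

DECLARE @id INT = 42;
DECLARE @sql NVARCHAR(200);

-- EXEC: the literal value is embedded in the string, so each distinct value
-- tends to produce its own cached plan
SET @sql = N'SELECT OrderID FROM Sales.Orders WHERE CustomerID = '
         + CAST(@id AS NVARCHAR(10));
EXEC (@sql);

-- sp_executesql: the statement stays parameterized, so the plan can be reused
EXEC sp_executesql
     N'SELECT OrderID FROM Sales.Orders WHERE CustomerID = @CustomerID',
     N'@CustomerID INT',
     @CustomerID = @id;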
  • 73. Why bother about the execution modes?  Being unable to reuse a plan is very costly, mainly because we force the query optimizer to work every time, and generating a plan takes time.  The fewer plans we reuse, the more plan cache memory we waste; and the more plan cache we waste, the more already-compiled plans may get evicted and have to be regenerated  less efficiency.  Although reusing plans (with parameter sniffing) is generally a good idea, it has a potential problem: a cached plan may not be optimal in all situations.
  • 74. Inspecting the plan cache  Plans are created and cached when a query is executed.  sys.dm_exec_cached_plans: DMV that tracks information about all query plans in cache  sys.dm_exec_query_plan: DMF that accepts a plan handle and returns the corresponding XML execution plan  sys.dm_exec_sql_text: DMF that accepts a plan handle and returns the corresponding SQL text. A sketch combining them follows.
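Tying the three objects together, roughly:

-- One row per cached plan, with its SQL text and XML plan
SELECT cp.usecounts,
       cp.objtype,
       st.text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;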
  • 76. When is my query plan thrown out of cache?  Server-level flush command: DBCC FREEPROCCACHE  Database-level flush command: DBCC FLUSHPROCINDB  Procedure level: sp_recompile  Indirect causes: memory pressure, plan cache pollution, schema changes, index creation, statistics updates, etc.
  • 77. Ad hoc in action  Suppose we have disabled the server-level “optimize for ad hoc workloads” option and the database uses simple parameterization.  Queries can be parameterized: a single plan is cached and reused.  Unfortunately it is very unlikely that SQL Server considers a plan “safe”. For instance, as soon as the query has more than one table or an IN condition, it deems the plan unsafe and avoids parameterization. In our simple example it knows the plan is safe because we are querying on the primary key of the table, and that will always use an Index Seek no matter what parameter we pass.
  • 78. Ad hoc in action II  If I query by another field, the plan becomes “unsafe”.  What is worse, once the query is deemed unsafe, two statements differing only in whitespace are considered different!  Notice the query hash being the same while the query plan hash is different.  Apart from the reusability problem, this can bloat the plan cache and evict already compiled plans from it.
  • 79. Options to consider  If our system uses a lot of ad hoc statements, evaluate the plan cache. If most of our queries execute only once, enable the “optimize for ad hoc workloads” setting. This way a full plan is only cached the second time an ad hoc statement is executed.  If, when evaluating the plan cache, we see many queries sharing the same query hash but using different query plans, change the database “Parameterization” setting from SIMPLE to FORCED. That way ad hoc statements will be parameterized  possible parameter sniffing? Both settings are sketched below.
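The two settings can be changed as sketched here; the database name SalesDB is made up, and both changes should be evaluated carefully before applying them to production:

-- Cache only a small plan stub on the first execution of an ad hoc statement
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Force parameterization of ad hoc statements for one database
ALTER DATABASE SalesDB SET PARAMETERIZATION FORCED;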
  • 80. sp_executesql in action  Queries are always parameterized  The plan is cached and reused.  No plan cache bloat, but possible parameter sniffing?
  • 81. Stored procedures in action  What are the benefits?  Less compilation time  Less plan cache used, less pollution  Functional encapsulation  Reusability of plans.  What are the potential caveats?  Parameter sniffing. Is the same plan good enough for all executions?
  • 82. Parameter sniffing is not the devil (not always)  Parameter sniffing is not a problem per se; it is the behavior by which, the first time a plan is created, the parameter values passed are “sniffed” and a plan based on them is created and cached.  Is this a problem? Not necessarily. It depends on whether the plan is stable.  By stable I mean: if I execute the same procedure with different parameters appending OPTION (RECOMPILE) to the statement (to simulate a fresh plan generation), is the plan the same?  If not, the query may be unstable, meaning that with some parameters the procedure executes fast while with others it is painfully slow.  So parameter sniffing is not bad; it is desirable, as long as the plan is stable.
  • 83. How can I know the sniffed values?  Go to the execution plan, select the SELECT operator on the left and open its Properties.
  • 84. CREATE STORED PROCEDURE WITH RECOMPILE  The less preferable option.  The procedure's plan is not cached; the plan is recompiled for every execution
  • 85. CREATE STORED PROCEDURE WITH RECOMPILE  The less preferable option: the whole plan gets recompiled.  The procedure's plan is not cached; the plan is recompiled for every execution, as in the sketch below
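A minimal sketch of this option, with made-up object names:

CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID INT
WITH RECOMPILE   -- no plan is cached; the whole procedure is recompiled on every call
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM Sales.Orders
    WHERE CustomerID = @CustomerID;
END;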
  • 86. STATEMENT LEVEL RECOMPILATION  Better option because it only recompiles the particular statement.  The stored procedure's plan is cached, but the part corresponding to that statement gets recompiled each time, as in the sketch below.
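The statement-level variant, again with illustrative names:

CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID INT
AS
BEGIN
    -- Only this statement is recompiled on each execution;
    -- the rest of the procedure keeps its cached plan
    SELECT OrderID, OrderDate
    FROM Sales.Orders
    WHERE CustomerID = @CustomerID
    OPTION (RECOMPILE);
END;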
  • 87. Conditional logic based on the input parameters?  Sounds like a good idea, but the problem is that SQL Server only optimizes based on hardcoded values or parameters, not intermediate variables, and the conditional logic inside the procedure is unknown to the optimizer.  So if you have IF/ELSE statements in your code thinking they will avoid parameter sniffing, you are probably wrong. The plan is created and cached the first time, optimized for the first parameters passed, and the conditional logic has no effect on that.
  • 88. Creating subprocedures  In this case the optimizer only optimizes the sub-procedure that actually gets executed each time, as sketched below.
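A sketch of the pattern: the outer procedure only branches, and each branch calls a sub-procedure that gets its own plan. All names are hypothetical:

CREATE PROCEDURE dbo.GetOrders_Recent @CustomerID INT AS
    SELECT OrderID FROM Sales.Orders
    WHERE CustomerID = @CustomerID
      AND OrderDate >= DATEADD(DAY, -30, GETDATE());
GO
CREATE PROCEDURE dbo.GetOrders_All @CustomerID INT AS
    SELECT OrderID FROM Sales.Orders
    WHERE CustomerID = @CustomerID;
GO
CREATE PROCEDURE dbo.GetOrders
    @CustomerID INT, @RecentOnly BIT AS
BEGIN
    IF @RecentOnly = 1
        EXEC dbo.GetOrders_Recent @CustomerID;   -- optimized independently
    ELSE
        EXEC dbo.GetOrders_All @CustomerID;      -- optimized independently
END;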
  • 89. OPTIMIZE FOR  It allows you to say: for this procedure, most executions will benefit if I optimize for this particular value.  Obviously some executions will still run slower, but that might be something you can live with.
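A sketch, where the procedure name is made up and 42 stands in for the “typical” parameter value:

CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID INT
AS
    SELECT OrderID, OrderDate
    FROM Sales.Orders
    WHERE CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR (@CustomerID = 42));  -- plan built as if 42 were always passed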
  • 90. OPTIMIZE FOR UNKNOWN  It allows you to say: I don't know which particular value would be best. SQL Server then uses the density information from the statistics and bases the optimization on an average estimate.
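The UNKNOWN variant of the same hint, in another hypothetical procedure:

CREATE PROCEDURE dbo.GetOrdersByCustomer_Avg
    @CustomerID INT
AS
    SELECT OrderID, OrderDate
    FROM Sales.Orders
    WHERE CustomerID = @CustomerID
    OPTION (OPTIMIZE FOR (@CustomerID UNKNOWN));  -- average density instead of a sniffed value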
  • 91. DYNAMIC STRING EXECUTION  In this case we obtain an optimal plan for each execution.  But remember all the problems we discussed about dynamic string execution.
  • 92. STORED PROCEDURES DIFFICULT TO OPTIMIZE  Suppose our stored procedure filters on many fields and all of them are optional.  The typical search-form problem: the first parameter combination passed determines the plan for everything that follows.
  • 93. Possible solutions  Build the entire string dynamically, including only the non-null parameters, and execute it with EXEC. Remember the problems associated with this.  Build the entire string dynamically, including only the non-null parameters, and execute it with sp_executesql. This should be better because you get a cached plan for each combination of non-null filters, but would each of those plans still suffer from parameter sniffing? A sketch follows.
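A sketch of the second option, building the WHERE clause only from non-null parameters and executing via sp_executesql; all object, column and parameter names are illustrative:

CREATE PROCEDURE dbo.SearchOrders
    @CustomerID    INT  = NULL,
    @OrderDateFrom DATE = NULL
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT OrderID, CustomerID, OrderDate FROM Sales.Orders WHERE 1 = 1';

    IF @CustomerID IS NOT NULL
        SET @sql += N' AND CustomerID = @CustomerID';
    IF @OrderDateFrom IS NOT NULL
        SET @sql += N' AND OrderDate >= @OrderDateFrom';

    -- One cached plan per combination of supplied filters
    EXEC sp_executesql @sql,
         N'@CustomerID INT, @OrderDateFrom DATE',
         @CustomerID, @OrderDateFrom;
END;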