SQL Server implements three different physical operators to perform joins. In this presentation you'll see how each of these operators works, along with its advantages and challenges.
You'll learn:
* The logic behind the optimizer's decisions
* Which operator to use for various joins using (semi) real life examples
* How to avoid common join-related pitfalls
Ami Levin is a Microsoft SQL Server MVP and a Mentor with SolidQ. For the past 14 years, he has been consulting, teaching, writing, and speaking about SQL Server worldwide.
Levin’s areas of expertise are data modeling, database design, T-SQL and performance tuning.
Before moving to California, he led the Israeli SQL Server user group (ISUG) and moderated the Hebrew MSDN SQL Server support forum. Ami is a regular speaker at Microsoft Tech-Ed Israel, Dev Academy, and other SQL Server conferences. He blogs at SQL Server Tuning Blog.
Microsoft SQL Server Physical Join Operators
1. Ami Levin, SolidQ
Presented to the Silicon Valley SQL Server User Group, April 2013
Nesting Merged Hash Loops
Ami Levin
CTO, DBSophic
SQL Server
Physical Join Operators
2. Session Goals
SQL Server uses three physical join operators:
Nested loops, Merge, and Hash Match.
In this session we will:
• See how each of these operators works
• Review their advantages and drawbacks
• Understand some of the logic behind the optimizer’s decisions on which operator to use
• Learn to identify common join-related pitfalls
3. Not This Time
• Outer joins
• Non-equi joins
• Logical processing order
• NULL issues
• Join parallelism
• Partitioned joins
• …
6. Nested Loops
[Flowchart: Start; fetch next row from blue input; row exists? True: find matching rows in red input, then fetch the next blue row. False: quit.]
7. Nested Loops I
• Outer loop determines number of iterations
• At least one input should be (relatively) small
• Inner operation is performed for every iteration of the outer loop
• Index or table scan (naïve)
• Index seek + lookup
• Covering index seek
• Index spool
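The loop described on this slide can be sketched in a few lines of Python. This is only an illustration of the general nested loops technique, not SQL Server's implementation: the `customers` and `orders` rows are hypothetical sample data, and a plain dictionary stands in for an index on the inner input. The counters show how the choice of inner operation (naive scan versus seek) changes the number of inner rows touched.

```python
from collections import defaultdict

def nested_loops_scan(outer, inner, key_outer, key_inner):
    """Naive nested loops: scan the whole inner input once per outer row."""
    matches, inner_rows_touched = [], 0
    for o in outer:                    # the outer loop sets the iteration count
        for i in inner:                # full scan of the inner input
            inner_rows_touched += 1
            if key_outer(o) == key_inner(i):
                matches.append((o, i))
    return matches, inner_rows_touched

def nested_loops_seek(outer, inner_index, key_outer):
    """Nested loops with a seek: one keyed lookup per outer row."""
    matches, inner_rows_touched = [], 0
    for o in outer:
        for i in inner_index.get(key_outer(o), []):   # the "index seek"
            inner_rows_touched += 1
            matches.append((o, i))
    return matches, inner_rows_touched

# Hypothetical sample data: customers (parent) and orders (child).
customers = [(1, "Ann"), (2, "Bob")]              # (customer_id, name)
orders = [(10, 1), (11, 2), (12, 1), (13, 3)]     # (order_id, customer_id)

# A dict keyed on customer_id stands in for an index on the foreign key.
orders_by_customer = defaultdict(list)
for order in orders:
    orders_by_customer[order[1]].append(order)

scan_matches, scan_touched = nested_loops_scan(
    customers, orders, lambda c: c[0], lambda o: o[1])
seek_matches, seek_touched = nested_loops_seek(
    customers, orders_by_customer, lambda c: c[0])
```

Both variants return the same matches, emitted one outer row at a time, but the scan touches every inner row on every iteration while the seek touches only the matching rows.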
8. Nested Loops II
• Data pages may be accessed repeatedly
• Risky non-sequential page access path
• Output of matching row sets is fast
• Unordered, but typically grouped
• Physical resources
• CPU Very low
• Physical IO Low to very high
• Memory Low
9. Nested Loops with Foreign Key Joins
• Foreign keys join parent and child
• Most common relationship is one-to-many
• Often parent input is significantly smaller
• Parent must already be indexed
• Either primary key or unique constraint
• Therefore, indexing foreign keys often enables efficient use of nested loops
11. Merge
[Flowchart: Start; fetch next row from blue input; row exists? (False: quit); fetch next row from red input; rows match? (True / False branches).]
12. Merge I
• Inputs must be sorted prior to merge
• Sorted by (all?) join expression(s)
• Pre-sorted in plan, but not necessarily in DB
• Preferred when sorting supports additional plan operations
• Merge join types
• One to many
• Many to many - requires temporary worktable
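As a rough illustration of the points above, here is a minimal Python sketch of a merge join over two inputs that are already sorted on the join key. It is a textbook version, not SQL Server's code; the `parents` and `children` rows are hypothetical. For duplicate keys on both sides, the sketch re-reads the run of matching right-hand rows, which is loosely analogous to the role the temporary worktable plays in a many-to-many merge.

```python
def merge_join(left, right, key_left, key_right):
    """Merge join over inputs that are already sorted on their join keys."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key_left(left[i]), key_right(right[j])
        if lk < rk:
            i += 1                      # left is behind: advance left
        elif lk > rk:
            j += 1                      # right is behind: advance right
        else:
            # Find the run of right rows that share this key.
            run_start = j
            while j < len(right) and key_right(right[j]) == lk:
                j += 1
            # Pair every left row with this key against the whole run.
            while i < len(left) and key_left(left[i]) == lk:
                for r in right[run_start:j]:
                    result.append((left[i], r))
                i += 1
    return result

# Hypothetical sample data, both sorted on the join key.
parents = [(1, "Ann"), (2, "Bob"), (4, "Dee")]      # sorted by id
children = [(10, 1), (12, 1), (11, 2), (13, 3)]     # sorted by parent id

joined = merge_join(parents, children, lambda p: p[0], lambda c: c[1])
```

Each input is read once from start to finish, and the output comes back ordered by the join key, which is why the sort cost dominates when the inputs are not already ordered.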
13. Merge II
• Residual predicates
• Fast, ordered and grouped output
• Physical resources
• CPU Very low
• Physical IO Very low
• Memory Very low
* Excluding sorting costs
15. Hash Match - Phase I (Build)
[Flowchart: Start; fetch next row from blue input; row exists? True: apply hash function, then fetch the next blue row. False: continue to Phase II.]
16. Hash Match - Phase II (Probe)
[Flowchart: entered from Phase I; fetch next row from red input; row exists? True: apply hash function, then fetch the next red row. False: quit.]
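The two phases can be sketched in Python as follows. This is a simplified in-memory illustration, not SQL Server's implementation: Python's built-in dictionary does the hashing, and the `customers` and `orders` rows are hypothetical sample data.

```python
from collections import defaultdict

def hash_join(build_input, probe_input, key_build, key_probe):
    """In-memory hash join: build a hash table, then probe it."""
    # Phase I (build): hash every row of the build input on its join key.
    table = defaultdict(list)
    for b in build_input:
        table[key_build(b)].append(b)

    # Phase II (probe): hash each probe row and look for matching build rows.
    result = []
    for p in probe_input:
        for b in table.get(key_probe(p), []):
            result.append((b, p))
    return result

# Hypothetical sample data; neither input needs to be sorted or indexed.
customers = [(2, "Bob"), (1, "Ann")]              # build input
orders = [(10, 1), (11, 2), (12, 1), (13, 3)]     # probe input

hash_matches = hash_join(customers, orders, lambda c: c[0], lambda o: o[1])
```

No row can be returned until the build phase has consumed its entire input, and matches then appear in probe-input order rather than in key order.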
17. Hash Match I
• Hash function selection
• Extremely complex
• CPU intensive
• Build and probe costs are hidden
• Do not constitute logical reads
• Output of matching row sets is slow
• Unordered and typically ungrouped
18. Hash Match II
• In-memory hash join
• Grace hash join
• Recursive hash join
• Hash bailout
• Hash warnings event class
• Update statistics
• Add more RAM
• Role reversal
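To make the Grace hash join and role reversal ideas more concrete, here is a small Python sketch of the general technique. It is an assumption-laden illustration, not a description of SQL Server internals: lists stand in for partitions that would be spilled to disk, the `partitions` parameter and the sample rows are hypothetical, and "role reversal" is modeled simply as building on whichever side of a partition pair is smaller.

```python
from collections import defaultdict

def in_memory_hash_join(build, probe, key_build, key_probe):
    """Plain build-and-probe join; returns (build_row, probe_row) pairs."""
    table = defaultdict(list)
    for b in build:
        table[key_build(b)].append(b)
    return [(b, p) for p in probe for b in table.get(key_probe(p), [])]

def grace_hash_join(left, right, key_left, key_right, partitions=4):
    """Partition both inputs by hash, then join each partition pair."""
    left_parts = [[] for _ in range(partitions)]
    right_parts = [[] for _ in range(partitions)]
    for row in left:
        left_parts[hash(key_left(row)) % partitions].append(row)
    for row in right:
        right_parts[hash(key_right(row)) % partitions].append(row)

    result = []
    # Rows with equal keys always land in partitions with the same number.
    for lp, rp in zip(left_parts, right_parts):
        if len(lp) <= len(rp):
            result.extend(in_memory_hash_join(lp, rp, key_left, key_right))
        else:
            # Role reversal: build on the smaller (right) side instead,
            # then flip the pairs back to (left_row, right_row) order.
            flipped = in_memory_hash_join(rp, lp, key_right, key_left)
            result.extend((l, r) for r, l in flipped)
    return result

# Hypothetical sample data.
small = [(k, f"L{k}") for k in range(10)]          # (key, label)
large = [(100 + n, n % 12) for n in range(30)]     # (id, key)

grace_matches = grace_hash_join(small, large, lambda s: s[0], lambda g: g[1])
```

Because each partition pair is joined independently, only one pair needs to fit in memory at a time; a recursive hash join would apply the same partitioning again to any pair that is still too large.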
19. Hash Match III
• May indicate sub-optimal indexing
• Best for very large, non-covered joins
• Physical resources
• CPU Very high
• Physical IO Low to very high
• Memory Very high
21. Summary
               Nested Loops                             Merge                               Hash
Good when      Small outer input; inner input indexed   Pre-sorted inputs; sorting needed   Very large inputs; not well indexed
CPU            Low                                      Low (* excluding sorting)           High
Memory         Low                                      Low (* excluding sorting)           High
Physical IO    Low / High                               Low                                 Low / High
Logical reads  High                                     Low                                 Low (* misleading)
Output         Fast, unordered, grouped*                Fast, ordered, grouped              Slow, unordered, ungrouped*
22. For More Information
• Books Online
• White papers
• “Inside Microsoft SQL Server” books
• Craig Freedman’s blog
• http://blogs.msdn.com/craigfr/about.aspx