Shared the stage with Kevin Kline. Paul Randal and Kimberly L. Tripp organized an excellent conference. This slide deck describes how to design large SQL Server architectures with thousands of databases that are high-performance and yet easy to manage. Fusion-io's ioMemory provides the performance, and SQL Sentry provides an excellent interface for managing and monitoring thousands of databases.
Don’t forget to enter your evaluation of this session using EventBoard!
Questions?
Thank you!
Editor's notes
Interestingly, the default configuration of the server is generally quite good. Even at very high scale there is not much additional tuning that can be done. The closest you get to a magic "make SQL Server go faster" trace flag is 834 (http://support.microsoft.com/kb/920093 and http://msdn2.microsoft.com/en-us/library/aa366720.aspx), which enables Windows large-page allocations for the buffer pool. If you see a flat NUMA node, it will fill up eventually once you start driving enough work through SQL Server.
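Trace flag 834 is a startup-only flag, so it cannot simply be turned on with DBCC TRACEON; a minimal sketch of how it is typically enabled and then verified (assumes the service account already has the Lock Pages in Memory privilege):

```sql
-- TF 834 (large-page allocations for the buffer pool) must be set as a
-- startup parameter: add -T834 to the SQL Server service's startup
-- parameters (e.g., via SQL Server Configuration Manager), then restart.
-- After the restart, confirm the flag is active globally:
DBCC TRACESTATUS (834, -1);
```

If the flag took effect, the output row for 834 shows Status = 1 with Global = 1.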
On heavy OLTP systems, there’s enough NIC traffic that the NICs need their own CPU cores to process the TCP work. Use the affinity mask to segregate the NIC cores.
When increasing connections to ~6,000 (users had think time), you’ll start seeing waits on THREADPOOL.
Solution: increase sp_configure ‘max worker threads’.
You probably don’t want to go higher than 4,096.
Increase it gradually; the default max is 980.
Avoid killing yourself in thread management – the bottleneck is likely somewhere else.
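The worker-thread change above is a plain sp_configure call; a minimal sketch (the value 2048 is an illustrative intermediate step on the way up, not a recommendation from the deck):

```sql
-- 'max worker threads' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Inspect the current value, then raise it gradually; the notes above
-- suggest staying at or below 4096.
EXEC sp_configure 'max worker threads';
EXEC sp_configure 'max worker threads', 2048;
RECONFIGURE;
```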
(Should be around 4:30 pm)
PCI-e v1 bus:
x4 slot: 750 MB/sec
x8 slot: 1.5 GB/sec
x16 slot: fast enough, around 3 GB/sec
Some “v2 compliant” PCI-e buses still run at v1 speeds!
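The slot figures above are consistent with a simple back-of-the-envelope calculation (my assumptions, not from the deck: PCIe v1 gives ~250 MB/s per lane after 8b/10b encoding, and roughly 75% of that is usable after protocol overhead):

```sql
-- Hypothetical arithmetic check of the PCIe v1 slot bandwidth figures:
-- 250 MB/s per lane (assumed) * lanes * 0.75 usable (assumed)
-- yields 750 MB/s (x4), 1500 MB/s (x8), 3000 MB/s (x16).
SELECT lanes,
       lanes * 250 * 0.75 AS usable_mb_per_sec
FROM (VALUES (4), (8), (16)) AS slots(lanes);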
Interesting shape – what’s causing it?
The hardware between the CPU and the physical drive is often complex. Different topologies, depending on vendor and technology.
Two major topologies for SQL Server storage:
DAS – Direct Attached Storage. Standards: (SCSI), SAS, SATA. RAID controller in the machine; PCI-X or PCI-E direct access.
SAN – Storage Area Network. Standards: iSCSI or Fibre Channel (FC). Host Bus Adapters or network cards in the machine; switches / fabric access to the disks.
SAN & tiered storage arrays:
SAN: data is explicitly placed on various disk groups, which the admin must track. Moving data between tiers is manual and typically offline. Granularity is whatever the admin decides to move. Depends on the admin tracking storage hot spots and usage.
Tiered SAN: the array tracks usage patterns and automatically moves data between storage tiers. Data movement is in the background and fully online. Granularity is LUN today, moving to finer-grained. Depends on the array tracking storage hot spots and usage.
(Should be around 4:45-4:50 pm)
(Should be around 5:00 – 5:10 pm)
First image = basic AG for high availability and disaster recovery
Second image = Node and File Share Majority quorum model
Third image = Node Majority quorum model
These are the default tools.
Open up the AlwaysOn view for the Instance/Group Matrix. Illustrate the thick green lines from SQL2 to SQL4.
Start the job on SQL2, “Keynote workload”. The lines will begin to turn red. Discuss how the IO load on SQL2 (Fusion-io) is backing up on SQL4 (slow disks).
Start the job on SQL1, “Keynote move C/F to SQL1”. Takes a couple of minutes. Nodes shut down, then flip over to SQL1. The restore to create the database file is super fast, with super-low latency because of Fusion-io. May have time to run “Keynote workload” again on SQL1 after the flip.
Start the job on SQL1, “Revert”. Takes even more time. The main point here is how long it takes, because we have to restore and reinitialize the database on the slow disks. It may not finish in time because the initialization and restore take so much longer.
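Under the hood, the “move” jobs in this demo amount to a manual availability-group failover. A hedged sketch of the core command (the group name AG_CF is hypothetical; the deck doesn’t name the actual AG):

```sql
-- Run on the secondary replica that should become the new primary
-- (e.g., connect to SQL1 to move the group onto SQL1).
ALTER AVAILABILITY GROUP [AG_CF] FAILOVER;
```

A manual failover like this requires the target secondary to be in a synchronized state; that is why the “Revert” step onto the slow disks is the long pole – the data movement to resynchronize dominates, not the failover command itself.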