LOPSA SD 2014.03.27 Presentation on Linux Performance Analysis
An introduction using the USE method and showing how several tools fit into those resource evaluations.
2. Me
• Systems Architect
• Sony Network Entertainment
• 18 years running stuff
• Majority of the last 14 years: medium-large Internet services
3. Read this book…
And look here:
http://www.brendangregg.com/
http://www.brendangregg.com/methodology.html
http://www.brendangregg.com/Slides/LISA2012_methodologies.pdf
http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098
4. The website is down!!!
It’s just too slow!
The DB is too slow!
The disk is too slow!
SLOW!!!
http://farm4.staticflickr.com/3190/2976755407_6a6a574596_o.jpg
5. SLOW!!!!
• What does slow mean anyway?
• Is it not transferring fast enough?
• Is it handling too many requests, or not enough?
http://commons.wikimedia.org/wiki/File:United_States_sign_-_Slow_Traffic_Ahead.svg
6. Slow can mean…
• Latency: How long it takes
• ms, s, request time, etc.
• Throughput: How much gets done per unit of time
• bandwidth, IOPS, rps, tps, etc.
http://upload.wikimedia.org/wikipedia/commons/2/2e/Miniature_DNF_Dictionary_055_ubt.JPG
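A minimal sketch of the two meanings of "slow" on this slide, assuming you already have per-request latencies for some time window. The function names and sample numbers are hypothetical, and the percentile uses the simple nearest-rank definition.

```python
import math

def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile of a list of request latencies (ms)."""
    ordered = sorted(latencies_ms)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def throughput_rps(request_count, window_seconds):
    """Requests completed per second over the observation window."""
    return request_count / window_seconds

# Hypothetical sample: 10 requests completed in a 2 second window.
samples_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13, 12]
p50 = latency_percentile(samples_ms, 50)
p90 = latency_percentile(samples_ms, 90)
p99 = latency_percentile(samples_ms, 99)
rps = throughput_rps(len(samples_ms), 2.0)
print(f"p50={p50}ms p90={p90}ms p99={p99}ms throughput={rps} rps")
```

The one 250 ms outlier does not move the median at all, which is why latency is usually worth reporting as several percentiles rather than one average.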
7. Slowness comes from…
• Full utilization of a resource
• Waiting in a saturated queue
• Generated errors!
• The USE Method
http://farm6.staticflickr.com/5181/5614813544_a30d693a50_o.jpg
8. Utilization
• You have fully used up what’s been allocated
• aka 5 lb bag
http://farm3.staticflickr.com/2524/4000641774_3331fe06fb_o.jpg
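One way to put a number on utilization for the CPU resource: a sketch that compares two snapshots of the aggregate `cpu` line from Linux `/proc/stat`. The helper names and the sample counter values are made up for illustration. It assumes the modern field order (user, nice, system, idle, iowait, irq, softirq, steal, ...) and counts idle plus iowait as "not busy".

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate cpu line found")

def cpu_utilization(before_text, after_text):
    """Fraction of CPU time spent non-idle between two snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])        # user..steal; guest time is already in user
    idle = deltas[3] + deltas[4]   # idle + iowait
    return (total - idle) / total if total else 0.0

# Hypothetical snapshots taken some interval apart.
snapshot_a = "cpu  1000 0 500 8000 500 0 0 0 0 0\ncpu0 1000 0 500 8000 500 0 0 0 0 0"
snapshot_b = "cpu  1600 0 700 8150 550 0 0 0 0 0\ncpu0 1600 0 700 8150 550 0 0 0 0 0"
utilization = cpu_utilization(snapshot_a, snapshot_b)
print(f"CPU utilization over the interval: {utilization:.0%}")
```

On a live Linux host the two strings would come from reading `/proc/stat` twice, a second or so apart.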
9. Saturation
• Waiting for someone else to get done so you can do yours
• Typically because a resource is fully utilized, but not necessarily directly
http://www.fotocommunity.com/pc/pc/display/30396619
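A rough sketch of a saturation check for the CPU run queue, assuming the text of Linux `/proc/loadavg` as input. The function name, the sample line, and the "load above CPU count" threshold are illustrative assumptions. Note that the Linux load average also counts tasks in uninterruptible sleep, so treat it as a hint rather than a precise queue length.

```python
def run_queue_saturation(loadavg_text, cpu_count):
    """Summarize /proc/loadavg relative to the number of CPUs."""
    fields = loadavg_text.split()
    one_minute = float(fields[0])
    # Fourth field looks like "runnable/total" scheduling entities.
    runnable_now = int(fields[3].split("/")[0])
    return {
        "load_1m": one_minute,
        "runnable_now": runnable_now,
        "load_per_cpu": one_minute / cpu_count,
        "saturated": one_minute > cpu_count,
    }

# Hypothetical reading on a 4-CPU host.
report = run_queue_saturation("6.00 3.20 2.10 7/420 12345", cpu_count=4)
print(report)
```

Here the 1-minute load of 6.00 on 4 CPUs suggests work is waiting for a turn, while the lower 5 and 15 minute figures suggest the pressure is recent.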
10. Errors
• Dropped packets
• Incorrect responses
• Deadlocks
• Timeouts
• Not all failures fail fast
http://farm8.staticflickr.com/7001/6509400855_aaaf915871_b.jpg
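For the "dropped packets" case, a sketch that pulls error and drop counters out of Linux `/proc/net/dev` text. The function name and the sample counters are hypothetical. It assumes the usual layout: two header lines, then one line per interface with 8 receive fields followed by 8 transmit fields.

```python
def interface_errors(net_dev_text):
    """Map interface name -> receive/transmit error and drop counters."""
    result = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        values = [int(v) for v in counters.split()]
        result[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return result

# Hypothetical /proc/net/dev contents.
sample = "\n".join([
    "Inter-|   Receive                                                |  Transmit",
    " face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed",
    "    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0",
    "  eth0: 987654 4321 2 17 0 0 0 0 123456 2100 0 3 0 0 0 0",
])
errors = interface_errors(sample)
print(errors["eth0"])
```

These counters are cumulative since boot, so what matters is whether they keep growing between two readings.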
11. How do we determine?
• Different types of tools for different examinations
• Depends on what you’re looking for (which can be a problem in and of itself)
http://farm5.staticflickr.com/4083/5086955738_61f6455ace_b.jpg
12. Resource vs Transaction
• Do you care if…
• a CPU is maxed out?
• processes are blocked?
• packets are lost?
• or if…
• a user’s request fails?
• a user gives up on waiting for a response?
13. Maturity
• Using tracing tools, especially in production, requires a level of maturity
• I’m not that mature… ;)
• No, really, just focusing on the basics first
http://upload.wikimedia.org/wikipedia/commons/b/bd/OFLC_large_R18%2B.svg
47. Running out of Apache Threads
• Lots of incoming requests
• Apache hits ServerLimit of threads (Utilization!)
• Requests start to get stuck in TCP backlog (Saturation!)
• Apache endpoints are removed from load balancers (Error!)
• Fail!
http://upload.wikimedia.org/wikipedia/commons/9/96/Colorful_Threads_(3965274345).jpg
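The chain on this slide can be sketched as a toy simulation: a fixed pool of workers (utilization), a bounded backlog in front of it (saturation), and drops once the backlog is full (errors). This is not how Apache or the kernel are implemented. Every name and number below is a hypothetical stand-in, and the load balancer health check step is not modeled.

```python
from collections import deque

def simulate(arrivals_per_tick, ticks, worker_limit, backlog_limit, service_ticks):
    """Toy model: fixed worker pool fed by a bounded backlog queue."""
    busy = []          # remaining ticks for each in-flight request
    backlog = deque()  # accepted connections waiting for a free worker
    dropped = 0
    peak_backlog = 0
    for _ in range(ticks):
        # Requests with one tick left finish now; the rest count down.
        busy = [t - 1 for t in busy if t > 1]
        # New connections queue up, or are dropped when the backlog is full.
        for _ in range(arrivals_per_tick):
            if len(backlog) < backlog_limit:
                backlog.append(service_ticks)
            else:
                dropped += 1
        # Free workers pull waiting requests from the backlog.
        while backlog and len(busy) < worker_limit:
            busy.append(backlog.popleft())
        peak_backlog = max(peak_backlog, len(backlog))
    return {
        "utilization": len(busy) / worker_limit,
        "peak_backlog": peak_backlog,
        "dropped": dropped,
    }

light = simulate(arrivals_per_tick=2, ticks=20, worker_limit=10,
                 backlog_limit=5, service_ticks=4)
heavy = simulate(arrivals_per_tick=4, ticks=20, worker_limit=10,
                 backlog_limit=5, service_ticks=4)
print("light:", light)
print("heavy:", heavy)
```

With the light load the pool never fills and nothing queues. Doubling the arrival rate pins the workers, fills the backlog, and only then do drops appear, which is the U, then S, then E ordering from the slide.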
48. Cold DB Start
• DBs like to be in memory, but can’t start that way
• All data requests go to disk (which is SAN backed)
• SAN controller CPU gets maxed out (Utilization!)
• HBA queues get deep (Saturation!)
• Requests time out (Error!)
• Fail!
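A back-of-the-envelope sketch of why a cold cache hurts: average read latency and disk read rate as a function of cache hit ratio. The latencies and request rate are hypothetical round numbers, not measurements from the incident described on the slide, and the model ignores queueing, which would make the cold case look even worse.

```python
def effective_latency_ms(hit_ratio, memory_ms, disk_ms):
    """Average read latency when some reads hit memory and the rest hit disk."""
    return hit_ratio * memory_ms + (1.0 - hit_ratio) * disk_ms

def disk_reads_per_second(requests_per_second, hit_ratio):
    """Reads that fall through the cache and land on the disk or SAN."""
    return requests_per_second * (1.0 - hit_ratio)

# Hypothetical figures: 0.1 ms from memory, 8 ms from SAN, 2000 reads/s.
MEMORY_MS, DISK_MS, RPS = 0.1, 8.0, 2000

for label, hit_ratio in [("cold", 0.0), ("warming", 0.5), ("warm", 0.99)]:
    latency = effective_latency_ms(hit_ratio, MEMORY_MS, DISK_MS)
    disk_rps = disk_reads_per_second(RPS, hit_ratio)
    print(f"{label:8s} hit={hit_ratio:.2f} avg={latency:.3f} ms disk={disk_rps:.0f} reads/s")
```

Under these assumptions the SAN sees about 100 times more reads cold than warm, which is the load that maxes out the controller in the first place.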
50. Methods > Tools
• Don’t let tools get in the way of solutions
• It’s easy to think that all you’re missing is a tool.
• But are you actually following a method to your performance madness?
http://upload.wikimedia.org/wikipedia/commons/6/6d/Three_Card_Monte.jpg
51. Anti-Methods
• Blame Someone Else
• Streetlight
• Drunk Man
• Random Change
• Passive Benchmark
• Don’t do these…
http://www.brendangregg.com/methodology.html
http://upload.wikimedia.org/wikipedia/commons/a/af/Villainc.svg
52. Methods
• Ad Hoc Checklist
• Problem Statement
• Scientific
• Workload Characterization
• Drill-down Analysis
• By-layer
• Latency Analysis
• Tools
• Stack Profile
• Off-CPU Analysis
• Thread State Analysis
• Active Benchmark
http://www.brendangregg.com/methodology.html
http://memegenerator.net/instance/9192015