CMG-T 2010: Unix/Linux Quick-Start
             Bob Sneed, EPiC Performance Associates
                for CMG 2010, December 9, 2010
                        V1.1 - December 10, 2010



CMG-T 2010                                            1
Preliminaries

 •  Disclaimers
 •  About the authors
   –  Adrian Cockcroft
   –  Bob Sneed

 •  An overview of the three sessions




 •  NOTE: These colored-background slides are intended to
    ease navigation within this slide deck



CMG-T 2010                Unix/Linux Quick Start            Slide 2
Disclaimers

   Opinions and views expressed herein are those of the authors.
              (The factual bits are mostly Adrian's.)
      Bob is not a doctor - and does not even play one on TV.
   Oops! Bob is not with Intellimagic; it’s an error in the program.
  There is no warranty, expressed or implied, in the quality of the
     information herein, or its fitness for any given purpose.
    If you goof up applying this stuff and have a bad outcome or
         destroy a bunch of data – it's entirely your own fault.
  These CMG-T materials do not refer to and are not endorsed by
  the authors’ current or past employers. They are based on the
                   authors’ career experiences.
  Bob can’t speak to all of Adrian's slides quite as well as he does.
       Batteries not included. Your mileage may vary (YMMV).


CMG-T 2010                   Unix/Linux Quick Start               Slide 3
This CMG-T content is primarily based on
 Adrian Cockcroft's 2009 materials …




CMG-T 2010        Unix/Linux Quick Start    Slide 4
Adrian Cockcroft
  •  Where's Adrian?
    –  Netflix 2007-present; Director of Engineering, Cloud Architectures
    –  CMG 2007 Michelson Award Winner for lifetime contribution to
       computer measurement;
       http://perfcap.blogspot.com/2007/12/cmg07-and-a-
       michelson-award.html
    –  eBay Research Labs 2004-2007; Distinguished Engineer
    –  Sun Microsystems 1988-2004; Distinguished Engineer
  •  Adrian’s recent conference interests
    –  QCon and Velocity
  •  Adrian’s books
    –  “Sun Performance and Tuning”, Prentice Hall, 1994, 1998 (2nd Ed)
    –  “Resource Management”, Prentice Hall, 2000
    –  “Capacity Planning for Internet Services”, Prentice Hall, 2001
  •  Adrian’s online presence
    –  Slides: http://www.slideshare.net/adrianco
    –  Blog: http://perfcap.blogspot.com/


CMG-T 2010                   Unix/Linux Quick Start                 Slide 5
Bob Sneed
  •  About Bob
    –  EPiC Performance Associates, 2009-present; Owner & Principal,
       Independent Consultant
    –  Sun Microsystems,1997-2009; Sr. Staff Engineer, variously worked in
       Sun Competency Centers, Performance and Availability Engineering
       Group (PAE), and the Systems Quality Office (SQO)
  •  Papers and presentations by Bob …
    –  “Sun/Oracle Best Practices”, Sun Blueprint, January 2001
    –  “Oracle I/O; Supply and Demand”, Sun User's Performance Group
       (SUPerG), 2001
    –  “Performance Forensics”, Sun Blueprint, December 2003 (previously a
       CMG 2002 paper)
    –  “I/O Microbenchmarking with Oracle in Mind”, Hotsos Symposium, 2006
    –  “Capacity; It's Not All About U!”, Hotsos Symposium, 2008
    –  “Best Practices for Optimal Configuration of Oracle Databases on Sun
       Hardware” (with co-author Allan Packer), Oracle Open World, 2009
    –  “CPU QoS”, Southern Area CMG, 2010 & Hotsos Symposium 2010
    –  Coming March 2011; “Brute-Force Parallelism”


CMG-T 2010                     Unix/Linux Quick Start                   Slide 6
Overview
  •  Session 1 is an introduction to (or review of) key principles and
     terminology for performance and capacity planning. This discussion
     will highlight many common high-level strategic errors that can lead
     to under- or over-provisioning or disappointing results despite
     adequate provisioning.
  •  Session 2 will survey the data sources and tools available for
     traditional resource-oriented analysis (network, CPU, memory,
     storage), plus the tools available for understanding kernel, hardware,
     and per-thread program performance. Pointers will be given to some
     major commercial and open-source tools for performance and
     capacity management. Some emphasis will be placed on Solaris
     features including microstate accounting and DTrace.
  •  Session 3 will conclude the tool survey begun in Session 2, then
     survey several common errors, traps, and pitfalls. This session ends
     with some broad guidelines on how to manage performance and
     capacity in Unix and Linux environments.


CMG-T 2010                     Unix/Linux Quick Start                 Slide 7
Session 1 of 3



     An introduction to (or review of) key principles and
  terminology for performance and capacity planning. This
      discussion will highlight many common high-level
       strategic errors that can lead to under- or over-
   provisioning or disappointing results despite adequate
                          provisioning




CMG-T 2010              Unix/Linux Quick Start         Slide 8
Definitions




CMG-T 2010     Unix/Linux Quick Start   Slide 9
Capacity Planning Definitions
 •  Capacity
     –  Resource utilization and headroom
 •  Planning
     –  Predicting future needs by analyzing historical data and
        modelling future scenarios
 •  Performance Monitoring
     –  Collecting and reporting on performance data
 •  Unix/Linux (apologies to users of OSX, HP-UX, AIX etc.)
     –  Emphasis on Solaris and Linux
     –  Much of the discussion is independent of the OS




CMG-T 2010                    Unix/Linux Quick Start               Slide 10
Measurement Terms and Definitions

 •  Bandwidth - gross work per unit time [unattainable]
 •  Throughput - net work per unit time
 •  Peak throughput - at maximum acceptable response time
 •  Utilization - busy time relative to elapsed time [can be misleading]
 •  Queue length - number of requests waiting
 •  Service time - time to process a unit of work after waiting
 •  Response time - time to complete a unit of work including waiting
 •  Key Performance Indicator (KPI) – a measurement you have
    decided to watch because it has some business value
 •  Laws – constraints which physics and math place on reality




CMG-T 2010                     Unix/Linux Quick Start                  Slide 11
Service Level Agreements (SLA)

 •  Behavioral goals for the system in terms of KPIs
   –  NOTE: With or without contracted SLAs, the industry's universal
      KPI tends to be “call center traffic” or “rate of complaints”

 •  Response time target
   –  Rule of thumb: Estimate 95th percentile response time as three
      times mean response time
   –  e.g. if SLA says 1 second response, measured average should
      be less than 333ms

 •  Utilization Target (a proxy for Response Time)
   –  Specified as a minimum and maximum
   –  Minimum utilization target to keep costs down
   –  Maximum utilization target for good response times and capacity
      headroom for future workload fluctuations

CMG-T 2010                   Unix/Linux Quick Start               Slide 12
Capacity Planning Requirements

   •  We care about CPU, Memory, Network and Disk resources, and
      Application response times
   •  We need to know how much of each resource we are using
      now, and will use in the future
   •  We need to know how much headroom we have to handle
      higher loads
   •  We want to understand how headroom varies, and how it relates
      to application response times and throughput
   •  The application workload must be characterized so we can
      understand and manage system behaviours
   •  We want to be able to find the bottleneck in an under-performing
      system



CMG-T 2010                    Unix/Linux Quick Start               Slide 13
Laws




CMG-T 2010   Unix/Linux Quick Start   Slide 14
Ignorance of The Law is no excuse!

 •  Physics
   –  The speed of light
   –  Moore's Law

 •  Management
   –  Murphy's Law
   –  Parkinson's Law

 •  Performance and Capacity
   –  Amdahl's Law
   –  Little's Law




CMG-T 2010                 Unix/Linux Quick Start   Slide 15
The Speed of Light; Deal with it!



              SPEED
               LIMIT
              186,000
              Miles Per Second
               It’s not just a Good Idea!
                      It’s the Law!




CMG-T 2010           Unix/Linux Quick Start   Slide 16
Physics Laws (Well, sort-of …)

 •  The speed of light
   –  Nothing can go faster; 186,000 miles/second
   –  “A foot is a nanosecond”
   –  A thousand miles is (1000/186000=) 5.4 ms, plus routing and
      handling delays, in the case of digital traffic
 •  Moore's Law
   –  From Wikipedia: “Moore's law describes a long-term trend in the
      history of computing hardware. The number of transistors that
      can be placed inexpensively on an integrated circuit has doubled
      approximately every two years.[1] The trend has continued for
      more than half a century and is not expected to stop until 2015 or
      later.[2]”
   –  NOTE: This speaks only of circuit density, not speed.
    –  COROLLARY: This ride is coming to an end?


CMG-T 2010                   Unix/Linux Quick Start                 Slide 17
Management Laws

 •  Parkinson's Law
   –  Often stated simply as; “Work expands to fill the time available.”
   –  Corollary: “Software bloat occurs in direct proportion to gains
      from Moore's Law.”
   –  Corollary: “Disk retention strategies vary in direct proportion to
      increasing storage density.”
   –  Corollary: “Performance optimization has no limits unless you set
      them.” (Bob just made that up.)
 •  Murphy's Law
   –  Often stated simply as; “Anything that can go wrong will go
      wrong; and at the most inopportune time.”
   –  This is what Capacity Planning aims to prevent.




CMG-T 2010                   Unix/Linux Quick Start                 Slide 18
Performance and Capacity Laws
    Must-know!
 •  Little’s Law
    –  Simple form: X = N/R, where X is throughput, N is degree of
       concurrency, and R is response time
 •  Little’s Law is crucial to spotting serialization (N=1) or
    determining other values of N
    –  Q1: With 1 KB I/O size and 0.5 ms response time for reads
       what is the throughput for a single I/O-bound stream?
    –  A1: N/R = 1 KB / 0.0005 = 2000 KB/sec = (2000/1024) or 1.95
       MB/sec
    –  Q2: What degree of concurrency would be required to attain 20
       MB/sec at 0.5 ms response time and 1 KB I/O size?
    –  A2: N = X*R, so 20*1024*0.0005 = 10.24 (Shucks! We need
       an integer N, so make that 11.)
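
 A minimal sketch (Python) of the Little's Law arithmetic above; the numbers are
 the same illustrative values used in Q1 and Q2:

# little.py - Little's Law (X = N/R) applied to the I/O stream example
import math

io_size_kb = 1.0        # 1 KB per I/O
resp_s     = 0.0005     # 0.5 ms response time

# Q1: throughput of a single I/O-bound stream (N = 1)
x_kb_per_s = 1 * io_size_kb / resp_s                  # 2000 KB/sec
print("Q1: %.2f MB/sec" % (x_kb_per_s / 1024))        # ~1.95 MB/sec

# Q2: concurrency needed for 20 MB/sec at the same response time (N = X * R)
n = (20 * 1024) * resp_s / io_size_kb                 # 10.24
print("Q2: N = %.2f, round up to %d" % (n, math.ceil(n)))    # 11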



CMG-T 2010                    Unix/Linux Quick Start                 Slide 19
Performance and Capacity Laws
    Must-know!
 •  Amdahl’s Law
    –  From Wikipedia: “The speedup of a program using multiple processors
       in parallel computing is limited by the time needed for the sequential
       fraction [serial portion] of the program.”
 •  Things that are serialized or slow relative to CPU speed
    –    I/O; network or storage
    –    Locking; exclusive access to some data structure
     –    Dispatching; handing out work
    –    “Take a number”; sequence generation
 •  A simple case of Amdahl’s Law with N=1 …
    –  If a process’ response time is 1 second, including all disk accesses,
       network delays, and contention – and only 10% of the time is CPU –
       then an infinitely-fast CPU will only improve the response time to 900
       ms.
    –  Improving a system element only pays in proportion to the ratio of that
       element in total response time.
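
 A minimal sketch (Python) of the same idea in formula form; the 10% / 1-second
 numbers come from the example above, and the function is the standard Amdahl
 speedup expression:

# amdahl.py - speedup is capped by the fraction of work you cannot improve
def amdahl_speedup(improved_fraction, speedup_factor):
    # new time = untouched part + improved part; speedup = old time / new time
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / speedup_factor)

# The N=1 example: 1 s response, 10% of it on CPU; an infinitely fast CPU
# removes only that 10%, leaving 0.9 s.
print("best-case response:", 1.0 * (1.0 - 0.10), "s")             # 0.9 s
print("best-case speedup :", amdahl_speedup(0.10, float("inf")))  # ~1.11x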


CMG-T 2010                        Unix/Linux Quick Start                   Slide 20
Ignorance of The Law is no excuse!




CMG-T 2010       Unix/Linux Quick Start   Slide 21
Workloads




CMG-T 2010    Unix/Linux Quick Start   Slide 22
“Workload Characterization”
   An overloaded term!
 •  Homogeneous workloads
   –  Generally easy to characterize and model
   –  Homogeneity is one justification for “tiering” solution architectures

 •  System response to hosted workload(s)
   –  Some performance engineers view the system from the inside-out; their
      metrics focus is on how the hardware, OS, and peripherals are
      stressed at a low level

 •  Heterogeneous workloads
   –  These are the norm on large or complex systems
   –  See, e.g. Ron Kaminski's various CMG contributions

 •  Circumstantial workloads
   –  e.g. failover processing and post-failover operations often escape
      characterization, leading to in-service surprises


CMG-T 2010                       Unix/Linux Quick Start                       Slide 23
Workload Characteristics: One by One
   Constant Workloads
   •  e.g. Numerical computation, compute intensive batch

   •  Trivial to model, utilization and duration define the work




CMG-T 2010                  Unix/Linux Quick Start             Slide 24
Simple Random Arrivals
 •  Random arrival of transactions with fixed mean service
    time
    –  Little's Law: QueueLength = Throughput * ResponseTime
    –  Utilization Law: Utilization = Throughput * ServiceTime (see the
       sketch at the end of this slide)
 •  Complex models are often reduced to this model
   –  By averaging over longer time periods since the formulas only
      work if you have stable averages
   –  By wishful thinking (i.e. how to fool yourself)
 •  e.g. Unix Load Average is actually CPU Queue Length
   –  Throughput up a little, load average up a lot = slow system
   –  So load average is a proxy metric for response time
   –  High load average per CPU implies slow response times
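
 A minimal sketch (Python) of the two laws quoted above, using made-up
 throughput, service time, and response time values:

# qlaws.py - Utilization Law and Little's Law for a simple transaction server
throughput   = 50.0     # transactions/sec (assumed)
service_time = 0.010    # 10 ms mean service time (assumed)
resp_time    = 0.025    # 25 ms mean response time, so 15 ms of queueing (assumed)

utilization  = throughput * service_time    # U = X * S  -> 0.5 (50% busy)
queue_length = throughput * resp_time       # Q = X * R  -> 1.25 in system

print("utilization %.0f%%, queue length %.2f" % (utilization * 100, queue_length))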




CMG-T 2010                   Unix/Linux Quick Start                 Slide 25
Mixed random arrivals of transactions
  with stable mean service times
   •  Think of the grocery store checkout analogy
     –  Trolleys full of shopping vs. baskets full of shopping
     –  Baskets are quick to service, but get stuck behind trolleys
     –  Relative mixture of transaction types starts to matter
   •  Many transactional systems handle a mixture
     –  Databases, web services
   •  Consider separating fast and slow transactions
     –  So that we have a “10 items or less” line just for baskets
     –  Separate pools of servers for different services
     –  Don’t mix OLTP with DSS queries in databases
   •  Performance is often thread-limited
     –  Thread limit and slow transactions constrains maximum
        throughput
     –  Throughput = Queue / ResponseTime
   •  Model using analytical solvers like PDQ

CMG-T 2010                   Unix/Linux Quick Start                   Slide 26
Load dependent servers – non-stable
  mean service times
 •  Mean service time increases at high throughput
   –  Due to non-scalable algorithms, lock contention, thrashing
   –  System runs out of memory and starts paging or frequent GC
 •  Many systems have “tipping points”
   –  Hysteresis means they don’t come back when load drops
   –  This is why you have to kill catatonic systems
   –  Some systems actually degrade gracefully under load
 •  Model using simulation tools like Hyperformix, Opnet
   –  Behaviour is non-linear and hard to model
   –  Practical option is to avoid tipping points
   –  Best designs shed load to be stable at the limit




CMG-T 2010                   Unix/Linux Quick Start            Slide 27
Self-similar / fractal workloads – bursty
  rather than random
 •  Self-similar
   –  Looks “random” at close up, stays “random” as you zoom out
   –  Work arrives in bursts, transactions aren’t independent
   –  Bursts cluster together in super-bursts, etc.
 •  Network packet streams tend to be fractal
 •  Common in practice, too hard to model
   –  Probably the most common reason why your model is wrong!




CMG-T 2010                 Unix/Linux Quick Start              Slide 28
State Dependent Services

 •  Personalized services that store user history
   –  Transactions for new users are quick
   –  Transactions for users with lots of state/history are slower
   –  As user base builds state and ages you get into lots of trouble…
 •  Social Networks, Recommendation Services
   –  Facebook, Flickr, Netflix, Pandora, Twitter etc.
 •  “Abandon hope all ye who enter here”
   –    Not tractable to model, repeatable tests are tricky
   –    Long fat tail response time distribution and timeouts
   –    Excessively long service times for some users
   –    Solutions: careful algorithm design, lots of caching




CMG-T 2010                     Unix/Linux Quick Start             Slide 29
Workload Modelling Survivalism
   •  Simplify the workload algorithms
     –  move from hard or impossible to simpler models
     –  use caching and pre-compute to get constant service times

   •  Stand further away
     –  averaging is your friend – gets rid of complex fluctuations

   •  Minimalist Models
     –  most models are far too complex – the classic beginners error…
     –  the art of modelling is to only model what really matters

   •  Don’t model details you don’t use
     –  model peak hour of the week, not day to day fluctuations
     –  e.g. “Will the web site survive next Sunday night?”




CMG-T 2010                      Unix/Linux Quick Start                   Slide 30
Contrarian Perspective on Workloads

 •  Classical breakdown, e.g. for CPU
    –  %usr
    –  %sys
 •  A more-practical breakdown
    –  Work; what the system was bought to do – categorized by its
       importance to the business
    –  Overhead; memory management, I/O and network stacks, lock
       management, context switches, migrations
    –  Opportunistic usage; over-achievers (including screen-savers, low-
       priority workload elements, and high-priority workload elements)
    –  Waste; bugs, inefficient code, absence of Best Practices
 •  Stovepiped organizations that feed only classical data to the
    Capacity Planners will predictably over-provision
    –  Gains attainable from improved efficiency, performance discipline, and
       teamwork represent “latent capacity”; it’s often huge


CMG-T 2010                      Unix/Linux Quick Start                      Slide 31
Strategies




CMG-T 2010    Unix/Linux Quick Start   Slide 32
Performance and Capacity Strategies

 1. Empirical Methods (Great! Only expensive if you do it!)
   Benchmarks, stress testing, test-to-scale, test-to-fail – with known Best
   Practices & basic performance analysis and tuning
 2. Modeling (Highly recommended, moderate cost)
    Using tools such as TeamQuest Model (TQM), BMC Perform/Predict, HyPerformix,
    Gunther's PDQ, or other application of proper science and math
 3. Expert Opinions (Recommended minimum, cheapest)
   Listening to the right experts for Best Practices, analysis and tuning methods,
   and sizing. The hazard with opinions is that there are so many of them!

 4. Guesswork (The Norm)
   Straight-line extrapolations, naïve use of reference benchmarks, massive over-
   provisioning, misdirected or uncontrolled testing, blind luck

 5. Opportunism (Commonplace)
   Spend the available budget


CMG-T 2010                         Unix/Linux Quick Start                      Slide 33
Best Practices

 •  Absent Best Practices …
   –    Performance, stability, capacity, or predictability may all suffer
    –    Capacity planners may end up scaling up gross inefficiency
    –    Data from the system will not directly indicate the deviation
   –    One is prone to “re-inventing the wheel”
    –    It's like having Yellow Fever versus getting the inoculation
 •  Best Practices are …
   –    Repeatable, time-proven practices
   –    Steps to take as a matter of routine
   –    Possibly highly-localized and application-specific
    –    Goal-oriented; e.g. simplicity, performance, cost
 •  Best Practices are not …
   –  A lab result or single practitioner’s experience, extrapolated to the
      general case
   –  A guarantee of automatic success in all things


CMG-T 2010                         Unix/Linux Quick Start                     Slide 34
A Best Practice Example

 •  A Best Practice for performance, consistency, and scalability of
    storage for Oracle databases is to use a solution with these
    characteristics:
    –  Unbuffered; no stress on OS buffer memory management
    –  Concurrent; no single-writer lock or similar bottlenecks
    –  Stable data placement; facilitates high-bandwidth reads
 •  Many compliant options exist, including
    –  UFS with direct I/O; QFS with samaio; RAW; Oracle Automated
       Storage Management (ASM); VxFS with QIO, ODM, or cio+direct
 •  See also http://blogs.sun.com/bobs/entry/one_i_o_two_i
    –  I’ve added “stable data placement” to the criteria since that blog entry
       to reflect experiences seen with WAFL-type filesystems
    –  There is a vast amount of energy that gets spent annually learning
       these things the hard way or trying to work-around them



CMG-T 2010                        Unix/Linux Quick Start                     Slide 35
Session 2 of 3


       A survey of the data sources and tools available for
   traditional resource-oriented analysis (network, CPU,
        memory, storage), plus the tools available for
 understanding kernel, hardware, and per-thread program
     performance. Pointers will be given to some major
 commercial and open-source tools for performance and
  capacity management. Some emphasis will be placed
 on Solaris features including microstate accounting and
                           DTrace.




CMG-T 2010             Unix/Linux Quick Start        Slide 36
Metrics




CMG-T 2010   Unix/Linux Quick Start   Slide 37
Measurement Data Interfaces
  •  Several generic raw access methods
      –    Read the kernel directly
      –    Structured system data
      –    Process data
      –    Network data
      –    Accounting data
      –    Application data
  •  Command based data interfaces
      –  Scrape data from vmstat, iostat, netstat, sar, ps
      –  Higher overhead, lower resolution, missing metrics
  •  Data available is generally platform- and release-specific
  •  Extremely valuable, but not discussed here …
     –  Application-level instrumentation; e.g. Oracle RDBMS
     –  WAN, SAN, and LAN “sniffers”; sample data “outside the box”
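
 A minimal sketch (Python) of the command-scraping approach listed above;
 vmstat column names and positions vary by OS and release, so treat the field
 names as assumptions:

# scrape_vmstat.py - crude scrape of vmstat; higher overhead and lower
# resolution than reading kstat or /proc directly
import subprocess

lines  = subprocess.check_output(["vmstat", "1", "2"]).decode().splitlines()
names  = lines[1].split()     # second header line carries the column names
values = lines[-1].split()    # last line is the most recent interval

if len(names) == len(values):
    sample = dict(zip(names, values))
    # 'r' (run queue) and 'id' (idle) appear in both Solaris and Linux vmstat
    print("run queue:", sample.get("r"), " idle%:", sample.get("id"))
else:
    print("unexpected vmstat layout:", lines[1])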



CMG-T 2010                        Unix/Linux Quick Start              Slide 38
Reading kernel memory - kvm
  •    The only way to get data in very old Unix variants
  •    Use kernel namelist symbol table and open /dev/kmem
  •    Solaris wraps up interface in kvm library
  •    Advantages
        –  Still the only way to get at some kinds of data
        –  Low overhead, fast bulk data capture
  •    Disadvantages
        –  Too much intimate implementation detail exposed
        –  No locking protection to ensure consistent data
        –  Highly non-portable, unstable over releases and patches
        –  Tools break when kernel moves between 32 and 64bit address
           support



CMG-T 2010                         Unix/Linux Quick Start               Slide 39
Structured Kernel Statistics - kstat
  •  Solaris 2 introduced kstat and extended usage in each release
  •  Used by Solaris 2 vmstat, iostat, sar, network interface stats, etc.
  •  Advantages
      –  The recommended and supported Solaris metric access API
      –  Does not require setuid root commands to access for reads
      –  Individual named metrics stable over releases
      –  Consistent data using locking, but low overhead
      –  Unchanged when kernel moves to 64bit address support
      –  Extensible to add metrics without breaking existing code
  •  Disadvantages
      –  Somewhat complex hierarchical kstat_chain structure
      –  State changes (device online/offline) cause kstat_chain
         rebuild
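
 A minimal sketch (Python) of reading kstats through the kstat(1M) command's
 parseable output rather than the C library API; Solaris-only, and the statistic
 name used here is just an example:

# kstat_scrape.py - 'kstat -p' prints module:instance:name:statistic <tab> value
import subprocess

def kstat_value(spec):
    out = subprocess.check_output(["kstat", "-p", spec]).decode()
    if not out.strip():
        return None
    key, _, value = out.splitlines()[0].partition("\t")
    return value.strip()

# unix:0:system_misc holds simple system-wide counters such as nproc
print("nproc =", kstat_value("unix:0:system_misc:nproc"))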


CMG-T 2010                     Unix/Linux Quick Start                 Slide 40
Kernel Tracing - TNF, prex, ktrace
  •  Solaris, Linux, Windows and other Unixes have similar features
      –  Solaris has TNF probes and prex command to control them
      –  User level probe library for hires tracepoints allows
         instrumentation of multithreaded applications
      –  Kernel level probes allow disk I/O and scheduler tracing
  •  Advantages
      –  Low overhead, microsecond resolution
      –  I/O trace capability is extremely useful
  •  Disadvantages
      –  Too much data to process with simple tracing capabilities
      –  Trace buffer can overflow or cause locking issues



CMG-T 2010                    Unix/Linux Quick Start                 Slide 41
DTrace – Dynamic Tracing
  •  One of the most exciting new features in Solaris 10, rave reviews
     –  Also in Apple's OS X 10.5; man -k dtrace, plus “Instruments” GUI
  •  Advantages
     –  No overhead when it is not in use
     –  Low overhead probes can be put anywhere/everywhere
     –  Trace data is correlated and filtered at source; get exactly the data
        you want; very sophisticated data providers included
     –  Bundled, supported, designed to be safe for production systems
     –  Stable foundation for many tools; system tools, scripts, and GUIs
  •  Disadvantages
     –  Not on Linux yet
      –  Excessive DTrace probes can cause high systemic overhead (which
         is only a problem if it occurs by accident); proven scripts avoid this
     –  Yet another (awk-like) scripting language to learn; pre-packaged
        scripts can avoid this, also



CMG-T 2010                       Unix/Linux Quick Start                    Slide 42
DTrace – Dynamic Tracing
  •  References
    –  Book: "Solaris Performance and Tools" by Richard McDougall, Jim
       Mauro, and Brendan Gregg, Prentice-Hall, 2006
    –  Guide: “How to Use Oracle® Solaris DTrace from Oracle Solaris and
       OpenSolaris System” @
       http://developers.sun.com/solaris/docs/o-s-dtrace-htg.pdf
    –  Treasure trove: Brendan Gregg's DTrace Toolkit @
       http://www.brendangregg.com/dtrace.html




CMG-T 2010                    Unix/Linux Quick Start                 Slide 43
Hardware counters
  •  Most modern CPUs and systems have hardware counters; tools to access
     these counters are quite varied …
     –  Solaris cpustat for X86 and UltraSPARC pipeline and cache counters, corestat for
        multi-core systems; busstat for server backplanes and I/O buses
     –  Solaris Intel Trace Collector, Vampir for Linux
     –  AMD EMON; only under license
  •  Advantages
     –  See what is really happening; more accurate than kernel stats
     –  Cache usage useful for tuning code algorithms
     –  Pipeline usage useful for HPC tuning for megaflops
     –  Some VA/PA/TLB memory-management details only observable via counters
     –  Backplane and memory bank usage useful for database servers

  •  Disadvantages
     –  Raw data is confusing; requires post-processing scripts
     –  Privilege needed for access; not sharable to hosted virtual domains
     –  Lots of propeller-headed architectural background info needed
     –  Most tools focus on developer code tuning


CMG-T 2010                           Unix/Linux Quick Start                         Slide 44
Configuration information
  •    System configuration data comes from too many sources!
        –  Solaris device tree displayed by prtconf and prtdiag
        –  Solaris 8 adds dynamic configuration notification device picld
        –  SCSI device info using iostat -E in Solaris
        –  Logical volume info from product specific vxprint and metastat
        –  Hardware RAID info from product specific tools
        –  Critical storage config info must be accessed over ethernet…
        –  Linux device tree in /proc is a bit easier to navigate
  •    It is very hard to combine all this data!
  •    DMTF CIM objects try to address this, but no-one seems to use them…
  •    Free tool - Config Engine: http://www.cfengine.org




CMG-T 2010                          Unix/Linux Quick Start                  Slide 45
Application instrumentation Examples
  •    Oracle V$ Tables – detailed metrics used by many tools
  •    Apache logging for web services
  •    ARM standard instrumentation
  •    Custom do-it-yourself and log file scraping
  •    Advantages
        –  Focussed application specific information
        –  Business metrics are needed to do real capacity planning
  •    Disadvantages
        –  No common access methods
        –  ARM is a collection interface only, vendor specific tools, data
        –  Very few applications are instrumented, even fewer have support
           from performance tools vendors




CMG-T 2010                        Unix/Linux Quick Start                 Slide 46
Kernel values, tunables and defaults
   •  There is often far too much emphasis on kernel tweaks
       –  There really are few “magic bullet” tunables
       –  It rarely makes a significant difference
   •  Fix the system configuration or tune the application instead!
   •  Very few adjustable components
       –  “No user serviceable parts inside”
       –  But Unix has so much history people think it is like a 70’s car
       –  Solaris really is dynamic, adaptive and self-tuning
       –  Most other “traditional Unix” tunables are just advisory limits
       –  Tweaks may be workarounds for bugs/problems
       –  Patch or OS release removes the problem - remove the tweak
    •  Solaris Tunable Parameters Reference Manual (if you must…)
      –  http://docs.sun.com/app/docs/doc/817-0404



CMG-T 2010                       Unix/Linux Quick Start                     Slide 47
Process-based data in /proc

  •    /proc filesystem is a common foundation
       –  Used by ps, proctool and debuggers, pea.se, proc(1) tools on Solaris
       –  Solaris and Linux both have /proc/pid/metric hierarchy
       –  Linux also includes system information in /proc rather than kstat
  •    Advantages
        –  The recommended and supported process access API
        –  Metric data structures reasonably stable over releases
        –  Consistent data using locking
        –  Solaris microstate data provides accurate process state timers
  •    Disadvantages
        –  High overhead for open/read/close of every process on busy
           systems
        –  Linux reports data as formatted ASCII text, Solaris uses binary
           structures that require tools for formatting
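
 A minimal sketch (Python, Linux-only) of the ASCII /proc interface noted
 above; it reads a few fields from /proc/<pid>/status, whereas Solaris would
 need its binary structures decoded by the proc(1) tools:

# proc_status.py - per-process fields from Linux /proc/<pid>/status
import os

def proc_status(pid):
    fields = {}
    with open("/proc/%d/status" % pid) as f:
        for line in f:                        # lines look like "VmRSS:  12345 kB"
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

me = proc_status(os.getpid())
print(me.get("Name"), "VmRSS:", me.get("VmRSS"), "Threads:", me.get("Threads"))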

CMG-T 2010                        Unix/Linux Quick Start                     Slide 48
Network protocol data
  •  Based on a streams module interface in Solaris
  •  Solaris 2 ndd interface used to configure protocols and interfaces
  •  Solaris 2 mib interface used by netstat -s and snmpd to get TCP
     stats etc.
  •  Advantages
      –  Individual named metrics reasonably stable over releases
      –  Consistent data using locking
      –  Extensible to add metrics without breaking existing code
      –  Solaris ndd can retune TCP online without reboot
      –  System data is often also made available via SNMP protocol
  •  Disadvantages
      –  Underlying API is not supported, SNMP access is preferred




CMG-T 2010                    Unix/Linux Quick Start                Slide 49
Tracing and profiling
   •  Tracing Tools
      –    truss - shows system calls made by a process
      –    sotruss / apitrace - shows shared library calls
      –    prex - controls TNF tracing for user and kernel code
      –    snoop/tcpdump – network traces for analysis with wireshark
   •  Profiling Tools
      –    Compiler profile feedback using -xprofile=collect and use
      –    Sampled profile relink using -p and prof/gprof
      –    Function call tree profile recompile using -pg and gprof
      –    Shared library call profiling setenv LD_PROFILE and gprof
   •  Accurate CPU timing for process using /usr/proc/bin/ptime
   •  Microstate process information using pea.se and pw.se
     10:40:16 name lwmx   pid   ppid    uid    usr%   sys% wait% chld% size   rss     pf
     nis_cachemgr     5   176      1      0    1.40   0.19 0.00 0.00 16320 11584     0.0
     jre              1 17255   3184   5743   11.80   0.19 0.00 0.00 178112 110336    0.0
     sendmail         1 16751      1      0    1.01   0.43 0.00 0.43 18624 16384     0.0
     se.sparc.5.6     1 16741   1186   9506    5.90   0.47 0.00 0.00 16320 14976     0.0
     imapd            1 16366    198   5710    6.88   1.09 1.02 0.00 34048 29888     0.1
     dtmail          10 16364   9070   5710    0.75   1.12 0.00 0.00 102144 94400    0.0




CMG-T 2010                              Unix/Linux Quick Start                              Slide 50
Free Tools
                  (See Separate Slide Deck)
             http://www.slideshare.net/adrianco




CMG-T 2010              Unix/Linux Quick Start    Slide 51
Headroom?
         What’s the matter with U?




CMG-T 2010        Unix/Linux Quick Start   Slide 52
What would you say if you were asked:

 How busy is that system?
 A: “I have no idea.”
 A: “42%”
 A: “Why do you want to know?”
 A: “I’m sorry, but I’m afraid that you don’t understand
  your question.”




CMG-T 2010              Unix/Linux Quick Start       Slide 53
Utilization
  •  It looks simple enough …
     –       Utilization is the proportion of busy time
     –       Always defined over a time interval
     –       Instantaneously, it’s binary; 100% or 0%
     –       Let’s run with this for a while, then circle back to the issues …

             [Figure: two charts. "usr+sys CPU for Peak Period" plots CPU % vs.
             time, showing utilization over the peak period; "OnCPU Scheduling
             for Each CPU" plots per-microsecond on-CPU state, with a mean CPU
             utilization of 0.56]


CMG-T 2010                                    Unix/Linux Quick Start                                                            Slide 54
Headroom
  •  Headroom relates to available usable resources
      –  Total Capacity minus Peak Utilization and Margin
      –  Applies to CPU, RAM, Net, Disk and OS
      –  “Usable” is the rub; idle resources may not be actually
         usable due to software bottlenecks, such as locking or
         limited concurrency

                 [Figure: "usr+sys CPU for Peak Period" plots CPU % vs. time;
                 Utilization fills the lower part of the 0-100% scale, Headroom
                 sits above it, and Margin occupies the top of the scale]


CMG-T 2010                   Unix/Linux Quick Start                Slide 55
Headroom Estimation

 •  CPU Capacity
   –  Relatively easy to figure out for well-behaved, homogeneous,
      steady-state workloads
   –  “Over-achievers”, bad tuning, and common Best Practice
      deviations inflate perception of required capacity
 •  Network Usage
   –  Use bytes not packets/s
 •  Memory Capacity
   –  Tricky - easier in Solaris 8
 •  Disk Capacity
   –  Can be very complex
   –  A complex gamut of software layers may reside between the
      application and its disk

CMG-T 2010                      Unix/Linux Quick Start               Slide 56
Response Time

  •  Response Time = Queue time + Service time
  •  The Usual Assumptions …
      –  Steady state averages
      –  Random arrivals
      –  Constant service time
      –  M servers processing the same queue

  •  Approximations
      –  Queue length = Throughput * Response Time (Little's Law)
      –  Utilization = Throughput * Service Time (Utilization law)
       –  Response Time = Service Time / (1 - Utilization^M)
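
 A minimal sketch (Python) of that approximation; the 10 ms service time and
 the server counts are arbitrary example values:

# rtcurve.py - R = S / (1 - U^M): response time climbs steeply near saturation
def response_time(service_time, utilization, m):
    return service_time / (1.0 - utilization ** m)

S = 0.010                                   # 10 ms service time (assumed)
for m in (1, 4, 16):                        # 1, 4, or 16 servers on one queue
    row = ["M=%2d:" % m]
    for u in (0.5, 0.7, 0.9, 0.95):
        row.append("%.1f ms @ %.0f%%" % (response_time(S, u, m) * 1000, u * 100))
    print("   ".join(row))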




CMG-T 2010                      Unix/Linux Quick Start               Slide 57
Response Time Curves

 •  The traditional view of Utilization as a proxy for response time
 •  Systems with many CPUs can run at higher utilization levels, but degrade
    more rapidly when they run out of capacity
 •  Headroom margin should be set according to a response time target


                                       [Figure: "Response Time Curves", R = S / (1 - (U%)^M);
                                       response time increase factor (0-10x) vs. total system
                                       utilization %, with one curve each for 1, 2, 4, 8, 16,
                                       32, and 64 CPUs; the headroom margin is set where the
                                       curve for the relevant CPU count turns sharply upward]




CMG-T 2010                                                           Unix/Linux Quick Start                                 Slide 58
So what's the problem with Utilization?
  •  Unsafe assumptions!
    –  Modern systems are complex, adaptive, highly non-linear, and are often
       virtualized or include shared components
  •  Random arrivals?
     –  Bursty traffic with long tail arrival rate distribution
  •  Constant service time?
     –  Variable clock rate CPUs, inverse load dependent service time
     –  Complex transactions, request and response dependent
  •  M servers processing the same queue?
     –  Virtual servers with varying non-integral concurrency
     –  Non-identical servers or CPUs, Hyperthreading, Multicore, NUMA
  •  Measurement Errors?
     –  Mechanisms with built in bias, e.g. sampling from the scheduler clock
     –  Platform and release specific systemic changes in accounting of
        interrupt time

CMG-T 2010                          Unix/Linux Quick Start                  Slide 59
Variable Clock Rate CPUs
 •    Laptop and other low power devices do this all the time
       –  Watch CPU usage of a video application and toggle mains/battery power
 •    Server CPU Power Optimization - AMD PowerNow!™
       –  AMD Opteron server CPU detects overall utilization and reduces clock
          rate
       –  Actual speeds vary, but for example could reduce from 2.6GHz to
          1.2GHz
       –  Changes are not understood or reported by operating system metrics
       –  Speed changes can occur every few milliseconds (thermal shock issues)
       –  Dual core speed varies per socket, Quad core varies per core
       –  Quad core can dynamically stop entire cores to save power
       –  Note: Older and "low power" Opterons in blades use fixed clock rate
 •    Possible scenario:
       –  You estimate 20% utilization at 2.6GHz
       –  You see 45% reported in practice (at 1.2GHz)
       –  Load doubles, reported utilization drops to 40% (at 2.6GHz)
       –  Actual mapping of utilization to clock rate is unknown at this point


CMG-T 2010                           Unix/Linux Quick Start                      Slide 60
Virtual Machine Monitors

 •  VMware, Xen, IBM LPARs etc.
    –  Non-integral and non-constant fractions of a machine
    –  Naive operating systems and applications don't expect this behavior
    –  However, lots of recent tools development from vendors
 •  Average CPU count must be reported for each measurement
    interval
 •  VMM overhead varies, application scaling characteristics may be
    affected




CMG-T 2010                     Unix/Linux Quick Start                   Slide 61
Threaded CPU Pipelines

 •    CPU microarchitecture optimizations
       –  Extra register sets working with one execution pipeline
       –  When the CPU stalls on a memory read, it switches registers/threads
       –  Operating system sees multiple schedulable entities (CPUs)

 •    Intel Hyperthreading
       –  Each CPU core has an extra thread to use spare cycles
       –  Typical benefit is 20%, so total capacity is 1.2 CPUs
       –  I.e. Second thread much slower when first thread is busy
       –  Hyperthreading aware optimizations in recent operating systems

 •    Sun “CoolThreads”
       –    "Niagara" SPARC CPU has eight cores, one shared floating point unit
       –    Each CPU core has four threads, but each core is a very simple design
       –    Behaves like 32 slow CPUs for integer, snail-like uniprocessor for FP
       –    Overall throughput is very high, performance per watt is exceptional
       –    Niagara 2 has dedicated FPU and 8 threads per core (total 64 threads)
       –    Each generation varies in low-level architectural details



CMG-T 2010                                Unix/Linux Quick Start                    Slide 62
Measurement Errors

   •  Mechanisms with built in bias
       –  e.g. sampling from the scheduler clock underestimates CPU usage
       –  Solaris 9 and before, Linux, AIX, HP-UX “sampled CPU time”
       –  Solaris 10 and HP-UX “measured CPU time” far more accurate
       –  Solaris microstate process accounting always accurate but in Solaris 10
           microstates are also used to generate system-wide CPU measurements

   •  Accounting for interrupt time
       –  Platform and release specific systemic changes
       –  Solaris 8 - sampled interrupt time spread over usr/sys/idle
       –  Solaris 9 - sampled interrupt time accumulated into sys only
       –  Solaris 10 – measured interrupt time spread over usr/sys/idle
       –  Solaris 10 Update 1 – measured interrupt time in sys only



CMG-T 2010                         Unix/Linux Quick Start                       Slide 63
CPU time measurements
   •  Biased sample CPU measurements
      –  See 1998 Paper "Unix CPU Time Measurement Errors"
      –  Microstate measurements are accurate, but are platform and tool specific.
         Sampled metrics are more inaccurate at low utilization
   •  CPU time is sampled by the 100Hz clock interrupt
      –    sampling theory says this is accurate for an unbiased sample
      –    the sample is very biased, as the clock also schedules the CPU
      –    daemons that wakeup on the clock timer can hide in the gaps
      –    problem gets worse as the CPU gets faster
   •  Increase clock interrupt rate? (Solaris)
      –  set hires_tick=1 sets rate to 1000Hz, good for realtime wakeups
      –  harder to hide CPU usage, but slightly higher overhead
   •  Use measured CPU time at per-process level
      –    microstate accounting takes timestamp on each state change
      –    very accurate and also provides extra information
      –    still doesn’t allow for interrupt overhead
      –    Prstat -m and the pea.se command uses this accurate measurement


CMG-T 2010                           Unix/Linux Quick Start                      Slide 64
More CPU Measurement Issues

   •  Load average differences
      –  Just includes CPU queue (Solaris)
      –  Includes CPU and Disk (Linux) – which is a broken metric
   •  Wait for I/O (%wio) is a misleading statistic altogether
      –  Metric removed in Solaris 10 – always zero
      –  Ignore it in all other Unix/Linux releases; add it to %idle
      –  There is no universal “propensity to compute” for a thread
         blocked on I/O; who knows how much %cpu it might use
         when it wakes up?




CMG-T 2010                     Unix/Linux Quick Start                  Slide 65
How to plot Headroom

 •  Measure and report absolute CPU power if you can get it …
 •  Plot shows headroom in blue, margin in red, total power tracking day/night
    workload variation, plotted as mean + two standard deviations.
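
 A minimal sketch (Python) of the "mean + two standard deviations" summary used
 in the plot, with fabricated per-interval utilization samples:

# headroom_summary.py - summarize utilization samples as mean + 2 * stddev
import statistics

cpu_util = [34, 38, 41, 55, 62, 58, 47, 39, 36, 44]   # made-up samples (%)

mean = statistics.mean(cpu_util)
sd   = statistics.pstdev(cpu_util)
print("mean %.1f%%  stddev %.1f%%  mean+2sd %.1f%%" % (mean, sd, mean + 2 * sd))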




CMG-T 2010                       Unix/Linux Quick Start                    Slide 66
“Cockcroft Headroom Plot”
 •    Scatter plot of response time
      (ms) vs. Throughput (KB)
      from iostat metrics
 •    Histograms on axes
 •    Throughput time series plot
 •    Shows distributions and
      shape of response time
 •    Fits throughput weighted
      inverse gaussian curve
 •    Coded using "R" statistics
      package
 •    Blogged development at
 http://perfcap.blogspot.com/search?q=chp




CMG-T 2010                             Unix/Linux Quick Start   Slide 67
How busy is that system again?

 •  Check your assumptions …
 •  Record and plot absolute capacity for each measurement interval
 •  Plot response time as a function of throughput, not just utilization
 •  SOA response characteristics are complicated …
 •  More detailed discussion in CMG06 Paper and blog entries
    –  “Utilization is Virtually Useless as a Metric” - Adrian Cockcroft - CMG06


                http://perfcap.blogspot.com/search?q=utilization
                    http://perfcap.blogspot.com/search?q=chp




CMG-T 2010                          Unix/Linux Quick Start                         Slide 68
CPU




CMG-T 2010   Unix/Linux Quick Start   Slide 69
CPU Capacity Measurements

   •  CPU Capacity is defined by CPU type and clock rate, or
      a benchmark rating like SPECrateInt2000
   •  CPU throughput - CPU scheduler transaction rate
      –  measured as the number of voluntary context switches
   •  CPU Queue length
      –  CPU load average gives an approximation via a time
         decayed average of number of jobs running and ready to run
   •  CPU response time
      –  Solaris microstate accounting measures scheduling delay
   •  CPU utilization
      –  Defined as busy time divided by elapsed time for each CPU
      –  Badly distorted and undermined by virtualization……
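
 A minimal sketch (Python, Linux-only) tying the queue-length bullet above to
 what the OS actually reports; /proc/loadavg holds the decayed run-queue
 averages (which on Linux also include threads blocked on disk I/O):

# loadavg.py - load average per CPU as a rough saturation indicator
import os

with open("/proc/loadavg") as f:
    one_min = float(f.read().split()[0])

ncpu = os.cpu_count() or 1
print("1-min load %.2f over %d CPUs = %.2f per CPU" % (one_min, ncpu, one_min / ncpu))
# Values well above 1.0 per CPU suggest a growing run queue and slow response.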


CMG-T 2010                  Unix/Linux Quick Start                 Slide 70
Controlling and Monitoring CPUs in Solaris

 •  psrinfo - show CPU status and clock rate
 •  corestat - show internal behavior of multi-core CPUs
 •  psradm - enable/disable CPUs
 •  pbind - bind a process to a CPU
 •  psrset - create sets of CPUs to partition a system
    –  At least one CPU must remain in the default set, to run kernel services like
       NFS threads
    –  All CPUs still take interrupts from their assigned sources
    –  Processes can be bound to sets
 •  mpstat shows per-CPU counters (per set in Solaris 9)
 CPU minf mjf xcal   intr ithr   csw icsw migr smtx          srw syscl   usr sys   wt idl
 0     45   1    0    232    0   780 234 106 201               0   950    72 28     0   0
 1     29   1    0    243    0   810 243 115 186               0 1045     69 31     0   0
 2     27   1    0    235    0   827 243 110 199               0 1000     75 25     0   0
 3     26   0    0    217    0   794 227 120 189               0   925    70 30     0   0
 4      9   0    0    234   92   403   94   84 1157            0   625    66 34     0   0




CMG-T 2010                          Unix/Linux Quick Start                                  Slide 71
Monitoring CPU mutex lock statistics
 •  To fix mutex contention change the application workload or upgrade to a newer OS
    release
 •  Locking strategies are too complex to be patched
 •  lockstat Command
    –    very powerful and easy to use
    –    Solaris 8 extends lockstat to include kernel CPU time profiling
    –    dynamically changes all locks to be instrumented
    –    displays lots of useful data about which locks are contending

 # lockstat sleep 5
 Adaptive mutex spin: 3318 events
 Count indv cuml rcnt     spin Lock                   Caller
 -------------------------------------------------------------------------------
 601 18% 18% 1.00          1 flock_lock             cleanlocks+0x10
 302   9% 27% 1.00         7 0xf597aab0             dev_get_dev_info+0x4c
 251   8% 35% 1.00         1 0xf597aab0             mod_rele_dev_by_major+0x2c
 245   7% 42% 1.00         3 0xf597aab0             cdev_size+0x74
 160   5% 47% 1.00         7 0xf5b3c738             ddi_prop_search_common+0x50




CMG-T 2010                              Unix/Linux Quick Start                     Slide 72
Network




CMG-T 2010   Unix/Linux Quick Start   Slide 73
Network interface and NFS metrics

 •  Network interface throughput counters from kstat on Solaris
    –  rbytes, obytes — read and output byte counts
    –  multircv, multixmt — multicast byte counts
    –  brdcstrcv, brdcstxmt — broadcast byte counts
    –  norcvbuf, noxmtbuf — buffer allocation failure counts
 •  Linux netstat shows byte throughput (Solaris doesn’t)
 •  NFS Client Statistics Shown in iostat on Solaris
 crun% iostat -xnP                              extended device Statistics
 r/s w/s    kr/s   kw/s wait actv wsvc_t asvc_t %w %b device
 0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 crun:vold(pid363)
 0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 servdist:/usr/dist
 0.0 0.5     0.0    7.9 0.0 0.0      0.0   20.7   0 1 servhome:/export/home/adrianc
 0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 servhome:/var/mail
 0.0 1.3     0.0   10.4 0.0 0.2      0.0 128.0    0 2 c0t2d0s0
 0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 c0t2d0s2
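
 A minimal sketch (Python) of the Linux side of the byte-throughput point above;
 /proc/net/dev carries per-interface receive and transmit byte counters (the
 equivalent rbytes/obytes counters come from kstat on Solaris):

# netdev_bytes.py - per-interface byte counters from Linux /proc/net/dev
def net_bytes():
    counters = {}
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:        # skip the two header lines
            name, data = line.split(":", 1)
            cols = data.split()
            counters[name.strip()] = (int(cols[0]), int(cols[8]))  # rx, tx bytes
    return counters

for ifname, (rx, tx) in net_bytes().items():
    print("%-10s rx %15d bytes   tx %15d bytes" % (ifname, rx, tx))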




CMG-T 2010                           Unix/Linux Quick Start                           Slide 74
TCP - A Simple Approach

 •  Capacity and Throughput Metrics to Watch
 •  Connections
   –  Current number of established connections
   –  New outgoing connection rate (active opens)
   –  Outgoing connection attempt failure rate
   –  New incoming connection rate (passive opens)
   –  Incoming connection attempt failure rate (resets)
 •  Throughput
   –  Input and output byte rates
   –  Input and output segment rates
   –  Output byte retransmit percentage




CMG-T 2010                   Unix/Linux Quick Start       Slide 75
Obtaining Measurements

 •  Get the TCP MIB via SNMP or netstat -s
 •  Standard TCP metric names:
   –  tcpCurrEstab: current number of established connections
   –  tcpActiveOpens: number of outgoing connections since boot
   –  tcpAttemptFails: number of outgoing failures since boot
   –  tcpPassiveOpens: number of incoming connections since boot
   –  tcpOutRsts: number of resets sent to reject connection
   –  tcpEstabResets: resets sent to terminate established
      connections
   –  (tcpOutRsts - tcpEstabResets): incoming connection failures
   –  tcpOutDataSegs, tcpInDataSegs: data transfer in segments
   –  tcpRetransSegs: retransmitted segments
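
 A minimal sketch (Python) of scraping those counters from netstat -s, assuming
 the Solaris-style "name = value" layout; Linux wording differs and SNMP is the
 more portable route:

# tcp_counters.py - pull selected TCP MIB counters out of 'netstat -s'
import re, subprocess

WANTED = ("tcpCurrEstab", "tcpActiveOpens", "tcpPassiveOpens",
          "tcpAttemptFails", "tcpOutRsts", "tcpEstabResets", "tcpRetransSegs")

out   = subprocess.check_output(["netstat", "-s"]).decode()
found = dict(re.findall(r"(tcp\w+)\s*=\s*(\d+)", out))

for name in WANTED:
    print("%-16s %s" % (name, found.get(name, "n/a")))

incoming_failures = int(found.get("tcpOutRsts", 0)) - int(found.get("tcpEstabResets", 0))
print("incoming connection failure count ~", incoming_failures)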



CMG-T 2010                  Unix/Linux Quick Start              Slide 76
Internet Server Issues

 •  TCP Connections are expensive
   –  TCP is optimized for reliable data on long lived connections
   –  Making a connection uses a lot more CPU than moving data
   –  Connection setup handshake involves several round trip delays
   –  Each open connection consumes about 1 KB plus data buffers
 •  Pending connections cause “listen queue” issues
 •  Each new connection goes through a “slow start” ramp up
 •  Other TCP Issues
   –  TCP windows can limit high latency high speed links
   –  Lost or delayed data causes time-outs and retransmissions




CMG-T 2010                  Unix/Linux Quick Start                Slide 77
TCP Sequence Diagram for HTTP Get




CMG-T 2010      Unix/Linux Quick Start   Slide 78
Stalled HTTP Get and Persistent HTTP




CMG-T 2010              Unix/Linux Quick Start         Slide 79
Memory




CMG-T 2010   Unix/Linux Quick Start   Slide 80
Memory Capacity Measurements

   •  Physical Memory Capacity Utilization and Limits
      –  Kernel memory, Shared Memory segment
      –  Executable code, stack and heap
      –  File system cache usage, Unused free memory
   •  Virtual Memory Capacity - Paging/Swap Space
      –  When there is no more available swap, Unix stops working
   •  Memory Throughput
      –  Hardware counter metrics can track CPU to Memory traffic
      –  Page in and page out rates
   •  Memory Response Time
      –  Platform specific hardware memory latency makes a difference,
         but hard to measure
      –  Time spent waiting for page-in is part of Solaris microstate
         accounting


CMG-T 2010                    Unix/Linux Quick Start                     Slide 81
Page Size Optimization
 •  Systems may support large pages for reduced overhead
   –  Solaris support is more dynamic/flexible than Linux at present
 •  Intimate Shared Memory locks large pages in RAM
   –  No swap space reservation
   –  Used for large database server Shared Global Area
 •  No good metrics to track usage and fragmentation issues
 •  Solaris ppgsz command can set heap and stack pagesize
 •  SPARC Architecture
   –  Base page size is 8KB, Large pages are 4MB
 •  Intel/AMD x86 Architectures
   –  Base page size is 4KB, Large pages are 2MB



CMG-T 2010                   Unix/Linux Quick Start                    Slide 82
Cache principles
 •  Temporal locality - “close in time”
    –  If you need something frequently, keep it near you
    –  If you don’t use it for a while, put it back
    –  If you change it, save the change by putting it back
 •  Spacial locality - “close in space - nearby”
    –  If you go to get one thing, get other stuff that is nearby
    –  You may save a trip by prefetching things
    –  You can waste bandwidth if you fetch too much you don’t use
 •  Caches work well with randomness
    –  Randomness prevents worst case behaviour
    –  Deterministic patterns often cause cache busting accesses
 •  Very careful cache friendly tuning can give great speedups




CMG-T 2010                       Unix/Linux Quick Start              Slide 83
The memory go round - Unix/Linux

  •  Memory usage flows between subsystems


              [Diagram: memory flows between subsystems through the free RAM
              list. Kernel Memory Buffers (kernel alloc/free), System V Shared
              Memory (shmget, shm_unlink, delete), Process Stack and Heap (brk,
              pagein, exit, reclaim), and the Filesystem Cache (read, write,
              mmap, delete, reclaim) all take pages from the head of the free
              list; the pageout scanner pushes pages from the process and
              filesystem cache side back to the tail of the free list]



CMG-T 2010                              Unix/Linux Quick Start                 Slide 84
The memory go round - Solaris 8 and Later

  •  Memory usage flows between subsystems


              [Diagram: same flows as the previous slide, except the Filesystem
              Cache now sits inside the free RAM list; cached file pages are
              treated as free memory and reclaimed directly, so the pageout
              scanner only runs under a genuine memory shortage]


CMG-T 2010                             Unix/Linux Quick Start       Slide 85
Solaris Swap Space
 •  Swap is very confusing and badly instrumented!
 # se swap.se
 ani_max 54814 ani_resv 19429 ani_free 37981 availrmem 13859 swapfs_minfree 1972
    ramres 11887 swap_resv 19429 swap_alloc 16833 swap_avail 47272 swap_free 49868
 Misleading data printed by swap -s
 134664 K allocated + 20768 K reserved = 155432 K used, 378176 K available
 Corrected labels:
 134664 K allocated + 20768 K unallocated = 155432 K reserved, 378176 K available
 Mislabelled sar -r 1
 freeswap (really swap available) 756352 blocks
 Useful swap data:
 Total swap 520 M available 369 M     reserved 151 M      Total disk 428 M   Total RAM 92 M
 # swap -s
 total: 134056k bytes allocated + 20800k reserved = 154856k used, 378752k available
 # sar -r 1
 18:40:51 freemem freeswap
 18:40:52       4152   756912
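  If the numbers above look inconsistent, one more cross-check worth collecting
  alongside swap -s and sar -r is swap -l, which lists each physical swap device:
  # swap -l      # per-device swap sizes and free space, reported in 512-byte blocks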




CMG-T 2010                              Unix/Linux Quick Start                                Slide 86
Session 3 of 3




      Finish the tool survey begun in Session 2, then
     survey several common errors, traps, and pitfalls.
     This session ends with some broad guidelines on
     how to manage performance and capacity in Unix
                  and Linux environments.




CMG-T 2010              Unix/Linux Quick Start            Slide 87
Disk




CMG-T 2010   Unix/Linux Quick Start   Slide 88
Disk Capacity Measurements

   •  Detailed metrics vary by platform
   •  Easy for the simple disk cases
   •  Hard for cached RAID subsystems
   •  Almost Impossible for shared disk subsystems and
      SANs
      –  Another system or volume can be sharing a backend
         spindle, when it gets busy your own volume can saturate,
         even though you did not change your own workload!




CMG-T 2010                  Unix/Linux Quick Start                  Slide 89
Storage Utilization

 •  Storage virtualization broke utilization metrics a long time ago
 •  Host server measures busy time on a "disk"
     –  Simple disk, "single server" response time gets high near 100%
        utilization
     –  Cached RAID LUN, one I/O stream can report 100% utilization, but
        full capacity supports many threads of I/O since there are many disks
        and RAM buffering
 •  New metric - "Capability Utilization"
     –  Adjusted to report proportion of actual capacity for current workload
        mix
     –  Measured by tools such as Ortera Atlas (http://www.ortera.com)




CMG-T 2010                       Unix/Linux Quick Start                    Slide 90
Solaris Filesystems
 •  ufs - standard, reliable, good for lots of small files,
    –  ufs transaction logging; faster writes and recovery
    –  ufs direct I/O feature; especially useful with databases
    –  snapshot features
 •  tmpfs - fastest if you have enough RAM, volatile
 •  NFS
    –  NFS2 - safe and common, 8KB blocks, slow writes
    –  NFS3 - more readahead and writebehind, faster
        •  default 32KB block size - fast sequential, may be slow random
        •  default TCP instead of UDP, more robust over WAN
    –  NFS4 - adds stateful behavior
 •  cachefs - good for read-mostly NFS speedup
 •  Veritas VxFS – 3rd-party; expensive
    –  Extent-based, with features for database performance and clustering
 •  QFS
    –  Extent-based, with features for database performance and clustering
    –  No more investments being made there
 •  ZFS – 21st century virtualized storage
    –  Feature-rich
    –  Challenging to performance-manage
    –  Focal point for added development
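    A minimal sketch of enabling the ufs features mentioned above; the device and
    mount point are placeholders:
     # mount -F ufs -o forcedirectio,logging /dev/dsk/c1t0d0s6 /u01
    forcedirectio bypasses the filesystem page cache, which mainly pays off when the
    application (e.g. a database) does its own caching.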


CMG-T 2010                           Unix/Linux Quick Start                  Slide 91
Solaris 10 ZFS: What it doesn't have ...
  •  Nice features
      –  No extra cost - it's bundled in a free OS
      –  No volume manager - it's built-in
      –  No space management - file systems use a common pool
      –  No long wait for newfs to finish - create a 3TB file system in a second
      –  No fsck - its transactional commit means it's consistent on disk
      –  No slow writes - disk write caches are enabled and flushed reliably
     –  No random or small writes - all writes are large batched sequential
     –  No rsync - snapshots can be differenced and replicated remotely
     –  No silent data corruption - all data is checksummed as it is read
     –  No bad archives - all the data in the file system is scrubbed regularly
     –  No penalty for software RAID - RAID-Z has a clever optimization
        (though it has limited practical applications)
     –  No downtime - mirroring, RAID-Z and hot spares
     –  No immediate maintenance - double parity disks if you need them
  •  Wish-list
     –  No way to know how much performance headroom you have!
     –  No clustering support


CMG-T 2010                          Unix/Linux Quick Start                        Slide 92
Solaris ZFS: Fundamental Tradeoffs
  •  All physical I/O ultimately occurs in ZFS <recordsize> quanta,
     which defaults to 128 KB (but can be set per file system/dataset)
     –  Random reads with poor locality may suffer when logical I/O size is
        small relative to the physical I/O size
     –  Sequential reads can suffer when physical I/O size is small relative to
        logical I/O size, though ZFS prefetching often effectively offsets this
  •  All writes are Copy-On-Write (COW) to fresh space in the pool
     –  Randomly-updated data tends to become physically fragmented
     –  Sequential read performance can vary significantly with the degree of
        physical fragmentation
     –  Backend prefetch algorithms are generally thwarted by fragmentation
  •  All I/O is buffered and checksummed
     –  This increases CPU demand relative to raw or direct I/O options
     –  Memory management may become seriously complicated by
        allocations for the Adaptive Replacement Cache (ARC)
     –  ZFS record checksums may be redundant with application data
        protections (e.g. Oracle checksums)
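     For example, where the workload is dominated by small random updates (such as a
     database), the recordsize property can be set per file system before the data
     files are created; the pool and dataset names here are hypothetical:
      # zfs set recordsize=8K tank/oradata
      # zfs get recordsize tank/oradata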

CMG-T 2010                       Unix/Linux Quick Start                    Slide 93
Solaris ZFS: Storage Appliances
  •  Sun S7000 Storage
    –    Inexpensive storage solutions based on commodity components
    –    Spawned from Sun’s “Fishworks” team
    –    Downloadable simulator allows easy experimentation
    –    See http://www.oracle.com/us/products/servers-storage/storage/unified-storage/index.html
  •  Performance Analysis & Management Features
    –  Storage Analytics; powerful GUI-based monitoring and analysis.
        For a most unusual peek at the Storage Analytics tool, see
       the video @ http://blogs.sun.com/brendan/entry/unusual_disk_latency
    –  Configuration option for latency-sensitivity (use SSD-based ZILs) or
       throughput (skip the SSD-based ZILs)




CMG-T 2010                                 Unix/Linux Quick Start                                   Slide 94
Solaris ZFS: Resources
  •  Review these …
    –  “ZFS Evil Tuning Guide”
      http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
    –  “Configuring Oracle® Solaris ZFS for an Oracle Database”
      http://developers.sun.com/solaris/docs/wp-oraclezfsconfig-0510_ds_ac2.pdf
    –  arcstat.pl Perl script
      http://www.solarisinternals.com/wiki/index.php/Arcstat
  •  … but don’t forget that ZFS has several fundamental tradeoffs
     that can be perplexing for performance and capacity management
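    If arcstat.pl is not handy, the kernel statistics it reads can be pulled directly
    from kstat; a couple of examples:
     # kstat -m zfs -n arcstats -s size     # current ARC size in bytes
     # kstat -m zfs -n arcstats -s c        # current ARC target size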




CMG-T 2010                       Unix/Linux Quick Start                      Slide 95
Linux Filesystems

 •  There are a large number of options!
    –  http://en.wikipedia.org/wiki/Comparison_of_file_systems
 •  EXT3
    –    Common default for many Linux distributions
    –    Efficient for CPU and space, small block size
    –    relatively simple for reliability and recovery
    –    Journalling support options can improve performance
    –    EXT4 came out of development at the end of 2008
 •  XFS
    –  Based on Silicon Graphics XFS, mature and reliable
    –  Better for large files and streaming throughput
    –  High Performance Computing heritage




CMG-T 2010                        Unix/Linux Quick Start         Slide 96
Disk Configurations

 •  Sequential access is ~10 times faster than random
    –  Sequential rates are now about 50-100 MB/s per disk
     –  Random rates are ~166 operations/sec (250/sec at 15,000 rpm)
    –  The size of each random read should be as big as possible
 •  Reads should be cached in main memory
    –    “The only good fast read is the one you didn’t have to do”
    –    Database shared memory or filesystem cache is microseconds
    –    Disk subsystem cache is milliseconds, plus extra CPU load
     –    Underlying disk is ~6ms, as it's unlikely that data is in cache
 •  Writes should be cached in nonvolatile storage
    –  Allows write cancellation and coalescing optimizations
    –  NVRAM inside the system - Direct access to Flash storage
    –  Solid State Disks based on Flash are the "Next Big Thing"



CMG-T 2010                         Unix/Linux Quick Start                 Slide 97
Disk Throughput
     [Chart: disk read and write throughput (disk_rK/s and disk_wK/s) over one
      monitoring period; y-axis 0 to 14,000 KB/s, x-axis time expressed as a
      fraction of the period.]




CMG-T 2010                                    Unix/Linux Quick Start                               Slide 98
Max and Avg Disk Utilization (Same data)

      [Chart: maximum and average disk utilization (disk_max% and disk_avg%)
       for the same data and period; y-axis 0 to 100%.]




CMG-T 2010                                      Unix/Linux Quick Start                                  Slide 99
Data from iostat
 •  What can we see here?
  extended disk statistics
 disk      r/s   w/s    Kr/s    Kw/s  wait  actv  svc_t  %w  %b
 sd7       0.1   1.7     0.1    13.3   0.0   0.2  109.8   0   1   root ufs
 sd15    534.2  17.5  1320.4    35.0   0.0   0.3    0.6   0  26   solid state disks
 sd45    291.9  23.0   603.2    49.8   0.0   0.2    0.6   0  15
 sd60      3.1   0.0    25.3     0.0   0.0   0.0    7.8   0   2
 sd61      3.3   0.0    26.4     0.0   0.0   0.0    7.6   0   2
 sd62      3.2   0.0    26.1     0.0   0.0   0.0    8.1   0   3   stripe 8K RR
 sd63      3.8   0.0    30.1     0.0   0.0   0.0    7.2   0   3
 sd64      3.6   0.0    28.8     0.0   0.0   0.0    7.4   0   3
 sd65      3.8   0.0    31.2     0.0   0.0   0.0    7.3   0   3
 sd67      9.7   1.5    77.8     4.3   0.0   0.1    9.0   0   8
 sd68     10.7   1.4    85.3     4.2   0.0   0.1    9.0   0  10
 sd69     10.0   1.5    79.9     4.2   0.0   0.1    9.0   0   9   stripe
 sd70     10.4   1.0    83.1     3.2   0.0   0.1    9.1   0   9
 sd71      9.9   1.4    78.8     4.6   0.0   0.1    8.7   0   9
 sd72     10.0   1.1    79.9     3.7   0.0   0.1    8.5   0   8
 sd75      0.0  27.6     0.0   297.3   0.0   0.0    1.1   0   2   cached write log
 sd210    12.1   0.3   108.9     0.6   0.0   0.1    9.8   0  10
 sd211    12.9   0.4   114.8     0.7   0.0   0.1   10.6   0  11
 sd212    12.0   0.6   107.1     1.3   0.0   0.1   11.1   0  10
 sd213    13.8   0.3   122.2     0.9   0.0   0.2   11.1   0  11   stripe
 sd214    12.5   0.5   112.1     1.0   0.0   0.1   10.3   0  10
 sd215    12.1   0.3   109.5     0.8   0.0   0.1   10.5   0  10




CMG-T 2010                           Unix/Linux Quick Start                         Slide 100
Simple Disks

 •  Utilization shows capacity usage
   Measured using iostat %b

 •  Response time is svc_t
   svc_t increases due to waiting in the queues caused by bursty
     loads

 •  Service time per I/O is Util/IOPS
   Calculate as (%b/100) / (r/s + w/s)
   Decreases due to optimization of queued requests as load
    increases




CMG-T 2010                  Unix/Linux Quick Start             Slide 101
Single Disk Parameters

 •  e.g. Seagate 18GB ST318203FC
   –  Obtain from www.seagate.com
   –  RPM = 10000, i.e. ~166 revolutions/s, or 6.0ms per revolution
   –  Avg read seek = 5.2ms
   –  Avg write seek = 6.0ms
   –  Avg transfer rate = 24.5 MB/s
   –  Random IOPS
      •  Approx 166/s for small requests
      •  Approx 24.5 MB/s divided by the request size for large requests
         (e.g. ~24/s at 1 MB)




CMG-T 2010                     Unix/Linux Quick Start   Slide 102
Mirrored Disks

 •  All writes go to both disks
 •  Read policy alternatives
   –  All reads from one side
   –  Alternate from side to side
   –  Split by block number to reduce seek
   –  Read both and use first to respond

 •  Simple Capacity Assumption
   –  Assume duplicated interconnects
   –  Same capacity as unmirrored




CMG-T 2010                   Unix/Linux Quick Start   Slide 103
Concatenated and Fat
   Stripe Disks
 •  Request size less than interlace
 •  Requests go to one disk
 •  Single threaded requests
   –  Same capacity as single disk

 •  Multithreaded requests
   –  Same service time as one disk
   –  Throughput of N disks if more than N threads are evenly
      distributed




CMG-T 2010                   Unix/Linux Quick Start             Slide 104
Striped Disks

 •  Request size more than interlace
 •  Requests split over N disks
   –  Single and multithreaded requests
   –  N = request size / interlace
   –  Throughput of N disks

 •  Service Time Reduction
   –  Reduced size of request reduces service time for large transfers
   –  Need to wait for all disks to complete - slowest dominates




CMG-T 2010                   Unix/Linux Quick Start               Slide 105
RAID5 for Small
   Requests
 •  Writes must calculate parity

   –  Read parity and old data blocks
   –  Calculate new parity
   –  Write log and data and parity
   –  Triple service time
   –  One third throughput of one disk

 •  Read performs like stripe
   –  Throughput of N-1, service of one
   –  Degraded mode throughput about one




CMG-T 2010                    Unix/Linux Quick Start            Slide 106
RAID5 for Large
   Requests
 •  Write full stripe and parity


 •  Capacity similar to stripe
    –  Similar read and write performance
    –  Throughput of N-1 disks
    –  Service time for size reduced by N-1
    –  Less interconnect load than mirror

 •  Degraded Mode
    –  Throughput halved and service similar
    –  Extra CPU used to regenerate data




CMG-T 2010                    Unix/Linux Quick Start            Slide 107
Cached RAID5

 •  Nonvolatile cache
   –  No need for recovery log disk

 •  Fast service time for writes
   –  Interconnect transfer time only

 •  Cache optimizes RAID5
   –  Makes all backend writes full stripe




CMG-T 2010                    Unix/Linux Quick Start   Slide 108
Cached Stripe

 •  Write caching for stripes
   –  Greatly reduced service time
   –  Very worthwhile for small transfers
   –  Large transfers should not be cached
   –  In many cases, 128KB is the crossover point from small to large

 •  Optimizations
   –  Rewriting same block cancels in cache
   –  Small sequential writes coalesce




CMG-T 2010                   Unix/Linux Quick Start                 Slide 109
Capacity Model Measurements

   •  Derived from iostat outputs!
    extended disk statistics
   disk       r/s   w/s    Kr/s       Kw/s wait actv         svc_t   %w   %b
   sd9       33.1   8.7   271.4       71.3       0.0   2.3    15.8    0   27
   •  Utilization U = %b / 100 = 0.27
   •  Throughput X = r/s + w/s = 41.8
   •  Size K = (Kr/s + Kw/s) / X = 8.2KB
   •  Concurrency N = actv = 2.3
   •  Service time S = U / X = 6.5ms
   •  Response time R = svc_t = 15.8ms
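   The same derivation can be scripted; a rough awk sketch, assuming the classic
   Solaris iostat -x column order shown above and a device named sd9:
    % iostat -x 30 | awk '$1 == "sd9" && $2 + $3 > 0 {
          u = $10/100; x = $2 + $3
          printf "U=%.2f X=%.1f/s K=%.1fKB S=%.1fms R=%.1fms\n",
                 u, x, ($4 + $5)/x, 1000*u/x, $8 }'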




CMG-T 2010                    Unix/Linux Quick Start                       Slide 110
Cache Throughput

 •  Hard to model clustering and write cancellation
    improvements
 •  Make pessimistic assumption that throughput is unchanged
 •  Primary benefit of cache is fast response time
 •  Writes can flood cache and saturate back-end disks
   –  Service times suddenly go from 3ms to 300ms
   –  Very hard to figure out when this will happen
   –  Paranoia is a good policy….




CMG-T 2010                  Unix/Linux Quick Start       Slide 111
Concluding Summary
      Walk out of here with the most useful content fresh in your mind!




CMG-T 2010                    Unix/Linux Quick Start                      Slide 112
Quick Tips #1 - Disk

 •  The system will usually have a disk bottleneck
 •  Track how busy is the busiest disk of all
 •  Look for unbalanced, busy or slow disks with iostat
 •  Options: timestamp, look for busy controllers, ignore idle disks:
 % iostat -xnzCM -T d 30
 Tue Jan 21 09:19:21 2003                 extended device statistics
    r/s    w/s   Mr/s    Mw/s wait actv wsvc_t asvc_t %w %b device
  141.0    8.6    0.6     0.0 0.0 1.5      0.0   10.0   0 25 c0
    3.3    0.0    0.0     0.0 0.0 0.0      0.0    6.5   0   2 c0t0d0
  137.7    8.6    0.6     0.0 0.0 1.5      0.0   10.1   0 74 c0t1d0

 Watch out for sd_max_throttle limiting throughput when set too low
 Watch out for RAID cache being flooded on writes, causes sudden very large
  increase in write service time




CMG-T 2010                        Unix/Linux Quick Start                Slide 113
Quick Tips #2 - Network

 •  If you ever see a slow machine that also appears to be idle, you should
    suspect a network lookup problem. i.e. the system is waiting for some
    other system to respond.
 •  Poor Network Filesystem response times may be hard to see
    –  Use iostat -xn 30 on a Solaris client
    –  wsvc_t is the time spent in the client waiting to send a request
    –  asvc_t is the time spent in the server responding
    –  %b will show 100% whenever any requests are being processed, it does NOT
       mean that the network server is maxed out, as an NFS server is a complex
       system that can serve many requests at once.

 •  Name server delays are also hard to detect
    –  Overloaded LDAP or NIS servers can cause problems
    –  DNS configuration errors or server problems often cause 30s delays as the
       request times out
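  In addition to iostat -xn, the client-side view of each NFS mount (mount options,
  server, and smoothed round-trip times) can be checked with nfsstat; for example:
   % nfsstat -m     # per-mount parameters and srtt/dev/cur timings on a Solaris client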




CMG-T 2010                           Unix/Linux Quick Start                    Slide 114
Quick Tips #3 - Memory

 •  Avoid the common vmstat misconceptions
    –  The first line is average since boot, so ignore it
 •  Linux, Other Unix and earlier Solaris Releases
    –  Ignore “free” memory
    –  Use high page scanner “sr” activity as your RAM shortage indicator
 •  Solaris 8 and Later Releases
    –  Use “free” memory to see how much is left for code to use
    –  Use non-zero page scanner “sr” activity as your RAM shortage indicator
 •  Don’t panic when you see page-ins and page-outs in vmstat
 •  Normal filesystem activity uses paging
 solaris9% vmstat 30
 kthr      memory            page            disk                faults     cpu
 r b w   swap free re    mf pi po fr de sr f0 s0 s1 s6      in      sy  cs us sy id
 0 0 0 2367832 91768 3   31 2 1 1 0 0 0 0 0 0              511     404 350 0 0 99
 0 0 0 2332728 75704 3   29 0 0 0 0 0 0 0 0 0              508     537 410 0 0 99




CMG-T 2010                              Unix/Linux Quick Start                        Slide 115
Quick Tips #4 - CPU

 •  Look for a long run queue (vmstat procs r) - and add CPUs
     –  To speed up with a zero run queue you need faster CPUs, not more of them
 •  Check for CPU system time dominating user time
    –  Most systems should have lots more Usr than Sys, as they are running
       application code
    –  But... dedicated NFS servers should be 100% Sys
    –  And... dedicated web servers have high Sys as well
    –  So... assume that lots of network service drives Sys time
 •  Watch out for processes that hog the CPU
    –  Big problem on user desktop systems - look for looping web browsers
    –  Web search engines may get queries that loop
    –  Use resource management or limit cputime (ulimit -t) in startup scripts to
       terminate web queries
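  A hedged sketch of the ulimit approach; the 300-second cap, path, and script name
  are illustrative only:
   #!/bin/sh
   # wrapper that refuses to let a runaway query burn more than 300 CPU seconds
   ulimit -t 300
   exec /opt/search/bin/query-worker "$@"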




CMG-T 2010                          Unix/Linux Quick Start                          Slide 116
Quick Tips #5 - I/O Wait

 •  Look for processes blocked waiting for disk I/O (vmstat procs b)
    –  This is what causes CPU time to be counted as wait not idle
    –  Nothing else ever causes CPU wait time!
 •  CPU wait time is a subset of idle time, consumes no resources
    –  CPU wait time is a misconceived statistic, and its fallacy is amplified on
       multi-threaded systems
    –  CPU wait time is no longer calculated in Solaris 10; reports as zero
    –  Bottom line - don’t worry about CPU wait time, it’s a broken metric
 •  Look at individual process wait time using microstates
    –  prstat -m or SE toolkit process monitoring
 •  Look at I/O wait time using iostat asvc_t
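  For example (the pid and interval here are placeholders), per-thread microstates can
  be watched with prstat; the SLP and LAT columns show time sleeping (including disk
  waits) and time waiting for a CPU:
   % prstat -mL -p <pid> 10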




CMG-T 2010                        Unix/Linux Quick Start                    Slide 117
Quick Tips #6 - iostat

 •  For Solaris remember “expenses” iostat -xPncez 30
   –  Add -M for Megabytes, and -T d for timestamped logging
   –  Use 30 second interval to avoid spikes in load.
   –  Watch asvc_t which is the response time for Solaris
 •  Look for regular disks over 5% busy that have response
    times of more than 10ms as a problem.
 •  If you have cached hardware RAID, look for response times
    of more than 5ms as a problem.
 •  Ignore large response times on idle disks that have
     filesystems - it's not a problem and the cause is the fsflush
    process



CMG-T 2010                  Unix/Linux Quick Start             Slide 118
Recipe to fix a slow system

   •  Essential Background Information
      –    What is the business function of the system?
      –    Who and where are the users?
      –    Who says there is a problem, and what is slow?
      –    What changed recently and what is on the way?
   •  What is the system configuration?
      –  CPU/RAM/Disk/Net/OS/Patches, what application software is in use?
   •  What are the busy processes on the system doing?
      –  use top, prstat, pea.se or /usr/ucb/ps uax | head
   •  Report CPU and disk utilization levels, iostat -xPncezM -T d 30
      –  What is making the disks busy?
   •  What is the network name service configuration?
      –  How much network activity is there? Use netstat -i 30 or nx.se 30
   •  Is there enough memory?
      –  Check free memory and the scan rate with vmstat 30


CMG-T 2010                         Unix/Linux Quick Start                    Slide 119
Further Reading - Books
 General Solaris/Unix/Linux Performance Tuning
    –  System Performance Tuning (2nd Edition) by Gian-Paolo D. Musumeci and Mike
       Loukides; O'Reilly & Associates
 Solaris Performance Tuning Books
    –  Solaris Performance and Tools, Richard McDougall, Jim Mauro, Brendan Gregg; Prentice
       Hall
    –  Configuring and Tuning Databases on the Solaris Platform, Allan Packer; Prentice Hall
    –  Sun Performance and Tuning, by Adrian Cockcroft and Rich Pettit; Prentice Hall
 Sun BluePrints™
    –  Capacity Planning for Internet Services, Adrian Cockcroft and Bill Walker; Prentice Hall
    –  Resource Management, Richard McDougall, Adrian Cockcroft et al. Prentice Hall
 Linux
    –  Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D.
       Sherer
    –  Google has a Linux specific search mode http://www.google.com/linux




CMG-T 2010                              Unix/Linux Quick Start                             Slide 120
Questions?
                  (The End)




CMG-T 2010     Unix/Linux Quick Start   Slide 121
Backing Material
             “Test Your Intuition”




CMG-T 2010          Unix/Linux Quick Start   Slide 122
Pop Quiz #1

 •  SITUATION: A system runs at 100% CPU usage for 1 hour
    each day completing a single compute-bound task. The
    SLA requires the task to complete in 4 hours.
 •  Q1: How much “headroom” does this system have?
 •  Q2: How can this task's resource footprint be managed to
    never exceed 80% CPU usage?




CMG-T 2010                Unix/Linux Quick Start          Slide 123
Pop Quiz #1: Answers

 •  SITUATION: A system runs at 100% CPU usage for 1 hour each
    day completing a single compute-bound task. The SLA requires
    the task to complete in 4 hours.
 •  Q1: How much “headroom” does this system have?
 •  A1: 300% (in workload terms) or 75% (in percent-of-system terms) -
    it can do 4x the work it now does and remain within the SLA.
 •  Q2: How can this task's resource footprint be managed to never
    exceed 80% CPU usage?
 •  A2a: Huh? Why would anyone want to do that?
 •  A2b: Resource management.



CMG-T 2010                   Unix/Linux Quick Start              Slide 124
Pop Quiz #2

 •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75%
    CPU busy, with a workload that includes four compute-
    bound threads plus some OLTP. The new target system is
    a 4-way 2000-BogoMIPs system.
 •  Q1: What is the new system's projected CPU utilization?
 •  Q2: How can this system's workload be managed to never
    exceed 75% CPU utilization?




CMG-T 2010                Unix/Linux Quick Start          Slide 125
Pop Quiz #2: Answers

 •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75% CPU
    busy, with a workload that includes four compute-bound threads
    plus some OLTP. The new target system is a 4-way 2000-
    BogoMIPs system.
 •  Q1: What is the new system's projected CPU utilization?
 •  A1: 100%. Each of the four compute-bound threads will keep one
    CPU 100% busy.
 •  Q2: How can this system's workload be managed to never exceed
    75% CPU utilization?
 •  A2a: Huh? Why would anyone want to do that?
 •  A2b: Resource management.


CMG-T 2010                   Unix/Linux Quick Start              Slide 126
Pop Quiz #3

 •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75%
    CPU busy, with a workload that includes four compute-
    bound threads plus some OLTP. The new target system is
    a 4-way 2000-BogoMIPs system. (Same as last quiz, OK?)
 •  Q1: How will the compute-bound thread's performance be
    impacted by the upgrade? (Just roughly speaking – no
    need for precision here!)




CMG-T 2010               Unix/Linux Quick Start         Slide 127
Pop Quiz #3: Answers

 •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75% CPU
    busy, with a workload that includes four compute-bound threads
    plus some OLTP. The new target system is a 4-way 2000-
    BogoMIPs system. (Same as last quiz, OK?)
 •  Q1: How will the compute-bound thread's performance be impacted
    by the upgrade? (Just roughly speaking – no need for precision
    here!)
 •  A1: It should run almost 4x faster. Each new CPU is 4x faster than
    the old ones. (2000/4)/(1000/8) = 4. The OLTP will use some of the
    CPU cycles, but its service demand pales next to the compute jobs.




CMG-T 2010                   Unix/Linux Quick Start              Slide 128
Pop Quiz #4




     • ESSAY QUESTION: “At what point do
       these principles become difficult?”




CMG-T 2010           Unix/Linux Quick Start   Slide 129

Contenu connexe

Similaire à Sol linux cmg-t_1_1.pptx

Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
Long and winding road - Chile 2014
Long and winding road - Chile 2014Long and winding road - Chile 2014
Long and winding road - Chile 2014Connor McDonald
 
SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013Mike Brannon
 
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and BeyondA Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and BeyondSurekha Parekh
 
Oracle ADF Architecture TV - Development - Performance & Tuning
Oracle ADF Architecture TV - Development - Performance & TuningOracle ADF Architecture TV - Development - Performance & Tuning
Oracle ADF Architecture TV - Development - Performance & TuningChris Muir
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
TechTalk_Cloud Performance Testing_0.6
TechTalk_Cloud Performance Testing_0.6TechTalk_Cloud Performance Testing_0.6
TechTalk_Cloud Performance Testing_0.6Sravanthi N
 
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Surekha Parekh
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingRoshan Karunarathna
 
A Time Traveller's Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller's Guide to DB2: Technology Themes for 2014 and BeyondA Time Traveller's Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller's Guide to DB2: Technology Themes for 2014 and BeyondLaura Hood
 
Micro Front-End & Microservices - Plansoft
Micro Front-End & Microservices - PlansoftMicro Front-End & Microservices - Plansoft
Micro Front-End & Microservices - PlansoftMiki Lombardi
 
Open Programmable Architecture for Java-enabled Network Devices
Open Programmable Architecture for Java-enabled Network DevicesOpen Programmable Architecture for Java-enabled Network Devices
Open Programmable Architecture for Java-enabled Network DevicesTal Lavian Ph.D.
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 
GE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe ConversionGE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe Conversionguatham
 
Cse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionShobha Kumar
 

Similaire à Sol linux cmg-t_1_1.pptx (20)

Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
Long and winding road - Chile 2014
Long and winding road - Chile 2014Long and winding road - Chile 2014
Long and winding road - Chile 2014
 
SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013
 
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and BeyondA Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller’s Guide to DB2: Technology Themes for 2014 and Beyond
 
Oracle ADF Architecture TV - Development - Performance & Tuning
Oracle ADF Architecture TV - Development - Performance & TuningOracle ADF Architecture TV - Development - Performance & Tuning
Oracle ADF Architecture TV - Development - Performance & Tuning
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Distributed Systems in Data Engineering
Distributed Systems in Data EngineeringDistributed Systems in Data Engineering
Distributed Systems in Data Engineering
 
TechTalk_Cloud Performance Testing_0.6
TechTalk_Cloud Performance Testing_0.6TechTalk_Cloud Performance Testing_0.6
TechTalk_Cloud Performance Testing_0.6
 
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Designing Scalable Applications
Designing Scalable ApplicationsDesigning Scalable Applications
Designing Scalable Applications
 
A Time Traveller's Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller's Guide to DB2: Technology Themes for 2014 and BeyondA Time Traveller's Guide to DB2: Technology Themes for 2014 and Beyond
A Time Traveller's Guide to DB2: Technology Themes for 2014 and Beyond
 
Micro Front-End & Microservices - Plansoft
Micro Front-End & Microservices - PlansoftMicro Front-End & Microservices - Plansoft
Micro Front-End & Microservices - Plansoft
 
Edge computing system for large scale distributed sensing systems
Edge computing system for large scale distributed sensing systemsEdge computing system for large scale distributed sensing systems
Edge computing system for large scale distributed sensing systems
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 
Open Programmable Architecture for Java-enabled Network Devices
Open Programmable Architecture for Java-enabled Network DevicesOpen Programmable Architecture for Java-enabled Network Devices
Open Programmable Architecture for Java-enabled Network Devices
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
GE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe ConversionGE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe Conversion
 
Clean sw 3_architecture
Clean sw 3_architectureClean sw 3_architecture
Clean sw 3_architecture
 
Cse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solution
 

Dernier

Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 

Dernier (20)

Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 

Sol linux cmg-t_1_1.pptx

  • 1. CMG-T 2010: Unix/Linux Quick-Start Bob Sneed, EPiC Performance Associates for CMG 2010, December 9, 2010 V1.1 - December 10, 2010 CMG-T 2010 1
  • 2. Preliminaries •  Disclaimers •  About the authors –  Adrian Cockcroft –  Bob Sneed •  An overview of the three sessions •  NOTE: These colored-background slides are intended to ease navigation within this slide deck CMG-T 2010 Unix/Linux Quick Start Slide 2
  • 3. Disclaimers Opinions and views expressed herein are those of the authors. (The factual bits are mostly Adrian's.) Bob is not a doctor - and does not even play one on TV. Oops! Bob is not with Intellimagic; it’s an error in the program. There is no warranty, expressed or implied, in the quality of the information herein, or its fitness for any given purpose. If you goof up applying this stuff and have a bad outcome or destroy a bunch of data – it's entirely your own fault. These CMG-T materials do not refer to and are not endorsed by the authors’ current or past employers. They are based on the authors’ career experiences. Bob can’t speak to all of Adrian's slides quite as well as he does. Batteries not included. Your mileage may vary (YMMV). CMG-T 2010 Unix/Linux Quick Start Slide 3
  • 4. This CMG-T content is primarily based on Adrian Cockcroft's 2009 materials … CMG-T 2010 Unix/Linux Quick Start Slide 4
  • 5. Adrian Cockcroft •  Where's Adrian? –  Netflix 2007-present; Director of Engineering, Cloud Architectures –  CMG 2007 Michelson Award Winner for lifetime contribution to computer measurement; http://perfcap.blogspot.com/2007/12/cmg07-and-a- michelson-award.html –  eBay Research Labs 2004-2007; Distinguished Engineer –  Sun Microsystems 1988-2004; Distinguished Engineer •  Adrian’s recent conference interests –  QCon and Velocity •  Adrian’s books –  “Sun Performance and Tuning”, Prentice Hall, 1994, 1998 (2nd Ed) –  “Resource Management”, Prentice Hall, 2000 –  “Capacity Planning for Internet Services”, Prentice Hall, 2001 •  Adrian’s online presence –  Slides: http://www.slideshare.net/adrianco –  Blog: http://perfcap.blogspot.com/ CMG-T 2010 Unix/Linux Quick Start Slide 5
  • 6. Bob Sneed •  About Bob –  EPiC Performance Associates, 2009-present; Owner & Principal, Independent Consultant –  Sun Microsystems,1997-2009; Sr. Staff Engineer, variously worked in Sun Competency Centers, Performance and Availability Engineering Group (PAE), and the Systems Quality Office (SQO) •  Papers and presentations by Bob … –  “Sun/Oracle Best Practices”, Sun Blueprint, January 2001 –  “Oracle I/O; Supply and Demand”, Sun User's Performance Group (SUPerG), 2001 –  “Performance Forensics”, Sun Blueprint, December 2003 (previously a CMG 2002 paper) –  “I/O Microbenchmarking with Oracle in Mind”, Hotsos Symposium, 2006 –  “Capacity; It's Not All About U!”, Hotsos Symposium, 2008“Best Practices for Optimal Configuration of Oracle Databases on Sun Hardware”, (with co-author; Allan Packer), Oracle Open World, 2009 –  “CPU QoS”, Southern Area CMG, 2010 & Hotsos Symposium 2010 –  Coming March 2011; “Brute-Force Parallelism” CMG-T 2010 Unix/Linux Quick Start Slide 6
  • 7. Overview •  Session 1 is an introduction to (or review of) key principles and terminology for performance and capacity planning. This discussion will highlight many common high-level strategic errors that can lead to under- or over-provisioning or disappointing results despite adequate provisioning. •  Session 2 will survey the data sources and tools available for traditional resource-oriented analysis (network, CPU, memory, storage), plus the tools available for understanding kernel, hardware, and per-thread program performance. Pointers will be given to some major commercial and open-source tools for performance and capacity management. Some emphasis will be placed on Solaris features including microstate accounting and DTrace. •  Session 3 will conclude the tool survey begun in Session 2, then survey several common errors, traps, and pitfalls. This session ends with some broad guidelines on how to manage performance and capacity in Unix and Linux environments. CMG-T 2010 Unix/Linux Quick Start Slide 7
  • 8. Session 1 of 3 An introduction to (or review of) key principles and terminology for performance and capacity planning. This discussion will highlight many common high-level strategic errors that can lead to under- or over- provisioning or disappointing results despite adequate provisioning CMG-T 2010 Unix/Linux Quick Start Slide 8
  • 9. Definitions CMG-T 2010 Unix/Linux Quick Start Slide 9
  • 10. Capacity Planning Definitions •  Capacity –  Resource utilization and headroom •  Planning –  Predicting future needs by analyzing historical data and modelling future scenarios •  Performance Monitoring –  Collecting and reporting on performance data •  Unix/Linux (apologies to users of OSX, HP-UX, AIX etc.) –  Emphasis on Solaris and Linux –  Much of the discussion is independent of the OS CMG-T 2010 Unix/Linux Quick Start Slide 10
  • 11. Measurement Terms and Definitions •  Bandwidth - gross work per unit time [unattainable] •  Throughput - net work per unit time •  Peak throughput - at maximum acceptable response time •  Utilization - busy time relative to elapsed time [can be misleading] •  Queue length - number of requests waiting •  Service time - time to process a unit of work after waiting •  Response time - time to complete a unit of work including waiting •  Key Performance Indicator (KPI) – a measurement you have decided to watch because it has some business value •  Laws – constraints which physics and math place on reality CMG-T 2010 Unix/Linux Quick Start Slide 11
  • 12. Service Level Agreements (SLA) •  Behavioral goals for the system in terms of KPIs –  NOTE: With or without contracted SLAs, the industry's universal KPI tends to be “call center traffic” or “rate of complaints” •  Response time target –  Rule of thumb: Estimate 95th percentile response time as three times mean response time –  e.g. if SLA says 1 second response, measured average should be less than 333ms •  Utilization Target (a proxy for Response Time) –  Specified as a minimum and maximum –  Minimum utilization target to keep costs down –  Maximum utilization target for good response times and capacity headroom for future workload fluctuations CMG-T 2010 Unix/Linux Quick Start Slide 12
  • 13. Capacity Planning Requirements •  We care about CPU, Memory, Network and Disk resources, and Application response times •  We need to know how much of each resource we are using now, and will use in the future •  We need to know how much headroom we have to handle higher loads •  We want to understand how headroom varies, and how it relates to application response times and throughput •  The application workload must be characterized so we can understand and manage system behaviours •  We want to be able to find the bottleneck in an under-performing system CMG-T 2010 Unix/Linux Quick Start Slide 13
  • 14. Laws CMG-T 2010 Unix/Linux Quick Start Slide 14
  • 15. Ignorance of The Law is no excuse! •  Physics –  The speed of light –  Moore's Law •  Management –  Murphy's Law –  Parkinson's Law •  Performance and Capacity –  Amdahl's Law –  Little's Law CMG-T 2010 Unix/Linux Quick Start Slide 15
  • 16. The Speed of Light; Deal with it! SPEED LIMIT 186,000 Miles Per Second It’s not just a Good Idea! It’s the Law! CMG-T 2010 Unix/Linux Quick Start Slide 16
  • 17. Physics Laws (Well, sort-of …) •  The speed of light –  Nothing can go faster; 186,000 miles/second –  “A foot is a nanosecond” –  A thousand miles is (1000/186000=) 5.4 ms, plus routing and handling delays, in the case of digital traffic •  Moore's Law –  From Wikipedia: “Moore's law describes a long-term trend in the history of computing hardware. The number of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years.[1] The trend has continued for more than half a century and is not expected to stop until 2015 or later.[2]” –  NOTE: This speaks only of circuit density, not speed. –  CORRELARY: This ride is coming to an end? CMG-T 2010 Unix/Linux Quick Start Slide 17
  • 18. Management Laws •  Parkinson's Law –  Often stated simply as; “Work expands to fill the time available.” –  Corollary: “Software bloat occurs in direct proportion to gains from Moore's Law.” –  Corollary: “Disk retention strategies vary in direct proportion to increasing storage density.” –  Corollary: “Performance optimization has no limits unless you set them.” (Bob just made that up.) •  Murphy's Law –  Often stated simply as; “Anything that can go wrong will go wrong; and at the most inopportune time.” –  This is what Capacity Planning aims to prevent. CMG-T 2010 Unix/Linux Quick Start Slide 18
  • 19. Performance and Capacity Laws Must-know! •  Little’s Law –  Simple form: X = N/R, where X is throughput, N is degree of concurrency, and R is response time •  Little’s Law is crucial to spotting serialization (N=1) or determining other values of N –  Q1: With 1 KB I/O size and 0.5 ms response time for reads what is the throughput for a single I/O-bound stream? –  A1: N/R = 1 KB / 0.0005 = 2000 KB/sec = (2000/1024) or 1.95 MB/sec –  Q2: What degree of concurrency would be required to attain 20 MB/sec at 0.5 ms response time and 1 KB I/O size? –  A2: N = X*R, so 20*1024*0.0005 = 10.24 (Shucks! We need an integer N, so make that 11.) CMG-T 2010 Unix/Linux Quick Start Slide 19
  • 20. Performance and Capacity Laws Must-know! •  Amdahl’s Law –  From Wikipedia: “The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction [serial portion] of the program.” •  Things that are serialized or slow relative to CPU speed –  I/O; network or storage –  Locking; exclusive access to some data structure –  Dispatching; handling out work –  “Take a number”; sequence generation •  A simple case of Amdahl’s Law with N=1 … –  If a process’ response time is 1 second, including all disk accesses, network delays, and contention – and only 10% of the time is CPU – then an infinitely-fast CPU will only improve the response time to 900 ms. –  Improving a system element only pays in proportion to the ratio of that element in total response time. CMG-T 2010 Unix/Linux Quick Start Slide 20
  • 21. Ignorance of The Law is no excuse! CMG-T 2010 Unix/Linux Quick Start Slide 21
  • 22. Workloads CMG-T 2010 Unix/Linux Quick Start Slide 22
  • 23. “Workload Characterization” An overloaded term! •  Homogeneous workloads –  Generally easy to characterize and model –  Homogeneity is one justification for “tiering” solution architectures •  System response to hosted workload(s) –  Some performance engineers view the system from the inside-out; their metrics focus is on how the hardware, OS, and peripherals are stressed at a low level •  Heterogeneous workloads –  These are the norm on large or complex systems –  See, e.g. Ron Kaminski's various CMG contributions •  Circumstantial workloads –  e.g. failover processing and post-failover operations often escape characterization, leading to in-service surprises CMG-T 2010 Unix/Linux Quick Start Slide 23
  • 24. Workload Characteristics: One by One Constant Workloads •  e.g. Numerical computation, compute intensive batch •  Trivial to model, utilization and duration define the work CMG-T 2010 Unix/Linux Quick Start Slide 24
  • 25. Simple Random Arrivals •  Random arrival of transactions with fixed mean service time –  Little’s Law: QueueLength = Throughput * Response –  Utilization Law: Utilization = Throughput * ServiceTime •  Complex models are often reduced to this model –  By averaging over longer time periods since the formulas only work if you have stable averages –  By wishful thinking (i.e. how to fool yourself) •  e.g. Unix Load Average is actually CPU Queue Length –  Throughput up a little, load average up a lot = slow system –  So load average is a proxy metric for response time –  High load average per CPU implies slow response times CMG-T 2010 Unix/Linux Quick Start Slide 25
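A small illustrative calculation of the two laws on this slide (Python, with made-up numbers chosen only to show the arithmetic):

throughput   = 400      # transactions/sec
response     = 0.050    # 50 ms mean response time
service_time = 0.002    # 2 ms mean service time per transaction

queue_length = throughput * response        # ~20 in system (Little's Law)
utilization  = throughput * service_time    # 0.8 = 80% busy (Utilization Law)
print(queue_length, utilization)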
• 26. Mixed random arrivals of transactions with stable mean service times •  Think of the grocery store checkout analogy –  Trolleys full of shopping vs. baskets full of shopping –  Baskets are quick to service, but get stuck behind trolleys –  Relative mixture of transaction types starts to matter •  Many transactional systems handle a mixture –  Databases, web services •  Consider separating fast and slow transactions –  So that we have a “10 items or less” line just for baskets –  Separate pools of servers for different services –  Don’t mix OLTP with DSS queries in databases •  Performance is often thread-limited –  Thread limit and slow transactions constrain maximum throughput –  Throughput = Queue / ResponseTime •  Model using analytical solvers like PDQ CMG-T 2010 Unix/Linux Quick Start Slide 26
  • 27. Load dependent servers – non-stable mean service times •  Mean service time increases at high throughput –  Due to non-scalable algorithms, lock contention, thrashing –  System runs out of memory and starts paging or frequent GC •  Many systems have “tipping points” –  Hysteresis means they don’t come back when load drops –  This is why you have to kill catatonic systems –  Some systems actually degrade gracefully under load •  Model using simulation tools like Hyperformix, Opnet –  Behaviour is non-linear and hard to model –  Practical option is to avoid tipping points –  Best designs shed load to be stable at the limit CMG-T 2010 Unix/Linux Quick Start Slide 27
  • 28. Self-similar / fractal workloads – bursty rather than random •  Self-similar –  Looks “random” at close up, stays “random” as you zoom out –  Work arrives in bursts, transactions aren’t independent –  Bursts cluster together in super-bursts, etc. •  Network packet streams tend to be fractal •  Common in practice, too hard to model –  Probably the most common reason why your model is wrong! CMG-T 2010 Unix/Linux Quick Start Slide 28
  • 29. State Dependent Services •  Personalized services that store user history –  Transactions for new users are quick –  Transactions for users with lots of state/history are slower –  As user base builds state and ages you get into lots of trouble… •  Social Networks, Recommendation Services –  Facebook, Flickr, Netflix, Pandora, Twitter etc. •  “Abandon hope all ye who enter here” –  Not tractable to model, repeatable tests are tricky –  Long fat tail response time distribution and timeouts –  Excessively long service times for some users –  Solutions: careful algorithm design, lots of caching CMG-T 2010 Unix/Linux Quick Start Slide 29
• 30. Workload Modelling Survivalism •  Simplify the workload algorithms –  move from hard or impossible to simpler models –  use caching and pre-compute to get constant service times •  Stand further away –  averaging is your friend – gets rid of complex fluctuations •  Minimalist Models –  most models are far too complex – the classic beginner’s error… –  the art of modelling is to only model what really matters •  Don’t model details you don’t use –  model peak hour of the week, not day-to-day fluctuations –  e.g. “Will the web site survive next Sunday night?” CMG-T 2010 Unix/Linux Quick Start Slide 30
• 31. Contrarian Perspective on Workloads •  Classical breakdown, e.g. for CPU –  %usr –  %sys •  A more-practical breakdown –  Work; what the system was bought to do – categorized by its importance to the business –  Overhead; memory management, I/O and network stacks, lock management, context switches, migrations –  Opportunistic usage; over-achievers (including screen-savers, low-priority workload elements, and high-priority workload elements) –  Waste; bugs, inefficient code, absence of Best Practices •  Stovepiped organizations that feed only classical data to the Capacity Planners will predictably over-provision –  Gains attainable from improved efficiency, performance discipline, and teamwork represent “latent capacity”; it’s often huge CMG-T 2010 Unix/Linux Quick Start Slide 31
  • 32. Strategies CMG-T 2010 Unix/Linux Quick Start Slide 32
• 33. Performance and Capacity Strategies 1. Empirical Methods (Great! Only expensive if you do it!) Benchmarks, stress testing, test-to-scale, test-to-fail – with known Best Practices & basic performance analysis and tuning 2. Modeling (Highly recommended, moderate cost) Using tools such as TeamQuest Model (TQM), BMC Perform/Predict, Hyperformix, Gunther's PDQ or other application of proper science and math 3. Expert Opinions (Recommended minimum, cheapest) Listening to the right experts for Best Practices, analysis and tuning methods, and sizing. The hazard with opinions is that there are so many of them! 4. Guesswork (The Norm) Straight-line extrapolations, naïve use of reference benchmarks, massive over-provisioning, misdirected or uncontrolled testing, blind luck 5. Opportunism (Commonplace) Spend the available budget CMG-T 2010 Unix/Linux Quick Start Slide 33
• 34. Best Practices •  Absent Best Practices … –  Performance, stability, capacity, or predictability may all suffer –  Capacity planners may end up scaling up existing inefficiency –  Data from the system will not directly indicate the deviation –  One is prone to “re-inventing the wheel” –  It’s like having Yellow Fever versus getting the inoculation •  Best Practices are … –  Repeatable, time-proven practices –  Steps to take as a matter of routine –  Possibly highly-localized and application-specific –  Goal-oriented; e.g. simplicity, performance, cost •  Best Practices are not … –  A lab result or single practitioner’s experience, extrapolated to the general case –  A guarantee of automatic success in all things CMG-T 2010 Unix/Linux Quick Start Slide 34
• 35. A Best Practice Example •  A Best Practice for performance, consistency, and scalability of storage for Oracle databases is to use a solution with these characteristics: –  Unbuffered; no stress on OS buffer memory management –  Concurrent; no single-writer lock or similar bottlenecks –  Stable data placement; facilitates high-bandwidth reads •  Many compliant options exist, including –  UFS with direct I/O; QFS with samaio; RAW; Oracle Automatic Storage Management (ASM); VxFS with QIO, ODM, or cio+direct •  See also http://blogs.sun.com/bobs/entry/one_i_o_two_i –  I’ve added “stable data placement” to the criteria since that blog entry to reflect experiences seen with WAFL-type filesystems –  There is a vast amount of energy that gets spent annually learning these things the hard way or trying to work around them CMG-T 2010 Unix/Linux Quick Start Slide 35
• 36. Session 2 of 3 A survey of the data sources and tools available for traditional resource-oriented analysis (network, CPU, memory, storage), plus the tools available for understanding kernel, hardware, and per-thread program performance. Pointers will be given to some major commercial and open-source tools for performance and capacity management. Some emphasis will be placed on Solaris features including microstate accounting and DTrace. CMG-T 2010 Unix/Linux Quick Start Slide 36
  • 37. Metrics CMG-T 2010 Unix/Linux Quick Start Slide 37
  • 38. Measurement Data Interfaces •  Several generic raw access methods –  Read the kernel directly –  Structured system data –  Process data –  Network data –  Accounting data –  Application data •  Command based data interfaces –  Scrape data from vmstat, iostat, netstat, sar, ps –  Higher overhead, lower resolution, missing metrics •  Data available is generally platform- and release-specific •  Extremely valuable, but not discussed here … –  Application-level instrumentation; e.g. Oracle RDBMS –  WAN, SAN, and LAN “sniffers”; sample data “outside the box” CMG-T 2010 Unix/Linux Quick Start Slide 38
• 39. Reading kernel memory - kvm •  The only way to get data in very old Unix variants •  Use kernel namelist symbol table and open /dev/kmem •  Solaris wraps up the interface in the kvm library •  Advantages –  Still the only way to get at some kinds of data –  Low overhead, fast bulk data capture •  Disadvantages –  Too much intimate implementation detail exposed –  No locking protection to ensure consistent data –  Highly non-portable, unstable over releases and patches –  Tools break when the kernel moves between 32-bit and 64-bit address support CMG-T 2010 Unix/Linux Quick Start Slide 39
  • 40. Structured Kernel Statistics - kstat •  Solaris 2 introduced kstat and extended usage in each release •  Used by Solaris 2 vmstat, iostat, sar, network interface stats, etc. •  Advantages –  The recommended and supported Solaris metric access API –  Does not require setuid root commands to access for reads –  Individual named metrics stable over releases –  Consistent data using locking, but low overhead –  Unchanged when kernel moves to 64bit address support –  Extensible to add metrics without breaking existing code •  Disadvantages –  Somewhat complex hierarchical kstat_chain structure –  State changes (device online/offline) cause kstat_chain rebuild CMG-T 2010 Unix/Linux Quick Start Slide 40
  • 41. Kernel Tracing - TNF, prex, ktrace •  Solaris, Linux, Windows and other Unixes have similar features –  Solaris has TNF probes and prex command to control them –  User level probe library for hires tracepoints allows instrumentation of multithreaded applications –  Kernel level probes allow disk I/O and scheduler tracing •  Advantages –  Low overhead, microsecond resolution –  I/O trace capability is extremely useful •  Disadvantages –  Too much data to process with simple tracing capabilities –  Trace buffer can overflow or cause locking issues CMG-T 2010 Unix/Linux Quick Start Slide 41
• 42. DTrace – Dynamic Tracing •  One of the most exciting new features in Solaris 10, rave reviews –  Also in Apple's OS X 10.5; man -k dtrace, plus “Instruments” GUI •  Advantages –  No overhead when it is not in use –  Low overhead probes can be put anywhere/everywhere –  Trace data is correlated and filtered at source; get exactly the data you want; very sophisticated data providers included –  Bundled, supported, designed to be safe for production systems –  Stable foundation for many tools; system tools, scripts, and GUIs •  Disadvantages –  Not on Linux yet –  Excessive DTrace probes can cause high systemic overhead (which is only a problem if it occurs by accident); proven scripts avoid this –  Yet another (awk-like) scripting language to learn; pre-packaged scripts can avoid this, also CMG-T 2010 Unix/Linux Quick Start Slide 42
  • 43. DTrace – Dynamic Tracing •  References –  Book: "Solaris Performance and Tools" by Richard McDougall, Jim Mauro, and Brendan Gregg, Prentice-Hall, 2006 –  Guide: “How to Use Oracle® Solaris DTrace from Oracle Solaris and OpenSolaris System” @ http://developers.sun.com/solaris/docs/o-s-dtrace-htg.pdf –  Treasure trove: Brendan Gregg's DTrace Toolkit @ http://www.brendangregg.com/dtrace.html CMG-T 2010 Unix/Linux Quick Start Slide 43
  • 44. Hardware counters •  Most modern CPUs and systems have hardware counters; tools to access these counters are quite varied … –  Solaris cpustat for X86 and UltraSPARC pipeline and cache counters, corestat for multi-core systems; busstat for server backplanes and I/O buses –  Solaris Intel Trace Collector, Vampir for Linux –  AMD EMON; only under license •  Advantages –  See what is really happening; more accurate than kernel stats –  Cache usage useful for tuning code algorithms –  Pipeline usage useful for HPC tuning for megaflops –  Some VA/PA/TLB memory-management details only observable via counters –  Backplane and memory bank usage useful for database servers •  Disadvantages –  Raw data is confusing; requires post-processing scripts –  Privilege needed for access; not sharable to hosted virtual domains –  Lots of propeller-headed architectural background info needed –  Most tools focus on developer code tuning CMG-T 2010 Unix/Linux Quick Start Slide 44
  • 45. Configuration information •  System configuration data comes from too many sources! –  Solaris device tree displayed by prtconf and prtdiag –  Solaris 8 adds dynamic configuration notification device picld –  SCSI device info using iostat -E in Solaris –  Logical volume info from product specific vxprint and metastat –  Hardware RAID info from product specific tools –  Critical storage config info must be accessed over ethernet… –  Linux device tree in /proc is a bit easier to navigate •  It is very hard to combine all this data! •  DMTF CIM objects try to address this, but no-one seems to use them… •  Free tool - Config Engine: http://www.cfengine.org CMG-T 2010 Unix/Linux Quick Start Slide 45
  • 46. Application instrumentation Examples •  Oracle V$ Tables – detailed metrics used by many tools •  Apache logging for web services •  ARM standard instrumentation •  Custom do-it-yourself and log file scraping •  Advantages –  Focussed application specific information –  Business metrics are needed to do real capacity planning •  Disadvantages –  No common access methods –  ARM is a collection interface only, vendor specific tools, data –  Very few applications are instrumented, even fewer have support from performance tools vendors CMG-T 2010 Unix/Linux Quick Start Slide 46
• 47. Kernel values, tunables and defaults •  There is often far too much emphasis on kernel tweaks –  There really are few “magic bullet” tunables –  It rarely makes a significant difference •  Fix the system configuration or tune the application instead! •  Very few adjustable components –  “No user serviceable parts inside” –  But Unix has so much history people think it is like a 70’s car –  Solaris really is dynamic, adaptive and self-tuning –  Most other “traditional Unix” tunables are just advisory limits –  Tweaks may be workarounds for bugs/problems –  Patch or OS release removes the problem - remove the tweak •  Solaris Tunable Parameters Reference Manual (if you must…) –  http://docs.sun.com/app/docs/doc/817-0404 CMG-T 2010 Unix/Linux Quick Start Slide 47
  • 48. Process-based data in /proc •  /proc filesystem is a common foundation –  Used by ps, proctool and debuggers, pea.se, proc(1) tools on Solaris –  Solaris and Linux both have /proc/pid/metric hierarchy –  Linux also includes system information in /proc rather than kstat •  Advantages –  The recommended and supported process access API –  Metric data structures reasonably stable over releases –  Consistent data using locking –  Solaris microstate data provides accurate process state timers •  Disadvantages –  High overhead for open/read/close of every process on busy systems –  Linux reports data as formatted ASCII text, Solaris uses binary structures that require tools for formatting CMG-T 2010 Unix/Linux Quick Start Slide 48
  • 49. Network protocol data •  Based on a streams module interface in Solaris •  Solaris 2 ndd interface used to configure protocols and interfaces •  Solaris 2 mib interface used by netstat -s and snmpd to get TCP stats etc. •  Advantages –  Individual named metrics reasonably stable over releases –  Consistent data using locking –  Extensible to add metrics without breaking existing code –  Solaris ndd can retune TCP online without reboot –  System data is often also made available via SNMP protocol •  Disadvantages –  Underlying API is not supported, SNMP access is preferred CMG-T 2010 Unix/Linux Quick Start Slide 49
  • 50. Tracing and profiling •  Tracing Tools –  truss - shows system calls made by a process –  sotruss / apitrace - shows shared library calls –  prex - controls TNF tracing for user and kernel code –  snoop/tcpdump – network traces for analysis with wireshark •  Profiling Tools –  Compiler profile feedback using -xprofile=collect and use –  Sampled profile relink using -p and prof/gprof –  Function call tree profile recompile using -pg and gprof –  Shared library call profiling setenv LD_PROFILE and gprof •  Accurate CPU timing for process using /usr/proc/bin/ptime •  Microstate process information using pea.se and pw.se 10:40:16 name lwmx pid ppid uid usr% sys% wait% chld% size rss pf nis_cachemgr 5 176 1 0 1.40 0.19 0.00 0.00 16320 11584 0.0 jre 1 17255 3184 5743 11.80 0.19 0.00 0.00 178112 110336 0.0 sendmail 1 16751 1 0 1.01 0.43 0.00 0.43 18624 16384 0.0 se.sparc.5.6 1 16741 1186 9506 5.90 0.47 0.00 0.00 16320 14976 0.0 imapd 1 16366 198 5710 6.88 1.09 1.02 0.00 34048 29888 0.1 dtmail 10 16364 9070 5710 0.75 1.12 0.00 0.00 102144 94400 0.0 CMG-T 2010 Unix/Linux Quick Start Slide 50
  • 51. Free Tools (See Separate Slide Deck) http://www.slideshare.net/adrianco CMG-T 2010 Unix/Linux Quick Start Slide 51
  • 52. Headroom? What’s the matter with U? CMG-T 2010 Unix/Linux Quick Start Slide 52
  • 53. What would you say if you were asked: How busy is that system? A: “I have no idea.” A: “42%” A: “Why do you want to know?” A: “I’m sorry, but I’m afraid that you don’t understand your question.” CMG-T 2010 Unix/Linux Quick Start Slide 53
• 54. Utilization •  It looks simple enough … –  Utilization is the proportion of busy time –  Always defined over a time interval –  Instantaneously, it’s binary; 100% or 0% –  Let’s run with this for a while, then circle back to the issues … [Charts: “OnCPU Scheduling for Each CPU” (OnCPU microseconds per sample, mean CPU util 0.56) and “usr+sys CPU for Peak Period” (CPU % utilization over time)] CMG-T 2010 Unix/Linux Quick Start Slide 54
• 55. Headroom •  Headroom relates to available usable resources –  Total Capacity minus Peak Utilization and Margin –  Applies to CPU, RAM, Net, Disk and OS –  “Usable” is the rub; idle resources may not be actually usable due to software bottlenecks, such as locking or limited concurrency [Chart: “usr+sys CPU for Peak Period” — CPU % over time, with the Utilization band at the bottom, Headroom above it, and the Margin at the top] CMG-T 2010 Unix/Linux Quick Start Slide 55
  • 56. Headroom Estimation •  CPU Capacity –  Relatively easy to figure out for well-behaved, homogeneous, steady-state workloads –  “Over-achievers”, bad tuning, and common Best Practice deviations inflate perception of required capacity •  Network Usage –  Use bytes not packets/s •  Memory Capacity –  Tricky - easier in Solaris 8 •  Disk Capacity –  Can be very complex –  A complex gamut of software layers may reside between the application and its disk CMG-T 2010 Unix/Linux Quick Start Slide 56
• 57. Response Time •  Response Time = Queue time + Service time •  The Usual Assumptions … –  Steady state averages –  Random arrivals –  Constant service time –  M servers processing the same queue •  Approximations –  Queue length = Throughput * Response Time (Little's Law) –  Utilization = Throughput * Service Time (Utilization law) –  Response Time = Service Time / (1 - Utilization^M) CMG-T 2010 Unix/Linux Quick Start Slide 57
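A sketch of those approximations in Python (the M-server formula is the same one plotted on the next slide; treat it as a rough approximation, not an exact queueing result, and the numbers below are illustrative):

# Response time approximation for M servers sharing one queue: R = S / (1 - U**M)
def response_time(service_time, utilization, m_servers=1):
    return service_time / (1.0 - utilization ** m_servers)

s = 0.010   # 10 ms service time
for u in (0.5, 0.7, 0.9):
    print(u, response_time(s, u, m_servers=1), response_time(s, u, m_servers=8))
# At 90% busy: ~100 ms with one server, but only ~17.6 ms with eight servers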
• 58. Response Time Curves •  The traditional view of Utilization as a proxy for response time •  Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they run out of capacity •  Headroom margin should be set according to a response time target [Chart: “Response Time Curves”, R = S / (1 - U^M) — response time increase factor vs. total system utilization % for 1, 2, 4, 8, 16, 32, and 64 CPUs, with the headroom margin marked] CMG-T 2010 Unix/Linux Quick Start Slide 58
  • 59. So what's the problem with Utilization? •  Unsafe assumptions! –  Modern systems are complex, adaptive, highly non-linear, and are often virtualized or include shared components •  Random arrivals? –  Bursty traffic with long tail arrival rate distribution •  Constant service time? –  Variable clock rate CPUs, inverse load dependent service time –  Complex transactions, request and response dependent •  M servers processing the same queue? –  Virtual servers with varying non-integral concurrency –  Non-identical servers or CPUs, Hyperthreading, Multicore, NUMA •  Measurement Errors? –  Mechanisms with built in bias, e.g. sampling from the scheduler clock –  Platform and release specific systemic changes in accounting of interrupt time CMG-T 2010 Unix/Linux Quick Start Slide 59
  • 60. Variable Clock Rate CPUs •  Laptop and other low power devices do this all the time –  Watch CPU usage of a video application and toggle mains/battery power •  Server CPU Power Optimization - AMD PowerNow!™ –  AMD Opteron server CPU detects overall utilization and reduces clock rate –  Actual speeds vary, but for example could reduce from 2.6GHz to 1.2GHz –  Changes are not understood or reported by operating system metrics –  Speed changes can occur every few milliseconds (thermal shock issues) –  Dual core speed varies per socket, Quad core varies per core –  Quad core can dynamically stop entire cores to save power –  Note: Older and "low power" Opterons in blades use fixed clock rate •  Possible scenario: –  You estimate 20% utilization at 2.6GHz –  You see 45% reported in practice (at 1.2GHz) –  Load doubles, reported utilization drops to 40% (at 2.6GHz) –  Actual mapping of utilization to clock rate is unknown at this point CMG-T 2010 Unix/Linux Quick Start Slide 60
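One way to sanity-check the scenario above is to convert reported utilization into absolute work (utilization × current clock rate). A hedged Python sketch, assuming you can even obtain the clock rate in effect during the interval (often you can't, which is the slide's point):

# Normalize reported utilization to an estimate of absolute capacity consumed
def ghz_consumed(reported_util, clock_ghz):
    return reported_util * clock_ghz

print(ghz_consumed(0.20, 2.6))   # 0.52 GHz of work, as estimated at full speed
print(ghz_consumed(0.45, 1.2))   # 0.54 GHz of work, as actually reported
print(ghz_consumed(0.40, 2.6))   # 1.04 GHz of work after the load doubles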
  • 61. Virtual Machine Monitors •  VMware, Xen, IBM LPARs etc. –  Non-integral and non-constant fractions of a machine –  Naive operating systems and applications don't expect this behavior –  However, lots of recent tools development from vendors •  Average CPU count must be reported for each measurement interval •  VMM overhead varies, application scaling characteristics may be affected CMG-T 2010 Unix/Linux Quick Start Slide 61
  • 62. Threaded CPU Pipelines •  CPU microarchitecture optimizations –  Extra register sets working with one execution pipeline –  When the CPU stalls on a memory read, it switches registers/threads –  Operating system sees multiple schedulable entities (CPUs) •  Intel Hyperthreading –  Each CPU core has an extra thread to use spare cycles –  Typical benefit is 20%, so total capacity is 1.2 CPUs –  I.e. Second thread much slower when first thread is busy –  Hyperthreading aware optimizations in recent operating systems •  Sun “CoolThreads” –  "Niagara" SPARC CPU has eight cores, one shared floating point unit –  Each CPU core has four threads, but each core is a very simple design –  Behaves like 32 slow CPUs for integer, snail-like uniprocessor for FP –  Overall throughput is very high, performance per watt is exceptional –  Niagara 2 has dedicated FPU and 8 threads per core (total 64 threads) –  Each generation varies in low-level architectural details CMG-T 2010 Unix/Linux Quick Start Slide 62
  • 63. Measurement Errors •  Mechanisms with built in bias –  e.g. sampling from the scheduler clock underestimates CPU usage –  Solaris 9 and before, Linux, AIX, HP-UX “sampled CPU time” –  Solaris 10 and HP-UX “measured CPU time” far more accurate –  Solaris microstate process accounting always accurate but in Solaris 10 microstates are also used to generate system-wide CPU •  Accounting for interrupt time –  Platform and release specific systemic changes –  Solaris 8 - sampled interrupt time spread over usr/sys/idle –  Solaris 9 - sampled interrupt time accumulated into sys only –  Solaris 10 – measured interrupt time spread over usr/sys/idle –  Solaris 10 Update 1 – measured interrupt time in sys only CMG-T 2010 Unix/Linux Quick Start Slide 63
• 64. CPU time measurements •  Biased sample CPU measurements –  See 1998 Paper "Unix CPU Time Measurement Errors" –  Microstate measurements are accurate, but are platform and tool specific. Sampled metrics are more inaccurate at low utilization •  CPU time is sampled by the 100Hz clock interrupt –  sampling theory says this is accurate for an unbiased sample –  the sample is very biased, as the clock also schedules the CPU –  daemons that wake up on the clock timer can hide in the gaps –  problem gets worse as the CPU gets faster •  Increase clock interrupt rate? (Solaris) –  set hires_tick=1 sets rate to 1000Hz, good for realtime wakeups –  harder to hide CPU usage, but slightly higher overhead •  Use measured CPU time at per-process level –  microstate accounting takes timestamp on each state change –  very accurate and also provides extra information –  still doesn’t allow for interrupt overhead –  prstat -m and the pea.se command use this accurate measurement CMG-T 2010 Unix/Linux Quick Start Slide 64
  • 65. More CPU Measurement Issues •  Load average differences –  Just includes CPU queue (Solaris) –  Includes CPU and Disk (Linux) – which is a broken metric •  Wait for I/O (%wio) is a misleading statistic altogether –  Metric removed in Solaris 10 – always zero –  Ignore it in all other Unix/Linux releases; add it to %idle –  There is no universal “propensity to compute” for a thread blocked on I/O; who knows how much %cpu it might use when it wakes up? CMG-T 2010 Unix/Linux Quick Start Slide 65
  • 66. How to plot Headroom •  Measure and report absolute CPU power if you can get it … •  Plot shows headroom in blue, margin in red, total power tracking day/night workload variation, plotted as mean + two standard deviations. CMG-T 2010 Unix/Linux Quick Start Slide 66
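A minimal sketch of the "mean + two standard deviations" summary used for that plot (pure Python; the per-interval utilization samples are assumed to have been collected elsewhere, and the values shown are made up):

import statistics

def peak_estimate(samples):
    # Summarize an interval's utilization samples as mean + 2 standard deviations
    return statistics.mean(samples) + 2 * statistics.stdev(samples)

busy_pct = [38, 41, 55, 62, 47, 44, 70, 52]   # made-up per-minute CPU busy %
print(peak_estimate(busy_pct))                # compare against capacity minus margin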
  • 67. “Cockcroft Headroom Plot” •  Scatter plot of response time (ms) vs. Throughput (KB) from iostat metrics •  Histograms on axes •  Throughput time series plot •  Shows distributions and shape of response time •  Fits throughput weighted inverse gaussian curve •  Coded using "R" statistics package •  Blogged development at http://perfcap.blogspot.com/search?q=chp CMG-T 2010 Unix/Linux Quick Start Slide 67
  • 68. How busy is that system again? •  Check your assumptions … •  Record and plot absolute capacity for each measurement interval •  Plot response time as a function of throughput, not just utilization •  SOA response characteristics are complicated … •  More detailed discussion in CMG06 Paper and blog entries –  “Utilization is Virtually Useless as a Metric” - Adrian Cockcroft - CMG06 http://perfcap.blogspot.com/search?q=utilization http://perfcap.blogspot.com/search?q=chp CMG-T 2010 Unix/Linux Quick Start Slide 68
  • 69. CPU CMG-T 2010 Unix/Linux Quick Start Slide 69
  • 70. CPU Capacity Measurements •  CPU Capacity is defined by CPU type and clock rate, or a benchmark rating like SPECrateInt2000 •  CPU throughput - CPU scheduler transaction rate –  measured as the number of voluntary context switches •  CPU Queue length –  CPU load average gives an approximation via a time decayed average of number of jobs running and ready to run •  CPU response time –  Solaris microstate accounting measures scheduling delay •  CPU utilization –  Defined as busy time divided by elapsed time for each CPU –  Badly distorted and undermined by virtualization…… CMG-T 2010 Unix/Linux Quick Start Slide 70
• 71. Controlling and Monitoring CPUs in Solaris •  psrinfo - show CPU status and clock rate •  corestat - show internal behavior of multi-core CPUs •  psradm - enable/disable CPUs •  pbind - bind a process to a CPU •  psrset - create sets of CPUs to partition a system –  At least one CPU must remain in the default set, to run kernel services like NFS threads –  All CPUs still take interrupts from their assigned sources –  Processes can be bound to sets •  mpstat shows per-CPU counters (per set in Solaris 9) CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 45 1 0 232 0 780 234 106 201 0 950 72 28 0 0 1 29 1 0 243 0 810 243 115 186 0 1045 69 31 0 0 2 27 1 0 235 0 827 243 110 199 0 1000 75 25 0 0 3 26 0 0 217 0 794 227 120 189 0 925 70 30 0 0 4 9 0 0 234 92 403 94 84 1157 0 625 66 34 0 0 CMG-T 2010 Unix/Linux Quick Start Slide 71
  • 72. Monitoring CPU mutex lock statistics •  To fix mutex contention change the application workload or upgrade to a newer OS release •  Locking strategies are too complex to be patched •  lockstat Command –  very powerful and easy to use –  Solaris 8 extends lockstat to include kernel CPU time profiling –  dynamically changes all locks to be instrumented –  displays lots of useful data about which locks are contending # lockstat sleep 5 Adaptive mutex spin: 3318 events Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------------- 601 18% 18% 1.00 1 flock_lock cleanlocks+0x10 302 9% 27% 1.00 7 0xf597aab0 dev_get_dev_info+0x4c 251 8% 35% 1.00 1 0xf597aab0 mod_rele_dev_by_major+0x2c 245 7% 42% 1.00 3 0xf597aab0 cdev_size+0x74 160 5% 47% 1.00 7 0xf5b3c738 ddi_prop_search_common+0x50 CMG-T 2010 Unix/Linux Quick Start Slide 72
  • 73. Network CMG-T 2010 Unix/Linux Quick Start Slide 73
  • 74. Network interface and NFS metrics •  Network interface throughput counters from kstat on Solaris –  rbytes, obytes — read and output byte counts –  multircv, multixmt — multicast byte counts –  brdcstrcv, brdcstxmt — broadcast byte counts –  norcvbuf, noxmtbuf — buffer allocation failure counts •  Linux netstat shows byte throughput (Solaris doesn’t) •  NFS Client Statistics Shown in iostat on Solaris crun% iostat -xnP extended device Statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 crun:vold(pid363) 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 servdist:/usr/dist 0.0 0.5 0.0 7.9 0.0 0.0 0.0 20.7 0 1 servhome:/export/home/adrianc 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 servhome:/var/mail 0.0 1.3 0.0 10.4 0.0 0.2 0.0 128.0 0 2 c0t2d0s0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t2d0s2 CMG-T 2010 Unix/Linux Quick Start Slide 74
  • 75. TCP - A Simple Approach •  Capacity and Throughput Metrics to Watch •  Connections –  Current number of established connections –  New outgoing connection rate (active opens) –  Outgoing connection attempt failure rate –  New incoming connection rate (passive opens) –  Incoming connection attempt failure rate (resets) •  Throughput –  Input and output byte rates –  Input and output segment rates –  Output byte retransmit percentage CMG-T 2010 Unix/Linux Quick Start Slide 75
  • 76. Obtaining Measurements •  Get the TCP MIB via SNMP or netstat -s •  Standard TCP metric names: –  tcpCurrEstab: current number of established connections –  tcpActiveOpens: number of outgoing connections since boot –  tcpAttemptFails: number of outgoing failures since boot –  tcpPassiveOpens: number of incoming connections since boot –  tcpOutRsts: number of resets sent to reject connection –  tcpEstabResets: resets sent to terminate established connections –  (tcpOutRsts - tcpEstabResets): incoming connection failures –  tcpOutDataSegs, tcpInDataSegs: data transfer in segments –  tcpRetransSegs: retransmitted segments CMG-T 2010 Unix/Linux Quick Start Slide 76
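A sketch of turning two samples of those counters into rates (Python; the dictionaries stand in for whatever you scrape from netstat -s or SNMP, and the field names follow the MIB names listed above):

def tcp_rates(prev, curr, interval_s):
    # Counters are cumulative since boot, so rates come from deltas between samples
    d = lambda key: (curr[key] - prev[key]) / interval_s
    return {
        'new conn/s (in)':  d('tcpPassiveOpens'),
        'new conn/s (out)': d('tcpActiveOpens'),
        'out failures/s':   d('tcpAttemptFails'),
        # retransmit % = retransmitted segments / output data segments, over the interval
        'retransmit %':     100.0 * (curr['tcpRetransSegs'] - prev['tcpRetransSegs'])
                                  / max(1, curr['tcpOutDataSegs'] - prev['tcpOutDataSegs']),
    }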
  • 77. Internet Server Issues •  TCP Connections are expensive –  TCP is optimized for reliable data on long lived connections –  Making a connection uses a lot more CPU than moving data –  Connection setup handshake involves several round trip delays –  Each open connection consumes about 1 KB plus data buffers •  Pending connections cause “listen queue” issues •  Each new connection goes through a “slow start” ramp up •  Other TCP Issues –  TCP windows can limit high latency high speed links –  Lost or delayed data causes time-outs and retransmissions CMG-T 2010 Unix/Linux Quick Start Slide 77
  • 78. TCP Sequence Diagram for HTTP Get CMG-T 2010 Unix/Linux Quick Start Slide 78
  • 79. Stalled HTTP Get and Persistent HTTP CMG-T 2010 Unix/Linux Quick Start Slide 79
  • 80. Memory CMG-T 2010 Unix/Linux Quick Start Slide 80
  • 81. Memory Capacity Measurements •  Physical Memory Capacity Utilization and Limits –  Kernel memory, Shared Memory segment –  Executable code, stack and heap –  File system cache usage, Unused free memory •  Virtual Memory Capacity - Paging/Swap Space –  When there is no more available swap, Unix stops working •  Memory Throughput –  Hardware counter metrics can track CPU to Memory traffic –  Page in and page out rates •  Memory Response Time –  Platform specific hardware memory latency makes a difference, but hard to measure –  Time spent waiting for page-in is part of Solaris microstate accounting CMG-T 2010 Unix/Linux Quick Start Slide 81
  • 82. Page Size Optimization •  Systems may support large pages for reduced overhead –  Solaris support is more dynamic/flexible than Linux at present •  Intimate Shared Memory locks large pages in RAM –  No swap space reservation –  Used for large database server Shared Global Area •  No good metrics to track usage and fragmentation issues •  Solaris ppgsz command can set heap and stack pagesize •  SPARC Architecture –  Base page size is 8KB, Large pages are 4MB •  Intel/AMD x86 Architectures –  Base page size is 4KB, Large pages are 2MB CMG-T 2010 Unix/Linux Quick Start Slide 82
• 83. Cache principles •  Temporal locality - “close in time” –  If you need something frequently, keep it near you –  If you don’t use it for a while, put it back –  If you change it, save the change by putting it back •  Spatial locality - “close in space - nearby” –  If you go to get one thing, get other stuff that is nearby –  You may save a trip by prefetching things –  You can waste bandwidth if you fetch too much you don’t use •  Caches work well with randomness –  Randomness prevents worst-case behaviour –  Deterministic patterns often cause cache-busting accesses •  Very careful cache-friendly tuning can give great speedups CMG-T 2010 Unix/Linux Quick Start Slide 83
• 84. The memory go round - Unix/Linux •  Memory usage flows between subsystems [Diagram: memory moves between kernel memory buffers, System V shared memory, the free RAM list, process stack and heap, and the filesystem cache — via kernel alloc/free, shmget/shm_unlink, brk/exit, read/write/mmap/pagein, reclaim, and the pageout scanner] CMG-T 2010 Unix/Linux Quick Start Slide 84
• 85. The memory go round - Solaris 8 and Later •  Memory usage flows between subsystems [Diagram: the same subsystems as the previous slide, but from Solaris 8 onward the filesystem cache sits alongside the free RAM list (reclaims are cheap) and only the process stack and heap remain subject to the pageout scanner] CMG-T 2010 Unix/Linux Quick Start Slide 85
  • 86. Solaris Swap Space •  Swap is very confusing and badly instrumented! # se swap.se ani_max 54814 ani_resv 19429 ani_free 37981 availrmem 13859 swapfs_minfree 1972 ramres 11887 swap_resv 19429 swap_alloc 16833 swap_avail 47272 swap_free 49868 Misleading data printed by swap -s 134664 K allocated + 20768 K reserved = 155432 K used, 378176 K available Corrected labels: 134664 K allocated + 20768 K unallocated = 155432 K reserved, 378176 K available Mislabelled sar -r 1 freeswap (really swap available) 756352 blocks Useful swap data: Total swap 520 M available 369 M reserved 151 M Total disk 428 M Total RAM 92 M # swap -s total: 134056k bytes allocated + 20800k reserved = 154856k used, 378752k available # sar -r 1 18:40:51 freemem freeswap 18:40:52 4152 756912 CMG-T 2010 Unix/Linux Quick Start Slide 86
  • 87. Session 3 of 3 Finish the tool survey begun in Session 2, then survey several common errors, traps, and pitfalls. This session ends with some broad guidelines on how to manage performance and capacity in Unix and Linux environments. CMG-T 2010 Unix/Linux Quick Start Slide 87
  • 88. Disk CMG-T 2010 Unix/Linux Quick Start Slide 88
  • 89. Disk Capacity Measurements •  Detailed metrics vary by platform •  Easy for the simple disk cases •  Hard for cached RAID subsystems •  Almost Impossible for shared disk subsystems and SANs –  Another system or volume can be sharing a backend spindle, when it gets busy your own volume can saturate, even though you did not change your own workload! CMG-T 2010 Unix/Linux Quick Start Slide 89
  • 90. Storage Utilization •  Storage virtualization broke utilization metrics a long time ago •  Host server measures busy time on a "disk" –  Simple disk, "single server" response time gets high near 100% utilization –  Cached RAID LUN, one I/O stream can report 100% utilization, but full capacity supports many threads of I/O since there are many disks and RAM buffering •  New metric - "Capability Utilization" –  Adjusted to report proportion of actual capacity for current workload mix –  Measured by tools such as Ortera Atlas (http://www.ortera.com) CMG-T 2010 Unix/Linux Quick Start Slide 90
  • 91. Solaris Filesystems •  ufs - standard, reliable, good for lots of small files, –  ufs transaction logging; faster writes and recovery –  ufs direct I/O feature; especially useful with databases –  snapshot features •  tmpfs - fastest if you have enough RAM, volatile •  NFS –  NFS2 - safe and common, 8KB blocks, slow writes –  NFS3 - more readahead and writebehind, faster •  default 32KB block size - fast sequential, may be slow random •  default TCP instead of UDP, more robust over WAN –  NFS4 - adds stateful behavior •  cachefs - good for read-mostly NFS speedup •  Veritas VxFS – 3rd-party; expensive –  Extent-based, with features for database performance and clustering •  QFS –  Extent-based, with features for database performance and clustering –  No more investments being made there •  ZFS – 21st century virtualized storage –  Feature-rich –  Challenging to performance-manage –  Focal point for added development CMG-T 2010 Unix/Linux Quick Start Slide 91
• 92. Solaris 10 ZFS: What it doesn't have ... •  Nice features –  No extra cost - it's bundled in a free OS –  No volume manager - it's built-in –  No space management - file systems use a common pool –  No long wait for newfs to finish - create a 3TB file system in a second –  No fsck - its transactional commit means it's consistent on disk –  No slow writes - disk write caches are enabled and flushed reliably –  No random or small writes - all writes are large batched sequential –  No rsync - snapshots can be differenced and replicated remotely –  No silent data corruption - all data is checksummed as it is read –  No bad archives - all the data in the file system is scrubbed regularly –  No penalty for software RAID - RAID-Z has a clever optimization (though it has limited practical applications) –  No downtime - mirroring, RAID-Z and hot spares –  No immediate maintenance - double parity disks if you need them •  Wish-list –  No way to know how much performance headroom you have! –  No clustering support CMG-T 2010 Unix/Linux Quick Start Slide 92
  • 93. Solaris ZFS: Fundamental Tradeoffs •  All physical I/O ultimately occurs in ZFS <recordsize> quanta, which defaults to 128 KB (but can be set per-pool) –  Random reads with poor locality may suffer when logical I/O size is small relative to the physical I/O size –  Sequential reads can suffer when physical I/O size is small relative to logical I/O size, though ZFS prefetching often effectively offsets this •  All writes are Copy-On-Write (COW) to fresh space in the pool –  Randomly-updated data tends to become physically fragmented –  Sequential read performance can vary significantly with the degree of physical fragmentation –  Backend prefetch algorithms are generally thwarted by fragmentation •  All I/O is buffered and checksummed –  This increases CPU demand relative to raw or direct I/O options –  Memory management may become seriously complicated by allocations for the Adaptive Replacement Cache (ARC) –  ZFS record checksums may be redundant with application data protections (e.g. Oracle checksums) CMG-T 2010 Unix/Linux Quick Start Slide 93
• 94. Solaris ZFS: Storage Appliances •  Sun S7000 Storage –  Inexpensive storage solutions based on commodity components –  Spawned from Sun’s “Fishworks” team –  Downloadable simulator allows easy experimentation –  See http://www.oracle.com/us/products/servers-storage/storage/unified-storage/index.html •  Performance Analysis & Management Features –  Storage Analytics; powerful GUI-based monitoring and analysis. For a most unusual peek at the Storage Analytics tool, see the video @ http://blogs.sun.com/brendan/entry/unusual_disk_latency –  Configuration option for latency-sensitivity (use SSD-based ZILs) or throughput (skip the SSD-based ZILs) CMG-T 2010 Unix/Linux Quick Start Slide 94
  • 95. Solaris ZFS: Resources •  Review these … –  “ZFS Evil Tuning Guide” http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide –  “Configuring Oracle® Solaris ZFS for an Oracle Database” http://developers.sun.com/solaris/docs/wp-oraclezfsconfig-0510_ds_ac2.pdf –  arcstat.pl Perl script http://www.solarisinternals.com/wiki/index.php/Arcstat •  … but don’t forget that ZFS has several fundamental tradeoffs that can be perplexing for performance and capacity management CMG-T 2010 Unix/Linux Quick Start Slide 95
  • 96. Linux Filesystems •  There are a large number of options! –  http://en.wikipedia.org/wiki/Comparison_of_file_systems •  EXT3 –  Common default for many Linux distributions –  Efficient for CPU and space, small block size –  relatively simple for reliability and recovery –  Journalling support options can improve performance –  EXT4 came out of development at the end of 2008 •  XFS –  Based on Silicon Graphics XFS, mature and reliable –  Better for large files and streaming throughput –  High Performance Computing heritage CMG-T 2010 Unix/Linux Quick Start Slide 96
• 97. Disk Configurations •  Sequential access is ~10 times faster than random –  Sequential rates are now about 50-100 MB/s per disk –  Random rates are 166 operations/sec (250/sec at 15,000 rpm) –  The size of each random read should be as big as possible •  Reads should be cached in main memory –  “The only good fast read is the one you didn’t have to do” –  Database shared memory or filesystem cache is microseconds –  Disk subsystem cache is milliseconds, plus extra CPU load –  Underlying disk is ~6ms, as it’s unlikely that data is in cache •  Writes should be cached in nonvolatile storage –  Allows write cancellation and coalescing optimizations –  NVRAM inside the system - Direct access to Flash storage –  Solid State Disks based on Flash are the "Next Big Thing" CMG-T 2010 Unix/Linux Quick Start Slide 97
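A rough illustration of why random read size matters so much (Python, using the ballpark per-disk figures from this slide; the request sizes are illustrative):

# Ballpark figures from this slide: ~166 random ops/sec, ~50-100 MB/s sequential
random_iops = 166
for io_kb in (8, 64, 128):
    print(io_kb, 'KB random reads ->', round(random_iops * io_kb / 1024.0, 1), 'MB/s')
# 8 KB -> ~1.3 MB/s, 64 KB -> ~10.4 MB/s, 128 KB -> ~20.8 MB/s, versus
# ~50-100 MB/s sequential - hence "make each random read as big as possible"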
• 98. Disk Throughput [Chart: disk_rK/s and disk_wK/s (KB/s) plotted over one day] CMG-T 2010 Unix/Linux Quick Start Slide 98
• 99. Max and Avg Disk Utilization (Same data) [Chart: disk_max% and disk_avg% utilization plotted over the same day] CMG-T 2010 Unix/Linux Quick Start Slide 99
  • 100. Data from iostat •  What can we see here? extended disk statistics ! disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b ! sd7 root ufs sd7 0.1 1.7 0.1 13.3 0.0 0.2 109.8 0 1 ! sd15 534.2 17.5 1320.4 35.0 0.0 0.3 0.6 0 26 ! sd45 291.9 23.0 603.2 49.8 0.0 0.2 0.6 0 15 ! solid state disks sd60 3.1 0.0 25.3 0.0 0.0 0.0 7.8 0 2 ! sd61 3.3 0.0 26.4 0.0 0.0 0.0 7.6 0 2 ! sd62 3.2 0.0 26.1 0.0 0.0 0.0 8.1 0 3 ! sd63 3.8 0.0 30.1 0.0 0.0 0.0 7.2 0 3 ! stripe 8K RR sd64 3.6 0.0 28.8 0.0 0.0 0.0 7.4 0 3 ! sd65 3.8 0.0 31.2 0.0 0.0 0.0 7.3 0 3 ! sd67 9.7 1.5 77.8 4.3 0.0 0.1 9.0 0 8 ! sd68 sd69 10.7 1.4 10.0 1.5 85.3 79.9 4.2 0.0 0.1 4.2 0.0 0.1 9.0 9.0 0 0 10 9 ! ! stripe sd70 10.4 1.0 83.1 3.2 0.0 0.1 9.1 0 9 ! sd71 9.9 1.4 78.8 4.6 0.0 0.1 8.7 0 9 ! sd72 10.0 1.1 79.9 3.7 0.0 0.1 8.5 0 8 ! sd75 0.0 27.6 0.0 297.3 0.0 0.0 1.1 0 2 ! cached write sd210 12.1 0.3 108.9 0.6 0.0 0.1 9.8 0 10 ! log sd211 12.9 0.4 114.8 0.7 0.0 0.1 10.6 0 11 ! sd212 12.0 0.6 107.1 1.3 0.0 0.1 11.1 0 10 ! sd213 13.8 0.3 122.2 0.9 0.0 0.2 11.1 0 11 ! sd214 12.5 0.5 112.1 1.0 0.0 0.1 10.3 0 10 ! stripe sd215 12.1 0.3 109.5 0.8 0.0 0.1 10.5 0 10 ! CMG-T 2010 Unix/Linux Quick Start Slide 100
• 101. Simple Disks •  Utilization shows capacity usage –  Measured using iostat %b •  Response time is svc_t –  svc_t increases due to waiting in the queues caused by bursty loads •  Service time per I/O is Util/IOPS –  Calculate as (%b/100)/(rps+wps) –  Decreases due to optimization of queued requests as load increases CMG-T 2010 Unix/Linux Quick Start Slide 101
• 102. Single Disk Parameters •  e.g. Seagate 18GB ST318203FC –  Obtain from www.seagate.com –  RPM = 10000 → 6.0 ms per rotation → 166 rotations/s –  Avg read seek = 5.2ms –  Avg write seek = 6.0ms –  Avg transfer rate = 24.5 MB/s –  Random IOPS •  Approx 166/s for small requests •  Approx 24.5 / size-in-MB for large requests CMG-T 2010 Unix/Linux Quick Start Slide 102
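A sketch of the two IOPS approximations on this slide and where they cross over (Python; the only inputs are the drive-spec numbers above, and the request sizes are illustrative):

rpm, xfer_mb_s = 10000, 24.5
small_iops = rpm / 60.0                  # ~166/s; roughly one I/O per rotation
def large_iops(size_mb):
    return xfer_mb_s / size_mb           # transfer-time bound for big requests

for size_kb in (8, 64, 256, 1024):
    print(size_kb, 'KB:', round(min(small_iops, large_iops(size_kb / 1024.0)), 1), 'IOPS')
# Small requests sit near 166/s; around ~150 KB per request the transfer time
# starts to dominate and IOPS falls off as 24.5 / size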
  • 103. Mirrored Disks •  All writes go to both disks •  Read policy alternatives –  All reads from one side –  Alternate from side to side –  Split by block number to reduce seek –  Read both and use first to respond •  Simple Capacity Assumption –  Assume duplicated interconnects –  Same capacity as unmirrored CMG-T 2010 Unix/Linux Quick Start Slide 103
  • 104. Concatenated and Fat Stripe Disks •  Request size less than interlace •  Requests go to one disk •  Single threaded requests –  Same capacity as single disk •  Multithreaded requests –  Same service time as one disk –  Throughput of N disks if more than N threads are evenly distributed CMG-T 2010 Unix/Linux Quick Start Slide 104
  • 105. Striped Disks •  Request size more than interlace •  Requests split over N disks –  Single and multithreaded requests –  N = request size / interlace –  Throughput of N disks •  Service Time Reduction –  Reduced size of request reduces service time for large transfers –  Need to wait for all disks to complete - slowest dominates CMG-T 2010 Unix/Linux Quick Start Slide 105
  • 106. RAID5 for Small Requests •  Writes must calculate parity log –  Read parity and old data blocks –  Calculate new parity –  Write log and data and parity –  Triple service time –  One third throughput of one disk •  Read performs like stripe –  Throughput of N-1, service of one –  Degraded mode throughput about one CMG-T 2010 Unix/Linux Quick Start Slide 106
  • 107. RAID5 for Large Requests •  Write full stripe and parity log •  Capacity similar to stripe –  Similar read and write performance –  Throughput of N-1 disks –  Service time for size reduced by N-1 –  Less interconnect load than mirror •  Degraded Mode –  Throughput halved and service similar –  Extra CPU used to regenerate data CMG-T 2010 Unix/Linux Quick Start Slide 107
  • 108. Cached RAID5 •  Nonvolatile cache –  No need for recovery log disk •  Fast service time for writes –  Interconnect transfer time only •  Cache optimizes RAID5 –  Makes all backend writes full stripe CMG-T 2010 Unix/Linux Quick Start Slide 108
  • 109. Cached Stripe •  Write caching for stripes –  Greatly reduced service time –  Very worthwhile for small transfers –  Large transfers should not be cached –  In many cases, 128KB is crossover point from small to large •  Optimizations –  Rewriting same block cancels in cache –  Small sequential writes coalesce CMG-T 2010 Unix/Linux Quick Start Slide 109
• 110. Capacity Model Measurements •  Derived from iostat outputs! extended disk statistics disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b sd9 33.1 8.7 271.4 71.3 0.0 2.3 15.8 0 27 •  Utilization U = %b / 100 = 0.27 •  Throughput X = r/s + w/s = 41.8 •  Size K = (Kr/s + Kw/s) / X = 8.2 KB •  Concurrency N = actv = 2.3 •  Service time S = U / X = 6.5ms •  Response time R = svc_t = 15.8ms CMG-T 2010 Unix/Linux Quick Start Slide 110
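The same derivations in a small Python sketch (the field values are taken from the iostat line above; the variable names follow the slide):

# One iostat -x line for sd9
r_s, w_s, kr_s, kw_s, actv, svc_t, pct_b = 33.1, 8.7, 271.4, 71.3, 2.3, 15.8, 27

U = pct_b / 100.0            # utilization           0.27
X = r_s + w_s                # throughput (IOPS)     41.8
K = (kr_s + kw_s) / X        # mean I/O size (KB)    ~8.2
N = actv                     # mean concurrency      2.3
S = U / X                    # service time (s)      ~0.0065 -> 6.5 ms
R = svc_t                    # response time (ms)    15.8
print(U, X, K, N, S * 1000, R)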
  • 111. Cache Throughput •  Hard to model clustering and write cancellation improvements •  Make pessimistic assumption that throughput is unchanged •  Primary benefit of cache is fast response time •  Writes can flood cache and saturate back-end disks –  Service times suddenly go from 3ms to 300ms –  Very hard to figure out when this will happen –  Paranoia is a good policy…. CMG-T 2010 Unix/Linux Quick Start Slide 111
  • 112. Concluding Summary Walk out of here with the most useful content fresh in your mind! CMG-T 2010 Unix/Linux Quick Start Slide 112
  • 113. Quick Tips #1 - Disk •  The system will usually have a disk bottleneck •  Track how busy is the busiest disk of all •  Look for unbalanced, busy or slow disks with iostat •  Options: timestamp, look for busy controllers, ignore idle disks: % iostat -xnzCM -T d 30 Tue Jan 21 09:19:21 2003 extended device statistics r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device 141.0 8.6 0.6 0.0 0.0 1.5 0.0 10.0 0 25 c0 3.3 0.0 0.0 0.0 0.0 0.0 0.0 6.5 0 2 c0t0d0 137.7 8.6 0.6 0.0 0.0 1.5 0.0 10.1 0 74 c0t1d0 Watch out for sd_max_throttle limiting throughput when set too low Watch out for RAID cache being flooded on writes, causes sudden very large increase in write service time CMG-T 2010 Unix/Linux Quick Start Slide 113
  • 114. Quick Tips #2 - Network •  If you ever see a slow machine that also appears to be idle, you should suspect a network lookup problem. i.e. the system is waiting for some other system to respond. •  Poor Network Filesystem response times may be hard to see –  Use iostat -xn 30 on a Solaris client –  wsvc_t is the time spent in the client waiting to send a request –  asvc_t is the time spent in the server responding –  %b will show 100% whenever any requests are being processed, it does NOT mean that the network server is maxed out, as an NFS server is a complex system that can serve many requests at once. •  Name server delays are also hard to detect –  Overloaded LDAP or NIS servers can cause problems –  DNS configuration errors or server problems often cause 30s delays as the request times out CMG-T 2010 Unix/Linux Quick Start Slide 114
  • 115. Quick Tips #3 - Memory •  Avoid the common vmstat misconceptions –  The first line is average since boot, so ignore it •  Linux, Other Unix and earlier Solaris Releases –  Ignore “free” memory –  Use high page scanner “sr” activity as your RAM shortage indicator •  Solaris 8 and Later Releases –  Use “free” memory to see how much is left for code to use –  Use non-zero page scanner “sr” activity as your RAM shortage indicator •  Don’t panic when you see page-ins and page-outs in vmstat •  Normal filesystem activity uses paging solaris9% vmstat 30 kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr f0 s0 s1 s6 in sy cs us sy id 0 0 0 2367832 91768 3 31 2 1 1 0 0 0 0 0 0 511 404 350 0 0 99 0 0 0 2332728 75704 3 29 0 0 0 0 0 0 0 0 0 508 537 410 0 0 99 CMG-T 2010 Unix/Linux Quick Start Slide 115
  • 116. Quick Tips #4 - CPU •  Look for a long run queue (vmstat procs r) - and add CPUs –  To speedup with a zero run queue you need faster CPUs, not more of them •  Check for CPU system time dominating user time –  Most systems should have lots more Usr than Sys, as they are running application code –  But... dedicated NFS servers should be 100% Sys –  And... dedicated web servers have high Sys as well –  So... assume that lots of network service drives Sys time •  Watch out for processes that hog the CPU –  Big problem on user desktop systems - look for looping web browsers –  Web search engines may get queries that loop –  Use resource management or limit cputime (ulimit -t) in startup scripts to terminate web queries CMG-T 2010 Unix/Linux Quick Start Slide 116
  • 117. Quick Tips #5 - I/O Wait •  Look for processes blocked waiting for disk I/O (vmstat procs b) –  This is what causes CPU time to be counted as wait not idle –  Nothing else ever causes CPU wait time! •  CPU wait time is a subset of idle time, consumes no resources –  CPU wait time is a misconceived statistic, and its fallacy is amplified on multi-threaded systems –  CPU wait time is no longer calculated in Solaris 10; reports as zero –  Bottom line - don’t worry about CPU wait time, it’s a broken metric •  Look at individual process wait time using microstates –  prstat -m or SE toolkit process monitoring •  Look at I/O wait time using iostat asvc_t CMG-T 2010 Unix/Linux Quick Start Slide 117
• 118. Quick Tips #6 - iostat •  For Solaris remember “expenses” iostat -xPncez 30 –  Add -M for Megabytes, and -T d for timestamped logging –  Use a 30-second interval to avoid spikes in load. –  Watch asvc_t which is the response time for Solaris •  Look for regular disks over 5% busy that have response times of more than 10ms as a problem. •  If you have cached hardware RAID, look for response times of more than 5ms as a problem. •  Ignore large response times on idle disks that have filesystems - it’s not a problem and the cause is the fsflush process CMG-T 2010 Unix/Linux Quick Start Slide 118
  • 119. Recipe to fix a slow system •  Essential Background Information –  What is the business function of the system? –  Who and where are the users? –  Who says there is a problem, and what is slow? –  What changed recently and what is on the way? •  What is the system configuration? –  CPU/RAM/Disk/Net/OS/Patches, what application software is in use? •  What are the busy processes on the system doing? –  use top, prstat, pea.se or /usr/ucb/ps uax | head •  Report CPU and disk utilization levels, iostat -xPncezM -T d 30 –  What is making the disks busy? •  What is the network name service configuration? –  How much network activity is there? Use netstat -i 30 or nx.se 30 •  Is there enough memory? –  Check free memory and the scan rate with vmstat 30 CMG-T 2010 Unix/Linux Quick Start Slide 119
  • 120. Further Reading - Books General Solaris/Unix/Linux Performance Tuning –  System Performance Tuning (2nd Edition) by Gian-Paolo D. Musumeci and Mike Loukides; O'Reilly & Associates Solaris Performance Tuning Books –  Solaris Performance and Tools, Richard McDougall, Jim Mauro, Brendan Gregg; Prentice Hall –  Configuring and Tuning Databases on the Solaris Platform, Allan Packer; Prentice Hall –  Sun Performance and Tuning, by Adrian Cockcroft and Rich Pettit; Prentice Hall Sun BluePrints™ –  Capacity Planning for Internet Services, Adrian Cockcroft and Bill Walker; Prentice Hall –  Resource Management, Richard McDougall, Adrian Cockcroft et al. Prentice Hall Linux –  Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer –  Google has a Linux specific search mode http://www.google.com/linux CMG-T 2010 Unix/Linux Quick Start Slide 120
  • 121. Questions? (The End) CMG-T 2010 Unix/Linux Quick Start Slide 121
  • 122. Backing Material “Test Your Intuition” CMG-T 2010 Unix/Linux Quick Start Slide 122
  • 123. Pop Quiz #1 •  SITUATION: A system runs at 100% CPU usage for 1 hour each day completing a single compute-bound task. The SLA requires the task to complete in 4 hours. •  Q1: How much “headroom” does this system have? •  Q2: How can this task's resource footprint be managed to never exceed 80% CPU usage? CMG-T 2010 Unix/Linux Quick Start Slide 123
  • 124. Pop Quiz #1: Answers •  SITUATION: A system runs at 100% CPU usage for 1 hour each day completing a single compute-bound task. The SLA requires the task to complete in 4 hours. •  Q1: How much “headroom” does this system have? •  A1: 300% (in workload terms) or 75% (in percent-of-system terms) - it can do 4x the work it now does and remain within the SLA. •  Q2: How can this task's resource footprint be managed to never exceed 80% CPU usage? •  A2a: Huh? Why would anyone want to do that? •  A2b: Resource management. CMG-T 2010 Unix/Linux Quick Start Slide 124
  • 125. Pop Quiz #2 •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75% CPU busy, with a workload that includes four compute- bound threads plus some OLTP. The new target system is a 4-way 2000-BogoMIPs system. •  Q1: What is the new system's projected CPU utilization? •  Q2: How can this system's workload be managed to never exceed 75% CPU utilization? CMG-T 2010 Unix/Linux Quick Start Slide 125
  • 126. Pop Quiz #2: Answers •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75% CPU busy, with a workload that includes four compute-bound threads plus some OLTP. The new target system is a 4-way 2000- BogoMIPs system. •  Q1: What is the new system's projected CPU utilization? •  A1: 100%. Each of the four compute-bound threads will keep one CPU 100% busy. •  Q2: How can this system's workload be managed to never exceed 75% CPU utilization? •  A2a: Huh? Why would anyone want to do that? •  A2b: Resource management. CMG-T 2010 Unix/Linux Quick Start Slide 126
  • 127. Pop Quiz #3 •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75% CPU busy, with a workload that includes four compute- bound threads plus some OLTP. The new target system is a 4-way 2000-BogoMIPs system. (Same as last quiz, OK?) •  Q1: How will the compute-bound thread's performance be impacted by the upgrade? (Just roughly speaking – no need for precision here!) CMG-T 2010 Unix/Linux Quick Start Slide 127
  • 128. Pop Quiz #3: Answers •  SITUATION: An 8-way 1000-BogoMIPs box runs at 75% CPU busy, with a workload that includes four compute-bound threads plus some OLTP. The new target system is a 4-way 2000- BogoMIPs system. (Same as last quiz, OK?) •  Q1: How will the compute-bound thread's performance be impacted by the upgrade? (Just roughly speaking – no need for precision here!) •  A1: It should run almost 4x faster. Each new CPU is 4x faster than the old ones. (2000/4)/(1000/8) = 4. The OLTP will use some of the CPU cycles, but its service demand pales next to the compute jobs. CMG-T 2010 Unix/Linux Quick Start Slide 128
  • 129. Pop Quiz #4 • ESSAY QUESTION: “At what point do these principles become difficult?” CMG-T 2010 Unix/Linux Quick Start Slide 129