2. About
● Software Developer/Architect
● Polyglot - C++, Java, Erlang, Haskell, Scala, Clojure, Golang
● Interested in Distributed Systems and Scalability
● Speaker & Developer Advocate
3. Agenda
● Widely Used Methodologies
● USE
● RED
● Queueing Theory
● Universal Scalability Law
● Questions
4. Street Light Method
● Absence of a real methodology
● Observe using tools the user is familiar with
● The issue found may be an issue, but not the
issue.
● Bias based on tool familiarity
5. Random Change Method
● Pick a random variable to change
● Change it in one direction
● Observe the effect
● Change it in the other direction
● Observe the effect
● Were any observations better? Keep those and repeat with some other variable.
6. Blame Someone Else
● Find a system/component which you are not
responsible for
● Blame that system/component
● When proven wrong, start over with step 1.
7. Ad hoc Checklist
● Step through a canned checklist to rule out
obvious culprits
● Handles known issues which have been
observed in the past
● No way to uncover unknown issues
● Lists need to be updated to accommodate new findings
8. Scientific Method
Study the unknown by making hypotheses and then
testing them using the following steps
● Question
● Hypothesis
● Prediction
● Test
● Analysis
9. Scientific Method
Examples
Question: What is causing slow database queries?
Hypothesis: Certain periodic scripts running on the server perform a large amount of disk I/O, impacting production query performance.
Prediction: If file system I/O latency is measured while queries run, it will show whether the FS is responsible for the slow queries.
Test: Tracing database file system latency as a ratio of query latency shows < 5% of time is spent waiting on the FS.
Analysis: The FS and the disks are not responsible for the slow queries.
10. Scientific Method
● Might not find a solution immediately, but rules
out a large part of the system.
● A new hypothesis can be developed and the
steps repeated.
● Can also be used for negative tests which
progressively make the system worse, to learn
more about the target system.
11. USE and RED: Different Sides of the Same Coin?
Two perspectives on system performance, differing in audience, metrics and approach:
● Resource Analysis - Bottom up (USE)
● Workload Analysis - Top Down (RED)
12. USE and RED: Resource Analysis
Analysis of system resources: CPUs, disks, network interfaces, interconnects, buses etc.
Resource analysis focuses on measures of utilisation, to identify when resources are approaching their limits.
Measurements of resources typically focus on
● IOPS
● Utilisation
● Saturation
● Throughput
13. USE and RED: Workload Analysis
Analysis of applications: how the system responds to the applied load (the workload).
Measurements of workloads typically focus on
● Requests/Time - the workload applied to the system
● Latency - the time it takes the system to respond
● Completion - the completed requests and the error counts/percentages
14. USE Methodology
Used for resource analysis: for every resource, check utilisation, saturation and errors.
Resource: all physical server components (CPUs, disks, buses, network interfaces etc.)
Utilisation: for a given time interval, the amount of time the resource was busy performing work.
Saturation: the degree to which a resource has extra work which it cannot service, often waiting in a queue.
Errors: the count/percentage of errors.
15. USE Methodology
● Involves iterating over resources rather than
tools/metrics.
● Directs analysis to a limited number of key
metrics
● Check all system resources as quickly as
possible. This avoids the street light effect.
17. USE Methodology: Applying USE
1. Identify Resources
a. CPU - cores, sockets, vCPUs
b. Main memory - DRAM
c. Networking interfaces : Ethernet ports
d. Storage Devices : Disks
e. Network and Storage Controllers
f. CPU, Memory and I/O interconnects
2. For each identified resource, define metrics for utilisation, saturation and errors (wherever applicable)
a. CPU - Utilisation, run-queue length
b. Memory - Available/Free, paging/swapping
c. Network Interfaces - receive and transmit
throughput, maximum bandwidth
d. Storage - device busy %, wait queue length,
device errors
3. The above metrics can be averages per interval
or counts. Include instructions for fetching
each metric.
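The utilisation metric from step 2 can be derived from two counter samples. A minimal sketch, assuming the Linux /proc/stat field layout (user, nice, system, idle, iowait, irq, softirq, steal); the sample strings are synthetic:

```python
# Sketch: deriving CPU utilisation (USE step 2a) from two /proc/stat-style samples.
# Assumed field order: cpu user nice system idle iowait irq softirq steal

def cpu_utilisation(sample1, sample2):
    """Fraction of time the CPU was busy between two 'cpu ...' counter lines."""
    t1 = [int(x) for x in sample1.split()[1:]]
    t2 = [int(x) for x in sample2.split()[1:]]
    total = sum(t2) - sum(t1)
    idle = (t2[3] + t2[4]) - (t1[3] + t1[4])   # idle + iowait ticks
    return (total - idle) / total

# Two synthetic samples taken one interval apart:
s1 = "cpu 100 0 50 800 50 0 0 0"
s2 = "cpu 200 0 100 850 50 0 0 0"
print(cpu_utilisation(s1, s2))   # → 0.75
```

The same two-sample pattern works for any cumulative counter (network bytes, disk busy ticks).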
18. USE Methodology: Interpreting USE Metrics
1. Utilisation - 100% utilisation is generally a sign of a bottleneck. As utilisation increases, queueing delays start building up.
2. Saturation - any degree of saturation can be a problem. Measured either as the length of a queue or the time spent waiting in it.
3. Errors - non-zero error counters should be investigated, especially if they are increasing as performance degrades.
Queueing theory is a must-know for performance engineers. We cover it briefly later.
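These interpretation rules can be captured in a small helper. A sketch only; the flag wording is my own, not from the slides:

```python
def interpret_use(utilisation, saturation, errors):
    """Flag a resource per the USE interpretation rules above."""
    flags = []
    if utilisation >= 1.0:
        flags.append("bottleneck: 100% utilised")
    if saturation > 0:
        flags.append("saturated: work is queueing")
    if errors > 0:
        flags.append("errors: investigate non-zero counters")
    return flags or ["healthy"]

# 95% utilised with a run queue of 3: not yet at 100%, but already saturated.
print(interpret_use(0.95, 3, 0))
```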
19. RED Methodology
Measure 3 key metrics for all applications
● Rate - The number of requests per second your
applications are serving
● Errors - The number of failed requests per
second
● Duration - Distributions of the amount of time
each request takes.
Measuring these consistently across your applications allows your monitoring to scale with the team. It also reflects the 80-20 principle.
Further drill down analysis can be performed using
code instrumentation, tracing, APM tools etc.
However these metrics and the dashboards serve as
the starting point for monitoring the performance of
your applications.
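A minimal sketch of RED instrumentation as a Python decorator. The in-memory metrics store and handler name are illustrative; a real system would export these to a monitoring backend:

```python
import time
from collections import defaultdict

# In-memory RED metrics, keyed by handler name (illustrative only).
metrics = defaultdict(lambda: {"requests": 0, "errors": 0, "durations": []})

def red(name):
    """Decorator recording Rate, Errors and Duration for a handler."""
    def wrap(fn):
        def inner(*args, **kwargs):
            m = metrics[name]
            m["requests"] += 1                      # Rate: count per interval
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                m["errors"] += 1                    # Errors: failed requests
                raise
            finally:
                m["durations"].append(time.perf_counter() - start)  # Duration
        return inner
    return wrap

@red("checkout")
def checkout(ok=True):
    if not ok:
        raise ValueError("payment failed")

checkout()
try:
    checkout(ok=False)
except ValueError:
    pass
print(metrics["checkout"]["requests"], metrics["checkout"]["errors"])  # → 2 1
```

Durations are kept as a list here so a distribution (percentiles, histogram) can be derived, as the slide suggests.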
21. RED Methodology: Workload Characterisation
An important aspect is to qualify and quantify the workload your system is experiencing:
● Who is causing the load: PID, client ID, remote IP address
● Why is the load being generated: code path, upstream and downstream system response times/errors
● What are the load's characteristics: throughput, IOPS, read-heavy/write-intensive
● How is the load changing over time
● Is the load a characteristic of the architecture of the system, or due to resource constraints?
The intention of workload characterisation is to eliminate unnecessary work. Remember: the fastest query is the one which is never run.
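The "who" and "what" questions can often be answered by aggregating an access log. A sketch over a synthetic log (the "client-ip method path status" format is an assumption):

```python
from collections import Counter

# Synthetic access-log lines: "<client-ip> <method> <path> <status>"
log = [
    "10.0.0.1 GET /search 200",
    "10.0.0.1 GET /search 200",
    "10.0.0.2 POST /order 500",
    "10.0.0.1 GET /search 200",
]

by_client = Counter(line.split()[0] for line in log)   # who causes the load
by_method = Counter(line.split()[1] for line in log)   # read-heavy or write-intensive?

print(by_client.most_common(1))   # → [('10.0.0.1', 3)]
print(by_method)                  # GET-dominated => read-heavy workload
```

Repeating the same aggregation per time bucket answers "how is the load changing over time".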
22. Queueing Theory
A broad field of study focused on the nuances of how waiting lines behave. It boils down to answering some basic questions:
● How likely is it that something will queue up and wait?
● How long will the queue be?
● How long will the wait be?
● How busy will the person/system serving the line be?
● How much capacity is needed to meet expected demand?
23. Queueing Theory: Introduction
Queueing theory is part of a wider area of mathematics which falls under probability theory.
Queueing systems are non-linear. Things change almost linearly until they reach a tipping point, then transition to a much faster or slower behaviour. There can be multiple such tipping points.
24. Queueing Systems: Real Life Examples
● Getting a coffee at Starbucks
● At the traffic light
● Computer CPU and disks
● Third in line for take off at the airport
● Load balancer
● Pre-ordering an iPhone
● Connection pooling
● Trying to pick the line at the grocery store
In general, any system for handling units of work is a queueing system (software teams as well).
25. Why Does Queueing Happen?
The obvious answer would be "There's more work to do than the current capacity to handle it".
However this is not correct. Queueing happens even when there's more than enough capacity to handle the work.
Why would this be the case? There are 3 reasons:
● Irregular arrivals - arrivals don't happen at regular intervals. They are sometimes far apart and sometimes close together, so they overlap.
● Irregular job sizes - jobs don't complete in the same time. Some take much longer, others shorter.
● Waste/idle time - idle time can never be recovered. While the system is idle it is not utilised, yet it can never work at more than 100% capacity to catch up.
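A small discrete-event simulation of a single-server queue with random (exponential) arrivals and service times illustrates the point: with arrivals at half the service capacity, i.e. only 50% utilisation, jobs still spend time waiting, purely because of irregular arrivals and irregular job sizes:

```python
import random

def mm1_avg_wait(lam, mu, jobs=20000, seed=42):
    """Average queue wait in a simulated M/M/1 system (requires lam < mu)."""
    rng = random.Random(seed)
    clock = free_at = total_wait = 0.0
    for _ in range(jobs):
        clock += rng.expovariate(lam)            # irregular arrivals
        start = max(clock, free_at)              # wait if the server is busy
        total_wait += start - clock
        free_at = start + rng.expovariate(mu)    # irregular job sizes
    return total_wait / jobs

# 5 jobs/min arriving, capacity 10 jobs/min: only 50% utilised,
# yet the average wait is clearly non-zero.
print(mm1_avg_wait(5, 10) > 0)   # → True
```

For these parameters the theoretical average wait is US/(1 − U) = 0.5 × 0.1/0.5 = 0.1 min, and the simulation converges towards that value.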
26. Queueing Theory Fundamentals
There are 3 core concepts when one talks of queueing systems:
1. Jobs/tasks - the units of work that a system is supposed to perform
2. Systems/servers - the things that perform the work
3. Queues - where jobs/tasks wait when the server/system is busy and cannot perform work as it arrives
Note: queueing theory applies to stable systems. A stable system is one which is able to keep up with the incoming work, where all jobs eventually get processed and the queue does not grow to infinity.
27. Queueing Theory: Important Metrics
● Arrival Rate (λ) - how often new jobs arrive at the queue
● Queue Length (Lq) - the average number of jobs waiting in the queue
● Wait Time (W) - how long a job has to wait in the queue, on average
● Service Time (S) - how long it takes to service a job after it leaves the queue
● Service Rate (µ) - the inverse of service time (1/S)
● Residence Time (R) - how long jobs are in the system (W + S)
● Utilisation (U) - the fraction of time the system is busy performing jobs
● Concurrency (L) - the number of jobs waiting or being processed
● Number of Servers (M) - the number of servers available to process jobs
28. Queueing Theory: Important Laws
Little's Law: L = λR - the concurrency of a system is the arrival rate times the residence time. Applied to just the queue, it is stated as Lq = λW.
Utilisation Law: U = λS, or λ/µ (utilisation is the arrival rate times the service time). If there are M servers, the more general form is U = λS/M.
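Both laws are one-liners in code. A quick sketch using the slide's units (jobs and minutes):

```python
def concurrency(arrival_rate, residence_time):
    """Little's Law: L = λR."""
    return arrival_rate * residence_time

def utilisation(arrival_rate, service_time, servers=1):
    """Utilisation Law: U = λS / M."""
    return arrival_rate * service_time / servers

# 5 jobs/min, each resident for 2 min -> 10 jobs in the system on average.
print(concurrency(5, 2))             # → 10
# 5 jobs/min, 1 min of service, 10 servers -> 50% utilised.
print(utilisation(5, 1, servers=10)) # → 0.5
```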
29. Kendall's Notation
A mechanism to classify different kinds of queueing systems, written as a slash-delimited notation of letters and numbers which indicate the following:
1. How the arrival process behaves (the first letter)
2. How service times behave (the second letter)
3. The number of servers (the third number)
4. Special parameters for systems which reject arrivals beyond capacity, have prioritisation logic for selecting jobs, etc. (optional in a majority of cases)
30. Kendall's Notation
M - stands for Markovian (memoryless). Jobs arrive randomly and independently at rate λ; the arrivals form a Poisson process, with exponentially distributed inter-arrival times of mean 1/λ.
G - stands for General: no specific distribution is assumed.
Common queueing systems are
M/M/1
M/M/c
M/G/1
M/G/c
31. Uses of Queueing Theory
Queueing theory is mostly used to calculate wait times and queue lengths. For the M/M/1 case, the residence time of a job is
R = S/(1 − U)
(an approximation for the M/M/c case is R ≈ S/(1 − U^c)).
This is a hockey-stick curve, which can be validated by computing it for different values of S and U. Some indicative values for an arrival rate of 5 jobs/minute:
S = 1 min, U = 50% => R = 2 min, L = 10 (from Little's Law)
S = 1 min, U = 75% => R = 4 min, L = 20
S = 1 min, U = 90% => R = 10 min, L = 50
As utilisation increases, residence time and concurrency grow sharply and without bound.
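The table above can be reproduced directly from R = S/(1 − U) and Little's Law:

```python
def residence_time(S, U):
    """M/M/1 residence time: R = S / (1 - U)."""
    return S / (1 - U)

# Reproducing the slide's numbers: S = 1 min, arrival rate λ = 5 jobs/min.
lam = 5
for U in (0.50, 0.75, 0.90):
    R = residence_time(1, U)
    L = lam * R                 # Little's Law: L = λR
    print(f"U={U:.0%}  R={R:g} min  L={L:g}")
```

Plotting R against U makes the hockey stick visible: the curve is nearly flat up to moderate utilisation, then climbs steeply towards the U = 1 asymptote.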
32. Universal Scalability Law: Scalability
Wikipedia definition:
The capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth.
More formally, we can say that scalability is an increasing function of load or size.
33. Universal Scalability Law: Linear Scalability
There are no real linearly scalable systems. The real test of linear scalability is whether the transactions per second per node remain constant as the number of nodes increases.
Example:
A system achieves 182k transactions per second with 3 nodes and 449k with 12 nodes.
3 nodes: ~60,700 tps per node
12 nodes: ~37,400 tps per node
With linear scaling, 12 nodes would deliver 4 × 182k = 728k tps, so throughput is roughly 38% below linear scalability.
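The arithmetic behind the example, spelled out:

```python
# Checking the example: does per-node throughput stay constant?
tps_3, tps_12 = 182_000, 449_000

per_node_3 = tps_3 / 3            # ~60.7k tps per node
per_node_12 = tps_12 / 12         # ~37.4k tps per node
linear_12 = tps_3 * (12 / 3)      # 728k tps if scaling were linear

drop = 1 - tps_12 / linear_12     # shortfall versus linear scaling
print(f"{per_node_3:.0f} {per_node_12:.0f} {drop:.0%}")   # → 60667 37417 38%
```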
35. Universal Scalability Law: Efficiency Loss
There are two reasons for efficiency loss:
● Contention - contention degrades scalability because parts of the system cannot be parallelised, which leads to queueing.
● Crosstalk - crosstalk introduces a penalty as workers (threads, CPUs, etc.) communicate to share and synchronise mutable state.
36. Universal Scalability Law: Perfect Linear Scaling
Perfect linear scaling can be represented as
X = λN, where λ = coefficient of performance and N = size (nodes/CPUs/threads etc.). Hence if N doubles, the throughput of the system should double.
37. Universal Scalability Law: Amdahl's Law
Amdahl's Law states that the maximum speedup possible is the reciprocal of the non-parallelisable (serialised) portion of the work.
X = λN/(1 + σ(N − 1))
σ = coefficient of serialisation
A system with serialisation will asymptotically approach a speedup limit; e.g. with σ = 0.05, the speedup can never exceed 1/σ = 20.
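A quick sketch of the Amdahl bound. The speedup computed here is X/λ, i.e. throughput relative to a single node:

```python
def amdahl_speedup(N, sigma):
    """Relative capacity under Amdahl's Law: X/λ = N / (1 + σ(N − 1))."""
    return N / (1 + sigma * (N - 1))

# With σ = 0.05 the speedup can never exceed 1/σ = 20, no matter how large N gets.
for N in (1, 10, 100, 10_000):
    print(f"N={N:>6}  speedup={amdahl_speedup(N, 0.05):.2f}")
```

The printed values creep towards 20 but never reach it, which is the asymptote mentioned above.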
38. Universal Scalability Law: Crosstalk
Crosstalk happens between each pair of workers in the system (CPUs, threads, nodes). This leads to N(N − 1) interactions (the number of directed edges in a fully connected graph). Introducing crosstalk into the equation leads to
X(N) = λN/(1 + σ(N − 1) + κN(N − 1))
The crosstalk penalty grows fast. Because it is quadratic, it eventually grows faster than the linear speedup of the ideal system we started with, no matter how small κ is.
Capacity planning:
Nmax = floor(√((1 − σ)/κ))
Refer to https://cran.r-project.org/web/packages/usl/ for
calculating the values for the coefficients.
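The full USL model and the peak-capacity formula as a sketch. The coefficients below are illustrative, not fitted to real measurements; use the usl R package linked above to fit real data:

```python
import math

def usl_throughput(N, lam, sigma, kappa):
    """USL: X(N) = λN / (1 + σ(N − 1) + κN(N − 1))."""
    return lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))

def n_max(sigma, kappa):
    """Node count where throughput peaks: Nmax = floor(√((1 − σ)/κ))."""
    return math.floor(math.sqrt((1 - sigma) / kappa))

# Illustrative coefficients (assumptions, not fitted values):
sigma, kappa = 0.05, 0.001
peak = n_max(sigma, kappa)
print(peak)   # → 30
# Past the peak, adding nodes reduces throughput:
print(usl_throughput(peak, 1.0, sigma, kappa) >
      usl_throughput(peak * 2, 1.0, sigma, kappa))   # → True
```

With κ = 0 the model reduces to Amdahl's Law, and with σ = κ = 0 it reduces to perfect linear scaling, which is why the three slides form one progression.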