Performance Engineering Methodologies
Maneesh Chaturvedi
maneesh.chaturvedi@gmail.com
rethinkingdesign.tech
github.com/maneeshchaturvedi
About
● Software Developer/Architect
● Polyglot - C++, Java, Erlang, Haskell, Scala,
Clojure, Golang
● Interested in Distributed Systems and Scalability
● Speaker & Developer Advocate
Agenda
● Widely Used Methodologies
● USE
● RED
● Queueing Theory
● Universal Scalability Law
● Questions
Street Light Method
● Absence of a real methodology
● Observe using tools the user is familiar with
● The issue found may be an issue, but not the
issue.
● Bias based on tool familiarity
Random Change Method
● Pick a random variable to change
● Change it in one direction
● Observe the effect
● Change it in the other direction
● Observe the effect
● Were any observations better ? Keep those and
repeat with some other variable.
Blame Someone Else
● Find a system/component which you are not
responsible for
● Blame that system/component
● When proven wrong, start over with step 1.
Ad hoc Checklist
● Step through a canned checklist to rule out
obvious culprits
● Handles known issues which have been
observed in the past
● No way to figure out Unknowns
● Lists need to be updated to accommodate new
findings
Scientific Method
Study the unknown by making hypotheses and then
testing them using the following steps
● Question
● Hypothesis
● Prediction
● Test
● Analysis
Scientific Method Examples
Question : What is causing slow database queries ?
Hypothesis : Certain periodic scripts running on the
server perform a large amount of disk I/O,
impacting production query performance.
Prediction : If file system (FS) I/O latency is measured
while running a query, it will show the FS is
responsible for the slow queries.
Test : Tracing database file system latency as a ratio
of query latency shows that less than 5% of the time
is spent waiting on the FS.
Analysis : The FS and the disks are not responsible
for slow queries.
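As a minimal sketch of the Test step above, the ratio can be computed from traced latencies. The function name, the sample numbers and the 50% threshold are assumptions made for illustration; they do not come from the original trace.

```python
def fs_wait_fraction(fs_latencies_ms, query_latency_ms):
    """Fraction of one query's latency that was spent waiting on FS I/O."""
    if query_latency_ms <= 0:
        raise ValueError("query latency must be positive")
    return sum(fs_latencies_ms) / query_latency_ms


# Hypothetical trace: one 200 ms query that issued three FS reads.
fraction = fs_wait_fraction([2.0, 3.5, 1.5], 200.0)

# Assumed threshold: the FS is "responsible" only if it dominates the latency.
hypothesis_supported = fraction >= 0.5

print(f"FS wait: {fraction:.1%}, hypothesis supported: {hypothesis_supported}")
```

A low fraction rejects the hypothesis and rules the FS and disks out, which is the outcome described in the Analysis step.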
Scientific Method
● Might not find a solution immediately, but rules
out a large part of the system.
● A new hypothesis can be developed and the
steps repeated.
● Can also be used for negative tests which
progressively make the system worse, to learn
more about the target system.
USE and RED : Different sides of the same coin ?
Perspectives : There are two perspectives on system
performance, which differ in audience, metrics and
approach.
● Resource Analysis - Bottom up (USE)
● Workload Analysis - Top Down (RED)
USE and RED : Resource Analysis
Analysis of system resources : CPUs, disks, network
interfaces, interconnects, buses etc.
Resource analysis focuses on measures of
utilisation, to identify when resources are
approaching their limits.
Measurements of resources typically focus on
● IOPS
● Utilisation
● Saturation
● Throughput
USE and RED : Workload Analysis
Analysis of applications : How the system
responds to the applied load (workload).
Measurements of workloads typically focus on
● Requests/Time - Workload applied to the
system
● Latency - The time it takes the system to
respond
● Completion - The completed requests and the
error counts/percentages
USE Methodology :
Used for Resource Analysis : For every resource,
check utilisation, saturation and errors.
Resource : All physical server components (CPUs,
disks, buses, network interfaces etc.)
Utilisation : For a given time interval, the proportion
of time the resource was busy performing work.
Saturation : The degree to which a resource has extra
work which it cannot service, often waiting in a
queue.
Errors : The count/percentage of errors.
USE Methodology :
● Involves iterating over resources rather than
tools/metrics.
● Directs analysis to a limited number of key
metrics
● Check all system resources as quickly as
possible. This avoids the street light effect.
USE Methodology :
Applying USE :
1. Identify Resources
a. CPU - Cores, Sockets, vCPUs
b. Main memory - DRAM
c. Networking interfaces : Ethernet ports
d. Storage Devices : Disks
e. Network and Storage Controllers
f. CPU, Memory and I/O interconnects
2. For each identified resource, define metrics for
utilisation, saturation and errors (whichever are
applicable)
a. CPU - Utilisation, run-queue length
b. Memory - Available/Free, paging/swapping
c. Network Interfaces - receive and transmit
throughput, maximum bandwidth
d. Storage - device busy %, wait queue length,
device errors
3. The above metrics can be averages per interval
or counts. Include instructions for fetching
each metric.
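The steps above can be sketched as a small checklist that is walked resource by resource. The `UseCheck` type and the assignment of each slide metric to a utilisation, saturation or errors column are assumptions of this sketch; `None` marks a cell the slide does not list. The instructions for fetching each metric are left out because they depend on the operating system and tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UseCheck:
    resource: str
    utilisation: Optional[str]
    saturation: Optional[str]
    errors: Optional[str]


# Metric names are taken from step 2; the column placement is an assumption.
CHECKLIST = [
    UseCheck("CPU", "utilisation %", "run-queue length", None),
    UseCheck("Memory", "available/free", "paging/swapping", None),
    UseCheck("Network interfaces", "rx/tx throughput vs maximum bandwidth", None, None),
    UseCheck("Storage", "device busy %", "wait queue length", "device errors"),
]


def render(checklist):
    """Return one line per defined metric, iterating over resources."""
    lines = []
    for check in checklist:
        for kind in ("utilisation", "saturation", "errors"):
            metric = getattr(check, kind)
            if metric is not None:
                lines.append(f"{check.resource}: {kind} -> {metric}")
    return lines


for line in render(CHECKLIST):
    print(line)
```

Keeping the list keyed by resource rather than by tool is what guards against the street light effect: an empty cell is visible as a gap instead of being silently skipped.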
USE Methodology :
Interpreting USE Metrics :
1. Utilisation - 100% utilisation is generally a sign
of a bottleneck. As utilisation increases,
queueing delays start building up.
2. Saturation - Any degree of saturation can be a
problem. Measured either as the length of a
queue or the wait time spent queued.
3. Errors - Non zero counters should be
investigated especially if they are increasing as
performance degrades.
Queueing Theory is a must know for performance
engineers. We briefly cover that later.
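The three interpretation rules above can be written down as a simple check per resource. This is a sketch only: the function name, argument names and message strings are hypothetical, and a real check would also look at trends over time.

```python
def interpret(utilisation, queue_length, error_count):
    """Return findings for one resource; utilisation is a fraction in [0, 1]."""
    findings = []
    if utilisation >= 1.0:
        findings.append("100% utilised: likely bottleneck")
    if queue_length > 0:
        findings.append("saturated: work is waiting in a queue")
    if error_count > 0:
        findings.append("non-zero errors: investigate")
    return findings


# A resource can be saturated well before it reaches 100% utilisation.
print(interpret(utilisation=0.7, queue_length=3, error_count=0))
```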
RED Methodology :
Measure 3 key metrics for all applications
● Rate - The number of requests per second your
applications are serving
● Errors - The number of failed requests per
second
● Duration - Distributions of the amount of time
each request takes.
Measuring these consistently across your
applications allows you to scale your monitoring
team. It also lays focus on the 80-20 principle.
Further drill down analysis can be performed using
code instrumentation, tracing, APM tools etc.
However these metrics and the dashboards serve as
the starting point for monitoring the performance of
your applications.
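A minimal sketch of computing the three RED metrics from raw request records follows. The record shape `(duration_seconds, succeeded)`, the window length and the sample data are assumptions; in practice these numbers usually come from a metrics library rather than from a list held in memory.

```python
from statistics import quantiles


def red_metrics(requests, window_seconds):
    """requests: list of (duration_seconds, succeeded) seen in the window."""
    durations = [duration for duration, _ in requests]
    failures = sum(1 for _, succeeded in requests if not succeeded)
    # 99 cut points; index 49 is the median, index 98 the 99th percentile.
    cuts = quantiles(durations, n=100, method="inclusive")
    return {
        "rate": len(requests) / window_seconds,    # requests per second
        "errors": failures / window_seconds,       # failed requests per second
        "p50": cuts[49],                           # duration distribution
        "p99": cuts[98],
    }


# Hypothetical 5 second window: 10 requests, the two slowest ones failed.
sample = [(i / 10, i <= 8) for i in range(1, 11)]
metrics = red_metrics(sample, window_seconds=5)
print(metrics)
```

Duration is reported as percentiles rather than a single average because the slide asks for the distribution of request times.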
RED Methodology :
Workload Characterisation : An important aspect is to
qualify and quantify the workload your system is
experiencing
● Who is causing the load : PID, Client ID, Remote
IP Address
● Why is the load being generated - Code path,
upstream and downstream system response
times/errors
● What are the load characteristics - Throughput,
IOPS, read heavy/write intensive
● How is the load changing over time
● Is the load a characteristic of the architecture
of the system or due to resource constraints
The intention of workload characterisation is to
eliminate unnecessary work. Remember the fastest
query is the one which is never run
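The "who" question above can be answered with a simple count per caller. The log format `(client_id, code_path)` and the sample entries below are hypothetical.

```python
from collections import Counter


def top_clients(request_log, n=3):
    """Return the n clients that issued the most requests, busiest first."""
    return Counter(client for client, _ in request_log).most_common(n)


# Hypothetical access log: (client id, code path).
log = [
    ("10.0.0.1", "/search"),
    ("10.0.0.2", "/search"),
    ("10.0.0.1", "/report"),
    ("10.0.0.1", "/search"),
    ("10.0.0.3", "/login"),
    ("10.0.0.2", "/report"),
]
print(top_clients(log, n=2))
```

Grouping by the second field instead gives the "why" view (which code paths generate the load).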
Queueing Theory
A broad field of study which focuses on how waiting
lines behave. It boils down to answering some basic
questions
● How likely is it that something will queue up
and wait ?
● How long will the queue be ?
● How long will the wait be ?
● How busy will the person/system serving
the line be ?
● How much capacity is needed to meet
expected demand ?
Queueing Theory: Introduction
Queueing theory is a branch of probability theory.
Queueing systems are non-linear. Behaviour changes
almost linearly until the system reaches a tipping
point, and then it transitions to much faster or slower
behaviour. There can be multiple such tipping points.
Queueing Systems: Real Life Examples
● Getting a coffee at Starbucks
● At the traffic light
● Computer CPU and disks
● Third in line for take off at the airport
● Load balancer
● Pre-ordering an iPhone
● Connection pooling
● Trying to pick the line at the grocery store
In general any system for handling units of
work is a queueing system. (Software teams as
well)
Why does Queueing happen ?
The obvious answer would be “There’s more work to
do than the current capacity to handle it”.
However, this is not correct. Queueing happens even
when there’s more than enough capacity to handle the
work.
Why would this be the case ? There are 3 reasons
● Irregular Arrivals - Arrivals don’t happen at
regular intervals. They are sometimes far apart
and sometimes closer together, so they
overlap.
● Irregular Job Sizes - Jobs don’t all take the
same time to complete. Some take much longer
and others shorter.
● Waste/Idle Time - Idle time can never be
recovered. Capacity that goes unused while the
system is idle is lost, because the system cannot
later work at more than 100% capacity to catch up.
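The effect of irregular arrivals and irregular job sizes can be seen in a small single-server simulation. This is a sketch under stated assumptions: exponentially distributed inter-arrival and service times, an arrival rate that is half the service rate (so the server is idle about half the time), and a fixed random seed chosen only for repeatability.

```python
import random


def mean_wait(arrival_rate, service_rate, jobs=50_000, seed=1):
    """Average time a job waits in the queue of a single FIFO server."""
    rng = random.Random(seed)
    arrival = 0.0          # arrival time of the current job
    server_free_at = 0.0   # time at which the server finishes its backlog
    total_wait = 0.0
    for _ in range(jobs):
        arrival += rng.expovariate(arrival_rate)         # irregular arrivals
        start = max(arrival, server_free_at)             # wait if server is busy
        total_wait += start - arrival
        server_free_at = start + rng.expovariate(service_rate)  # irregular sizes
    return total_wait / jobs


# Capacity is twice the demand, yet jobs still wait on average.
wait = mean_wait(arrival_rate=0.5, service_rate=1.0)
print(f"mean wait in queue: {wait:.2f} time units")
```

With perfectly regular arrivals and identical job sizes the same loop would report a wait of zero, since each job would finish before the next one arrived.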
Queueing Theory Fundamentals
There are 3 core concepts when one talks of
queueing systems. These are
1. Jobs/Tasks - The unit of work that a system is
supposed to perform
2. System/Servers - The things that perform the
work
3. Queues - Where the jobs/tasks wait when
the server/system is busy and cannot perform
work as it arrives.
Note : Queueing theory applies to stable systems. A
stable system is one which is able to keep up with the
incoming work, where all jobs eventually get
processed and the queue does not grow to infinity.
Queueing Theory: Important Metrics
● Arrival Rate (λ) - How often new jobs arrive at
the queue.
● Queue Length (Lq) - The average number of jobs
waiting in the queue.
● Wait Time (W) - How long a job has to
wait in the queue, on average.
● Service Time (S) - How long it takes to
service a job after it leaves the queue.
● Service Rate (µ) - The inverse of service time
(1/S).
● Residence Time (R) - How long jobs are in the
system (W + S).
● Utilisation (U) - The fraction of time the system
is busy performing jobs.
● Concurrency (L) - The number of jobs waiting
or being processed.
● Number of Servers (M) - The number of servers
available to process jobs.
Queueing Theory: Important Laws
Little’s Law : L = λR : The concurrency of a system is
the arrival rate times the residence time. Applied to
just the queue, it is stated as Lq = λW, where Lq is
the average number of jobs waiting in the queue.
Utilisation Law : U = λS or λ/µ (utilisation is the
arrival rate times the service time). If there are M
servers then the more general form is U = λS/M.
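Both laws are one-line calculations. The sketch below uses hypothetical numbers (100 requests per second, 200 ms residence time, 50 ms service time, 8 servers) purely to show the arithmetic.

```python
def concurrency(arrival_rate, residence_time):
    """Little's Law: L = lambda * R."""
    return arrival_rate * residence_time


def utilisation(arrival_rate, service_time, servers=1):
    """Utilisation Law: U = lambda * S / M."""
    return arrival_rate * service_time / servers


# Hypothetical service: 100 requests/s, each spending 0.2 s in the system.
in_flight = concurrency(arrival_rate=100, residence_time=0.2)

# The same load with 0.05 s of service time spread over 8 servers.
busy = utilisation(arrival_rate=100, service_time=0.05, servers=8)

print(f"jobs in the system: {in_flight:.0f}, utilisation: {busy:.1%}")
```

A result above 1.0 from `utilisation` means the system is not stable: work arrives faster than it can be served and the queue grows without bound.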
Kendall's Notation
A mechanism to classify different kinds of queueing
systems. Designated using a slash delimited notation
using numbers and letters which indicate the
following
1. How arrival rate behaves. This is the first letter.
2. How service times behave. This is the second
letter.
3. The number of servers. This is the third
number/letter
4. Special Parameters for cases where a system
rejects arrivals beyond capacity, has some
prioritisation logic for selecting jobs
etc.(Optional in a majority of cases)
Kendall's Notation
M - Stands for Markovian (memoryless). Implies jobs
arrive randomly and independently, with an average
arrival rate of λ. The arrivals form a Poisson process,
so the times between arrivals are exponentially
distributed with mean 1/λ.
G - Stands for General: no specific distribution is
assumed.
Common queueing systems are
M/M/1
M/M/c
M/G/1
M/G/c
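As a small illustration of how the notation is read, the sketch below splits a designation into its first three fields. The function and dictionary names are hypothetical, and only the two letters described above (M and G) are recognised; any optional fourth field is ignored.

```python
DISTRIBUTIONS = {
    "M": "Markovian (memoryless)",
    "G": "General",
}


def parse_kendall(notation):
    """Split e.g. 'M/G/1' into arrival process, service process and servers."""
    arrivals, service, servers = notation.split("/")[:3]
    return {
        "arrivals": DISTRIBUTIONS[arrivals],
        "service": DISTRIBUTIONS[service],
        "servers": servers,  # a number, or a letter such as "c" for any count
    }


for system in ("M/M/1", "M/M/c", "M/G/1", "M/G/c"):
    print(system, parse_kendall(system))
```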
Uses of Queueing Theory
Queueing Theory is mostly used to calculate wait
times and queue lengths. For an M/M/1 case, the
residence time for the job is
R = S/(1-U). (An approximation for c servers is
R ≈ S/(1-U^c).)
This is a hyperbolic (hockey stick) curve, which can
be validated by evaluating it with different values of
S and U. Some indicative values are shown below for
S = 1 min, with the arrival rate λ = U/S from the
Utilisation Law and L = λR from Little’s Law
S = 1 min, U = 50% => λ = 0.5 jobs/min, R = 2 min, L = 1
S = 1 min, U = 75% => λ = 0.75 jobs/min, R = 4 min, L = 3
S = 1 min, U = 90% => λ = 0.9 jobs/min, R = 10 min, L = 9
As utilisation approaches 100%, the residence time
and the concurrency grow without bound: each
additional step in utilisation costs more than the
previous one.
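The residence time formula above can be tabulated directly to see the hockey stick. This is a sketch: the function name is hypothetical, and the `servers` argument applies the S/(1-U^c) form quoted above, which is treated here as an approximation for more than one server.

```python
def residence_time(service_time, utilisation, servers=1):
    """R = S / (1 - U**c); the M/M/1 formula when servers == 1."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1) for a stable system")
    return service_time / (1 - utilisation ** servers)


# S = 1 minute, single server, rising utilisation.
for u in (0.50, 0.75, 0.90, 0.99):
    print(f"U = {u:.0%}  ->  R = {residence_time(1.0, u):.1f} min")
```

Going from 50% to 75% utilisation adds two minutes of residence time, while going from 90% to 99% adds ninety.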
Universal Scalability Law: Scalability
Wikipedia Definition
The capability of a system, network, or process to
handle a growing amount of work, or its potential to be
enlarged in order to accommodate that growth
In more mathematical terms, formally, we can say
that Scalability is an increasing function of load or
size.
Universal Scalability Law: Linear Scalability
There are no real linearly scalable systems. The real
test of linear scalability is whether the transactions
per second per node remain constant as the number
of nodes increases.
Example :
A system achieves 182k transactions per second (tps)
with 3 nodes and 449k tps with 12 nodes.
3 nodes : 60,700 tps per node
12 nodes : 37,400 tps per node
That is a drop of about 38% in throughput versus linear
scalability, which would give 728k tps for 12 nodes.
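The per-node test described above is simple arithmetic, shown here as a sketch using the two measurements from the example. The function name is hypothetical.

```python
def scaling_efficiency(base_nodes, base_tps, nodes, tps):
    """Measured throughput as a fraction of perfectly linear scaling."""
    linear_tps = base_tps / base_nodes * nodes
    return tps / linear_tps


per_node_3 = 182_000 / 3      # tps per node with 3 nodes
per_node_12 = 449_000 / 12    # tps per node with 12 nodes
efficiency = scaling_efficiency(3, 182_000, 12, 449_000)

print(f"3 nodes:  {per_node_3:,.0f} tps per node")
print(f"12 nodes: {per_node_12:,.0f} tps per node")
print(f"efficiency versus linear scaling: {efficiency:.1%}")
```

An efficiency of 1.0 would mean the throughput per node stayed constant as nodes were added.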
Universal Scalability Law: Linear Scalability
Real world systems exhibit some loss of efficiency
and in fact, beyond a certain size, the system can
exhibit retrograde performance (throughput falls as
more nodes are added).
Universal Scalability Law: Efficiency Loss
There are two reasons for efficiency loss
● Contention - Contention degrades scalability
because parts of the system cannot be
parallelized and lead to queuing.
● Crosstalk - Crosstalk introduces a penalty as
workers (threads, CPUs, etc) communicate to
share and synchronize mutable state
Universal Scalability Law: Perfect Linear Scaling
Perfect linear scaling can be represented as
X(N) = λN where λ = coefficient of performance (the
throughput at N = 1) and N = size (nodes/CPUs/threads
etc.). Hence if N doubles, the throughput of the
system doubles.
Universal Scalability Law: Amdahl’s Law
Amdahl’s Law states that the maximum speedup
possible is the reciprocal of the
non-parallelizable (serial) portion of the work.
X(N) = λN/(1 + σ(N − 1))
σ = coefficient of serialization
A system with serialization will asymptotically
approach a speedup limit of 1/σ. For example, with
σ = 0.05 the speedup can never exceed 20x, however
many workers are added.
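The speedup limit can be checked numerically. The sketch below evaluates the speedup X(N)/X(1) = N/(1 + σ(N − 1)) for σ = 0.05; the function name and the chosen values of N are illustrative.

```python
def amdahl_speedup(n, sigma):
    """Speedup over a single worker: N / (1 + sigma * (N - 1))."""
    return n / (1 + sigma * (n - 1))


SIGMA = 0.05  # 5% of the work is serial

for n in (1, 2, 20, 1_000, 1_000_000):
    print(f"N = {n:>9,}  speedup = {amdahl_speedup(n, SIGMA):.2f}")

print(f"limit 1/sigma = {1 / SIGMA:.0f}")
```

Twenty workers deliver only about half of the limit, and the last few units of speedup need orders of magnitude more workers.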
Universal Scalability Law:
Crosstalk happens between each pair of workers in
the system (CPUs, threads, nodes). This leads to
N(N − 1) interactions, since each of the N workers
talks to the other N − 1, as in a fully connected
graph. Introducing crosstalk in the equation leads to
X(N) = λN/(1 + σ(N − 1) + κN(N − 1))
The crosstalk penalty grows fast. Because it’s
quadratic eventually it grows faster than the linear
speedup of the ideal system we started with, no
matter how small κ is.
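A short sketch makes the retrograde region visible. The coefficients below (λ = 1000, σ = 0.05, κ = 0.001) are hypothetical values picked for illustration; real ones have to be fitted from measurements.

```python
def usl_throughput(n, lam, sigma, kappa):
    """X(N) = lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


LAM, SIGMA, KAPPA = 1000.0, 0.05, 0.001  # hypothetical coefficients

for n in (1, 10, 31, 100, 300):
    x = usl_throughput(n, LAM, SIGMA, KAPPA)
    print(f"N = {n:>3}  throughput = {x:,.0f}")
```

With these values throughput rises up to roughly 31 workers and then falls, even though the crosstalk coefficient is fifty times smaller than the serialization coefficient.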
Capacity planning : The size at which throughput peaks is
Nmax = Math.floor(Math.sqrt((1 − σ)/κ))
Refer to https://cran.r-project.org/web/packages/usl/ for
calculating the values for the coefficients.
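Setting the derivative of X(N) = λN/(1 + σ(N − 1) + κN(N − 1)) to zero gives a peak at N* = sqrt((1 − σ)/κ). The sketch below computes it for hypothetical coefficients (σ = 0.05, κ = 0.001). Because N* is rarely a whole number, the best integer size is either its floor or its ceiling; comparing the two is an addition of this sketch and not something the slide prescribes.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """X(N) = lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_size(sigma, kappa):
    """Integer N with the highest USL throughput (lam does not move the peak)."""
    n_star = math.sqrt((1 - sigma) / kappa)
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    return max(candidates, key=lambda n: usl_throughput(n, 1.0, sigma, kappa))


SIGMA, KAPPA = 0.05, 0.001  # hypothetical coefficients

n_floor = math.floor(math.sqrt((1 - SIGMA) / KAPPA))
print(f"floor of N*: {n_floor}, best integer size: {peak_size(SIGMA, KAPPA)}")
```

The curve is very flat near the peak, so the floor is a reasonable conservative estimate; adding workers beyond that point buys almost nothing and soon reduces throughput.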
References/Attribution:
https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
http://www.brendangregg.com/usemethod.html
https://en.wikipedia.org/wiki/Scientific_method
https://en.wikipedia.org/wiki/Neil_J._Gunther
https://www.amazon.com/Introduction-Queueing-Theory-Applications-Statistics/dp/0817684204
https://www.amazon.com/Fundamentals-Queueing-Theory-Donald-Gross/dp/047179127X

Contenu connexe

Similaire à Performance engineering methodologies

Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE MethodBrendan Gregg
 
QMRAS Project Presentation
QMRAS Project PresentationQMRAS Project Presentation
QMRAS Project PresentationGary Spencer
 
ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016Brendan Gregg
 
Progressive Duplicate Detection
Progressive Duplicate DetectionProgressive Duplicate Detection
Progressive Duplicate Detection1crore projects
 
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...Olivier Teytaud
 
Operating System
Operating SystemOperating System
Operating Systemcpjcollege
 
OPERATING SYSTEMS - INTRODUCTION
OPERATING SYSTEMS - INTRODUCTIONOPERATING SYSTEMS - INTRODUCTION
OPERATING SYSTEMS - INTRODUCTIONpriyasoundar
 
Unit 2 part 2(Process)
Unit 2 part 2(Process)Unit 2 part 2(Process)
Unit 2 part 2(Process)WajeehaBaig
 
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSFAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSMaurvi04
 
Data structures and algorithms Module-1.pdf
Data structures and algorithms Module-1.pdfData structures and algorithms Module-1.pdf
Data structures and algorithms Module-1.pdfDukeCalvin
 
Real time os(suga)
Real time os(suga) Real time os(suga)
Real time os(suga) Nagarajan
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine LearningSudarsun Santhiappan
 

Similaire à Performance engineering methodologies (20)

Operating System
Operating SystemOperating System
Operating System
 
Performance Analysis: The USE Method
Performance Analysis: The USE MethodPerformance Analysis: The USE Method
Performance Analysis: The USE Method
 
Software Performance
Software Performance Software Performance
Software Performance
 
QMRAS Project Presentation
QMRAS Project PresentationQMRAS Project Presentation
QMRAS Project Presentation
 
ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016
 
Bt0070
Bt0070Bt0070
Bt0070
 
Operating System.pptx
Operating System.pptxOperating System.pptx
Operating System.pptx
 
Progressive Duplicate Detection
Progressive Duplicate DetectionProgressive Duplicate Detection
Progressive Duplicate Detection
 
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...
 
Linux basics
Linux basicsLinux basics
Linux basics
 
Operating System
Operating SystemOperating System
Operating System
 
OPERATING SYSTEMS - INTRODUCTION
OPERATING SYSTEMS - INTRODUCTIONOPERATING SYSTEMS - INTRODUCTION
OPERATING SYSTEMS - INTRODUCTION
 
Unit 2 part 2(Process)
Unit 2 part 2(Process)Unit 2 part 2(Process)
Unit 2 part 2(Process)
 
TOC ConWIP Kanban
TOC ConWIP KanbanTOC ConWIP Kanban
TOC ConWIP Kanban
 
Os unit i
Os unit iOs unit i
Os unit i
 
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSFAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
 
Data structures and algorithms Module-1.pdf
Data structures and algorithms Module-1.pdfData structures and algorithms Module-1.pdf
Data structures and algorithms Module-1.pdf
 
Real time os(suga)
Real time os(suga) Real time os(suga)
Real time os(suga)
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
 

Dernier

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 

Dernier (20)

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 

Performance engineering methodologies

  • 2. About ● Software Developer/Architect ● Polyglot - C++, Java, Erlang, Haskell, Scala, Closure, Golang ● Interested in Distributed Systems and Scalability ● Speaker & Developer Advocate
  • 3. Agenda ● Widely Used Methodologies ● USE ● RED ● Queueing Theory ● Universal Scalability Law ● Questions
  • 4. Street Light Method ● Absence of a real methodology ● Observe using tools the user is familiar with ● The issue found may be an issue, but not the issue. ● Bias based on tool familiarity
  • 5. Random Change Method ● Pick a random variable to change ● Change it in one direction ● Observe the effect ● Change it in the other direction ● Observe the effect ● Were any observations better ? Keep those and repeat with some other variable.
  • 6. Blame Someone Else ● Find a system/component which you are not responsible for ● Blame that system/component ● When proven wrong, start over with step 1.
  • 7. Ad hoc Checklist ● Step through a canned checklist to rule out obvious culprits ● Handles known issues which have been observed in the past ● No way to figure out Unknowns ● Lists need to be updated to accommodate new findings
  • 8. Scientific Method Study the unknown by making hypotheses and then testing them using the following steps ● Question ● Hypothesise ● Prediction ● Test ● Analyse
  • 9. Scientific Method Examples Question : What is causing slow database queries ? Hypothesis : Certain periodic scripts running on the server are performing a large number of Disk I/O, impacting production query performance. Prediction : If file system I/O latency is observed while running a query, it will show the FS is responsible for the slow queries Test : Tracing database file system latency as a ratio of query latency shows < 5% time is spent waiting on the FS. Analysis : The FS and the disks are not responsible for slow queries.
  • 10. Scientific Method ● Might not find a solution immediately, but rules out a large part of the system. ● A new hypothesis can be developed and the steps repeated. ● Can also be used for negative tests which progressively make the system worse, to learn more about the target system.
  • 11. USE and RED : Different sides of the same coin ? Perspectives : Two perspectives of system performance based on the audience, metrics or approaches. ● Resource Analysis - Bottom up (USE) ● Workload Analysis - Top Down (RED)
  • 12. USE and RED : Resource Analysis Analysis of system resources : CPUs, disks, network interfaces, interconnects, buses etc. The focus of resource analysis is on measures of utilisation, to identify when resources are approaching their limits. Measurements of resources typically focus on ● IOPS ● Utilisation ● Saturation ● Throughput
  • 13. USE and RED : Workload Analysis Analysis of applications : How is the system responding to the applied load(workload). Measurements of workloads typically focus on ● Requests/Time - Workload applied to the system ● Latency : The time it takes the system to respond ● Completion - The completed requests and the error counts/percentages
  • 14. USE Methodology : Used for Resource Analysis : For every resource, check utilisation, saturation and errors. Resource : All physical server components (CPUs, disks, buses, network interfaces etc.) Utilisation : For a given time interval, the amount of time the resource was busy performing work. Saturation : The degree to which a resource has extra work which it cannot service, often waiting in a queue. Errors : The count/percentage of errors.
  • 15. USE Methodology : ● Involves iterating over resources rather than tools/metrics. ● Directs analysis to a limited number of key metrics ● Check all system resources as quickly as possible. This avoids the street light effect.
  • 17. USE Methodology : Applying USE : 1. Identify Resources a. CPU - Cores, Sockets, vCPUs b. Main memory - DRAM c. Networking interfaces : Ethernet ports d. Storage Devices : Disks e. Network and Storage Controllers f. CPU, Memory and I/O interconnects 2. For each identified resource, define metrics for utilisation, saturation and errors (whichever are applicable) a. CPU - Utilisation, run-queue length b. Memory - Available/Free, paging/swapping c. Network Interfaces - receive and transmit throughput, maximum bandwidth d. Storage - device busy %, wait queue length, device errors 3. The above metrics can be averages per interval or counts. Include instructions for fetching each metric.
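The steps above can be sketched as a small checklist structure, purely as an illustration. The `UseReading` class, the `use_findings` function, the sample readings and the 70% utilisation threshold are all assumptions made for this sketch; they are not taken from the slides, and real thresholds depend on the resource.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UseReading:
    """One row of a hypothetical USE checklist for a single resource."""
    resource: str
    utilisation: Optional[float] = None  # fraction of the interval spent busy (0.0 to 1.0)
    saturation: Optional[float] = None   # queue length or wait time; 0 means nothing queued
    errors: Optional[int] = None         # error count over the interval


def use_findings(readings: List[UseReading], utilisation_limit: float = 0.7) -> List[str]:
    """Iterate over resources (not tools) and flag anything worth a closer look."""
    findings = []
    for r in readings:
        # The 70% default is an assumed threshold, not a universal rule.
        if r.utilisation is not None and r.utilisation >= utilisation_limit:
            findings.append(f"{r.resource}: utilisation {r.utilisation:.0%}")
        # Any degree of saturation can be a problem.
        if r.saturation is not None and r.saturation > 0:
            findings.append(f"{r.resource}: saturation {r.saturation:g}")
        # Non-zero error counters should be investigated.
        if r.errors:
            findings.append(f"{r.resource}: errors {r.errors}")
    return findings


# Made-up readings, for illustration only.
sample = [
    UseReading("cpu", utilisation=0.95, saturation=4, errors=0),
    UseReading("disk", utilisation=0.30, saturation=0, errors=2),
    UseReading("memory", utilisation=0.50, saturation=0),
]
for line in use_findings(sample):
    print(line)
```

Metrics that do not apply to a resource are left as `None`, which mirrors the "whichever are applicable" wording in step 2.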
  • 18. USE Methodology : Interpreting USE Metrics : 1. Utilisation - 100% utilisation is generally a sign of a bottleneck . As utilisation increases, queueing delays start building up. 2. Saturation - Any degree of saturation can be a problem. Measured either as the length of a queue or the wait time spent queued. 3. Errors - Non zero counters should be investigated especially if they are increasing as performance degrades. Queueing Theory is a must know for performance engineers. We briefly cover that later.
  • 19. RED Methodology : Measure 3 key metrics for all applications ● Rate - The number of requests per second your applications are serving ● Errors - The number of failed requests per second ● Duration - Distributions of the amount of time each request takes. Measuring these consistently across your applications allows you to scale your monitoring team. It also lays focus on the 80-20 principle. Further drill down analysis can be performed using code instrumentation, tracing, APM tools etc. However these metrics and the dashboards serve as the starting point for monitoring the performance of your applications.
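As a rough sketch of how the three RED metrics could be derived from raw request records: the `Request` type, the `red_summary` function and the nearest-rank percentile are assumptions chosen for this example, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float
    failed: bool


def percentile(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    rank = max(1, (p * len(sorted_values) + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_values[rank - 1]


def red_summary(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    """Rate, Errors and Duration for a non-empty list of requests seen in one window."""
    durations = sorted(r.duration_ms for r in requests)
    return {
        "rate_per_s": len(requests) / window_seconds,                      # Rate
        "errors_per_s": sum(r.failed for r in requests) / window_seconds,  # Errors
        "p50_ms": percentile(durations, 50),                               # Duration (median)
        "p99_ms": percentile(durations, 99),                               # Duration (tail)
    }


# Ten made-up requests observed over a 5 second window; the slowest one failed.
requests = [Request(duration_ms=10.0 * i, failed=(i == 10)) for i in range(1, 11)]
print(red_summary(requests, window_seconds=5.0))
```

Duration is reported as percentiles rather than a single average because the slide asks for distributions; an average would hide the slow tail.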
  • 21. RED Methodology : Workload Characterisation : An important aspect is to qualify and quantify the workload your system is experiencing ● Who is causing the load : PID, Client ID, Remote IP Address ● Why is the load being called : Code path, upstream and downstream system response times/errors ● What are the load characteristics : Throughput, IOPS, read heavy/write intensive ● How is the load changing over time ● Is the load a characteristic of the architecture of the system or due to resource constraints The intention of workload characterisation is to eliminate unnecessary work. Remember, the fastest query is the one which is never run.
  • 22. Queueing Theory Broad field of study which focuses on the nuances of how waiting lines behave. Boils down to answering some basic questions ● How likely is it that something will queue up and wait ? ● How long will the queue be ? ● How long will be the wait ? ● How busy will be the person/system serving the line ? ● How much capacity is needed to meet expected demand
  • 23. Queueing Theory: Introduction Queueing theory is part of a wider area of mathematics, probability theory. Queueing systems are non-linear. Behaviour changes almost linearly until it reaches a tipping point, and then transitions to a much faster or slower regime. There can be multiple such tipping points.
  • 24. Queueing Systems: Real Life Examples ● Getting a coffee at Starbucks ● At the traffic light ● Computer CPU and disks ● Third in line for take off at the airport ● Load balancer ● Pre-ordering an iPhone ● Connection pooling ● Trying to pick the line at the grocery store In general any system for handling units of work is a queueing system. (Software teams as well)
  • 25. Why does Queueing happen ? The obvious answer would be “There’s more work to do than the current capacity to handle it”. However this is not correct. Queueing happens even when there’s more than enough capacity to handle the work. Why would this be the case ? There are 3 reasons ● Irregular Arrivals - Arrivals don’t happen at regular intervals. They are sometimes far apart and sometimes closer together, so they overlap. ● Irregular Job Sizes - Jobs don’t complete in the same time. Some take much longer and others shorter. ● Waste/Idle Time - Idle time can never be recovered. A system cannot work at more than 100% capacity later to make up for the time it sat idle.
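The effect of irregular arrivals and job sizes can be checked with a small simulation. This is a minimal sketch, assuming a single FIFO server and exponentially distributed gaps and job sizes for the irregular case; `mean_wait` is a name invented here. Both runs have the same averages (one job every 2 time units, 1 time unit of work per job, so 50% utilisation), and only the irregular run builds a queue.

```python
import random


def mean_wait(gaps, service_times):
    """Average time jobs spend queued at a single FIFO server.

    gaps[i] is the time between the arrival of job i and job i + 1.
    """
    wait = 0.0   # the first job finds an empty system
    total = 0.0
    for gap, service in zip(gaps, service_times):
        total += wait
        # Lindley recursion: the next job waits for whatever work is still unfinished.
        wait = max(0.0, wait + service - gap)
    return total / len(service_times)


n = 10_000
rng = random.Random(42)  # fixed seed so repeated runs give the same numbers

# Perfectly regular arrivals and job sizes: nothing ever overlaps.
regular = mean_wait([2.0] * n, [1.0] * n)

# Same average gap (2.0) and average job size (1.0), but irregular.
irregular = mean_wait(
    [rng.expovariate(1 / 2.0) for _ in range(n)],
    [rng.expovariate(1 / 1.0) for _ in range(n)],
)

print(f"regular arrivals:   mean wait = {regular:.2f}")
print(f"irregular arrivals: mean wait = {irregular:.2f}")
```

The server is idle half of the time in both runs, but in the irregular run that idle time falls in the wrong places and cannot be used to absorb the bursts.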
  • 26. Queueing Theory Fundamentals There are 3 core concepts when one talks of queueing systems. These are 1. Jobs/Tasks - The unit of work that a system is supposed to perform 2. System/Servers - The things that perform the work 3. Queues - Where the jobs/tasks wait when the server/system is busy and cannot perform work as it arrives. Note : Queueing theory applies to stable systems. A stable system is one which is able to keep up with the incoming work, where all jobs eventually get processed and the queue does not grow to infinity.
  • 27. Queueing Theory: Important Metrics ● Arrival Rate (λ) - How often new jobs arrive at the front of the queue. ● Queue Length (Lq) - The average number of jobs waiting in the queue. ● Wait Time (W) - How long does the job have to wait in the queue, on average. ● Service Time (S) - How long does it take to service a job after it leaves the queue. ● Service Rate (µ) - The inverse of service time ● Residence Time (R) - How long jobs are in the system (W+S). ● Utilisation (U) - The fraction of time the system is busy performing jobs. ● Concurrency (L) - The number of jobs waiting or being processed. ● Number of Servers (M) - The number of servers available to process jobs.
  • 28. Queueing Theory: Important Laws Little’s Law : L = λR : The concurrency of a system is the arrival rate times the residence time. When talking about just the queue, it is stated as Lq = λW. Utilisation Law : U = λS or λ/µ (utilisation is the arrival rate times the service time). If there are M servers then the more general form is U = λS/M.
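Both laws are one-line formulas, so they are easy to sanity check in code. The function names below are made up for this sketch, and the inputs are illustrative numbers rather than measurements.

```python
def littles_law(arrival_rate: float, residence_time: float) -> float:
    """Concurrency L = arrival rate * residence time (same time unit for both)."""
    return arrival_rate * residence_time


def utilisation(arrival_rate: float, service_time: float, servers: int = 1) -> float:
    """Utilisation U = arrival rate * service time / number of servers."""
    return arrival_rate * service_time / servers


# 5 jobs/min arriving, each spending 2 min in the system: 10 jobs in flight on average.
print(littles_law(arrival_rate=5, residence_time=2))

# 5 jobs/min with 1 min of service each keeps 10 servers 50% busy.
print(utilisation(arrival_rate=5, service_time=1.0, servers=10))
```

A utilisation of 1.0 or more from the second function means the system is not stable, so the laws on this slide no longer describe it.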
  • 29. Kendall's Notation A mechanism to classify different kinds of queueing systems. Designated using a slash delimited notation using numbers and letters which indicate the following 1. How arrival rate behaves. This is the first letter. 2. How service times behave. This is the second letter. 3. The number of servers. This is the third number/letter 4. Special Parameters for cases where a system rejects arrivals beyond capacity, has some prioritisation logic for selecting jobs etc.(Optional in a majority of cases)
  • 30. Kendall's Notation M - Stands for Markovian (Memoryless). Implies jobs arrive randomly and independently, with an average arrival rate of λ. The arrival events form a Poisson process, so the times between arrivals are exponentially distributed with mean 1/λ. G - Stands for General, meaning no specific distribution is assumed. Common queueing systems are M/M/1 M/M/c M/G/1 M/G/c
  • 31. Uses of Queueing Theory Queueing Theory is mostly used to calculate wait times and queue lengths. For an M/M/1 case, the residence time for the job is R = S/(1-U). (An approximation for c servers is R = S/(1-U^c).) This is a non-linear "hockey stick" curve, which can be validated by evaluating it with different values of S and U. Some indicative values for S = 1 min, where the Utilisation Law gives an arrival rate of λ = U/S, are shown below ● U = 50% => λ = 0.5 jobs/min, R = 2 min, L = 1 (from Little’s Law) ● U = 75% => λ = 0.75 jobs/min, R = 4 min, L = 3 ● U = 90% => λ = 0.9 jobs/min, R = 10 min, L = 9 As utilisation approaches 100%, the residence time and the concurrency grow without bound.
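The residence time formula above can be evaluated directly to see the hockey stick. This is a minimal sketch; `residence_time` is a name chosen here, and the `servers` parameter uses the S/(1-U^c) form quoted on the slide.

```python
def residence_time(service_time: float, utilisation: float, servers: int = 1) -> float:
    """R = S / (1 - U^c); with servers=1 this is the M/M/1 result R = S / (1 - U)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1) for a stable system")
    return service_time / (1 - utilisation ** servers)


# Service time fixed at 1 minute; only the utilisation changes.
for u in (0.50, 0.75, 0.90, 0.95, 0.99):
    print(f"U = {u:.0%}  R = {residence_time(1.0, u):.1f} min")
```

Going from 50% to 75% utilisation doubles the residence time, and every further step towards 100% costs more than the one before it.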
  • 32. Universal Scalability Law: Scalability Wikipedia Definition The capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth In more mathematical terms, formally, we can say that Scalability is an increasing function of load or size.
  • 33. Universal Scalability Law: Linear Scalability There are no real linearly scalable systems. The real test of linear scalability is whether the transactions per second per node remain constant as the number of nodes increases. Example : A system achieves 182k transactions per second for 3 nodes and 449k transactions per second for 12 nodes. ● 3 nodes : about 60,700 tps per node ● 12 nodes : about 37,400 tps per node This is a 38% drop in per-node throughput versus linear scalability. If the system scaled linearly, it would reach 728k tps for 12 nodes.
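The arithmetic in this example can be reproduced in a few lines. The function names are invented for this sketch; the inputs are the figures quoted on the slide.

```python
def linear_projection(base_nodes: int, base_tps: float, nodes: int) -> float:
    """Throughput the larger cluster would reach if per-node throughput stayed constant."""
    return base_tps / base_nodes * nodes


def scaling_efficiency(base_nodes: int, base_tps: float, nodes: int, tps: float) -> float:
    """Measured throughput as a fraction of the linear projection."""
    return tps / linear_projection(base_nodes, base_tps, nodes)


projected = linear_projection(3, 182_000, 12)
efficiency = scaling_efficiency(3, 182_000, 12, 449_000)

print(f"per node at 3 nodes:  {182_000 / 3:,.0f} tps")
print(f"per node at 12 nodes: {449_000 / 12:,.0f} tps")
print(f"linear projection:    {projected:,.0f} tps")
print(f"efficiency:           {efficiency:.1%}")
```

One minus the efficiency is the drop relative to linear scaling.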
  • 34. Universal Scalability Law: Linear Scalability Real world systems exhibit some loss of efficiency and in fact over a certain number, the system can exhibit retrograde performance.
  • 35. Universal Scalability Law: Efficiency Loss There are two reasons for efficiency loss ● Contention - Contention degrades scalability because parts of the system cannot be parallelized and lead to queuing. ● Crosstalk - Crosstalk introduces a penalty as workers (threads, CPUs, etc) communicate to share and synchronize mutable state
  • 36. Universal Scalability Law: Perfect Linear Scaling Perfect linear scaling can be represented as X(N) = λN where X = throughput, λ = coefficient of performance (the throughput at N = 1), N = Size (Nodes/CPUs/threads etc). Hence if N doubles, the throughput of the system should double.
  • 37. Universal Scalability Law: Amdahl’s Law Amdahl’s Law states that the maximum speedup possible is the reciprocal of the non-parallelizable (serial) portion of the work. X(N) = λN/(1 + σ(N − 1)) where σ = coefficient of serialization. A system with serialization asymptotically approaches a speedup limit of 1/σ, for example 20x for σ = 0.05.
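A short sketch of the formula above shows the ceiling that serialization imposes. The function name and the choice of λ = 1 (so that throughput reads directly as speedup) are assumptions made for this example; σ = 0.05 is the value from the slide.

```python
def amdahl_throughput(n: int, sigma: float, lam: float = 1.0) -> float:
    """X(N) = lam * N / (1 + sigma * (N - 1))."""
    return lam * n / (1 + sigma * (n - 1))


sigma = 0.05
for n in (1, 2, 8, 32, 128, 1024):
    print(f"N = {n:5d}  speedup = {amdahl_throughput(n, sigma):.2f}")

# The curve flattens towards 1 / sigma and never crosses it.
print(f"limit = {1 / sigma:.0f}")
```

With 5% of the work serialized, no amount of added hardware can push the speedup past 20.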
  • 38. Universal Scalability Law: Crosstalk Crosstalk happens between each pair of workers in the system (CPUs, threads, nodes). This leads to N(N − 1) interactions (the number of directed edges in a fully connected graph of N nodes). Introducing crosstalk in the equation leads to X(N) = λN/(1 + σ(N − 1) + κN(N − 1)) The crosstalk penalty grows fast. Because it is quadratic, it eventually grows faster than the linear speedup of the ideal system we started with, no matter how small κ is. Capacity planning : throughput peaks at Nmax = Math.floor(Math.sqrt((1 − σ)/κ)) Refer to https://cran.r-project.org/web/packages/usl/ for calculating the values for the coefficients.
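The USL curve and its peak can be sketched as follows. The coefficient values σ = 0.05 and κ = 0.001 are made-up illustrations, not fitted to any real system, and the function names are chosen for this example. The peak is taken as the floor of sqrt((1 − σ)/κ), the point where the continuous USL curve stops rising; it requires κ > 0.

```python
import math


def usl_throughput(n: int, sigma: float, kappa: float, lam: float = 1.0) -> float:
    """X(N) = lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak(sigma: float, kappa: float) -> int:
    """Size at which throughput peaks; adding workers beyond this makes things worse."""
    return math.floor(math.sqrt((1 - sigma) / kappa))


sigma, kappa = 0.05, 0.001  # illustrative coefficients only
n_max = usl_peak(sigma, kappa)

for n in (1, 8, 16, n_max, 64, 128):
    print(f"N = {n:4d}  X = {usl_throughput(n, sigma, kappa):.2f}")
print(f"Nmax = {n_max}")
```

Throughput past `n_max` falls again, which is the retrograde behaviour described earlier. Because the continuous peak rarely lands on a whole number, the best integer size may be `n_max` or `n_max + 1`.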