Web performance is good, understanding performance is better.
What you need to understand to run IT systems that perform well at a reasonable cost.
1. Performance is good,
Understanding performance is better
Peter HJ van Eijk
Chairman NLCMG
A non-profit community of professionals
Feb 11, 2012
2. CMG 101
Computer Cloud Measurement Group
Understand:
• Definitions of availability and response time
• Psychological and business effects of delay and response time: user interfaces, cost of downtime
• Transactions, and their structure.
• Waterfall diagrams for transactions and web page downloads
• Performance measures (seconds, bytes, bits per second, IOPS, etc.).
• Reporting measures / metrics.
• How to visualize quantitative data
• Resources (CPU, memory, disk, network, software)
• Elementary queuing theory
• Phases in development and how to incorporate performance and capacity
(analysis, design, etc.), performance engineering
• Typical free and commercial tools, or at least their functionality
– monitoring, reporting, alerting, analysis, modelling
3. Availability and Response Time
• Availability: Ability of a Configuration Item or IT Service to perform its agreed Function when required. […] Availability is usually calculated as a percentage.
• Response Time: A measure of the time taken to complete an Operation or Transaction
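As a worked illustration of "calculated as a percentage", the sketch below uses the common convention of dividing uptime by agreed service time. The function names and the sample figures are assumptions for illustration, not taken from the slides.

```python
def availability_pct(agreed_service_hours: float, downtime_hours: float) -> float:
    """Availability as a percentage of the agreed service time."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return 100.0 * (agreed_service_hours - downtime_hours) / agreed_service_hours


def allowed_downtime_hours(agreed_service_hours: float, target_pct: float) -> float:
    """Downtime budget implied by an availability target."""
    return agreed_service_hours * (1.0 - target_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical month of 720 agreed service hours with 7.2 hours of downtime.
    print(f"availability: {availability_pct(720, 7.2):.2f}%")
    # Hypothetical 99.9% target over a year of 8760 hours.
    print(f"downtime budget: {allowed_downtime_hours(8760, 99.9):.2f} h")
```

Only downtime inside the agreed service window counts here, which follows the "agreed Function when required" wording of the definition.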
8. Transactions and their structure
waterfall diagrams
A single user-level transaction decomposes into multiple transactions on components.
[Figure: client/server sequence diagram. Query, network latency, Ack, server turnaround time, Reply, Ack. Inset: YSlow detail.]
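The decomposition on this slide can be sketched as a small text waterfall: each component step becomes a span with a start offset and a duration, and the user-level response time is the span from first start to last end. The step names follow the diagram; the durations and the `Span`, `build_waterfall`, and `render` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_ms: float
    duration_ms: float

    @property
    def end_ms(self) -> float:
        return self.start_ms + self.duration_ms


def build_waterfall(steps):
    """Lay sequential (name, duration_ms) steps end to end."""
    spans, clock = [], 0.0
    for name, duration in steps:
        spans.append(Span(name, clock, duration))
        clock += duration
    return spans


def response_time_ms(spans) -> float:
    """User-level response time: first start to last end."""
    return max(s.end_ms for s in spans) - min(s.start_ms for s in spans)


def render(spans, ms_per_char: float = 5.0) -> str:
    """Draw one bar per span, offset by its start time."""
    lines = []
    for s in spans:
        pad = " " * round(s.start_ms / ms_per_char)
        bar = "#" * max(1, round(s.duration_ms / ms_per_char))
        lines.append(f"{s.name:<26}|{pad}{bar} {s.duration_ms:.0f} ms")
    return "\n".join(lines)


if __name__ == "__main__":
    # Durations are made-up values for illustration only.
    spans = build_waterfall([
        ("query (network latency)", 25.0),
        ("server turnaround time", 40.0),
        ("reply (network latency)", 25.0),
    ])
    print(render(spans))
    print(f"response time: {response_time_ms(spans):.0f} ms")
```

Real page downloads overlap many such transactions, so a fuller model would allow spans to start in parallel rather than strictly one after another.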
10. How to diagnose a problem,
where to look? Resource = capacity
[Figure: end-to-end chain of resources, with example breakdowns. Labels: users, (test) client, WAN link, router/switch (CPE), firewall, proxy, load balancer, LAN switches, HTTP front end, application, MySQL DB, NAS, SAN, network lines, server network.]
11. Resource contribution to response time,
modeling different resource allocations
Modelling the effect of different network bandwidths on response time
Excessive client/server chatter leads to a user interaction time of more than 7 minutes!
[Figure: bar chart of response time in seconds (axis 0 to 500) for 64K, 256K, ICTRO 2Mb, and GBO, split into server time, client time, network time (delay), and network time (bandwidth). Based on a 50 ms round trip on the WAN.]
After the employee numbers are queried (there are 373 Janssens), the employment details are requested one at a time (612 in total). On the GBO LAN this leads to a turnaround time of 30 seconds (measured).
How much faster will this be with:
• Very fast network?
• Very fast client?
• Very fast server?
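A minimal sketch of the kind of model behind this slide: total interaction time as the sum of server time, client time, round-trip delay, and transfer time. The 612 exchanges and the 50 ms WAN round trip come from the slide; the 2000 bytes per exchange, the zero server and client times, and the function name are assumptions for illustration, so the printed numbers will not match the original chart.

```python
def interaction_time_s(round_trips: int, rtt_s: float, bytes_per_exchange: int,
                       bandwidth_bps: float, server_s: float = 0.0,
                       client_s: float = 0.0) -> dict:
    """Split a chatty interaction into its resource contributions (seconds)."""
    network_delay = round_trips * rtt_s
    network_bandwidth = round_trips * bytes_per_exchange * 8 / bandwidth_bps
    total = server_s + client_s + network_delay + network_bandwidth
    return {
        "server": server_s,
        "client": client_s,
        "network_delay": network_delay,
        "network_bandwidth": network_bandwidth,
        "total": total,
    }


if __name__ == "__main__":
    # 612 exchanges and 50 ms round trip are from the slide;
    # 2000 bytes per exchange is a hypothetical payload size.
    for label, bps in [("64K", 64_000), ("256K", 256_000), ("2Mb", 2_000_000)]:
        parts = interaction_time_s(612, 0.050, 2000, bps)
        print(f"{label:>5}: delay {parts['network_delay']:.1f} s, "
              f"bandwidth {parts['network_bandwidth']:.1f} s, "
              f"total {parts['total']:.1f} s")
```

In this model the delay term depends only on the number of round trips, so a faster link shrinks the bandwidth term but leaves the delay term unchanged. Reducing the chatter is the only way to reduce it.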
12. Queuing theory
Response depends on capacity
At higher loads, congestion can set in
[Figure: delay factor (0 to 12) against utilisation (10% to 90%), with the sweet spot marked.]
[Figure: actual throughput against traffic load, showing the perfect line, the sweet spot, and congestion.]
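The shape of the delay-factor curve can be reproduced with the simplest textbook model. The sketch below assumes a single-server M/M/1 queue, where response time is service time multiplied by 1 / (1 - utilisation). The slides do not say which model produced the chart, so this is an illustration rather than the author's calculation.

```python
def delay_factor(utilisation: float) -> float:
    """Response time divided by service time for an M/M/1 queue (assumed model)."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return 1.0 / (1.0 - utilisation)


if __name__ == "__main__":
    for pct in range(10, 100, 10):
        factor = delay_factor(pct / 100)
        print(f"{pct:>3}% {factor:6.2f} " + "#" * round(factor * 4))
```

Under this assumption the factor doubles at 50% utilisation and reaches ten at 90%, which is why small increases in load near saturation produce large increases in response time.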
13. So what was the bottleneck?
• KNMI: static page served from database, 1000/sec
• Ministry: very chatty client/server interaction
• DNB: JSP application server serves static content
• Anne Frank: many, large digital assets, no use of CDN
• Hospital information system: client (front-end) code
15. Typical free and commercial tools
and their functionality
Functionality
• Monitoring
• Reporting
• Alerting
• Analysis
• Modelling
• Etc …
Example tools
• Nagios
• Cacti
• WatchMouse
• PDQ
• R
• Yslow
• …
16. CMG 101
• We want to develop a ‘standard’ body of
knowledge
– To educate our people
– To speak more of the same language
– To enable tool vendors to express their offerings more easily
• Note: defining what is in the course is not the
same as developing a course
17. Call for Action
• Want to know more?
• Want to collaborate, contribute?
• Want to get a course?
• Want to sponsor?
• Talk to me
Peter HJ van Eijk
@petersgriddle
inbox@peterhjvaneijk.nl
+31 2268 4939
www.nlcmg.nl
NLCMG is a chapter of CMG.org
18. Some of my performance projects
• KNMI (Weather service): website meltdown after weather emergency (“weeralarm”)
• DNB (Dutch Banks Authority): website meltdown during 2008 financial crisis
• Unnamed Ministry: information system with multi-minute response times
• Crisis.nl: ….
• Anne Frank website: … anticipated surge after major redesign
• Hospital information system: storage sizing