3. Synopsis
How to make sure a website can survive go-live and cope with ever increasing
traffic and amounts of data: knowing what to measure and log, during both
development and production phases; load testing; identifying bottlenecks;
preventing disasters
PRESENTER: GAETANO GIUNTA 6/9/2012 SLIDE 3
4. Table of contents
The workshop is comprised of 2 parts:
• Theory
• As you might guess, it’s all about slides
• Can we skip this or do you want it really detailed? Raise hands!
• Practice
• Part 1: load testing
• Part 2: performance logging
Requirements:
• a working eZ Publish 4 installation on Linux (Debian/Ubuntu preferred)
[a VirtualBox image is available if you don’t have this]
• Shell access, root access
• Internet access
• LibreOffice (or any other spreadsheet software)
6. GOALS
Finishing the workshop early to go bathing does not count ;-)
1. A scalable web site
• It is impossible to go for infinite scalability
• Expected traffic figures should ideally be known beforehand
• If not, a round of load testing before go-live is highly recommended
2. “Fast enough” pages
• The definition of “enough” has to be agreed upon: for a webshop the threshold is smaller than for
an institutional site
• Page load times experienced by the user depend on user bandwidth as well as
html/js optimization (but that takes a dedicated workshop of its own)
Scaling matters more than faster pages
• Typical developer mistake: testing pages on one’s own laptop (concurrency = 1)
• The fast page becomes extremely slow when concurrency increases
• If traffic never increases, your career as a web developer is on the wrong path
7. What is “scalable” anyway?
Lies, damn lies and statistics (W. Churchill)
• Developers measure traffic in PVs (page views per second/minute/hour/day)
• It is easy to relate to server load
• PV != Hits
• But serving static content should never pose a problem anyway
• Customers measure traffic in concurrent users
• It is a good idea to agree on metrics when defining goals
• Analytics packages generally measure user session length and average page
impressions per session => average page views per second per user
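The conversion from analytics-style metrics to PVs can be sketched with made-up figures (all numbers below are hypothetical, for illustration only):

```shell
# Hypothetical analytics figures: 200 concurrent users, 8 page
# impressions per session, average session length of 240 seconds
USERS=200; IMPRESSIONS=8; SESSION_SECS=240
# PVs = max concurrent users x avg session impressions / avg session length
PVS=$(( USERS * IMPRESSIONS / SESSION_SECS ))
echo "$PVS page views/second"    # prints: 6 page views/second
```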
8. Math!
Back-of-the-napkin type
A few useful formulas:
• Apache MaxClients x max memory for a web page = server memory - OS memory
(assuming you are not running other stuff on the webserver, which you shouldn’t)
• Max PVs = Ap. MaxClients x 60 / ( page generation + delivery time )
• PVs = Max concurrent users x Avg user sess. impressions / Avg user sess. length
• Tips
• User session != webserver process
• Using a reverse proxy is almost always a good idea
• Apache processes never release memory until they are recycled
• Clients with low bandwidth keep a webserver process occupied for a long time; R.P. acts as
“buffer”
• Avoiding server swapping gives better results under peak traffic
• Limiting traffic at the webserver preserves the rest of the server farm from meltdown
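As a back-of-the-napkin sketch, the formulas above can be applied with assumed numbers (4 GB server, 1 GB for the OS, 60 MB peak per Apache process, 0.5 s per page; all figures are hypothetical):

```shell
# Hypothetical server: 4 GB RAM, 1 GB reserved for the OS,
# each Apache+PHP process peaks at 60 MB while rendering a page
SERVER_MB=4096; OS_MB=1024; PAGE_MB=60
# MaxClients x max memory per page = server memory - OS memory
MAXCLIENTS=$(( (SERVER_MB - OS_MB) / PAGE_MB ))
echo "MaxClients: $MAXCLIENTS"            # prints: MaxClients: 51

# Max PVs = MaxClients x 60 / (page generation + delivery time)
# assuming 0.5 s per page, i.e. 2 pages per second per process
MAXPV=$(( MAXCLIENTS * 60 * 2 ))
echo "Max page views/minute: $MAXPV"      # prints: Max page views/minute: 6120
```

Note how a reverse proxy improves the picture: by absorbing slow clients it shrinks the effective "delivery time" in the second formula.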
9. Nosce te ipsum
WE NEED TO MEASURE RESOURCE CONSUMPTION
TO FIND AND REMOVE SCALABILITY BOTTLENECKS
• Many resources are involved in serving web pages
• Network
• Server hardware
• OS
• Apache / PHP / APC / eZ Publish
• Database
• Solr / external services / more…
• Scalability is determined by the most scarce resource (bottleneck)
• …which is generally not known beforehand
• Improving response time for a resource which is not loaded can have the
perverse effect of overloading the bottleneck resource and actually decrease
performances!
• eZ Publish does many things “behind the back” of the developer
• developers suck anyway*
10. The art of measurement, I
What to measure
The more you measure, the slower the system (Heisenberg principle)
The more you measure, the harder it is to grasp the overall system state
For eZ Publish applications, start with:
• Ram, CPU, IO (disk), DB requests
• Can be measured either globally on the server (BLACK BOX) or “per page” (WHITE BOX)
• “per page” numbers will usually not vary between environments
• Time taken to generate web pages
• will vary depending on many factors (dev != prod)
• Split between the time eZ does “computation” and access to external resources
• Other?
• Number of active user sessions
11. Information overload
Can you spot the problem?
Oracle AWR report (this is just the «summary»; the full report goes on for 10 pages)
[Screenshot comparison: debug output up to eZ 4.6 vs. eZ 4.7 and later]
12. The art of measurement, II
When to measure
1. During development - to avoid nasty surprises when it’s too late
• measured data should be easily understandable by the developer
• in fact it should always be right in the developer’s face
• it should be easy to drill down on specific problems
• all the way down to profiling every php function call
2. Before go-live - to validate production HW and architecture
• Never assume that production hw will magically solve all problems
• Sysadmins are morons anyway*
• This is a good time for some load testing
3. In real-life usage - for post-mortem analysis, troubleshooting and more
• A small percentage of users could be getting slow pages without overall stats being
impacted
• Things always change over time
13. The art of measurement, III
How to measure
• Black Box: measure load of the (web)server
• CLI tools: vmstat, free, iostat, top, ps, atop, dstat, etc…
• PHP: APC control panel
• Apache: mod_status
• Mysql: mtop, innotop, percona toolkit, mysqli_get_client_stats, monyog, mysql ent.
monitor
• Monitoring systems: munin, cacti, zenoss, etc…
(nb: availability monitoring != performance monitoring)
• Need to correlate data with web traffic
• Need to execute load testing to simulate real-life traffic
• White Box: measure load per page
• eZ debug output is good - but it is too detailed
• It does slow down the site a bit
• Reporting needed to compare evolution over time: have to log data somewhere
• Enter ezperformancelogger (but also ezsnpd, ezmunin, etc…)
14. The art of measurement, IV
Tips
• Always check error logs if there is something unexpected in measured data
• Testing should be done using a realistic data set (eg. 10,000 users, not 10)
• The clock of all servers should be in sync to allow correlation (no, really)
• Measuring VMs: time is generally a lie
16. Load testing strategies I
1. Baseline test
a) Test downloading a small static file and a big static file
b) Test executing the most simple php page which executes a db query
c) Increment concurrent users until you get no increase in hits / second
• Useful to uncover configuration errors in network / db / AMP stack
• This is the “idealistic” goal for your dynamic pages
2. Bruteforce test
a) Hit the homepage N times in a row
b) Increment concurrent users until you get no increase in hits / second
• Can be run on other most-visited pages on the site as well
• Make sure you’re not testing 404 pages (or redirects)
• Reset user sessions between runs if they are auto-generated
• Keep vmstat and iostat open while the test runs to quickly identify the bottleneck
• Does not really correlate to concurrent users
• Results difficult to communicate to customer
17. Load testing strategies II
3. Full site navigation test
a) Enable logging of interesting data
b) Use wget or httrack to navigate the whole site
c) Get log file data into a spreadsheet
• Useful to uncover pages with bad resource usage
• Can be run with both cold and warm caches to gauge cache efficiency
4. Scenario testing
• Need support from end user to determine most likely/useful scenarios
• Takes time to configure in load-testing tool
• Do not believe tools that promise to automagically generate a scenario by “sniffing”
browser sessions: manual intervention will be needed
• Always validate each single response before running the whole test
• The one test which is closer to real life…
• …but also one which is easy to manipulate (many knobs to tweak)
18. Load testing tips
• Do not use the webserver to execute the client (to avoid impact on cpu)
• Do not measure routers, firewalls or network card performance either (by testing from
remote network), unless what you want is real-world measures
• Always write down complete hw and sw specs – some of it will have changed next
time you want to run the test for comparison (a good idea: zip and save complete
apache and php config files, write down command line used for client in the report)
• Automate tasks to avoid human error / getting bored
19. Load testing tools
(the ones I know about)
• Apache bench
• Good: always available
• Bad: not very flexible; limited support for advanced http features
• Siege
• Good: better than Apache Bench; some support for scenario testing
• Bad: not available by default in many linux distros (or windows)
• Jmeter
• Good: allows complex scenario testing; can run tests from a farm of machines
• Bad: has a learning curve; needs Java
• Httperf, web polygraph, …
• Web-based tools
• Good: can test from many locations across the world; easy to use
• Bad: usually do not offer too much flexibility; $$$
• Roll-your-own (php) script
• Good: flexible; can be used on servers where you can not install any other software
• Bad: results can not be compared with other tools’ measures; no guarantee of correctness
• Excellent: it is there to use! https://github.com/gggeek/ezab
20. Load testing tools: ezab
https://github.com/gggeek/ezab
Comes with two scripts:
• ezab.php replacement for ApacheBench when it is not available
• abrunner.php runs ab many times in a row and produces reports
Useful for strategies I and II
Example: testing the VirtualBox VM from the host OS – baseline data
[Charts: baseline results for favicon.ico and phpinfo.php: requests per second and time per request in ms (mean, median, 90%, min, max) vs. 1 to 128 concurrent clients]
21. Load testing tools: ezab
This is an exercise you are expected to carry out
Testing performance of the eZ Publish 4 installation (homepage) on VirtualBox
1. Install abrunner
wget https://raw.github.com/gggeek/ezab/master/abrunner.php
2. Execute: test homepage of the installed site
php abrunner.php -s ezpublish4.ezsc -u / -c "1 2 4 8 16" -a
3. Import into LibreOffice the resulting csv file: test_logs/_.csv
4. Create a graph by selecting the first 7 columns
• Chart Type: line
• Data range: «1st column as label»
5. Icing on the cake: set a separate Y axis for the number of requests/second
6. Stop VM, add cpus, reboot and repeat steps 1-5, compare graphs
• Use the -l option to ezab to get different file names for reports
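One way to compare two runs without charts is to diff the requests-per-second column of the two csv reports. The layout below (concurrency in column 1, req/s in column 2) and all figures are assumptions for illustration:

```shell
# Hypothetical abrunner reports: column 1 = concurrency, column 2 = req/s
cat > run_1cpu.csv <<'EOF'
1,50
2,90
EOF
cat > run_2cpu.csv <<'EOF'
1,55
2,120
EOF
# Requests/second gained at each concurrency level after adding CPUs
join -t, run_1cpu.csv run_2cpu.csv |
    awk -F, '{ printf "c=%s: %+d rps\n", $1, $3 - $2 }'
```

This prints one line per concurrency level (c=1: +5 rps, c=2: +30 rps), which is often enough to spot whether the extra CPUs helped at all.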
22. Load testing: eZ Publish homepage
See anything strange? (NB: this VM has 4 VCPUs)
[Chart: requests per second and time per request in ms (mean, median, 90%, min, max) vs. 1 to 16 concurrent clients]
23. Load testing: eZ Publish homepage
See anything strange? (NB: this VM has 4 VCPUs)
• Performance is way too low: 6 page views per second
• RPS does not increase going from 1 to 16 concurrent clients
[Chart: requests per second and time per request in ms (mean, median, 90%, min, max) vs. 1 to 16 concurrent clients]
24. Load testing: eZ Publish homepage
See anything strange? (NB: this VM has 4 VCPUs)
• Performance is way too low: 6 page views per second
• RPS does not increase going from 1 to 16 concurrent clients
[Chart: requests per second and time per request in ms (mean, median, 90%, min, max) vs. 1 to 16 concurrent clients]
• Xdebug is ON, APC is OFF!
sudo apt-get install php-apc
sudo mv /etc/php5/apache2/conf.d/xdebug.ini /etc/php5/apache2/conf.d/xdbg.ini.bak
sudo service apache2 restart
• Test again: we get 150 rps at concurrency 4 and 8 :-)
25. Load testing: eZ Publish homepage
Q: Is the site CPU or memory bound?
A: CPU bound
• At 16 concurrent clients, CPU usage is near 100% and RPS decreases
• RAM is more than enough: no swap activity
[Screenshots: monitoring output showing CPU idle time and swap usage]
26. Load testing tools: HTTRACK
• Website copier
• GUI app on windows, web-based (or command-line) on linux
• Used for further exercises later on
• Alternatives: wget -r
• Install and launch:
sudo apt-get install webhttrack
sudo /usr/lib/httrack/htsserver /usr/share/httrack/
• Connect to http://192.168.56.101:8080/
Tips
• Make sure the server can send requests to itself: add to /etc/hosts the
ezpublish4.ezsc hostname
• If a robots.txt file is present, it will be respected by default
28. eZ Performance Logger
http://projects.ez.no/ezperformancelogger
• Allows the developer to define a set of variables (KPIs) to be measured
• Each variable is measured for every page view (rest / ajax pages as well)
• Measured KPIs can be sent to multiple logging systems
• Csv formatted log probably the easiest to later parse
• Apache’s own access log probably the best suited to avoid any performance hit
• Supports logging directly to Google Analytics or Piwik via rewriting of html pages
• Common KPIs are available (eg. db queries, db time), custom ones can be added
• Integrates with Munin to visualize the measured data
• Throws in full integration with XHProf profiler as bonus
• According to Facebook, “good enough” to keep enabled in production
• Does NOT come with a nice GUI of its own
29. Installation
Requirements
• eZ Publish 4.x
• Apache webserver recommended
• Optional: Xhprof
• Optional: a Google Analytics account or Piwik
• Optional: Munin
The extension comes preinstalled in the Virtual Machine for the Workshop
To install by hand, follow the standard procedure – no need to touch the database
For advanced users: in classes/tracers, alternative connectors are provided for the
mysqli database driver and the ezdfs cluster handler, which allow measuring performance
data even in production environments (where ezdebug is turned off).
30. First steps
1. Unzip and activate extension
2. Set logging format to csv-formatted file:
create file settings/override/ezperformancelogger.ini.append.php
[GeneralSettings]
LogMethods[]
LogMethods[]=csv
[csvSettings]
FileName=var/log/ezperflog.csv
3. Set performance indicators to be logged, eg: memory used, execution time
[GeneralSettings]
TrackVariables[]
TrackVariables[]=mem_usage
TrackVariables[]=execution_time
4. Test that it is working:
1. Browse to the homepage
2. Check for presence of var/log/ezperflog.csv
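Once rows accumulate, the csv can be summarized straight from the shell. The column layout and figures below are assumed for illustration (the actual column order follows your TrackVariables settings):

```shell
# Hypothetical ezperflog.csv rows: url,mem_usage,execution_time
cat > ezperflog.csv <<'EOF'
/,12582912,0.25
/news,10485760,0.15
/about,9437184,0.20
EOF
# Average execution time across all logged page views
awk -F, '{ sum += $3; n++ } END { printf "avg execution_time: %.2f s\n", sum/n }' ezperflog.csv
```

Prints: avg execution_time: 0.20 s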
31. Measuring database queries per page
The eZ Publish database connector measures all the commands sent to the database:
• Number of queries executed
• Time taken
This is only done when the debug output is enabled.
eZPerformanceLogger allows logging any existing «timing point»
1. Enable debug output (in settings/override/site.ini.append.php)
[DebugSettings]
DebugOutput=enabled
2. Add number of queries and time taken to the performance indicators logged
TrackVariables[]=accumulators/mysqli_query/count
TrackVariables[]=accumulators/mysqli_query
3. Rotate the csv file, since it will now have a different number of columns
php extension/ezperformancelogger/bin/php/rotateperflogs.php
4. Reload homepage, check log file
32. Making sure all requests are measured
By default ajax calls and requests which end up in a redirect are not logged.
How to fix:
• Edit index.php, on line 198 add
eZExecution::addCleanupHandler( array( 'eZPerfLogger', 'cleanup' ) );
• Browse content in the Admin interface (which uses ezjscore)
• Look for calls to ezjscore/call in var/log/ezperflog.csv
• Other frontend controllers have to be patched as well
• index_ajax.php (removed in recent versions)
• index_treemenu.php, index_treemenu_tags.php, index_soap.php
• index_cluster.php currently not supported
33. Visualization of data: spreadsheets
Q: is the site database-bound ?
1. Use httrack to navigate the whole site (cache warmup)
Tip: exclude from files to be downloaded all images, css, js, m4v
2. Rotate log file
php extension/ezperformancelogger/bin/php/rotateperflogs.php
3. Use httrack to navigate the whole site again
4. Rotate log file
5. Import log file into LibreOffice
6. Graph db queries per page, db time per page as % of page time
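For step 6, the db share of each page can also be computed directly with awk; the column layout and numbers below are hypothetical:

```shell
# Hypothetical columns: url,execution_time,db_query_count,db_time
cat > perflog.csv <<'EOF'
/,0.40,35,0.12
/news,0.25,20,0.05
EOF
# db time as a percentage of total page generation time
awk -F, '{ printf "%s: %d queries, db = %.0f%% of page time\n", $1, $3, 100 * $4 / $2 }' perflog.csv
```

Pages where the db percentage is high are the first candidates for view-cache or query tuning.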
Q: how effective is the view cache?
i. Disable the view cache
ii. Clear all caches
iii. Execute steps 1 to 6 again
iv. Compare the number of queries per page
35. Visualization of data: Munin I
http://www.munin-monitoring.org/
• Munin is an open source monitoring tool
• It generates daily and weekly graphs for collected data
• It collects a lot of data from the operating system
• It comes with a wide set of plugins for existing software, such as Apache
and MySql
• Creating plugins for new software is relatively easy
• Agent-based architecture: a munin “master” server can collect and display
data from multiple “node” servers
• For our scenario, the webserver acts as both master and node
• Master: runs a cronjob that generates reports by querying nodes and stores
them in /var/cache/munin; the reports are made available via Apache
• Node: runs a daemon, munin-node, listening on port 4949
37. Visualization of data: Munin II
Integration of eZ Performance Logger
• The default interval for collecting data is 5 minutes (it should be more flexible in
version 2)
• All ezperformancelogger KPIs can be shown in a Munin graph
• By default, the “per page” value of the KPI is shown
• In every graph, the Average, Maximum and Minimum value are shown
• Via eZ Publish settings, appearance of those graphs can be tuned
• Note: the timestamp of the last time the munin plugin has collected data from
ezperformancelogger for any specific KPI is stored in var/<vardir>/log
38. Integrating with Munin I
1. make sure you have a valid munin-node installation on your webserver
Connect to http://192.168.56.101/munin
If you get an access denied message, edit /etc/apache2/conf.d/munin
<Directory /var/cache/munin/www>
Allow from all
</Directory>
2. Symlink the file bin/scripts/ezmuninperflogger_ into /usr/share/munin/plugins/
and make it executable
cd extension/ezperformancelogger/bin/scripts
chmod 755 ezmuninperflogger_
sudo ln -s /var/www/ezpublish4/extension/ezperformancelogger/bin/scripts/ezmuninperflogger_ /usr/share/munin/plugins
Fix an error in the script ezmuninperflogger_: on line 1 put
#!/bin/bash
instead of
#!/bin/sh
39. Integrating with Munin II
3. Create a configuration file for the munin plugin:
sudo vi /etc/munin/plugin-conf.d/ezmuninperflogger
[ezmuninperflogger_*]
env.php /usr/bin/php
env.ezpublishroot /var/www/ezpublish4
4. Restart the munin node service
sudo service munin-node restart
5. check if the configuration works: sudo munin-node-configure --suggest
If it does, you should see in the output a line similar to:
ezmuninperflogger_ | no | yes (+execution_time +mem_usage)
the "yes" in the 2nd column is important. Between parenthesis you get the list of
variables which can be graphed
40. Integrating with Munin III
6. activate the plugin:
sudo munin-node-configure --suggest --shell
You should get 3 lines with "ln -s ..." commands. Execute them (nb: as root)
7. test that it works: run: sudo munin-run ezmuninperflogger_<$varname>
8. restart munin-node again: sudo service munin-node restart
9. navigate the site, wait 5 minutes, connect to Munin again.
Troubleshooting tip: munin logs are available in /var/log/munin
10. Integrate Munin in the eZ administration interface: edit ezperformancelogger.ini
[MuninSettings]
MuninURL=http://192.168.56.101/munin/
11. Optionally, you can customize how the variables recorded will show up in Munin
graphs by editing more ini settings in section [MuninSettings]
42. Drilling down on hot code paths
XHProf
• http://pecl.php.net/package/xhprof
• Profiler from Facebook
• Designed to be fast enough to be used in production (at least faster than Xdebug ;-)
• Comes with its own web-based GUI
• Installation
sudo apt-get install graphviz
sudo pecl config-set preferred_state beta
sudo pecl install xhprof
sudo vi /etc/php5/apache2/conf.d/xhprof.ini => add extension=xhprof.so
sudo service apache2 restart
• While at it, disable apc (???)
43. Activating XHprof
• edit your config.php file and add the following lines at the top:
(if you don’t have a config.php file, copy config.php-RECOMMENDED to config.php)
include( 'extension/ezperformancelogger/classes/ezxhproflogger.php' );
eZXHProfLogger::start();
• Log in to admin interface, go to Setup tab, bottom-left menu item: XHProf Profiling
• You can see the data recorded for the pages you have just browsed to
• Click on the name of a run to get profiling information in all its gory detail
Tips
• To avoid logging profiling data for all pages, you can start it at any point in the code
• Links to profiling runs will be displayed in the debug output as well...
• ...but enabling debug output does have an impact on profiling
• A cronjob is available to periodically remove old profiling data
47. Questions?
Thanks for participating!
These slides
https://dl.dropbox.com/u/520168/eZ%20Performance%20Measurement.pdf
Source code, command snippets
https://gist.github.com/gggeek (look for gists numbered 1 to 9)
About me
Consultant for eZ Systems since 2007
gg@ez.no
@gggeek
http://share.ez.no/blogs/gaetano-giunta
http://projects.ez.no/users/community/gaetano_giunta
With helpful support from Yannick Modah Gouez! ( ymg@ez.no )
* = I hope you were not offended by jokes about developers and sysadmins.
I consider myself a devop: someone incarnating the worst aspects of both ;-)