The document discusses performance tuning for Grails applications. It covers optimizing for latency, throughput, and quality of operations. Key aspects discussed include Amdahl's law, Little's law, profiling tools, common pitfalls, and recommendations for improving performance like eliminating blocking and focusing on feedback cycles. Specific techniques mentioned include optimizing SQL queries, reducing regular expressions, improving caching, and using thread dumps to diagnose production issues.
2. Agenda
• What is performance and what are we optimising
for?
• How do you do performance tuning and
optimisation?
• Common missteps, tips and tricks related to profiling and tuning Grails
applications
3. Performance aspects
• Latency of operations
• Throughput of operations
• Quality of operations - correctness, consistency,
resilience, security, usability, availability ...
4. Why?
• Optimising costs to run your system - operational
efficiency
• Tuning your system to meet its performance
requirements with optimal cost
• Performance is a feature of your system: keeping
up the quality of the operations under high load
6. Little's law
• What does Little's law tell us when we are using
the thread-per-request model?
MeanNumberInSystem = MeanThroughput * MeanResponseTime
→
MeanThroughput = MeanNumberInSystem / MeanResponseTime
L = λW
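The rearranged formula can be checked with a few numbers. This is a minimal sketch, assuming a thread-per-request container where the size of the request thread pool caps the mean number of requests in the system. The class name, method names and figures are made up for illustration.

```java
/** Little's law (L = λW) applied to a thread-per-request server. */
final class LittlesLaw {

    /** Upper bound on throughput (requests/s) when at most maxConcurrentRequests are in the system. */
    static double maxThroughput(int maxConcurrentRequests, double meanResponseTimeSeconds) {
        if (meanResponseTimeSeconds <= 0) {
            throw new IllegalArgumentException("response time must be positive");
        }
        return maxConcurrentRequests / meanResponseTimeSeconds;
    }

    /** Mean number of busy request threads needed to sustain a throughput at a given response time. */
    static double requiredConcurrency(double throughputPerSecond, double meanResponseTimeSeconds) {
        return throughputPerSecond * meanResponseTimeSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 200 request threads, 250 ms mean response time.
        System.out.println("max throughput: " + maxThroughput(200, 0.25) + " req/s");
        // If the mean response time doubles, the same pool sustains half the throughput.
        System.out.println("max throughput: " + maxThroughput(200, 0.5) + " req/s");
        System.out.println("threads needed for 800 req/s: " + requiredConcurrency(800, 0.25));
    }
}
```

With a fixed thread pool, every increase in mean response time lowers the throughput ceiling, which is why blocking hurts so much in this model.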
7. How?
• Set up your own feedback cycle for tuning your own system
• Measure & profile
• start with the tools you have available. You can add more tools and methods in the next
iteration.
• Think & learn, analyse and plan the next change
• find tools and methods to measure, in the next iteration, whatever you want to know
more about
• Implement a single change
• Iterate: do a lot of iterations and optimally change one thing at a time - this will help you to
learn about your system's performance and operational aspects
• Set up a different feedback cycle for production environments. Don't forget that usually
it's irrelevant if the system performs well on your laptop. If you are not involved in
operations, use innovative means to set up a feedback cycle.
9. More
• In concurrent execution: Amdahl's law - you won't be able to speed up a single computation
task if you cannot parallelize it.
• In traditional synchronous Grails code this means that each request thread shouldn't block
other threads. It doesn't necessarily mean that you have to switch to asynchronous handling
of requests. However, that might be helpful for error handling reasons. Asynchronous doesn't
mean fast.
• Find the most limiting bottleneck and eliminate it, one by one
• re-measure after each change because the behaviour of concurrent execution can be
different after a small change that reduces blocking - usually the next problem is not the second one
on the list from the previous measurement.
• "Mature optimization" - keep the clarity and consistency of the solution. Don't do things just
"because this is faster". Don't introduce accidental complexity.
• Also find out how your system gets saturated - the saturation point. How does latency
change when load is added? Can your system survive? What happens when it gets
overloaded?
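Amdahl's law can be put into numbers. The sketch below uses the usual formulation speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The class and method names are illustrative, and the 90% figure is an assumption, not a measurement.

```java
/** Amdahl's law: speedup = 1 / ((1 - p) + p / n). */
final class AmdahlsLaw {

    /** Speedup with the given parallel fraction p (0..1) and n workers. */
    static double speedup(double parallelFraction, int workers) {
        if (parallelFraction < 0 || parallelFraction > 1 || workers < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and workers >= 1");
        }
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / workers);
    }

    /** Limit of the speedup as the number of workers grows without bound. */
    static double speedupLimit(double parallelFraction) {
        return 1.0 / (1.0 - parallelFraction);
    }

    public static void main(String[] args) {
        double p = 0.9; // assumed: 10% of the work is serialized, e.g. behind a lock
        for (int workers : new int[] {1, 2, 4, 8, 16, 64}) {
            System.out.printf("%2d workers -> speedup %.2f%n", workers, speedup(p, workers));
        }
        System.out.printf("limit -> %.2f%n", speedupLimit(p));
    }
}
```

Even a small serialized (blocking) fraction caps the achievable speedup, no matter how many threads or cores are added.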
10. JVM code profiler concepts
• Sampling
• takes snapshots of the execution through JVM profiling interfaces at a given
time interval, for example every 100 milliseconds.
Statistical methods are used to calculate values based on the samples.
• Unreliable results, but certainly useful in some cases since the
overhead of sampling is minimal compared to instrumentation
• Usually helps to get better understanding of the problem if you learn to
look past the numeric values returned from measurements.
• Instrumentation
• exact measurements of method execution details
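The sampling idea can be illustrated with a toy sampler. This is only a sketch: it uses Thread.getStackTrace() as a stand-in for the profiling interfaces a real profiler would use, and TinySampler, spin() and profileSpin() are hypothetical names. It takes a stack snapshot of one thread at a fixed interval and counts which method is on top.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sampling profiler: counts the top stack frame of one thread at a fixed interval. */
final class TinySampler {
    private static volatile boolean running;
    private static volatile double sink;

    /** Hypothetical hot method that the sampler should find. */
    static void spin() {
        double x = 0;
        while (running) {
            x = x * 1.0000001 + 1;
            sink = x; // publish the value so the loop has an observable effect
        }
    }

    /** Takes the given number of samples and returns "Class.method" -> hit count. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    /** Starts a worker running spin(), samples it, then stops the worker. */
    static Map<String, Integer> profileSpin(int samples, long intervalMillis) {
        running = true;
        Thread worker = new Thread(TinySampler::spin, "worker");
        worker.setDaemon(true);
        worker.start();
        try {
            return sample(worker, samples, intervalMillis);
        } finally {
            running = false;
        }
    }

    public static void main(String[] args) {
        System.out.println(profileSpin(50, 2));
    }
}
```

The hit counts are only an estimate of where time goes: anything that happens between two samples is invisible, which is one reason to look past the exact numbers.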
11. Load generation tools
• Simple command line tools
• ab - apache bench
• wrk - modern HTTP benchmarking tool
• has lua scripting support for doing things like
checking the reply
• Load testing toolkits
• Support testing use cases and stateful flows
12. Common pitfalls in profiling Groovy and Grails code
• Measuring wall clock time
• Measuring CPU time for a certain method
• Instrumentation usually provides false results
because of JIT compilation and other reasons
like spin locks
• Lack of proper JVM warmup
• Relying on gut feeling
13. Ground your feet
• Find a way to review production performance graphs regularly,
especially after making changes to the system
• system utilisation over time (CPU load, IO load & wait, Memory
usage), system input workload (requests) over time, etc.
• In the Cloud, use tools like New Relic to get a view into operations
• CloudFoundry based Pivotal Web Services and IBM Bluemix
have New Relic available
• In the development environment, use a profiler and debugger to
gain understanding. You can use the grails-melody plugin to get insight
into the SQL that's executed.
14. Recommendations
• Concentrate on eliminating blocking because of Amdahl's law
• Look for low hanging fruit (see slide 16) if you are in a rush - it's
worth doing.
• Concentrate on constantly improving the performance tuning
feedback cycles you have in place for development and
production environments.
• Innovate to get iterations going: you don't necessarily need
expensive tools or toolkits. Continuous improvement is more
important than fancy tools.
• Take small steps.
15. Environment related problems
• Improper JVM configuration for Grails apps
• out-of-the-box Tomcat parameters
• a single JVM running with a huge heap on a big
box
• If you have a big powerful box, it's better to
run multiple small JVMs and put a load
balancer in front of them
16. Low hanging fruit
• SQL and database related bottlenecks: learn how to profile SQL queries and tune your database queries and your database
• grails-melody plugin can be used to spot costly SQL queries in development and testing environments. Nothing prevents its use
in production; however, running it in a production environment risks negative side effects.
• New Relic in CloudFoundry (works for production environments)
• Eliminate exceptions (and their stack traces) thrown in normal program flow - use a profiler or debugger to find out if any are thrown in normal program flow
• High CPU usage: Check regexps that are used a lot (use the profiler's CPU time measurement to spot those, search the code
for candidate regexps). Also check the regexps with different input sizes. Make sure valid input doesn't trigger "catastrophic
backtracking". Understand what it is. Use a regexp analyser to find out the number of operations a certain input triggers.
• Check concurrency patterns like synchronised usage: using java.util.Hashtable/Properties is blocking
• these block: System.getProperty("some.config.value","some.default"), Boolean.getBoolean("some.feature.flag")
• Replace .each closures with plain for loops in hot code paths
• Cache implementation that serves stale information while entry is being updated (blocks only when there isn't information
available)
• Cache implementation that locks a certain key for updating to prevent cache storms
• Use "static transactional = false" in services that don't need transactions
• Don't call transactional services from GSP taglibs (or GSP views), that might cause a large number of short transactions during
view rendering
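The cache item about locking a certain key can be sketched with ConcurrentHashMap.computeIfAbsent, which runs the loader at most once per missing key while other callers for that key wait. This is an assumption-laden illustration, not a complete cache: PerKeyLoadingCache and loadsUnderContention are hypothetical names, there is no expiry, and serving stale entries during a refresh is not covered. Note also that computeIfAbsent may briefly block updates to other keys that land in the same hash bin.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Cache that loads each missing key at most once, even under concurrent requests. */
final class PerKeyLoadingCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final Function<K, V> loader;

    PerKeyLoadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it once if it is missing. */
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet(); // counts real loads, for demonstration only
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }

    /** Demo helper: many threads ask for the same missing key at once. Returns how often the loader ran. */
    static int loadsUnderContention(int threadCount) {
        PerKeyLoadingCache<String, String> cache = new PerKeyLoadingCache<String, String>(key -> {
            try {
                Thread.sleep(50); // simulate an expensive query
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-for-" + key;
        });
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> cache.get("report"));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cache.loadCount();
    }

    public static void main(String[] args) {
        System.out.println("loader invocations: " + loadsUnderContention(8));
    }
}
```

Without per-key coordination, every thread that misses the cache would run the expensive load itself, which is the cache storm the slide warns about.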
17. Tool for getting insight in sudden production performance problems
• kill -3 <PID>
• Makes a thread dump of all threads and outputs it
to System.out which ends up in catalina.out in
default Tomcat config.
• The Java process keeps running and it doesn't get
terminated
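When sending a signal to the process is not an option, a similar snapshot can be taken from inside the JVM through the standard java.lang.management API. The sketch below is not a replacement for kill -3 and its output format differs from the JVM's own thread dump. ThreadDumps, dump() and countInState() are illustrative names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Takes an in-process snapshot of all live threads via ThreadMXBean. */
final class ThreadDumps {

    private static ThreadInfo[] snapshot() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only ask for lock details when the JVM supports reporting them.
        return bean.dumpAllThreads(bean.isObjectMonitorUsageSupported(),
                                   bean.isSynchronizerUsageSupported());
    }

    /** Formats thread name, state, contended lock (if any) and stack frames. */
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : snapshot()) {
            if (info == null) {
                continue;
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Counts threads in a given state, e.g. BLOCKED, as a quick contention indicator. */
    static long countInState(Thread.State state) {
        long count = 0;
        for (ThreadInfo info : snapshot()) {
            if (info != null && info.getThreadState() == state) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(dump());
        System.out.println("BLOCKED threads: " + countInState(Thread.State.BLOCKED));
    }
}
```

Taking a few dumps some seconds apart and comparing which threads stay BLOCKED on the same lock is usually more telling than a single dump.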
18. Java Mission Control & Flight Recorder
• Oracle JDK 7 and 8 include Java Mission Control
since 1.7.0_40.
• The JAVA_HOME/bin/jmc executable launches the
JMC client UI
• JMC includes Java Flight Recorder which has been
designed to be used in production.
• JFR can record data without the UI and store events
in a circular buffer for investigation of production
problems.
19. JFR isn't free
• JFR is a commercial non-free feature, available
only in Oracle JVMs (originally from JRockit).
• You must buy a license from Oracle for each JVM
using it.
• "... require Oracle Java SE Advanced or Oracle Java SE Suite licenses for the computer running the observed
JVM" , http://www.oracle.com/technetwork/java/javase/documentation/java-se-product-editions-397069.pdf ,
page 5
20. Controlling JFR
• Enabling JFR with default continuous "black box"
recording:
export _JAVA_OPTIONS="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true"
• Runtime controlling using jcmd commands
• help for commands with
jcmd <pid> help JFR.start
jcmd <pid> help JFR.stop
jcmd <pid> help JFR.dump
jcmd <pid> help JFR.check
21. wrk http load testing tool sample output
Running 10s test @ http://localhost:8080/empty-test-app/empty/index
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.46ms    4.24ms  17.41ms   93.28%
    Req/Sec     2.93k     0.90k    5.11k    85.67%
  Latency Distribution
     50%  320.00us
     75%  352.00us
     90%  406.00us
     99%   17.34ms
  249573 requests in 10.00s, 41.22MB read
  Socket errors: connect 1, read 0, write 0, timeout 5
Requests/sec:  24949.26
Transfer/sec:      4.12MB
• Check latency: the max and its distribution
• Check the total throughput
https://github.com/lhotari/grails-perf-testapps/empty-test-app