This document discusses enterprise application performance, including:
- Performance basics like response time, throughput, and availability
- Common metrics like response time, transactions per second, and concurrent users
- Factors that affect performance such as software issues, configuration settings, and hardware resources
- Case studies where the author analyzed memory leaks, optimized services, and addressed an inability to meet non-functional requirements
- Learnings around heap dump analysis, hotspot identification, and database monitoring
4. Performance Basics - Scope
Short response time for a given piece of work
High throughput
Low utilization of computing resources
High availability of application
Implicit Requirements
Efficiency
Scalability
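The two headline measures above, response time and throughput, can be derived from raw request timings. The following is a minimal sketch, not taken from the deck; the class name, method names and sample values are illustrative assumptions.

```java
import java.util.Arrays;

class BasicMetrics {
    /** Average response time in milliseconds over all samples (0.0 if there are none). */
    static double averageResponseMs(long[] responseTimesMs) {
        return Arrays.stream(responseTimesMs).average().orElse(0.0);
    }

    /** Throughput: completed requests divided by the wall-clock duration of the test. */
    static double throughputPerSecond(int completedRequests, double wallClockSeconds) {
        return completedRequests / wallClockSeconds;
    }

    public static void main(String[] args) {
        long[] samples = {120, 80, 100, 300}; // illustrative values only
        System.out.println("Avg response (ms): " + averageResponseMs(samples));
        System.out.println("Throughput (req/s): " + throughputPerSecond(samples.length, 0.5));
    }
}
```

Throughput is computed against wall-clock duration rather than the sum of response times, because concurrent requests overlap.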
5. Performance Basics - Scope [Contd.]
Importance of ‘Performance’ in the world of software engineering
Response Time
A 1-second slowdown in page load time could cost Amazon $1.6 billion in sales each year
25% of users will leave a site if a page takes more than 4 seconds to load
Throughput
Facebook serves more than 2 million ‘Like’ buttons per second
Facebook held a clear lead in total page views during March 2011, recording about 85 billion. This
was more than three times as many as number two Google, which had about 25.6 billion
http://performance-testing.org/performance-testing-statistics
9. Performance Basics – Affecting Factors
Recommended JVM Options
Type: Option
Entire heap size: Specify the same value for -Xms and -Xmx.
New area size: -XX:NewRatio with a value of 2 to 4, or -XX:NewSize=? -XX:MaxNewSize=?. Specifying NewSize instead of NewRatio is also a good choice.
Perm size: -XX:PermSize=256m -XX:MaxPermSize=256m. Set it large enough to avoid operational problems; it does not affect performance.
GC log: -Xloggc:$CATALINA_BASE/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps. Keeping a GC log has little effect on the performance of Java applications, so leave it enabled wherever possible.
GC algorithm: -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75. This is only a generally recommendable configuration; other choices could be better depending on the characteristics of the application.
Creating a heap dump when an OOM error occurs: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_BASE/logs
Actions after an OOM occurs: -XX:OnOutOfMemoryError=$CATALINA_HOME/bin/stop.sh or -XX:OnOutOfMemoryError=$CATALINA_HOME/bin/restart.sh. After the heap dump is written, act according to your management policy.
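One way to confirm that a running JVM actually received options like these is to inspect its startup arguments. The sketch below is an illustration, not part of the original deck; the class and method names are assumptions. Note that the Perm size and CMS options listed above apply to older HotSpot releases (PermGen was removed in Java 8 and CMS in JDK 14).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmOptionCheck {
    /** Returns the text after the prefix for the last matching argument, or null if absent. */
    static String optionValue(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True when -Xms and -Xmx are both present and textually equal ("1g" vs "1024m" is not detected). */
    static boolean heapSizeIsFixed(List<String> jvmArgs) {
        String xms = optionValue(jvmArgs, "-Xms");
        String xmx = optionValue(jvmArgs, "-Xmx");
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    /** True when the JVM was asked to write a heap dump on OutOfMemoryError. */
    static boolean heapDumpOnOomEnabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+HeapDumpOnOutOfMemoryError");
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excludes arguments to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Fixed heap size: " + heapSizeIsFixed(jvmArgs));
        System.out.println("Heap dump on OOM: " + heapDumpOnOomEnabled(jvmArgs));
    }
}
```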
10. Performance Basics – Affecting Factors [Contd.]
Hardware
Single Machine: web server and two app servers
Shared Resources –
• CPU
• Memory
• Network
• Disk
Hardware / Configuration
OS / Filesystem: Windows, Linux
11. Performance Basics – Affecting Factors [Contd.]
Hardware
DB Server / Hardware
Oracle Std.
Oracle Enterprise
MySQL / MSSQL
• CPU utilization - 4
• No Partition
• Limited online operations
• Limited indexing
• CPU utilization – No Limit
• Supports Oracle real application clusters
• Adaptive Execution Plan
• Different Locking strategy
• Different Index performance
12. Learnings – I###
Issue
Memory Leak
Approach
Analysis of heap dumps
Replication of memory leak within DEV environment (with various types of load and duration)
Code review (i.e. Java and Spring configuration) of probable suspect(s)
M@@@@@B@@ beans configured as PROTOTYPE
M@@@@@@B@@ beans instantiated using DefaultAutoProxyCreator with proxyTargetClass='true'
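When replicating a leak under load, one simple signal is whether used heap keeps rising across samples taken after a garbage collection. The sketch below is an assumption about how such a check could look, not the author's tooling; the class name, sampling interval and growth threshold are hypothetical.

```java
import java.lang.management.ManagementFactory;

class LeakTrend {
    /** True when every sample is at least minGrowthBytes larger than the one before it. */
    static boolean steadilyGrowing(long[] usedHeapBytes, long minGrowthBytes) {
        if (usedHeapBytes.length < 2) {
            return false;
        }
        for (int i = 1; i < usedHeapBytes.length; i++) {
            if (usedHeapBytes[i] - usedHeapBytes[i - 1] < minGrowthBytes) {
                return false;
            }
        }
        return true;
    }

    /** Requests a GC (only a hint to the JVM) and returns the used heap in bytes. */
    static long usedHeapAfterGc() {
        System.gc();
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        long[] samples = new long[5];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = usedHeapAfterGc();
            Thread.sleep(1000); // hypothetical sampling interval
        }
        // Hypothetical threshold: at least 1 MB of growth between consecutive samples.
        System.out.println("Possible leak: " + steadilyGrowing(samples, 1024 * 1024));
    }
}
```

A rising trend only suggests a leak; confirming which objects are retained still requires a heap dump.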
13. Learnings – I### [Contd.]
Understanding basics of Heap Dump Analysis
Heap Dump
Shallow Heap – Heap occupied by the object itself
Retained Heap – Sum of the shallow heap sizes of all objects that would be removed when this object is garbage collected
Example
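The two definitions can be illustrated on a tiny object graph. The sketch below is a simplified model, not how any particular heap analyzer is implemented; the object names, shallow sizes and references are made up for illustration. The retained heap of an object is computed as the total shallow size of everything that stops being reachable from the GC roots once that object is removed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RetainedHeapDemo {
    /** Objects reachable from the roots, never visiting the excluded object (null excludes nothing). */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots, String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(excluded)) stack.push(root);
        }
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (!seen.add(current)) continue; // already visited
            for (String next : refs.getOrDefault(current, Collections.<String>emptyList())) {
                if (!next.equals(excluded)) stack.push(next);
            }
        }
        return seen;
    }

    /** Sum of shallow sizes of all objects that become unreachable when target is removed. */
    static long retainedHeap(Map<String, Long> shallow, Map<String, List<String>> refs,
                             Collection<String> roots, String target) {
        Set<String> all = reachable(refs, roots, null);
        Set<String> withoutTarget = reachable(refs, roots, target);
        long sum = 0;
        for (String obj : all) {
            if (!withoutTarget.contains(obj)) sum += shallow.get(obj);
        }
        return sum;
    }

    /** Illustrative graph: roots A and B; A references C and D; B references D. */
    static long demoRetained(String target) {
        Map<String, Long> shallow = new HashMap<>();
        shallow.put("A", 16L);
        shallow.put("B", 16L);
        shallow.put("C", 24L);
        shallow.put("D", 32L);
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("C", "D"));
        refs.put("B", Arrays.asList("D"));
        return retainedHeap(shallow, refs, Arrays.asList("A", "B"), target);
    }

    public static void main(String[] args) {
        System.out.println("Retained heap of A: " + demoRetained("A")); // 40 = A (16) + C (24)
        System.out.println("Retained heap of B: " + demoRetained("B")); // 16, D is still held by A
    }
}
```

In this graph A has a shallow heap of 16 bytes but a retained heap of 40 bytes, because C is reachable only through A, while D stays alive through B.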
15. Learnings – O## ##
Issue
Optimization of frequently used services (to increase TPS)
Approach
Baselining of Load Testing parameters (Heap settings, no. of concurrent users)
Hotspot identification
Fetching all the columns (including non-required columns)
Incorrect ordering of WHERE conditions
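Hotspot identification usually starts by measuring where the time goes. The deck does not say which tool was used; the sketch below assumes a hand-rolled timer that accumulates elapsed time per labelled section. The class name and labels are hypothetical, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

class HotspotTimer {
    // Total elapsed nanoseconds per label, in first-seen order.
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the label's total, and returns the work's result. */
    <T> T time(String label, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.merge(label, System.nanoTime() - start, Long::sum);
        }
    }

    /** Label with the largest accumulated time, or null if nothing was timed. */
    String slowest() {
        String worst = null;
        long max = -1;
        for (Map.Entry<String, Long> entry : totalNanos.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                worst = entry.getKey();
            }
        }
        return worst;
    }

    /** Accumulated milliseconds for a label (0 if the label was never timed). */
    long totalMillis(String label) {
        return totalNanos.getOrDefault(label, 0L) / 1_000_000;
    }
}
```

Wrapping each stage of a service call, for example the database query and the result mapping, in `time(...)` shows which stage dominates before any query is rewritten.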
Outcome
Total Requests: 200
Concurrent: 20
Metric                     Before Optimization   After Optimization
Min Response Time (ms)     306                   135
Max Response Time (ms)     3692                  1195
Avg. Response Time (ms)    1947                  487
Transactions Per Second    6.94                  15.46
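The relative improvement follows directly from the figures above: throughput rose by a factor of about 2.2, and average response time fell by about 75%. A minimal sketch of the arithmetic, with class and method names chosen only for illustration:

```java
import java.util.Locale;

class ImprovementSummary {
    /** How many times larger the "after" value is than the "before" value. */
    static double gainFactor(double before, double after) {
        return after / before;
    }

    /** Percentage by which the value dropped from "before" to "after". */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures from the outcome table.
        System.out.println(String.format(Locale.ROOT, "TPS gain: %.2fx", gainFactor(6.94, 15.46)));              // 2.23x
        System.out.println(String.format(Locale.ROOT, "Avg response reduction: %.2f%%", percentReduction(1947, 487))); // 74.99%
        System.out.println(String.format(Locale.ROOT, "Max response reduction: %.2f%%", percentReduction(3692, 1195)));
        System.out.println(String.format(Locale.ROOT, "Min response reduction: %.2f%%", percentReduction(306, 135)));
    }
}
```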
16. Learnings – M#########
Issue
Unable to meet NFR (non-functional requirements)
Approach
Application optimization
Replacing aspect-based audit logging with plain asynchronous logging
Setting the 'read-only' flag to true for retrieval operations
Hotspot identification
Fetching all the columns (including non-required columns)
Database connection pool configuration
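The asynchronous audit logging step can be pictured as handing each audit entry to a background writer so the request thread does not wait for the write. The deck does not show the actual implementation; the sketch below is an assumption that uses a single-threaded executor and an in-memory list as a stand-in for the real audit store. All names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncAuditLogger {
    // One background thread keeps audit entries in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the real audit store (database table, file, ...).
    private final List<String> sink = new CopyOnWriteArrayList<>();

    /** Queues the entry and returns immediately; the caller does not wait for the write. */
    void audit(String entry) {
        writer.execute(() -> sink.add(entry));
    }

    /** Stops accepting entries and waits for queued writes; returns false on timeout or interrupt. */
    boolean close(long timeoutMs) {
        writer.shutdown();
        try {
            return writer.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Entries written so far. */
    List<String> written() {
        return sink;
    }
}
```

The trade-off is that entries still queued when the process dies are lost, so this fits only where the audit policy tolerates that risk.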
Outcome
Still unable to meet the NFR :(