Jee performance tuning existing applications

JEE Performance Tuning: Existing Applications
Part 1

© 2011 Fiserv, Inc. or its affiliates.

Agenda

• Performance Tuning Roadmap
• Understand Performance Objectives
• Performance Profiling Tools
• Performance Profiling Process
• Measuring Performance Metrics
• JVM Tuning
• Understanding Typical JEE Layered Architectures
• Tuning other JEE Layers (JMS, JDBC, Transaction Management)
• OS Tuning
• App Server Tuning
• Best Practices
• Performance Checklist


Performance Tuning Roadmap
The following sections provide a step-by-step tuning roadmap. They also discuss performance tuning activities such as performance planning, profiling, measurement and tuning.


Performance Tuning Roadmap

• Understand Performance Objectives - Determine environmental constraints, volume of requests, amount of data, hardware and software configuration, etc.
• Performance Planning - Specify performance targets and benchmarks.
• Performance Profiling - Collect performance data from a running application. Profiling can itself be intrusive and affect application responsiveness or throughput.
• Measure Your Performance Metrics - After you have determined your performance criteria, take measurements of the metrics you will use to quantify your performance objectives, e.g. monitor disk and CPU utilization or monitor data transfers across the network.
• Performance Analysis - Analyze the profiling results.
• Locate Bottlenecks in Your System
• Performance Tuning - Change tunables, source code or configuration attributes to improve application responsiveness or throughput.
• Achieve Performance Objectives


Understand Your Performance Objectives
To determine your performance objectives, you need to understand the application deployed
and the environmental constraints placed on the system. Gather information about the levels
of activity that components of the application are expected to meet, such as:
• The anticipated number of users.
• The number and size of requests.
• The amount of data and its consistency.
• The target CPU utilization.
• The configuration of hardware and software, such as CPU type, disk size vs. disk speed, and sufficient memory.
• The ability to interoperate between domains, use legacy systems, support legacy data.
• Development, implementation, and maintenance costs.



Performance Planning

• Specify performance targets and benchmarks, including scaling requirements. Include all user
types, such as information-gathering requests and transaction clients, in your benchmarks.
Performance requirements should include the required response times for end users, the
perceived steady state and peak user loads, the average and peak amount of data transferred
per request, and the expected growth in user load over the first or next 12 months.
• Create a testing environment that mirrors the expected real-world environment as closely as
possible. The only reliable way to determine a system's scalability is to perform load tests in
which the volume and characteristics of the anticipated traffic are simulated as realistically as
possible.
• Load-test the system, find bottlenecks, and eliminate them.


Performance Profiling

• Performance profiling is the activity of collecting performance data from a running application. The profiler itself may be intrusive and affect application responsiveness or throughput.
Types of Profilers:
1. Method profiler
• Collects information about method execution times
• Look for: internal/external method times, frequently called methods/classes, etc.
2. Memory profiler
• Collects information about object creation and/or garbage collection
3. Thread profiler
• Looks for thread contention situations
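To make the method-profiler idea concrete, here is a minimal hand-rolled sketch of the data such a profiler collects: call counts and accumulated elapsed time per method. It is an illustration only, not how the tools listed in this deck instrument code; the class `MethodTimer` and the label `OrderDao.findById` are hypothetical. Even this small wrapper adds overhead to every call, which is the intrusiveness mentioned above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical method-timing helper: records call count and total elapsed nanoseconds per label. */
final class MethodTimer {
    private static final class Stats {
        final LongAdder calls = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Stats> stats = new ConcurrentHashMap<>();

    /** Runs the body and records how long it took under the given label, even if it throws. */
    <T> T time(String label, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            Stats s = stats.computeIfAbsent(label, k -> new Stats());
            s.calls.increment();
            s.totalNanos.add(System.nanoTime() - start);
        }
    }

    long calls(String label) {
        Stats s = stats.get(label);
        return s == null ? 0 : s.calls.sum();
    }

    long totalNanos(String label) {
        Stats s = stats.get(label);
        return s == null ? 0 : s.totalNanos.sum();
    }

    public static void main(String[] args) {
        MethodTimer timer = new MethodTimer();
        for (int i = 0; i < 3; i++) {
            timer.time("OrderDao.findById", () -> Math.sqrt(42.0));  // stand-in for a real DAO call
        }
        System.out.println("calls=" + timer.calls("OrderDao.findById")
                + " totalNanos=" + timer.totalNanos("OrderDao.findById"));
    }
}
```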


Profiling Tools

• YourKit Java Profiler
• JProfiler
• HP JMeter
• VisualVM (comes with the JDK)
• JProbe
• NetBeans Profiler (free, supported by Oracle): http://netbeans.org/kb/docs/java/profiler-intro.html
• JConsole (Java Monitoring and Management Console)
• Eclipse Memory Analyzer (MAT)
• Eclipse TPTP
• JAMon (http://jamonapi.sourceforge.net/)
• GCViewer (http://www.tagtraum.com/gcviewer.html)


Measure Your Performance Metrics
• If the aim of the work is to improve performance on a well-defined benchmark, then the measurement can be well defined. In real life, it may be necessary to design a controlled set of experiments to measure performance on the system being worked on.
• You measure the performance, analyze the results, and make changes. This should be an iterative process.

[Figure: performance measurement cycle. Labels: Applying Change, Performance Measurement, Enterprise Application, Work Force, Fluctuations]


Performance Metrics - Suggested Measurements
• JVM heap size
• Total response time
• Total server-side service time
• DB Layer time
• Transaction boundaries
• Cache sizes
• CPU utilization
• Stack traces
• GC pauses
• Network bandwidth
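Several of these metrics (heap size, GC counts and accumulated GC time) can also be sampled from inside the JVM through the standard `java.lang.management` MXBeans, which is the same data JConsole and VisualVM display. The following is a minimal sketch assuming Java 8 or later; `JvmMetricsSnapshot` is a hypothetical helper name. Collection time is cumulative since JVM start, so individual GC pauses still have to come from GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Reads heap and GC metrics from the running JVM through the standard platform MXBeans. */
final class JvmMetricsSnapshot {

    /** Heap bytes currently in use (live objects plus garbage not yet collected). */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Total collections across all collectors since JVM start; -1 ("undefined") values are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum heap size is undefined.
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these values periodically, for example before and after a load-test run, and comparing the deltas gives a rough view of heap growth and GC overhead.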


Heap & CPU Uses Report (Using VisualVM)


Method Level View (Using VisualVM)


Performance Optimization Process


Performance Tuning in Depth
The following sections discuss performance tuning at different layers, such as OS tuning, JVM tuning, integration layer tuning and database tuning.


Understanding Typical JEE Layered Architectures


Performance Tuning Process

Enterprise performance problems typically come from the following main areas:
• Network
• Databases
• Web servers
• Application server performance parameters
• JVM
• Application architecture, design and coding
Problems tend to come about fairly equally from these areas, so no single layer should be assumed to be the culprit in advance.


Application Classifications

• SOA Applications
• RIA Web-Based Applications
• Client-Server Applications
• Standalone Applications
• Cloud-Based Applications


OS Performance Monitoring
• Memory Utilization
Attributes of a system's memory, such as paging or swapping activity, locking, and voluntary and involuntary context switching, should be monitored.
top command
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2120 testuser 20 0 4373m 15m 7152 S 0 0.2 0:00.10 java

• CPU Utilization
For an application to reach its highest performance or scalability, it needs not only to take full advantage of the CPU cycles available to it but also to use them in a way that is not wasteful. To identify how an application is using CPU cycles, monitor CPU utilization at the operating system level. On most operating systems, CPU utilization is reported as both user CPU utilization and kernel or system (sys) CPU utilization.
top command
top - 16:15:45 up 21 days, 2:27, 3 users, load average: 17.94, 12.30, 5.52
Tasks: 150 total, 26 running, 124 sleeping, 0 stopped, 0 zombie
Cpu(s): 87.3% us, 1.2% sy, 0.0% ni, 27.6% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 4039848k total, 3999776k used, 40072k free, 92824k buffers
Swap: 2097144k total, 224k used, 2096920k free, 1131652k cached
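Inside the JVM, `ThreadMXBean` exposes a similar user/system split per thread: `getCurrentThreadCpuTime()` returns user plus system CPU time and `getCurrentThreadUserTime()` returns the user-mode part, both in nanoseconds. The sketch below assumes a JVM that supports thread CPU time measurement and returns `null` otherwise; `ThreadCpuSplit` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Splits the current thread's CPU time into user and system (kernel) parts, like top's us/sy columns. */
final class ThreadCpuSplit {

    /** Returns {userNanos, systemNanos} for the current thread, or null if measurement is unsupported. */
    static long[] currentThreadUserAndSystemNanos() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return null;
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        long user = threads.getCurrentThreadUserTime();  // user-mode CPU time
        long total = threads.getCurrentThreadCpuTime();  // user + system CPU time
        // The two reads are not atomic, so clamp a tiny negative difference to zero.
        return new long[] {user, Math.max(0L, total - user)};
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            checksum += i % 7;  // pure computation, so it should show up mostly as user time
        }
        long[] split = currentThreadUserAndSystemNanos();
        if (split == null) {
            System.out.println("thread CPU time is not supported on this JVM");
        } else {
            System.out.printf("user=%d ms, sys=%d ms (checksum %d)%n",
                    split[0] / 1_000_000, split[1] / 1_000_000, checksum);
        }
    }
}
```

A high system share for an application thread that mostly computes is a hint to look at locking, I/O or context switching, in line with the memory attributes listed above.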


OS Performance Monitoring Contd
• Disk I/O Utilization
If an application performs disk operations, disk I/O should be monitored for possible
performance issues. Disk I/O utilization is the most useful monitoring statistic for
understanding application disk usage since it is a measure of active disk I/O time.
iostat -xc

• Network Utilization
Distributed Java applications may find their performance and scalability limited by either network bandwidth or network I/O performance.
netstat -i


JVM Tuning


Work Flow of Tuning JVM


32 bit versus 64 bit Java Runtimes
The amount of memory available to the Java heap and native heap of a Java process is limited by the operating system and hardware. A 32-bit Java process has a 4 GB process address space, shared by the Java heap, the native heap and the operating system.

If you need more heap, say 8 GB, a 32-bit JVM cannot meet that requirement. 64-bit processes do not have this limit; their addressable space is measured in terabytes. 64-bit Java allows massive Java heaps (benchmarks have been released with heaps of up to 200 GB).

However, the ability to use more memory is not "free". 64-bit applications also require more memory, because Java object references and internal pointers are larger. The same Java application running on a 64-bit Java runtime may have a 70% larger footprint than on a 32-bit runtime. 64-bit applications can also perform slower, because more data is manipulated and cache performance is reduced (as the data is larger, the processor cache is less effective). 64-bit applications can be up to 20% slower.
A 64-bit JVM is recommended only if a Java heap much greater than 2 GB is required, or if the application uses computationally intensive algorithms (statistics, encryption, etc.) that need high-precision support.
32-bit versus 64-bit runtimes bring another interesting consideration: scaling. When scaling an application there are two choices: monolithic scaling with a small number of 64-bit JVMs (scaling up) or horizontal scaling with many clustered 32-bit JVMs (scaling out).
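Before relying on a large heap, it can help to confirm which kind of JVM a process is actually running on. This sketch assumes a HotSpot-based JVM, where the non-standard system property `sun.arch.data.model` reports `32` or `64`; other JVMs may not set it, so the code falls back to printing `unknown`. `JvmDataModel` is a hypothetical name.

```java
/** Reports whether this JVM is 32- or 64-bit and how large its heap may grow. */
final class JvmDataModel {

    /** "32" or "64" on HotSpot-based JVMs; not part of the Java SE spec, so it may be null elsewhere. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model");
    }

    /** Maximum heap the JVM will attempt to use (roughly -Xmx); Long.MAX_VALUE means no inherent limit. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        String model = dataModel();
        System.out.println("data model: " + (model == null ? "unknown" : model + "-bit"));
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        System.out.println("max heap: " + (maxHeapBytes() >> 20) + " MB");
    }
}
```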


Recommended GC Monitoring Options

GC Command Line Option - Most Applicable

• -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:<filename> - Minimal set of command line options to enable for all applications.

• -XX:+PrintGCDateStamps - Use when you want to see a calendar date and time of day rather than a time stamp indicating the number of seconds since the JVM was launched. Requires Java 6 Update 4 or later.

• -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintSafepointStatistics - Useful when tuning an application for low response time/latency, to help distinguish between pause events arising from VM safepoint operations and other sources.
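To check that a running server was actually started with the minimal logging flags, the JVM's input arguments can be read through `RuntimeMXBean.getInputArguments()`. A minimal sketch; `GcLoggingCheck` is a hypothetical name, and the flag list assumes a JDK 6 to 8 HotSpot JVM (JDK 9 and later replace these flags with unified logging, `-Xlog:gc*`).

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which of the minimal GC logging flags are missing from the JVM's command line. */
final class GcLoggingCheck {
    // "-Xloggc:" is matched as a prefix because it is followed by a file name.
    private static final String[] RECOMMENDED = {
        "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps", "-Xloggc:"
    };

    /** Returns the recommended flags that do not appear in the given JVM argument list. */
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : RECOMMENDED) {
            boolean present = false;
            for (String arg : jvmArgs) {
                if (arg.startsWith(flag)) {
                    present = true;
                    break;
                }
            }
            if (!present) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```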


Tuning Garbage Collection with Sun JDK
The following example JVM settings are recommended for most engine tier servers:
• -server -Xmx1024m -XX:MaxPermSize=128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=256 -XX:CMSInitiatingOccupancyFraction=60 -XX:+DisableExplicitGC

The above options have the following effects:
• -XX:+UseTLAB - Uses thread-local allocation buffers (TLABs). This improves concurrency by reducing contention on the shared heap lock.
• -XX:+UseParNewGC - Uses a parallel version of the young generation copying collector alongside the concurrent mark-and-sweep collector. This minimizes pauses by using all available CPUs in parallel. The collector is compatible with both the default collector and the Concurrent Mark Sweep (CMS) collector.
• -Xms, -Xmx - Place boundaries on the heap size to increase the predictability of garbage collection. The heap size is limited in replica servers so that even full GCs do not trigger SIP retransmissions. -Xms sets the starting size to prevent pauses caused by heap expansion.
• -XX:MaxTenuringThreshold=0 - Makes the full NewSize available to every young (NewGC) collection and reduces pause time by not evaluating tenured objects. Technically, this setting promotes all live objects directly to the old generation rather than copying them between survivor spaces.


Tuning Garbage Collection with Sun JDK CONTD
• -XX:SurvivorRatio=128 - Specifies a high survivor ratio, which goes along with the zero tenuring threshold to ensure that little space is reserved for absent survivors.
• -XX:+UseConcMarkSweepGC or -XX:+UseParNewGC - Try these switches if you are having problems with intrusive garbage collection pauses. They cause the JVM to use different algorithms for major garbage collection events (and also for minor collections, if run on a multiprocessor machine) that do not "stop the world" for the entire garbage collection process. With CMS you should also add -XX:+CMSClassUnloadingEnabled and -XX:+CMSPermGenSweepingEnabled so that class unloading is enabled (it is not enabled by default with this collector). When tuning the NetBeans IDE itself, these switches go into the netbeans.conf file with a -J prefix, e.g. -J-XX:+CMSClassUnloadingEnabled.
• -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 - The concurrent collector can be used in a mode in which the concurrent phases are done incrementally. Recall that during a concurrent phase the garbage collector thread is using one or more processors. Incremental mode is meant to lessen the impact of long concurrent phases by periodically stopping the concurrent phase to yield the processor back to the application.
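Whether the collector flags took effect can be verified at runtime by listing the collector MXBeans. This sketch assumes a Sun/Oracle HotSpot JVM, which reports the ParNew + CMS pair as `ParNew` and `ConcurrentMarkSweep`; other collectors and JVMs report other names. Incremental CMS was removed in JDK 9 and CMS itself in JDK 14, so these settings apply only to older JDKs. `ActiveCollectors` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors the running JVM actually selected, to confirm tuning flags took effect. */
final class ActiveCollectors {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True when the names match what HotSpot reports for -XX:+UseParNewGC with -XX:+UseConcMarkSweepGC. */
    static boolean isParNewWithCms(List<String> names) {
        return names.contains("ParNew") && names.contains("ConcurrentMarkSweep");
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("active collectors: " + names);
        System.out.println("ParNew + CMS in use: " + isParNewWithCms(names));
    }
}
```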


Tuning Other Layers & Best Practices


Transaction Management Strategies

Transaction Isolation Level
Transaction isolation levels are specified to maintain data integrity during concurrent transactions. Databases usually support the following transaction isolation levels, listed here in decreasing order of performance:
• READ_UNCOMMITTED allows a transaction to read data that may be changed or removed before the end of another transaction that is writing the data. This offers the best performance since it does not require any serialization, but it may lead to dirty reads, non-repeatable reads and phantom (ghost) reads.
• READ_COMMITTED requires that only committed data is read.
• REPEATABLE_READ requires that, within a transaction, multiple reads of the same entity return the data in the same state. This can be achieved through pessimistic or optimistic locking. In pessimistic locking, the corresponding database row is locked, blocking other transactions from accessing the row until the transaction completes. In optimistic locking, no lock is obtained on the entity; instead, data integrity is maintained through other means, such as version numbers. Stale data is detected if the version number in the database is greater than the version number in memory, indicating that the entity's state was changed by another transaction; at this point the application can roll back its transaction, refresh the entity's state from the database, and retry the transaction. Because of the cost of transaction rollbacks, optimistic locking may not be the best option for highly concurrent applications where data is modified frequently. Optimistic locking may provide better performance for applications where the data is seldom modified.
• SERIALIZABLE requires that all transactions in a system occur in isolation, as if executed serially. With the pessimistic approach this may require locking a range in a table, or the whole table. This offers the slowest performance.
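The optimistic-locking check described under REPEATABLE_READ can be sketched without a database. In a JDBC DAO the same check is typically a single statement, `UPDATE ... SET ..., version = version + 1 WHERE id = ? AND version = ?`, with an update count of 0 meaning the row was changed by another transaction. The in-memory `OptimisticStore` below is hypothetical and only models that version comparison and the refresh-and-retry loop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a versioned table, illustrating the optimistic version check. */
final class OptimisticStore {
    static final class Row {
        final long balance;
        final int version;

        Row(long balance, int version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new ConcurrentHashMap<>();

    void insert(long id, long balance) {
        rows.put(id, new Row(balance, 0));
    }

    Row read(long id) {
        return rows.get(id);
    }

    /** Writes newBalance only if the row still has expectedVersion; false means the read was stale. */
    boolean update(long id, long newBalance, int expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        // replace(key, old, new) is atomic, so two writers holding the same snapshot cannot both win.
        return rows.replace(id, current, new Row(newBalance, expectedVersion + 1));
    }

    /** Read-modify-write: on a conflict, refresh the row from the store and retry. */
    boolean addWithRetry(long id, long delta, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Row snapshot = read(id);
            if (snapshot == null) {
                return false;
            }
            if (update(id, snapshot.balance + delta, snapshot.version)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        store.insert(1L, 100L);
        Row staleCopy = store.read(1L);
        store.addWithRetry(1L, 10L, 3);  // another "transaction" commits first
        boolean applied = store.update(1L, staleCopy.balance + 5L, staleCopy.version);
        System.out.println("stale update applied: " + applied);  // false: the version moved on
    }
}
```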


Distributed Transaction Tuning
XA Two Phase Commit Protocol
• A common use of JMS is to consume messages from a queue or topic, process them using a database or EJB, and then acknowledge/commit the message.
• If you are using more than one resource, e.g. reading a JMS message and writing to a database, you really should use XA: its purpose is to provide atomic transactions across multiple transactional resources. Without it, there is a small window between the point where your database changes are committed and the point where you commit/acknowledge the message; if there is a network, hardware or process failure inside that window, the message will be redelivered and you may end up processing duplicates.
• However, the XA protocol requires multiple syncs to disk to ensure it can always recover properly under every possible failure scenario. This adds significant cost in terms of latency, performance, resources and complexity.
XA requires XA-compliant data sources and adapters.

WebLogic: Use the Logging Last Resource Optimization
When using transactional database applications, consider using the JDBC data source Logging Last Resource (LLR) transaction policy instead of XA. The LLR optimization can significantly improve transaction performance by safely eliminating some of the 2PC XA overhead for database processing, especially for two-phase commit database insert, update, and delete operations.
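The two phases of the XA protocol can be illustrated with a toy coordinator: every participant is asked to prepare, and only if all vote yes is each one told to commit. This is a sketch under stated assumptions: `Participant`, `RecordingParticipant` and `TwoPhaseCoordinator` are hypothetical (real resources implement `javax.transaction.xa.XAResource`), and the durable logging responsible for the disk syncs mentioned above is deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical participant; real XA resources implement javax.transaction.xa.XAResource instead. */
interface Participant {
    boolean prepare();  // phase 1: vote to commit (true) or abort (false)
    void commit();      // phase 2 when every participant voted yes
    void rollback();    // phase 2 when any participant voted no
}

/** In-memory participant for the demo; it only records the final outcome. */
final class RecordingParticipant implements Participant {
    private final boolean vote;
    String outcome = "pending";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override public boolean prepare() { return vote; }
    @Override public void commit() { outcome = "committed"; }
    @Override public void rollback() { outcome = "rolled back"; }
}

/**
 * Toy two-phase commit coordinator. It omits the durable transaction log that a real
 * transaction manager writes so that it can recover after a crash between the phases.
 */
final class TwoPhaseCoordinator {
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: ask everyone to prepare; a single "no" vote aborts the whole transaction.
        for (Participant p : participants) {
            if (!p.prepare()) {
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: all voted yes, so everyone commits.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        RecordingParticipant database = new RecordingParticipant(true);
        RecordingParticipant queue = new RecordingParticipant(false);
        boolean committed = run(Arrays.asList(database, queue));
        System.out.println("committed=" + committed + ", database=" + database.outcome
                + ", queue=" + queue.outcome);
    }
}
```

Running `main` prints `committed=false, database=rolled back, queue=rolled back`, because the second participant votes no.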


Tuning JDBC & Database


Tuning JDBC DAO Layer

Tuning Options
• Connection Pool Sizing and Testing
• Caching Statements
• Connection Pool Request Timeouts
• Recovering Leaked Connections
• PinnedToThread


Connection Pool Sizing and Testing
• Sizing
• Initial capacity and Maximum capacity.
• Shrink Frequency.
• Testing
• Test Frequency.
• Test Reserved/Released Connections
• Maximum Connections Made Unavailable
• Test Table Name


Caching SQL Statements
• Reuses CallableStatement and PreparedStatement objects from a cache.
• Reduces CPU usage on the database side and improves performance.
• Statement Cache Size
• Configured per connection pool.
• The cache size applies to each connection in the pool.
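A statement cache is essentially a bounded, per-connection map from SQL text to a prepared statement that evicts the least recently used entry when full. The sketch below models that with `LinkedHashMap` in access order; `StatementCache` is a hypothetical class, the type parameter `S` stands in for `java.sql.PreparedStatement`, and the eviction callback is where a real cache would close the evicted statement. It is not thread-safe, on the assumption that a connection is used by one thread at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical per-connection LRU statement cache keyed by SQL text. */
final class StatementCache<S> {
    private final int maxSize;
    private final Consumer<S> onEvict;  // a real cache would close the evicted statement here
    private final Map<String, S> cache;
    private int misses = 0;             // number of times a statement had to be prepared

    StatementCache(int maxSize, Consumer<S> onEvict) {
        this.maxSize = maxSize;
        this.onEvict = onEvict;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                if (size() > StatementCache.this.maxSize) {
                    StatementCache.this.onEvict.accept(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached statement for sql, preparing and caching it on a miss. */
    S get(String sql, Function<String, S> prepare) {
        S stmt = cache.get(sql);  // a hit also marks the entry as most recently used
        if (stmt == null) {
            misses++;
            stmt = prepare.apply(sql);
            cache.put(sql, stmt);
        }
        return stmt;
    }

    int misses() {
        return misses;
    }

    int size() {
        return cache.size();
    }
}
```

A cache that is too small for the application's working set of SQL statements keeps evicting and re-preparing, which shows up as a steadily growing miss count.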


Recovering Leaked Connections & Timeouts
• Leaked Connections
• Forcibly reclaims unused connections.
• Inactive Connection Timeout.
• Connection Request Timeout.
• Connection Reserve Timeout.
• Maximum number of requests that can wait for a connection.
• PinnedToThread
• Pins a connection to an ExecuteThread.
• Connection.close() does not return the connection to the pool.
• Configure initial capacity = maximum capacity.
• In most cases, the maximum number of connections used does not exceed the number of execute threads.
• Configure connection refreshing if database calls fail because of stale connections.
• Try to avoid PinnedToThread if database resources are limited.
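The interplay of maximum capacity and the connection reserve timeout can be sketched with a fair `Semaphore`: a caller waits at most the timeout for a free slot and gets `null` otherwise. `BoundedPool` is hypothetical and `T` stands in for `java.sql.Connection`; server pools add testing, shrinking and leak reclamation on top of this.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical fixed-capacity pool illustrating maximum capacity and a reserve timeout. */
final class BoundedPool<T> {
    private final Semaphore permits;  // one permit per slot, up to maxCapacity
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    BoundedPool(int maxCapacity, Supplier<T> factory) {
        this.permits = new Semaphore(maxCapacity, true);  // fair: waiters are served in arrival order
        this.factory = factory;
    }

    /** Waits up to the reserve timeout for a slot; returns null if none frees up in time. */
    T reserve(long timeout, TimeUnit unit) {
        try {
            if (!permits.tryAcquire(timeout, unit)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // keep the interrupt visible to the caller
            return null;
        }
        synchronized (idle) {
            T pooled = idle.pollFirst();
            if (pooled != null) {
                return pooled;
            }
        }
        try {
            return factory.get();  // created lazily; never more than maxCapacity in use
        } catch (RuntimeException e) {
            permits.release();     // do not lose the slot if creation fails
            throw e;
        }
    }

    /** Returns an object to the pool; call exactly once per successful reserve(). */
    void release(T item) {
        synchronized (idle) {
            idle.addFirst(item);
        }
        permits.release();
    }

    int availableSlots() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BoundedPool<StringBuilder> pool = new BoundedPool<>(2, StringBuilder::new);
        StringBuilder conn = pool.reserve(500, TimeUnit.MILLISECONDS);
        if (conn == null) {
            System.out.println("reserve timed out");
            return;
        }
        try {
            conn.append("work");  // use the pooled object
        } finally {
            pool.release(conn);   // always return it, even if the work throws
        }
        System.out.println("available slots: " + pool.availableSlots());
    }
}
```

Releasing in a `finally` block, as `main` does, keeps an exception from leaking a pooled object; a missing release is exactly the kind of leak that forced reclamation tries to recover from.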


JMS Tuning
• Get rid of properties you don't need, since they only inflate the message. For example, use the setDisableMessageID method on the MessageProducer class to disable message IDs if you don't need them. This decreases the size of the message and also avoids the overhead of creating a unique ID.
• Similarly, invoking the setDisableMessageTimestamp method on MessageProducer disables message timestamps and makes the message smaller. (Both methods are hints that the JMS provider may ignore.)
• Use setTimeToLive, which controls the time (in milliseconds) after which a message expires. By default, a message never expires, so setting an appropriate time to live reduces memory overhead and improves performance.
• As far as the message body is concerned, avoid messages with a large data section. Verbose formats such as XML take up a lot of space on the wire, and performance suffers as a result. Consider using BytesMessage if you need to transfer XML messages, so that the message is transmitted efficiently and unnecessary data conversion is avoided.
• Also be careful with the ObjectMessage type. ObjectMessage is convenient, but it comes at a cost: the body of an ObjectMessage is converted to bytes using Java serialization. The Java serialized form of even small objects is very verbose and takes up a lot of space on the wire, and Java serialization is slow compared to custom marshalling techniques. Only use ObjectMessage if you really can't use one of the other message types, for example if you don't know the type of the payload until runtime.
• http://www.precisejava.com/javaperf/j2ee/JMS.htm
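The ObjectMessage point can be checked directly by comparing the bytes produced by Java serialization with a hand-written encoding of the same fields, as one might write into a BytesMessage. This sketch deliberately avoids the JMS API so it runs without a provider; `Quote` and `PayloadSize` are hypothetical, and a provider adds its own framing on top of either body.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical payload: a stock quote with two fields. */
final class Quote implements Serializable {
    private static final long serialVersionUID = 1L;
    final String symbol;
    final long priceCents;

    Quote(String symbol, long priceCents) {
        this.symbol = symbol;
        this.priceCents = priceCents;
    }
}

/** Compares Java serialization (what an ObjectMessage body uses) with hand-written marshalling. */
final class PayloadSize {

    static byte[] javaSerialized(Quote quote) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(quote);  // stream header, class descriptor, field names/types, then values
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] customMarshalled(Quote quote) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(quote.symbol);       // 2-byte length prefix + modified UTF-8 bytes
            out.writeLong(quote.priceCents);  // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Quote quote = new Quote("ACME", 12_345L);
        System.out.println("Java serialization: " + javaSerialized(quote).length + " bytes");
        System.out.println("custom marshalling: " + customMarshalled(quote).length + " bytes");
    }
}
```

For `new Quote("ACME", 12_345L)` the custom encoding is 14 bytes (a 2-byte length, 4 characters and an 8-byte long), while the serialized form additionally carries the stream header and the class descriptor.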


JMS Tuning- CONTD
• Another element that influences message performance is the acknowledgement mode:
• CLIENT_ACKNOWLEDGE mode is the least feasible option, since the JMS server cannot send subsequent messages until it receives an acknowledgement from the client.
• AUTO_ACKNOWLEDGE mode follows a once-and-only-once delivery policy, but this incurs overhead on the server to maintain the policy and requires an acknowledgement to be sent from the client for each message it receives.
• DUPS_OK_ACKNOWLEDGE mode acknowledges messages lazily, which reduces the overhead on the server but may add network traffic, because some messages may be delivered more than once.
• From a performance point of view, DUPS_OK_ACKNOWLEDGE usually performs better than AUTO_ACKNOWLEDGE.
• Another alternative is to create a transacted session and batch many acknowledgements into one commit.
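The transacted-session alternative can be sketched as committing once per batch instead of acknowledging each message. With real JMS this would be a session created with `connection.createSession(true, Session.SESSION_TRANSACTED)` and a call to `session.commit()`; here `TxSession` and `BatchingConsumer` are hypothetical stand-ins so the logic runs without a provider. The trade-off is that a failure before the commit causes the whole batch to be redelivered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the commit side of a transacted JMS session. */
@FunctionalInterface
interface TxSession {
    void commit();
}

final class BatchingConsumer {
    /**
     * Processes messages and commits once per batchSize messages, plus once for a final
     * partial batch, instead of acknowledging each message. Returns the number of commits.
     */
    static <M> int consume(List<M> messages, int batchSize, Consumer<M> handler, TxSession session) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int commits = 0;
        int pending = 0;
        for (M message : messages) {
            handler.accept(message);
            pending++;
            if (pending == batchSize) {
                session.commit();  // one commit acknowledges the whole batch
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            session.commit();      // flush the last, partial batch
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("m1", "m2", "m3", "m4", "m5");
        int commits = consume(messages, 2, m -> System.out.println("processed " + m),
                () -> System.out.println("commit"));
        System.out.println("commits=" + commits);  // two full batches plus one partial batch
    }
}
```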
Other server-specific tuning
• Tuning the Persistent Store
• Tuning App Server JMS Config
• Tuning Message Bridge
WebLogic Server documentation: http://docs.oracle.com/cd/E13222_01/wls/docs92/perform/jmstuning.html


Adding Cache Layer


Best Practices

• Only Execute Mandatory Tasks
• Increase CPU speed (scale up)
• Optimize Algorithm
• Exploit Parallelism (Scale Out)
• Divide and conquer
• Fork-join
• Optimize Large Latencies
• Best of all worlds - Do them all
• Reduce Use Of XML
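Divide and conquer with fork-join is available in the JDK as `ForkJoinPool` and `RecursiveTask` (Java 7+; `commonPool()` needs Java 8). The sketch sums an array by splitting it until chunks fall below a threshold; `ParallelSum` and the threshold value are illustrative assumptions, and for work this cheap a plain loop may well be faster, so measure before adopting it.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Divide-and-conquer sum over a long[] using the JDK fork/join framework. */
final class ParallelSum extends RecursiveTask<Long> {
    private static final long serialVersionUID = 1L;
    private static final int THRESHOLD = 10_000;  // illustrative cut-off; tune by measurement

    private final long[] data;
    private final int from;  // inclusive
    private final int to;    // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                      // divide: hand the left half to the pool
        long rightSum = right.compute();  // work on the right half in the current thread
        return left.join() + rightSum;    // conquer: combine the partial results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));  // 500000500000
    }
}
```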


Technology based Tuning
This is a long list; it will be covered in the next session.
• Spring
• Hibernate
• EJB Level Tuning
• ESB tools like Mule, Camel, WebSphere Message Broker
• Tibco EMS/RV
• JAXB/XML Beans
• JSP/JSF/Servlet
• ExtJS/JavaScript
• Database tuning
• JAX-WS and JAX-RS (Web Service)
• Cluster tuning
• Grid tuning
• OS tuning
• Hardware/network tuning (e.g. adding network card bandwidth)
• Web Server Tuning


Application Server Tuning
1. Setting Java Parameters for Starting App Server
2. Development vs. Production Mode Default Tuning Values
3. Thread Management
4. Tuning Network I/O
5. Setting Your Compiler Options
6. Using Server Clusters to Improve Performance
7. Monitoring a Server Domain


Sample Server: WebLogic Server Tuning
• Native IO: uses a platform-optimized, native socket multiplexor.
• Uses its own socket reader threads, freeing up WebLogic execute threads.
• Available for most platforms: Solaris, Linux, HP-UX, AIX, Windows.
• Can be configured using the WebLogic Administration Console.
• Execute thread count: the number of simultaneous operations that can be performed by applications.
• Production mode default: 25.
• Tuning criteria:
• Request turnaround time.
• Number of CPUs.
• % Socket Reader Threads (default 33%).
• In 8.1, the Execute Queue can be tuned for the overflow condition.
• Increases the thread count dynamically.
• Thread usage can be controlled by creating additional Execute Queues:
• Performance optimization for critical applications.
• Throttle the performance.
• Protect the application from deadlock.
• Can have a negative impact on overall performance.


Sample Server: WebLogic Server Tuning Contd
• Stuck Thread Detection
• Detects when an execute thread cannot complete its work or accept new work.
• For warning purposes only; it does not change the behavior or state of the thread.
• Configured with Stuck Thread Max Time and Stuck Thread Timer Interval.
• Connection Backlog Buffering
• The number of backlogged TCP connection requests.

• Native IO gives better performance; consider Java IO if Native IO is not stable.
• A high number of threads can have a negative impact on performance.
• More threads do not imply that you can process more work.
• Avoid application designs that require creating new threads.
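Stuck thread detection boils down to recording when each unit of work starts and periodically reporting anything that has run longer than a maximum time. The sketch below is a hypothetical illustration in that spirit, not WebLogic's implementation; `StuckWorkDetector` and its injectable clock are assumptions. Like the server feature, it only reports and never interrupts the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical detector: tracks start times and reports work running longer than maxMillis. */
final class StuckWorkDetector {
    private final long maxMillis;             // analogous to a "stuck thread max time"
    private final LongSupplier clockMillis;   // injectable clock so the check is testable
    private final Map<String, Long> startedAt = new ConcurrentHashMap<>();

    StuckWorkDetector(long maxMillis, LongSupplier clockMillis) {
        this.maxMillis = maxMillis;
        this.clockMillis = clockMillis;
    }

    void begin(String workId) {
        startedAt.put(workId, clockMillis.getAsLong());
    }

    void end(String workId) {
        startedAt.remove(workId);
    }

    /** Meant to run on a timer interval; it returns a warning list and changes nothing. */
    List<String> findStuck() {
        long now = clockMillis.getAsLong();
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : startedAt.entrySet()) {
            if (now - e.getValue() > maxMillis) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        StuckWorkDetector detector = new StuckWorkDetector(600_000L, System::currentTimeMillis);
        detector.begin("request-42");
        System.out.println("stuck now: " + detector.findStuck());  // empty: it has only just started
        detector.end("request-42");
    }
}
```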


Application Server Tuning : Further Reading
1. WebLogic Server
http://docs.oracle.com/cd/E13222_01/wls/docs100/perform/topten.html
2. WebSphere Server
http://www.ibm.com/developerworks/websphere/downloads/peformtuning.html
3. JBoss Server
www.jboss.com/pdf/JB_JEAP4_3_PerformanceTuning_wp.pdf
4. Tomcat Server
tomcat.apache.org/articles/performance.pdf
5. GlassFish Server
docs.oracle.com/cd/E18930_01/html/821-2431/index.htm


Performance Checklist
• Planning for performance is the single most important indicator of success for a J2EE project's performance.
• J2EE profiling needs more than a J2SE profiler—it needs to be J2EE "aware" so J2EE requests can be followed
and logged, and communications, sessions, transactions, and bean life cycles can be monitored.
• Enterprise performance problems tend to come about equally from four main areas: databases, web
servers, application servers, and the network.
• Common database problems are insufficient indexing, fragmented databases, out-of-date statistics, and faulty application design. Solutions include tuning the indexes, compacting the database, updating the database statistics, and rewriting the application so that the database server controls the query process.
• Common web-server problems are poor design algorithms, incorrect configurations, poorly written code,
memory problems, and overloaded CPUs.
• Common application-server problems are poor cache management, unoptimized database queries, incorrect
software configuration, and poor concurrent handling of client requests.
• Common network problems are inadequate bandwidth somewhere along the communication route, and
undersized, misconfigured, or incompatible routers, switches, firewalls, and load balancers.
• Monitor JVM heap sizes, request response times, request service times, JDBC requests, RMI
communications, file descriptors, bean life cycles, transaction boundaries, cache sizes, CPU utilization, stack
traces, GC pauses, and network bandwidth.


Performance Checklist- Contd

• Watch out for slow response times, excessive database table scans, database deadlocks, unavailable pages,
memory leaks, and high CPU usage (consistently over 85%).
• Load testing should be repeatable. Tests should include expected peak loads. Tests should be as close to the
expected deployed system as possible and should be able to run for a long period of time.
• One testing methodology is to determine the maximum acceptable page-download response time, estimate the maximum number of simultaneous users, increase the number of simulated users until the application's response delay becomes unacceptable, and then tune until you reach a good response time for the desired number of users.
• Page display should be as fast as possible. Use simple pages with static layouts where possible. Let users get
to their destination page quickly. Work with the browser's capabilities.
• Use priority queues to provide different levels of service.
• Be prepared to handle network congestion and communication failures.
• High-performance applications probably need clustering and load balancing.
• Close JMS resources when you finish with them.
• Start the consumer before the producer.


Performance Checklist- Contd

• Separate nontransactional and transactional sessions.
• Use nonpersistent messages.
• Use shorter or compressed messages.
• Tune the redelivery count, the Delivery TimeToLive, and the Delivery capacity.
• Use asynchronous processing (MessageListener), parallel processing (ConnectionConsumers and
ServerSessionPools), flow control, load-balancing message queues, and duplicate delivery mode
(Session.DUPS_OK_ACKNOWLEDGE). Avoid Session.CLIENT_ACKNOWLEDGE.
• Use publish-and-subscribe when dealing with many active listeners and point-to-point for only a few
active listeners.

