7 Deadly Sins of Enterprise Java Programming and Deployment in the Multicore Era

Mahesh Somani, msomani@ebay.com
Anil Kumar, anil.kumar@intel.com
Kumar Shiv, kumar.shiv@intel.com


                                JavaOne 2010
          *Other names and brands may be claimed as the property of others.
Agenda
• SALIGIA: formed from the first letters of the seven deadly sins in Latin
  – Superbia, Avaritia, Luxuria, Invidia, Gula, Ira, Acedia


      Latin      Meaning        Implication for a geek
      Superbia   pride          "My code is a piece of perfection"
      Luxuria    extravagance   Beefing up unnecessary areas
      Gula       gluttony       Too many features and too much object allocation
      Acedia     neglect        Neglecting scaling tests and corner cases
      Avaritia   greed          Too much cost cutting on critical resources
      Invidia    envy           Watching the competition gain market share
      Ira        wrath          What follows from the Almighty (management)!


Agenda

• Performance progression in the multi-core era
• Quick details on the latest s/w and h/w platforms
• Discussion of seven common pitfalls
• Summary
• References




Multi-core era: Progression of performance
 • Phenomenal performance gain from combined h/w + s/w improvements




[Chart: SPECjbb2005 K bops on a 2-socket (2S) platform, 2005 to 2010; bar labels: 36, 51, 64, 138, 252, 368, 557, 604, 632, 928, 1,011]




 • H/W capabilities increased in line with Moore's Law
      – ~10x-15x gain from h/w alone
 • S/W changes unlocked the full potential of the h/w capabilities
      – Roughly a further doubling of performance

Multi-core era: Rapid increase in # of cores
   Year  Code name     Microarchitecture  Xeon series      Cores  HT threads/core  LLC
   2005  Irwindale     NetBurst           Xeon DP          1      2                2MB L2
   2005  Paxville-DP   NetBurst           Dual-Core Xeon   2      2                4MB L2
   2006  Dempsey-DP    NetBurst           5000             2      2                4MB L2
   2006  Woodcrest     Core               5100             2      None             4MB L2
   2007  Wolfdale-DP   Core               5200             2      None             6MB L2
   2006  Clovertown    Core               5300             4      None             8MB L2
   2007  Harpertown    Penryn             5400             4      None             12MB L2
   2009  Nehalem-EP    Nehalem            5500             4      2                8MB L3
   2010  Westmere-EP   Nehalem            5600             6      2                12MB L3

 • In addition to more cores, many other advanced features deliver an
   excellent user experience by default

Multi-core era: Increase in # of cores to continue
[Figure: Intel tick-tock roadmap across the 65nm, 45nm, 32nm and 22nm process nodes, spanning the Intel® Core™, Nehalem and Sandy Bridge microarchitectures]

Intel® Xeon® 5600: Intel's first 32nm server processor, with 6 cores and 12 threads

  • Many more advanced features to enhance the user experience
Multi-core era: More platform-level features
[Figure: 2010 timeline, Q1 to Q3 (January to October), marking the launch of the Xeon 5600 and 7500 series]

Xeon 7500 series (up to 8 cores per socket)
• 4 socket and greater
• Nehalem-EX (up to 8C + HT)
• New mission-critical RAS
• New levels of scalability
• NUMA
• 2010/11 platform: glueless 8-socket servers; 16S to 32S scalable node controller
• DDR3, Intel® QPI, PCI Express* 2.0 technology

Xeon 5600 processor (up to 6 cores per socket)
•   2 socket
•   Westmere (up to 6C + HT)
•   Intel® AES-NI
•   Intel TXT
•   NUMA
•   DDR3, Intel® QPI, PCI Express* 2.0 technology


Intel’s role in improving Java performance
[Diagram: the Intel Java Team at the center, connected to:]
  – Out-of-box optimal settings of the h/w platform
  – Working with JVM vendors (Oracle and IBM)
  – Influencing future CPU design
  – Working with h/w and s/w profiling tools
  – Working with ISVs on the application-level stack

Application characterization helps in better deployment decisions as well as optimal utilization of a platform

 • Many more advanced features to enhance the user experience
S/W impact: Intel relationship with ISVs
[Stack: Java applications (widely used apps) on top of JVMs (3 major JVMs) on top of the OS (all major operating systems)]

 • Very active role with OS vendors
 • Engaged with all three major JVM vendors
       – Sun HotSpot (now Oracle HotSpot)
       – Oracle JRockit
       – IBM J9
 • Interaction with widely used Java applications
 • Optimizing the complete s/w stack for the latest processors while ensuring
   excellent performance on the existing s/w stack

Close, active relationships with ISV partners such as eBay

Application environments
• Batch processing: stand-alone systems and/or clusters of systems
   – Computation intensive, etc.
• 3-tier application servers: client, Java app server, backend (DB, etc.)
• High-frequency trading/financial latency-sensitive apps
• Java + native mix
• Virtualized environments
• Cloud environments: very limited

Factors impacting deployment configuration

• Application deployment configuration by itself can be very complex
[Diagram: factors layered from the application down to the hardware]
  – Application + JVM
  – OS
  – BIOS settings: Turbo, power management, prefetching, other
  – Processors (per socket): SKU (GHz, caches), # of cores, HT (Hyper-Threading), inter-processor bandwidth
  – Memory (per socket): DIMM population (capacity, latency), DIMM type (speed)
  – Disk I/O, network

Simple logical mapping (the real interaction is much more complex and intertwined)

Performance methodology
[Diagram: application-level monitoring (s/w, the application experts' view) above h/w performance monitoring counters (h/w, the Intel Java performance team's view); many performance and scaling issues are visible at both levels]

 • Many performance and scaling issues are reflected in h/w resource
   utilization, which is tracked by performance monitoring counters
      – There are more than 400 performance monitoring counters

Performance monitoring counters analysis

• Severe cases are easy to spot
• Moderate cases, or extracting the last 30% of performance:
  – It is more art than science!
      – 4-5 counters can be collected at a time, with a minimum granularity of 100 ms
      – Absolute values are not very useful
      – The solution involves relative values and correlation across h/w resources
[Figure: load and latency, with the problem marked]

• Histogram patterns help identify phases and anomalies
[Figure: L2_LD_MISS (L2 cache load misses) per sample over time, samples 1 to 97, y-axis 0 to 4,000,000, with a GC phase marked]
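The "relative values and correlation" point can be sketched in a few lines of Java. This is a minimal, hypothetical helper (the `CounterCorrelation` class and its methods are illustrative, not part of any Intel tool): it normalizes a raw event count to events per 1000 instructions retired, and it computes the Pearson correlation between two per-interval series, such as L2 load misses against response time. The sample arrays in `main` are made up; real inputs would come from a counter collection tool.

```java
final class CounterCorrelation {

    // Normalizes a raw event count to events per 1000 instructions retired,
    // which stays comparable across runs of different length or load.
    static double perKiloInstructions(long events, long instructionsRetired) {
        return 1000.0 * events / instructionsRetired;
    }

    // Pearson correlation coefficient of two equally long sample series (-1 to 1).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Hypothetical per-interval samples: L2 load misses and response time in ms.
        double[] l2Misses = {1.0e6, 1.2e6, 3.8e6, 1.1e6, 3.5e6};
        double[] latencyMs = {10, 11, 30, 10, 28};
        System.out.printf("r(L2 misses, latency) = %.3f%n", pearson(l2Misses, latencyMs));
        System.out.printf("L2 misses per 1000 instructions = %.2f%n",
                perKiloInstructions(2_500_000L, 1_000_000_000L));
    }
}
```

A coefficient near 1 only says the two series move together; it still takes the application team's knowledge to decide which one drives the other.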




Locating the source of a problem
• Out-of-box collection from performance analyzers is only
  useful for simple applications
  – Inlining makes it very hard to track the source of a problem
• What are the next steps, then?
  – Disabling inlining when collecting profiles often works
  – Analysis across JITed method code, punctuated with h/w
    counters, helps identify the issue when the method hotness
    profile is FLAT
       h/w counters (normalized per sec)
       Method  Instruction          CLK   Inst. retired   Cache misses
       A       mov eax, [ebx]       110        52              55
               mov [var], ebx        82        43               5
       B       str DB hello, 0       15        56              12
               push <mem>            85        25              10
       C       add <reg>,<reg>      200        90              25
               add <reg>,<mem>       24        32               8

• Close cooperation between the Intel Java team and the
  application performance team is crucial
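Reading the table as ratios rather than absolute values is simple arithmetic. The sketch below (a hypothetical `MethodCpi` class, fed with the table's own numbers) computes cycles per instruction retired (CPI = CLK / Inst. retired) and cache misses per instruction, per row and per method. For example, `mov eax, [ebx]` in method A shows roughly one cache miss per instruction retired (55 / 52), which stands out even though method C accumulates more CLK in total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MethodCpi {

    // One row of the table; counter values are the normalized per-second numbers.
    static final class Row {
        final String method;
        final String instruction;
        final double clk, instRetired, cacheMisses;

        Row(String method, String instruction, double clk, double instRetired, double cacheMisses) {
            this.method = method;
            this.instruction = instruction;
            this.clk = clk;
            this.instRetired = instRetired;
            this.cacheMisses = cacheMisses;
        }

        double cpi() { return clk / instRetired; }                  // cycles per instruction retired
        double missesPerInst() { return cacheMisses / instRetired; }
    }

    static final Row[] TABLE = {
        new Row("A", "mov eax, [ebx]", 110, 52, 55),
        new Row("A", "mov [var], ebx", 82, 43, 5),
        new Row("B", "str DB hello, 0", 15, 56, 12),
        new Row("B", "push <mem>", 85, 25, 10),
        new Row("C", "add <reg>,<reg>", 200, 90, 25),
        new Row("C", "add <reg>,<mem>", 24, 32, 8),
    };

    // Sums {CLK, instructions retired, cache misses} per method, keeping table order.
    static Map<String, double[]> perMethod(Row[] rows) {
        Map<String, double[]> sums = new LinkedHashMap<>();
        for (Row r : rows) {
            double[] s = sums.computeIfAbsent(r.method, k -> new double[3]);
            s[0] += r.clk;
            s[1] += r.instRetired;
            s[2] += r.cacheMisses;
        }
        return sums;
    }

    static double cpi(double[] sums) { return sums[0] / sums[1]; }

    public static void main(String[] args) {
        for (Row r : TABLE) {
            System.out.printf("%s %-16s CPI=%.2f misses/inst=%.2f%n",
                    r.method, r.instruction, r.cpi(), r.missesPerInst());
        }
        perMethod(TABLE).forEach((m, s) -> System.out.printf("method %s: CPI=%.2f%n", m, cpi(s)));
    }
}
```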
Seven common pitfalls

App architects and programmers:
1. Multi-threading, serialization, locks
2. Lack of basic characterization

Testing and deployment:
3. JVM selection and JVM parameters
4. Heap management and GC
5. Estimate and peak performance
6. Monitoring (GC log, etc.)

Issues during support:
7. S/W + H/W configuration
     –   Including network, disk I/O, OS

         Customer IP and data security being paramount,
             only generic examples are shared
1: Multithreading, serialization and locks
• The first attempt at multi-threading is often riddled with
  too many locks
   – As programmers, it is better to be safe than sorry
   – But once the application is running, h/w-level and JVM-level
     profiling can identify locks worth revisiting
   – False sharing is another cause of poor scaling
   [Figure: obj1 to obj4 allocated together (new), each manipulated by its own thread (Thread1 to Thread4) on a different CPU]
   Scaling issue if the objects are manipulated very often
   and the threads can run across multiple chips

• The app is multithreaded, but serialization happens at the JVM,
  class library or JNI component level
• Most issues are exposed when pushing system
  utilization beyond 60% or past some throughput level
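A minimal sketch of the lock-contention point, assuming a single shared counter stands in for a hot lock: it compares a `synchronized` counter with `java.util.concurrent.atomic.LongAdder` (a JDK 8+ class, newer than the JVMs discussed here). `LongAdder` spreads updates across internal cells that the JDK pads against false sharing and only sums them when read. The `CounterContention` class and its `run` helper are hypothetical; absolute timings depend entirely on the JVM and the machine.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class CounterContention {

    // Every increment takes the same monitor, so all threads serialize on it.
    static final class LockedCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Starts `threads` workers that each call `task` `perThread` times; returns elapsed ns.
    static long run(int threads, int perThread, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    task.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int perThread = 1_000_000;

        LockedCounter locked = new LockedCounter();
        long lockedNs = run(threads, perThread, locked::increment);

        // LongAdder spreads contended updates over padded cells and sums them on read.
        LongAdder adder = new LongAdder();
        long adderNs = run(threads, perThread, adder::increment);

        System.out.printf("threads=%d synchronized: %d in %d ms, LongAdder: %d in %d ms%n",
                threads, locked.get(), lockedNs / 1_000_000, adder.sum(), adderNs / 1_000_000);
    }
}
```

Both counters reach the same total; the difference shows up in elapsed time as the thread count grows, which is the kind of gap the slide expects h/w and JVM profiling to expose.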
2: Lack of basic characterization
• Baseline measurement under light-load conditions
         – Throughput and response time are very critical as feedback
           to the tester, as well as for identifying any anomaly
[Figure: response time vs. CPU utilization % (50%, 100%), with an anomaly marked]

• Basic profile of the application:
         – Some surprises could be detected early
[Figure: two charts of throughput vs. # of chips (0 to 5)]
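A light-load baseline can be as simple as one caller issuing requests back to back and timing each one. The sketch below is a stand-in rather than a load-testing tool: `Baseline`, `measure` and the dummy request are hypothetical, and the percentile uses the nearest-rank method. It records throughput plus p50/p99 response time, so later runs at higher utilization have a reference point for spotting anomalies.

```java
import java.util.Arrays;

final class Baseline {

    // Nearest-rank percentile (pct in 0..100) of latency samples in nanoseconds.
    static long percentile(long[] samples, double pct) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(0, rank - 1)];
    }

    // Runs `operation` n times from a single thread (light load), prints
    // throughput and p50/p99 response time, and returns the raw latencies.
    static long[] measure(Runnable operation, int n) {
        long[] latencies = new long[n];
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            operation.run();
            latencies[i] = System.nanoTime() - t0;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("throughput=%.0f ops/s p50=%d ns p99=%d ns%n",
                n / seconds, percentile(latencies, 50), percentile(latencies, 99));
        return latencies;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one request to the application under test.
        Runnable request = () -> String.valueOf(Math.random()).hashCode();
        measure(request, 100_000); // warm-up pass so JIT compilation does not skew the baseline
        measure(request, 100_000); // baseline numbers to keep for later comparison
    }
}
```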

3: JVM selection and JVM parameters
• What if the end-user environment is unknown?
  – Some information can still be given to the user

• JVM selection:
  – The latest JVMs often provide the best performance on the latest h/w
  – Throughput computing: a ~10% impact is very common
       Latest versions of Oracle HotSpot JVMs for Xeon 5500/5600/7500 series
       S314665: A Journey to the Center of the Java Universe, Wed 1PM, Parc 55/Embarcadero

  – Response-time-sensitive apps:
       Standard JVMs vs. real-time JVMs

• JVM parameters:
  – Up to ~50% benefit (with some possibility of negative impact in niche cases)

       Locks, strings and heap/GC settings are common examples that help most applications
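Because results vary with the JVM and its parameters, it helps to record exactly which JVM and flags each measurement ran with. A minimal sketch using the standard `java.lang.management.RuntimeMXBean`; the `JvmReport` class itself is hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

final class JvmReport {

    // Describes the running JVM and the flags it was launched with, so results
    // from different environments can be compared like for like.
    static String describe() {
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        List<String> jvmArgs = rt.getInputArguments(); // e.g. -Xmx and -XX:... flags
        Runtime runtime = Runtime.getRuntime();
        return String.format("vm=%s %s (%s) java.version=%s cpus=%d maxHeapMB=%d args=%s",
                rt.getVmName(), rt.getVmVersion(), rt.getVmVendor(),
                System.getProperty("java.version"),
                runtime.availableProcessors(),
                runtime.maxMemory() / (1024 * 1024),
                jvmArgs);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line at startup also makes the default-heap inconsistencies across JVMs and OSes visible without extra tooling.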


4: Heap management and GC
• Desired goal for the heap:
  – Avoid memory swapping while still using a heap large enough
    to reduce GC frequency
     Total RAM > total (Java heap + non-heap memory)
     Old space > total (long-lived objects), to avoid old-space GC
  – What if the # of instances launched is unknown?

• GC choices:   [Figure: throughput vs. heap size]
  – Throughput computing
  – Response-time-sensitive apps
      – When deploying multiple instances, the # of GC threads impacts response time

  – 64-bit JVM: beware of compressed pointers/references
      – Sudden jump at the 4GB or 32GB heap boundaries
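The "Total RAM > Java heap + non-heap memory" rule extends naturally to several instances per box. The sketch below is hypothetical arithmetic, not a vendor formula: it assumes a fixed non-heap overhead per JVM (thread stacks, code cache, class metadata, native buffers) and a fixed reserve for the OS and other processes, then labels the rough HotSpot compressed-oops regime for the resulting heap. The 4GB and 32GB boundaries are approximate; the exact cutoff depends on the JVM version and object alignment settings.

```java
final class HeapBudget {

    // Largest heap (MB) per instance such that
    //   instances * (heap + nonHeapPerInstance) + osReserve <= totalRam.
    static long maxHeapPerInstanceMb(long totalRamMb, int instances,
                                     long nonHeapPerInstanceMb, long osReserveMb) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        long perInstance = (totalRamMb - osReserveMb) / instances;
        return Math.max(0, perInstance - nonHeapPerInstanceMb);
    }

    // Approximate HotSpot compressed-oops regime for a given heap size.
    static String oopsMode(long heapMb) {
        if (heapMb < 4L * 1024) {
            return "32-bit compressed oops (unscaled)";
        }
        if (heapMb < 32L * 1024) {
            return "zero-based compressed oops (shifted)";
        }
        return "uncompressed 64-bit oops (larger object footprint)";
    }

    public static void main(String[] args) {
        // Hypothetical box: 48 GB RAM, 4 instances, ~1 GB non-heap each, 4 GB reserved.
        long heapMb = maxHeapPerInstanceMb(48 * 1024, 4, 1024, 4 * 1024);
        System.out.printf("suggested -Xmx%dm per instance: %s%n", heapMb, oopsMode(heapMb));
    }
}
```

When the instance count is unknown, running the same calculation for the largest plausible count gives a conservative upper bound.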


5: Estimate and peak performance
• Model and anticipate demand
• Pay attention to demand spikes at specific times of day
• Stress test in the target environment
  – Don't assume linear performance
• Software and hardware configuration lead to non-linear
  behavior
  – Hyper-Threading
  – Resource caps
[Figure: throughput vs. CPU utilization % (50%, 100%); the HT gain is application dependent]
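One way to check for non-linear behavior is to express each stress-test result as a scaling efficiency relative to the smallest configuration: efficiency = (throughput_i / throughput_0) / (units_i / units_0), where 1.0 means perfectly linear. A minimal sketch; the `ScalingEfficiency` class and the sample numbers in `main` are hypothetical:

```java
final class ScalingEfficiency {

    // efficiency[i] = (throughput[i] / throughput[0]) / (units[i] / units[0]);
    // "units" can be cores, sockets, threads or instances.
    static double[] efficiency(int[] units, double[] throughput) {
        if (units.length != throughput.length || units.length == 0) {
            throw new IllegalArgumentException("mismatched or empty inputs");
        }
        double[] eff = new double[units.length];
        for (int i = 0; i < units.length; i++) {
            double speedup = throughput[i] / throughput[0];
            double ideal = (double) units[i] / units[0];
            eff[i] = speedup / ideal;
        }
        return eff;
    }

    public static void main(String[] args) {
        // Hypothetical measurements at 1, 2 and 4 cores.
        int[] cores = {1, 2, 4};
        double[] opsPerSec = {100, 180, 300};
        double[] eff = efficiency(cores, opsPerSec);
        for (int i = 0; i < cores.length; i++) {
            System.out.printf("%d cores: efficiency %.2f%n", cores[i], eff[i]);
        }
    }
}
```

A steadily falling efficiency is a signal to find the serializing resource before extrapolating to peak demand.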




6: Monitoring
• Low overhead, always on
• Helps with root-cause analysis
• Hardware-, OS-, JVM- and application-level
  monitoring
• Capture and log the important metrics periodically
     – CPU, processes, GC
     – Logical resource caps such as thread pools and connection
       pools
     – Errors, external resource utilization (DB, services)
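For the JVM-level part, a low-overhead, always-on sampler can be built on the standard platform MXBeans in `java.lang.management`. The `MetricsSampler` class and the 10-second interval are assumptions for illustration; application-level caps (thread pools, connection pools) and external resources need their own probes, and GC logging (for example `-verbose:gc`) complements this.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class MetricsSampler {

    // One log line of JVM-level metrics read from the standard platform MXBeans.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        // System load average over the last minute; -1 if the OS does not provide it.
        sb.append("loadAvg=").append(ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        sb.append(" threads=").append(ManagementFactory.getThreadMXBean().getThreadCount());
        sb.append(" heapUsedMB=")
          .append(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() / (1024 * 1024));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Cumulative since JVM start; diff successive lines for per-interval rates.
            sb.append(" gc[").append(gc.getName()).append("]=")
              .append(gc.getCollectionCount()).append(" collections/")
              .append(gc.getCollectionTime()).append("ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "metrics-sampler");
            t.setDaemon(true); // monitoring must never keep the JVM alive on its own
            return t;
        });
        // A coarse interval keeps the overhead low enough to leave sampling always on.
        scheduler.scheduleAtFixedRate(() -> System.out.println(snapshot()), 0, 10, TimeUnit.SECONDS);
        Thread.sleep(25_000); // stand-in for the application's real work
    }
}
```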




6: Monitoring and profiling: case studies
• Insufficient heap size
  – Default heap size is very inconsistent across JVMs/OSes
• Memory swapping from too large a heap
  – The JVM starts first, and non-Java memory/shared memory space is added on top
• Inconsistent default nursery/old space sizes
• Thread pool size auto-tuning for various
  deployments
• No monitoring, detection or notification to the user
• OS level:
  – Too many context switches, interrupts, exceptions
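For thread pool auto-tuning, one common heuristic (from Goetz et al., *Java Concurrency in Practice*, not from this talk) sizes a pool as N_threads = N_cpu × U_target × (1 + W/C), where U_target is the desired CPU utilization and W/C is the ratio of time a task waits to time it computes. A sketch, with the `PoolSizing` class and the workload numbers in `main` assumed for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSizing {

    // threads = cpus * targetUtilization * (1 + waitTime / computeTime), at least 1.
    static int poolSize(int cpus, double targetUtilization, double waitToComputeRatio) {
        if (targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("targetUtilization must be in (0, 1]");
        }
        return Math.max(1, (int) Math.round(cpus * targetUtilization * (1 + waitToComputeRatio)));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Hypothetical workload profile: tasks wait on I/O about as long as they compute.
        int size = poolSize(cpus, 0.8, 1.0);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.printf("cpus=%d poolSize=%d%n", cpus, size);
        pool.shutdown();
    }
}
```

Deriving the size from `availableProcessors()` at startup, rather than hard-coding it, lets the same build adapt when it is deployed on a box with a different core count.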




7: S/W + H/W configuration
• Inconsistent user experience
• Degradation from s/w and/or h/w changes
  and upgrades:

     – H/W features:
       – Turbo, CPU SKU, NUMA, memory population, # of cores
         increased but GHz decreased, power management
     – S/W features:
       – Deployment configurations: not our area of expertise
     – Disk and network I/O: did not keep up with the increased
       processing power




Summary
 1.    Architect the design to scale
 2.    Control the Java + JNI environment
 3.    Heap and GC type
 4.    JVM and parameter selection
 5.    Estimate and peak performance
 6.    Lightweight monitoring
 7.    H/W and S/W configuration
              Thank you!
Anil Kumar: anil.kumar@intel.com
Kumar Shiv: kumar.shiv@intel.com
Mahesh Somani: msomani@ebay.com
                http://software.intel.com/sites/oss/pdfs/322727-001US_Java_Perf_Xeon_wp.pdf

Backup




EMON and VTune: h/w counter profiling

• EMON (Intel internal tool)
     – More than 500 h/w counters can be profiled from a 30-minute run
     – Analysis helps in understanding:
       – How the application is stressing h/w resources
       – Where scaling issues may occur (prediction/estimation)
       – Deployment strategy for similar scenarios

• Intel VTune Performance Analyzer:
     – H/W counters causing a bottleneck can be profiled with Intel
       VTune Performance Analyzer to identify the responsible methods
     – Oh yes, after JITing and optimizations, method names and asm
       code match perfectly (just kidding)
     – Requires in-depth knowledge and some tricks to map asm code
       to Java source code (disable inlining, if possible)
     – http://software.intel.com/en-us/intel-vtune/

Non-Uniform Memory Access (NUMA)




[Figure: two Nehalem-EP processors and a Tylersburg-EP chipset]
      Intel® Core™ microarchitecture (Nehalem-EP)
      Intel® Next Generation Server Processor Technology (Tylersburg-EP)




Scaling over the older generation
• Most Java applications should get a significant boost
     – 50% or more gain for SPECjbb2005, SPECjvm2008 and
       SPECjAppServer2004 on Nehalem-EP over Core 2

• For some niche apps, Xeon 5400 > Xeon 5500
     – When the working set fits into the 2x6MB L2 of the Xeon 5400 series, and
     – Does not fit into the 4x256KB L2 + 8MB L3 of the Xeon 5500 series

[Figure: cache hierarchies. Xeon 5400 series (Core 2 based): cores 1 to 4, each with its own L1; each pair of cores shares a 6MB L2. Xeon 5500 series (Nehalem-EP based): cores 1 to 4, each with two HT threads (HT0, HT1), its own L1 and a 256KB L2; all four cores share an 8MB L3.]
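Whether a working set "fits" in a given cache level can be probed with a pointer-chasing microbenchmark: each load depends on the previous one and the order is random, so hardware prefetchers largely cannot hide memory latency, and ns/load rises as the array outgrows each cache. This is a generic technique, not the method used in the talk; `WorkingSetProbe` is hypothetical, and JIT and GC effects make its numbers indicative only.

```java
import java.util.Random;

final class WorkingSetProbe {

    // Sattolo's algorithm: a random permutation forming a single cycle, so chasing
    // next[] visits every slot exactly once before returning to the start.
    static int[] singleCycle(int n, long seed) {
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            next[i] = i;
        }
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i); // 0 <= j < i; excluding i is what yields one cycle
            int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }
        return next;
    }

    // Average nanoseconds per dependent load when chasing a cycle spanning `bytes` bytes.
    static double nsPerLoad(int bytes, int steps) {
        int[] next = singleCycle(bytes / Integer.BYTES, 42L);
        int p = 0;
        for (int i = 0; i < steps; i++) {
            p = next[p]; // warm-up pass: pulls the array into whatever cache it fits
        }
        long t0 = System.nanoTime();
        for (int i = 0; i < steps; i++) {
            p = next[p];
        }
        long elapsed = System.nanoTime() - t0;
        if (p == -1) {
            System.out.println(); // never true; keeps `p` live so the JIT cannot drop the loop
        }
        return (double) elapsed / steps;
    }

    public static void main(String[] args) {
        // Working sets straddling the 256KB L2 and the 6MB to 12MB cache sizes discussed above.
        int[] sizesKb = {128, 1024, 4096, 16384, 65536};
        for (int kb : sizesKb) {
            System.out.printf("%6d KB: %.2f ns/load%n", kb, nsPerLoad(kb * 1024, 20_000_000));
        }
    }
}
```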


Core scaling: performance evaluation within a socket

[Figure: Xeon 5600 series (Westmere-EP): cores 1 to 6, each with two HT threads (HT:0, HT:1), sharing a 12MB last level cache]

• Compare without HT threads
         Core 1   Core 2   Core 3   Core 4   Core 5   Core 6
         HT0 HT1  HT0 HT1  HT0 HT1  HT0 HT1  HT0 HT1  HT0 HT1
run 1    X
run 2    X        X
run 3    X        X        X
run 4    X        X        X        X
run 5    X        X        X        X        X
run 6    X        X        X        X        X        X

• Compare with HT threads
         Core 1   Core 2   Core 3   Core 4   Core 5   Core 6
         HT0 HT1  HT0 HT1  HT0 HT1  HT0 HT1  HT0 HT1  HT0 HT1
run 1    X   X
run 2    X   X    X   X
run 3    X   X    X   X    X   X
run 4    X   X    X   X    X   X    X   X
run 5    X   X    X   X    X   X    X   X    X   X
run 6    X   X    X   X    X   X    X   X    X   X    X   X

X: logical thread used



Socket scaling: Overall performance evaluation


• Core scaling ensures performance within a socket
• Socket scaling ensures overall performance
• Multiple JVM instances:
  [Figure: runs 1 to 4]

• Single JVM instance:
   – Good to have NUMA disabled for consistency
   – Stresses snooping bandwidth
  [Figure: runs 1 to 4]
Intel_Low Power Intelligent Solutions with Intel Atom Processor
 
IBM Solid State in eX5 servers
IBM Solid State in eX5 serversIBM Solid State in eX5 servers
IBM Solid State in eX5 servers
 
HPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialHPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorial
 
About Aberdeen
About AberdeenAbout Aberdeen
About Aberdeen
 
Nvidia Cuda Apps Jun27 11
Nvidia Cuda Apps Jun27 11Nvidia Cuda Apps Jun27 11
Nvidia Cuda Apps Jun27 11
 
AMD Opteron 6000 Series Platform Press Presentation
AMD Opteron 6000 Series Platform Press PresentationAMD Opteron 6000 Series Platform Press Presentation
AMD Opteron 6000 Series Platform Press Presentation
 
Fremtidens platform til koncernsystemer (IBM System z)
Fremtidens platform til koncernsystemer (IBM System z)Fremtidens platform til koncernsystemer (IBM System z)
Fremtidens platform til koncernsystemer (IBM System z)
 
Shmcfarl slb66-slb64-nat64-proxy
Shmcfarl slb66-slb64-nat64-proxyShmcfarl slb66-slb64-nat64-proxy
Shmcfarl slb66-slb64-nat64-proxy
 
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
 
Introducing JSR-283
Introducing JSR-283Introducing JSR-283
Introducing JSR-283
 
Server Administration in Python with Fabric, Cuisine and Watchdog
Server Administration in Python with Fabric, Cuisine and WatchdogServer Administration in Python with Fabric, Cuisine and Watchdog
Server Administration in Python with Fabric, Cuisine and Watchdog
 
Aberdeen presentation show 2010
Aberdeen presentation show 2010Aberdeen presentation show 2010
Aberdeen presentation show 2010
 
6dec2011 - DELL
6dec2011 - DELL6dec2011 - DELL
6dec2011 - DELL
 

Similaire à Seven deadly

OSS Presentation Keynote by Evan Powell
OSS Presentation Keynote by Evan PowellOSS Presentation Keynote by Evan Powell
OSS Presentation Keynote by Evan PowellOpenStorageSummit
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on CephCeph Community
 
Microsofts Configurable Cloud
Microsofts Configurable CloudMicrosofts Configurable Cloud
Microsofts Configurable CloudChris Genazzio
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Whiptail XLR8r SSD Array
Whiptail XLR8r SSD ArrayWhiptail XLR8r SSD Array
Whiptail XLR8r SSD ArrayDarren Williams
 
Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015
Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015
Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015odpeer
 
Ben Pashkoff - java embedded - 24mai2011
Ben Pashkoff - java embedded - 24mai2011Ben Pashkoff - java embedded - 24mai2011
Ben Pashkoff - java embedded - 24mai2011Agora Group
 
Engineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureEngineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureBob Rhubart
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...DevOps.com
 
Intel Roadmap 2010
Intel Roadmap 2010Intel Roadmap 2010
Intel Roadmap 2010Umair Mohsin
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...DevOps.com
 
AMD Opteron 6200 and 4200 Series Presentation
AMD Opteron 6200 and 4200 Series PresentationAMD Opteron 6200 and 4200 Series Presentation
AMD Opteron 6200 and 4200 Series PresentationAMD
 
Private cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicomPrivate cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicomMicrosoft Singapore
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Community
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Community
 
IPW2008 - my.opera.com scalability
IPW2008 - my.opera.com scalabilityIPW2008 - my.opera.com scalability
IPW2008 - my.opera.com scalabilityCosimo Streppone
 
Optimizing Performance of your Oracle Database using 8Gb Fibre Channel
Optimizing Performance of your Oracle Database using 8Gb Fibre ChannelOptimizing Performance of your Oracle Database using 8Gb Fibre Channel
Optimizing Performance of your Oracle Database using 8Gb Fibre ChannelEmulex Corporation
 
configurations type cloud VNX
configurations type cloud VNXconfigurations type cloud VNX
configurations type cloud VNXErwan Quigna
 

Similaire à Seven deadly (20)

OSS Presentation Keynote by Evan Powell
OSS Presentation Keynote by Evan PowellOSS Presentation Keynote by Evan Powell
OSS Presentation Keynote by Evan Powell
 
Big Data Smarter Networks
Big Data Smarter NetworksBig Data Smarter Networks
Big Data Smarter Networks
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
Microsofts Configurable Cloud
Microsofts Configurable CloudMicrosofts Configurable Cloud
Microsofts Configurable Cloud
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Whiptail XLR8r SSD Array
Whiptail XLR8r SSD ArrayWhiptail XLR8r SSD Array
Whiptail XLR8r SSD Array
 
Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015
Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015
Cassandra on Azure - "Tel-Aviv-Cassandra-Users" meetup 2015
 
Ben Pashkoff - java embedded - 24mai2011
Ben Pashkoff - java embedded - 24mai2011Ben Pashkoff - java embedded - 24mai2011
Ben Pashkoff - java embedded - 24mai2011
 
Sponge v2
Sponge v2Sponge v2
Sponge v2
 
Engineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureEngineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the Future
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
 
Intel Roadmap 2010
Intel Roadmap 2010Intel Roadmap 2010
Intel Roadmap 2010
 
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
Comparing Microsoft SQL Server 2019 Performance Across Various Kubernetes Pla...
 
AMD Opteron 6200 and 4200 Series Presentation
AMD Opteron 6200 and 4200 Series PresentationAMD Opteron 6200 and 4200 Series Presentation
AMD Opteron 6200 and 4200 Series Presentation
 
Private cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicomPrivate cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicom
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
 
IPW2008 - my.opera.com scalability
IPW2008 - my.opera.com scalabilityIPW2008 - my.opera.com scalability
IPW2008 - my.opera.com scalability
 
Optimizing Performance of your Oracle Database using 8Gb Fibre Channel
Optimizing Performance of your Oracle Database using 8Gb Fibre ChannelOptimizing Performance of your Oracle Database using 8Gb Fibre Channel
Optimizing Performance of your Oracle Database using 8Gb Fibre Channel
 
configurations type cloud VNX
configurations type cloud VNXconfigurations type cloud VNX
configurations type cloud VNX
 

Seven deadly

  • 1. 7 Deadly Sins of Enterprise Java Programming and Deployment in the Multicore Era
      Mahesh Somani: msomani@ebay.com
      Anil Kumar: anil.kumar@intel.com
      Kumar Shiv: kumar.shiv@intel.com
      JavaOne 2010
  • 2. Agenda
      • SALIGIA: the first letters of the seven deadly sins in Latin
        – Superbia, Avaritia, Luxuria, Invidia, Gula, Ira, Acedia
      Latin (meaning): implication for a geek
        – Superbia (pride): my code is a piece of perfection
        – Luxuria (extravagance): beefing up unnecessary areas
        – Gula (gluttony): too many features and object allocations
        – Acedia (neglect): neglecting scaling tests and corner cases
        – Avaritia (greed): too much cost cutting on critical resources
        – Invidia (envy): watching the competition gain market share
        – Ira (wrath): what follows from the Almighty! (management)
  • 3. Agenda
      • Performance progression in the multi-core era
      • A quick look at the latest s/w and h/w platforms
      • Discussion of seven common pitfalls
      • Summary
      • References
  • 4. Multi-core era: Progression of performance
      • Phenomenal performance gains from the combination of h/w and s/w
        [Chart: SPECjbb2005 K bops on a 2S platform, 2005 to 2010; plotted values range from 36 to 1,011]
      • H/W capabilities increased in line with Moore's Law
        – ~10x-15x gain from h/w alone
      • S/W changes unlocked the full potential of the h/w capabilities
        – ~Doubling of performance
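The two factors on this slide multiply. As a rough consistency check (assuming, since the chart's ordering was lost in extraction, that the smallest and largest plotted values, 36 and 1,011 K bops, are the 2005 and 2010 endpoints):

```latex
\[
G_{\text{total}} \approx G_{\text{h/w}} \times G_{\text{s/w}}
  \approx (10\text{ to }15) \times 2 = 20\text{ to }30,
\qquad
\frac{1011}{36} \approx 28.1
\]
```

Under that assumption, the plotted range falls inside the 20x to 30x band implied by the slide's h/w and s/w factors.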
  • 5. Multi-core era: Rapid increase in # of cores

      Year  Code name    Microarchitecture  Xeon series     Cores  HT (threads/core)  LLC
      2005  Irwindale    NetBurst           Xeon DP         1      2                  2MB L2
      2005  Paxville-DP  NetBurst           Dual-Core Xeon  2      2                  4MB L2
      2006  Dempsey-DP   NetBurst           5000            2      2                  4MB L2
      2006  Woodcrest    Core               5100            2      None               4MB L2
      2007  Wolfdale-DP  Core               5200            2      None               6MB L2
      2006  Clovertown   Core               5300            4      None               8MB L2
      2007  Harpertown   Penryn             5400            4      None               12MB L2
      2009  Nehalem-EP   Nehalem            5500            4      2                  8MB L3
      2010  Westmere-EP  Nehalem            5600            6      2                  12MB L3

      • In addition to more cores, many other advanced features help deliver an excellent user experience by default
  • 6. Multi-core era: Increase in # of cores to continue
      [Figure: Intel tick-tock roadmap across the 65nm, 45nm, 32nm and 22nm process generations; Intel® Core™, Nehalem and Sandy Bridge microarchitectures]
      • Intel® Xeon® 5600: Intel's first 32nm server processor, with 6 cores and 12 threads
      • Many more advanced features to enhance the user experience
  • 7. Multi-core era: More platform-level features
      [Timeline, Q1 to Q3 2010 (Jan to Oct): launch of the Xeon 5600 and 7500 series]
      • Xeon 7500 series: up to 8 cores per socket
        – 2010/11 platform
        – 4 sockets and greater
        – Nehalem-EX (up to 8C + HT)
        – New mission-critical RAS
        – New levels of scalability: glueless 8-socket servers; 16S to 32S with a scalable node controller
        – NUMA; DDR3, Intel® QPI, PCI Express* 2.0 technology
      • Xeon 5600 processor: up to 6 cores per socket
        – 2 sockets
        – Westmere (up to 6C + HT)
        – Intel® AES-NI, Intel TXT
        – NUMA; DDR3, Intel® QPI, PCI Express* 2.0 technology
  • 8. Intel's role in improving Java performance
      [Diagram: activities of the Intel Java team]
        – Out-of-box optimal settings of the h/w platform
        – Influencing future CPU design
        – Working with JVM vendors: Oracle and IBM
        – Working with h/w and s/w profiling tools
        – Working with ISVs on the application-level stack
      • Application characterization helps in making better deployment decisions as well as in utilizing a platform optimally
      • Many more advanced features to enhance the user experience
  • 9. S/W impact: Intel relationship with ISVs
      [Stack diagram: Java applications (widely used apps) on JVMs (3 major JVMs) on OS (all major operating systems)]
      • Very active role with OS vendors
      • Engaged with all three major JVM vendors
        – Sun HotSpot (now Oracle HotSpot)
        – Oracle JRockit
        – IBM J9
      • Interaction with widely used Java applications
      • Optimizing the complete s/w stack for the latest processors while ensuring excellent performance on the existing s/w stack
      • Close, active relationships with ISV partners such as eBay
  • 10. Application environments
      • Batch processing on stand-alone systems and/or clusters of systems
        – Computation-intensive, etc.
      • 3-tier application servers: client, Java app server, back-end DB, etc.
      • High-frequency trading and other latency-sensitive financial apps
      • Java + native mix
      • Virtualized environments
      • Cloud environments: very limited
  • 11. Factors impacting deployment configuration
      • Application configuration by itself can be very complex; application deployment adds:
        – JVM
        – OS
        – Other
        – BIOS settings: power management, Turbo, prefetching, HT (Hyper-Threading), # of cores
        – Processor: SKU (GHz, caches)
        – Memory: DIMM population (capacity, latency), DIMM type (speed), bandwidth
        – Disk I/O
        – Network
      • This is a simple logical mapping; the real interactions are much more complex and intertwined
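Only part of this mapping is visible from inside Java. As a minimal sketch (the class name `DeploymentInfo` is hypothetical), an application can log the JVM-visible slice of its deployment at startup using `Runtime`, `java.lang.management` and standard system properties. BIOS settings such as Turbo, prefetching or power management, DIMM population and the NUMA layout are not visible this way and have to come from OS or platform tools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: records the JVM-visible slice of a deployment configuration. */
final class DeploymentInfo {
    private DeploymentInfo() {}

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        StringBuilder sb = new StringBuilder();
        // Logical processors available to the JVM (HT threads count individually).
        sb.append("logicalCpus=").append(rt.availableProcessors()).append('\n');
        // Max heap the JVM will try to use: -Xmx if set, otherwise the JVM's own default.
        sb.append("maxHeapMB=").append(rt.maxMemory() / (1024 * 1024)).append('\n');
        sb.append("os=").append(os.getName()).append(' ').append(os.getVersion())
          .append(" (").append(os.getArch()).append(")\n");
        sb.append("jvm=").append(System.getProperty("java.vm.vendor")).append(' ')
          .append(System.getProperty("java.vm.name")).append(' ')
          .append(System.getProperty("java.vm.version")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Logging this once per start makes it possible to tell, after the fact, whether two runs with different results were really deployed the same way.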
  • 12. Performance methodology
      • Application experts' view: application-level monitoring (s/w)
      • Intel Java performance team's view: H/W performance monitoring counters (h/w)
      • Many performance and scaling issues are reflected in h/w resource utilization, which is tracked by performance monitoring counters
        – There are >400 performance monitoring counters
  • 13. Performance monitoring counter analysis
      • Severe cases are easy to spot
      • Moderate cases, or extracting the last 30% of performance:
        – It is more art than science!
        – 4-5 counters can be collected at a time, with a minimum granularity of 100 ms
        – Absolute values are not very useful
        – The solution involves relative values and correlating h/w resources with the problem (latency, load)
      • Histogram patterns help identify phases and anomalies
        [Chart: L2_LD_MISS (L2 cache load misses) per sample over time, y-axis 0 to 4,000,000, with a GC phase marked]
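The "relative values and correlation" approach can be sketched in plain Java. This is an illustrative assumption, not the authors' tooling: `CounterAnalysis` is a hypothetical helper that expresses counter samples against the run's median, flags intervals above a chosen multiple of it (for example, a GC phase showing up as an L2-miss spike), and computes a Pearson correlation between a counter series and, say, response time per interval. The sample values in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers for per-interval h/w counter samples (e.g., L2 load misses per 100 ms). */
final class CounterAnalysis {
    private CounterAnalysis() {}

    /** Median of the samples: the baseline that relative values are expressed against. */
    static double median(long[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] s = samples.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /** Indices of intervals above factor x median: candidate phases (e.g., GC) or anomalies. */
    static List<Integer> spikes(long[] samples, double factor) {
        double base = median(samples);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < samples.length; i++) {
            if (samples[i] > factor * base) out.add(i);
        }
        return out;
    }

    /** Pearson correlation of two equal-length series; NaN if either series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series");
        }
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Made-up samples for illustration only; not data from the slides.
        long[] l2Misses = {500_000, 520_000, 480_000, 3_500_000, 510_000, 495_000};
        System.out.println("spikes at intervals " + spikes(l2Misses, 2.0));
    }
}
```

With the made-up samples the median is 505,000, so only interval 3 exceeds twice the median. The choice of threshold is a judgment call that depends on how noisy the counter is.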
  • 14. Locating the source of a problem
      • Out-of-box collection from performance analyzers is only useful for simple applications
        – Inlining makes it very hard to track down the source of a problem
      • What are the next steps, then?
        – Disabling inlining when collecting profiles often works
        – When the method hotness profile is FLAT, analyzing the JITed methods' code annotated with h/w counters helps identify the issue

        H/W counters (normalized per sec):
        Method  Instruction        CLK  Inst. retired  Cache misses
        A       mov eax, [ebx]     110  52             55
                mov [var], ebx     82   43             5
        B       str DB hello, 0    15   56             12
                push <mem>         85   25             10
        C       add <reg>,<reg>    200  90             25
                add <reg>,<mem>    24   32             8

      • Close cooperation between the Intel Java team and the application performance team is crucial
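Using the slide's illustrative table as input, a short sketch shows two derived views that help when the profile is flat: clockticks per instruction retired (CPI = CLK / Inst. retired) for each instruction, and CLK summed per method. The class and field names are hypothetical; the numbers are exactly the table's.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: derive CPI and per-method CLK from the slide's per-instruction counter table. */
final class HotspotTable {
    /** One table row: owning method, instruction text, and its counter values. */
    static final class Row {
        final String method, insn;
        final long clk, instRetired, cacheMisses;

        Row(String method, String insn, long clk, long instRetired, long cacheMisses) {
            this.method = method;
            this.insn = insn;
            this.clk = clk;
            this.instRetired = instRetired;
            this.cacheMisses = cacheMisses;
        }

        /** Clockticks per instruction retired; high values point at stalls. */
        double cpi() { return (double) clk / instRetired; }
    }

    static final Row[] ROWS = {
        new Row("A", "mov eax, [ebx]", 110, 52, 55),
        new Row("A", "mov [var], ebx", 82, 43, 5),
        new Row("B", "str DB hello, 0", 15, 56, 12),
        new Row("B", "push <mem>", 85, 25, 10),
        new Row("C", "add <reg>,<reg>", 200, 90, 25),
        new Row("C", "add <reg>,<mem>", 24, 32, 8),
    };

    /** CLK summed per method, keeping table order. */
    static Map<String, Long> clkByMethod(Row[] rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Row r : rows) totals.merge(r.method, r.clk, Long::sum);
        return totals;
    }

    public static void main(String[] args) {
        for (Row r : ROWS) {
            System.out.printf("%s %-16s CPI=%.2f misses=%d%n", r.method, r.insn, r.cpi(), r.cacheMisses);
        }
        System.out.println("CLK per method: " + clkByMethod(ROWS));
    }
}
```

On these numbers, method C accumulates the most CLK (224), while `push <mem>` has the highest CPI (85 / 25 = 3.4), so the two views can point at different places worth a closer look.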
  • 15. Seven common pitfalls
      Issues arising with app architects and programmers, and during testing and deployment:
        1. Multi-threading, serialization, locks
        2. Lack of basic characterization
        3. JVM selection and JVM parameters
        4. Heap management and GC
        5. Estimate and peak performance
        6. Monitoring (GC logs, etc.)
        7. S/W + H/W configuration, including network, disk I/O and OS support
      • Because customer IP and data security are paramount, only generic examples are shared
  • 16. 1: Multithreading, serialization and locks
      • The first attempt at multithreading is often riddled with too many locks
        – As programmers, it is better to be safe than sorry
        – But once the application is running, h/w-level and JVM-level profiling can identify locks worth revisiting
        – False sharing is another cause of poor scaling
          [Diagram: obj1 to obj4 allocated together with new, each manipulated by its own thread (Thread1 to Thread4) on its own CPU; this becomes a scaling issue if the objects are manipulated very often and the threads can run across multiple chips]
      • The app may be multithreaded yet still serialized at the JVM level, in the class library, or in a JNI component
      • Most issues are exposed when pushing system utilization beyond 60% or past some throughput level
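A minimal sketch of the false-sharing pattern in the diagram, assuming 64-byte cache lines: each thread increments only its own counter, yet with adjacent slots the counters share a cache line, which then bounces between cores. Spacing the slots 16 longs (128 bytes) apart is one common manual padding approach. `FalseSharingDemo` and its parameters are hypothetical, and the measured gap depends heavily on the CPU, the JVM and whether the threads land on different chips.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Sketch: per-thread counters that are adjacent (false sharing) vs. padded apart. */
final class FalseSharingDemo {
    static final int THREADS = 4;

    /** Each thread t increments only slot t * stride, iters times; returns elapsed nanoseconds. */
    static long run(int stride, long iters, AtomicLongArray counters) throws InterruptedException {
        Thread[] workers = new Thread[THREADS];
        long start = System.nanoTime();
        for (int t = 0; t < THREADS; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iters; i++) {
                    // Atomic read-modify-write with volatile semantics, so the JIT
                    // cannot keep the counter in a register; every update hits memory.
                    counters.incrementAndGet(slot);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long iters = 10_000_000L;
        int padded = 16; // 16 longs = 128 bytes between slots (assumes 64-byte lines)
        long adjacentNs = run(1, iters, new AtomicLongArray(THREADS));
        long paddedNs = run(padded, iters, new AtomicLongArray(THREADS * padded));
        System.out.printf("adjacent: %d ms, padded: %d ms%n",
                adjacentNs / 1_000_000, paddedNs / 1_000_000);
    }
}
```

The padded layout is usually faster on multi-core parts, but treat a single run as a quick indication rather than a benchmark. The same idea applies to ordinary objects that are allocated together and then updated by different threads.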
  • 17. 2: Lack of basic characterization
      • Baseline measurement under light-load conditions
        – Throughput and response time are critical, both as feedback to the tester and to identify any anomaly
          [Chart: response time vs. CPU utilization % (50%, 100%), with an anomaly marked]
      • Basic profile of the application
        – Some surprises can be detected early
          [Charts: throughput vs. # of chips (0 to 5), two panels]
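Scaling surprises like these are easier to spot as scaling efficiency than as raw throughput. A minimal sketch, assuming throughput has been measured at 1, 2, 3, ... chips and that 80% efficiency is an acceptable floor; the class name, the threshold and the sample numbers are all hypothetical.

```java
/** Sketch: turn per-chip-count throughput into scaling efficiency and flag the first shortfall. */
final class ScalingCheck {
    private ScalingCheck() {}

    /** efficiency[i] = throughput[i] / ((i + 1) * throughput[0]); index 0 is the 1-chip run. */
    static double[] efficiency(double[] throughputByChips) {
        double base = throughputByChips[0];
        double[] eff = new double[throughputByChips.length];
        for (int i = 0; i < eff.length; i++) {
            eff[i] = throughputByChips[i] / ((i + 1) * base);
        }
        return eff;
    }

    /** First chip count whose efficiency falls below the threshold, or -1 if none does. */
    static int firstShortfall(double[] throughputByChips, double threshold) {
        double[] eff = efficiency(throughputByChips);
        for (int i = 0; i < eff.length; i++) {
            if (eff[i] < threshold) return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical measurements (arbitrary throughput units) at 1, 2, 3 and 4 chips.
        double[] measured = {20, 39, 52, 58};
        System.out.println("first chip count below 80% efficiency: " + firstShortfall(measured, 0.8));
    }
}
```

With the sample numbers, efficiency is 1.0, 0.975, about 0.87 and 0.725, so the 4-chip run is the first to fall below 80%. That is the kind of surprise worth catching while the load is still light.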
  • 18. 3: JVM selection and JVM parameters
      • What if the end-user environment is unknown?
        – Even so, some guidance can be given to the user
      • JVM selection:
        – The latest JVMs often provide the best performance on the latest h/w
        – Throughput computing: a ~10% impact is very common (the latest Oracle HotSpot JVM versions for the Xeon 5500/5600/7500 series; see S314665: A Journey to the Center of the Java Universe, Wed 1PM, Parc 55/Embarcadero)
        – Response-time-sensitive apps: standard JVMs vs. real-time JVMs
      • JVM parameters:
        – Up to ~50% benefit (with some possibility of negative impact in niche cases)
        – Locks, strings and heap/GC are common examples that help most applications
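Since the JVM and its parameters can shift results this much, it helps to record what a deployment actually runs with. A minimal sketch using the standard `RuntimeMXBean` and `GarbageCollectorMXBean` APIs (the class name `JvmReport` is hypothetical); it reports the JVM vendor, name and version, the options passed at launch, and the garbage collectors in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

/** Sketch: log which JVM and which parameters a deployment actually runs with. */
final class JvmReport {
    private JvmReport() {}

    static String report() {
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        StringBuilder sb = new StringBuilder();
        sb.append("vm=").append(rt.getVmVendor()).append(' ')
          .append(rt.getVmName()).append(' ').append(rt.getVmVersion()).append('\n');
        // Options passed to the JVM at launch (e.g., -Xmx, -XX:...); defaults the JVM
        // picks on its own are not listed here.
        List<String> jvmArgs = rt.getInputArguments();
        sb.append("args=").append(jvmArgs).append('\n');
        // Collectors actually in use, which depend on both flags and JVM defaults.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("gc=").append(gc.getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing this output between a lab run and a customer deployment is a cheap way to catch a different JVM version or a missing parameter.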
  • 19. 4: Heap management and GC
      • Desired goal for the heap:
        – Avoid memory swapping while using a heap large enough to reduce GC frequency
          Total RAM > total (Java heap + non-heap memory)
          Old space > total (long-lived objects), to avoid old-space GC
        – What if the # of instances launched is unknown?
      • GC choices:
        – Throughput computing [Chart: throughput vs. heap size]
        – Response-time-sensitive apps
        – When deploying multiple instances, the # of GC threads impacts response time
      • 64-bit JVM: beware of compressed pointers/references
        – Sudden jumps at the 4GB or 32GB heap boundaries
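The heap rules above can be turned into a pre-deployment check. This is a sketch under stated assumptions: `HeapPlan` is hypothetical, non-heap memory per instance and the OS reserve have to be estimated separately, and the 4GB/32GB thresholds approximate 64-bit HotSpot with compressed references enabled and the default 8-byte object alignment (exact cut-offs vary by JVM and version).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: heap-sizing checks for a host running several JVM instances. */
final class HeapPlan {
    static final long GB = 1024L * 1024 * 1024;

    static List<String> check(long ramBytes, long osReserveBytes, int instances,
                              long heapBytes, long nonHeapBytes,
                              long liveOldBytes, long oldGenBytes) {
        List<String> warnings = new ArrayList<>();
        // Total RAM > total (Java heap + non-heap) across all instances, or the host swaps.
        long needed = instances * (heapBytes + nonHeapBytes) + osReserveBytes;
        if (needed > ramBytes) {
            warnings.add("swap risk: need " + needed / GB + " GB, have " + ramBytes / GB + " GB");
        }
        // Old space should exceed the long-lived data set, or old-space GCs become frequent.
        if (oldGenBytes <= liveOldBytes) {
            warnings.add("old space too small for long-lived objects");
        }
        // Approximate compressed-reference boundaries (assumption: HotSpot, 8-byte alignment).
        if (heapBytes >= 32 * GB) {
            warnings.add("heap >= 32 GB: compressed references likely disabled; references double in size");
        } else if (heapBytes > 4 * GB) {
            warnings.add("heap > 4 GB: compressed references need decoding; compare against a smaller heap");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Hypothetical host: 48 GB RAM, 4 GB reserved for the OS,
        // 4 instances with 10 GB heap + 1 GB non-heap each.
        System.out.println(check(48 * GB, 4 * GB, 4, 10 * GB, 1 * GB, 5 * GB, 7 * GB));
    }
}
```

In the example, 4 x 11 GB + 4 GB exactly fits in 48 GB, so only the 4GB-boundary note is printed. Adding a fifth instance would trigger the swap warning.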
  • 20. 5: Estimate and peak performance
      • Model and anticipate demand
      • Pay attention to demand spikes at specific times of day
      • Stress test in the target environment
        – Don't assume linear performance
      • Software and hardware configuration lead to non-linear behavior
        – Hyper-Threading (HT) gain
        – Resource caps (application-dependent)
        [Chart: throughput vs. CPU utilization % (50%, 100%)]
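To see why linear extrapolation misleads, a minimal sketch compares two server counts. The first is derived by scaling a light-load measurement linearly. The second is derived from throughput measured in a stress test at the target utilization. `PeakEstimate` and all the numbers are hypothetical.

```java
/** Sketch: size for peak demand from measured throughput rather than a linear guess. */
final class PeakEstimate {
    private PeakEstimate() {}

    /** Linear extrapolation from one light-load point: the assumption this slide warns against. */
    static double linearGuess(double throughputAtUtil, double util, double targetUtil) {
        return throughputAtUtil * (targetUtil / util);
    }

    /** Servers needed so the forecast peak stays within the per-server throughput given. */
    static int serversNeeded(double peakDemand, double perServerThroughput) {
        return (int) Math.ceil(peakDemand / perServerThroughput);
    }

    public static void main(String[] args) {
        double lightLoad = 400;    // requests/s measured at 30% CPU (hypothetical)
        double measuredAt70 = 760; // requests/s measured at 70% CPU in the target environment (hypothetical)
        double peak = 9_000;       // forecast requests/s at the daily peak (hypothetical)
        double guess = linearGuess(lightLoad, 0.30, 0.70);
        System.out.println("servers (linear guess): " + serversNeeded(peak, guess));
        System.out.println("servers (measured):     " + serversNeeded(peak, measuredAt70));
    }
}
```

Here the linear guess predicts about 933 requests/s per server at 70% CPU and 10 servers, while the measured 760 requests/s calls for 12.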
  • 21. 6: Monitoring
      • Low overhead, always on
      • Helps with root-cause analysis
      • Hardware, OS, JVM and application-level monitoring
      • Capture and log the important metrics periodically
        – CPU, processes, GC
        – Logical resource caps such as thread pools and connection pools
        – Errors, external resource utilization (DB, services)
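A minimal sketch of low-overhead, always-on logging, using only the standard `java.lang.management` MXBeans. The class name `MetricsLogger` and the 10-second period are assumptions. Application-specific caps such as thread-pool and connection-pool usage would need hooks into the application's own pools and are not shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: periodic one-line snapshots of load, heap, threads and GC activity. */
final class MetricsLogger {
    private MetricsLogger() {}

    /** One log line: system load, heap use, live threads, cumulative GC count/time per collector. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        // Negative when the platform does not provide a load average.
        sb.append("load=").append(ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append(" heapUsedMB=").append(heap.getUsed() >> 20);
        sb.append(" threads=").append(ManagementFactory.getThreadMXBean().getThreadCount());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(" gc[").append(gc.getName()).append("]=")
              .append(gc.getCollectionCount()).append('/').append(gc.getCollectionTime()).append("ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // Every 10 s; a real service would write to its logging framework instead of stdout.
        ses.scheduleAtFixedRate(() -> System.out.println(snapshot()), 0, 10, TimeUnit.SECONDS);
        Thread.sleep(30_000);
        ses.shutdown();
    }
}
```

Because the GC count and time are cumulative, the interesting signal is the difference between consecutive lines.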
  • 22. 6: Monitoring and profiling: case studies
      • Insufficient heap size
        – Default heap size is very inconsistent across JVMs/OSes
      • Memory swapping from a too-large heap
        – JVM starting first; non-Java memory / shared memory space
      • Inconsistent default nursery/old space sizes
      • Thread pool size auto-tuning for various deployments
      • No monitoring, detection or notification to the user
      • OS level:
        – Too many context switches, interrupts, exceptions
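For thread pool sizing across deployments, one widely cited heuristic (from *Java Concurrency in Practice*) is threads = CPUs × target utilization × (1 + wait time / compute time). A minimal sketch follows. The class name `PoolSizing` is hypothetical. The wait/compute ratio is an input that has to come from profiling the application.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: derive a pool size from the deployment instead of hard-coding it. */
final class PoolSizing {
    private PoolSizing() {}

    /** threads = cpus * targetUtilization * (1 + waitToComputeRatio), at least 1. */
    static int poolSize(int cpus, double targetUtilization, double waitToComputeRatio) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return Math.max(1, (int) Math.round(cpus * targetUtilization * (1 + waitToComputeRatio)));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Hypothetical profile: tasks wait about as long as they compute; aim for ~60% CPU.
        int size = poolSize(cpus, 0.6, 1.0);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("cpus=" + cpus + " poolSize=" + size);
        pool.shutdown();
    }
}
```

Deriving the size at startup means the same build adapts when it is deployed on a machine with more cores. The ratio still has to be re-measured if the workload changes.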
  • 23. 7: S/W + H/W configuration
      • Inconsistent user experience
      • Degradation from s/w changes and/or h/w upgrades:
        – H/W features: Turbo, CPU SKU, NUMA, memory population, # of cores increased but GHz decreased, power management
        – S/W features
        – Deployment configurations: not our area of expertise
        – Disk and network I/O: did not keep up with the increased processing power
  • 24. Summary
        1. Architect the design to scale
        2. Control the Java + JNI environment
        3. Heap and GC type
        4. JVM and parameter selection
        5. Estimate and peak performance
        6. Lightweight monitoring
        7. H/W and S/W configuration
      Thank you!
      Anil Kumar: anil.kumar@intel.com
      Kumar Shiv: kumar.shiv@intel.com
      Mahesh Somani: msomani@ebay.com
      http://software.intel.com/sites/oss/pdfs/322727-001US_Java_Perf_Xeon_wp.pdf
  • 25. Backup
  • 26. EMON and VTune: H/W counter profiling
      • EMON (Intel internal tool)
        – >500 h/w counters can be profiled from a 30-minute run
        – Analysis helps in understanding how the application stresses h/w resources
        – Helps in predicting/estimating where scaling issues may occur
        – Can help with deployment strategy for similar scenarios
      • Intel VTune Performance Analyzer:
        – H/W counters causing a bottleneck can be profiled with Intel VTune Performance Analyzer to identify the responsible methods
        – Oh yes, after JITing and optimizations, method names and asm code match perfectly (just kidding)
        – Mapping asm code back to Java source code requires in-depth knowledge and some tricks (disable inlining, if possible)
        – http://software.intel.com/en-us/intel-vtune/
  • 27. Non-Uniform Memory Access (NUMA)
      [Diagram: two Nehalem-EP processors and a Tylersburg-EP chipset]
      Intel® Core™ microarchitecture (Nehalem-EP)
      Intel® Next Generation Server Processor Technology (Tylersburg-EP)
  • 28. Scaling over the older generation
      • Most Java applications should get a significant boost
        – 50% or more gain for SPECjbb2005, SPECjvm2008 and SPECjAppServer2004 on Nehalem-EP over Core 2
      • For some niche apps, Xeon 5400 > Xeon 5500
        – When the working set fits into the 2x6MB L2 of the Xeon 5400 series, and
        – Does not fit into the 4x256KB L2 + 8MB L3 of the Xeon 5500 series
      [Diagram: Xeon 5500 series (Nehalem-EP based): 4 cores, each with HT0/HT1, its own L1 and 256KB L2, sharing an 8MB L3. Xeon 5400 series (Core 2 based): 4 cores, each with its own L1, and two 6MB L2 caches, each shared by a pair of cores]
  • 30. Core scaling: Performance evaluation within a socket
      Xeon 5600 series (Westmere-EP): 6 cores (Core 1 to Core 6), each with logical threads HT0 and HT1, and a 12MB shared last-level cache
      • Compare without HT threads: run n uses HT0 on cores 1 to n (n logical threads; runs 1 to 6)
      • Compare with HT threads: run n uses both HT0 and HT1 on cores 1 to n (2n logical threads; runs 1 to 6)
  • 31. Socket scaling: Overall performance evaluation
      • Core scaling ensures performance within a socket
      • Socket scaling ensures overall performance
      • Multiple JVM instances: Run 1 to Run 4
      • Single JVM instance: Run 1 to Run 4
        – Good to have NUMA disabled for consistency
        – Stresses snooping bandwidth