Kris Mok, Software Engineer, Taobao
@rednaxelafx
莫枢 /“撒迦”
JVM @ Taobao
Agenda
• Customization
• Tuning
• Open Source
• Training
INTRODUCTION
Java Strengths
•   Good abstraction
•   Good performance
•   Good tooling (IDE, profiler, etc.)
•   Easy to recruit good programmers
Java Weaknesses
• Tension between “abstraction leak” and
  performance
  – Abstraction and performance don’t always
    come together
• More control/info over GC and object
  overhead wanted sometimes
Our Team
• Domain-Specific Computing Team
  – performance- and efficiency-oriented
  – specific solutions to specific problems
  – do the low-level plumbing for specific
    applications targeting specific hardware
  – we’re hiring!
    • software and hardware hackers
Our Team (cont.)
• Current Focus
  – JVM-level customization/tuning
    • long term project
    • based on HotSpot Express 20 from OpenJDK
    • serving:
       – 10,000+ JVM instances serving online
       – 1,000+ Java developers
  – Hadoop tuning
  – Dedicated accelerator card adoption
JVM CUSTOMIZATION
@ TAOBAO
Themes
• Performance
• Monitoring/Diagnostics
• Stability
Tradeoffs
• Would like to make as little impact on
  existing Java application code as possible
• But if the performance/efficiency gains are
  significant enough, we’re willing to make
  extensions to the VM/core libs
JVM Customizations
•   GC Invisible Heap (GCIH)
•   JNI Wrapper improvement
•   New instructions
•   PrintGCReason / CMS bug fix
•   ArrayAllocationWarningSize
•   Change VM argument defaults
•   etc.
Case 1: in-memory cache
• Certain data is computed offline and then
  fed to online systems in a read-only,
  “cache” fashion
in-memory cache
• Fastest way to access them is to
  – put them in-process, in-memory,
  – access as normal Java objects,
  – no serialization/JNI involved per access
in-memory cache
• Large, static, long-lived data in the GC heap
  – may lead to long GC pauses at full GC,
  – or long overall concurrent GC cycle
• What if we take them out of the GC heap?
  – but without having to serialize them?
GC Invisible Heap
• “GC Invisible Heap” (GCIH)
  – an extension to HotSpot VM
  – an in-process, in-memory heap space
  – not managed by the GC
  – stores normal Java objects
• Currently works with ParNew+CMS
GCIH interface
• “moveIn(Object root)”
  – given the root of an object graph, move the
    whole graph out of GC heap and into GCIH
• “moveOut()”
  – GCIH space reset to a clean state
  – abandon all data in current GCIH space
  – (earlier version) move the object graph back
    into GC heap
GCIH interface (cont.)
• Current restrictions
  – data in GCIH should be read-only
  – objects in GCIH may not be used as monitors
  – no outgoing references allowed
• Restrictions may be relaxed in the future
GCIH interface (cont.)
• To update data
  – moveOut – (update) - moveIn
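The update cycle above can be sketched in plain Java. This is only a stand-in: the class `GcihSketch` and its `refresh` and `current` helpers are hypothetical, the data is typed as a `Map` instead of the slide's `Object root`, and the objects stay on the ordinary GC heap, whereas the real GCIH extension moves the graph out of it. The sketch only shows the calling pattern of moveOut, (update), moveIn with read-only data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the GCIH interface; NOT the real Taobao API. */
final class GcihSketch {
    // The object graph currently "moved in". In this mock it is still a normal heap object.
    private static Map<String, String> root;

    /** Mirrors moveIn(root): publish a fully built object graph. */
    static synchronized void moveIn(Map<String, String> newRoot) {
        if (root != null) {
            throw new IllegalStateException("call moveOut() before moving in new data");
        }
        root = newRoot;
    }

    /** Mirrors moveOut(): abandon all data in the current space. */
    static synchronized void moveOut() {
        root = null;
    }

    static synchronized Map<String, String> current() {
        return root;
    }

    /** The update cycle: moveOut, (update), moveIn. */
    static synchronized void refresh(Map<String, String> freshData) {
        moveOut();
        // Copy and wrap so the published data is read-only, matching the stated restriction.
        moveIn(Collections.unmodifiableMap(new HashMap<>(freshData)));
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("item:1", "price=10");
        refresh(data);
        System.out.println(current().get("item:1"));

        data.put("item:1", "price=12");   // changes become visible only after the next refresh
        refresh(data);
        System.out.println(current().get("item:1"));
    }
}
```

Readers always see a complete snapshot in this sketch; whether the real GCIH offers the same guarantee during an update is not stated in the slides.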
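Heap layout: original
[Figure: the GC managed heap consists of Perm (-XX:PermSize, -XX:MaxPermSize), Young (-Xmn) and Old, with -Xms/-Xmx covering Young and Old; the cache data sits inside Old.]
Heap layout: using GCIH
[Figure: Perm, Young and Old still form the GC managed heap with the same flags; the cache data sits in a separate GCIH region sized by -XX:GCIHSize.]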
Actual performance
• Reduces stop-the-world full GC pause
  time
• Reduces concurrent-mark and concurrent-
  sweep time
  – but the two stop-the-world phases of CMS
    aren’t necessarily significantly faster
Total time of CMS GC phases
time (sec)
phase              Original   w/GCIH
initial-mark         0.0072   0.0043
concurrent-mark      1.7943   0.5400
preclean             0.0373   0.0159
remark               0.0118   0.0035
concurrent-sweep     1.5717   0.6266
reset                0.0263   0.0240
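To make the figures easier to compare, the following sketch recomputes totals and per-phase reductions from the six measurements on this slide. The class and method names are made up for illustration; only the numbers come from the slide.

```java
/** Recomputes totals and reductions from the CMS phase times shown on the slide. */
final class CmsPhaseTimes {
    static final String[] PHASES = {
        "initial-mark", "concurrent-mark", "preclean", "remark", "concurrent-sweep", "reset"
    };
    static final double[] ORIGINAL  = {0.0072, 1.7943, 0.0373, 0.0118, 1.5717, 0.0263};
    static final double[] WITH_GCIH = {0.0043, 0.5400, 0.0159, 0.0035, 0.6266, 0.0240};

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Percentage by which 'after' is smaller than 'before'. */
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PHASES.length; i++) {
            System.out.printf("%-17s %.4f -> %.4f (%.1f%% less)%n",
                PHASES[i], ORIGINAL[i], WITH_GCIH[i],
                reductionPercent(ORIGINAL[i], WITH_GCIH[i]));
        }
        System.out.printf("total             %.4f -> %.4f%n", sum(ORIGINAL), sum(WITH_GCIH));
    }
}
```

The two concurrent phases account for almost all of the saving (roughly 70% less for concurrent-mark and 60% less for concurrent-sweep), while initial-mark and remark were already only a few milliseconds.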
Alternatives
                                  GCIH                  BigMemory
runs on standard JVM              × (JVM extension)     √
in-process, in-memory             √                     √
not under GC control              √                     √
direct access of Java objects     √                     × (serialize/deserialize)
no JNI overhead on access         √                     ×
object graph in better locality   √                     × N/A
GCIH future
• still in early stage of development now
• may try to make the API surface more like
  RTSJ
Experimental: object data sharing
• Sharing of GCIH between JVMs on the
  same box
• Real-world application:
  – A special kind of Map/Reduce job uses a big
    piece of precomputed cache data
  – Multiple homogeneous jobs run on the same
    machine, using the same cache data
  – can save memory to run more jobs on a
    machine, when CPU isn’t the bottleneck
Before sharing
[Figure: JVM1, JVM2, JVM3, …, JVMn each hold their own copy of the sharable objects next to their own other objects.]
After sharing
[Figure: JVM1, JVM2, JVM3, …, JVMn use one shared copy of the sharable objects; each JVM still holds its own other objects.]
Case 2: JNI overhead
• JNI carries a lot of overhead at invocation
  boundaries
• JNI invocations involve calling JNI native
  wrappers in the VM
JNI wrapper
• Wrappers are in hand-written assembler
• But not necessarily always well-tuned
• Look for opportunities to optimize for
  common cases
Wrapper example
...
0x00002aaaab19be92:   cmpl     $0x0,0x30(%r15) // check the suspend flag
0x00002aaaab19be9a:   je       0x2aaaab19bec6
0x00002aaaab19bea0:   mov      %rax,-0x8(%rbp)
0x00002aaaab19bea4:   mov      %r15,%rdi
0x00002aaaab19bea7:   mov      %rsp,%r12
0x00002aaaab19beaa:   sub      $0x0,%rsp
0x00002aaaab19beae:   and      $0xfffffffffffffff0,%rsp
0x00002aaaab19beb2:   mov      $0x2b7d73bcbda0,%r10
0x00002aaaab19bebc:   rex.WB   callq *%r10
0x00002aaaab19bebf:   mov      %r12,%rsp
0x00002aaaab19bec2:   mov      -0x8(%rbp),%rax
0x00002aaaab19bec6:   movl     $0x8,0x238(%r15) // change thread state to thread in java
... //continue
Wrapper example (cont.)
• The common case
  – Threads are unlikely to be suspended
    when running through this wrapper
• Optimize for the common case
  – move the logic that handles suspended state
    out-of-line
Modified wrapper example
...
0x00002aaaab19be3a:   cmpl     $0x0,0x30(%r15) // check the suspend flag
0x00002aaaab19be42:   jne      0x2aaaab19bf52
0x00002aaaab19be48:   movl     $0x8,0x238(%r15) // change thread state to thread in java

... //continue

0x00002aaaab19bf52:   mov      %rax,-0x8(%rbp)
0x00002aaaab19bf56:   mov      %r15,%rdi
0x00002aaaab19bf59:   mov      %rsp,%r12
0x00002aaaab19bf5c:   sub      $0x0,%rsp
0x00002aaaab19bf60:   and      $0xfffffffffffffff0,%rsp
0x00002aaaab19bf64:   mov      $0x2ae3772aae70,%r10
0x00002aaaab19bf6e:   rex.WB   callq *%r10
0x00002aaaab19bf71:   mov      %r12,%rsp
0x00002aaaab19bf74:   mov      -0x8(%rbp),%rax
0x00002aaaab19bf78:   jmpq     0x2aaaab19be48
...
Performance
• 5%-10% improvement of raw JNI
  invocation performance on various
  microarchitectures
Case 3: new instructions
• SSE 4.2 brings new instructions
  – e.g. CRC32c
• We’re using Westmere now
• Should take advantage of SSE 4.2
CRC32 / CRC32C
• CRC32
 – well known, commonly used checksum
 – used in HDFS
 – JDK’s impl uses zlib, through JNI
• CRC32c
 – a variant of CRC32
 – hardware support by SSE 4.2
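To make the difference between the two checksums concrete, here is a small sketch. It computes CRC32 with the JDK's `java.util.zip.CRC32` and CRC32C with a slow, bit-at-a-time software loop over the Castagnoli polynomial. The software loop is only an illustration of what the SSE 4.2 instruction computes; it is not the intrinsic described in these slides, and the class name `Crc32cSketch` is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Compares CRC32 (JDK class) with a bitwise software CRC32C. */
final class Crc32cSketch {
    // Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41.
    private static final int CASTAGNOLI_REFLECTED = 0x82F63B78;

    /** Bit-at-a-time CRC32C: slow, but short enough to verify by hand. */
    static int crc32c(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                // Shift right; fold in the polynomial when the bit shifted out was 1.
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ CASTAGNOLI_REFLECTED : crc >>> 1;
            }
        }
        return ~crc;
    }

    /** Plain CRC32 via the JDK's java.util.zip.CRC32. */
    static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional check input for CRC algorithms.
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08X%n", crc32(check));   // CBF43926
        System.out.printf("CRC32C = %08X%n", crc32c(check));  // E3069283
    }
}
```

The two algorithms differ only in the polynomial, so their results are not interchangeable: data checksummed with one cannot be verified with the other. Since Java 9 the JDK also ships `java.util.zip.CRC32C`, which postdates this talk.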
Intrinsify CRC32c
• Add new intrinsic methods to directly
  support CRC32c instruction in HotSpot VM
• Hardware accelerated
• To be used in modified HDFS
• Completely avoids JNI overhead
  – HADOOP-7446 still carries JNI overhead




Other intrinsics
• May intrinsify other operations in the future
  – AES-NI
  – others on applications’ demand
Case 4: frequent CMS GC
• An app experienced back-to-back CMS
  GC cycles after running for a few days
• The Java heaps were far from full
• What’s going on?
The GC Log
2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
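The claim that the heap was far from full can be read straight off the CMS-initial-mark line. The sketch below pulls out the four sizes with a regular expression and prints the occupancies. The class name and the regular expression are illustrative and only cover this one line shape, not the GC log format in general.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts old-generation and whole-heap occupancy from a CMS-initial-mark log line. */
final class InitialMarkOccupancy {
    // Matches e.g. "CMS-initial-mark: 0K(2097152K)] 998393K(4019584K)".
    private static final Pattern INITIAL_MARK =
        Pattern.compile("CMS-initial-mark: (\\d+)K\\((\\d+)K\\)\\] (\\d+)K\\((\\d+)K\\)");

    /** Returns {oldUsedK, oldCapacityK, heapUsedK, heapCapacityK}, or null if the line does not match. */
    static long[] parse(String line) {
        Matcher m = INITIAL_MARK.matcher(line);
        if (!m.find()) {
            return null;
        }
        long[] sizes = new long[4];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = Long.parseLong(m.group(i + 1));
        }
        return sizes;
    }

    public static void main(String[] args) {
        String line = "2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: "
            + "0K(2097152K)] 998393K(4019584K), 0.4745760 secs]";
        long[] s = parse(line);
        System.out.printf("old gen: %dK of %dK (%.1f%%)%n", s[0], s[1], 100.0 * s[0] / s[1]);
        System.out.printf("heap:    %dK of %dK (%.1f%%)%n", s[2], s[3], 100.0 * s[2] / s[3]);
    }
}
```

The old generation is empty and the whole heap is about a quarter full when the CMS cycle starts, so heap occupancy cannot be what triggered it.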
GC log visualized
[Figure: chart of the GC log above]
The tool used here is GCHisto from Tony Printezis
Need more info
• -XX:+PrintGCReason to the rescue
  – added this new flag to the VM
  – print the direct cause of a GC cycle
The GC Log
2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
 CMS Perm: collect because of occupancy 0.920845 / 0.920000
CMS perm gen initiated
2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
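The extra line printed by -XX:+PrintGCReason can be checked mechanically. The sketch below parses the two numbers and compares them. It assumes the first number is the measured PermGen occupancy and the second is the threshold that triggers a collection; the slides do not spell this out, and the class name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses the "collect because of occupancy a / b" line added by PrintGCReason. */
final class GcReasonLine {
    private static final Pattern OCCUPANCY =
        Pattern.compile("collect because of occupancy ([0-9.]+) / ([0-9.]+)");

    /** Returns {first, second} as printed in the line, or null if the line does not match. */
    static double[] parse(String line) {
        Matcher m = OCCUPANCY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        double[] v = parse(" CMS Perm: collect because of occupancy 0.920845 / 0.920000");
        // Assumption: v[0] is the measured occupancy, v[1] the trigger threshold.
        System.out.printf("occupancy=%.6f threshold=%.6f exceeded=%b%n", v[0], v[1], v[0] >= v[1]);
    }
}
```

Under that reading PermGen sits just above its 92% trigger, which points at PermGen rather than the Java heap as the cause of the CMS cycle.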
• Relevant VM arguments
  – -XX:PermSize=96m -XX:MaxPermSize=256m
• The problem was caused by bad
  interaction between CMS GC triggering
  and PermGen expansion
  – Thanks, Ramki!
• The (partial) fix
// Support for concurrent collection policy decisions.
bool CompactibleFreeListSpace::should_concurrent_collect() const {
  // In the future we might want to add in fragmentation stats --
  // including erosion of the "mountain" into this decision as well.
  // Removed: return !adaptive_freelists() && linearAllocationWouldFail();
  return false;
}
After the change
Case 5: huge objects
• An app bug allocated a huge
  object, causing unexpected OOM
• Where did it come from?
huge objects and arrays
• Most Java objects are small
• Huge objects usually happen to be arrays
• A lot of collection objects use arrays as
  backing storage
  – ArrayLists, HashMaps, etc.
• Tracking huge array allocation can help
  locate huge allocation problems
product(intx, ArrayAllocationWarningSize, 512*M,
        "array allocation with size larger than "
        "this (bytes) will be given a warning "
        "into the GC log")
Demo
import java.util.ArrayList;

public class Demo {
  private static void foo() {
    new ArrayList<Object>(128 * 1024 * 1024);
  }

  public static void main(String[] args) {
    foo();
  }
}
Demo

$ java Demo
==WARNNING== allocating large array: thread_id[0x0000000059374800], thread_name[main], array_size[536870928 bytes], array_length[134217728 elememts]
        at java.util.ArrayList.<init>(ArrayList.java:112)
        at Demo.foo(Demo.java:5)
        at Demo.main(Demo.java:9)
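The reported size can be reproduced with a little arithmetic. The sketch below assumes 4 bytes per array element (references with compressed oops) and a 16-byte array header; both values are inferred from the numbers in the warning, not stated in the slides, and the class name is made up.

```java
/** Reproduces the array_size value from the warning under stated layout assumptions. */
final class ArraySizeEstimate {
    static final long M = 1024L * 1024L;
    // Default of ArrayAllocationWarningSize as shown in the flag definition.
    static final long WARNING_THRESHOLD_BYTES = 512 * M;

    /** headerBytes and bytesPerElement are assumptions about the VM's object layout. */
    static long arrayBytes(long length, int bytesPerElement, int headerBytes) {
        return headerBytes + length * bytesPerElement;
    }

    public static void main(String[] args) {
        long length = 128L * 1024 * 1024;            // capacity passed to new ArrayList<Object>(...)
        long bytes = arrayBytes(length, 4, 16);      // assumed: 4-byte references, 16-byte header
        System.out.println(bytes);                   // 536870928
        System.out.println(bytes > WARNING_THRESHOLD_BYTES);   // true
    }
}
```

The backing array is 16 bytes over the 512 MB threshold, which is why this particular allocation is reported.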
Case 6: bad optimizations?
• Some loop optimization bugs were found
  before launch of Oracle JDK 7
• Actually, they exist in recent JDK 6, too
  – some of the fixes weren’t in until JDK6u29
  – can’t wait until an official update with the fixes
  – roll our own workaround
Workarounds
• Explicitly set -XX:-UseLoopPredicate
  when using recent JDK 6
• Or …
Workarounds (cont.)
• Change the defaults of the opt flags to turn
  them off

product(bool, UseLoopPredicate, false, /* default was true */
  "Generate a predicate to select fast/slow loop versions")
A Case Study

JVM TUNING
@ TAOBAO
JVM Tuning
• Most JVM tuning efforts are spent on
  memory related issues
  – we do too
  – lots of reading material available
• Let’s look at something else
  – use JVM internal knowledge to guide tuning
Case: Velocity template compilation
• An internal project seeks to compile
  Velocity templates into Java bytecodes
Compilation process
• Parse *.vm source into AST
  – reuse original parser and AST from Velocity
• Traverse the AST and generate Java
  source code as target
  – works like macro expansion
• Use Java Compiler API to generate
  bytecodes
Example
Velocity template source

Check $dev.Name out!

generated Java source

_writer.write("Check ");
_writer.write(
  _context.get(_context.get("dev"),
  "Name", Integer.valueOf(26795951)));
_writer.write(" out!");
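The macro-expansion idea can be illustrated with a toy translator. This is a hypothetical sketch, not the internal project's code: it uses a regular expression instead of Velocity's parser and AST, it only understands references of the form `$name.Property`, and it leaves out the third argument seen in the generated source above because the slides do not explain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy template-to-Java translator; handles only literal text and $name.Property references. */
final class TemplateToJavaSketch {
    // Real Velocity syntax is far richer; this pattern is a deliberate simplification.
    private static final Pattern REFERENCE =
        Pattern.compile("\\$([A-Za-z_]\\w*)\\.([A-Za-z_]\\w*)");

    /** Expands a template into one generated Java statement per literal or reference. */
    static List<String> translate(String template) {
        List<String> statements = new ArrayList<>();
        Matcher m = REFERENCE.matcher(template);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                statements.add(writeLiteral(template.substring(last, m.start())));
            }
            statements.add("_writer.write(_context.get(_context.get(\""
                + m.group(1) + "\"), \"" + m.group(2) + "\"));");
            last = m.end();
        }
        if (last < template.length()) {
            statements.add(writeLiteral(template.substring(last)));
        }
        return statements;
    }

    /** Emits a write call for literal text, escaping it as a Java string literal. */
    private static String writeLiteral(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        return "_writer.write(\"" + escaped + "\");";
    }

    public static void main(String[] args) {
        for (String statement : translate("Check $dev.Name out!")) {
            System.out.println(statement);
        }
    }
}
```

Because every template element becomes at least one statement, a long template turns into one very long generated method, which is what the following slides run into.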
Performance: interpreted vs. compiled
[Figure: execution time (ms/10K times, axis 0 to 4500) against template complexity (1 to 16), with one series for compiled and one for interpreted templates.]
Problem
• In the compiled version
  – 1 “complexity” ≈ 800 bytes of bytecode
  – So 11 “complexities” > 8000 bytes of bytecode
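The estimate above is simple enough to check in code. The sketch takes the two figures from the slides, about 800 bytes of bytecode per unit of complexity and a limit of 8000 bytes, and reports which complexities would cross the limit. The class and method names are made up, and the 800-byte figure is only approximate, so the result near the boundary is a rough guide.

```java
/** Estimates which template complexities exceed the 8000-byte method size limit. */
final class HugeMethodEstimate {
    static final int HUGE_METHOD_LIMIT = 8000;      // bytes of bytecode, from the slides
    static final int BYTES_PER_COMPLEXITY = 800;    // approximate figure from the slides

    static int estimatedBytecodeSize(int complexity) {
        return complexity * BYTES_PER_COMPLEXITY;
    }

    /** True when the estimated method size is strictly larger than the limit. */
    static boolean likelyTooLargeForJit(int complexity) {
        return estimatedBytecodeSize(complexity) > HUGE_METHOD_LIMIT;
    }

    public static void main(String[] args) {
        for (int complexity = 9; complexity <= 12; complexity++) {
            System.out.printf("complexity %d: ~%d bytes, too large: %b%n",
                complexity, estimatedBytecodeSize(complexity), likelyTooLargeForJit(complexity));
        }
    }
}
```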
Compiled templates larger than “11” are not JIT’d!
develop(intx, HugeMethodLimit, 8000,
  "don't compile methods larger than "
  "this if +DontCompileHugeMethods")
product(bool, DontCompileHugeMethods, true,
  "don't compile methods > HugeMethodLimit")


-XX:-DontCompileHugeMethods
[Figure: the same chart measured with -XX:-DontCompileHugeMethods: execution time (ms/10K times, axis 0 to 4500) against template complexity (1 to 16), with one series for compiled and one for interpreted templates.]
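When a fix depends on a VM flag, it helps to confirm from inside the process that the flag really has the intended value. A minimal sketch, assuming a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class `VmFlagCheck` and its helper are made up, and a flag that the running VM does not know is reported as null.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Reads the current value of a HotSpot VM flag from inside the running JVM. */
final class VmFlagCheck {
    /** Returns the flag's value as a string, or null if this VM does not expose the flag. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the flag (or the MXBean itself) does not exist in this VM.
            return null;
        }
    }

    public static void main(String[] args) {
        // The two flags discussed in these slides; either may be absent in a given VM build.
        System.out.println("DontCompileHugeMethods = " + flagValue("DontCompileHugeMethods"));
        System.out.println("UseLoopPredicate       = " + flagValue("UseLoopPredicate"));
    }
}
```

Run it with and without `-XX:-DontCompileHugeMethods` on the command line to see the reported value change.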
JVM OPEN SOURCE
@ TAOBAO
Open Source
• Participate in OpenJDK
  – Already submitted 4 patches into the HotSpot
    VM and its Serviceability Agent
  – Active on OpenJDK mailing-lists
• Sign the OCA
  – Work in progress, almost there
  – Submit more patches after OCA is accepted
• Future open sourcing of custom
  modifications
Open Source (cont.)
• The submitted patches
  – 7050685: jsdbproc64.sh has a typo in the
    package name
  – 7058036: FieldsAllocationStyle=2 does not work
    in 32-bit VM
  – 7060619: C1 should respect inline and dontinline
    directives from CompilerOracle
  – 7072527: CMS: JMM GC counters overcount in
    some cases
• Due to restrictions in contribution
  process, more significant patches cannot be
  submitted until our OCA is accepted
JVM TRAINING
@ TAOBAO
JVM Training
• Regular internal courses on
  – JVM internals
  – JVM tuning
  – JVM troubleshooting
• Discussion group for people interested in
  JVM internals
QUESTIONS?
Kris Mok, Software Engineer, Taobao
@rednaxelafx
莫枢 /“撒迦”

IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Dernier (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Kris Mok's Presentation on JVM Customization at Taobao

  • 1. Kris Mok, Software Engineer, Taobao @rednaxelafx 莫枢 /“撒迦”
  • 3. Agenda Customization Tuning JVM @ Taobao Open Source Training
  • 5. Java Strengths • Good abstraction • Good performance • Good tooling (IDE, profiler, etc.) • Easy to recruit good programmers
  • 6. Java Weaknesses • Tension between “abstraction leak” and performance – Abstraction and performance don’t always come together • More control/info over GC and object overhead wanted sometimes
  • 7. Our Team • Domain-Specific Computing Team – performance- and efficiency-oriented – specific solutions to specific problems – do the low-level plumbing for specific applications targeting specific hardware – we’re hiring! • software and hardware hackers
  • 8. Our Team (cont.) • Current Focus – JVM-level customization/tuning • long term project • based on HotSpot Express 20 from OpenJDK • serving: – 10,000+ JVM instances serving online – 1,000+ Java developers – Hadoop tuning – Dedicated accelerator card adoption
  • 11. Tradeoffs • Would like to make as little impact on existing Java application code as possible • But if the performance/efficiency gains are significant enough, we’re willing to make extensions to the VM/core libs
  • 12. JVM Customizations • GC Invisible Heap (GCIH) • JNI Wrapper improvement • New instructions • PrintGCReason / CMS bug fix • ArrayAllocationWarningSize • Change VM argument defaults • etc.
  • 13. Case 1: in-memory cache • Certain data is computed offline and then fed to online systems in a read-only, “cache” fashion
  • 14. in-memory cache • Fastest way to access them is to – put them in-process, in-memory, – access as normal Java objects, – no serialization/JNI involved per access
  • 15. in-memory cache • Large, static, long-lived data in the GC heap – may lead to long GC pauses at full GC, – or long overall concurrent GC cycle • What if we take them out of the GC heap? – but without having to serialize them?
  • 16. GC Invisible Heap • “GC Invisible Heap” (GCIH) – an extension to HotSpot VM – an in-process, in-memory heap space – not managed by the GC – stores normal Java objects • Currently works with ParNew+CMS
  • 17. GCIH interface • “moveIn(Object root)” – given the root of an object graph, move the whole graph out of GC heap and into GCIH • “moveOut()” – GCIH space reset to a clean state – abandon all data in current GCIH space – (earlier version) move the object graph back into GC heap
  • 18. GCIH interface (cont.) • Current restrictions – data in GCIH should be read-only – objects in GCIH may not be used as monitors – no outgoing references allowed • Restrictions may be relaxed in the future
  • 19. GCIH interface (cont.) • To update data – moveOut – (update) - moveIn
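
The moveOut / update / moveIn cycle described on the GCIH interface slides can be sketched as follows. The real GCIH API is a VM extension and is not shown in the slides beyond the two method names, so `FakeGcih` below is a purely hypothetical stand-in that only mimics the described behaviour (moveOut abandons the current data; moveIn installs a new object graph). `buildCache` is likewise an illustrative helper.

```java
import java.util.HashMap;
import java.util.Map;

public class GcihUpdateSketch {
    /** Hypothetical stub: mirrors only the two operations named on the slides. */
    static final class FakeGcih {
        private Object root; // stands in for the GC-invisible space

        /** Install a new object graph (the real VM would copy it out of the GC heap). */
        void moveIn(Object newRoot) { root = newRoot; }

        /** Reset the space to a clean state, abandoning whatever was stored. */
        void moveOut() { root = null; }

        Object root() { return root; }
    }

    /** Illustrative helper: rebuilds the read-only cache data from scratch. */
    static Map<String, Integer> buildCache(int version) {
        Map<String, Integer> cache = new HashMap<>();
        cache.put("version", version);
        return cache;
    }

    public static void main(String[] args) {
        FakeGcih gcih = new FakeGcih();
        gcih.moveIn(buildCache(1));   // initial load

        // Update cycle: data in GCIH is read-only, so it is replaced wholesale.
        gcih.moveOut();               // abandon the old data
        gcih.moveIn(buildCache(2));   // rebuild on the normal heap, then move in

        System.out.println(gcih.root());
    }
}
```

Because the stored data is read-only, an update under this scheme is a full rebuild rather than an in-place mutation.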
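  • 20. [Diagram: heap layout, original vs. using GCIH]
        Original:   Perm (-XX:PermSize, -XX:MaxPermSize) | Young (-Xmn) | Old, with -Xms/-Xmx sizing the GC managed heap; Cache Data sits inside the GC managed heap
        Using GCIH: Perm (-XX:PermSize, -XX:MaxPermSize) | Young (-Xmn) | Old, with -Xms/-Xmx sizing the GC managed heap; Cache Data sits in a separate GCIH region (-XX:GCIHSize)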
  • 21. Actual performance • Reduces stop-the-world full GC pause time • Reduces concurrent-mark and concurrent-sweep time – but the two stop-the-world phases of CMS aren’t necessarily significantly faster
  • 22. Total time of CMS GC phases (time in seconds)
        Phase               Original    w/GCIH
        initial-mark        0.0072      0.0043
        concurrent-mark     1.7943      0.5400
        preclean            0.0373      0.0159
        remark              0.0118      0.0035
        concurrent-sweep    1.5717      0.6266
        reset               0.0263      0.0240
  • 23. Alternatives
        GCIH                                     BigMemory
        × extension to the JVM                   √ runs on standard JVM
        √ in-process, in-memory                  √ in-process, in-memory
        √ not under GC control                   √ not under GC control
        √ direct access of Java objects          × serialize/deserialize Java objects
        √ no JNI overhead on access              × JNI overhead on access
        √ object graph is in better locality     × N/A
  • 24. GCIH future • still in early stage of development now • may try to make the API surface more like RTSJ
  • 25. Experimental: object data sharing • Sharing of GCIH between JVMs on the same box • Real-world application: – A special kind of Map/Reduce job uses a big piece of precomputed cache data – Multiple homogeneous jobs run on the same machine, using the same cache data – can save memory to run more jobs on a machine, when CPU isn’t the bottleneck
  • 26. Before sharing [Diagram: JVM1, JVM2, JVM3, …, JVMn each hold their own copy of Sharable Objs alongside their own Other Objs]
  • 27. After sharing [Diagram: JVM1, JVM2, JVM3, …, JVMn all map a single shared copy of Sharable Objs; each JVM keeps its own Other Objs]
  • 28. Case 2: JNI overhead • JNI carries a lot of overhead at invocation boundaries • A JNI invocation involves calling JNI native wrappers in the VM
  • 29. JNI wrapper • Wrappers are in hand-written assembler • But not necessarily always well-tuned • Look for opportunities to optimize for common cases
  • 30. Wrapper example
        ...
        0x00002aaaab19be92: cmpl   $0x0,0x30(%r15)           // check the suspend flag
        0x00002aaaab19be9a: je     0x2aaaab19bec6
        0x00002aaaab19bea0: mov    %rax,-0x8(%rbp)
        0x00002aaaab19bea4: mov    %r15,%rdi
        0x00002aaaab19bea7: mov    %rsp,%r12
        0x00002aaaab19beaa: sub    $0x0,%rsp
        0x00002aaaab19beae: and    $0xfffffffffffffff0,%rsp
        0x00002aaaab19beb2: mov    $0x2b7d73bcbda0,%r10
        0x00002aaaab19bebc: rex.WB callq *%r10
        0x00002aaaab19bebf: mov    %r12,%rsp
        0x00002aaaab19bec2: mov    -0x8(%rbp),%rax
        0x00002aaaab19bec6: movl   $0x8,0x238(%r15)          // change thread state to thread in java
        ...                                                  // continue
  • 31. Wrapper example (cont.) • The common case – Threads are unlikely to be suspended when running through this wrapper • Optimize for the common case – move the logic that handles suspended state out-of-line
  • 32. Modified wrapper example
        ...
        0x00002aaaab19be3a: cmpl   $0x0,0x30(%r15)           // check the suspend flag
        0x00002aaaab19be42: jne    0x2aaaab19bf52
        0x00002aaaab19be48: movl   $0x8,0x238(%r15)          // change thread state to thread in java
        ...                                                  // continue
        0x00002aaaab19bf52: mov    %rax,-0x8(%rbp)
        0x00002aaaab19bf56: mov    %r15,%rdi
        0x00002aaaab19bf59: mov    %rsp,%r12
        0x00002aaaab19bf5c: sub    $0x0,%rsp
        0x00002aaaab19bf60: and    $0xfffffffffffffff0,%rsp
        0x00002aaaab19bf64: mov    $0x2ae3772aae70,%r10
        0x00002aaaab19bf6e: rex.WB callq *%r10
        0x00002aaaab19bf71: mov    %r12,%rsp
        0x00002aaaab19bf74: mov    -0x8(%rbp),%rax
        0x00002aaaab19bf78: jmpq   0x2aaaab19be48
        ...
  • 33. Performance • 5%-10% improvement of raw JNI invocation performance on various microarchitectures
  • 34. Case 3: new instructions • SSE 4.2 brings new instructions – e.g. CRC32c • We’re using Westmere now • Should take advantage of SSE 4.2
  • 35. CRC32 / CRC32C • CRC32 – well known, commonly used checksum – used in HDFS – JDK’s impl uses zlib, through JNI • CRC32c – a variant of CRC32 – hardware support by SSE 4.2
  • 36. Intrinsify CRC32c • Add new intrinsic methods to directly support CRC32c instruction in HotSpot VM • Hardware accelerated • To be used in modified HDFS • Completely avoids JNI overhead – HADOOP-7446 still carries JNI overhead blog post
  • 37. Other intrinsics • May intrinsify other operations in the future – AES-NI – others on applications’ demand
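
To make the CRC32 vs. CRC32c distinction concrete, here is a small sketch using the JDK's own checksum classes. It does not show the custom intrinsic described in the slides: `java.util.zip.CRC32C` is a standard class that only exists from JDK 9 onwards, well after this talk, and is used here as an assumption-free way to compare the two polynomials. The input `"123456789"` is the conventional check string for CRC algorithms.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumDemo {
    /** Feeds the whole buffer to the given checksum and returns its value. */
    static long checksum(Checksum c, byte[] data) {
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        // Same input, different polynomials, so the results differ.
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));   // cbf43926
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data));  // e3069283
    }
}
```

The two checksums are not interchangeable, which is why the slides speak of a modified HDFS rather than a drop-in replacement.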
  • 38. Case 4: frequent CMS GC • An app experienced back-to-back CMS GC cycles after running for a few days • The Java heaps were far from full • What’s going on?
  • 39. The GC Log
        2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
        2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
        2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
        2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
  • 40. GC log visualized The tool used here is GCHisto from Tony Printezis
  • 41. Need more info • -XX:+PrintGCReason to the rescue – added this new flag to the VM – print the direct cause of a GC cycle
  • 42. The GC Log
        2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
        2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
        CMS Perm: collect because of occupancy 0.920845 / 0.920000
        CMS perm gen initiated
        2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
        2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
  • 43. • Relevant VM arguments – -XX:PermSize=96m -XX:MaxPermSize=256m
  • 44. • The problem was caused by bad interaction between CMS GC triggering and PermGen expansion – Thanks, Ramki!
  • 45. • The (partial) fix (the original return statement is replaced by "return false;")
        // Support for concurrent collection policy decisions.
        bool CompactibleFreeListSpace::should_concurrent_collect() const {
          // In the future we might want to add in frgamentation stats --
          // including erosion of the "mountain" into this decision as well.
        - return !adaptive_freelists() && linearAllocationWouldFail();
        + return false;
        }
  • 47. Case 5: huge objects • An app bug allocated a huge object, causing unexpected OOM • Where did it come from?
  • 48. huge objects and arrays • Most Java objects are small • Huge objects usually happen to be arrays • A lot of collection objects use arrays as backing storage – ArrayLists, HashMaps, etc. • Tracking huge array allocation can help locate huge allocation problems
  • 49. product(intx, ArrayAllocationWarningSize, 512*M,
                "array allocation with size larger than"
                "this (bytes) will be given a warning"
                "into the GC log")
  • 50. Demo
        import java.util.ArrayList;

        public class Demo {
          private static void foo() {
            new ArrayList<Object>(128 * 1024 * 1024);
          }

          public static void main(String[] args) {
            foo();
          }
        }
  • 51. Demo
        $ java Demo
        ==WARNNING== allocating large array: thread_id[0x0000000059374800], thread_name[main], array_size[ 536870928 bytes], array_length[134217728 elememts]
            at java.util.ArrayList.<init>(ArrayList.java:112)
            at Demo.foo(Demo.java:5)
            at Demo.main(Demo.java:9)
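
The array size in the warning can be reproduced with a little arithmetic. The sketch below assumes 4-byte object references (compressed oops on a 64-bit HotSpot) and a 16-byte array header; neither figure is stated in the slides, but together they match the reported 536870928 bytes. The threshold uses the 512*M default of ArrayAllocationWarningSize shown earlier.

```java
public class ArraySizeEstimate {
    /** Estimated size of a reference array: header plus one slot per element. */
    static long arrayBytes(long length, int refBytes, int headerBytes) {
        return headerBytes + length * refBytes;
    }

    public static void main(String[] args) {
        long length = 128L * 1024 * 1024;        // capacity passed to ArrayList in the demo
        int refBytes = 4;                        // assumption: compressed references
        int headerBytes = 16;                    // assumption: 64-bit array header
        long warningSize = 512L * 1024 * 1024;   // default ArrayAllocationWarningSize (512*M)

        long bytes = arrayBytes(length, refBytes, headerBytes);
        System.out.println(bytes);               // 536870928
        System.out.println(bytes > warningSize); // true: just over the threshold
    }
}
```

The backing array exceeds the threshold by exactly the header size, which is why this particular capacity triggers the warning.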
  • 52. Case 6: bad optimizations? • Some loop optimization bugs were found before launch of Oracle JDK 7 • Actually, they exist in recent JDK 6, too – some of the fixes weren’t in until JDK6u29 – can’t wait until an official update with the fixes – roll our own workaround
  • 53. Workarounds • Explicitly set -XX:-UseLoopPredicate when using recent JDK 6 • Or …
  • 54. Workarounds (cont.) • Change the defaults of the opt flags to turn them off
        product(bool, UseLoopPredicate, false,   /* default changed from true */
                "Generate a predicate to select fast/slow loop versions")
  • 55. A Case Study JVM TUNING @ TAOBAO
  • 56. JVM Tuning • Most JVM tuning efforts are spent on memory related issues – we do too – lots of reading material available • Let’s look at something else – use JVM internal knowledge to guide tuning
  • 57. Case: Velocity template compilation • An internal project seeks to compile Velocity templates into Java bytecodes
  • 58. Compilation process • Parse *.vm source into AST – reuse original parser and AST from Velocity • Traverse the AST and generate Java source code as target – works like macro expansion • Use Java Compiler API to generate bytecodes
  • 59. Example
        Velocity template source:
            Check $dev.Name out!
        Generated Java source:
            _writer.write("Check ");
            _writer.write(
                _context.get(_context.get("dev"),
                    "Name", Integer.valueOf(26795951)));
            _writer.write(" out!");
  • 60. Performance: interpreted vs. compiled [Chart: execution time (ms/10K times) against template complexity 1 to 16; series: compiled, interpreted]
  • 61. Problem • In the compiled version – 1 “complexity” ≈ 800 bytes of bytecode – So 11 “complexities” > 8000 bytes of bytecode • Compiled templates larger than “11” are not JIT’d!
        develop(intx, HugeMethodLimit, 8000,
                "don't compile methods larger than"
                "this if +DontCompileHugeMethods")
        product(bool, DontCompileHugeMethods, true,
                "don't compile methods > HugeMethodLimit")
        Case Study Summary
  • 62. -XX:-DontCompileHugeMethods [Chart: execution time (ms/10K times) against template complexity 1 to 16; series: compiled, interpreted]
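
The threshold on the Problem slide follows from simple arithmetic. This sketch takes the slide's rough figure of 800 bytes of bytecode per unit of template complexity and the HugeMethodLimit default of 8000, and finds the smallest complexity whose generated method is strictly larger than the limit. The method name is illustrative, and the 800-byte figure is an approximation, so the real cut-off may shift slightly.

```java
public class HugeMethodEstimate {
    /** Smallest complexity whose estimated bytecode size is strictly above the limit. */
    static int firstUncompiledComplexity(int bytesPerUnit, int hugeMethodLimit) {
        return hugeMethodLimit / bytesPerUnit + 1;
    }

    public static void main(String[] args) {
        int bytesPerUnit = 800;      // approximate figure from the slide
        int hugeMethodLimit = 8000;  // HugeMethodLimit default

        int complexity = firstUncompiledComplexity(bytesPerUnit, hugeMethodLimit);
        System.out.println(complexity);                 // 11
        System.out.println(complexity * bytesPerUnit);  // 8800, above the 8000-byte limit
    }
}
```

Running with -XX:-DontCompileHugeMethods removes this cut-off, which is the change the second chart illustrates.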
  • 64. Open Source • Participate in OpenJDK – Already submitted 4 patches into the HotSpot VM and its Serviceability Agent – Active on OpenJDK mailing-lists • Sign the OCA – Work in progress, almost there – Submit more patches after OCA is accepted • Future open sourcing of custom modifications
  • 65. Open Source (cont.) • The submitted patches – 7050685: jsdbproc64.sh has a typo in the package name – 7058036: FieldsAllocationStyle=2 does not work in 32-bit VM – 7060619: C1 should respect inline and dontinline directives from CompilerOracle – 7072527: CMS: JMM GC counters overcount in some cases • Due to restrictions in contribution process, more significant patches cannot be submitted until our OCA is accepted
  • 67. JVM Training • Regular internal courses on – JVM internals – JVM tuning – JVM troubleshooting • Discussion group for people interested in JVM internals
  • 69. Kris Mok, Software Engineer, Taobao @rednaxelafx 莫枢 /“撒迦”