The document discusses a swap-aware JVM garbage collection policy and parallel logging. It proposes making the default GC policy in Java 8 aware of whether data has been swapped to reduce long GC times due to swap I/O. The full GC process of ParallelCompact is described. For swap-aware GC, solutions are proposed to either skip compacting swapped live data or avoid compaction by remapping virtual memory. For parallel logging in databases, the document discusses implementing a data structure called Grasshopper to support parallel logging in the Shore-MT database and evaluating its performance.
Progress_190118
1. Hyojeong Lee
Distributed Computing System Laboratory
Department of Computer Science and Engineering
Seoul National University, Korea
Progress Report
3. ● Motivation
● Necessity of swapping
● When running memory-intensive workloads such as deep learning algorithms,
the size of the generated data is large and unpredictable.
● The current GC policy is not aware of swapping.
● ParallelCompact, the default GC policy of Java 8, does not consider whether data
is swapped.
● Processing swapped data causes swap I/O, which results in long GC times.
● For example,
● SVD++ in Sparkbench (swap I/O & execution time)
Swap-aware JVM GC Policy
4. ● Background
● Full GC process
i. Mark
ii. Summary
iii. Compact (sliding)
[Figure: heap layout. Young (Eden, Sur0, Sur1) and Old, divided into regions]
5. (cf) Full GC process
[Figure: old space (Bottom, Top, End) divided into 512k regions; each region records source_reg, dest_addr, live_size, …]
7. (cf) Full GC process - Summary_phase
Traverse the whole space
to pick up regions that might be
compacted
[Figure: example per-region data; x and k are labels in the figure]
● source_reg = cur, dest_addr = addr, live_size = own size, …
● source_reg = k+i, dest_addr = x, live_size = 0, …
● source_reg = 0, dest_addr = x + size*k, live_size = size, …
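The destination bookkeeping above can be sketched roughly as follows. This is a simplified illustration, not the HotSpot code: it assumes live data slides toward Bottom in region order, and the `Region` class and `summarize` function are hypothetical names. The source_reg field is left out for brevity.

```python
from dataclasses import dataclass

REGION_SIZE = 512 * 1024  # bytes; taken from the 512k region size in the slides


@dataclass
class Region:
    live_size: int      # bytes of live data found by the mark phase
    dest_addr: int = 0  # where this region's live data starts after sliding


def summarize(regions, bottom=0):
    """Assign dest_addr to every region by sliding live data toward bottom.

    Returns the new top of the space after compaction (hypothetical helper).
    """
    addr = bottom
    for region in regions:
        region.dest_addr = addr
        addr += region.live_size
    return addr


regions = [Region(REGION_SIZE), Region(0), Region(100), Region(REGION_SIZE // 2)]
new_top = summarize(regions)
```

An empty region (live_size = 0) gets the same dest_addr as the region after it, which matches the idea that it contributes nothing to the compacted space.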
8. (cf) Full GC process - Summary_phase
Look for the first region that
contains dead data (= full_cp),
using binary search
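A minimal sketch of such a binary search, assuming the summary data is available as plain lists and addresses are measured from Bottom. The function name and list layout are hypothetical. The search works because the condition "no dead space at or before this region" can only change from true to false once as the region index grows.

```python
def first_region_with_dead_data(dest_addrs, live_sizes, region_size, bottom=0):
    """Return the index of the first region that contains dead data.

    dest_addrs[i] is where region i's live data lands after sliding;
    live_sizes[i] is its live byte count. Returns len(live_sizes) when
    every region is completely live.
    """
    lo, hi = 0, len(live_sizes)
    while lo < hi:
        mid = (lo + hi) // 2
        end_of_live = dest_addrs[mid] + live_sizes[mid]
        end_of_region = bottom + (mid + 1) * region_size
        if end_of_live == end_of_region:
            lo = mid + 1  # everything up to and including mid is live
        else:
            hi = mid      # some dead space exists at or before mid
    return lo


# Toy space: region size 4, third region has one dead unit.
live = [4, 4, 3, 4, 0, 2]
dest = [0, 4, 8, 11, 15, 15]
full_cp = first_region_with_dead_data(dest, live, region_size=4)
```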
9. (cf) Full GC process - Summary_phase
Starting from full_cp, look for the
limit region (limited_cp), given a
specific amount of dead wood
• Density = live_data / capacity
• Limited_dead_wood = Normal_distribution(density) * capacity
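The two formulas can be written out as a small sketch. The mean and standard deviation of the normal distribution are not given in the slides, so the default values below are assumptions for illustration only, and the function names are hypothetical.

```python
import math


def normal_distribution(density, mean=0.5, std_dev=0.8):
    """Gaussian density evaluated at `density`.

    mean and std_dev are assumed values, not taken from the slides.
    """
    z = (density - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))


def limited_dead_wood(live_data, capacity, mean=0.5, std_dev=0.8):
    """Limited_dead_wood = Normal_distribution(density) * capacity."""
    density = live_data / capacity
    return normal_distribution(density, mean, std_dev) * capacity


half_full = limited_dead_wood(live_data=50, capacity=100)
nearly_full = limited_dead_wood(live_data=100, capacity=100)
```

With these assumed parameters the allowed dead wood is largest when the space is about half live and shrinks as the space becomes nearly empty or nearly full.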
10. (cf) Full GC process - Summary_phase
[Figure: full_cp, limited_cp, and Limited_dead_wood]
11. (cf) Full GC process - Compact_phase (sliding)
best_cp (with max reclaim ratio = dead ratio)
a.k.a: dense_prefix
Compact regions
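One way to read the choice of best_cp is as a scan over candidate regions that keeps the one with the highest ratio. The slides do not define the ratio precisely, so the sketch below assumes it is dead bytes divided by live bytes to the right of the candidate; the function name and the candidate range are also assumptions.

```python
def pick_dense_prefix(live_sizes, region_size, first_cp, last_cp):
    """Return the candidate region index with the highest reclaim ratio.

    Assumed definition: ratio = dead bytes / live bytes, both counted from
    the candidate region to the end of the space.
    """
    best_cp, best_ratio = first_cp, -1.0
    for cp in range(first_cp, last_cp + 1):
        live_right = sum(live_sizes[cp:])
        if live_right == 0:
            continue  # nothing left to compact beyond this point
        dead_right = (len(live_sizes) - cp) * region_size - live_right
        ratio = dead_right / live_right
        if ratio > best_ratio:
            best_cp, best_ratio = cp, ratio
    return best_cp


# Toy space with region size 4; candidates are regions 2..6.
live = [4, 4, 3, 4, 1, 1, 4, 4]
best_cp = pick_dense_prefix(live, region_size=4, first_cp=2, last_cp=6)
```

In this toy input, regions before index 4 are mostly live, so leaving them in place and compacting from region 4 onward gives the best ratio under the assumed definition.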
12. (cf) Full GC process - Compact_phase (sliding)
Bottom Top End
14. ● Definition
● What we already know:
● The default GC policy of Java 8 does not take into account whether
data is swapped,
● resulting in long GC times due to swap I/O.
● From analyzing the code:
● Parallel Compact GC goes through the process of
‘Mark → Summary → Compact’.
● Swap I/O occurs when swapped data is copied during compaction.
15. ● Survey
● Managing the JVM heap effectively to handle big
data workloads
● Characterizing and optimizing hotspot parallel GC on multicore
systems, 2018, EuroSys: Work stealing technique
● Analysis and optimizations of java FGC, 2018, ISBN: Region
skipping technique
● How to handle swap overhead
● Relation-based ordering of objects in an object heap, 2002: Using a
reference counter, objects with temporal locality are placed in
the same cache line, minimizing the overhead caused by swapping.
● As a result,
● Information about swapping is usually not used directly.
16. ● Features of workloads
● High Memory usage
● High Locality
(So, there is compaction for swapped data.)
● Target workload
● Simple Java program for validation
● GraphChi
● Platform for large and complex graph operations
● Used frequently for evaluating throughput in studies suggesting
GC policy for memory-intensive workloads
● Deeplearning4j
● Platform for memory-intensive deep learning workloads
17. ● Our solutions (Compact w/ swap)
1. Kernel (page) level: Swap flag
● DONE. Check whether data is swapped using pagemap (summary phase)
● DOING. Decide whether data is copied or skipped, based on the swappiness
of the region (compact phase)
● TODO. Checking swappiness has a large overhead →
Adding a data structure in the kernel that keeps swappiness can
reduce the overhead.
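As a rough illustration of the pagemap check, the sketch below decodes entries of the Linux /proc/<pid>/pagemap file, where each virtual page has one 64-bit entry, bit 63 means "present in RAM" and bit 62 means "swapped". The helper names and the per-region fraction are hypothetical; how the actual prototype aggregates pages into a region's swappiness is not stated in the slides.

```python
import os
import struct

PAGE_PRESENT = 1 << 63  # bit 63 of a pagemap entry: page is in RAM
PAGE_SWAPPED = 1 << 62  # bit 62 of a pagemap entry: page is in swap


def is_swapped(entry):
    """True if a 64-bit pagemap entry marks the page as swapped out."""
    return bool(entry & PAGE_SWAPPED)


def read_pagemap_entry(vaddr, pid="self"):
    """Read the pagemap entry for one virtual address (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)  # one 8-byte entry per page
        data = f.read(8)
    if len(data) != 8:
        raise OSError("address is not covered by pagemap")
    return struct.unpack("=Q", data)[0]


def swapped_fraction(entries):
    """Fraction of pages in a region that are swapped (hypothetical metric)."""
    if not entries:
        return 0.0
    return sum(is_swapped(e) for e in entries) / len(entries)


# Four fake entries: swapped, present, unmapped, swapped.
region_entries = [PAGE_SWAPPED, PAGE_PRESENT, 0, PAGE_SWAPPED]
fraction = swapped_fraction(region_entries)
```

Reading one entry per page for every region is the kind of per-page cost that the TODO item above aims to reduce.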
[Figure: dense_prefix and swap space. Swapped live data → Skip!]
18. ● Our solutions (Compact w/o swap)
[Figure: dense_prefix; virtual (heap) vs. physical (kernel) space]
Actually, there is no need to compact
→ Just remap the virtual space!
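The remapping idea can be illustrated with a toy page table. This is only a model, not kernel code: the dictionary stands in for the virtual-to-physical mapping, `compact_by_remap` is a hypothetical name, and the sketch assumes live data is tracked at page granularity. On Linux a call such as mremap(2) might play this role, but that is an assumption and not something the slides state.

```python
def compact_by_remap(page_table, live_vpages):
    """Make live pages virtually contiguous without copying any data.

    page_table maps virtual page number -> physical frame id.
    Only the mapping changes; frames (and swapped pages) are never touched.
    """
    start = min(page_table)
    new_table = {}
    for offset, vpage in enumerate(sorted(live_vpages)):
        new_table[start + offset] = page_table[vpage]
    return new_table


# Virtual pages 10..13; page 11 holds only dead data.
table = {10: "f7", 11: "f2", 12: "f9", 13: "f4"}
remapped = compact_by_remap(table, live_vpages={10, 12, 13})
```

After remapping, the live frames sit behind consecutive virtual pages and the frame of the dead page is no longer referenced, so no live data had to be read or written.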
19. ● Kernel(page) level: Swap flag - Attempt (1)
● Implementation
● Evaluation (w/ simple Java program)
● Simple Java program
● Makes swapped objects targets for compaction
● Result
20. ● Contents
● Parallel logging in Database: Shore-MT
● Supporting data structure (Grasshopper)
● DONE
● Baseline evaluation
● Implementation of parallel logging (2-thread enabled)
Parallel Logging (WAL)
21. ● DOING
● Analyzing the source code of the target DB (Shore-MT) and making a
plan for implementing Grasshopper
22. ● Swap-aware JVM GC policy
● ~ 01.25: Page level solution
● ~ 02.28: Kernel level solution
● Parallel logging
● ~ 01.25: Implement Grasshopper
● ~ 02.01: Evaluation & Optimization
● ~ 02.28: Parallel logging on Lustre file system
● Improve paper: An Efficient Journaling Mechanism in Lustre File
System for Fast Storage Devices
TODO