With the increasing adoption of cloud-native technologies and containerization, the gap between Java development and system administration is shrinking. Whether you use Docker Swarm, Kubernetes, or Mesos/Marathon as a container orchestrator, the fundamental challenges of running Docker in production are the same.
In this talk, I would like to share some basic Linux concepts about CPU scheduling that every Java developer should know in order to configure and troubleshoot Docker containers effectively.
Yes, Docker provides isolation, but only if you know how best to configure it.
2. System is slow!
● Service foo is slow every day at 23:00
● All services on this specific node are slow
● Noisy Neighbours
@aparnachaudhary
3. CPU Shares
● Default CPU isolation mechanism
● Provides a priority weighting across all CPU cycles on all cores
● Default weight for any container is 1024
● As the number of containers per node increases, the CPU available per container decreases
docker run --rm -d mytinyservice:1.0.0
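Because shares are relative weights, a container's worst-case CPU fraction under full contention is its weight divided by the sum of all weights on the node. A minimal sketch (the second image name and the 512/1024 weights are hypothetical):

```shell
# Two containers competing on the same node (hypothetical weights):
#   docker run --rm -d --cpu-shares=512  batchjob:1.0.0
#   docker run --rm -d --cpu-shares=1024 mytinyservice:1.0.0
# Under full contention, each gets weight / sum(weights) of the CPU:
low=512
high=1024
total=$((low + high))
echo "batchjob:      $((100 * low / total))% of CPU"   # 33%
echo "mytinyservice: $((100 * high / total))% of CPU"  # 66%
```

When the node is idle, shares impose no cap at all: either container may still use 100% of the CPU.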
4. CPU Shares and noisy neighbours
(Diagram: a four-core node at 16:00 vs 23:00, running containers Foo, Bar, and OnDiet.)
Foo becomes slow because of Bar.
@aparnachaudhary
5. CPU Set
● Limits a container's processes to specific CPU cores
● Accepts a comma-separated list of cores
● Or a hyphen-separated range of cores
● Typically used for databases
docker run --rm -d --cpuset-cpus=0-1 mydb:1.0.0
docker run --rm -d --cpuset-cpus=1,3 mydb:1.0.0
6. CPU Limits
Allows a container to use CPU time for the duration of cpu-quota in every cpu-period.
--cpu-period = 100 ms (100,000 µs, the Docker default)
--cpu-quota = 20 ms (20,000 µs)
The application is then allowed to use 20 ms of CPU time every 100 ms.
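Note that the actual docker flags take microseconds. A sketch of deriving the quota for a 0.2-CPU cap (the image name is a placeholder):

```shell
# Cap a container at 0.2 CPUs: quota = 0.2 * period.
period_us=100000                  # Docker default --cpu-period, in µs
quota_us=$((period_us * 2 / 10))  # 0.2 CPUs, kept in integer math
echo "use --cpu-period=${period_us} --cpu-quota=${quota_us}"
# Equivalent invocations (hypothetical image):
#   docker run --rm -d --cpu-period=100000 --cpu-quota=20000 mytinyservice:1.0.0
#   docker run --rm -d --cpus=0.2 mytinyservice:1.0.0
```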
7. Scenario: Application uses all CPU quota at the start
--cpu-period=100 ms
--cpu-quota=20 ms
(Diagram: repeating 100 ms periods, each split into a 20 ms Run phase followed by an 80 ms Throttle phase.)
Latency experienced = 80 ms
@aparnachaudhary
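Throttling does not have to be guessed from latency graphs: the worst-case stall per period is simply period minus quota, and the cgroup pseudo-files record how often it happened. A sketch (cgroup v1 layout; the container id is a placeholder):

```shell
# Worst-case stall per period once the quota is exhausted:
period_ms=100
quota_ms=20
echo "worst-case throttle stall: $((period_ms - quota_ms)) ms"
# To see actual throttling counters for a container (cgroup v1):
#   cat /sys/fs/cgroup/cpu/docker/<container-id>/cpu.stat
# which reports nr_periods, nr_throttled, and throttled_time (ns).
```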
9. Scenario: Concurrent GC run (STW + non-STW phases)
--cpu-period=100 ms
--cpu-quota=20 ms
(Diagram: repeating 100 ms periods, each split into roughly 10 ms of GC, 10 ms of Run, and 80 ms of Throttle.)
Latency experienced = 95 ms
CPU time stolen by GC means the end user experiences the STW pause.
The larger the number of GC threads, the higher the latency experienced by the end user.
@aparnachaudhary
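The effect can be estimated with simple arithmetic: whatever CPU time the GC threads burn per period is lost to the application. A rough back-of-the-envelope sketch (the 10 ms GC cost is an assumed figure matching the scenario above):

```shell
quota_ms=20
gc_ms=10                      # assumed CPU time burned by GC threads per period
app_ms=$((quota_ms - gc_ms))  # CPU time left for application threads
echo "application gets ${app_ms} ms of every 100 ms period"
# More parallel GC threads increase gc_ms and shrink app_ms,
# so end-user latency rises with the GC thread count.
```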
10. Running the JVM in Docker needs a thorough understanding of how JVM GC interacts with cgroup CPU scheduling.
@aparnachaudhary
11. CPU, Cgroups, JVM - What do I do?
CPU Shares:
● Unpredictable performance because of noisy neighbours
● Simple to configure
● Allows use of idle CPU resources
● Difficult capacity planning
CPU Limits:
● Predictable performance if tuned properly for GC behavior
● Difficult to configure properly
● Idle CPU resources are not utilized
● Better capacity planning
@aparnachaudhary
16. Takeaways
● Decide on QoS (Best Effort, Burstable, Guaranteed)
● Use cgroup pseudo-files to understand resource utilization
● CPU shares will cause unpredictable performance because of noisy neighbours
● CPU limits may cause throttling of the application during GC
● Make periodic thread dumps
@aparnachaudhary
● Use JVM flags:
-XX:+PrintFlagsFinal
-XX:ParallelGCThreads=<n>
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:<file-path>
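A minimal launch sketch combining these flags (the jar name, log path, and thread count are placeholders; the PrintGC* flags apply to JDK 8-era JVMs):

```shell
# Hypothetical service launch with GC diagnostics enabled:
JAVA_OPTS="-XX:+PrintFlagsFinal \
 -XX:ParallelGCThreads=2 \
 -XX:+PrintGCDetails \
 -XX:+PrintGCDateStamps \
 -Xloggc:/var/log/myapp/gc.log"
echo java ${JAVA_OPTS} -jar myservice.jar
```

Keeping ParallelGCThreads in line with the container's CPU quota prevents GC threads alone from exhausting the cfs quota each period.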