Evaluation of Container Virtualized
MEGADOCK System
in Distributed Computing Environment
March 23rd, 2017
SIG BIO 49@Japan Advanced Institute of Science and Technology
Kento Aoyama1,2, Yuki Yamamoto1,2, Masahito Ohue1,3, Yutaka Akiyama1,2,3
1) Department of Computer Science, School of Computing
Tokyo Institute of Technology
2) Education Academy of Computational Life Sciences (ACLS)
Tokyo Institute of Technology
3) Advanced Computational Drug Discovery Unit, Institute of Innovative Research
Tokyo Institute of Technology
“Docker” 2
https://www.docker.com/what-container
No. of pulled containers from DockerHub
Docker and Bioinformatics 3
A. Paolo, D. Tommaso, A. B. Ramirez, E. Palumbo, C. Notredame, and D.
Gruber, “Benchmark Report : Univa Grid Engine , Nextflow , and Docker
for running Genomic Analysis Workflows.”
Docker Integration Benchmark Report
@Centre for Genomic Regulation
(Barcelona, Spain)
• Univa Grid Engine (Job Scheduler)
• Nextflow (Workflow manager)
• Docker (Linux Container)
• Reproducibility
• Portability
To develop the
Container-Native HPC Bioinformatics Application
Using Linux Container
which has …
• Low Dependency on Environment
• High-Performance
• Parallel execution performance
• Overhead of virtualization
• Dynamically Scaling
Research Purpose 4
• To evaluate the
Performance of Docker Container-Virtualization
in Bioinformatics Application
Target Application
• MEGADOCK[1]
• FFT-grid-based Protein-Protein Docking software
• Multi-threading, Multi-node, Multi-GPU (OpenMP, MPI, GPU)
• Extremely compute intensive workloads
Today’s Report 5
[1] Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking
software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.
Background
Linux Container
Docker
Container & Bioinformatics
6
Kernel-Shared Virtualization
• Lightweight : small size, fast deploy, easy sharing
• Performance : little virtualization overhead, faster than VM
Linux Container 7
[Figure: Virtual Machines vs. Containers. Virtual Machines: Hardware, Hypervisor, then one Guest OS, Bins/Libs, and App per virtual machine. Containers: Hardware, shared Linux Kernel, then Bins/Libs and App per container.]
Linux Container
• virtualizes the host resource as containers
• Filesystem, hostname, IPC, PID, Network, User, etc.
• can be used like Virtual Machines
Linux Kernel Features
• Containers share the same host kernel
• namespace[1], chroot, cgroup, SELinux, etc.
Container-based Virtualization 8
[1] E. W. Biederman. “Multiple instances of the global Linux namespaces.”,
In Proceedings of the 2006 Ottawa Linux Symposium, 2006.
[Figure: One machine with a single Linux kernel space; each container holds its own group of processes.]
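The namespace isolation listed above can be inspected from user space. The following is a minimal sketch, assuming a Linux host that exposes `/proc/<pid>/ns`; the helper name `list_namespaces` is hypothetical and not part of the original slides. Two processes that print the same identifier for an entry share that namespace, while a process inside a container would normally show different identifiers from a process on the host.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {namespace name: identifier} for one process.

    Each entry under /proc/<pid>/ns is a symbolic link whose target
    looks like 'pid:[4026531836]'. An empty dict is returned when the
    directory is missing (for example on a non-Linux system).
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without privileges; skip them.
            continue
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20s} {ident}")
```

Running the script once on the host and once inside a container, then comparing the two listings, is one way to see which resources the container has virtualized.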
Linux Container – Performance [1] 9
[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual
machines and Linux containers,” IEEE International Symposium on Performance Analysis of Systems and
Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)
[Figure: Performance ratio relative to native for Native, Docker, KVM, and KVM-tuned on three benchmarks: PXZ [MB/s], Linpack [GFLOPS], and Random Access [GUPS]. Data labels as extracted from the chart: 0.96, 1.00, 0.98, 0.78, 0.83, 0.99, 0.82, 0.98.]
Docker [1]
• Most popular Linux Container management platform
• Many useful components and services
Linux Container Management Tools 10
[1] Solomon Hykes and others. “What is Docker?” - https://www.docker.com/what-docker
[2] W. Bhimji, S. Canon, D. Jacobsen, L. Gerhardt, M. Mustafa, and J. Porter, “Shifter : Containers for
HPC,” Cray User Group, pp. 1–12, 2016.
[3] “Singularity” - http://singularity.lbl.gov/
Easy container sharing – Docker Hub 11
Portability & Reproducibility
• Easy to share the application environment via Docker Hub
• Containers can be executed on other host machine
[Figure: A Dockerfile (apt-get install …, wget …, …, make) generates an image on an Ubuntu host running Docker Engine. The image is pushed to Docker Hub and pulled onto a CentOS host running Docker Engine, where a container runs with the same App and Bins/Libs.]
AUFS (Advanced multi layered unification filesystem) [1]
• AUFS is Docker's default filesystem
• Layers can be reused in other container images
• AUFS helps software reproducibility
Docker - Filesystem 12
[1] Advanced multi layered unification filesystem. http://aufs.sourceforge.net, 2014.
Docker Container (image)
f49eec89601e 129.5 MB ubuntu:16.04 (base image)
366a03547595 39.85 MB
ef122501292c 133.6 MB
e50c89716342 660.4 KB
tag: beta
tag: version-1.0
tag: version-1.0.2
tag: version-1.2 5aec9aa5462c 24.17 MB
tag: latest 0d3cccd04bdb 6.07 MB
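The layer list above suggests why reuse matters for storage. The sketch below uses the layer IDs and sizes from the slide; treating the layers as one stack in the listed order, and converting 660.4 KB with a factor of 1000, are assumptions made for illustration. The function names are hypothetical.

```python
# Layer IDs and sizes in MB, in the order listed on the slide.
# 660.4 KB is converted with 1 MB = 1000 KB (an assumption).
LAYERS = [
    ("f49eec89601e", 129.5),   # ubuntu:16.04 base image
    ("366a03547595", 39.85),
    ("ef122501292c", 133.6),
    ("e50c89716342", 0.6604),
    ("5aec9aa5462c", 24.17),
    ("0d3cccd04bdb", 6.07),
]


def cumulative_sizes(layers):
    """Size of the image formed by the first 1, 2, ... layers of the stack."""
    total = 0.0
    sizes = []
    for _layer_id, size_mb in layers:
        total += size_mb
        sizes.append(total)
    return sizes


def naive_vs_shared(layers, depths):
    """Compare storage for several images built on the same stack.

    depths: how many layers each image uses (images are prefixes of the stack).
    naive:  every image stores its own full copy of all its layers.
    shared: each layer is stored once and reused by every image above it.
    """
    sizes = cumulative_sizes(layers)
    naive = sum(sizes[d - 1] for d in depths)
    shared = sizes[max(depths) - 1]
    return naive, shared


if __name__ == "__main__":
    naive, shared = naive_vs_shared(LAYERS, depths=[5, 6])
    print(f"two images, full copies: {naive:.2f} MB")
    print(f"two images, shared layers: {shared:.2f} MB")
```

Under these assumptions, an image that adds one layer on top of an existing image costs only the size of that extra layer.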
Why in the field of Bioinformatics?
• Types of Applications
• Data Analysis, Machine Learning
• MD Simulation, Docking calc. , etc.
• Data-centric workload
• Compute : Large
• Data I/O : Case by case
• Communication : Small
• Containers perform well on compute-intensive workloads [1]
For Bioinformatics Apps : 1 13
[1] W. Felter, et al. “An updated performance comparison of virtual
machines and Linux containers,” IEEE International Symposium on
Performance Analysis of Systems and Software, pp.171-172, 2015.
Reproducibility
• Different versions of a library can produce different results
• e.g., genomic analysis pipeline [Paolo, 2016]
[Figure, dependency conflict: Application A requires Library A version >= 1.2 while Application B requires version < 1.1; placing them in Container A and Container B isolates the dependencies.]
[Figure, application reproducibility: Application A with library version 1.2 (Container A) gives Result A, while Application A with library version 1.3 (Container A’) gives a different Result A’.]
For Bioinformatics Apps : 2 14
Dependency conflict
• Different applications can require different versions of the same library
Performance
• Little performance overhead
Reproducibility
• Dependency Isolation from other applications/libraries
Portability, Generality
• Sharing/Porting to other environment
Features for Bioinformatics Apps 15
Features                    Native   VM      Container
Performance, Scalability    Great    Bad     Good
Reproducibility             Bad      Good    Great
Portability, Generality     Bad      Great   Great
Proposed Method
16
MEGADOCK 17
Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-
performance protein-protein docking software for
heterogeneous supercomputers”, Bioinformatics,
30(22): 3281-3283, 2014.
High-performance protein-protein interaction predictions
• FFT-grid based docking software
• Extremely compute-intensive
• OpenMP/MPI/GPU support
• Great HPC Performance
Container-based Application Distribution 18
[Figure: MEGADOCK containers form the application layer on top of a compute resource layer; resources and containers can be added or removed.]
• All application dependencies exist in the Container
• Easy-to-test application
• Easy-to-scale size of resources
Test Environment Production Environment
Experiments
19
Experiment I
Evaluate container virtualization overhead on Physical Machine
• Physical Machine (single-node) + Docker
• Physical Machine (single-node, GPU) + NVIDIA-Docker
Experiment II
Evaluate container virtualization overhead on Cloud Environment
• Virtual Machines (multi-node) + Docker
• Virtual Machines (multi-node, GPU) + NVIDIA-Docker
Experiments 20
Measurement
• megadock-gpu exec. time
• time command (6 times, median)
Dataset
• 100 pair-pdb (KEGG pathway)
Options (OpenMP, OpenMPI)
• MPI : 12 threads / 4 MPI process / 1 node
• GPU : 1 GPU / 1 process / 1 node
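The measurement procedure above (repeat the run, take the median wall-clock time) can be sketched as follows. This is an illustration only: the slides use the shell `time` command, whereas this sketch times the child process from Python, and the helper name `median_wall_time` is hypothetical. The actual MEGADOCK command line is not given in the slides, so a trivial placeholder command is used.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, runs=6):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


if __name__ == "__main__":
    # Placeholder command; substitute the real benchmark invocation.
    cmd = [sys.executable, "-c", "pass"]
    print(f"median of 6 runs: {median_wall_time(cmd):.3f} s")
```

The median is less sensitive than the mean to a single unusually slow run, which is presumably why it was chosen for the report.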
Overview of Experiment I 21
[Figure: Four configurations on one physical machine: (a) native, 4 MPI processes; (b) Docker, 4 MPI processes; (c) native, MEGADOCK on GPUs; (d) NVIDIA Docker, MEGADOCK on GPUs.]
Test Case Native Docker
CPU (MPI) (a) (b)
GPU (c) (d)
Hardware/Software Specification 22
Software Env.    Physical Machine    Docker          NVIDIA Docker (GPU)
OS (image)       CentOS 7.2.1511     ubuntu:14.04    nvidia/cuda8.0-devel
Linux Kernel     3.10.0              3.10.0          3.10.0
GCC              4.8.5               4.8.4           4.8.4
FFTW             3.3.5               3.3.5           3.3.5
OpenMPI          1.10.0              1.6.5           N/A
Docker Engine    1.12.3              N/A             N/A
NVCC             8.0.44              N/A             8.0.44
NVIDIA Docker    1.0.0 rc.3          N/A             N/A
NVIDIA Driver    367.48              N/A             367.48

CPU              Intel Xeon E5-1630, 3.7 [GHz] × 8 [core]
Memory           32 [GB]
Local SSD        128 [GB]
GPU              NVIDIA Tesla K40
Execution time 23
Time [sec]    Native     Docker
CPU (MPI)     7353.80    7850.57    (+6.32 % slower)
GPU           1646.09    1638.05
Profile Result (CPU time) 24
Process               native [sec]   docker [sec]   diff      Ratio (all)
FFT3D                 7.40E+04       7.63E+04       +3.01%    76.84%
MPIDP-Master          8010.98        8325.9         +3.78%    8.38%
Create Voxel          3743.7         3993.29        +6.25%    4.02%
FFT Convolution       3551.08        3576.43        +0.71%    3.60%
Score Sort            2462.61        2459.7         -0.12%    2.48%
Output Detail         2139.94        2225.96        +3.86%    2.24%
Ligand Preparation    1035.51        1849.11        +44.00%   1.86%
MPI_Barrier           236.95         231.05         -2.55%    0.23%
MPI_Init              0.94           4.54           +79.30%   0.00%
…                     …              …              …         …
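The slides do not state how the "diff" column is computed. Checking the printed numbers suggests the difference is taken relative to the Docker time, that is (docker - native) / docker; this is an inference from the data, not something the authors state. The sketch below reproduces the column under that assumption and also shows the more common native-relative figure for the headline CPU result. The function names are hypothetical.

```python
def diff_vs_docker(native, docker):
    """Percent difference relative to the Docker time (matches the table)."""
    return (docker - native) / docker * 100.0


def diff_vs_native(native, docker):
    """Percent slowdown relative to the native time (the usual convention)."""
    return (docker - native) / native * 100.0


# (process, native [sec], docker [sec]) as printed in the profile table.
PROFILE = [
    ("FFT3D", 7.40e4, 7.63e4),
    ("MPIDP-Master", 8010.98, 8325.9),
    ("Create Voxel", 3743.7, 3993.29),
    ("FFT Convolution", 3551.08, 3576.43),
    ("Score Sort", 2462.61, 2459.7),
    ("Output Detail", 2139.94, 2225.96),
    ("Ligand Preparation", 1035.51, 1849.11),
    ("MPI_Barrier", 236.95, 231.05),
    ("MPI_Init", 0.94, 4.54),
]

if __name__ == "__main__":
    for name, native, docker in PROFILE:
        print(f"{name:20s} {diff_vs_docker(native, docker):+7.2f}%")
    # Headline CPU (MPI) execution times from the bar chart.
    native_total, docker_total = 7353.80, 7850.57
    print(f"total vs docker: {diff_vs_docker(native_total, docker_total):+.2f}%")
    print(f"total vs native: {diff_vs_native(native_total, docker_total):+.2f}%")
```

With the chart values, the Docker-relative figure comes out at about 6.33 %, close to the 6.32 % on the slide, while the native-relative slowdown is about 6.76 %. Either way the qualitative conclusion is the same.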
(a) MEGADOCK-Azure[2]
Measurement
• megadock-dp exec. time
• time command (3 times, median)
Dataset
• ZDOCK benchmark 1.0 [1]
(59 * 59 = 3481 pairs)
Options (OpenMP, OpenMPI)
• MPI : 12 threads / 4 MPI process / 1 node
All file input/output in Local SSD
Overview of Experiment II-(a) 25
[Figure: Several virtual machines, each running 4 MPI processes. Legend: Master Process, Worker Process, (Other).]
[1] R. Chen, et al. “A protein-protein docking benchmark,” Proteins: Structure,
Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.
[2] Masahito Ohue, et al. ”MEGADOCK-Azure: High-performance protein-protein
interaction prediction system on Microsoft Azure HPC”, IIBMP2016.
(b) MEGADOCK + Docker on Microsoft Azure
Measurement
• megadock-dp exec. time
• time command (3 times, median)
Dataset
• ZDOCK benchmark 1.0
(59 * 59 = 3481 pairs)
Options (OpenMP, OpenMPI)
• MPI : 12 threads / 4 MPI process / 1 node
All file input/output in Local SSD
Docker Swarm
• All Containers in 1 overlay network
Overview of Experiment II-(b) 26
[Figure: Several virtual machines, each running a Docker container with 4 MPI processes; all containers are connected through Docker Swarm (Docker Network). Legend: Master Process, Worker Process, (Other).]
[1] R. Chen, J. Mintseris, J. Janin, and Z. Weng, “A protein-protein docking benchmark,”
Proteins: Structure, Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.
VM Instance/Software Specification 27
Software Env.    Virtual Machine                    Docker
OS (image)       SUSE Linux Enterprise Server 12    ubuntu:14.04
Linux Kernel     3.12.43                            3.12.43
GCC              4.8.3                              4.8.4
FFTW             3.3.4                              3.3.5
OpenMPI          1.10.2                             1.6.5
Docker Engine    1.12.6                             N/A

VM Instance      Standard_D14_v2
CPU              Intel Xeon E5-2673, 2.40 [GHz] × 16 [core]
Memory           112 [GB]
Local SSD        800 [GB]
Execution time 28
# of VMs    VM [sec]    Docker on VM [sec]
1           145,534     117,219
5           25,515      25,145
10          13,132      12,331
20          6,006       6,344
30          4,098       3,971
Chart annotation: "May be a measurement mistake"
Scalability (Strong Scaling, based VM=1) 29
[Figure: Speed-up versus number of worker cores (0 to 500) for Ideal, VM, and Docker on VM, with points at VM = 1, 5, 10, 20, and 30. Annotation: comparable scalability.]
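Strong-scaling speed-up is the baseline time divided by the time at each scale. The sketch below applies that definition to the execution times reported for 1, 5, 10, 20, and 30 VMs. The slide says the baseline is VM=1; whether the "Docker on VM" curve uses the VM time or its own 1-VM time as baseline is not stated, so using the VM time for both series is an assumption here. The names are hypothetical.

```python
VMS = [1, 5, 10, 20, 30]
TIME_VM = [145534, 25515, 13132, 6006, 4098]       # seconds
TIME_DOCKER = [117219, 25145, 12331, 6344, 3971]   # seconds, Docker on VM


def speedups(times, baseline):
    """Strong-scaling speed-up: baseline time divided by each measured time."""
    return [baseline / t for t in times]


if __name__ == "__main__":
    base = TIME_VM[0]  # "based VM=1" on the slide
    vm = speedups(TIME_VM, base)
    docker = speedups(TIME_DOCKER, base)
    print(f"{'VMs':>4} {'VM':>8} {'Docker':>8}")
    for n, s_vm, s_docker in zip(VMS, vm, docker):
        print(f"{n:>4} {s_vm:>8.2f} {s_docker:>8.2f}")
```

Passing `TIME_DOCKER[0]` as the baseline for the Docker series instead gives the speed-up of Docker relative to its own single-VM run, which is the other plausible reading of the chart.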
Experiment I
• MEGADOCK + Docker on Physical Machine
showed 6.32% lower performance than native.
• Docker can cause a 0-4% drop in compute performance [1]
• Communications go through Docker NAT (Network Address Translation)
• MEGADOCK (GPU) + NVIDIA-Docker on Physical Machine
showed performance comparable to native.
• GPU computation is independent of container virtualization
• Container virtualization adds little overhead on memory bandwidth
Experiment II
• MEGADOCK + Docker on Microsoft Azure
showed comparable scalability.
• Container virtualization overhead is smaller than other cloud environment factors
Result & Discussion 30
[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual
machines and Linux containers”, IEEE International Symposium on Performance Analysis of Systems
and Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)
• Performance overhead of
Docker container-virtualization is small.
• suitable for GPU-accelerated-App and Cloud Environment
• Container-Virtualization can isolate
application environment from host environment.
• same container image can be used on various machines
• Physical machine on local environment
• Virtual machine on cloud environment
• Docker is useful for computational research work
Conclusion 31
Multi-Node & Multi-GPU Evaluation on Cloud
• NVIDIA-Docker is not available in Docker Swarm mode
• Kubernetes [1] officially supports 1 GPU per node
• (experimental feature: multi-GPU support)
Container-based Task Distribution
• Container-based distribution, as in web service applications
• easy to scale computing resources
• easy to extend to multiple tasks (e.g. GHOST-MP, MEGADOCK)
Future Work 32
[1] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and
Kubernetes,” acmqueue, vol. 14, no. 1, p. 24, 2016.

Contenu connexe

Tendances

Tendances (20)

Shifter singularity - june 7, 2018 - bw symposium
Shifter  singularity - june 7, 2018 - bw symposiumShifter  singularity - june 7, 2018 - bw symposium
Shifter singularity - june 7, 2018 - bw symposium
 
DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐる
 
Deploy microservices in containers with Docker and friends - KCDC2015
Deploy microservices in containers with Docker and friends - KCDC2015Deploy microservices in containers with Docker and friends - KCDC2015
Deploy microservices in containers with Docker and friends - KCDC2015
 
How Secure Is Your Container? ContainerCon Berlin 2016
How Secure Is Your Container? ContainerCon Berlin 2016How Secure Is Your Container? ContainerCon Berlin 2016
How Secure Is Your Container? ContainerCon Berlin 2016
 
Tsunami of Technologies. Are we prepared?
Tsunami of Technologies. Are we prepared?Tsunami of Technologies. Are we prepared?
Tsunami of Technologies. Are we prepared?
 
P2P Container Image Distribution on IPFS With containerd and nerdctl
P2P Container Image Distribution on IPFS With containerd and nerdctlP2P Container Image Distribution on IPFS With containerd and nerdctl
P2P Container Image Distribution on IPFS With containerd and nerdctl
 
Docker and the Container Ecosystem
Docker and the Container EcosystemDocker and the Container Ecosystem
Docker and the Container Ecosystem
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker Security
 
Faster and Easier Software Development using Docker Platform
Faster and Easier Software Development using Docker PlatformFaster and Easier Software Development using Docker Platform
Faster and Easier Software Development using Docker Platform
 
Open Source By The Numbers
Open Source By The NumbersOpen Source By The Numbers
Open Source By The Numbers
 
Hack the whale
Hack the whaleHack the whale
Hack the whale
 
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingFaster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
 
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
 
App container rkt
App container rktApp container rkt
App container rkt
 
Container Security: How We Got Here and Where We're Going
Container Security: How We Got Here and Where We're GoingContainer Security: How We Got Here and Where We're Going
Container Security: How We Got Here and Where We're Going
 
The Docker ecosystem and the future of application deployment
The Docker ecosystem and the future of application deploymentThe Docker ecosystem and the future of application deployment
The Docker ecosystem and the future of application deployment
 
Cloud Native Dünyada CI/CD
Cloud Native Dünyada CI/CDCloud Native Dünyada CI/CD
Cloud Native Dünyada CI/CD
 
Head first docker
Head first dockerHead first docker
Head first docker
 
Postgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh ShahPostgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh Shah
 
Docker: A New Way to Turbocharging Your Apps Development
Docker: A New Way to Turbocharging Your Apps DevelopmentDocker: A New Way to Turbocharging Your Apps Development
Docker: A New Way to Turbocharging Your Apps Development
 

En vedette

Анализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернелАнализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернел
Zero Science Lab
 
RDMA on ARM
RDMA on ARMRDMA on ARM
RDMA on ARM
inside-BigData.com
 
environmental analysis and its technique
environmental analysis and its technique environmental analysis and its technique
environmental analysis and its technique
Sonu Nitish
 

En vedette (20)

Business Environment and Analysis
Business Environment and AnalysisBusiness Environment and Analysis
Business Environment and Analysis
 
Ghkol 의료시스템 해외진출 전략세미나 발표자료(161213)
Ghkol 의료시스템 해외진출 전략세미나 발표자료(161213)Ghkol 의료시스템 해외진출 전략세미나 발표자료(161213)
Ghkol 의료시스템 해외진출 전략세미나 발표자료(161213)
 
ゆるふわなDockerの使い方
ゆるふわなDockerの使い方ゆるふわなDockerの使い方
ゆるふわなDockerの使い方
 
Анализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернелАнализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернел
 
Secrets of building a debuggable runtime: Learn how language implementors sol...
Secrets of building a debuggable runtime: Learn how language implementors sol...Secrets of building a debuggable runtime: Learn how language implementors sol...
Secrets of building a debuggable runtime: Learn how language implementors sol...
 
An Updated Performance Comparison of Virtual Machines and Linux Containers
An Updated Performance Comparison of Virtual Machines and Linux ContainersAn Updated Performance Comparison of Virtual Machines and Linux Containers
An Updated Performance Comparison of Virtual Machines and Linux Containers
 
RDMA on ARM
RDMA on ARMRDMA on ARM
RDMA on ARM
 
Linux device drivers
Linux device driversLinux device drivers
Linux device drivers
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
 
빅데이터 시대의 현명한 선택, UIA 플랫폼
빅데이터 시대의 현명한 선택, UIA 플랫폼빅데이터 시대의 현명한 선택, UIA 플랫폼
빅데이터 시대의 현명한 선택, UIA 플랫폼
 
Ceph Object Store
Ceph Object StoreCeph Object Store
Ceph Object Store
 
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java ProgramsTMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
 
Disaster Recovery and Ceph Block Storage: Introducing Multi-Site Mirroring
Disaster Recovery and Ceph Block Storage: Introducing Multi-Site MirroringDisaster Recovery and Ceph Block Storage: Introducing Multi-Site Mirroring
Disaster Recovery and Ceph Block Storage: Introducing Multi-Site Mirroring
 
2014 산업단지 안전 서비스디자인 김현선디자인연구소 한국디자인진흥원
2014 산업단지 안전 서비스디자인 김현선디자인연구소 한국디자인진흥원2014 산업단지 안전 서비스디자인 김현선디자인연구소 한국디자인진흥원
2014 산업단지 안전 서비스디자인 김현선디자인연구소 한국디자인진흥원
 
【18-E-3】クラウド・ネイティブ時代の2016年だから始める Docker 基礎講座
【18-E-3】クラウド・ネイティブ時代の2016年だから始める Docker 基礎講座【18-E-3】クラウド・ネイティブ時代の2016年だから始める Docker 基礎講座
【18-E-3】クラウド・ネイティブ時代の2016年だから始める Docker 基礎講座
 
1. numPYNQ - Project Presentation
1. numPYNQ - Project Presentation1. numPYNQ - Project Presentation
1. numPYNQ - Project Presentation
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
environmental analysis and its technique
environmental analysis and its technique environmental analysis and its technique
environmental analysis and its technique
 
A tour of (advanced) Akka features in 40 minutes
A tour of (advanced) Akka features in 40 minutesA tour of (advanced) Akka features in 40 minutes
A tour of (advanced) Akka features in 40 minutes
 
Migrating to Java 9 Modules
Migrating to Java 9 ModulesMigrating to Java 9 Modules
Migrating to Java 9 Modules
 

Similaire à Evaluation of Container Virtualized MEGADOCK System in Distributed Computing Environment

Using Docker container technology with F5 Networks products and services
Using Docker container technology with F5 Networks products and servicesUsing Docker container technology with F5 Networks products and services
Using Docker container technology with F5 Networks products and services
F5 Networks
 

Similaire à Evaluation of Container Virtualized MEGADOCK System in Distributed Computing Environment (20)

Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
 
Reproducibility of computational workflows is automated using continuous anal...
Reproducibility of computational workflows is automated using continuous anal...Reproducibility of computational workflows is automated using continuous anal...
Reproducibility of computational workflows is automated using continuous anal...
 
Cont0519
Cont0519Cont0519
Cont0519
 
Docker SF Meetup January 2016
Docker SF Meetup January 2016Docker SF Meetup January 2016
Docker SF Meetup January 2016
 
Codecamp 2020 microservices made easy workshop
Codecamp 2020 microservices made easy workshopCodecamp 2020 microservices made easy workshop
Codecamp 2020 microservices made easy workshop
 
Revolutionizing WSO2 PaaS with Kubernetes & App Factory
Revolutionizing WSO2 PaaS with Kubernetes & App FactoryRevolutionizing WSO2 PaaS with Kubernetes & App Factory
Revolutionizing WSO2 PaaS with Kubernetes & App Factory
 
Alibaba Cloud Conference 2016 - Docker Open Source
Alibaba Cloud Conference   2016 - Docker Open Source Alibaba Cloud Conference   2016 - Docker Open Source
Alibaba Cloud Conference 2016 - Docker Open Source
 
From CoreOS to Kubernetes and Concourse CI
From CoreOS to Kubernetes and Concourse CIFrom CoreOS to Kubernetes and Concourse CI
From CoreOS to Kubernetes and Concourse CI
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Analyzing data with docker v4
Analyzing data with docker   v4Analyzing data with docker   v4
Analyzing data with docker v4
 
Introductio to Docker and usage in HPC applications
Introductio to Docker and usage in HPC applicationsIntroductio to Docker and usage in HPC applications
Introductio to Docker and usage in HPC applications
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure Systems
 
Bioinformatics Analysis Environment for Your Laboratory Use
Bioinformatics Analysis Environment for Your Laboratory UseBioinformatics Analysis Environment for Your Laboratory Use
Bioinformatics Analysis Environment for Your Laboratory Use
 
Containers: DevOp Enablers of Technical Solutions
Containers: DevOp Enablers of Technical SolutionsContainers: DevOp Enablers of Technical Solutions
Containers: DevOp Enablers of Technical Solutions
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data Scientists
 
Open Access Week 2017: Life Sciences and Open Sciences - worfkflows and tools
Open Access Week 2017: Life Sciences and Open Sciences - worfkflows and toolsOpen Access Week 2017: Life Sciences and Open Sciences - worfkflows and tools
Open Access Week 2017: Life Sciences and Open Sciences - worfkflows and tools
 
UniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtimeUniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtime
 
LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0
 
Using Docker container technology with F5 Networks products and services
Using Docker container technology with F5 Networks products and servicesUsing Docker container technology with F5 Networks products and services
Using Docker container technology with F5 Networks products and services
 

Dernier

Dernier (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Evaluation of Container Virtualized MEGADOCK System in Distributed Computing Environment

  • 1. Evaluation of Container Virtualized MEGADOCK System in Distributed Computing Environment March 23th, 2017 SIG BIO 49@Japan Advanced Institute of Science and Technology Kento Aoyama1,2, Yuki Yamamoto1,2, Masahito Ohue1,3, Yutaka Akiyama1,2,3 1) Department of Computer Science, School of Computing Tokyo Institute of Technology 2) Education Academy of Computational Life Sciences (ACLS) Tokyo Institute of Technology 3) Advanced Computational Drug Discovery Unit, Institute of Innovative Research Tokyo Institute of Technology
  • 3. Docker and Bioinformatics 3 A. Paolo, D. Tommaso, A. B. Ramirez, E. Palumbo, C. Notredame, and D. Gruber, “Benchmark Report : Univa Grid Engine , Nextflow , and Docker for running Genomic Analysis Workflows.” Docker Integration Benchmark Report @Centre for Genomic Regulation (Barcelona, Spain) • Univa Grid Engine (Job Scheduler) • Nextflow (Workflow manager) • Docker (Linux Container) • Reproducibility • Portability
  • 4. To develop the Container-Native HPC Bioinformatics Application Using Linux Container which has … • Low Dependency on Environment • High-Performance • Parallel execution performance • Overhead of virtualization • Dynamically Scaling Research Purpose 4
  • 5. • To evaluate the Performance of Docker Container-Virtualization in Bioinformatics Application Target Application • MEGADOCK[1] • FFT-grid-based Protein-Protein Docking software • Multi-threading, Multi-node, Multi-GPU (OpenMP, MPI, GPU) • Extremely compute intensive workloads Today’s Report 5 [1] Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.
  • 7. Linux Container. Kernel-shared virtualization. Lightweight: small size, fast deployment, easy sharing. Performance: little virtualization overhead, faster than a VM. [Figure: virtual machines stack Hardware, Hypervisor, then one Guest OS with Bins/Libs and App per virtual machine; containers stack Hardware, Linux Kernel, then Bins/Libs and App per container.]
  • 8. Container-based Virtualization. Linux Container: virtualizes host resources as containers (filesystem, hostname, IPC, PID, network, user, etc.); can be used like virtual machines. Linux kernel features: containers share the same host kernel; namespace [1], chroot, cgroup, SELinux, etc. [Figure: one machine with a single Linux kernel space hosting two containers, each holding its own processes.] [1] E. W. Biederman. “Multiple instances of the global Linux namespaces.”, In Proceedings of the 2006 Ottawa Linux Symposium, 2006.
  • 9. Linux Container: Performance [1]. [Figure: bar chart of performance ratio relative to native (native = 1.00) for Native, Docker, KVM, and KVM-tuned. Data labels as read from the chart:]
      Benchmark               Docker   KVM    KVM-tuned
      PXZ [MB/s]              0.96     0.78   0.82
      Linpack [GFLOPS]        1.00     0.83   0.98
      Random Access [GUPS]    0.98     0.99   (no label)
      [1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” IEEE International Symposium on Performance Analysis of Systems and Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)
  • 10. Linux Container Management Tools. Docker [1]: the most popular Linux container management platform, with many useful components and services. [Figure: tool logos labeled [1], [2], [3].] [1] Solomon Hykes and others. “What is Docker?” - https://www.docker.com/what-docker [2] W. Bhimji, S. Canon, D. Jacobsen, L. Gerhardt, M. Mustafa, and J. Porter, “Shifter: Containers for HPC,” Cray User Group, pp. 1–12, 2016. [3] “Singularity” - http://singularity.lbl.gov/
  • 11. Easy container sharing: Docker Hub. Portability and reproducibility: it is easy to share the application environment via Docker Hub, and containers can be executed on other host machines. [Figure: on an Ubuntu host running Docker Engine, a Dockerfile (apt-get install …, wget …, make) generates an image containing the App and Bins/Libs; the image is pushed to Docker Hub, pulled by a CentOS host running Docker Engine, and run there as a container.]
  • 12. Docker: Filesystem. AUFS (Advanced multi layered unification filesystem) [1]: Docker uses AUFS as its default filesystem; layers can be reused in other container images; AUFS helps software reproducibility. [Figure: a Docker container image built as layers on top of the ubuntu:16.04 base image. Layer IDs and sizes shown: f49eec89601e (129.5 MB), 366a03547595 (39.85 MB), ef122501292c (133.6 MB), e50c89716342 (660.4 KB), 5aec9aa5462c (24.17 MB), 0d3cccd04bdb (6.07 MB). Tags shown: beta, version-1.0, version-1.0.2, version-1.2, latest.] [1] Advanced multi layered unification filesystem. http://aufs.sourceforge.net, 2014.
  • 13. For Bioinformatics Apps: 1. Why in the field of bioinformatics? Types of applications: data analysis, machine learning, MD simulation, docking calculation, etc. Data-centric workload: compute is large; data I/O varies case by case; communication is small. Containers perform well on compute-intensive workloads [1]. [1] W. Felter, et al. “An updated performance comparison of virtual machines and Linux containers,” IEEE International Symposium on Performance Analysis of Systems and Software, pp.171-172, 2015.
  • 14. For Bioinformatics Apps: 2. Reproducibility: different versions of a library can produce different results, e.g. in a genomic analysis pipeline [Paolo, 2016]. Dependency conflict: different applications can require different versions of the same library. [Figure, dependency isolation: Application A requires Library A version >= 1.2 while Application B requires version < 1.1, which conflict on a shared host; placing them in Container A and Container B isolates the dependencies. Figure, application reproducibility: Application A with library version 1.2 in Container A gives Result A, while the same application with library version 1.3 in Container A' gives a different Result A'.]
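The dependency conflict on this slide can be illustrated with a small sketch. The two constraints (>= 1.2 and < 1.1) come from the slide; the list of available library versions and all function names are hypothetical, chosen only for illustration.

```python
# Sketch of the slide's dependency conflict. The release list and helper
# names are hypothetical; only the two constraints come from the slide.

def parse(version):
    """Turn '1.2' into (1, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def satisfying(candidates, constraints):
    """Return the candidate versions accepted by every constraint."""
    return [v for v in candidates if all(check(parse(v)) for check in constraints)]

available = ["1.0", "1.1", "1.2", "1.3"]    # hypothetical releases of Library A

def app_a(v):                               # Application A: version >= 1.2
    return v >= parse("1.2")

def app_b(v):                               # Application B: version < 1.1
    return v < parse("1.1")

# One shared library installation must satisfy both applications at once.
shared_host = satisfying(available, [app_a, app_b])
# With one container per application, each constraint is checked on its own.
container_a = satisfying(available, [app_a])
container_b = satisfying(available, [app_b])

print("shared host:", shared_host)
print("container A:", container_a)
print("container B:", container_b)
```

On a shared host no version satisfies both constraints, whereas each container can pin a version that suits its own application.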
  • 15. Features for Bioinformatics Apps. Performance: little performance overhead. Reproducibility: dependency isolation from other applications and libraries. Portability, generality: sharing and porting to other environments.
      Features                   Native   VM      Container
      Performance, Scalability   Great    Bad     Good
      Reproducibility            Bad      Good    Great
      Portability, Generality    Bad      Great   Great
  • 17. MEGADOCK. High-performance protein-protein interaction predictions: FFT-grid-based docking software; extremely compute-intensive; OpenMP/MPI/GPU support; great HPC performance. Reference: Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.
  • 18. Container-based Application Distribution. All application dependencies exist in the container; the application is easy to test; the size of resources is easy to scale. [Figure: an application layer of MEGADOCK containers on top of a compute resource layer; containers and resources can be added or removed independently, and the same container moves from the test environment to the production environment.]
  • 20. Experiments. Experiment I: evaluate container virtualization overhead on a physical machine, using (1) physical machine (single-node) + Docker and (2) physical machine (single-node, GPU) + NVIDIA-Docker. Experiment II: evaluate container virtualization overhead in a cloud environment, using (1) virtual machines (multi-node) + Docker and (2) virtual machines (multi-node, GPU) + NVIDIA-Docker.
  • 21. Overview of Experiment I. Measurement: megadock-gpu execution time, taken with the time command (median of 6 runs). Dataset: 100 PDB pairs (KEGG pathway). Options (OpenMP, OpenMPI): MPI case, 12 threads / 4 MPI processes / 1 node; GPU case, 1 GPU / 1 process / 1 node.
      Test case    Native   Docker
      CPU (MPI)    (a)      (b)
      GPU          (c)      (d)
      [Figure: (a) physical machine running 4 MPI processes; (b) physical machine running 4 MPI processes inside Docker; (c) physical machine running MEGADOCK-GPU on one GPU; (d) physical machine running MEGADOCK-GPU on one GPU inside NVIDIA Docker.]
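The "repeat and take the median" protocol described on this slide can be sketched as follows. This is a minimal illustration, not the authors' script: it uses Python's wall-clock timer in place of the `time` command, and the command being timed is a harmless placeholder rather than a real MEGADOCK invocation.

```python
import statistics
import subprocess
import sys
import time

def median_wall_time(command, repeats=6):
    """Run `command` `repeats` times; return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True)   # raises if the command fails
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Placeholder command so the sketch runs anywhere; substitute the real
# benchmark invocation (native or containerized) when measuring.
placeholder = [sys.executable, "-c", "pass"]
elapsed = median_wall_time(placeholder, repeats=6)
print(f"median of 6 runs: {elapsed:.3f} s")
```

The median is less sensitive than the mean to a single slow run, for example one affected by cold caches.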
  • 22. Hardware/Software Specification
      Software env.    Physical machine   Docker         NVIDIA Docker (GPU)
      OS (image)       CentOS 7.2.1511    ubuntu:14.04   nvidia/cuda8.0-devel
      Linux Kernel     3.10.0             3.10.0         3.10.0
      GCC              4.8.5              4.8.4          4.8.4
      FFTW             3.3.5              3.3.5          3.3.5
      OpenMPI          1.10.0             1.6.5          N/A
      Docker Engine    1.12.3             N/A            N/A
      NVCC             8.0.44             N/A            8.0.44
      NVIDIA Docker    1.0.0 rc.3         N/A            N/A
      NVIDIA Driver    367.48             N/A            367.48
      Hardware: CPU Intel Xeon E5-1630, 3.7 GHz × 8 cores; memory 32 GB; local SSD 128 GB; GPU NVIDIA Tesla K40.
  • 24. Profile Result (CPU time)
      Process              native [sec]   docker [sec]   diff      Ratio (all)
      FFT3D                7.40E+04       7.63E+04       +3.01%    76.84%
      MPIDP-Master         8010.98        8325.9         +3.78%    8.38%
      Create Voxel         3743.7         3993.29        +6.25%    4.02%
      FFT Convolution      3551.08        3576.43        +0.71%    3.60%
      Score Sort           2462.61        2459.7         -0.12%    2.48%
      Output Detail        2139.94        2225.96        +3.86%    2.24%
      Ligand Preparation   1035.51        1849.11        +44.00%   1.86%
      MPI_Barrier          236.95         231.05         -2.55%    0.23%
      MPI_Init             0.94           4.54           +79.30%   0.00%
      …                    …              …              …         …
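The slide does not say how the diff column is defined. Working backwards from the numbers, it appears to be the time difference divided by the Docker time, not by the native time; this is an inference from the table, not something the slide states. The sketch below recomputes the column under both definitions for the rows that are given in full (FFT3D is left out because the slide only shows it rounded to three significant figures).

```python
# Timings (native, docker) in seconds, copied from the profile table.
rows = {
    "MPIDP-Master":       (8010.98, 8325.90),
    "Create Voxel":       (3743.70, 3993.29),
    "FFT Convolution":    (3551.08, 3576.43),
    "Score Sort":         (2462.61, 2459.70),
    "Output Detail":      (2139.94, 2225.96),
    "Ligand Preparation": (1035.51, 1849.11),
    "MPI_Barrier":        (236.95, 231.05),
    "MPI_Init":           (0.94, 4.54),
}

def diff_vs_docker(native, docker):
    """Percent difference relative to the Docker time (inferred definition)."""
    return (docker - native) / docker * 100.0

def diff_vs_native(native, docker):
    """Percent slowdown relative to the native time (the more common definition)."""
    return (docker - native) / native * 100.0

for name, (native, docker) in rows.items():
    print(f"{name:20s} vs docker: {diff_vs_docker(native, docker):+7.2f}%   "
          f"vs native: {diff_vs_native(native, docker):+7.2f}%")
```

Under the native-relative definition the small phases look considerably worse (Ligand Preparation is roughly 79% slower rather than 44%), so it matters which baseline a reader assumes when comparing against other reports.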
  • 25. Overview of Experiment II-(a): MEGADOCK-Azure [2]. Measurement: megadock-dp execution time, taken with the time command (median of 3 runs). Dataset: ZDOCK benchmark 1.0 [1] (59 × 59 = 3481 pairs). Options (OpenMP, OpenMPI): 12 threads / 4 MPI processes / 1 node. All file input/output is on local SSD. [Figure: a group of virtual machines, each running 4 MPI processes; one process is the master and all others are workers.] [1] R. Chen, et al. “A protein-protein docking benchmark,” Proteins: Structure, Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003. [2] Masahito Ohue, et al. “MEGADOCK-Azure: High-performance protein-protein interaction prediction system on Microsoft Azure HPC”, IIBMP2016.
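The 59 × 59 = 3481 figure corresponds to docking every receptor against every ligand. A rough sketch of building that job list and splitting it across workers is shown below. The protein identifiers, the worker count, and the static round-robin split are all assumptions made for illustration; the slides only say that a master process hands work to worker processes.

```python
from itertools import product

# Hypothetical identifiers standing in for the 59 benchmark structures.
receptors = [f"receptor_{i:02d}" for i in range(59)]
ligands = [f"ligand_{i:02d}" for i in range(59)]

# All-to-all docking: every receptor is paired with every ligand.
pairs = list(product(receptors, ligands))

def round_robin(jobs, n_workers):
    """Statically deal jobs out to workers, one at a time in turn."""
    return [jobs[w::n_workers] for w in range(n_workers)]

chunks = round_robin(pairs, n_workers=4)   # worker count chosen arbitrarily
print(len(pairs))                          # total number of docking jobs
print([len(chunk) for chunk in chunks])    # jobs assigned to each worker
```

A static split like this is the simplest option; a master that hands out pairs on demand would balance load better when individual docking jobs differ in cost.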
  • 26. Overview of Experiment II-(b): MEGADOCK + Docker on Microsoft Azure. Measurement: megadock-dp execution time, taken with the time command (median of 3 runs). Dataset: ZDOCK benchmark 1.0 [1] (59 × 59 = 3481 pairs). Options (OpenMP, OpenMPI): 12 threads / 4 MPI processes / 1 node. All file input/output is on local SSD. Docker Swarm: all containers are in one overlay network. [Figure: a group of virtual machines, each running a Docker container with 4 MPI processes, connected through Docker Swarm (Docker network); one process is the master and all others are workers.] [1] R. Chen, J. Mintseris, J. Janin, and Z. Weng, “A protein-protein docking benchmark,” Proteins: Structure, Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.
  • 27. VM Instance/Software Specification
      Software env.    Virtual machine                   Docker
      OS (image)       SUSE Linux Enterprise Server 12   ubuntu:14.04
      Linux Kernel     3.12.43                           3.12.43
      GCC              4.8.3                             4.8.4
      FFTW             3.3.4                             3.3.5
      OpenMPI          1.10.2                            1.6.5
      Docker Engine    1.12.6                            N/A
      VM instance: Standard_D14_v2; CPU Intel Xeon E5-2673, 2.40 GHz × 16 cores; memory 112 GB; local SSD 800 GB.
  • 29. Scalability (strong scaling, relative to VM = 1). [Figure: speed-up (y-axis, 0 to 45) versus number of worker cores (x-axis, 0 to 500) for three series: Ideal, VM, and Docker on VM, with points at VM = 1, 5, 10, 20, and 30. Annotation: comparable scalability.]
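Strong-scaling speed-up relative to the one-VM run can be computed as sketched below. The timings are made-up placeholders, since the slide shows only the plot, and the cores-per-VM value is an assumption used to label the x-axis; neither is measured data from the experiment.

```python
def strong_scaling(times_by_vms, cores_per_vm):
    """Return (cores, speed-up, parallel efficiency) rows relative to the smallest run.

    times_by_vms maps a VM count to the execution time for the same fixed
    problem size, which is what makes this strong scaling.
    """
    base_vms = min(times_by_vms)
    base_time = times_by_vms[base_vms]
    table = []
    for vms in sorted(times_by_vms):
        speedup = base_time / times_by_vms[vms]
        ideal = vms / base_vms              # linear scaling would reach this value
        table.append((vms * cores_per_vm, speedup, speedup / ideal))
    return table

# Hypothetical execution times in seconds, for illustration only.
example_times = {1: 1000.0, 5: 210.0, 10: 110.0}
table = strong_scaling(example_times, cores_per_vm=16)
for cores, speedup, efficiency in table:
    print(f"{cores:4d} cores  speed-up {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

Running the same function on native-VM and Docker-on-VM timings and comparing the speed-up columns is one way to check the "comparable scalability" claim numerically.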
  • 30. Result & Discussion. Experiment I: MEGADOCK + Docker on the physical machine showed 6.32% lower performance than native. Possible factors: Docker can cause a 0-4% drop in compute performance [1]; communications go through Docker NAT (Network Address Translation). MEGADOCK (GPU) + NVIDIA-Docker on the physical machine showed performance comparable to native: GPU calculation is independent of container virtualization, and container virtualization adds little overhead on memory bandwidth. Experiment II: MEGADOCK + Docker on Microsoft Azure showed comparable scalability; container virtualization overhead is smaller than other cloud-environment factors. [1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers”, IEEE International Symposium on Performance Analysis of Systems and Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)
  • 31. Conclusion. The performance overhead of Docker container virtualization is small, which makes it suitable for GPU-accelerated applications and cloud environments. Container virtualization can isolate the application environment from the host environment, so the same container image can be used on various machines: a physical machine in a local environment or a virtual machine in a cloud environment. Docker is useful for computational research work.
  • 32. Future Work. Multi-node and multi-GPU evaluation on cloud: NVIDIA-Docker is not available in Docker Swarm mode; Kubernetes [1] officially supports 1 GPU per node (multi-GPU support is an experimental feature). Container-based task distribution: container-based distribution in the style of a web service application; easy to scale computing resources; easy to extend to multiple tasks (e.g. GHOST-MP, MEGADOCK). [1] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and Kubernetes,” acmqueue, vol. 14, no. 1, p. 24, 2016.