Prof. Anne C. Elster, PhD
Dept. of Computer Science, HPC-Lab
Norwegian University of Science & Technology
Trondheim, Norway
CloudLightning
and the OPM-based Use Case
(HW → Middleware)
Thanks to: My Post Docs and
graduate students!
06/07:Spring 2007
08/09:Spring 2009
10/11: @ SC 10
09/10:Spring 2010
07/08:@ SC 07
11/12:Spring 2012
Spring 2014
Challenge re: Open Source – SW stack!
• Leverage GPUs and libraries inside Linux containers
e.g. using NVIDIA-docker
OUTLINE
Overview of
EU H2020 project CloudLightning (CL)
Use case Oil & Gas -- OPM and Upscaling
Containerization
Concluding remarks / look forward
OVERVIEW & NTNU’s
contributions
Prof Anne C. Elster (NTNU)
Partners
CloudLightning comprises
eight partners from academia
and industry and is coordinated
by University College Cork.
Prof John Morrison
UCC
Dr Gabriel Gonzalez Castane
UCC
Dr Huanhuan Xiong
UCC
Mr David Kenny
UCC
Dr Georgi Gaydadjiev
MAX
Dr Marian Neagul
IeAT
Prof Dana Petcu
IeAT
Tobias Becker
MAX
Prof Anne Elster
NTNU
Prof George Gravvanis
DUTH
Dr Suryanarayanan Natarjan
INTEL
Mr Perumal Kuppuudaiyar
INTEL
Prof Theo Lynn
DCU
Ms Anna Gourinovitch
DCU
Dr Konstantinos Giannoutakis
CERTH
Dr Hristina Palikareva
MAX
Dr Malik M. Khan
NTNU
Dr Muhammad Qasim
NTNU
Mr Geir Amund Hasle
NTNU
Mr Dapeng Dong
UCC
CLOUDLIGHTNING
Funded under Call H2020-ICT-2014-1
Advanced Cloud Infrastructures and Services.
(Feb 2015 through Jan 2018)
Aim: To develop infrastructures, methods
and tools for high performance, adaptive cloud
applications and services that go beyond
the current capabilities.
Specific Challenge
• CloudLightning was funded under Call H2020-ICT-2014-1
Advanced Cloud Infrastructures and Services.
Cloud computing is being transformed by new requirements such as
- heterogeneity of resources and devices
- software-defined data centres
- cloud networking, security, and
- the rising demands for better quality of user experience.
Cloud computing research will be oriented towards
• new computational and data management
models (at both infrastructure and services
levels) that respond to the advent of faster and
more efficient machines,
- rising heterogeneity of access modes and devices,
- demand for low energy solutions,
- widespread use of big data,
- federated clouds and
- secure multi-actor environments including public
administrations.
CloudLightning Overview – more detailed
Self-Organization and Self-Management (SOSM) of HPC resources
Resource Allocation (based on service requirements), which can be:
• Bare metal
• VMs (virtual machines)
• Containers – the most realistic option for HPC workloads
• Resources are divided into Cells, based on region or location
• Each Cell may have different hardware types, including servers with GPUs, MICs, DFEs
• Each Cell is partitioned into vRacks, which are sets of servers of the same type
Resource Utilization (self-optimized)
• Alt. 1: The Cell manager gets a request for resources, creates them and allocates them directly
• Alt. 2: First discover the available resources; if there is more than one option, the Cell manager
selects the most appropriate one.
• Alt. 3: Same as Alt. 2, except that it gives back a solution rather than an option
• vRack managers may create/aggregate the resources in their vRack and are the basic component
of self-organization in the CL system. Note that this feature only makes sense for larger
deployments.
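The discover-then-select flow of Alt. 2 can be sketched in a few lines of Python. This is only an illustration of the idea: the class names (`VRack`, `CellManager`), the fields, and the selection rule (pick the vRack with the most free servers) are assumptions made for the example, not the CloudLightning SOSM implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VRack:
    """A set of servers of the same hardware type (hypothetical model)."""
    name: str
    hw_type: str          # e.g. "GPU", "MIC", "DFE", "CPU"
    free_servers: int


@dataclass
class CellManager:
    """Manages the vRacks of one Cell (a region or location)."""
    vracks: List[VRack] = field(default_factory=list)

    def discover(self, hw_type: str, servers: int) -> List[VRack]:
        # Alt. 2, step 1: find every vRack that could satisfy the request.
        return [v for v in self.vracks
                if v.hw_type == hw_type and v.free_servers >= servers]

    def allocate(self, hw_type: str, servers: int) -> Optional[VRack]:
        # Alt. 2, step 2: choose among the options. The rule used here
        # (most free servers) is an assumption made for illustration only.
        options = self.discover(hw_type, servers)
        if not options:
            return None
        best = max(options, key=lambda v: v.free_servers)
        best.free_servers -= servers
        return best


cell = CellManager([
    VRack("vrack-gpu-a", "GPU", free_servers=2),
    VRack("vrack-gpu-b", "GPU", free_servers=5),
    VRack("vrack-mic-a", "MIC", free_servers=3),
])
chosen = cell.allocate("GPU", servers=2)
print(chosen.name, chosen.free_servers)
```

Returning `None` when no vRack qualifies leaves it to the caller to decide whether to queue the request or forward it to another Cell.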
BENEFICIARIES
Primary beneficiary:
Infrastructure-as-a-Service
provider.
Benefit from activating the
HPC in the cloud market and a
reduction in cost related to
better performance per cost and
performance per watt.
Increased energy efficiency can
result in lower costs throughout
the cloud ecosystem and can
increase the accessibility and
performance in a wide range of
use cases including:
• Oil and Gas simulations,
• Genomics and
• Ray Tracing
(e.g. 3D Image Rendering)
• Improved
performance/cost and
performance/Watt for
cloud-based
datacenter(s)
• Energy- and cost-
efficient scalable
solution for OPM-based
reservoir simulation
application (Upscaling)
• Ability to run better
models for more
efficient reservoir
utilization
• Improved
performance/cost and
performance/Watt.
• Faster speed of
genome sequence
computation.
• Reduced development
times.
• Increased volume and
quality of related
research.
• Reduced CAPEX and IT
associated costs.
• Extra capacity for
overflow (“surge”)
workloads.
• Faster workload
processing to meet
project timelines.
Ray Tracing
(3D Image Rendering)
- Intel
Genomics
- Maxeler
Oil and Gas
- NTNU
CLOUDLIGHTNING USE CASES
EU Use Case
Motivations
CloudLightning’s use cases
support the European Union
HPC strategy and specific
industries identified by IDC
in their recent report on the
progress of the EU HPC
Strategy (IDC, 2015).
1
The health sector represents 10% of EU GDP and 8% of the
EU workforce (EC, 2014). HPC is increasingly central to
genome processing and thus advanced medicine and
bioscience research.
2
The oil and gas industry is responsible for 170,000 European
jobs and €440 billion of Europe's GDP (IDC, 2015). HPC
improves discovery performance and exploitation.
3
Ray tracing is a fundamental technology in many industries
and specifically in CAD/CAE, digital content and mechanical
design, sectors dominated by SMEs.
4
European ROI in HPC is very attractive - each euro invested in
HPC on average returned €867 in increased revenue/income
(IDC, 2015).
The OPM-based Upscaling Use Case
http://opm-project.org/
Project goal: Provide HPC-application as a service
What it does: Calculates upscaled permeability
Why chosen:
• Builds on the open-source software provided by the
Open Porous Media (OPM) project
• Already familiar with the Upscaling application through
collaborations with Statoil
Oil & Gas application --
Upscaling / OPM (Open Porous Media)
NTNU contributions:
• PETSc integration
• AmgX integration
HPC Applications as a Service – OPM Upscaling
http://opm-project.org/
Calculates upscaled permeability
HPC Applications as a Service – OPM Upscaling
The OPM use-case application calculates
upscaled permeability
• Uses PDE solver libraries
→ MPI, GPU …
• The Upscaling application is ported to work with PETSc, which
provides:
• CPU-only execution
• CPU-GPU execution
• Both versions of the application are provided as containerized
solutions
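As a rough illustration of what "upscaled permeability" means, the sketch below computes the effective permeability of a stack of homogeneous layers: the thickness-weighted arithmetic mean for flow along the layers, and the thickness-weighted harmonic mean for flow across them. This is a textbook special case and not the OPM upscaling code, which solves a PDE on the fine grid; the function names and sample values are made up for the example.

```python
from typing import Sequence


def upscale_parallel(perm: Sequence[float], thick: Sequence[float]) -> float:
    """Effective permeability for flow along the layers (arithmetic mean)."""
    total = sum(thick)
    return sum(k * h for k, h in zip(perm, thick)) / total


def upscale_series(perm: Sequence[float], thick: Sequence[float]) -> float:
    """Effective permeability for flow across the layers (harmonic mean)."""
    total = sum(thick)
    return total / sum(h / k for k, h in zip(perm, thick))


# Hypothetical three-layer column: permeability in millidarcy, thickness in m.
perm = [100.0, 10.0, 1000.0]
thick = [1.0, 1.0, 2.0]

k_parallel = upscale_parallel(perm, thick)
k_series = upscale_series(perm, thick)
print(f"along layers:  {k_parallel:.1f} mD")
print(f"across layers: {k_series:.1f} mD")
```

The harmonic mean is dominated by the least permeable layer, which is why a single coarse-cell value depends on the flow direction.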
Self-Optimized Libraries as a Service
• Ongoing work to provide self-optimized libraries as containerized
solutions
• Libraries under review include ATLAS, MKL, cuBLAS,
FFTW, cuFFT
ATLAS
GPU System
• Dell server blade
with NVIDIA Tesla P100
SMP Cluster
• Numascale 5-node SMP
MIC System
• PC with Xeon Phi card
DFE Cluster
• Maxeler MPC-C node with
4x Vectis MAX3 DFEs
CLOUDLIGHTNING HETEROGENEOUS TESTBED @ NTNU
OpenStack server cluster provided by NTNU’s IT Department
CLOUDLIGHTNING HETEROGENEOUS TESTBED @ NTNU
Software used
AmgX
Self-Organization-Self-Management (SOSM) -
Resource Registration Plugin
• SOSM resource registration plugins
• Python-based plugins
• Use Python bindings (pyNVML)
• Use NVIDIA Management Library (NVML)
Tesla P100 GTX 980 Tesla K20
NVIDIA MANAGEMENT LIBRARY
pyNVML
SOSM
SOSM Resource Registration
Python Code
JSON output
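The slides show the plugin code and its JSON output only as images. A minimal sketch of the same idea, querying NVML through the pyNVML bindings and emitting JSON, might look as follows. The JSON field names are illustrative assumptions, not the actual SOSM registration schema, and the function returns an empty device list on hosts without the `pynvml` package or an NVIDIA driver.

```python
import json
from typing import Dict, List


def discover_gpus() -> List[Dict[str, object]]:
    """Return one record per NVIDIA GPU, or [] if NVML is unavailable."""
    try:
        import pynvml  # Python bindings for NVML
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no driver or no NVIDIA device on this host
    gpus: List[Dict[str, object]] = []
    try:
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": index,
                "name": name,
                "memory_total_mib": memory.total // (1024 * 1024),
            })
    except pynvml.NVMLError:
        pass  # keep whatever was collected before the failure
    finally:
        pynvml.nvmlShutdown()
    return gpus


# The field names below are illustrative, not the SOSM schema.
registration = json.dumps(
    {"resource_type": "GPU", "devices": discover_gpus()}, indent=2)
print(registration)
```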
Telemetry System for CloudLightning -
Resource Registration Plugin
SNAP-based telemetry plugins
• Python-based plugins
• Use Python bindings (pyNVML)
• Use NVIDIA Management Library (NVML)
Tesla P100 GTX 980 Tesla K20
NVIDIA MANAGEMENT LIBRARY
pyNVML
SNAP
CL Telemetry System Plug-in
Python-based SNAP Collector*
SNAP Parameter
Catalog Setup
Telemetry Parameter
Output for CL
*Debugging Stage
CURRENT STATUS OF THE OVERALL
CLOUDLIGHTNING PROJECT:
• SOSM Architecture defined
• Plugins for resource registration developed
• Tested use cases on individual platforms
• Working on integration, testbed and simulation
that includes the OpenStack-based SOSM
system
Challenge re: Open Source
SW Engineering Principles
versus
Skills of application programmers
Challenge re: Open Source
Cost of maintaining the code:
- Person hrs
- Uphold motivation
SW stack – can Leverage GPUs & libraries inside
Linux containers e.g. using NVIDIA-docker
HPC-Lab Beyond CloudLightning:
HPC-Lab welcomes
H2020 MSCA Post Doc
Weifeng Liu
(2017-2019)
Prof. Gavin Taylor, USNA, to join HPC-Lab for 14 months starting June 2017
Further collaborations with Schlumberger ( Bjørn Nordmoen on left in photo)
NVIDIA DGX-1 and Volta-based
systems …
Collaborations with NTNU Imaging groups since 2006:
THANK YOU!
Anne C. Elster
elster@ntnu.no
Supercomputer & HPC Trends:
Clusters and Accelerators!
How did we get here?
Motivation – GPU Computing:
Many advances in processor designs
are driven by Billion $$ gaming market!
Modern GPUs (Graphics Processing Units) offer
lots of FLOPS per watt!
.. and lots of parallelism!
NVIDIA GTX 1080
(Pascal): 2560 CUDA cores!
-Kepler:
-GTX 690 and Tesla K10 cards
-have 3072 (2x1536) cores!
NVIDIA DGX-1 Server -- Details
CPUs : 2 x Intel Xeon E5-2698 v3 (16-core Haswell)
GPUs: 8 x NVIDIA Tesla P100 (3584 CUDA cores)
System Memory: 512 GB DDR4-2133
GPU Memory 128GB (8 x 16GB)
Storage: 4 x Samsung PM 863 1.9 TB SSD
Network: 4 x Infiniband EDR, 2x 10 GigE
Power : 3200W
Size 3U Blade
GPU Throughput: FP16: 170TFLOPs,
FP32: 85TFLOPs, FP 64: 42.5 TFLOPs
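The throughput figures above follow from the GPU count and core count on the slide. The short calculation below reproduces them, assuming a P100 boost clock of 1.48 GHz (not stated on the slide) and two floating-point operations per core per cycle (one fused multiply-add).

```python
CUDA_CORES_PER_GPU = 3584      # from the slide
GPUS = 8                       # from the slide
BOOST_CLOCK_GHZ = 1.48         # assumption: P100 (SXM2) boost clock
FLOPS_PER_CORE_PER_CYCLE = 2   # one fused multiply-add counts as 2 FLOPs

# cores * FLOPs/cycle * GHz gives GFLOPS; divide by 1000 for TFLOPS.
fp32_per_gpu = (CUDA_CORES_PER_GPU * FLOPS_PER_CORE_PER_CYCLE
                * BOOST_CLOCK_GHZ / 1000)
fp32_total = fp32_per_gpu * GPUS
fp64_total = fp32_total / 2    # P100 runs FP64 at half the FP32 rate
fp16_total = fp32_total * 2    # and FP16 at twice the FP32 rate

print(f"FP32 per GPU: {fp32_per_gpu:.1f} TFLOPS")
print(f"FP16: {fp16_total:.1f}  FP32: {fp32_total:.1f}  "
      f"FP64: {fp64_total:.1f} TFLOPS")
```

These are theoretical peaks; sustained application performance is typically lower.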
Who am I? (Parallel Computing perspective)
• 1980’s: Concurrent and Parallel Pascal
• 1986: Intel iPSC Hypercube
– CMI (Bergen) and Cornell
(Cray arrived at NTNU)
• 1987: Cluster of 4 IBM 3090s (@ IBM Yorktown)
• 1988-91: Intel hypercubes (CMI Norway & Cornell)
• Some on BBN (Cornell)
• 1991-94: KSR
• 1993-98: MPI 1 & 2 (represented Cornell & Schlumberger)
Kendall Square Research (KSR)
KSR-1 at Cornell University:
- 128 processors – Total RAM: 1GB!!
- Scalable shared memory multiprocessors (SSMMs)
- Proprietary 64-bit processors
Notable Attributes:
Network latency across the bridge prevented viable scalability
beyond 128 processors.
Intel iPSC
Especially want to thank
My first two GPU master's students:
Fall 2006:
Christian Larsen (MS Fall Project, December 2006):
“Utilizing GPUs on Cluster Computers”
(joint with Schlumberger)
Erik Axel Nielsen asks for FX 4800 card for project
with GE Healthcare
Elster as head of Computational Science & Visualization program helped
NTNU acquire new IBM Supercomputer
(Njord, 7+ TFLOPS, proprietary switch)
Now: Tesla P100 rated 5-10 TF!
GPUs & HPC-Lab at NTNU
2006: GPU Programming (Cg)
• IBM Supercomputer @ NTNU
(Njord, 7+ TFLOPS, proprietary switch)
2007:
• MS thesis on Wavelet on GPU
• CUDA Tutorial @SC
• Tesla C870 and S870 Announced
2008:
• CUDA Programming in Parallel Comp. course
• Quadcore Supercomputer at UiTø (Stallo) ca. 70 TF (*)
• HPC-LAB at IDI/NTNU opens with
• several NVIDIA donations
• several quad-core machines (1-2 donated by Schlumberger)
• HPC-Lab CUDA-based snow sim. & seismic & SC
(*) 1 Tesla P100 rated 5-10 TF for HPC loads
NTNU GPU Activities
Elster's HPC-Lab has graduated
35+ master's students (diplom) in
GPU computing (2007-2017)
Currently supervising:
3 Post Docs,
3+ PhD students,
5 master's students
NTNU NVIDIA CUDA Teaching Center (summer 2011)
• PhD seminar course (Spring 2013: 7 students)
• Master’s level course (Fall 2012: 14 students)
• Senior Parallel Computing class (>1/3 in CUDA)
• Fall 2010: 43 taking exam
• Fall 2012: 57 students
• Fall 2016: >80 init. Enrollment
Elster also PI for Teaching Center at Univ. of Texas at Austin &
NTNU NVIDIA CUDA Research Center (2012)
GPUs & HPC-Lab at NTNU
2009:
• NVIDIA Tesla S1070 (4 GPUs, 960 cores, 4 TF)
• two NVIDIA Quadro FX 5800 cards (Jan ´09),
• NVIDIA Ion (Jun ´09)
• AMD/ATI Radeon 5850 (2 TF)
2010-11: NVIDIA Fermi-based cards (470, C2050, C2070 (2011))
2012-13: NVIDIA Kepler … NOK 1 Mill equip grant from IME
2014-15: 20 GTX 980 cards, 85” 4K screen, Numascale SMP, H2020 CL grant
2016-17: Tesla P100 server and GTX 1080 and 1080Tis
+ 60 Jetson TX1s for teaching, MSCA Post Doc

Contenu connexe

Tendances

40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerFörderverein Technische Fakultät
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
European Exascale System Interconnect & Storage
European Exascale System Interconnect & StorageEuropean Exascale System Interconnect & Storage
European Exascale System Interconnect & Storageinside-BigData.com
 
Hpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeHpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeJason Shih
 
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...Unai Lopez-Novoa
 
Scalable Parallel Computing on Clouds
Scalable Parallel Computing on CloudsScalable Parallel Computing on Clouds
Scalable Parallel Computing on CloudsThilina Gunarathne
 
Science DMZ
Science DMZScience DMZ
Science DMZJisc
 
Introducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big DataIntroducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big Datainside-BigData.com
 
The Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a MissionThe Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a Missioninside-BigData.com
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersIntel® Software
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingJonathan Dursi
 
High performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveHigh performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveJason Shih
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
QuantumChemistry500
QuantumChemistry500QuantumChemistry500
QuantumChemistry500Maho Nakata
 
High Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud TechnologiesHigh Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud Technologiesjaliyae
 
High Performance Computing in the Cloud?
High Performance Computing in the Cloud?High Performance Computing in the Cloud?
High Performance Computing in the Cloud?Ian Lumb
 

Tendances (20)

Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
European Exascale System Interconnect & Storage
European Exascale System Interconnect & StorageEuropean Exascale System Interconnect & Storage
European Exascale System Interconnect & Storage
 
Hpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeHpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challenge
 
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
Contributions to the Efficient Use of General Purpose Coprocessors: KDE as Ca...
 
Scalable Parallel Computing on Clouds
Scalable Parallel Computing on CloudsScalable Parallel Computing on Clouds
Scalable Parallel Computing on Clouds
 
Coca1
Coca1Coca1
Coca1
 
Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?
 
Science DMZ
Science DMZScience DMZ
Science DMZ
 
Introducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big DataIntroducing the TPCx-HS Benchmark for Big Data
Introducing the TPCx-HS Benchmark for Big Data
 
The Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a MissionThe Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a Mission
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
 
High performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspectiveHigh performance computing - building blocks, production & perspective
High performance computing - building blocks, production & perspective
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
QuantumChemistry500
QuantumChemistry500QuantumChemistry500
QuantumChemistry500
 
High Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud TechnologiesHigh Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud Technologies
 
High Performance Computing in the Cloud?
High Performance Computing in the Cloud?High Performance Computing in the Cloud?
High Performance Computing in the Cloud?
 

Similaire à CloudLightning and the OPM-based Use Case

Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project OverviewFloris Sluiter
 
Scientific
Scientific Scientific
Scientific marpierc
 
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...Marc Dutoo
 
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...OCCIware
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systemsinside-BigData.com
 
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...Nane Kratzke
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plansinside-BigData.com
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry ImpactAccelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impactinside-BigData.com
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The SupercomputerAnkit Singh
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC
 
Green cloud computing
Green cloud computingGreen cloud computing
Green cloud computingNalini Mehta
 

Similaire à CloudLightning and the OPM-based Use Case (20)

Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project Overview
 
High–Performance Computing
High–Performance ComputingHigh–Performance Computing
High–Performance Computing
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
 
Scientific
Scientific Scientific
Scientific
 
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
OCCIware@CloudExpoLondon2017 - an extensible, standard XaaS Cloud consumer pl...
 
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
Extensible and Standard-based XaaS Platform To Manage Everything in The Cloud...
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...
Towards a Lightweight Multi-Cloud DSL for Elastic and Transferable Cloud-nati...
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plans
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry ImpactAccelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The Supercomputer
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
 
OCRE webinar - April 14 - Cloud_Validation_Suite_Ignacio Peluaga Lozada.pdf
OCRE webinar - April 14 - Cloud_Validation_Suite_Ignacio Peluaga Lozada.pdfOCRE webinar - April 14 - Cloud_Validation_Suite_Ignacio Peluaga Lozada.pdf
OCRE webinar - April 14 - Cloud_Validation_Suite_Ignacio Peluaga Lozada.pdf
 
Green cloud computing
Green cloud computingGreen cloud computing
Green cloud computing
 

Plus de CloudLightning

CloudLightning Simulator
CloudLightning SimulatorCloudLightning Simulator
CloudLightning SimulatorCloudLightning
 
Self-Organisation as a Cloud Resource Management Strategy
Self-Organisation as a Cloud Resource Management StrategySelf-Organisation as a Cloud Resource Management Strategy
Self-Organisation as a Cloud Resource Management StrategyCloudLightning
 
CloudLightning - Project and Architecture Overview
CloudLightning - Project and Architecture OverviewCloudLightning - Project and Architecture Overview
CloudLightning - Project and Architecture OverviewCloudLightning
 
CloudLightning at a Glance Infographic
CloudLightning at a Glance InfographicCloudLightning at a Glance Infographic
CloudLightning at a Glance InfographicCloudLightning
 
CloudLighting - A Brief Overview
CloudLighting - A Brief OverviewCloudLighting - A Brief Overview
CloudLighting - A Brief OverviewCloudLightning
 
Testbed for Heterogeneous Cloud
Testbed for Heterogeneous CloudTestbed for Heterogeneous Cloud
Testbed for Heterogeneous CloudCloudLightning
 
CloudLightning Service Description Language
CloudLightning Service Description LanguageCloudLightning Service Description Language
CloudLightning Service Description LanguageCloudLightning
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningCloudLightning
 
CloudLightning - Project Overview
CloudLightning - Project OverviewCloudLightning - Project Overview
CloudLightning - Project OverviewCloudLightning
 
CloudLightning: Self-Organising, Self-Managing Heterogeneous Cloud
CloudLightning: Self-Organising, Self-Managing Heterogeneous CloudCloudLightning: Self-Organising, Self-Managing Heterogeneous Cloud
CloudLightning: Self-Organising, Self-Managing Heterogeneous CloudCloudLightning
 
CloudLightning - Multiclouds: Challenges and Current Solutions
CloudLightning - Multiclouds: Challenges and Current SolutionsCloudLightning - Multiclouds: Challenges and Current Solutions
CloudLightning - Multiclouds: Challenges and Current SolutionsCloudLightning
 

CloudLightning and the OPM-based Use Case

  • 1. Prof. Anne C. Elster, PhD, Dept. of Computer Science, HPC-Lab, Norwegian University of Science & Technology, Trondheim, Norway. CloudLightning and the OPM-based Use Case (HW → Middleware)
  • 2. Thank yous to: My Post Docs and graduate students! 06/07:Spring 2007 08/09:Spring 2009 10/11: @ SC 10 09/10:Spring 2010 07/08:@ SC 07 11/12:Spring 2012 Spring 2014
  • 3. Challenge re: Open Source – SW stack! • Leverage GPUs and libraries inside Linux containers e.g. using NVIDIA-docker
  • 4. OUTLINE • Overview of EU H2020 project CloudLightning (CL) • Use case Oil & Gas: OPM and Upscaling • Containerization • Concluding remarks/look forward
  • 6. Partners: CloudLightning comprises eight partners from academia and industry and is coordinated by University College Cork.
  • 7. Prof John Morrison (UCC); Dr Gabriel Gonzalez Castane (UCC); Dr Huanhuan Xiong (UCC); Mr David Kenny (UCC); Dr Georgi Gaydadjiev (MAX); Dr Marian Neagul (IeAT); Prof Dana Petcu (IeAT); Tobias Becker (MAX); Prof Anne Elster (NTNU); Prof George Gravvanis (DUTH); Dr Suryanarayanan Natarjan (INTEL); Mr Perumal Kuppuudaiyar (INTEL); Prof Theo Lynn (DCU); Ms Anna Gourinovitch (DCU); Dr Konstantinos Giannoutakis (CERTH); Dr Hristina Palikareva (MAX); Dr Malik M. Khan (NTNU); Dr Muhammad Qasim (NTNU); Mr Geir Amund Hasle (NTNU); Mr Dapeng Dong (UCC)
  • 9. CLOUDLIGHTNING: Funded under Call H2020-ICT-2014-1, Advanced Cloud Infrastructures and Services (Feb 2015 through Jan 2018). Aim: To develop infrastructures, methods and tools for high-performance, adaptive cloud applications and services that go beyond current capabilities.
  • 10. Specific Challenge • CloudLightning was funded under Call H2020-ICT-2014-1, Advanced Cloud Infrastructures and Services. Cloud computing is being transformed by new requirements such as: - heterogeneity of resources and devices, - software-defined data centres, - cloud networking and security, and - rising demands for better quality of user experience. • Cloud computing research will be oriented towards new computational and data management models (at both infrastructure and services levels) that respond to: - the advent of faster and more efficient machines, - rising heterogeneity of access modes and devices, - demand for low-energy solutions, - widespread use of big data, - federated clouds and - secure multi-actor environments, including public administrations.
  • 11. CloudLightning Overview (more detailed): Self-Organization and Self-Management (SOSM) of HPC resources. Resource Allocation (based on service requirements), which can be: • Bare metal • VMs (virtual machines) • Containers, the most realistic option for HPC workloads • Resources are divided into Cells, based on region or location • Each Cell may have different hardware types, including servers with GPUs, MICs and DFEs • Each Cell is partitioned into vRacks, which are sets of servers of the same type. Resource Utilization (self-optimized): • Alt. 1: The Cell manager gets a request for resources, creates them and allocates them directly • Alt. 2: First discover the available resources; if there is more than one option, the Cell manager selects the most appropriate one • Alt. 3: Same as Alt. 2, except that it gives back a solution rather than an option • vRack managers may create/aggregate the resources in their vRacks and are the basic component of self-organization in the CL system. Note that this feature only makes sense for larger deployments.
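The Cell/vRack structure and the "discover, then select" flow of Alt. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CloudLightning SOSM implementation: the class names, the `discover`/`allocate` methods and the best-fit selection policy are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VRack:
    """A set of servers of the same hardware type (hypothetical model)."""
    hardware: str       # e.g. "gpu", "mic", "dfe", "cpu"
    free_servers: int


@dataclass
class Cell:
    """Resources grouped by region or location, partitioned into vRacks."""
    region: str
    vracks: List[VRack] = field(default_factory=list)

    def discover(self, hardware: str, servers: int) -> List[VRack]:
        """Alt. 2, step 1: list every vRack that could satisfy the request."""
        return [v for v in self.vracks
                if v.hardware == hardware and v.free_servers >= servers]

    def allocate(self, hardware: str, servers: int) -> Optional[VRack]:
        """Alt. 2, step 2: select one option and reserve the servers.

        Assumed policy (not from the slides): best fit, i.e. the vRack
        that is left with the fewest idle servers after the allocation.
        """
        options = self.discover(hardware, servers)
        if not options:
            return None
        chosen = min(options, key=lambda v: v.free_servers - servers)
        chosen.free_servers -= servers
        return chosen


cell = Cell("trondheim", [VRack("gpu", 8), VRack("gpu", 3), VRack("mic", 4)])
picked = cell.allocate("gpu", 2)   # two GPU vRacks qualify; best fit wins
```

Here the request for two GPU servers is served by the smaller GPU vRack, which keeps the larger one intact for bigger requests. Any other selection policy could be substituted in `allocate`.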
  • 13. BENEFICIARIES Primary beneficiary: the Infrastructure-as-a-Service provider, which benefits from activating the HPC-in-the-cloud market and from reduced costs through better performance per cost and performance per watt. Increased energy efficiency can result in lower costs throughout the cloud ecosystem and can increase accessibility and performance in a wide range of use cases, including: • Oil and Gas simulations • Genomics • Ray Tracing (e.g. 3D Image Rendering). Benefits: • Improved performance/cost and performance/Watt for cloud-based datacenter(s) • Energy- and cost-efficient scalable solution for the OPM-based reservoir simulation application (Upscaling) • Ability to run better models for more efficient reservoir utilization • Improved performance/cost and performance/Watt • Faster genome sequence computation • Reduced development times • Increased volume and quality of related research • Reduced CAPEX and IT-associated costs • Extra capacity for overflow (“surge”) workloads • Faster workload processing to meet project timelines. CLOUDLIGHTNING USE CASES: Ray Tracing (3D Image Rendering): Intel; Genomics: Maxeler; Oil and Gas: NTNU
  • 14. EU Use Case Motivations: CloudLightning’s use cases support the European Union HPC strategy and specific industries identified by IDC in their recent report on the progress of the EU HPC Strategy (IDC, 2015). 1. The health sector represents 10% of EU GDP and 8% of the EU workforce (EC, 2014). HPC is increasingly central to genome processing and thus to advanced medicine and bioscience research. 2. The oil and gas industry is responsible for 170,000 European jobs and €440 billion of Europe's GDP (IDC, 2015). HPC improves discovery performance and exploitation. 3. Ray tracing is a fundamental technology in many industries, specifically in CAD/CAE, digital content and mechanical design, sectors dominated by SMEs. 4. European ROI in HPC is very attractive: each euro invested in HPC returned on average €867 in increased revenue/income (IDC, 2015).
  • 15. The OPM-based Upscaling Use Case http://opm-project.org/ Project goal: Provide an HPC application as a service. What it does: Calculates upscaled permeability. Why chosen: • Builds on the open-source software provided by the Open Porous Media (OPM) project • Already familiar with the Upscaling application through collaborations with Statoil
  • 16. Oil & Gas application: Upscaling / OPM (Open Porous Media). NTNU contributions: • PETSc integration • AmgX integration
  • 17. HPC Applications as a Service: OPM Upscaling http://opm-project.org/ Calculates upscaled permeability
  • 18. HPC Applications as a Service: OPM Upscaling. The OPM use-case application is the calculation of upscaled permeability • Uses PDE solver libraries → MPI, GPU … • The Upscaling application is ported to work with PETSc, which provides: • CPU-only execution • CPU-GPU execution • Both versions of the application are provided as containerized solutions
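To make "upscaled permeability" concrete, the sketch below computes a single effective permeability for a stack of homogeneous layers, the one case with a closed-form answer. This is a textbook simplification and not what the OPM Upscaling application does: as the slide notes, the real application relies on PDE solver libraries to handle general fine-scale grids. The function names and sample values are illustrative assumptions.

```python
def upscale_parallel(permeabilities, thicknesses):
    """Flow along the layers: thickness-weighted arithmetic mean."""
    total = sum(thicknesses)
    return sum(k * h for k, h in zip(permeabilities, thicknesses)) / total


def upscale_series(permeabilities, thicknesses):
    """Flow across the layers: thickness-weighted harmonic mean."""
    total = sum(thicknesses)
    return total / sum(h / k for k, h in zip(permeabilities, thicknesses))


# Two layers of equal thickness with permeabilities of 100 and 10 (e.g. mD).
k_layers = [100.0, 10.0]
h_layers = [1.0, 1.0]

k_along = upscale_parallel(k_layers, h_layers)   # dominated by the permeable layer
k_across = upscale_series(k_layers, h_layers)    # dominated by the tight layer
```

The two directions give very different effective values for the same rock, which is why upscaled permeability is in general direction-dependent and why realistic geometries need a numerical flow solve rather than a simple average.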
  • 19. Self-Optimized Libraries as a Service • Ongoing work to provide self-optimized libraries as containerized solutions • Libraries under review include ATLAS, MKL, cuBLAS, FFTW and cuFFT
  • 20. CLOUDLIGHTNING HETEROGENEOUS TESTBED @ NTNU • GPU system: Dell server blade with NVIDIA Tesla P100 • SMP cluster: Numascale 5-node SMP • MIC system: PC with Xeon Phi card • DFE cluster: Maxeler MPC-C node with 4x Vectis MAX3 DFEs
  • 21. CLOUDLIGHTNING HETEROGENEOUS TESTBED @ NTNU: OpenStack server cluster provided by NTNU’s IT Department
  • 23. Self-Organization-Self-Management (SOSM): Resource Registration Plugin • SOSM resource registration plugins • Python-based plugins • Use Python bindings (pyNVML) • Use the NVIDIA Management Library (NVML). Diagram: Tesla P100, GTX 980, Tesla K20 → NVIDIA Management Library → pyNVML → SOSM
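A registration plugin of this kind could enumerate GPUs through pyNVML roughly as follows. This is a minimal sketch and not the CloudLightning plugin: the function name `describe_gpus` and the record layout are assumptions. The NVML calls themselves (`nvmlInit`, `nvmlDeviceGetCount`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetName`, `nvmlDeviceGetMemoryInfo`, `nvmlShutdown`) are part of the pyNVML bindings. The module is passed in as an argument so the logic can be exercised on a machine without a GPU.

```python
def describe_gpus(nvml):
    """Return one record per GPU visible to NVML.

    `nvml` is the imported `pynvml` module (or any object exposing the
    same functions). The record layout is a hypothetical example.
    """
    nvml.nvmlInit()
    try:
        records = []
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            name = nvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):      # older bindings return bytes
                name = name.decode()
            memory = nvml.nvmlDeviceGetMemoryInfo(handle)
            records.append({
                "index": index,
                "name": name,
                "memory_total_mib": memory.total // 2**20,
            })
        return records
    finally:
        # Always release NVML, even if a query above raised an error.
        nvml.nvmlShutdown()
```

On a host with the NVIDIA driver and the bindings installed, `import pynvml; describe_gpus(pynvml)` would return one dictionary per card, such as the Tesla P100, GTX 980 and Tesla K20 named on the slide.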
  • 26. Telemetry System for CloudLightning: Resource Registration Plugin. SNAP-based telemetry plugins • Python-based plugins • Use Python bindings (pyNVML) • Use the NVIDIA Management Library (NVML). Diagram: Tesla P100, GTX 980, Tesla K20 → NVIDIA Management Library → pyNVML → SNAP
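The telemetry side can be sketched the same way: sample a few NVML counters per GPU and publish them under a metric namespace. The sketch below is an assumption-laden illustration, not the CloudLightning collector. The namespace prefix `/cloudlightning/gpu`, the metric names and the function `sample_gpu_metrics` are hypothetical, and the code does not use the SNAP plugin API at all. Only the NVML calls (`nvmlDeviceGetUtilizationRates`, `nvmlDeviceGetTemperature` with `NVML_TEMPERATURE_GPU`, `nvmlDeviceGetPowerUsage`, which reports milliwatts) are real pyNVML functions.

```python
def sample_gpu_metrics(nvml, prefix="/cloudlightning/gpu"):
    """Take one telemetry sample per GPU and return {metric_name: value}.

    `nvml` is the imported `pynvml` module (or a stand-in with the same
    functions). The metric namespace is a hypothetical example.
    """
    nvml.nvmlInit()
    try:
        metrics = {}
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            utilization = nvml.nvmlDeviceGetUtilizationRates(handle)
            temperature = nvml.nvmlDeviceGetTemperature(
                handle, nvml.NVML_TEMPERATURE_GPU)
            power_mw = nvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
            base = f"{prefix}/{index}"
            metrics[f"{base}/utilization_percent"] = utilization.gpu
            metrics[f"{base}/temperature_celsius"] = temperature
            metrics[f"{base}/power_watts"] = power_mw / 1000.0
        return metrics
    finally:
        # Release NVML whether or not every query succeeded.
        nvml.nvmlShutdown()
```

Some boards do not expose every counter (power readings in particular may be unsupported), so a production collector would catch `pynvml.NVMLError` per metric instead of failing the whole sample.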
  • 27. CL Telemetry System Plug-in: Python-based SNAP Collector* • SNAP Parameter Catalog Setup • Telemetry Parameter Output for CL (*Debugging stage)
  • 28. CL Telemetry System Plug-in: Python-based SNAP Collector* • SNAP Parameter Catalog Setup (*Debugging stage)
  • 31. CURRENT STATUS OF THE OVERALL CLOUDLIGHTNING PROJECT: • SOSM architecture defined • Plugins for resource registration developed • Use cases tested on individual platforms • Working on integration, testbed and simulation that includes the OpenStack-based SOSM system
  • 32. Challenge re: Open Source: SW engineering principles versus skills of application programmers
  • 33. Challenge re: Open Source. Cost of maintaining the code: - Person hours - Upholding motivation. SW stack: can leverage GPUs & libraries inside Linux containers, e.g. using NVIDIA-docker
  • 34. HPC-Lab Beyond CloudLightning: • HPC-Lab welcomes H2020 MSCA Post Doc Weifeng Liu (2017-2019) • Prof. Gavin Taylor, USNA, to join HPC-Lab for 14 months starting June 2017 • Further collaborations with Schlumberger (Bjørn Nordmoen on left in photo)
  • 35. NVIDIA DGX-1 and Volta-based systems …
  • 36. Collaborations with NTNU Imaging groups since 2006:
  • 37. THANK YOU! Anne C. Elster elster@ntnu.no
  • 38. Supercomputer & HPC Trends: Clusters and Accelerators! How did we get here?
  • 39. Motivation for GPU Computing: Many advances in processor design are driven by the billion-dollar gaming market! Modern GPUs (Graphics Processing Units) offer lots of FLOPS per watt, and lots of parallelism! • NVIDIA GTX 1080 (Pascal): 2560 CUDA cores! • Kepler: GTX 690 and Tesla K10 cards have 3072 (2x1536) cores!
  • 40. NVIDIA DGX-1 Server Details. CPUs: 2 x Intel Xeon E5-2698 v3 (16-core Haswell). GPUs: 8 x NVIDIA Tesla P100 (3584 CUDA cores each). System memory: 512 GB DDR4-2133. GPU memory: 128 GB (8 x 16 GB). Storage: 4 x Samsung PM863 1.9 TB SSD. Network: 4 x InfiniBand EDR, 2 x 10 GigE. Power: 3200 W. Size: 3U blade. GPU throughput: FP16: 170 TFLOPS, FP32: 85 TFLOPS, FP64: 42.5 TFLOPS
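The throughput figures on this slide can be roughly reproduced from the core count with the usual peak-FLOPS estimate (cores × FLOPs per core per cycle × clock). The sketch below assumes a P100 boost clock of 1.48 GHz and one fused multiply-add (2 FLOPs) per core per cycle; neither value appears in the slides. It also assumes the P100's FP64 rate is half, and its FP16 rate double, the FP32 rate.

```python
def peak_tflops(cuda_cores, clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical peak: cores x FLOPs/cycle x GHz gives GFLOPS; /1000 gives TFLOPS."""
    return cuda_cores * flops_per_core_per_cycle * clock_ghz / 1000.0


P100_CORES = 3584        # from the slide
P100_BOOST_GHZ = 1.48    # assumption: boost clock, not stated in the slides
GPUS_PER_DGX1 = 8        # from the slide

fp32_per_gpu = peak_tflops(P100_CORES, P100_BOOST_GHZ)
fp32_total = GPUS_PER_DGX1 * fp32_per_gpu
fp64_total = fp32_total / 2   # assumed FP64:FP32 ratio of 1:2
fp16_total = fp32_total * 2   # assumed FP16:FP32 ratio of 2:1

print(f"FP32 per GPU: {fp32_per_gpu:.1f} TFLOPS")
print(f"DGX-1 FP16/FP32/FP64: {fp16_total:.1f} / {fp32_total:.1f} / {fp64_total:.1f} TFLOPS")
```

Under these assumptions the totals land within rounding of the 170 / 85 / 42.5 TFLOPS quoted on the slide. These are theoretical peaks; sustained application performance is typically well below them.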
  • 41. Who am I? (Parallel Computing perspective) • 1980s: Concurrent and Parallel Pascal • 1986: Intel iPSC Hypercube at CMI (Bergen) and Cornell (Cray arrived at NTNU) • 1987: Cluster of 4 IBM 3090s (@ IBM Yorktown) • 1988-91: Intel hypercubes (CMI Norway & Cornell) • Some on BBN (Cornell) • 1991-94: KSR • 1993-98: MPI 1 & 2 (represented Cornell & Schlumberger). Kendall Square Research (KSR) KSR-1 at Cornell University: - 128 processors - Total RAM: 1 GB!! - Scalable shared-memory multiprocessors (SSMMs) - Proprietary 64-bit processors. Notable attributes: Network latency across the bridge prevented viable scalability beyond 128 processors. Intel iPSC
  • 42. Especially want to thank my first two GPU master's students. Fall 2006: • Christian Larsen (MS Fall Project, December 2006): “Utilizing GPUs on Cluster Computers” (joint with Schlumberger) • Erik Axel Nielsen asks for an FX 4800 card for a project with GE Healthcare • Elster, as head of the Computational Science & Visualization program, helped NTNU acquire a new IBM supercomputer (Njord, 7+ TFLOPS, proprietary switch). Now: Tesla P100 rated 5-10 TF!
  • 43. GPUs & HPC-Lab at NTNU. 2006: • GPU Programming (Cg) • IBM Supercomputer @ NTNU (Njord, 7+ TFLOPS, proprietary switch). 2007: • MS thesis on Wavelet on GPU • CUDA Tutorial @ SC • Tesla C870 and S870 announced. 2008: • CUDA Programming in Parallel Comp. course • Quad-core supercomputer at UiTø (Stallo), ca. 70 TF (*) • HPC-Lab at IDI/NTNU opens with • several NVIDIA donations • several quad-core machines (1-2 donated by Schlumberger) • HPC-Lab CUDA-based snow sim. & seismic & SC. (*) 1 Tesla P100 rated 5-10 TF for HPC loads
  • 44. NTNU GPU Activities: Elster's HPC-Lab has graduated 35+ master's students (diplom) in GPU computing (2007-2017). Currently supervising: 3 Post Docs, 3+ PhD students, 5 master's students. NTNU NVIDIA CUDA Teaching Center (summer 2011): • PhD seminar course (Spring 2013: 7 students) • Master’s level course (Fall 2012: 14 students) • Senior Parallel Computing class (>1/3 in CUDA) • Fall 2010: 43 taking exam • Fall 2012: 57 students • Fall 2016: >80 initial enrollment. Elster is also PI for the Teaching Center at Univ. of Texas at Austin & the NTNU NVIDIA CUDA Research Center (2012)
  • 45. GPUs & HPC-Lab at NTNU. 2008: • CUDA Programming in Parallel Comp. course • Quad-core supercomputer at UiTø (Stallo), ca. 70 TF • HPC-Lab at IDI/NTNU opens with • several NVIDIA donations • several quad-core machines (1-2 donated by Schlumberger) • HPC-Lab CUDA-based snow sim. & seismic & SC. 2009: • NVIDIA Tesla S1070 (4 GPUs, 960 cores, 4 TF) • two NVIDIA Quadro FX 5800 cards (Jan ’09) • NVIDIA Ion (Jun ’09) • AMD/ATI Radeon 5850 (2 TF). 2010-11: NVIDIA Fermi-based cards (470, C2050, C2070 (2011)). 2012-13: NVIDIA Kepler … NOK 1 million equipment grant from IME. 2014-15: 20 GTX 980 cards, 85” 4K screen, Numascale SMP, H2020 CL grant. 2016-17: Tesla P100 server, GTX 1080 and 1080 Ti cards + 60 Jetson TX1s for teaching, MSCA Post Doc