CloudLightning and the OPM-based Use Case

This is a presentation by Prof. Anne Elster at the International Workshop on Open Source Supercomputing held in conjunction with the 2017 ISC High Performance Computing Conference.


  1. 1. Prof. Anne C. Elster, PhD Dept. of Computer Science, HPC-Lab Norwegian University of Science & Technology Trondheim, Norway CloudLightning and the OPM-based Use Case (HW → Middleware)
  2. 2. Thanks to: my Post Docs and graduate students! 06/07: Spring 2007 08/09: Spring 2009 10/11: @ SC 10 09/10: Spring 2010 07/08: @ SC 07 11/12: Spring 2012 Spring 2014
  3. 3. Challenge re: Open Source – SW stack! • Leverage GPUs and libraries inside Linux containers e.g. using NVIDIA-docker
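As a rough illustration of that container approach, a minimal smoke test could launch a containerized nvidia-smi through the nvidia-docker wrapper; this sketch assumes the nvidia-docker v1 CLI and a public nvidia/cuda image are available on the host, and is not taken from the project itself:

    import subprocess

    # Smoke test: run nvidia-smi inside a CUDA container via the nvidia-docker
    # wrapper, so the host GPUs and driver libraries should be visible to the
    # containerized process. The image tag is an assumption.
    result = subprocess.run(
        ["nvidia-docker", "run", "--rm", "nvidia/cuda:8.0-runtime", "nvidia-smi"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)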
  4. 4. 5 OUTLINE Overview of EU H2020 project CloudLightning (CL) Use case Oil & Gas -- OPM and Upscaling Containerization Concluding remarks / look forward
  5. 5. OVERVIEW & NTNU’s contributions Prof Anne C. Elster (NTNU)
  6. 6. Partners CloudLightning comprises eight partners from academia and industry and is coordinated by University College Cork.
  7. 7. Prof John Morrison UCC Dr Gabriel Gonzalez Castane UCC Dr Huanhuan Xiong UCC Mr David Kenny UCC Dr Georgi Gaydadjiev MAX Dr Marian Neagul IeAT Prof Dana Petcu IeAT Tobias Becker MAX Prof Anne Elster NTNU Prof George Gravvanis DUTH Dr Suryanarayanan Natarjan INTEL Mr Perumal Kuppuudaiyar INTEL Prof Theo Lynn DCU Ms Anna Gourinovitch DCU Dr Konstantinos Giannoutakis CERTH Dr Hristina Palikareva MAX Dr Malik M. Khan NTNU Dr Muhammad Qasim NTNU Mr Geir Amund Hasle NTNU Mr Dapeng Dong UCC
  8. 8. Prof John Morrison UCC Dr Gabriel Gonzalez Castane UCC Dr Huanhuan Xiong UCC Mr David Kenny UCC Dr Georgi Gaydadjiev MAX Dr Marian Neagul IeAT Prof Dana Petcu IeAT Tobias Becker MAX Prof Anne Elster NTNU Prof George Gravvanis DUTH Dr Suryanarayanan Natarjan INTEL Mr Perumal Kuppuudaiyar INTEL Prof Theo Lynn DCU Ms Anna Gourinovitch DCU Dr Konstantinos Giannoutakis CERTH Dr Hristina Palikareva MAX Dr Malik M. Khan NTNU Dr Muhammad Qasim NTNU Mr Geir Amund Hasle NTNU Mr Dapeng Dong UCC
  9. 9. CLOUDLIGHTNING Funded under Call H2020-ICT-2014-1 Advanced Cloud Infrastructures and Services (Feb 2015 through Jan 2018). Aim: To develop infrastructures, methods and tools for high-performance, adaptive cloud applications and services that go beyond current capabilities.
  10. 10. Specific Challenge • CloudLightning was funded under Call H2020-ICT-2014-1 Advanced Cloud Infrastructures and Services. Cloud computing is being transformed by new requirements such as - heterogeneity of resources and devices - software-defined data centres - cloud networking, security, and - the rising demands for better quality of user experience. Cloud computing research will be oriented towards • new computational and data management models (at both infrastructure and services levels) that respond to the advent of faster and more efficient machines, - rising heterogeneity of access modes and devices, - demand for low energy solutions, - widespread use of big data, - federated clouds and - secure multi-actor environments including public administrations.
  11. 11. 12 CloudLightning Overview – more detailed Self-Organization and Self-Management (SOSM) of HPC resources Resource Allocation (based on service requirements), which can use: • Bare metal • VMs (virtual machines) • Containers – the most realistic option for HPC workloads • Resources are divided into Cells, based on region or location • Each Cell may have different hardware types, including servers with GPUs, MICs, DFEs • Each Cell is partitioned into vRacks, which are sets of servers of the same type Resource Utilization (self-optimized) • Alt. 1: The Cell Manager gets a request for resources, creates them and allocates them directly • Alt. 2: First discover the available resources; if there is more than one option, the Cell Manager selects the most appropriate one • Alt. 3: Same as Alt. 2, except a solution is returned rather than an option • vRack Managers may create/aggregate the resources in their vRack and are the basic component of self-organization in the CL system. Note this feature only makes sense for larger deployments.
  12. 12. BENEFICIARIES Primary beneficiary: Infrastructure-as-a-Service providers, who benefit from activating the HPC-in-the-cloud market and from cost reductions related to better performance per cost and performance per watt. Increased energy efficiency can result in lower costs throughout the cloud ecosystem and can increase accessibility and performance in a wide range of use cases including: • Oil and Gas simulations, • Genomics and • Ray Tracing (e.g. 3D Image Rendering) • Improved performance/cost and performance/Watt for cloud-based datacenter(s) • Energy- and cost-efficient scalable solution for the OPM-based reservoir simulation application (Upscaling) • Ability to run better models for more efficient reservoir utilization • Improved performance/cost and performance/Watt. • Faster genome sequence computation. • Reduced development times. • Increased volume and quality of related research. • Reduced CAPEX and associated IT costs. • Extra capacity for overflow (“surge”) workloads. • Faster workload processing to meet project timelines. Ray Tracing (3D Image Rendering) - Intel Genomics - Maxeler Oil and Gas - NTNU CLOUDLIGHTNING USE CASES
  13. 13. 15 EU Use Case Motivations CloudLightning’s use cases support the European Union HPC strategy and specific industries identified by IDC in their recent report on the progress of the EU HPC Strategy (IDC, 2015). 1 The health sector represents 10% of EU GDP and 8% of the EU workforce (EC, 2014). HPC is increasingly central to genome processing and thus advanced medicine and bioscience research. 2 The oil and gas industry is responsible for 170,000 European jobs and €440 billion of Europe's GDP (IDC, 2015). HPC improves discovery performance and exploitation. 3 Ray tracing is a fundamental technology in many industries and specifically in CAD/CAE, digital content and mechanical design, sectors dominated by SMEs. 4 European ROI in HPC is very attractive - each euro invested in HPC on average returned €867 in increased revenue/income (IDC, 2015).
  14. 14. The OPM-based Upscaling Use Case http://opm-project.org/ Project goal: Provide an HPC application as a service What it does: Calculates upscaled permeability Why chosen: • Builds on the Open Source SW provided by the Open Porous Media (OPM) project • Already familiar with the Upscaling application through collaborations with Statoil
  15. 15. Oil & Gas application -- Upscaling / OPM (Open Porous Media) NTNU contributions: • PETSc integration • AmgX integration
  16. 16. HPC Applications as a Service – OPM Upscaling http://opm-project.org/ Calculates upscaled permeability
  17. 17. HPC Applications as a Service – OPM Upscaling The OPM use-case application is the calculation of upscaled permeability • uses PDE solver libraries → MPI, GPU … • the Upscaling application is ported to work with PETSc, which provides: • CPU-only execution • CPU-GPU execution • Both versions of the application are provided as a containerized solution
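As a rough sketch of the kind of building block the upscaling code hands to PETSc, the following petsc4py example (assuming petsc4py is installed) assembles a small sparse system and solves it with a Krylov method; the actual OPM matrices, preconditioners and GPU backends are not reproduced here:

    from petsc4py import PETSc

    # Assemble a small 1-D Laplacian (an SPD model problem) and solve A x = b
    # with a Krylov solver. This stands in for the PDE solves the upscaling
    # application delegates to PETSc.
    n = 100
    A = PETSc.Mat().createAIJ([n, n], nnz=3)   # preallocate 3 nonzeros per row
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        if i > 0:
            A.setValue(i, i - 1, -1.0)
        A.setValue(i, i, 2.0)
        if i < n - 1:
            A.setValue(i, i + 1, -1.0)
    A.assemble()

    b = A.createVecLeft()
    b.set(1.0)
    x = A.createVecRight()

    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setType("cg")                 # conjugate gradients suits this SPD system
    ksp.getPC().setType("jacobi")
    ksp.setFromOptions()              # allow -ksp_type / -pc_type overrides at run time
    ksp.solve(b, x)
    print("converged in", ksp.getIterationNumber(), "iterations")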
  18. 18. Self-Optimized Libraries as a Service • On-going work to provide self-optimized libraries as containerized solutions • Libraries under review include ATLAS, MKL, cuBLAS, FFTW and cuFFT
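The selection principle behind such self-optimized libraries (ATLAS-style empirical tuning) can be illustrated by timing interchangeable routines on a representative input and picking the fastest; the toy Python sketch below is only an illustration of that idea, not CloudLightning code:

    import time
    import numpy as np

    def pick_fastest(candidates, make_args, repeats=3):
        """Time each interchangeable routine on a representative input and
        return the name of the fastest one (toy empirical selection)."""
        timings = {}
        for name, fn in candidates.items():
            args = make_args()
            start = time.perf_counter()
            for _ in range(repeats):
                fn(*args)
            timings[name] = (time.perf_counter() - start) / repeats
        return min(timings, key=timings.get), timings

    n = 512
    candidates = {
        "numpy.dot":    lambda a, b: np.dot(a, b),
        "operator @":   lambda a, b: a @ b,
        "numpy.einsum": lambda a, b: np.einsum("ij,jk->ik", a, b),  # usually much slower
    }
    best, timings = pick_fastest(
        candidates, lambda: (np.random.rand(n, n), np.random.rand(n, n)))
    print("fastest:", best)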
  19. 19. 21 GPU System • Dell server blade with NVIDIA Tesla P100 SMP Cluster • Numascale 5-node SMP MIC System • PC with Xeon Phi card DFE Cluster • Maxeler MPC-C node with 4x Vectis MAX3 DFEs CLOUDLIGHTNING HETEROGENEOUS TESTBED @ NTNU
  20. 20. 22 OpenStack server cluster provided by NTNU's IT Department CLOUDLIGHTNING HETEROGENEOUS TESTBED @ NTNU
  21. 21. Software used AmgX
  22. 22. Self-Organization-Self-Management (SOSM) - Resource Registration Plugin • SOSM resource registration plugins • Python-based plugins • Use Python bindings (pyNVML) • Use the NVIDIA Management Library (NVML) Tesla P100 GTX 980 Tesla K20 NVIDIA MANAGEMENT LIBRARY pyNVML SOSM
  23. 23. SOSM Resource Registration Python Code JSON output
  24. 24. SOSM Resource Registration Python Code JSON output
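The plugin code and its JSON output appear on the slides only as screenshots; a minimal sketch of such a registration query using pyNVML might look like the following, where the JSON field names are illustrative rather than the CloudLightning schema:

    import json
    import pynvml  # Python bindings (pyNVML) for the NVIDIA Management Library (NVML)

    # Enumerate the local GPUs via NVML and emit a JSON description of the kind
    # a resource-registration plugin could hand to the SOSM layer.
    pynvml.nvmlInit()
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": i,
                "name": name.decode() if isinstance(name, bytes) else name,
                "memory_total_mib": mem.total // (1024 * 1024),
            })
        print(json.dumps({"gpus": gpus}, indent=2))
    finally:
        pynvml.nvmlShutdown()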
  25. 25. Telemetry System for CloudLightning - Resource Registration Plugin SNAP-based telemetry plugins • Python-based plugins • Use Python bindings (pyNVML) • Use the NVIDIA Management Library (NVML) Tesla P100 GTX 980 Tesla K20 NVIDIA MANAGEMENT LIBRARY pyNVML SNAP
  26. 26. CL Telemetry System Plug-in Python-based SNAP Collector* SNAP Parameter Catalog Setup Telemetry Parameter Output for CL *Debugging Stage
  27. 27. CL Telemetry System Plug-in Python-based SNAP Collector* SNAP Parameter Catalog Setup *Debugging Stage
  28. 28. CL Telemetry System Plug-in Python-based SNAP Collector* SNAP Parameter Catalog Setup Telemetry Parameter Output for CL *Debugging Stage
  29. 29. CL Telemetry System Plug-in Python-based SNAP Collector* SNAP Parameter Catalog Setup Telemetry Parameter Output for CL *Debugging Stage
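The metric-gathering side of such a collector can be sketched with pyNVML alone; how the samples are published into the Snap framework is not shown in this transcript, so the following is only an assumed, minimal sampling loop:

    import time
    import pynvml  # pyNVML bindings for NVML

    # Sample GPU utilization, memory use and power draw on device 0 a few times.
    # Publishing these samples into the Snap telemetry framework is not shown.
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        for _ in range(3):
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)       # percent
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)               # bytes
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
            print("gpu=%d%% mem_used=%d MiB power=%.1f W"
                  % (util.gpu, mem.used // (1024 * 1024), power_w))
            time.sleep(1)
    finally:
        pynvml.nvmlShutdown()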
  30. 30. 32 CURRENT STATUS OF THE OVERALL CLOUDLIGHTNING PROJECT: • SOSM Architecture defined • Plugins for resource registration developed • Tested use cases on individual platforms • Working on integration, testbed and simulation that includes the OpenStack-based SOSM system
  31. 31. Challenge re: Open Source SW Engineering Principles versus Skills of application programmers
  32. 32. Challenge re: Open Source Cost of maintaining the code: - Person hours - Upholding motivation SW stack – can leverage GPUs & libraries inside Linux containers, e.g. using NVIDIA-docker
  33. 33. HPC-Lab Beyond CloudLightning: HPC-Lab welcomes H2020 MSCA Post Doc Weifeng Liu (2017-2019) Prof. Gavin Taylor, USNA to join HPC-Lab for 14 months starting June 2017 Further collaborations with Schlumberger (Bjørn Nordmoen on left in photo)
  34. 34. NVIDIA DGX-1 and Volta-based systems …
  35. 35. Collaborations with NTNU Imaging groups since 2006:
  36. 36. 39 THANK YOU! Anne C. Elster elster@ntnu.no
  37. 37. Supercomputer & HPC Trends: Clusters and Accelerators! How did we get here?
  38. 38. Motivation – GPU Computing: Many advances in processor designs are driven by the billion-$$ gaming market! Modern GPUs (Graphics Processing Units) offer lots of FLOPS per watt! .. and lots of parallelism! NVIDIA GTX 1080 (Pascal): 2560 CUDA cores! Kepler: GTX 690 and Tesla K10 cards have 3072 (2x1536) cores!
  39. 39. NVIDIA DGX-1 Server -- Details CPUs: 2 x Intel Xeon E5-2698 v3 (16-core Haswell) GPUs: 8 x NVIDIA Tesla P100 (3584 CUDA cores each) System Memory: 512 GB DDR4-2133 GPU Memory: 128 GB (8 x 16 GB) Storage: 4 x Samsung PM863 1.9 TB SSD Network: 4 x InfiniBand EDR, 2 x 10 GigE Power: 3200 W Size: 3U Blade GPU Throughput: FP16: 170 TFLOPS, FP32: 85 TFLOPS, FP64: 42.5 TFLOPS
  40. 40. 43 Who am I? (Parallel Computing perspective) • 1980’s: Concurrent and Parallel Pascal • 1986: Intel iPSC Hypercube – CMI (Bergen) and Cornell (Cray arrived at NTNU) • 1987: Cluster of 4 IBM 3090s (@ IBM Yorktown) • 1988-91: Intel hypercubes (CMI Norway & Cornell) • Some on BBN (Cornell) • 1991-94: KSR • 1993-98: MPI 1 & 2 (represented Cornell & Schlumberger) Kendall Square Research (KSR) KSR-1 at Cornell University: - 128 processors – Total RAM: 1GB!! - Scalable shared memory multiprocessors (SSMMs) - Proprietary 64-bit processors Notable Attributes: Network latency across the bridge prevented viable scalability beyond 128 processors. Intel iPSC
  41. 41. 44 Especially want to thank my first 2 GPU master students: Fall 2006: Christian Larsen (MS Fall Project, December 2006): “Utilizing GPUs on Cluster Computers” (joint with Schlumberger) Erik Axel Nielsen asked for an FX 4800 card for a project with GE Healthcare Elster, as head of the Computational Science & Visualization program, helped NTNU acquire a new IBM supercomputer (Njord, 7+ TFLOPS, proprietary switch) Now: one Tesla P100 is rated 5-10 TF!
  42. 42. 45 GPUs & HPC-Lab at NTNU 2006: GPU Programming (Cg) • IBM Supercomputer @ NTNU (Njord, 7+ TFLOPS, proprietary switch) 2007: • MS thesis on Wavelets on GPU • CUDA Tutorial @ SC • Tesla C870 and S870 announced 2008: • CUDA Programming in Parallel Comp. course • Quad-core Supercomputer at UiTø (Stallo) ca. 70 TF (*) • HPC-Lab at IDI/NTNU opens with • several NVIDIA donations • several quad-core machines (1-2 donated by Schlumberger) • HPC-Lab CUDA-based snow sim. & seismic & SC (*) 1 Tesla P100 rated 5-10 TF for HPC loads
  43. 43. NTNU GPU Activities Elster's HPC-Lab has graduated 35+ Master students (diplom) in GPU computing (2007-2017) Currently supervising: 3 Post Docs, 3+ PhD students, 5 master students NTNU NVIDIA CUDA Teaching Center (summer 2011) • PhD seminar course (Spring 2013: 7 students) • Master's level course (Fall 2012: 14 students) • Senior Parallel Computing class (>1/3 in CUDA) • Fall 2010: 43 taking exam • Fall 2012: 57 students • Fall 2016: >80 initial enrollment Elster also PI for Teaching Center at Univ. of Texas at Austin & NTNU NVIDIA CUDA Research Center (2012)
  44. 44. 47 GPUs & HPC-Lab at NTNU 2008: • CUDA Programming in Parallel Comp. course • Quad-core Supercomputer at UiTø (Stallo) ca. 70 TF • HPC-Lab at IDI/NTNU opens with • several NVIDIA donations • several quad-core machines (1-2 donated by Schlumberger) • HPC-Lab CUDA-based snow sim. & seismic & SC 2009: • NVIDIA Tesla S1070 (4 GPUs, 960 cores, 4 TF) • two NVIDIA Quadro FX 5800 cards (Jan ’09), • NVIDIA Ion (Jun ’09) • AMD/ATI Radeon 5850 (2 TF) 2010-11: NVIDIA Fermi-based cards (470, C2050, C2070 (2011)) 2012-13: NVIDIA Kepler … NOK 1 mill. equipment grant from IME 2014-15: 20 GTX 980 cards, 85” 4K screen, Numascale SMP, H2020 CL grant 2016-17: Tesla P100 server, GTX 1080 and 1080 Tis + 60 Jetson TX1s for teaching, MSCA Post Doc
