NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019

Broadening support for GPU-accelerated supercomputing to a fast-growing new platform, NVIDIA founder and CEO Jensen Huang introduced a reference design for building GPU-accelerated Arm servers, with wide industry backing.

  2. THE EXPANDING UNIVERSE OF HPC | JENSEN HUANG | SC19
  3. AT THE INTERSECTION OF GRAPHICS, SIMULATION, AI
  5. COMPUTING FOR THE DA VINCIS OF OUR TIME: first AI supercomputers (ABCI, Summit) | first exascale science | 42 new Top 500 systems | climate: LBNL, NVIDIA | genomics: ORNL | nuclear waste remediation: LBNL, PNNL, Brown U., NVIDIA | cancer detection: ORNL, Stony Brook U.
  6. FULL STACK SPEED-UP: CUDA-X on CUDA | DRIVE, METRO, ISAAC, CLARA, RAPIDS, AERIAL, CG, AI | CUDA 10.2, cuTENSOR 1.0, cuSOLVER 10.3, cuBLAS 10.2, cuDNN 7.6, TensorRT 6.0, DALI 0.15, NCCL 2.5, IndeX 2.1, OptiX 7.0, RAPIDS 0.10, Spark XGBoost | 3x in 2 years: time to solution 27 hours (2017), 20 hours (2018), 10 hours (2019). Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec: Dev Prototype], GTC [moi#proc.in], LAMMPS [LJ 2.5], MILC [Apex Medium], NAMD [stmv_nve_cuda], Quantum Espresso [AUSURF112-jR], SPECFEM3D [four_material_simple_model], TensorFlow [ResNet-50], VASP [Si-Huge]; GPU node: dual-socket CPUs with 4x V100 GPUs.
  7. THE EXPANDING UNIVERSE OF HPC: simulation and AI expanding into network, edge, cloud, Arm, data analytics, and extreme IO.
  8. INCREDIBLE ADVANCES IN AI: 2012, AlexNet-era CNNs: classification, object recognition, segmentation, denoising, 3D pose, image generation | 2019, Transformer and BERT: classification, Q&A, summarization, translation, dialog, writing.
  9. GPU COMPUTING POWERS AI ADVANCES: #1 in MLPerf for AI training and AI inference | HPC computing challenge: two distinct eras of AI training compute, doubling every 2 years, then every 3.4 months | super Moore's law: time to train ResNet-50 down from 600 hours (K80 server) to 2 hours (DGX) in 5 years.
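The doubling periods and the 600-to-2-hour ResNet-50 figure quoted on this slide can be sanity-checked with simple arithmetic. A minimal sketch, assuming only the slide's numbers (2-year vs 3.4-month doubling, a 5-year K80-to-DGX window); the compounded factors are derived here, not stated in the talk:

```python
# Compound growth implied by the doubling periods quoted on the slide.
# The 5-year window matches the K80-server-to-DGX span the slide cites.

def growth_factor(years: float, doubling_months: float) -> float:
    """Total growth over `years` given a fixed doubling period."""
    return 2.0 ** (years * 12.0 / doubling_months)

moores_law_5y = growth_factor(5, 24)    # 2^2.5, roughly 5.7x
ai_compute_5y = growth_factor(5, 3.4)   # 2^17.6, roughly 2 x 10^5 x
resnet_speedup = 600 / 2                # 600 hours down to 2 hours: 300x

print(f"Moore's law over 5 years:    {moores_law_5y:.1f}x")
print(f"3.4-month doubling, 5 years: {ai_compute_5y:,.0f}x")
print(f"ResNet-50 time-to-train:     {resnet_speedup:.0f}x faster")
```

The gap between the two curves is the slide's point: AI training compute has been growing orders of magnitude faster than transistor scaling alone would allow.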
  10. NVIDIA AI END-TO-END PLATFORM: training (DGX), cloud (HGX), edge AI (EGX), autonomous machines (AGX).
  11. AI FOR SCIENCE: experimentation data and simulation data feed neural estimation for real-time steering, fast approximation, and design space exploration | design space exploration: ICF + MERLIN (fusion) | inverse problems: LIGO (gravitational waves) | faster prediction: ANI + MD (chemistry) | real-time steering: ITER (fusion energy).
  13. STREAMING AI AND SOFTWARE-DEFINED SENSORS: 100x data collected for every 1x transferred | build models, then run streaming AI processing | ECMWF: 287 TB/day | LSST: 20 TB/day | SKA: 16 TB/sec.
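Converting SKA's per-second rate to a daily volume shows why this slide argues for processing at the sensor rather than shipping raw data. Only the three instrument rates are from the slide; the conversions are derived:

```python
# Daily data volumes implied by the instrument rates quoted on the slide:
# ECMWF 287 TB/day, LSST 20 TB/day, SKA 16 TB/s.

SECONDS_PER_DAY = 86_400

ecmwf_tb_day = 287
lsst_tb_day = 20
ska_tb_day = 16 * SECONDS_PER_DAY   # 1,382,400 TB/day, about 1.38 EB/day

print(f"SKA daily volume: {ska_tb_day:,} TB (~{ska_tb_day / 1e6:.2f} EB)")
print(f"SKA vs ECMWF:     {ska_tb_day / ecmwf_tb_day:,.0f}x")
print(f"SKA vs LSST:      {ska_tb_day / lsst_tb_day:,.0f}x")
```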
  14. NVIDIA EGX EDGE SUPERCOMPUTING PLATFORM: industrial-strength cloud-native and AI stack | NGC, Kubernetes, networking, storage, security, CUDA-X, third-party ISVs | Metropolis and DeepStream image processing pipeline: decode, DNN, graphics, encode | powered by NVIDIA CUDA Tensor Core GPUs | secure boot root of trust | cryptographic acceleration for IPsec and TLS | NVMe-oF over TCP and RDMA.
  15. PUTTING AI TO WORK: vertical industry frameworks (Clara, Metropolis, Isaac, Omniverse, Aerial, DRIVE) | world's largest delivery service adopts NVIDIA AI.
  16. NVIDIA EGX Edge Supercomputing Platform
  17. SUPERCOMPUTING CLOUD: supercomputing is hard, cloud HPC is expensive | CPU instance: 48 hours, $152 across Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP (benchmark configurations as listed on slide 6).
  18. SUPERCOMPUTING CLOUD: GPU cloud at 1/7th the cost of CPU cloud | CPU instance: 48 hours, $152 | 1x and 8x GPU instances: 48x faster, 1/7th the cost (benchmark configurations as listed on slide 6).
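The slide gives only the CPU baseline (48 hours, $152) and the two headline ratios; the GPU wall time, total cost, and implied hourly rates below are derived from those figures for illustration:

```python
# Arithmetic behind "48x faster, 1/7th the cost" from the slide.

cpu_hours, cpu_cost = 48.0, 152.0
speedup, cost_ratio = 48.0, 1.0 / 7.0

gpu_hours = cpu_hours / speedup   # 1 hour of wall time
gpu_cost = cpu_cost * cost_ratio  # about $21.70 total

print(f"GPU run: {gpu_hours:.1f} h for about ${gpu_cost:.2f}")
print(f"Implied hourly rates: CPU ${cpu_cost / cpu_hours:.2f}/h, "
      f"GPU ${gpu_cost / gpu_hours:.2f}/h")
```

A higher hourly rate still yields a far lower total bill because the job finishes 48x sooner; that is the economic argument the slide compresses into two ratios.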
  19. ICECUBE OBSERVATORY, DETECTING NEUTRINOS: the largest cloud simulation in history | 50K NVIDIA GPUs in the cloud | 350 PF of simulation for 2 hours produced 5% of annual simulation data | AWS, Microsoft Azure, Google Cloud Platform, distributed across the U.S., Europe, and APAC | multiple generations, one application: events processed per GPU type across V100, P100, P40, T4, M60, K80 | Frank Würthwein, Ph.D., Executive Director, Open Science Grid; Igor Sfiligoi, Lead Developer and Researcher.
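Extrapolating from the figures this slide quotes (50,000 GPUs, a 2-hour burst, 5% of the annual simulation data) gives a sense of the total budget. The annual figure below is derived, not stated in the talk:

```python
# Implied annual simulation budget from the IceCube burst on the slide.

gpus, burst_hours, fraction_of_year = 50_000, 2, 0.05

burst_gpu_hours = gpus * burst_hours                   # 100,000 GPU-hours
annual_gpu_hours = burst_gpu_hours / fraction_of_year  # ~2,000,000 GPU-hours

print(f"Burst:            {burst_gpu_hours:,} GPU-hours")
print(f"Implied per year: {annual_gpu_hours:,.0f} GPU-hours")
```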
  20. ANNOUNCING WORLD'S LARGEST ON-DEMAND SUPERCOMPUTER: up to 800 V100 GPUs connected via Mellanox InfiniBand.
  21. DIVERSE ARM ARCHITECTURES: Ampere Computing eMAG (hyperscale and storage) | Amazon Graviton (hyperscale and SmartNIC) | Marvell ThunderX2 (hyperscale, storage, and HPC) | Fujitsu A64FX (supercomputing) | Huawei Kunpeng 920 (big data and edge).
  22. NVIDIA CUDA ON ARM AT OAK RIDGE NATIONAL LAB. Benchmark application [dataset]: GROMACS [ADH Dodec: Dev Prototype], LAMMPS [LJ 2.5], MILC [Apex Small], NAMD [apoa1_npt_cuda], Quantum Espresso [AUSURF112-jR], Relion [Plasmodium Ribosome], SPECFEM3D [four_material_simple_model], TensorFlow [ResNet-50, batch 256]; CPU node: 2x ThunderX2 9975; GPU node: same CPU node + 2x V100 32 GB PCIe; 1x V100 for GROMACS, MILC, and TensorFlow.
  23. ANNOUNCING NVIDIA HPC FOR ARM: HPC server reference platform | 8 V100 Tensor Core GPUs with NVLink | 4x 100 Gbps Mellanox InfiniBand | systems ranging from supercomputer and hyperscale to edge | CUDA on Arm beta available now. [Diagram: two CPUs, each behind a PCIe switch with a NIC, feeding the GPUs.]
  25. ANNOUNCING NVIDIA HPC FOR ARM: software ecosystem | Applications: COMET, DCA++, GAMERA, GROMACS, IndeX, LAMMPS, LSMS, MATLAB, MILC, NAMD, OptiX, ParaView, Quantum Espresso, Relion, TensorFlow, VMD | Programming models: C++, CUDA, Fortran, OpenACC, Python | Tools and SDKs: Arm Allinea Studio, Bright Computing, CMake, CUDA-GDB, CUPTI, GCC, LLVM, NVCC, PAPI, Perforce TotalView, PGI, Score-P, Singularity, Slurm, TAU.
  27. EXTREME COMPUTE NEEDS EXTREME IO: traditional RDMA between Node A and Node B sustains 50 GB/s, staged through the CPU and system memory on each node. [Diagram: NIC, PCIe switch, CPU, system memory, GPU per node.]
  28. EXTREME COMPUTE NEEDS EXTREME IO: GPUDirect RDMA between Node A and Node B sustains 100 GB/s by moving data NIC-to-GPU, bypassing the CPU and system memory.
  29. EXTREME COMPUTE NEEDS EXTREME IO: traditional storage access is likewise staged through the CPU and system memory at 50 GB/s, shown alongside GPUDirect RDMA at 100 GB/s.
  30. EXTREME COMPUTE NEEDS EXTREME IO: GPUDirect Storage moves data storage-to-GPU at 100 GB/s, shown alongside GPUDirect RDMA at 100 GB/s.
  31. ANNOUNCING NVIDIA MAGNUM IO: acceleration libraries for large-scale HPC and IO | high-bandwidth, low-latency, massive storage access with lower CPU utilization | built on GPUDirect Storage and GPUDirect RDMA at 100 GB/s.
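The GPUDirect slides above quote roughly 50 GB/s for the staged path through CPU and system memory and 100 GB/s for the direct NIC- or storage-to-GPU path. A minimal model of what that means for time on the wire; the 1 TB payload is hypothetical, not a figure from the talk:

```python
# Transfer-time model for the staged vs GPUDirect paths on the slides.

def transfer_seconds(payload_gb: float, gb_per_s: float) -> float:
    """Idealized time to move `payload_gb` at a sustained bandwidth."""
    return payload_gb / gb_per_s

payload_gb = 1_000.0                          # hypothetical 1 TB payload
staged = transfer_seconds(payload_gb, 50.0)   # via CPU and system memory
direct = transfer_seconds(payload_gb, 100.0)  # GPUDirect RDMA / Storage

print(f"Staged: {staged:.0f} s | direct: {direct:.0f} s "
      f"({staged / direct:.0f}x less time on the wire)")
```

The doubled bandwidth halves transfer time, and skipping the system-memory bounce also frees the CPU, which is the "lower CPU utilization" claim in the Magnum IO announcement.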
  32. NVIDIA RAPIDS DATA SCIENCE: open source | multi-GPU and multi-node | up to 100x speed-up | 150K downloads in 1 year | data load and processing times cut from hours to minutes | used by NERSC, ORNL, NASA, SDSC. Stack: Python, Dask, deep learning frameworks, pandas, scikit-learn / XGBoost over RAPIDS (cuDF, cuGraph, cuML), cuDNN, Apache Arrow, CUDA.
  33. NVIDIA MAGNUM IO BOOSTS RAPIDS DATA ANALYTICS: 20x on TPC-H | structural biology: 3x VMD | new Pangeo xarray Zarr reader for climate. [Chart: Q4 TPC-H benchmark work breakdown with repeated query, latency in msec with and without GDS: CUDA startup, GPU and CPU allocation, data preload, warmup query, repeat query, clean up, driver close.]
  34. ANNOUNCING WORLD'S LARGEST INTERACTIVE VOLUME VISUALIZATION: simulating a Mars lander with FUN3D | interactively visualizing a 150 TB unstructured mesh | 4 NVIDIA DGX-2 systems streaming 400 GB/s | NVIDIA Magnum IO | NVIDIA IndeX.
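A quick check on the two figures this slide pairs: 400 GB/s of aggregate streaming against a 150 TB mesh. The per-pass time is derived, not stated, and suggests interactivity comes from streaming the visible working set rather than preloading the whole dataset:

```python
# Time for one full pass over the 150 TB mesh at the quoted 400 GB/s.

dataset_gb = 150_000   # 150 TB unstructured mesh
stream_gb_s = 400      # aggregate streaming rate across 4 DGX-2 systems

full_pass_s = dataset_gb / stream_gb_s   # 375 s, about 6.25 minutes

print(f"Full pass: {full_pass_s:.0f} s (~{full_pass_s / 60:.1f} min)")
```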
  35. ANNOUNCING NVIDIA DGX-2 AS SUPERCOMPUTING ANALYTICS INSTRUMENT: 16 V100 GPUs, 2 PF Tensor Core | 512 GB HBM2 at 16 TB/s | 8 Mellanox CX5 NICs, 800 Gbps | 30 TB NVMe at 53 GB/s with Magnum IO | fabric storage at 100 GB/s with Magnum IO | 2.3x faster than current IO500 10-node leader, powered by NVIDIA Magnum IO | Workloads: extreme weather AI inference (NVIDIA TensorRT), 3D volume analytics (Pangeo xarray), VMD computational microscope (NVIDIA OptiX), 3D interactive volume rendering (NVIDIA IndeX), TPC-H record 10 TB join (NVIDIA RAPIDS).
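Dividing the DGX-2 aggregates this slide quotes by the device counts gives per-device averages. The aggregates are from the slide; the per-device numbers are derived for illustration:

```python
# Per-device averages from the DGX-2 aggregates quoted on the slide.

gpus, hbm_gb, hbm_tb_s = 16, 512, 16   # 16 V100s, 512 GB HBM2, 16 TB/s
nics, net_gbps = 8, 800                # 8 Mellanox CX5 NICs, 800 Gbps

print(f"Per GPU: {hbm_gb // gpus} GB HBM2 at {hbm_tb_s / gpus:.0f} TB/s")
print(f"Per NIC: {net_gbps // nics} Gbps")
```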
  36. THE EXPANDING UNIVERSE OF HPC: network, edge, analytics, simulation, extreme IO | NVIDIA HPC for Arm | NVIDIA EGX Edge Supercomputing Platform | NVIDIA DGX-2 Supercomputing Analytics Instrument | NVIDIA Magnum IO | NGC on Azure.
