We cover the IBM solution for HPC. In addition to the hardware and software stack, we show how a rational choice of compilation and runtime parameters helps to significantly improve the performance of technical computing applications.
1. IBM POWER8 as an HPC platform
Alexander Pozdneev, Georgy Pavlov
IBM
October 23, 2015 — IBM Linux on Power: Platform News
© 2015 IBM Corporation
2. What is HPC?
• HPC — High Performance Computing
• A.k.a. technical computing
• Aeroacoustics: Effects of chevrons on jet noise
• Supersonic jet engine noise computational fluid dynamics simulation
• 128k Blue Gene/P cores — ≈ 100 hours
• 1M Blue Gene/Q cores — ≈ 12 hours
http://youtu.be/cjoz5tncRUs http://youtu.be/uxT-VmY3OWc
3. Secrets of the Dark Universe
• Cosmology: The evolution of the Universe simulation
• Understanding the physics of the dark matter and energy
• 1 BG/Q rack — 68B particles
• 32 BG/Q racks — 1.1T particles
http://www.youtube.com/watch?v=tdv8yrJk4VE http://www.youtube.com/watch?v=-S-T_iTiAxQ
4. Real-time modeling of human heart ventricles
• Physiology: Simulation of drug-induced arrhythmias
• Resolution — 0.1 mm
• 768k Blue Gene/Q cores
• 43% of peak performance
• http://dl.acm.org/citation.cfm?id=2388999
• LLNL, IBM Research, IBM Research Collaboratory for Life Sciences
5. Modelling a complete human viral pathogen: poliovirus
• Molecular biology: Reconstruction and simulation of poliovirus
• Antiviral drugs, virus infection, modelling related viruses
• 3.3M–3.7M atoms
• Blue Gene/Q, Victorian Life Sciences Computing Initiative
• http://www.youtube.com/watch?v=Nih0Qa673FY
7. Outline
1 Data centric computing as a new HPC paradigm
2 Architecture of IBM HPC systems based on POWER8+NVIDIA servers
3 Software stack of IBM HPC systems
4 IBM HPC mathematical libraries
5 Measuring efficiency of an HPC system on real applications
6 Summary
9. Data centric computing as a new HPC paradigm
• It is all about moving data around
• Memory bandwidth
• Memory latency
• High ratio of memory access operations to computations
• Number of FLOPs (floating-point operations) per cycle is no longer relevant
• Offloading computing to memory (Active Memory Cube, built on Micron's Hybrid Memory Cube)
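The "memory access operations" to "computations" ratio can be made concrete with the roofline model, a standard performance model that is not part of the original slides: attainable performance is bounded by the smaller of the compute peak and memory bandwidth × arithmetic intensity. Below is a minimal C sketch, assuming a STREAM-like triad kernel and counting only compulsory memory traffic; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* STREAM-like triad: a[i] = b[i] + s * c[i].
 * Per element: 2 floating-point operations (multiply + add) and
 * 3 doubles of memory traffic (load b, load c, store a) = 24 bytes.
 * Assumption: caches and write-allocate traffic are ignored. */
void triad(double *a, const double *b, const double *c, double s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}

/* Arithmetic intensity: FLOPs per byte moved to/from memory. */
double arithmetic_intensity(double flops, double bytes)
{
    assert(bytes > 0.0);
    return flops / bytes;
}

/* Roofline bound in GFLOP/s: a kernel cannot run faster than the
 * compute peak, nor faster than memory can deliver operands at the
 * given intensity (bandwidth in GB/s times FLOPs per byte). */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double intensity)
{
    double memory_bound = bandwidth_gbs * intensity;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

For example, with a hypothetical peak of 500 GFLOP/s and 200 GB/s of memory bandwidth, the triad's intensity of 2/24 ≈ 0.083 FLOP/byte caps it at about 16.7 GFLOP/s, far below peak, so bandwidth rather than FLOPs per cycle decides its speed.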
13. Overview of IBM Power System S822LC
Power S822LC model 8335-GTA
• POWER8 processor module:
8-core, 3.32 GHz
10-core, 2.92 GHz
• Two sockets
• Graphics processing units
Two NVIDIA K80 GPUs
• Eight memory slots
• 2U height
15. System software
Software stack of IBM HPC systems
• System software
Operating system: Linux, bare-metal (no virtualization)
Drivers:
• Mellanox InfiniBand OFED
• NVIDIA
Deployment: xCAT
• Parallel operating environment
IBM Parallel Environment Runtime Edition (PE RTE)
Workload scheduler: IBM Platform LSF
Parallel filesystem: IBM Spectrum Scale (“GPFS”, General Parallel File System)
16. Development tools
Software stack of IBM HPC systems
• Compilers
IBM XL C/C++/Fortran compilers
IBM Advance Toolchain, http://ibm.co/AdvanceToolchain
• Fork of GNU compiler/tools optimized for POWER8
• gcc, g++, gfortran
• Analysis tools (oprofile, valgrind, itrace)
Vanilla GCC, binutils, etc.
CUDA Toolkit
• IBM Parallel Environment Developer Edition (PE DE)
• IBM Software Development Kit for Linux on Power
• Mathematical libraries
Mathematical Acceleration Subsystem (MASS)
IBM Engineering and Scientific Subroutine Library (ESSL)
IBM Parallel ESSL
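Because the same sources are often built with both IBM XL and GCC (vanilla or Advance Toolchain), it can help to record which toolchain and target produced a binary. Below is a minimal sketch based on predefined macros: `__ibmxl__`, `__IBMC__`, `__IBMCPP__` and `__xlc__` for XL; `__clang__`; `__GNUC__`; and `__powerpc64__` plus `_ARCH_PWR8` for the target (set by `-mcpu=power8` in GCC or `-qarch=pwr8` in XL). XL is tested first because XL on Linux also defines `__GNUC__`. The function names are hypothetical; check the macro list against the documentation of the exact compiler version in use.

```c
#include <assert.h>
#include <stdio.h>

/* Compiler family that built this translation unit. XL is checked
 * first because XL on Linux also defines __GNUC__ for compatibility
 * (and its Clang-based front end may define __clang__). */
const char *compiler_family(void)
{
#if defined(__ibmxl__) || defined(__IBMC__) || defined(__IBMCPP__) || defined(__xlc__)
    return "IBM XL";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC (vanilla or Advance Toolchain)";
#else
    return "unknown";
#endif
}

/* 1 if compiled for 64-bit POWER with POWER8 instructions enabled. */
int built_for_power8(void)
{
#if defined(__powerpc64__) && defined(_ARCH_PWR8)
    return 1;
#else
    return 0;
#endif
}

/* One-line build summary, e.g. for benchmark logs. */
void print_build_info(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "compiler: %s, POWER8 target: %s\n",
            compiler_family(), built_for_power8() ? "yes" : "no");
}
```

Printing this line at the start of every benchmark run makes it easy to tell XL and GCC results apart later.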
17. Engineering and Scientific Subroutine Library
• High-performance mathematical functions
Scientific applications
Engineering applications
• Platforms
IBM POWER servers
IBM POWER clusters
• Libraries
ESSL Serial and SMP (Symmetric Multi-Processing): 600+ subroutines
Parallel ESSL: 125+ subroutines
• Languages:
C
C++
Fortran
http://www.ibm.com/systems/power/software/essl
18. ESSL: Industry de facto standards
• ESSL implements the following interfaces:
BLAS (linear algebra)
LAPACK (linear algebra)
FFTW (Fourier transformation), http://fftw.org
• Parallel ESSL implements the following interfaces:
ScaLAPACK
• Easy migration
• Just recompile!
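Since ESSL implements the standard BLAS interface, code written against DGEMM semantics (C = alpha·A·B + beta·C on column-major arrays with leading dimensions) should port with little more than a rebuild. The sketch below is a naive reference loop with a hypothetical name, `ref_dgemm_nn`, meant only to spell out those semantics for the no-transpose case; a real application would call the optimized `dgemm` from ESSL instead.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major (Fortran/BLAS) layout: element (i, j) of a matrix with
 * leading dimension ld is stored at index i + j * ld. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Naive reference for DGEMM with no transposes:
 * C (m x n) = alpha * A (m x k) * B (k x n) + beta * C.
 * Hypothetical helper for illustration, not the ESSL routine.
 * As in the reference BLAS, C is not read when beta == 0. */
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double *a, int lda, const double *b, int ldb,
                  double beta, double *c, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[IDX(i, p, lda)] * b[IDX(p, j, ldb)];
            double prev = (beta == 0.0) ? 0.0 : beta * c[IDX(i, j, ldc)];
            c[IDX(i, j, ldc)] = alpha * sum + prev;
        }
    }
}
```

The leading-dimension arguments are what let the same call operate on a sub-block of a larger array, which is also how LAPACK and ScaLAPACK drive DGEMM internally.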
19. What mathematical areas are supported?
ESSL
• Linear algebra subprograms
• Matrix operations
• Linear algebraic equations
• Eigensystems analysis
• Fourier transforms, convolution, correlation, ...
• Sorting and searching
• Interpolation
• Numerical quadrature
• Random number generation
Parallel ESSL
• BLACS
• Level 2 parallel BLAS
• Level 3 parallel BLAS
• Linear algebraic equations
• Eigensystems analysis
• Fourier transforms
• Random number generation
20. How to leverage the hardware?
Symmetric multiprocessing:
• Multiple hardware threads
• Multiple cores
POWER8+NVIDIA:
• Use multiple GPUs
• Select which GPU to use
• Run ESSL in a hybrid mode
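For the SMP side, here is a minimal OpenMP sketch: a dot product whose partial sums are spread over cores and hardware threads, with the thread count taken from `OMP_NUM_THREADS`. It assumes an OpenMP-enabled build (for example `-qsmp=omp` with XL or `-fopenmp` with GCC). Without OpenMP the pragma is ignored and the loop runs serially. GPU selection and ESSL's hybrid mode are configured through ESSL and CUDA settings not shown here. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Dot product split across OpenMP threads; the reduction clause gives
 * each thread a private partial sum and combines them at the end.
 * Note: summation order, and therefore rounding, may change with the
 * number of threads. */
double parallel_dot(const double *x, const double *y, long n)
{
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Threads an OpenMP parallel region would use (1 without OpenMP). */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Running the same binary with `OMP_NUM_THREADS` set to the number of cores, and then to a multiple of it, shows whether the extra hardware threads of each core pay off for a given kernel.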
21. Synthetic benchmarks vs. real apps
Measuring car pollution in official tests?
• You get low toxic nitrogen oxides in a lab environment
• You cannot predict how much smoke you produce unless you test your own scenarios
• You would take a test drive before buying a car
23. Importance of thread affinity
NAS Parallel Benchmarks, mg.C (peaks at SMT1), 20 cores
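To see where threads actually land when experimenting with affinity, each OpenMP thread can report the logical CPU it is running on. Below is a minimal Linux-only sketch, assuming glibc's `sched_getcpu()`. The binding itself would be requested through standard OpenMP variables such as `OMP_PROC_BIND=close` and `OMP_PLACES=cores`, which are illustrative choices rather than the settings behind the mg.C results. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#ifdef __linux__
#include <sched.h>
#endif
#ifdef _OPENMP
#include <omp.h>
#endif

/* Logical CPU the calling thread is running on, or -1 if unknown. */
int current_cpu(void)
{
#ifdef __linux__
    return sched_getcpu();
#else
    return -1;
#endif
}

/* Stores in cpus[t] the logical CPU seen by OpenMP thread t and
 * returns how many threads reported (at most max_threads). */
int record_thread_placement(int *cpus, int max_threads)
{
    assert(cpus != NULL && max_threads > 0);
    int reported = 0;
    #pragma omp parallel
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0;
        int nthreads = 1;
#endif
        if (tid < max_threads)
            cpus[tid] = current_cpu();
        #pragma omp single
        reported = nthreads < max_threads ? nthreads : max_threads;
    }
    return reported;
}

void print_thread_placement(FILE *out, const int *cpus, int n)
{
    for (int t = 0; t < n; ++t)
        fprintf(out, "thread %d -> cpu %d\n", t, cpus[t]);
}
```

Comparing the printed placement with and without `OMP_PROC_BIND` set shows whether threads migrate or pile up on the same core.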
24. Choice of compilation parameters: -O5 -qnohot
NAS Parallel Benchmarks, bt.C, affinity, baseline: -O3 -qhot
25. Compilation parameters: -O3 -qhot, -O5 -qprefetch
NAS Parallel Benchmarks, mg.C, affinity, baseline: -O5 -qnohot
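`-qhot` turns on XL's high-order loop analysis and transformations, such as loop interchange, fusion and unrolling. The hand-written sketch below (not compiler output; function names hypothetical) shows the kind of rewrite involved. It sums the columns of a row-major C array first with a strided inner loop, then again after loop interchange, where the inner loop has unit stride. Each column is still summed in the same order, so both versions give bit-identical results. Whether the compiler performs such a transformation on its own depends on the code and the options, which is a reason to measure both option sets rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* Column sums of a row-major rows x cols matrix.
 * Strided version: the inner loop walks down a column, so consecutive
 * accesses are cols * sizeof(double) bytes apart. */
void column_sums_strided(const double *m, size_t rows, size_t cols,
                         double *colsum)
{
    assert(m != NULL && colsum != NULL);
    for (size_t j = 0; j < cols; ++j) {
        colsum[j] = 0.0;
        for (size_t i = 0; i < rows; ++i)
            colsum[j] += m[i * cols + j];
    }
}

/* Same computation after loop interchange: the inner loop walks a row
 * with unit stride, which suits caches and hardware prefetching. */
void column_sums_interchanged(const double *m, size_t rows, size_t cols,
                              double *colsum)
{
    assert(m != NULL && colsum != NULL);
    for (size_t j = 0; j < cols; ++j)
        colsum[j] = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            colsum[j] += m[i * cols + j];
}
```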
26. Choice of an SMT mode: SMT1
NAS Parallel Benchmarks, mg.C, affinity, baseline: SMT8
27. Choice of an SMT mode: SMT2, SMT4
NAS Parallel Benchmarks, bt.C, affinity, baseline: SMT1
28. Choice of an SMT mode: SMT8
NAS Parallel Benchmarks, cg.C, affinity, baseline: SMT1
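When switching between SMT1, SMT2, SMT4 and SMT8, the thread count and the places that threads are bound to have to change together. Below is a minimal sketch, assuming the usual Linux on POWER8 numbering in which core c owns logical CPUs 8c to 8c+7 (the SMT mode itself is typically set by root with `ppc64_cpu --smt=N`). It builds an `OMP_PLACES` value with one place per hardware thread, to be paired with `OMP_NUM_THREADS` = cores × SMT and `OMP_PROC_BIND=close`. The helper name is hypothetical; verify the actual CPU numbering with `lscpu` before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define POWER8_MAX_SMT 8 /* hardware threads per POWER8 core */

/* Writes an OMP_PLACES string such as "{0},{1},{8},{9}" into buf:
 * one place per hardware thread of the first `cores` cores at the
 * given SMT level. Returns the number of places, or -1 for an invalid
 * SMT level/core count or if buf is too small. */
int build_omp_places(int cores, int smt, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    if (cores <= 0 || !(smt == 1 || smt == 2 || smt == 4 || smt == 8))
        return -1;
    size_t used = 0;
    for (int c = 0; c < cores; ++c) {
        for (int t = 0; t < smt; ++t) {
            int cpu = c * POWER8_MAX_SMT + t; /* assumed numbering */
            int n = snprintf(buf + used, size - used, "%s{%d}",
                             used == 0 ? "" : ",", cpu);
            if (n < 0 || (size_t)n >= size - used)
                return -1; /* output would be truncated */
            used += (size_t)n;
        }
    }
    return cores * smt;
}
```

For example, 20 cores at SMT2 would give 40 places and `OMP_NUM_THREADS=40`, so each thread sits on its own hardware thread and no core runs more threads than its SMT mode provides.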
29. Summary
• Technical computing problems ⇒ need for HPC
• Data centric computing as a new HPC paradigm
• CORAL project
• IBM Power System S822LC model 8335-GTA
• IBM HPC Software stack
• High performance math libraries
• Leveraging performance
31. Further reads
• XL C/C++ for Linux,
http://www.ibm.com/support/knowledgecenter/SSXVZZ/
• XL Fortran for Linux,
http://www.ibm.com/support/knowledgecenter/SSAT4T/
• XL C/C++ for Linux 13.1.2 Optimization and Programming Guide,
http://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.2/com.ibm.xlcpp1312.lelinux.doc/proguide/optimization.html
• XL Fortran for Linux 15.1.2 Optimization and Programming Guide,
http://www.ibm.com/support/knowledgecenter/SSAT4T_15.1.2/com.ibm.xlf1512.lelinux.doc/proguide/optimization.html
32. Relevance of LINPACK
• Based on DGEMM()
• 80–90% of peak performance
• Commercial deployment verification test for large systems
• Proprietary binary files run by the installation team
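LINPACK results are usually quoted as a fraction of theoretical peak. Below is a minimal sketch of that arithmetic, assuming the operation count credited by the HPL benchmark, 2/3·N³ + 3/2·N², and a theoretical peak of cores × clock × FLOPs per cycle. The FLOPs-per-cycle value and the achieved rate in the usage note are placeholders to be replaced with the figures for the actual system. The function names are hypothetical.

```c
#include <assert.h>

/* Floating-point operations credited by HPL for solving a dense
 * N x N system: 2/3 N^3 + 3/2 N^2. */
double hpl_flops(double n)
{
    assert(n > 0.0);
    return (2.0 / 3.0) * n * n * n + 1.5 * n * n;
}

/* Achieved rate in GFLOP/s for a run taking `seconds` of wall time. */
double hpl_gflops(double n, double seconds)
{
    assert(seconds > 0.0);
    return hpl_flops(n) / seconds / 1e9;
}

/* Theoretical peak in GFLOP/s: cores * clock (GHz) * FLOPs per cycle
 * per core; the last factor depends on the core's vector/FMA units. */
double peak_gflops(int cores, double ghz, double flops_per_cycle)
{
    assert(cores > 0 && ghz > 0.0 && flops_per_cycle > 0.0);
    return cores * ghz * flops_per_cycle;
}

/* Fraction of peak achieved (Rmax / Rpeak). */
double hpl_efficiency(double achieved_gflops, double peak)
{
    assert(peak > 0.0);
    return achieved_gflops / peak;
}
```

For example, 20 cores at 2.92 GHz with an assumed 8 FLOPs per cycle give a peak of 467.2 GFLOP/s; a run reaching 400 GFLOP/s would then sit at about 86% of peak, inside the 80–90% range quoted above.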
34. Benchmarking methodology options
1. Take one core for the initial tuning
Try SMT1, SMT2, SMT4, SMT8 (number of threads + affinity + -qtune=pwr8:XXX)
Try different optimization options (-O3, -O4, . . . )
2. Choose SMT-mode and compiler options that provide the best timing
3. Take one core as a baseline
Run on 1–5 cores (within one chip)
Run on 5 and 10 cores (within one socket)
Run on 10 and 20 cores (across both sockets)
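Steps 2 and 3 reduce to simple bookkeeping: pick the configuration with the lowest time, then express multi-core runs relative to the single-core baseline. Below is a minimal sketch with hypothetical names (`struct run_config`, `best_config`), using the standard definitions of speedup S = T1/Tp and parallel efficiency E = S/p. The timings in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>

/* One single-core tuning experiment: SMT level, compiler options
 * (as a label) and measured wall time in seconds. */
struct run_config {
    int smt;             /* 1, 2, 4 or 8 */
    const char *options; /* e.g. "-O3 -qhot" */
    double seconds;
};

/* Index of the fastest configuration (step 2). */
size_t best_config(const struct run_config *runs, size_t n)
{
    assert(runs != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (runs[i].seconds < runs[best].seconds)
            best = i;
    return best;
}

/* Speedup over the one-core baseline (step 3): S = T1 / Tp. */
double speedup(double t_one_core, double t_p_cores)
{
    assert(t_one_core > 0.0 && t_p_cores > 0.0);
    return t_one_core / t_p_cores;
}

/* Parallel efficiency E = S / p; 1.0 means perfect scaling. */
double parallel_efficiency(double t_one_core, double t_p_cores, int cores)
{
    assert(cores > 0);
    return speedup(t_one_core, t_p_cores) / cores;
}
```

For example, if the best single-core time is 70 s and five cores within one chip take 14 s, the speedup is 5 and the efficiency 1.0. A drop in efficiency when moving from one chip to one socket, or to both sockets, may point at memory bandwidth rather than compute as the limit.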
35. POWER8 features
• Eight threads per core
Hide memory latency (like a GPU, Graphics Processing Unit)
Each thread can follow an arbitrary instruction flow (unlike a GPU)
• Memory bandwidth
• There is no sense in benchmarking only one thread per core (just as with a GPU)
• Scalability within a core depends only on the application
• Advanced features to try
Transactional memory
Relaxed memory model
Decimal floating point unit
36. Disclaimer
All the information, representations, statements, opinions and proposals in this
document are correct and accurate to the best of our present knowledge but are
not intended (and should not be taken) to be contractually binding unless and
until they become the subject of separate, specific agreement between us.
Any IBM Machines provided are subject to the Statements of Limited Warranty
accompanying the applicable Machine.
Any IBM Program Products provided are subject to their applicable license terms.
Nothing herein, in whole or in part, shall be deemed to constitute a warranty.
IBM products are subject to withdrawal from marketing and/or service upon
notice, and changes to product configurations, or follow-on products, may result
in price changes.
Any references in this document to “partner” or “partnership” do not constitute or
imply a partnership in the sense of the Partnership Act 1890.
IBM is not responsible for printing errors in this proposal that result in pricing or
information inaccuracies.
37. Legal information
IBM, the IBM logo, BladeCenter, System Storage and System x are trademarks of International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM trademarks, see the website www.ibm.com/legal/copytrade.shtml.
Other company, product and service names may be trademarks or service marks of other companies.
© 2015 International Business Machines Corporation. All rights reserved.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which it operates; information about the availability of products or services may change without notice. For the latest information on IBM products and services available in your region, contact your nearest IBM sales office or an authorized business partner.
All statements regarding IBM's future intentions and plans are subject to change without notice.
Information about non-IBM products was obtained from the manufacturers of those products or from their published announcements. IBM has not tested these products and cannot confirm the performance, compatibility or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information may contain technical inaccuracies or typographical errors. Changes may be made to the information in this publication; these changes will be incorporated in new editions of the publication. IBM may make changes to the products or services described in this publication at any time without notice.
Any references to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product.