OpenPOWER/POWER9 Webinar from MIT and IBM
1. IBM AI Solutions on Power – Key Features
OpenPOWER Webinar
Clarisse Taaffe-Hedglin
clarisse@us.ibm.com
Executive HPC/AI Architect
Client Experience Centers
IBM Systems
2. Agenda
Systems Designed for "Faster Time to Results"
Researchers and Universities Adopting Power9
IBM Machine Learning and Deep Learning solutions
3. Today’s challenges demand innovation
Data holds competitive value. Full system and stack open innovation required.
[Charts: data growth from 2010 to 2022, reaching 44 zettabytes and dominated by unstructured rather than structured data; price/performance over 2000–2020 flattening as Moore's Law and processor technology push the limits of chip technology. Open innovation is needed across firmware/OS, accelerators, software, storage, and network.]
4. Post-Moore's Law Computing: Programming Models
Move data manipulation to memory
• More efficient to manage data in
memory
• Compute devices can be made
simpler and more efficient
But this requires a consistent (location- and device-independent) model to address data.
Implement fundamental new data security models
• Supports simpler units to compute on data
• Essential for Enterprise.
Embed AI to add intelligence into the system
• Target hardware enabled data and compute
placement
• Identifies patterns and allows automatic tuning for those patterns.
• Supports higher level of abstraction in
programming
8. In collaboration with the White House Office of Science and Technology Policy, the U.S. Department of Energy, and many others, IBM is helping launch the COVID-19 High Performance Computing Consortium, which brings an unprecedented amount of computing power (16 systems with more than 330 petaflops, 775,000 CPU cores, 34,000 GPUs, and counting) to help researchers everywhere better understand COVID-19, its treatments, and potential cures.
The compound, shown in gray, was calculated to bind to the
SARS-CoV-2 spike protein, shown in cyan, to prevent it from
docking to the Human Angiotensin-Converting Enzyme 2, or
ACE2, receptor, shown in purple. Credit: Micholas Smith/Oak
Ridge National Laboratory, U.S. Dept. of Energy
Summit in the News
9. Battling the Pandemic with an Accelerated Data and AI Platform
• Genomics
• Molecular Simulations
• Medical Image Processing
• Cryogenic Electron Microscopy
• Natural Language Processing
10. IBM Power University Adopters in the US.
11. IBM’s Global Research Capability
[World map: IBM Research's global capability, 3,000+ researchers across labs in Almaden, Austin, Watson, Brazil, Ireland, Zurich, Haifa, Africa, India, China, Tokyo, and Australia. Focus areas include core AI capabilities, quantum computing, exascale and POWER, cloud & IoT, edge computing, big data and analytics, blockchain, security, nanotechnology, nanomaterials, neurosynaptics, cognitive robotics, mobile, and industry solutions spanning healthcare/life sciences, government, financial services, insurance analytics, oil & gas, fashion, education & skilling, accessibility, and aging.]
13. OpenPOWER Collaboration to Build Optimized AI Servers
IBM Power Systems S822LC for
High Performance Computing
• Up to 5.2 Tflops/GPU
• Stacked Memory for increased
BW, capacity & energy efficiency
• Enhanced Unified Memory
• Up to 12 SMT8 cores
• CAPI Acceleration
• Adaptive Power Management
• 100 Gb/s EDR
• In-Network Computing with SHARP
• Adaptive Routing
• Native RDMA
• NVMe over Fabrics offload
• PCIe Gen 4
• CAPI v2 for fast virtual RDMA
support
• Hardware Tag Matching (automates point-to-point communication)
• MPI rendezvous protocol offload
• Precision time protocol support
• Up to 24 SMT4 cores
• CAPI v2, PCIe Gen 4
• Superior Core Performance
• Up to 7.8 TF/GPU
• Next Generation High
Bandwidth Memory
• Memory coherency
• Billion Cell Reservoir Simulation in
record time (92 mins vs 20 hours)
• ResNet-50 90-epoch training in
lowest time (7 hours vs 10 days) with
highest accuracy (33.8% vs 29.8%)
IBM Power Systems
POWER9 server for HPC & AI
NVLink-1 5x faster than
PCIe Gen 3
NVLink-2 7-10x faster
than PCIe Gen 3
GPU can access CPU’s
page tables
[Diagram: IBM Power Systems AC922 node, a POWER9 CPU with 170 GB/s DDR4 memory bandwidth connected to three NVIDIA Tesla V100 GPUs over NVLink at 100 GB/s per link.]
IBM Power Systems AC922
14. IBM POWER9 Family
When data-intensive workloads are the bottom line
• Mission Critical: S922/S914/S924 and H922/H924/L922 (Entry), E950/H950 (Midsize), E980/H980 (Enterprise)
• Data-Intensive Workloads for Private Clouds (Big Data Workloads): LC922/LC921/IC922
• Enterprise AI Workloads (HPC-AI Systems): AC922/IC922
17. Store Large Models & Datasets in System Memory, Transfer One Layer at a Time to GPU
[Diagram: x86 server with 100 GB/s CPU memory bandwidth and PCIe Gen3 links to four GPUs vs. IBM AC922 Power9 server with 170 GB/s CPU memory bandwidth and 150 GB/s NVLink to four GPUs. CPU-GPU NVLink is 5x faster than Intel x86 PCIe Gen3.]
500 iterations of an enlarged GoogLeNet model on an enlarged ImageNet dataset (2240x2240), mini-batch size = 15; both servers with 4 NVIDIA V100 GPUs.
4.7x Faster
Large Model Support (LMS) enables higher accuracy via larger models
18. TensorFlow Large Model Support Example
3D U-Net segmentation models with higher-resolution images allow for learning and labeling finer details and structures of brain tumors.
https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/
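The layer-at-a-time staging described above can be sketched in a few lines of dependency-free Python. This is a toy simulation of the LMS idea (all weights kept in system memory, only the active layer resident on the GPU), not the TFLMS implementation; every name in it is made up for illustration:

```python
# Toy simulation of Large Model Support: hold every layer's weights in
# system memory and stage only the active layer into a one-layer GPU budget.

HOST_MEM = {}    # stands in for large system RAM
DEVICE_MEM = {}  # stands in for limited GPU RAM

def load_model(num_layers, layer_size):
    for i in range(num_layers):
        HOST_MEM[i] = [0.01 * i] * layer_size  # dummy weights

def stage_layer(i):
    """Evict whatever is resident, then copy layer i host -> device."""
    DEVICE_MEM.clear()
    DEVICE_MEM[i] = HOST_MEM[i]

def forward(x, num_layers):
    for i in range(num_layers):
        stage_layer(i)                 # the NVLink transfer in the slides
        w = DEVICE_MEM[i]
        x = x + sum(w) / len(w)        # stand-in for the layer's compute
        assert len(DEVICE_MEM) == 1    # never more than one layer resident
    return x

load_model(num_layers=4, layer_size=8)
print(round(forward(1.0, 4), 2))       # -> 1.06
```

The faster the CPU-GPU link, the cheaper each `stage_layer` step becomes, which is what the 4.7x AC922 comparison above is measuring.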
19. Enterprise AI Hardware Portfolio
• DATA – IBM Power IC922: storage-dense server
• TRAIN – IBM Power AC922: powering the fastest supercomputer
• INFERENCE – IBM Power IC922: deploy AI into production
• NVMe dense server with IO rich
architecture for superior throughput1
• Enterprise ready cloud deployment
with RH OpenShift and Power
Systems reliability
• 2.35x superior price/performance for
containerized cloud deployments
• Best training platform with 4x faster
model iteration
• ~6x data throughput with NVLink
to GPUs
• Synergistic HW/SW offerings for ease
of use and leadership performance
• Superior density (33%) and throughput vs. inference accelerators
• Open design for accelerator diversity
• Deploy inference at scale with HW
and SW solution offerings
20. Designed for the AI Era
Architected for the modern
analytics and AI workloads that
fuel insights
An Acceleration Superhighway
Unleash state of the art IO and
accelerated computing potential in
the post “CPU-only” era
Delivering Enterprise-Class AI
Flatten the time to AI value curve
by accelerating the journey to build,
train, and infer deep neural networks
IBM POWER SYSTEMS AC922
Realize unprecedented performance and application gains with POWER9-based solutions
21. IC922 for DATA: NVMe and PCIe Gen4 capability, designed to be the fastest compute and data server available
• Balanced storage, network, and memory design for optimized storage-rich solutions
• 33% more bandwidth (340 GB/s DDR BW on IC922 vs. 255 GB/s BW on x86)
• Better memory capacity with 32 DDR4 RDIMM slots (competition needs larger, higher-cost DIMMs)
• Rich storage capacity – up to 24 SAS/SATA or
NVMe1 drives in 2U form factor
• Total 10 PCIe slots – PCIe Gen4 slots
available to support high speed network
connectivity
• 2x throughput capability for high
performance tiers
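The 33% figure on this slide follows directly from the two bandwidth numbers quoted; a quick check:

```python
# Verify the slide's memory-bandwidth claim: 340 GB/s (IC922, 32 DDR4
# RDIMM slots) vs. 255 GB/s (x86 comparison) is a one-third uplift.
ic922_bw_gbps = 340
x86_bw_gbps = 255
uplift = ic922_bw_gbps / x86_bw_gbps - 1
print(f"{uplift:.1%}")  # -> 33.3%
```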
22. IC922 for INFERENCING: Deploy AI into Production with IBM's End-to-End Solution for AI
• Open design for accelerator flexibility and future readiness
• Purpose built to support accelerator diversity (GPU, FPGA,
ASIC)
• Future ready with PCIe Gen4 today to accommodate
new adapters
• Accelerator density in 2U Form Factor
• Up to 8 accelerators1 – can drive 6 of the 8 accelerators at
full bandwidth vs. competition, which can only support 6
and drive 4 at full bandwidth
• Near-linear scaling across all GPUs for key inference
workloads – image classification, object detection,
recommender,
and machine translation
• Better TCO for inferencing – more throughput/density per server means 25% fewer servers for the same work versus competition, reducing associated power/cooling/space cost
• Optimized hardware with AI software stack
• Up to 160 threads - 2x thread throughput
• WML-CE and PowerAI Vision2
• Inferencing software from the WML-A portfolio2
26. Watson Machine Learning family
• Watson Machine Learning Community Edition (WML CE): open source ML frameworks, Large Model Support (LMS), Distributed Deep Learning (DDL – 1000s of nodes), Horovod
• Watson Machine Learning Accelerator (WML-A): Deep Learning Impact (DLI) module for data & model management, ETL, visualization, and advice; IBM Spectrum Conductor with Spark for cluster virtualization, dynamic resource orchestration, multiple frameworks, and a distributed execution engine; auto hyper-parameter tuning
• IBM Visual Insights: auto-DL for images & video – label, train, deploy
• H2O Driverless AI: auto-ML for text & numeric data – import, experiment, deploy; AI for data scientists and non-data scientists
• IBM Video Analytics: process video, utilize models
• Accelerated infrastructure: accelerated servers and storage
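The auto hyper-parameter tuning item above can be illustrated with a minimal tuner in plain Python. Random search stands in here for the smarter strategies the WML-A service offers, and the objective function and parameter ranges are invented for the example:

```python
import random

def objective(lr, batch_size):
    """Hypothetical validation loss for a (lr, batch_size) pair."""
    return (lr - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

def tune(trials=50, seed=7):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_loss, best_cfg = float("inf"), None
    for _ in range(trials):
        cfg = {"lr": rng.uniform(1e-4, 1e-1),
               "batch_size": rng.choice([16, 32, 64, 128, 256])}
        loss = objective(**cfg)
        if loss < best_loss:
            best_loss, best_cfg = loss, cfg
    return best_loss, best_cfg

loss, cfg = tune()
print(cfg["batch_size"], round(loss, 4))
```

A production tuner would run each trial as a distributed training job under the scheduler rather than calling a toy objective inline.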
27. Training for mobile-enabled DL models
1. CREATE a CUSTOM AI model with IBM Visual Insights (formerly PowerAI Vision)
2. Deliver the model via CoreML and the App Store
3. Run INFERENCE on iOS devices with IBM Visual Inspector (the app is not customizable by end users)
A complete "data centre to edge computing" solution: IBM Visual Insights + IBM Visual Inspector
29. IBM Visual Inspector Functions
• Gather data to build the model
• Infer, disconnected or connected
• Remote management of devices and models
• Monitor ongoing production
• Feed data back for retraining / quality
30. Watson ML Accelerator: A Data Science & Enterprise AI Platform
Architecture Overview
• Kubernetes & containers: embracing Kubernetes & containers (OpenShift); on premises, K8s/Docker via OpenShift (Power and x86 support)
• Resource management / resource allocation: advanced Kubernetes scheduling policy engine; Kubernetes namespaces with CPU/GPU resources
• Workload scheduler: advanced workload scheduler – Meta Session Scheduling Daemon (MSD)
• Execution logic: training execution (EDT), hyper-parameter optimization execution (HPO), inference execution (EDI)
• Example frameworks / development tools / 3rd-party support: SnapML
WMLA: an end-to-end enterprise AI platform
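A toy sketch of the resource-allocation layer described here: each tenant namespace carries a CPU/GPU quota, and the scheduler admits jobs only while the quota holds. The class and policy are illustrative only, not the actual MSD internals:

```python
# Minimal namespace-quota admission check, in the spirit of "Kubernetes
# namespace with CPU/GPU resources" plus a workload scheduler.

class Namespace:
    def __init__(self, name, cpu_quota, gpu_quota):
        self.name = name
        self.cpu_free = cpu_quota
        self.gpu_free = gpu_quota

    def admit(self, job):
        """Admit a job only if its CPU/GPU request fits the remaining quota."""
        if job["cpu"] <= self.cpu_free and job["gpu"] <= self.gpu_free:
            self.cpu_free -= job["cpu"]
            self.gpu_free -= job["gpu"]
            return True
        return False

ns = Namespace("team-a", cpu_quota=16, gpu_quota=4)
print(ns.admit({"cpu": 8, "gpu": 2}))  # -> True (fits the quota)
print(ns.admit({"cpu": 8, "gpu": 4}))  # -> False (only 2 GPUs left)
```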
32. Performance Enhancements via GPU Acceleration
Libraries and frameworks:
• TensorFlow, PyTorch
• NVIDIA libraries: math libraries, cuBLAS, NPP
• RAPIDS cuML, cuDF
• AutoML, AutoDL packages
• Distributed: Horovod, DDL
• Snap ML
• ESSL/PESSL

Three application choices, ordered from easiest to best application performance:
• Platform-optimized libraries – advantages: easy to implement, tested and supported; disadvantages: limited, some needs may not be covered
• Programming models supporting directives – advantages: democratized, modify existing programs with directives, compiler assists with mapping to the device
• Programming languages targeting the GPU – advantages: achieves the best performance results; disadvantages: most time-intensive, requires deep expertise
35. Bayesian Optimization – a highly reusable, valuable asset
[Diagram: BOaaS on Power9, with a common "parameters out / results in" loop driving four use cases: accelerating scientific workflows, accelerating HPC ensembles, tuning ML/DL models, and optimising cloud systems and applications.]
BOA components:
• Bayesian Optimization Library – contains fundamental methods and implementations; links to other IP such as PowerAI and Deep Bayesian Networks
• BOA API Server – RESTful API server and experiment database that allows easy access to optimization through PUT and GET actions and catalogues optimization experiments
• BOA SDK – Python (though other languages are possible) library for easy, integrated access to BOA APIs
• BOA UI – web interface for configuring BOA experiments and visualizing progress and analysis
• BOA apps – (web) apps, often written in python-DASH, which present customer-specific interfaces to experiments and APIs
Figure 1: Breakdown of BOA components – the blue zone indicates IP kept by BOA; the yellow zone indicates IP that can be kept by the customer.
• Bayesian optimization allows us to
answer the question ‘Given what I know,
what should I do next for the best result?’
(AKA ‘Intelligent Search’)
• Developed state-of-the-art methods advancing both the efficiency and robustness of Bayesian optimization across many potential applications.
• Potential beneficiaries do not need to understand Bayesian optimization or its state-of-the-art methods; they want to interact with it to derive business value. BOA allows them to do precisely this.
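The "parameters out / results in" loop from the BOA diagram can be sketched in plain Python. A real client would GET suggestions from and PUT results to the BOA API server; here a random-proposal optimizer stands in for the Bayesian surrogate, and the objective function is invented for the example:

```python
import random

class Optimizer:
    """Stand-in for BOA: suggests parameters, catalogues results."""
    def __init__(self, low, high, seed=0):
        self.rng = random.Random(seed)
        self.low, self.high = low, high
        self.history = []            # the experiment catalogue

    def suggest(self):               # "parameters out"
        return self.rng.uniform(self.low, self.high)

    def observe(self, x, y):         # "results in"
        self.history.append((x, y))

    def best(self):
        return min(self.history, key=lambda xy: xy[1])

def experiment(x):                   # customer-owned objective (yellow zone)
    return (x - 3.0) ** 2

opt = Optimizer(low=0.0, high=10.0)
for _ in range(100):
    x = opt.suggest()
    opt.observe(x, experiment(x))

x_best, y_best = opt.best()
print(round(x_best, 2))              # best x found, near the optimum at 3
```

The point of the split in Figure 1 is visible even in this toy: the experiment function stays on the customer side, while the optimizer only ever sees (parameters, result) pairs.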
36. IBM AI Differentiators
Open, multicloud by design
Manage all your data and AI
assets, regardless of origin
AI lifecycle automation
Drive productivity within a unified,
fully governed platform
Pre-built enterprise apps
Speed time-to-value with less
skills required
Proven, prescriptive, trusted
Partner with the leader in applied
enterprise AI