OpenPOWER/POWER9 Webinar from MIT and IBM
1. IBM AI Solutions on Power – Key Features
OpenPOWER Webinar
Clarisse Taaffe-Hedglin
clarisse@us.ibm.com
Executive HPC/AI Architect
Client Experience Centers
IBM Systems
2. Agenda
Systems Designed for "Faster Time to Results"
Researchers and Universities Adopting Power9
IBM Machine Learning and Deep Learning solutions
3. Today’s challenges demand innovation
Data holds competitive value. Full system and stack open innovation required.
[Charts: data growth from 2010 to 2022, reaching 44 zettabytes and dominated by unstructured rather than structured data; price/performance over 2000–2020 flattening as Moore's Law and processor technology push the limits of chip technology. Open innovation is needed across firmware/OS, accelerators, software, storage, and network.]
4. Post-Moore's Law Computing: Programming Models
Move data manipulation to memory
• More efficient to manage data in
memory
• Compute devices can be made
simpler and more efficient
But this requires a consistent (location- and device-independent) model to address data.
Implement fundamental new data security models
• Supports simpler units to compute on data
• Essential for Enterprise.
Embed AI to add intelligence into the system
• Target hardware enabled data and compute
placement
• Identifies patterns and allows automatic tuning for those patterns.
• Supports higher level of abstraction in
programming
8. In collaboration with the White House Office of Science and Technology Policy, the U.S. Department of Energy, and many others, IBM is helping launch the COVID-19 High Performance Computing Consortium, which brings an unprecedented amount of computing power (16 systems with more than 330 petaflops, 775,000 CPU cores, 34,000 GPUs, and counting) to help researchers everywhere better understand COVID-19, its treatments, and potential cures.
The compound, shown in gray, was calculated to bind to the
SARS-CoV-2 spike protein, shown in cyan, to prevent it from
docking to the Human Angiotensin-Converting Enzyme 2, or
ACE2, receptor, shown in purple. Credit: Micholas Smith/Oak
Ridge National Laboratory, U.S. Dept. of Energy
Summit in the News
9. Battling the Pandemic with an Accelerated Data and AI Platform
• Genomics
• Molecular Simulations
• Medical Image Processing
• Cryogenic Electron Microscopy
• Natural Language Processing
10. IBM Power University Adopters in the US.
11. IBM’s Global Research Capability
[World map: IBM Research's global capability, 3,000+ researchers across labs in Almaden, Austin, Watson, Brazil, Ireland, Zurich, Haifa, Africa, India, China, Tokyo, and Australia. Focus areas include core AI capabilities, quantum computing, exascale and POWER, cloud & IoT, edge computing, big data and analytics, blockchain, security, nanotechnology, nanomaterials, neurosynaptics, cognitive robotics, mobile, and industry solutions spanning healthcare/life sciences, government, financial services, insurance analytics, oil & gas, fashion, education & skilling, accessibility, and aging.]
13. OpenPOWER Collaboration to Build Optimized AI Servers
IBM Power Systems S822LC for
High Performance Computing
• Up to 5.2 Tflops/GPU
• Stacked Memory for increased
BW, capacity & energy efficiency
• Enhanced Unified Memory
• Up to 12 SMT8 cores
• CAPI Acceleration
• Adaptive Power Management
• 100 Gb/s EDR
• In-Network Computing with SHARP
• Adaptive Routing
• Native RDMA
• NVMe over Fabrics offload
• PCIe Gen 4
• CAPI v2 for fast virtual RDMA
support
• Hardware Tag Matching (automates point-to-point communication)
• MPI rendezvous protocol offload
• Precision time protocol support
• Up to 24 SMT4 cores
• CAPI v2, PCIe Gen 4
• Superior Core Performance
• Up to 7.8 TF/GPU
• Next Generation High
Bandwidth Memory
• Memory coherency
• Billion Cell Reservoir Simulation in
record time (92 mins vs 20 hours)
• ResNet-50 90-epoch training in
lowest time (7 hours vs 10 days) with
highest accuracy (33.8% vs 29.8%)
IBM Power Systems
POWER9 server for HPC & AI
NVLink-1 5x faster than
PCIe Gen 3
NVLink-2 7-10x faster
than PCIe Gen 3
GPU can access CPU’s
page tables
[Diagram: IBM Power Systems AC922 node, a POWER9 CPU with 170 GB/s DDR4 memory bandwidth connected to three NVIDIA Tesla V100 GPUs over NVLink at 100 GB/s per link.]
IBM Power Systems AC922
14. IBM POWER9 Family
When data-intensive workloads are the bottom line
• Mission Critical: S922/S914/S924 and H922/H924/L922 (Entry), E950/H950 (Midsize), E980/H980 (Enterprise)
• Data-Intensive Workloads for Private Clouds (Big Data Workloads): LC922/LC921/IC922
• Enterprise AI Workloads (HPC-AI Systems): AC922/IC922
17. Store Large Models & Datasets in System Memory, Transfer One Layer at a Time to GPU
[Diagram: x86 server with 100 GB/s CPU memory bandwidth and PCIe Gen3 links to four GPUs vs. IBM AC922 Power9 server with 170 GB/s CPU memory bandwidth and 150 GB/s NVLink to four GPUs. CPU-GPU NVLink is 5x faster than Intel x86 PCIe Gen3.]
500 iterations of an enlarged GoogLeNet model on an enlarged ImageNet dataset (2240x2240), mini-batch size = 15; both servers with 4 NVIDIA V100 GPUs.
4.7x Faster
Large Model Support (LMS) enables higher accuracy via larger models
18. TensorFlow Large Model Support Example
3D U-Net segmentation models with higher-resolution images allow for learning and labeling finer details and structures of brain tumors.
https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/
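The layer-at-a-time staging described above can be sketched in a few lines of dependency-free Python. This is a toy simulation of the LMS idea (all weights kept in system memory, only the active layer resident on the GPU), not the TFLMS implementation; every name in it is made up for illustration:

```python
# Toy simulation of Large Model Support: hold every layer's weights in
# system memory and stage only the active layer into a one-layer GPU budget.

HOST_MEM = {}    # stands in for large system RAM
DEVICE_MEM = {}  # stands in for limited GPU RAM

def load_model(num_layers, layer_size):
    for i in range(num_layers):
        HOST_MEM[i] = [0.01 * i] * layer_size  # dummy weights

def stage_layer(i):
    """Evict whatever is resident, then copy layer i host -> device."""
    DEVICE_MEM.clear()
    DEVICE_MEM[i] = HOST_MEM[i]

def forward(x, num_layers):
    for i in range(num_layers):
        stage_layer(i)                 # the NVLink transfer in the slides
        w = DEVICE_MEM[i]
        x = x + sum(w) / len(w)        # stand-in for the layer's compute
        assert len(DEVICE_MEM) == 1    # never more than one layer resident
    return x

load_model(num_layers=4, layer_size=8)
print(round(forward(1.0, 4), 2))       # -> 1.06
```

The faster the CPU-GPU link, the cheaper each `stage_layer` step becomes, which is what the 4.7x AC922 comparison above is measuring.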
19. Enterprise AI Hardware Portfolio
• DATA – IBM Power IC922: storage-dense server
• TRAIN – IBM Power AC922: powering the fastest supercomputer
• INFERENCE – IBM Power IC922: deploy AI into production
• NVMe dense server with IO rich
architecture for superior throughput1
• Enterprise ready cloud deployment
with RH OpenShift and Power
Systems reliability
• 2.35x superior price/performance for
containerized cloud deployments
• Best training platform with 4x faster
model iteration
• ~6x data throughput with NVLink
to GPUs
• Synergistic HW/SW offerings for ease
of use and leadership performance
• Superior density (33%) and throughput vs. inference accelerators
• Open design for accelerator diversity
• Deploy inference at scale with HW
and SW solution offerings
20. Designed for the AI Era
Architected for the modern
analytics and AI workloads that
fuel insights
An Acceleration Superhighway
Unleash state of the art IO and
accelerated computing potential in
the post “CPU-only” era
Delivering Enterprise-Class AI
Flatten the time to AI value curve
by accelerating the journey to build,
train, and infer deep neural networks
IBM POWER SYSTEMS AC922
Realize unprecedented performance and application gains with POWER9-based solutions
21. IC922 for DATA: NVMe and PCIe Gen4 capability, designed to be the fastest compute and data server available
• Balanced storage, network, and memory design for optimized storage-rich solutions
• 33% more bandwidth (340 GB/s DDR BW on IC922 vs. 255 GB/s BW on x86)
• Better memory capacity with 32 DDR4 RDIMM slots (competition needs larger, higher-cost DIMMs)
• Rich storage capacity – up to 24 SAS/SATA or
NVMe1 drives in 2U form factor
• Total 10 PCIe slots – PCIe Gen4 slots
available to support high speed network
connectivity
• 2x throughput capability for high
performance tiers
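The 33% figure on this slide follows directly from the two bandwidth numbers quoted; a quick check:

```python
# Verify the slide's memory-bandwidth claim: 340 GB/s (IC922, 32 DDR4
# RDIMM slots) vs. 255 GB/s (x86 comparison) is a one-third uplift.
ic922_bw_gbps = 340
x86_bw_gbps = 255
uplift = ic922_bw_gbps / x86_bw_gbps - 1
print(f"{uplift:.1%}")  # -> 33.3%
```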
22. IC922 for INFERENCING: Deploy AI into Production with IBM's End-to-End Solution for AI
• Open design for accelerator flexibility and future readiness
• Purpose built to support accelerator diversity (GPU, FPGA,
ASIC)
• Future ready with PCIe Gen4 today to accommodate
new adapters
• Accelerator density in 2U Form Factor
• Up to 8 accelerators1 – can drive 6 of the 8 accelerators at
full bandwidth vs. competition, which can only support 6
and drive 4 at full bandwidth
• Near-linear scaling across all GPUs for key inference
workloads – image classification, object detection,
recommender,
and machine translation
• Better TCO for inferencing – more throughput/density per server means 25% fewer servers for the same work versus competition, reducing associated power/cooling/space cost
• Optimized hardware with AI software stack
• Up to 160 threads - 2x thread throughput
• WML-CE and PowerAI Vision2
• Inferencing software from the WML-A portfolio2
26. Watson Machine Learning family
• Watson Machine Learning Community Edition (WML CE): open source ML frameworks, Large Model Support (LMS), Distributed Deep Learning (DDL – 1000s of nodes), Horovod
• Watson Machine Learning Accelerator (WML-A): Deep Learning Impact (DLI) module for data & model management, ETL, visualization, and advice; IBM Spectrum Conductor with Spark for cluster virtualization, dynamic resource orchestration, multiple frameworks, and a distributed execution engine; auto hyper-parameter tuning
• IBM Visual Insights: auto-DL for images & video – label, train, deploy
• H2O Driverless AI: auto-ML for text & numeric data – import, experiment, deploy; AI for data scientists and non-data scientists
• IBM Video Analytics: process video, utilize models
• Accelerated infrastructure: accelerated servers and storage
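The auto hyper-parameter tuning item above can be illustrated with a minimal tuner in plain Python. Random search stands in here for the smarter strategies the WML-A service offers, and the objective function and parameter ranges are invented for the example:

```python
import random

def objective(lr, batch_size):
    """Hypothetical validation loss for a (lr, batch_size) pair."""
    return (lr - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

def tune(trials=50, seed=7):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_loss, best_cfg = float("inf"), None
    for _ in range(trials):
        cfg = {"lr": rng.uniform(1e-4, 1e-1),
               "batch_size": rng.choice([16, 32, 64, 128, 256])}
        loss = objective(**cfg)
        if loss < best_loss:
            best_loss, best_cfg = loss, cfg
    return best_loss, best_cfg

loss, cfg = tune()
print(cfg["batch_size"], round(loss, 4))
```

A production tuner would run each trial as a distributed training job under the scheduler rather than calling a toy objective inline.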
27. Training for mobile-enabled DL models
1. CREATE a CUSTOM AI model with IBM Visual Insights (formerly PowerAI Vision)
2. Deliver the model via CoreML and the App Store
3. Run INFERENCE on iOS devices with IBM Visual Inspector (the app is not customizable by end users)
A complete "data centre to edge computing" solution: IBM Visual Insights + IBM Visual Inspector
29. IBM Visual Inspector Functions
• Gather data to build the model
• Infer, disconnected or connected
• Remote management of devices and models
• Monitor ongoing production
• Feed data back for retraining / quality
30. Watson ML Accelerator: A Data Science & Enterprise AI Platform
Architecture Overview
• Kubernetes & containers: embracing Kubernetes & containers (OpenShift); on premises, K8s/Docker via OpenShift (Power and x86 support)
• Resource management / resource allocation: advanced Kubernetes scheduling policy engine; Kubernetes namespaces with CPU/GPU resources
• Workload scheduler: advanced workload scheduler – Meta Session Scheduling Daemon (MSD)
• Execution logic: training execution (EDT), hyper-parameter optimization execution (HPO), inference execution (EDI)
• Example frameworks / development tools / 3rd-party support: SnapML
WMLA: an end-to-end enterprise AI platform
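A toy sketch of the resource-allocation layer described here: each tenant namespace carries a CPU/GPU quota, and the scheduler admits jobs only while the quota holds. The class and policy are illustrative only, not the actual MSD internals:

```python
# Minimal namespace-quota admission check, in the spirit of "Kubernetes
# namespace with CPU/GPU resources" plus a workload scheduler.

class Namespace:
    def __init__(self, name, cpu_quota, gpu_quota):
        self.name = name
        self.cpu_free = cpu_quota
        self.gpu_free = gpu_quota

    def admit(self, job):
        """Admit a job only if its CPU/GPU request fits the remaining quota."""
        if job["cpu"] <= self.cpu_free and job["gpu"] <= self.gpu_free:
            self.cpu_free -= job["cpu"]
            self.gpu_free -= job["gpu"]
            return True
        return False

ns = Namespace("team-a", cpu_quota=16, gpu_quota=4)
print(ns.admit({"cpu": 8, "gpu": 2}))  # -> True (fits the quota)
print(ns.admit({"cpu": 8, "gpu": 4}))  # -> False (only 2 GPUs left)
```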
32. Performance Enhancements via GPU Acceleration
Libraries and frameworks:
• TensorFlow, PyTorch
• NVIDIA libraries: math libraries, cuBLAS, NPP
• RAPIDS cuML, cuDF
• AutoML, AutoDL packages
• Distributed: Horovod, DDL
• Snap ML
• ESSL/PESSL

Three application choices, ordered from easiest to best application performance:
• Platform-optimized libraries – advantages: easy to implement, tested and supported; disadvantages: limited, some needs may not be covered
• Programming models supporting directives – advantages: democratized, modify existing programs with directives, compiler assists with mapping to the device
• Programming languages targeting the GPU – advantages: achieves the best performance results; disadvantages: most time-intensive, requires deep expertise
35. Bayesian Optimization – a highly reusable, valuable asset
[Diagram: BOaaS on Power9, with a common "parameters out / results in" loop driving four use cases: accelerating scientific workflows, accelerating HPC ensembles, tuning ML/DL models, and optimising cloud systems and applications.]
BOA components:
• Bayesian Optimization Library – contains fundamental methods and implementations; links to other IP such as PowerAI and Deep Bayesian Networks
• BOA API Server – RESTful API server and experiment database that allows easy access to optimization through PUT and GET actions and catalogues optimization experiments
• BOA SDK – Python (though other languages are possible) library for easy, integrated access to BOA APIs
• BOA UI – web interface for configuring BOA experiments and visualizing progress and analysis
• BOA apps – (web) apps, often written in python-DASH, which present customer-specific interfaces to experiments and APIs
Figure 1: Breakdown of BOA components – the blue zone indicates IP kept by BOA; the yellow zone indicates IP that can be kept by the customer.
• Bayesian optimization allows us to
answer the question ‘Given what I know,
what should I do next for the best result?’
(AKA ‘Intelligent Search’)
• Developed state-of-the-art methods advancing both the efficiency and robustness of Bayesian optimization across many potential applications.
• Potential beneficiaries do not need to understand Bayesian optimization or its state-of-the-art methods; they want to interact with it to derive business value. BOA allows them to do precisely this.
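The "parameters out / results in" loop from the BOA diagram can be sketched in plain Python. A real client would GET suggestions from and PUT results to the BOA API server; here a random-proposal optimizer stands in for the Bayesian surrogate, and the objective function is invented for the example:

```python
import random

class Optimizer:
    """Stand-in for BOA: suggests parameters, catalogues results."""
    def __init__(self, low, high, seed=0):
        self.rng = random.Random(seed)
        self.low, self.high = low, high
        self.history = []            # the experiment catalogue

    def suggest(self):               # "parameters out"
        return self.rng.uniform(self.low, self.high)

    def observe(self, x, y):         # "results in"
        self.history.append((x, y))

    def best(self):
        return min(self.history, key=lambda xy: xy[1])

def experiment(x):                   # customer-owned objective (yellow zone)
    return (x - 3.0) ** 2

opt = Optimizer(low=0.0, high=10.0)
for _ in range(100):
    x = opt.suggest()
    opt.observe(x, experiment(x))

x_best, y_best = opt.best()
print(round(x_best, 2))              # best x found, near the optimum at 3
```

The point of the split in Figure 1 is visible even in this toy: the experiment function stays on the customer side, while the optimizer only ever sees (parameters, result) pairs.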
36. IBM AI Differentiators
Open, multicloud by design
Manage all your data and AI
assets, regardless of origin
AI lifecycle automation
Drive productivity within a unified,
fully governed platform
Pre-built enterprise apps
Speed time-to-value with less
skills required
Proven, prescriptive, trusted
Partner with the leader in applied
enterprise AI