Solutions for ADAS and AI data engineering using OpenPOWER/POWER systems
1. Solutions for ADAS
and AI Data Driven
engineering
IBM Cognitive Systems EMEA / DOC v2 / August 5 / © 2020 IBM Corporation
Ing. Florin Manaila
Senior Architect and Inventor
NextGen Workloads and Distributed AI
IBM Systems Hardware Europe, Middle East & Africa
Member of the IBM Academy of Technology (AoT)
2. Autonomous Driving is the
key driver for innovation in
the automotive industry.
But: it requires a whole
new set of tools, skills
and capabilities,
especially with regard
to data and AI.
Autonomous
Vehicles
(AV)
- are capable of sensing
their environment
- combine external input
from sensors like radar,
computer vision, LiDAR,
sonar and GPS.
- interpret this
information to identify
navigation paths,
obstacles and signage
- … to move with little or
no human input
Advanced Driver
Assistance
Systems (ADAS)
- automate, adapt and
enhance vehicles for
safety and better
driving
- rely on input from
imaging, LiDAR, radar,
image processing,
computer vision and in-
car data
- examples: stability
control, lane control,
adaptive cruise, traction
control
3. Autonomous
Driving is a
data and
time
intensive
challenge.
Innovation in ADAS/AV generates huge amounts of data.
A single test car can collect up to ~15 TB of data per hour of test driving.
OEMs use fleets of vehicles to get enough data and validation points.
The data volume for a single ADAS/AV project can easily exceed ~10 PB.
It takes more than 200 h to tag and annotate the net driving scenes from 1 h of test driving.
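The figures above can be combined into a rough sizing estimate. The following is a minimal sketch, assuming decimal units (1 PB = 1000 TB) and treating the approximate values on this slide (~15 TB per test hour, ~10 PB per project, 200 labeling hours per test hour) as exact. The function and parameter names are illustrative only.

```python
def project_estimates(project_pb=10.0, tb_per_test_hour=15.0,
                      labeling_hours_per_test_hour=200.0):
    """Return (test-drive hours, labeling hours) implied by the slide's figures."""
    # Assumption: decimal units, 1 PB = 1000 TB.
    test_hours = project_pb * 1000.0 / tb_per_test_hour
    # Assumption: every recorded hour is labeled at the quoted 200:1 ratio.
    labeling_hours = test_hours * labeling_hours_per_test_hour
    return test_hours, labeling_hours


if __name__ == "__main__":
    hours, labeling = project_estimates()
    print(f"{hours:.0f} h of test driving, {labeling:.0f} h of labeling")
```

Under these assumptions, a 10 PB project corresponds to roughly 667 hours of recorded driving and about 133,000 hours of labeling effort, which is why automation of tagging matters.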
4. AV and ADAS include the most
complex AI and data tasks to date.
IN CAR: Complex Real-Time decisions BACKEND: Continuous development & integration
Motion Control
Scene Understanding
Perception, Sensor fusion
Mission,
Trajectory Planning
Operations Backend
Real-time data
(weather, traffic, accident, ...)
AI Engineering & Training
Tagging/Labeling
Test Data Management
Simulation and Approval
AI & Data Centric System Engineering
5. ADAS/AV capabilities will be an integral part
of future-proof vehicle 4.0 platforms.
Edge Services
Platform
Owned by OEM + 3rd Parties
Vehicle
Control Center
Security
Operations Center
Owned by OEM
OEM
Apps
3P
Apps
In-Vehicle Platform
Host system,
virtualized EE & API
5G & V2X
Fog
Supplier
Apps
Connected
Vehicle Platform
Vehicle-centric
OEM
Apps
3P
Apps
B2B
Connected
Services Platform
Customer-centric
6. We see recurring
challenges across the
industry, when OEMs start
building their autonomous
driving capabilities.
- Complex, massive data
ingestion and data lake
- Open technology
platform management
- High-quality data that
is synchronized,
labeled, tagged and
searchable
- Strong requirements
traceability, test case
and defect
management
- Program management
and support across
Geos/Sites and
Partners/Suppliers
- Agile engineering
processes, as well as
AI specific processes
to be aligned to vehicle
development lifecycle
(V-model, ASPICE,
ISO26262)
7. ADAS/AD development needs data along the entire value creation process.
IBM’s intelligent data management helps to optimize the overall process.
Data Acquisition / Ingestion
Analytics / Scene Selection
Data Enrichment / Labelling
Algorithm Training
Simulation
Validation / SIL / HIL
Test drives / Connectivity-based validation
Reduced offloading time
Faster findings - Higher engineer productivity
Reduced from weeks to minutes
Parallel performance - Reduce car fleet
More test cases in shorter time
Super fast access from everywhere
Online feedback - Shorten the loop
Engineering Lifecycle Management (ELM) References for ASPICE and ISO26262 compliant Development
Hybrid Data Bridge
8. IBM Autonomous Driving Engagements
Overview
IBM Service, Technology &
Research Assets
IBM Data Management
for ADAS/AD
IBM Automotive Software
Engineering
IBM Connected
Vehicle Platform
IBM Vehicle Operation Center
& Security Operation Center
Development & Test Operation
IBM Service, Technology &
Research Assets
9. The path to Autonomous Vehicles
Neural
network
models
Billions of parameters
Gigabytes
Computation
Iterative gradient based search
Millions of iterations
Mainly matrix operations
Data
Millions of images, sentences
Petabytes
Workload characteristics: Both compute and data intensive!
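To illustrate the "iterative gradient based search" character of training, here is a minimal sketch in plain Python: gradient descent on a one-feature linear model. The toy data, learning rate and iteration count are made-up illustrative values. Real ADAS networks run the same kind of loop over far more parameters, as large matrix operations on accelerators.

```python
def train_linear(xs, ys, lr=0.05, iterations=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Forward pass: prediction error for every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical toy data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(train_linear(xs, ys))
```

Each iteration touches every sample, so the cost grows with both the number of parameters and the amount of data, which is the "compute and data intensive" point the slide makes.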
AUTOMATION
10. Technical challenges in ADAS real-time object detection
Technical challenges in ADAS:
• Safe driving when an incident occurs at short distance: the vehicle control system needs a fast enough response.
• Detecting objects as far away as possible, including at night: extremely small objects (10 x 10 pixels) should be detected more than 50-70 m ahead, with precision >90% and (1 - recall) <10%.
• Different weather conditions (rain, snow, fog, etc.).
• Shape distortion due to the angle between the camera and the sign.
• A huge number of categories: traffic signs alone come in hundreds of types for China, Germany and the US.
Extremely small objects
Traffic Sign in US Traffic Sign in Germany
Bad Weather Condition
Shape distortion
11. Other technical challenges in ADAS real-time object detection
§ Fitting a complex network into limited compute and memory resources while still detecting in real time. Enabling AI from the data center to the edge is critical.
§ Color changes with the light, the weather conditions and even the age of the sign, and this variance can be severe at night.
§ Distance to objects. Vehicles need real-time detection of the distance between objects and the car, so that vehicle companies can build more complex applications, such as adjusting the car lights according to where the objects are.
The red color of the circle has totally changed.
§ Reflected car light can make a traffic sign unrecognizable.
Reflection of light: test data from US
Distance estimation to the objects
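The slides do not say how distance is estimated. One common single-camera approximation is the pinhole camera model, sketched below purely as an assumption; the focal length, sign height and function name are hypothetical.

```python
def estimate_distance_m(focal_length_px, real_height_m, bbox_height_px):
    """Pinhole-model distance estimate: Z = f * H / h.

    focal_length_px: camera focal length in pixels (assumed known from calibration)
    real_height_m:   assumed physical height of the detected object in metres
    bbox_height_px:  height of the detected bounding box in pixels
    """
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px


if __name__ == "__main__":
    # Hypothetical values: 1000 px focal length, 0.6 m sign, 10 px tall detection.
    print(estimate_distance_m(1000.0, 0.6, 10.0))
```

With these illustrative numbers, a sign that is only 10 pixels tall sits about 60 m away, which shows why long-range detection and very small objects are the same problem.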
12. Model Lifecycle
§ Integrity
§ Quality
§ Tools
Data
Development
Approval
Culture
Validation
Test & Deploy
Usage
Performance
Monitoring
Risk
Assessment
§ Design and objectives
§ Hypothesis
§ Assumptions
§ Regulatory context
§ Technology aspects/limitations
§ Data
§ Methodology and theoretical
soundness
§ Backtesting results
§ Stress testing results
§ Model stability
§ Qualitative assessment
§ Model risk quantification
§ Business Functional
requirements
§ User acceptance testing
§ Path to production
§ Sign-off for deployment
§ Available infrastructure
for support
§ Fast Updates
§ Backtesting
§ Performance metrics
§ Escalations
§ Link models to limit
calibration, thresholds
and risk capacity
13. Getting predictions
Online AI
§ Optimized to minimize the latency of serving predictions.
§ Can process one or more instances per request.
§ Predictions returned in the response message.
§ Returns as soon as possible.
§ Runs on the runtime version and in the region selected when you deploy the model.
§ Runs models deployed to an embedded accelerator (GPU, FPGA, ASIC).
Batch AI
§ Optimized to handle a high volume of instances in a job and to run more complex models.
§ Can process one or more instances per request.
§ Predictions written to output files.
§ Asynchronous request.
§ Can run in any AI-accelerated cluster, using an optimized runtime version.
§ Runs models deployed to AI Platform or models stored in on-prem or cloud locations.
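The difference between the two modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any product's API: `model` is a hypothetical stand-in for a deployed model, and the function names and the JSON-lines output format are invented for the example.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def model(instance):
    """Hypothetical stand-in for a deployed model: flags values above 0.5."""
    return {"input": instance, "label": int(instance > 0.5)}


def predict_online(instances):
    """Online: run synchronously and return predictions in the response."""
    return [model(x) for x in instances]


def predict_batch(instances, output_path):
    """Batch: write one JSON prediction per line to an output file."""
    with open(output_path, "w") as out:
        for x in instances:
            out.write(json.dumps(model(x)) + "\n")
    return output_path


def submit_batch(executor, instances, output_path):
    """Asynchronous request: returns a Future immediately, not predictions."""
    return executor.submit(predict_batch, instances, output_path)


if __name__ == "__main__":
    print(predict_online([0.2, 0.9]))
    out_file = os.path.join(tempfile.mkdtemp(), "predictions.jsonl")
    with ThreadPoolExecutor(max_workers=1) as pool:
        job = submit_batch(pool, [0.1, 0.7, 0.8], out_file)
        print("results written to", job.result())
```

The online path hands predictions straight back to the caller, while the batch path hands back a job handle and leaves the results in a file to be read later.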
15. Transforming AI Infrastructure Stack from yesterday
ON-CLOUD and ON-PREM
Transform & Prep
Data (ETL)
Micro-Services / Applications
APIs
(external and/or in-house)
Machine & Deep Learning
Libraries & Frameworks
Distributed Computing
Data Lake & Data Stores
Segment Specific:
Finance, Retail, Healthcare,
Automotive
Speech, Vision,
NLP, Sentiment
TensorFlow, Caffe,
PyTorch
SparkML, Snap.ML
Spark, MPI
Hadoop HDFS,
NoSQL DBs,
Parallel File
System
Accelerated
Infrastructure
16.
Transform &
Prepare
Data
Micro-Services / Applications
Governance AI
(Fairness, Explainable AI,
Model Health, Accuracy)
APIs
(external and in-house)
Machine & Deep Learning
Libraries & Frameworks
Distributed Computing
Data Lake & Data Stores
Action / Decision
APIs
Internal
Optimized Machine &
Deep Learning Runtime
Federated Learning
Local Cache
Security
Compliance
AI Infrastructure Stack from today
ON-CLOUD / ON-PREM vs near-Edge or Edge
Sensors
Data
Standardization
17. Multi-Tenant, Self-Serve, Accelerated Platform
Architecture Overview
Red Hat OpenShift
Accelerated Worker Nodes
for Training
Machine Learning Frameworks and Libraries (WMLCE)
KubeFlow / ODH
Accelerated Worker Nodes
for Inference
Adversarial Robustness
Toolbox (ART) / Trusted AI
Anaconda
Team Edition
Bayesian
Optimization
Deep Search
Control Plane
Nodes
KVM #2
KVM #1
Master #1
VM
Master #2
VM
Bootstrap
VM
Master #3
VM
Bastion
VM NFS /
Parallel File System
NVMe Storage
Internal
Git
Existing Databases
HPC simulations | SW
Scalability Scalability
Open Data Hub
18. Goal / Computation
Description: Build the model
Desired goal: Build a promising model
Iteration time: Hours
No. of systems: 1
GPUs: 1-2 RTX 2080 Ti or Tesla V100
Model + Data

Description: Continuous Integration
Desired goal: Make sure that the code base remains bug free
Iteration time: Hours
No. of systems: 10s
GPUs: 2-6 Tesla V100
Data + Parameters

Description: Train the model on real data (hyperparameter tuning)
Desired goal: Make the model work with real data and optimize
Iteration time: Days-Weeks
No. of systems: 10s-100s
GPUs: 4-6 Tesla V100
Data + Parameters

Description: Optimize and validate the model
Desired goal: Prepare the model for deployment and validation
Iteration time: Hours-Weeks
No. of systems: 10s
GPUs: 4-6 Tesla V100
Data + Parameters

Description: Deploy the model
Desired goal: Provide functionality using the model
Iteration time: Milliseconds
No. of systems: Hundreds (test fleet) to millions (live fleet)
GPUs: Embedded inference platforms (e.g. Drive AGX)
Environments + Data
19. Multi-Tenant, Self-Serve, Accelerated Platform
GPUs | NVMEs | HDR
Kubeflow Project
Architecture Components
Fairing
Experiments
Notebooks | Jupyter Lab
Pipelines: Simulation Pipelines | Kubeflow Pipelines | Apache Airflow | Argo | Cloud Composer
Model Serving: TensorFlow Serving and Istio | TensorFlow Batch Prediction | Seldon Serving | Triton | PyTorch Serving
Miscellaneous | Metadata | Nuclio functions
Hyperparameter tuning | Katib | Google Vizier | Auto-Keras | BO
Model Training: TensorFlow | TensorBoard | TensorFlow Extended | PyTorch | Horovod | Other Operators
Data Ingestion
Users
Web Interface
20. Open Data Hub Architecture
Vision and Integration
Model Lifecycle
Kubeflow
Seldon
MLflow
ML Applications
Open Data Hub
AI Fairness 360
AI Library
Business
Intelligence
Superset
Interactive AI
JupyterHub + plugins
Hue
Big Data Processing
Spark | Spark SQL
Thrift
Streaming
Kafka Streams
Elasticsearch
Data Exploration
Hue
Kibana
Hive Metastore | Spectrum Discovery
Data Lake
Spectrum Scale
Ceph Storage
MinIO
In-Memory
Red Hat Data Grid
Relational Databases
PostgreSQL
MySQL
MariaDB
Red Hat AMQ Streams
(Kafka Strimzi)
Red Hat Ceph
S3 API
Kafka
Connect
Logstash Fluentd rsyslogd
Red Hat OpenShift
Kubernetes | RHEL | Hybrid Cloud
Red Hat
OpenShift
OAuth
Red Hat
Single Sign-
On
(Keycloak)
Red Hat
Ceph
Object
Gateway
Red Hat
3scale
IBM
Adversarial
Robustness
Toolbox
Prometheus
Grafana
Kubeflow
Pipelines
Argo
Workflows
Jenkins CI/CD
TensorBoard
DATA ANALYSIS
MACHINE LEARNING AND DEEP LEARNING
METADATA MANAGEMENT
STORAGE
DATA IN MOTION
CONTAINERS PLATFORM
SECURITY
GOVERNANCE
MONITORING
ORCHESTRATION
22. Partnership with IBM Research
Visual Insights:
Auto-Deep Learning
Automatic Labeling
Large Model / High-
Res Image Support
Auto-Hyperparameter
Optimization
AI Model Optimizer &
Compiler (FPGAs, SoCs)
IBM Research Innovations
Around Computer Vision
IBM Research Innovation:
Around Analog Inference Chips
Elastic Inference
for large-scale compute
GPU and cluster
accelerated ML
Elastic
Distributed Training
INNOVATION
24. Merging Memory and Processing
Source: IBM Research
In-memory computing using resistive memory
devices is a promising non-von Neumann
approach for making energy-efficient deep
learning inference hardware
Avoid constant shuttling of data between
memory and processing units, which limits the
maximum achievable energy efficiency.
25.
IBM Power AC922
• GPUs: NVIDIA Tesla V100s
• 5.6x - 10x data throughput with advanced I/O architecture:
• PCIe Gen 4
• CAPI 2.0
• OpenCAPI 3.0
• NVLink 2.0 CPU–GPU and GPU-GPU
TRAIN
Powering the Fastest Supercomputer
DATA
IBM Power LC922/IC922
• Up to 120 TB of data storage
• Superior I/O: PCIe Gen 4
• IBM Spectrum Scale / ESS
Hadoop Integration
INFERENCE
SIMULATION
AI Inference Platform
Hardware
• IC922 2U
• NVIDIA T4 GPUs or FPGAs*
Software
• TensorRT
• WMLCE
• Maximo Visual Inspection
Deploy AI into Production
Enterprise AI infrastructure
you need to deploy AI into production
Big Data Workloads
26. Accelerate AI pipelines
Provide an ESS 3000 high-performance
tier of storage to keep AI data pipelines
and GPUs running at peak performance
ESS 3000 High Performance Tier
ARCHIVE
High scalability,
large/sequential I/O
capacity tier
1. Single name space
2. Global collaboration / Hybrid Cloud
3. Software RAID / Erasure Coding
4. Multi-protocol support
Spectrum Scale
Cloud Object Storage
Elastic Storage Server
IBM Cloud Paks
Classification &
metadata tagging
─
High volume, index &
auto-tagging zone
GRID
Accelerate ETL
IBM
Cloud Paks
Transient storage
─
Throughput-oriented
work areas
landing zone
Fast ingest /
Real-time analytics
DATA IN
INGEST ORGANIZE ANALYZE ML/DL
27. IBM Spectrum Storage for AI with Power Systems
A fully optimized, scalable and supported AI platform that delivers blazing
fast performance, proven dependability and resiliency.
§ Single Global Namespace (CIFS/NFS/iSCSI)
§ Cloud Integration (Object Storage, S3 etc)
§ Hadoop Transparent Integration
§ NVMe based storage
IBM Spectrum Discover
IBM Spectrum Scale
28. IBM ESS 3000
NVMe Flash for AI
All-new storage solution
§ Integrated scale-out advanced
data management with end-to-
end NVMe storage
§ Containerized software for ease of
install and update
§ Deploy initial configuration in
hours, not days
§ Fast and easy update and
scale-out expansion
§ Performance, capacity and ease
of integration for AI workflow
29. IBM Elastic Storage System 3000 specification overview
Scalable high-performance unified storage for files and objects
File management: IBM Spectrum Scale Version 5
Data protection: IBM Spectrum Scale erasure coding
Internal operating system: Red Hat Enterprise Linux 8.x
Protocols and interfaces: POSIX with Spectrum Scale client, NFS v4.0, SMB v3.0, Hadoop MapReduce, OpenStack Swift (object), S3 (object), CSI (Container Storage Interface)
Controllers: Highly available dual active-active controllers
Storage: NVMe flash drives (1.92TB, 3.84TB, 7.68TB or 15.4TB)
Number of drives: 12 or 24 drives per 2U enclosure
Memory: 384 GB or 768 GB memory per controller
Network adapters: Up to three PCIe host adapters per controller; Mellanox ConnectX-5 with InfiniBand EDR and 100 Gb/s Ethernet
30. IBM Spectrum Storage for AI
Performance for Distributed Deep Learning
Near Linear Scaling by
adding 40GB/s per 2U
appliance
No need for downtime or
reconfiguration
Best in class throughput
potential
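The scaling claim lends itself to a quick capacity estimate. The sketch below takes the 40 GB/s per 2U appliance figure from this slide; the `scaling_efficiency` parameter and both function names are assumptions added for illustration, with 1.0 meaning perfectly linear scaling.

```python
import math

GB_PER_S_PER_APPLIANCE = 40.0  # from the slide: 40 GB/s per 2U appliance


def aggregate_throughput(appliances, scaling_efficiency=1.0):
    """Estimated total throughput in GB/s for a number of appliances."""
    return appliances * GB_PER_S_PER_APPLIANCE * scaling_efficiency


def appliances_needed(target_gb_per_s, scaling_efficiency=1.0):
    """Smallest number of appliances that reaches the target throughput."""
    per_appliance = GB_PER_S_PER_APPLIANCE * scaling_efficiency
    return math.ceil(target_gb_per_s / per_appliance)


if __name__ == "__main__":
    print(aggregate_throughput(4))   # four appliances, ideal scaling
    print(appliances_needed(100.0))  # appliances for a 100 GB/s target
```

For example, a hypothetical 100 GB/s target needs three appliances under ideal scaling, since two would deliver only 80 GB/s.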
31. IBM Cognitive Systems
On-Prem solutions / products as of today
DATA
PRE-PROCESSING
TRAINING
INFERENCE
§ AI Fairness 360
§ Adversarial Robustness Toolbox
§ Watson Studio Local
§ Watson ML Community Edition
§ Watson ML Accelerator
§ Visual Insights Training and Inference
§ Visual Inspector (iOS)
§ Engineering and Scientific Subroutine Library
§ Spectrum LSF for HPA
§ Spectrum Scale (Burst Buffer, LROC etc), Discovery
§ Cloud Object Storage
§ OpenShift / Open Data Hub
§ Cloud Pak for Data
§ Driverless AI
33. On-Prem Labeling for Deep Learning
IVIS
Batch AI
§ Object Detection
§ Action Detection
§ Gesture Detection
§ Etc
§ Classification
§ Object Detection
§ Segmentation
§ Path Planning
§ Etc
ADAS
§ Collaboration Platform for data annotation and
training
§ K8s Scalability
§ Auto Labeling
§ Augmentation
§ Intuitive Web Interface
§ API Driven Platform
§ Support for images and videos
§ Design for productivity
§ Custom models support
38. Intelligent data annotation / Object Detection
Image files
§ Rectangular
labeling
§ Object Detection
39. Intelligent data annotation / Object Detection
Image files
§ Non rectangular
labeling (multi
point polygons)
§ Image
Segmentation
42. Training job results
Object Detection / Basic View
§ mAP: the calculated mean of the
precision for each object. Precision is
the percentage of objects correctly
marked in an image. It is calculated
by true positives / (true positives +
false positives).
§ Precision: The percentage of images
that are labeled as an object which
actually should be labeled as that
category. It is calculated by true
positives / (true positives + false
positives).
§ Recall: The percentage of the images
that were labeled as an object
compared to all images that contain
that object. It is calculated as true
positives / (true positives + false
negatives).
§ IoU: the location accuracy of the
image label boxes. It is calculated by
the intersection (overlap) between a
hand drawn bounding box and a
predicted bounding box divided by
the union (combined area) of both
bounding boxes.
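The formulas above translate directly into code. The following is a minimal sketch; the function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions chosen for the example and are not taken from the product.

```python
def precision(true_pos, false_pos):
    """true positives / (true positives + false positives)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos, false_neg):
    """true positives / (true positives + false negatives)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


if __name__ == "__main__":
    print(precision(9, 1), recall(9, 3))
    # Hand-drawn box vs. predicted box, overlapping in a 1 x 1 square.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In the usage example, two 2 x 2 boxes overlap in a 1 x 1 square, so the IoU is 1 / (4 + 4 - 1) = 1/7.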
44. Testing the model with new images: Object Detection
Single image testing via web portal
45. Intelligent data annotation / Action Detection
Video files
§ Actions are used to
identify specific
moments occurring
in a video for action
detection.
49. Exporting the dataset
Using web portal
.xml file
.jpg file
50. Production Optimized ML/DL Frameworks and Libraries for Training
IBM Watson Machine Learning
53. Watson ML Community Edition (WMLCE)
CUDA 10
TensorFlow Caffe2
RAPIDS.AI: cuDF, cuML
LIBS:
Distributed Deep Learning (DDL)
Large Model Support (LMSv2)
SnapML
Local, MPI, Spark
DASK
PyTorch
Estimator, Probability,
Serving, TensorBoard | APEX | XGBoost | Bazel
libevent, libgdf, libgdf_cffi, libopencv, libprotobuf, parquet-cpp, thrift-cpp, arrow-cpp, pyarrow, gflags, magma,
cupy, py-opencv etc
NCCL cuDNN
Spectrum MPI
Horovod
delivered via
Bare Metal or Containers
ONNX
Version 1.7.0
TensorFlow
Serving Server
TensorRT
ONNX
Protobuf
Training
Inference
54. Distributed Deep Learning
Research Innovations
Optimized ML/DL frameworks & libraries
[Chart: Snap Machine Learning. Logistic Regression in Snap ML (with GPUs) vs TensorFlow (CPU-only); runtime in minutes. Google, CPU-only, 90 x86 servers: 1.1 hours. Snap ML, Power + GPU, 4 AC922 servers: 1.53 minutes. 46x faster.]
[Chart: Large Model Support. Caffe with LMS (Large Model Support), GoogleNet model on enlarged ImageNet dataset (2240x2240); time in seconds. Xeon x86 2640v4 w/ 4x V100 GPUs: 3.1 hours. Power AC922 w/ 4x V100 GPUs: 49 minutes. 3.8x faster.]
[Chart: ResNet-101, ImageNet-22K, Caffe with PowerAI DDL, running on Minsky (S822LC) Power System. 1 system: 16 days. 64 systems: 7 hours. 58x faster.]
IBM Watson Machine Learning
Community Edition / Accelerator
57. TensorFlow
Large Model Support
Existing Limits:
• GPUs have limited memory
• Neural networks are growing deeper and wider
• Amount and size of data to process is always growing
IBM TFLMS Advantages:
• 10x image resolution - Keras ResNet50 and
DeepLabV3 2D for image segmentation
• Easier to enable in model code
• Automatic tuning of swapping parameters
• Faster graph modification times
• Finer tuning of asynchronous compute and memory
transfer
• Serialization of operations in layers
• More model and tensor information output
59. Developing AI technologies for advanced driver assistance systems (ADAS) or self-driving systems
ADAS Client Use Case
60. ZF Friedrichshafen
ZF Friedrichshafen and IBM are jointly building a data management system for ADAS, based on hybrid multi-cloud.
Client Value
- accelerated data reception and video transformation
- continuous, company-wide process control
“Managing huge amounts of data in a hybrid multi-cloud environment is very important. Transparent access to files in data lakes with low latency is essential when developing autonomous vehicles, where we have to process images and information from many different data sources.”
Harald Holder
Director of IT Infrastructure Platforms at ZF
IBM Solutions:
IBM Spectrum Scale, IBM Aspera, IBM AREMA, Red Hat OpenShift
https://www.smart2zero.com/news/zf-manage-adas-development-data-cloud-together-ibm/page/0/1
61. Robert Bosch
- Out-Think Competition
- Improved Service Delivery
- Enhanced Customer Experience
Client Value
- Manages the complex labelling workflow so that all involved users can participate efficiently
- Cost reduction: AREMA reduces the amount of data to be transferred, saving transfer costs
- Operational efficiency: the whole system is operated, configured and extended by a few people
Robert Bosch, a well-known German Tier 1 supplier, manages ADAS labelling workflows to optimize performance and quality and to save costs.
- AREMA is used to manage ADAS/AD-related test data (test drive recordings)
- Allows full visibility (search, filter and preview) of millions of video recordings with the AREMA Media Portal
- AREMA manages the ingest process, the extraction of relevant data and the conversion to different formats, or to the same formats with reduced or compressed content
- Manages the complex labelling workflow, with support for human approval processes, dozens of user roles and automatic management of file transfers
- Export to a labelling-as-a-service provider workflow
IBM Solutions:
IBM AREMA, IBM Services
62. ADAS Supplier in Europe
- Out-Think Competition
- Improved Service Delivery
- Enhanced Customer Experience
Client Value
- Met the required performance KPIs, giving them a better ROA for their NVIDIA DGX AI cluster
- Reduced AI training times from weeks to days
- Experiments per month increased by a factor of 14
Major supplier to the automotive industry, supplying systems, components, electronics, and engineering services for vehicle safety, comfort and powertrain performance.
Developing an ADAS solution with a large NVIDIA DGX cluster.
Challenge:
The existing storage infrastructure was unable to keep the DGX-1 GPUs busy.
Data collected at the edge by ADAS test vehicles needs to be shared from multiple data centers in different countries to a central site.
IBM Solution:
IBM Elastic Storage Server (ESS)
63. Volkswagen AG
Volkswagen and Red Hat work together
to automate integration testing
leveraging Red Hat OpenShift.
Red Hat OpenShift extended into 3rd-party hardware:
• Achieve Hardware in the Loop (HiL) for ICAS/ID3
• Virtual testing for Virtual Travel Assist 4.2, ACC and ID3 Infotainment
• All test components, physical and virtual, managed and controlled by OpenShift
“My job at Volkswagen is to make sure all the electronic
control units we have in our cars work together,” said
Michael Denecke, Head of Test Technology at
Volkswagen AG, speaking in the Red Hat Summit
2019 keynote.
“Due to the new challenge of having autonomous
driving connected cars (...), we got the idea to have all
these tests we do with hardware on virtual test
environments, and that’s why we’ve come to OpenShift
and containers.”
64. Mayflower Project
IBM announced in October 2019 that it has joined a global consortium of partners, led by marine research organization ProMare, that is building an unmanned, fully autonomous ship that will cross the Atlantic on the fourth centenary of the original Mayflower voyage in September 2020.
IBM Solution:
IBM PowerAI Vision, IBM Cloud, IBM Cloud Object Storage, IBM Operational Decision Manager, Red Hat Enterprise Linux (RHEL)
66. Thank you
Ing. Florin Manaila
Senior Architect
NextGen Workloads and Distributed AI
—
florin.manaila@de.ibm.com
ibm.com