Solutions for ADAS
and AI Data Driven
engineering
IBM Cognitive Systems EMEA / DOC v2 / August 5 / © 2020 IBM Corporation
Ing. Florin Manaila
Senior Architect and Inventor
NextGen Workloads and Distributed AI
IBM Systems Hardware Europe, Middle East & Africa
Member of the IBM Academy of Technology (AoT)
Autonomous Driving is the
key driver for innovation in
the automotive industry.
But it requires a whole
new set of tools, skills
and capabilities,
especially with regard
to data and AI.
Autonomous
Vehicles
(AV)
- are capable of sensing
their environment
- combine external input
from sensors like radar,
computer vision, LiDAR,
sonar and GPS.
- interpret this
information to identify
navigation paths,
obstacles and signage
- … to move with little or
no human input
Advanced Driver
Assistance
Systems (ADAS)
- automate, adapt and
enhance vehicles for
safety and better
driving
- rely on input from
imaging, LiDAR, radar,
image processing,
computer vision and in-
car data
- examples: stability
control, lane control,
adaptive cruise, traction
control
Autonomous
Driving is a
data and
time
intensive
challenge.
Innovation in
ADAS/AV
produces huge
amounts of data.
A single test
car can record
up to ~15 TB
per hour of
test driving.
OEMs use fleets of
vehicles to get
enough data and
validation points.
The data volume
for one single
ADAS/AV project
can easily be over
~10 PB.
It takes more
than 200 h to
tag and annotate
the net driving
scenes from 1 h
of test driving.
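As a rough illustration of how these figures combine, the sketch below multiplies them out for a test campaign. The fleet size and hours per car are hypothetical inputs, and it assumes every recorded hour is annotated, which overstates the labeling effort if only selected scenes are tagged.

```python
# Back-of-the-envelope estimate using the figures quoted on the slide.
TB_PER_TEST_HOUR = 15            # ~15 TB recorded per car per hour (from the slide)
LABEL_HOURS_PER_TEST_HOUR = 200  # >200 h of annotation per 1 h of driving (from the slide)


def fleet_estimate(cars, hours_per_car):
    """Return (raw data in PB, annotation effort in hours) for a test campaign.

    Assumption: every recorded hour is annotated; decimal units (1 PB = 1000 TB).
    """
    test_hours = cars * hours_per_car
    data_pb = test_hours * TB_PER_TEST_HOUR / 1000
    label_hours = test_hours * LABEL_HOURS_PER_TEST_HOUR
    return data_pb, label_hours


# Hypothetical campaign: 10 cars, 70 hours of driving each.
data_pb, label_hours = fleet_estimate(cars=10, hours_per_car=70)
print(f"{data_pb:.1f} PB raw data, {label_hours:,} annotation hours")
```

With these hypothetical inputs the campaign already passes the ~10 PB mark mentioned for a single project.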
AV and ADAS include the most
complex AI and data tasks to date.
IN CAR: Complex Real-Time decisions
Motion Control
Scene Understanding
Perception, Sensor fusion
Mission,
Trajectory Planning
BACKEND: Continuous development & integration
Operations Backend
Real-time data
(weather, traffic, accident, ...)
AI Engineering & Training
Tagging/Labeling
Test Data Management
Simulation and Approval
AI & Data Centric System Engineering
ADAS/AV capabilities will be an integral part
of future-proof vehicle 4.0 platforms.
Edge Services
Platform
Owned by OEM + 3rd Parties
Vehicle
Control Center
Security
Operations Center
Owned by OEM
OEM
Apps
3P
Apps
In-Vehicle Platform
Host system,
virtualized EE & API
5G & V2X
Fog
Supplier
Apps
Connected
Vehicle Platform
Vehicle-centric
OEM
Apps
3P
Apps
B2B
Connected
Services Platform
Customer-centric
We see recurring
challenges across the
industry when OEMs start
building their autonomous
driving capabilities.
- Complex, massive data
ingestion and data lake
- Open technology
platform management
- High-quality data that
is synchronized,
labeled, tagged and
searchable
- Strong requirements
traceability, test case
and defect
management
- Program management
and support across
Geos/Sites and
Partners/Suppliers
- Agile engineering
processes, as well as
AI specific processes
to be aligned to vehicle
development lifecycle
(V-model, ASPICE,
ISO26262)
ADAS/AD development needs data along the entire value creation process.
IBM’s intelligent data management helps to optimize the overall process.
Data
Acquisition/
Ingestion
Analytics/ Scene
Selection
Data Enrichment
/ Labelling
Algorithm
Training
Simulation
Validation /
SIL / HIL
Test drives /
Connectivity
based validation
Reduced
offloading
time
Faster
Findings -
Higher
engineer
productivity
Reduced
from
weeks to
minutes
Parallel
performance -
Reduce car
fleet
More test
cases in
shorter time
Super fast
access from
everywhere
Online
feedback -
Shorten
the loop
Engineering Lifecycle Management (ELM) References for ASPICE and ISO26262 compliant Development
Hybrid Data Bridge
IBM Autonomous Driving Engagements
Overview
IBM Service, Technology &
Research Assets
IBM Data Management
for ADAS/AD
IBM Automotive Software
Engineering
IBM Connected
Vehicle Platform
IBM Vehicle Operation Center
& Security Operation Center
Development & Test Operation
IBM Service, Technology &
Research Assets
The path to Autonomous Vehicles
Neural
network
models
Billions of parameters
Gigabytes
Computation
Iterative gradient based search
Millions of iterations
Mainly matrix operations
Data
Millions of images, sentences
Petabytes
Workload characteristics: Both compute and data intensive!
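To make "iterative gradient based search" concrete, here is a minimal sketch of gradient descent fitting a two-parameter linear model in plain Python. The data, learning rate and step count are illustrative assumptions; real networks apply the same loop to billions of parameters with matrix operations on accelerators.

```python
def train_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```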
AUTOMATION
Technical challenges in ADAS real-time object detection
Technical challenges in ADAS:
• Safe driving when a hazard appears at short distance: the vehicle control system needs a
fast enough response.
• Detecting objects as far away as possible, including at night: extremely small objects
(about 10 x 10 pixels) should be detected more than 50 to 70 m ahead, with precision
>90% and a miss rate (1 - recall) <10%.
• Different weather conditions (rain, snow, fog, etc.).
• Shape distortion due to the angle between the camera and the sign.
• The number of categories is huge: traffic signs alone come in hundreds of types across China,
Germany and the US.
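The small-object requirement can be illustrated with a simple pinhole-camera approximation. This is a sketch under stated assumptions, not the detection system described in the slides: the focal length of 1000 pixels and the 0.6 m sign width are hypothetical values chosen for illustration.

```python
def projected_size_px(object_size_m, distance_m, focal_length_px):
    """Approximate image size of an object under a pinhole camera model."""
    return focal_length_px * object_size_m / distance_m


def estimated_distance_m(object_size_m, size_px, focal_length_px):
    """Invert the same relation to estimate distance from a known object size."""
    return focal_length_px * object_size_m / size_px


FOCAL_PX = 1000.0    # hypothetical focal length, in pixels
SIGN_WIDTH_M = 0.6   # hypothetical traffic sign width, in metres

for distance in (20, 60, 70):
    width = projected_size_px(SIGN_WIDTH_M, distance, FOCAL_PX)
    print(f"{distance} m -> {width:.1f} px wide")
```

Under these assumptions a sign 60 m ahead covers only about 10 pixels, which is the scale the slide refers to. The inverse function shows how a known object size gives a first distance estimate.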
Extremely small objects
Traffic Sign in US Traffic Sign in Germany
Bad Weather Condition
Shape distortion
Other technical challenges in ADAS real-time object detection
§ Fitting a complex network into limited compute and memory resources while keeping detection
real-time. Enabling AI from the data center to the edge is critical.
§ Color changes with the light, weather conditions, and even the age of the sign, and this
variance can be severe at night.
§ Distance to objects. Vehicles need real-time estimates of the distance between
objects and the car, so that vehicle companies can build more complex applications,
such as adjusting the car lights according to where the objects are.
The red color of the circle has totally changed.
§ Reflected car light can make a traffic sign unrecognizable.
Reflection of light: test data from US
Distance estimation to the objects
Model Lifecycle
§ Integrity
§ Quality
§ Tools
Data
Development
Approval
Culture
Validation
Test & Deploy
Usage
Performance
Monitoring
Risk
Assessment
§ Design and objectives
§ Hypothesis
§ Assumptions
§ Regulatory context
§ Technology aspects/limitations
§ Data
§ Methodology and theoretical
soundness
§ Backtesting results
§ Stress testing results
§ Model stability
§ Qualitative assessment
§ Model risk quantification
§ Business Functional
requirements
§ User acceptance testing
§ Path to production
§ Sign-off for deployment
§ Available infrastructure
for support
§ Fast Updates
§ Backtesting
§ Performance metrics
§ Escalations
§ Link models to limit
calibration, thresholds
and risk capacity
Getting predictions
Online AI
§ Optimized to minimize the latency of serving
predictions.
§ Can process one or more instances per request.
§ Predictions returned in the response message.
§ Returns as soon as possible.
§ Runs on the runtime version and in the region
selected when you deploy the model.
§ Runs models deployed to an embedded
accelerator (GPU, FPGA, ASIC).
Batch AI
§ Optimized to handle a high volume of instances in a
job and to run more complex models.
§ Can process one or more instances per request.
§ Predictions written to output files.
§ Asynchronous request.
§ Can run in any AI accelerated cluster, using an
optimized runtime version.
§ Runs models deployed to AI Platform or models
stored on-prem or in cloud locations.
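The contrast between the two modes can be sketched in a few lines. Everything here is hypothetical: `toy_model`, the JSON Lines file layout and the function names are stand-ins for illustration, not the API of any serving product named in the deck, and the batch job runs inline rather than asynchronously.

```python
import json
import os
import tempfile


def toy_model(instance):
    """Hypothetical stand-in for a deployed model (not from the slides)."""
    return "sign" if instance["score"] >= 0.5 else "background"


def predict_online(instances):
    """Online style: the caller waits and predictions come back in the response."""
    return {"predictions": [toy_model(item) for item in instances]}


def predict_batch(input_path, output_path):
    """Batch style: read JSON Lines instances, write predictions to an output file."""
    count = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue
            prediction = toy_model(json.loads(line))
            dst.write(json.dumps({"prediction": prediction}) + "\n")
            count += 1
    return count


# Online: predictions are part of the returned message.
response = predict_online([{"score": 0.9}, {"score": 0.1}])
print(response)

# Batch: predictions land in a file that is read back later.
with tempfile.TemporaryDirectory() as workdir:
    in_path = os.path.join(workdir, "instances.jsonl")
    out_path = os.path.join(workdir, "predictions.jsonl")
    with open(in_path, "w") as f:
        f.write('{"score": 0.9}\n{"score": 0.1}\n')
    written = predict_batch(in_path, out_path)
    with open(out_path) as f:
        batch_lines = f.read().splitlines()
print(written, batch_lines)
```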
AI Workflow
Transforming AI Infrastructure Stack from yesterday
ON-CLOUD and ON-PREM
Transform & Prep
Data (ETL)
Micro-Services / Applications
APIs
(external and/or in-house)
Machine & Deep Learning
Libraries & Frameworks
Distributed Computing
Data Lake & Data Stores
Segment Specific:
Finance, Retail, Healthcare,
Automotive
Speech, Vision,
NLP, Sentiment
TensorFlow, Caffe,
Pytorch
SparkML, Snap.ML
Spark, MPI
Hadoop HDFS,
NoSQL DBs,
Parallel File
System
Accelerated
Infrastructure
Transform &
Prepare
Data
Micro-Services / Applications
Governance AI
(Fairness, Explainable AI,
Model Health, Accuracy)
APIs
(external and in-house)
Machine & Deep Learning
Libraries & Frameworks
Distributed Computing
Data Lake & Data Stores
Action / Decision
APIs
Internal
Optimized Machine &
Deep Learning Runtime
Federated Learning
Local Cache
Security
Compliance
AI Infrastructure Stack from today
ON-CLOUD / ON-PREM vs near-Edge or Edge
Sensors
Data
Standardization
Multi-Tenant, Self-Serve, Accelerated Platform
Architecture Overview
Red Hat OpenShift
Accelerated Worker Nodes
for Training
Machine Learning Frameworks and Libraries (WMLCE)
KubeFlow / ODH
Accelerated Worker Nodes
for Inference
Adversarial Robustness
Toolbox (ART) / Trusted AI
Anaconda
Team Edition
Bayesian
Optimization
Deep Search
Control Plane
Nodes
KVM #2
KVM #1
Master #1
VM
Master #2
VM
Bootstrap
VM
Master #3
VM
Bastion
VM NFS /
Parallel File System
NVMe Storage
Internal
Git
Existing Databases
HPC simulations | SW
Scalability
Open Data Hub
Goal / Computation
Description: Build the model
Desired goal: Build a promising model
Iteration time: Hours
No. of systems: 1
GPUs: 1-2 RTX 2080 Ti or Tesla V100
Model + Data

Description: Continuous Integration
Desired goal: Make sure that the code base remains bug free
Iteration time: Hours
No. of systems: 10s
GPUs: 2-6 Tesla V100
Data + Parameters

Description: Train the model on real data (hyperparameter tuning)
Desired goal: Make the model work with real data and optimize
Iteration time: Days to weeks
No. of systems: 10s-100s
GPUs: 4-6 Tesla V100
Data + Parameters

Description: Optimize and validate the model
Desired goal: Prepare the model for deployment and validation
Iteration time: Hours to weeks
No. of systems: 10s
GPUs: 4-6 Tesla V100
Data + Parameters

Description: Deploy the model
Desired goal: Provide functionality using the model
Iteration time: Milliseconds
No. of systems: Hundreds (test fleet) to millions (live fleet)
GPUs: Embedded inference platforms (e.g. Drive AGX)
Environments + Data
Multi-Tenant, Self-Serve, Accelerated Platform
GPUs | NVMEs | HDR
Kubeflow Project
Architecture Components
Fairing
Experiments
Notebooks | Jupyter Lab
Pipelines
Simulation
Pipelines
Kubeflow
Pipelines
Apache Airflow Argo Cloud Composer
Model
Serving TensorFlow
Serving and Istio
TensorFlow
Batch Prediction
Seldon Serving Triton PyTorch
Serving
Miscellaneous | Metadata | Nuclio functions
Hyperparameter tuning | Katib | Google Vizier | Auto-Keras | BO
Model
Training TensorFlow
TensorBoard
TensorFlow
Extended
Pytorch Horovod Other
Operators
Data
Ingestion
Users
Web
Interface
Open Data Hub Architecture
Vision and Integration
Model Lifecycle
Kubeflow
Seldon
MLflow
ML Applications
Open Data Hub
AI Fairness 360
AI Library
Business
Intelligence
Superset
Interactive AI
JupyterHub + plugins
Hue
Big Data Processing
Spark | Spark SQL
Thrift
Streaming
Kafka Streams
Elasticsearch
Data Exploration
Hue
Kibana
Hive Metastore | Spectrum Discovery
Data Lake
Spectrum Scale
Ceph Storage
MinIO
In-Memory
Red Hat Data Grid
Relational Databases
PostgreSQL
MySQL
MariaDB
Red Hat AMQ Streams
(Kafka, Strimzi)
Red Hat Ceph
S3 API
Kafka
Connect
Logstash Fluentd rsyslogd
Red Hat OpenShift
Kubernetes | RHEL | Hybrid Cloud
Red Hat
OpenShift
OAuth
Red Hat
Single Sign-
On
(Keycloak)
Red Hat
Ceph
Object
Gateway
Red Hat
3scale
IBM
Adversarial
Robustness
Toolbox
Prometheus
Grafana
Kubeflow
Pipelines
Argo
Workflows
Jenkins CI/CD
TensorBoard
DATA ANALYSIS
MACHINE LEARNING AND DEEP LEARNING
METADATA MANAGEMENT
STORAGE
DATA IN MOTION
CONTAINERS PLATFORM
SECURITY
GOVERNANCE
MONITORING
ORCHESTRATION
Accelerated Platforms
On-Prem Hardware Capabilities
Partnership with IBM Research
Visual Insights:
Auto-Deep Learning
Automatic Labeling
Large Model / High-
Res Image Support
Auto-Hyperparameter
Optimization
AI Model Optimizer &
Compiler (FPGAs, SoCs)
IBM Research Innovations
Around Computer Vision
IBM Research Innovation:
Around Analog Inference Chips
Elastic Inference
for large-scale compute
GPU and cluster
accelerated ML
Elastic
Distributed Training
INNOVATION
Next-Generation AI Hardware
Source: IBM Research
Merging Memory and Processing
Source: IBM Research
In-memory computing using resistive memory
devices is a promising non-von Neumann
approach for making energy-efficient deep
learning inference hardware
Avoid constant shuttling of data between
memory and processing units, which limits the
maximum achievable energy efficiency.
IBM Power AC922
• GPUs: NVIDIA Tesla V100s
• (5.6 – 10x data throughput) with
Advanced IO Architecture
• PCIe Gen 4
• CAPI 2.0
• OpenCAPI 3.0
• NVLink 2.0 CPU–GPU and GPU-GPU
TRAIN
Powering the Fastest Supercomputer
DATA
IBM Power LC922/IC922
• Up to 120 TB of data storage
• Superior I/O: PCIe Gen 4
• IBM Spectrum Scale / ESS
Hadoop Integration
INFERENCE
SIMULATION
AI Inference Platform
Hardware
• IC922 2U
• NVIDIA T4 GPUs or FPGAs*
Software
• TensorRT
• WMLCE
• Maximo Visual Inspection
Deploy AI into Production
Enterprise AI infrastructure
you need to deploy AI into production
Big Data Workloads
Accelerate AI pipelines
Provide an ESS 3000 High Performance
Tier of storage to keep AI Data Pipelines
and GPUs running at peak performance
ESS 3000 High Performance Tier
ARCHIVE
High scalability,
large/sequential I/O
capacity tier
1. Single name space
2. Global collaboration / Hybrid Cloud
3. Software RAID / Erasure Coding
4. Multi-protocol support
Spectrum Scale
Cloud Object Storage
Elastic Storage Server
IBM Cloud Paks
Classification &
metadata tagging
─
High volume, index &
auto-tagging zone
GRID
Accelerate ETL
IBM
Cloud Paks
Transient storage
─
Throughput-oriented
work areas
landing zone
Fast ingest /
Real-time analytics
DATA IN
INGEST ORGANIZE ANALYZE ML/DL
IBM Spectrum Storage for AI with Power Systems
A fully optimized, scalable and supported AI platform that delivers blazing
fast performance, proven dependability and resiliency.
§ Single Global Namespace (CIFS/NFS/iSCSI)
§ Cloud Integration (Object Storage, S3 etc)
§ Hadoop Transparent Integration
§ NVMe based storage
IBM Spectrum Discover
IBM Spectrum Scale
IBM ESS 3000
NVMe Flash for AI
All-new storage solution
§ Integrated scale-out advanced
data management with end-to-
end NVMe storage
§ Containerized software for ease of
install and update
§ Deploy initial configuration in
hours, not days
§ Fast and easy update and
scale-out expansion
§ Performance, capacity and ease
of integration for AI workflow
IBM Elastic Storage System 3000 specification overview
Scalable high-performance unified storage for files and objects
File management: IBM Spectrum Scale Version 5
Data protection: IBM Spectrum Scale erasure coding
Internal operating system: Red Hat Enterprise Linux 8.x
Protocols and interfaces: POSIX with Spectrum Scale client, NFS v4.0, SMB v3.0, Hadoop MapReduce, OpenStack Swift (object), S3 (object), CSI (Container Storage Interface)
Controllers: Highly available dual active-active controllers
Storage: NVMe flash drives (1.92 TB, 3.84 TB, 7.68 TB or 15.4 TB)
Number of drives: 12 or 24 drives per 2U enclosure
Memory: 384 GB or 768 GB memory per controller
Network adapters: Up to three PCIe host adapters per controller; Mellanox ConnectX-5 with InfiniBand EDR and 100 Gb/s Ethernet
IBM Spectrum Storage for AI
Performance for Distributed Deep Learning
Near Linear Scaling by
adding 40GB/s per 2U
appliance
No need for downtime or
reconfiguration
Best in class throughput
potential
IBM Cognitive Systems
On-Prem solutions / products as of today
DATA
PRE-PROCESSING
TRAINING
INFERENCE
§ AI Fairness 360
§ Adversarial Robustness Toolbox
§ Watson Studio Local
§ Watson ML Community Edition
§ Watson ML Accelerator
§ Visual Insights Training and Inference
§ Visual Inspector (iOS)
§ Engineering and Scientific Subroutine Library
§ Spectrum LSF for HPA
§ Spectrum Scale (Burst Buffer, LROC etc), Spectrum Discover
§ Cloud Object Storage
§ OpenShift / Open Data Hub
§ Cloud Pak for Data
§ Driverless AI
On-Prem Data Annotation Tool
On-Prem Labeling for Deep Learning
IVIS
Batch AI
§ Object Detection
§ Action Detection
§ Gesture Detection
§ Etc
§ Classification
§ Object Detection
§ Segmentation
§ Path Planning
§ Etc
ADAS
§ Collaboration Platform for data annotation and
training
§ K8s Scalability
§ Auto Labeling
§ Augmentation
§ Intuitive Web Interface
§ API Driven Platform
§ Support for images and videos
§ Design for productivity
§ Custom models support
Architecture Components
Intelligent data annotation / Object Detection
Image files
§ Rectangular
labeling
§ Object Detection
§ Non rectangular
labeling (multi
point polygons)
§ Image
Segmentation
Intelligent data annotation / Object Detection
Video files
Training job results
Object Detection / Basic View
§ mAP (mean average precision): the mean
of the precision values calculated for each
object. Precision is the percentage of
objects correctly marked in an image. It is
calculated as true positives / (true
positives + false positives).
§ Precision: The percentage of images
that are labeled as an object which
actually should be labeled as that
category. It is calculated by true
positives / (true positives + false
positives).
§ Recall: The percentage of the images
that were labeled as an object
compared to all images that contain
that object. It is calculated as true
positives / (true positives + false
negatives).
§ IoU: the location accuracy of the
image label boxes. It is calculated by
the intersection (overlap) between a
hand drawn bounding box and a
predicted bounding box divided by
the union (combined area) of both
bounding boxes.
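The formulas above translate directly into code. The following is a minimal sketch; the `(x_min, y_min, x_max, y_max)` box layout and the example counts are assumptions made for illustration, not the tool's internal representation.

```python
def precision(true_pos, false_pos):
    """true positives / (true positives + false positives)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos, false_neg):
    """true positives / (true positives + false negatives)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    intersection = max(0, inter_w) * max(0, inter_h)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Illustrative counts and boxes (not taken from the slides).
print(precision(true_pos=90, false_pos=10))
print(recall(true_pos=90, false_neg=30))
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

In the IoU example the boxes overlap in a 5 x 5 region, so the intersection is 25 and the union is 100 + 100 - 25 = 175.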
Training job results
Object Detection / Advanced View
Testing the model with new images: Object Detection
Single image testing via web portal
Intelligent data annotation / Action Detection
Video files
§ Actions are used to
identify specific
moments occurring
in a video for action
detection.
Data augmentation
Exporting the dataset
Using web portal
.xml file
.jpg file
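An exported annotation file can be read back with the Python standard library. This sketch assumes the .xml files follow a Pascal VOC style layout (`object`, `name`, `bndbox`), which the slide does not state; the sample content and file name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in a Pascal VOC style layout (assumed, not from the slides).
SAMPLE_XML = """<annotation>
  <filename>frame_0001.jpg</filename>
  <object>
    <name>stop_sign</name>
    <bndbox><xmin>120</xmin><ymin>45</ymin><xmax>160</xmax><ymax>85</ymax></bndbox>
  </object>
</annotation>"""


def read_annotation(xml_text):
    """Return (image file name, [(label, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(int(bndbox.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), coords))
    return root.findtext("filename"), boxes


image_name, labels = read_annotation(SAMPLE_XML)
print(image_name, labels)
```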
Production Optimized ML/DL Frameworks and Libraries for Training
IBM Watson Machine Learning
Watson ML Community Edition (WMLCE)
CUDA 10
TensorRT
TensorFlow
Caffe2
RAPIDS.AI: cuDF, cuML
LIBS:
Distributed Deep Learning (DDL)
Large Model Support (LMSv2)
SnapML
Local, MPI, Spark
DASK
Pytorch
Estimator, Probability,
Serving, Tensorboard
APEX
XGBoost
Bazel
libevent, libgdf, libgdf_cffi, libopencv, libprotobuf, parquet-cpp, thrift-cpp,
arrow-cpp, pyarrow, gflags, magma, cupy, py-opencv etc.
NCCL cuDNN
Spectrum MPI
Horovod
delivered via
Bare Metal or Containers
ONNX
Version 1.7.0
TensorFlow
Serving Server
TensorRT
ONNX
Protobuf
Training
Inference
Research Innovations
Optimized ML/DL frameworks & libraries

Snap Machine Learning
Chart: Logistic regression in Snap ML (with GPUs) vs TensorFlow (CPU-only); runtime in minutes.
Google CPU-only (90 x86 servers): 1.1 hours
Snap ML on Power + GPU (4 AC922 servers): 1.53 minutes
46x faster

Large Model Support
Chart: Caffe with LMS (Large Model Support), GoogleNet model on enlarged ImageNet dataset (2240x2240); time in seconds.
Xeon x86 2640v4 w/ 4x V100 GPUs: 3.1 hours
Power AC922 w/ 4x V100 GPUs: 49 minutes
3.8x faster

Distributed Deep Learning
Chart: ResNet-101, ImageNet-22K, Caffe with PowerAI DDL, running on Minsky (S822LC) Power System.
1 system: 16 days
64 systems: 7 hours
58x faster
IBM Watson Machine Learning
Community Edition / Accelerator
IBM Watson Machine Learning
Community Edition
Docker Containers
IBM Watson Machine Learning
Community Edition
Universal Base Images (UBI)
TensorFlow
Large Model Support
Existing Limits:
• GPUs have limited memory
• Neural networks are growing deeper and wider
• Amount and size of data to process is always growing
IBM TFLMS Advantages:
• 10x image resolution - Keras ResNet50 and
DeepLabV3 2D for image segmentation
• Easier to enable in model code
• Automatic tuning of swapping parameters
• Faster graph modification times
• Finer tuning of asynchronous compute and memory
transfer
• Serialization of operations in layers
• More model and tensor information output
TensorFlow
Large Model Support
Chart: Epoch times (seconds) at high resolution with swapping, comparing a POWER9 server with NVLink 2.0, a GPU server with PCI, and a GPU server with PCI contention.
Developing AI technologies for advanced driver assistance systems (ADAS) or self-driving systems
ADAS Client Use Case
ZF Friedrichshafen
Client Value
- accelerated
data reception
and video
transformation
- continuous,
company-wide
process control
ZF Friedrichshafen and IBM collectively build a data
management system for ADAS, based on hybrid multi-
cloud.
“Managing huge amounts of data in a
hybrid multi-cloud environment is very
important. Transparent access to files in
data lakes with low latency is essential
when developing autonomous vehicles,
where we have to process images and
information from many different data
sources.”
IBM Solutions:
IBM Spectrum Scale, IBM Aspera, IBM AREMA,
Red Hat OpenShift
Harald Holder
Director of IT Infrastructure Platforms at ZF
https://www.smart2zero.com/news/zf-manage-adas-development-data-cloud-together-ibm/page/0/1
Robert Bosch
- Out-Think
Competition
- Improved Service
Delivery
- Enhanced Customer
experience
Client Value
- Manages the complex label workflow to allow all
involved users to participate in an efficient way
- Cost reduction – AREMA is reducing the data to be
transferred and thus saving transfer costs
- Operational efficiency - whole system is
operated/configured/extended by few people
Robert Bosch, a well-known German Tier 1 supplier,
manages ADAS labelling workflows to optimize
performance and quality and to save costs.
- AREMA is used for the management of ADAS/AD related
test data (test drive recordings)
- Allows full visibility (search, filter and preview) for
millions of video recordings with AREMA Media Portal
- AREMA manages the ingest process, extraction of
relevant data and conversion to different formats / same
formats with reduced or compressed content
- Management of complex labelling workflows, with support for
human approval processes, dozens of user roles and
automatic management of file transfers
- Export to labelling-as-a-service provider workflows
IBM Solutions:
IBM AREMA, IBM Services
ADAS Supplier in Europe
- Out-Think
Competition
- Improved Service
Delivery
- Enhanced Customer
experience
Client Value
- Met the required performance KPIs to give
them better ROA for their NVIDIA DGX AI
cluster
- Reduce AI training times from weeks to
days
- Experiments per month increased by a
factor of 14
Major supplier to the automotive industry, supplying
systems, components, electronics, and engineering services
for vehicle safety, comfort and powertrain performance.
Developing ADAS solution with a large NVIDIA DGX cluster.
Challenge:
The existing storage infrastructure was unable to keep the
DGX-1 GPUs busy.
Data collected at the edge by ADAS test vehicles need to be
shared from multiple data centers in different countries to a
central site.
IBM Solution:
IBM Elastic Storage Server (ESS)
Volkswagen AG
Volkswagen and Red Hat work together
to automate integration testing
leveraging Red Hat OpenShift.
Red Hat OpenShift extended into 3rd Party
Hardware:
• Achieve Hardware in the Loop (HiL) for ICAS/ID3
• Virtual testing for Virtual Travel Assist 4.2, ACC and
ID3 Infotainment
• All test components, physical and virtual,
managed and controlled by OpenShift
“My job at Volkswagen is to make sure all the electronic
control units we have in our cars work together,” said
Michael Denecke, Head of Test Technology at
Volkswagen AG, speaking in the Red Hat Summit
2019 keynote.
“Due to the new challenge of having autonomous
driving connected cars (...), we got the idea to have all
these tests we do with hardware on virtual test
environments, and that’s why we’ve come to OpenShift
and containers.”
Mayflower Project
IBM announced in October 2019 that it
has joined a global consortium of
partners, led by marine research
organization ProMare, that are building
an unmanned, fully-autonomous ship
that will cross the Atlantic on the fourth
centenary of the original Mayflower
voyage in September 2020.
IBM Solution:
IBM PowerAI Vision, IBM Cloud, IBM Cloud Object
Store, IBM Operational Decision Manager, Red Hat
Enterprise Linux (RHEL)
Questions?
Thank you
Ing. Florin Manaila
Senior Architect
NextGen Workloads and Distributed AI
—
florin.manaila@de.ibm.com
ibm.com
67

Contenu connexe

Tendances

Artificial intelligency & robotics
Artificial intelligency & roboticsArtificial intelligency & robotics
Artificial intelligency & robotics
Sneh Raval
 
Principles of Artificial Intelligence & Machine Learning
Principles of Artificial Intelligence & Machine LearningPrinciples of Artificial Intelligence & Machine Learning
Principles of Artificial Intelligence & Machine Learning
Jerry Lu
 

Solutions for ADAS and AI data engineering using OpenPOWER/POWER systems

  • 1. Solutions for ADAS and AI Data Driven Engineering. IBM Cognitive Systems EMEA / DOC v2 / August 5 / © 2020 IBM Corporation. Ing. Florin Manaila, Senior Architect and Inventor, NextGen Workloads and Distributed AI, IBM Systems Hardware Europe, Middle East & Africa; Member of the IBM Academy of Technology (AoT)
  • 2. Autonomous driving is the key driver of innovation in the automotive industry, but it requires a whole new set of tools, skills and capabilities, especially with regard to data and AI. Autonomous Vehicles (AV): are capable of sensing their environment; combine external input from sensors such as radar, computer vision, LiDAR, sonar and GPS; interpret this information to identify navigation paths, obstacles and signage; and move with little or no human input. Advanced Driver Assistance Systems (ADAS): automate, adapt and enhance vehicles for safety and better driving; rely on input from imaging, LiDAR, radar, image processing, computer vision and in-car data; examples include stability control, lane control, adaptive cruise control and traction control.
  • 3. Autonomous driving is a data- and time-intensive challenge. Innovation in ADAS/AV produces huge amounts of data. A single test car can collect up to ~15 TB in one hour of test driving. OEMs use fleets of vehicles to gather enough data and validation points. The data volume for a single ADAS/AV project can easily exceed ~10 PB. Tagging and annotating the net driving scenes from 1 h of test driving takes more than 200 h.
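As a rough back-of-the-envelope check on the slide 3 figures, the sketch below works out how many hours of test driving fit into a 10 PB project and what labeling effort that would imply if every recorded hour were annotated. It assumes decimal units (1 PB = 1000 TB) and treats the approximate slide values as exact; the function names are illustrative and not part of the deck.

```python
# Figures quoted on slide 3 (all approximate on the slide itself).
TB_PER_DRIVE_HOUR = 15             # ~15 TB recorded per hour of test driving
PROJECT_VOLUME_PB = 10             # ~10 PB for one ADAS/AV project
LABEL_HOURS_PER_DRIVE_HOUR = 200   # >200 h of tagging per 1 h of driving
TB_PER_PB = 1000                   # assumption: decimal units


def drive_hours_for(volume_pb, tb_per_hour=TB_PER_DRIVE_HOUR):
    """Hours of test driving needed to produce volume_pb of raw data."""
    return volume_pb * TB_PER_PB / tb_per_hour


def labeling_hours_for(drive_hours, ratio=LABEL_HOURS_PER_DRIVE_HOUR):
    """Annotation effort if every recorded hour were tagged at the given ratio."""
    return drive_hours * ratio


hours = drive_hours_for(PROJECT_VOLUME_PB)
print(f"{PROJECT_VOLUME_PB} PB corresponds to about {hours:.0f} h of test driving")
print(f"Annotating all of it would take about {labeling_hours_for(hours):,.0f} h")
```

In practice only selected scenes are annotated, so the second number is an upper bound under these assumptions rather than a planning figure.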
  • 4. AV and ADAS include the most complex AI and data tasks to date. IN CAR (complex real-time decisions): Motion Control; Scene Understanding; Perception, Sensor Fusion; Mission, Trajectory Planning. BACKEND (continuous development & integration): Operations Backend; Real-time data (weather, traffic, accident, ...); AI Engineering & Training; Tagging/Labeling; Test Data Management; Simulation and Approval; AI & Data Centric System Engineering.
  • 5. ADAS/AV capabilities will be an integral part of future-proof vehicle 4.0 platforms. Diagram labels: Edge Services Platform; Owned by OEM + 3rd Parties; Vehicle Control Center; Security Operations Center; Owned by OEM; OEM Apps; 3P Apps; In-Vehicle Platform (host system, virtualized EE & API); 5G & V2X; Fog; Supplier Apps; Connected Vehicle Platform (vehicle-centric); OEM Apps; 3P Apps; B2B; Connected Services Platform (customer-centric).
  • 6. We see recurring challenges across the industry when OEMs start building their autonomous driving capabilities: complex, massive data ingestion and data lake; open technology platform management; high-quality data that is synchronized, labeled, tagged and searchable; strong requirements traceability, test case and defect management; program management and support across geos/sites and partners/suppliers; agile engineering processes, as well as AI-specific processes, aligned to the vehicle development lifecycle (V-model, ASPICE, ISO 26262).
  • 7. ADAS/AD development needs data along the entire value creation process. IBM's intelligent data management helps to optimize the overall process. Process stages: Data Acquisition / Ingestion; Analytics / Scene Selection; Data Enrichment / Labelling; Algorithm Training; Simulation; Validation / SIL / HIL; Test drives / Connectivity-based validation. Benefits: reduced offloading time; faster findings, higher engineer productivity; reduced from weeks to minutes; parallel performance, reduced car fleet; more test cases in shorter time; super-fast access from everywhere; online feedback, shorter loop. Supporting components: Engineering Lifecycle Management (ELM); references for ASPICE- and ISO 26262-compliant development; Hybrid Data Bridge.
  • 8. IBM Autonomous Driving Engagements Overview. IBM Service, Technology & Research Assets; IBM Data Management for ADAS/AD; IBM Automotive Software Engineering; IBM Connected Vehicle Platform; IBM Vehicle Operation Center & Security Operation Center. Phases: Development & Test; Operation.
  • 9. The path to Autonomous Vehicles. Neural network models: billions of parameters, gigabytes. Computation: iterative gradient-based search, millions of iterations, mainly matrix operations. Data: millions of images and sentences, petabytes. Workload characteristics: both compute and data intensive! Section label: AUTOMATION.
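To make the "billions of parameters, gigabytes" relationship on slide 9 concrete, here is a minimal sketch that converts a parameter count into the memory needed for the weights alone. The 4 bytes per parameter default assumes 32-bit floats, which the slide does not state; optimizer state and activations are deliberately ignored.

```python
def model_size_gb(num_parameters, bytes_per_parameter=4):
    """Size of the raw weights in decimal gigabytes.

    bytes_per_parameter=4 assumes FP32 storage (an assumption, not from the
    slide); use 2 for FP16/BF16 or 1 for INT8.
    """
    return num_parameters * bytes_per_parameter / 1e9


for params in (100_000_000, 1_000_000_000, 5_000_000_000):
    print(f"{params:>13,} parameters -> {model_size_gb(params):.1f} GB in FP32")
```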
  • 10. Technical challenges in ADAS real-time object detection: • Safe driving when an accident occurs at short distance: the vehicle control system needs a fast enough response. • Detecting objects as far away as possible, including at night: extremely small objects (10 × 10 pixels) should be detected > 50~70 m ahead, with precision > 90% and (1 − recall) < 10%. • Different weather conditions (rain, snow, fog, etc.). • Shape distortion due to the angle between the camera and the sign. • A huge number of categories: traffic signs alone come in hundreds of types for China, Germany and the US. Image captions: Extremely small objects; Traffic sign in US; Traffic sign in Germany; Bad weather condition; Shape distortion.
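The detection targets on slide 10 (precision > 90% and 1 − recall < 10%) can be checked from raw detection counts. The sketch below uses the standard definitions of precision and recall; the function names and the example counts are illustrative, not taken from the deck.

```python
def precision(true_pos, false_pos):
    """Fraction of reported detections that are correct."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos, false_neg):
    """Fraction of real objects that were detected."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def meets_targets(true_pos, false_pos, false_neg,
                  min_precision=0.90, max_miss_rate=0.10):
    """True when precision > 90% and (1 - recall) < 10%, as stated on slide 10."""
    miss_rate = 1.0 - recall(true_pos, false_neg)
    return precision(true_pos, false_pos) > min_precision and miss_rate < max_miss_rate


# Hypothetical counts for a batch of small traffic-sign detections.
print(meets_targets(true_pos=95, false_pos=5, false_neg=5))
print(meets_targets(true_pos=80, false_pos=5, false_neg=20))
```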
  • 11. Other technical challenges in ADAS real-time object detection: § Fitting a complex network into limited compute and memory resources while still detecting in real time. Enabling AI from data center to edge is critical. § Color changes depending on the light, the weather conditions and even the age of the sign, and this variance can be extreme at night. § Distance to objects: vehicles need real-time detection of the distance between objects and the car, so that vehicle companies can build more complex applications, such as adjusting the car lights according to where the objects are. § Reflection of the car's lights can make a traffic sign unrecognizable. Image captions: The red color of the circle has totally changed; Reflection of light: test data from US; Distance estimation to the objects.
  • 12. Model Lifecycle. Stages: Data; Development; Approval; Culture; Validation; Test & Deploy; Usage; Performance Monitoring; Risk Assessment. Checklist items: § Integrity § Quality § Tools § Design and objectives § Hypothesis § Assumptions § Regulatory context § Technology aspects/limitations § Data § Methodology and theoretical soundness § Backtesting results § Stress testing results § Model stability § Qualitative assessment § Model risk quantification § Business functional requirements § User acceptance testing § Path to production § Sign-off for deployment § Available infrastructure for support § Fast updates § Backtesting § Performance metrics § Escalations § Link models to limit calibration, thresholds and risk capacity
  • 13. Getting predictions. Online AI: § Optimized to minimize the latency of serving predictions. § Can process one or more instances per request. § Predictions are returned in the response message. § Returns as soon as possible. § Runs on the runtime version and in the region selected when you deploy the model. § Runs models deployed to an embedded accelerator (GPU, FPGA, ASIC). Batch AI: § Optimized to handle a high volume of instances in a job and to run more complex models. § Can process one or more instances per request. § Predictions are written to output files. § Asynchronous request. § Can run in any AI-accelerated cluster, using an optimized runtime version. § Runs models deployed to AI Platform or models stored on-prem or in cloud locations.
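The contrast between the two serving modes on slide 13 can be sketched in a few lines: the online path returns predictions in the response, while the batch path reads instances from a file and writes predictions to an output file. Everything here is a hypothetical stand-in (`toy_model`, the JSON-lines file layout, the function names); a real batch service would also queue the job asynchronously, which this sketch omits.

```python
import json
import tempfile
from pathlib import Path


def toy_model(instance):
    """Hypothetical stand-in for a trained model: sums the feature values."""
    return sum(instance)


def predict_online(instances):
    """Online mode: predictions come back in the response message itself."""
    return {"predictions": [toy_model(x) for x in instances]}


def predict_batch(input_file, output_file):
    """Batch mode: one JSON instance per input line, one prediction per output line.

    Returns only the number of processed instances; the predictions themselves
    live in output_file.
    """
    count = 0
    with open(input_file) as src, open(output_file, "w") as dst:
        for line in src:
            if not line.strip():
                continue
            dst.write(json.dumps({"prediction": toy_model(json.loads(line))}) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    print(predict_online([[1, 2], [3, 4, 5]]))
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "instances.jsonl"
        dst = Path(workdir) / "predictions.jsonl"
        src.write_text("[1, 2]\n[3, 4, 5]\n")
        print(predict_batch(src, dst), "predictions written to", dst.name)
```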
  • 14. AI Workflow
  • 15. Transforming: AI Infrastructure Stack from yesterday (ON-CLOUD and ON-PREM). Micro-Services / Applications (segment specific: Finance, Retail, Healthcare, Automotive); APIs, external and/or in-house (Speech, Vision, NLP, Sentiment); Machine & Deep Learning Libraries & Frameworks (TensorFlow, Caffe, PyTorch, SparkML, Snap.ML); Distributed Computing (Spark, MPI); Data Lake & Data Stores (Hadoop HDFS, NoSQL DBs, Parallel File System); Transform & Prep Data (ETL); Accelerated Infrastructure.
  • 16. AI Infrastructure Stack from today: ON-CLOUD / ON-PREM vs near-Edge or Edge. Transform & Prepare Data; Micro-Services / Applications; Governance AI (Fairness, Explainable AI, Model Health, Accuracy); APIs (external and in-house); Machine & Deep Learning Libraries & Frameworks; Distributed Computing; Data Lake & Data Stores; Action / Decision; APIs (internal); Optimized Machine & Deep Learning Runtime; Federated Learning; Local Cache; Security; Compliance; Sensors; Data Standardization.
  • 17. Multi-Tenant, Self-Serve, Accelerated Platform: Architecture Overview. Red Hat OpenShift; Accelerated Worker Nodes for Training; Accelerated Worker Nodes for Inference; Machine Learning Frameworks and Libraries (WMLCE); Kubeflow / ODH; Adversarial Robustness Toolbox (ART) / Trusted AI; Anaconda Team Edition; Bayesian Optimization; Deep Search; Control Plane Nodes: KVM #1, KVM #2, Master #1 VM, Master #2 VM, Master #3 VM, Bootstrap VM, Bastion VM; NFS / Parallel File System; NVMe Storage; Internal Git; Existing Databases; HPC simulations | SW; Scalability; Open Data Hub.
  • 18. Goal / Computation (values listed in stage order). Stage: Build the model | Continuous Integration | Train the model on real data (hyperparameter tuning) | Optimize and validate the model | Deploy the model. Desired goal: Build a promising model | Make sure that the code base remains bug-free | Make the model work with real data and optimize | Prepare the model for deployment and validation | Provide functionality using the model. Iteration time: Hours | Hours | Days to weeks | Hours to weeks | Milliseconds. Number of systems: 1 | 10s | 10s-100s | 10s | Hundreds (test fleet) to millions (live fleet). GPUs: 1-2 RTX 2080 Ti or Tesla V100 | 2-6 Tesla V100 | 4-6 Tesla V100 | 4-6 Tesla V100 | Embedded inference platforms (e.g. DRIVE AGX). Row without a label in the transcript: Model + Data | Data + Parameters | Data + Parameters | Data + Parameters | Environments + Data.
  • 19. Multi-Tenant, Self-Serve, Accelerated Platform (GPUs | NVMe | HDR): Kubeflow Project Architecture Components. Users; Web Interface; Data Ingestion; Fairing; Experiments; Notebooks | Jupyter Lab. Pipelines: Simulation Pipelines, Kubeflow Pipelines, Apache Airflow, Argo, Cloud Composer. Model Serving: TensorFlow Serving and Istio, TensorFlow Batch Prediction, Seldon Serving, Triton, PyTorch Serving. Miscellaneous | Metadata | Nuclio functions. Hyperparameter tuning | Katib | Google Vizier | Auto-Keras | BO. Model Training: TensorFlow, TensorBoard, TensorFlow Extended, PyTorch, Horovod, Other Operators.
  • 20. Open Data Hub Architecture: Vision and Integration. Model Lifecycle: Kubeflow, Seldon, MLflow. ML Applications: Open Data Hub AI Library, AI Fairness 360. Business Intelligence: Superset. Interactive AI: JupyterHub + plugins, Hue. Big Data Processing: Spark, Spark SQL Thrift. Streaming: Kafka Streams, Elasticsearch. Data Exploration: Hue, Kibana. Hive Metastore | Spectrum Discover. Data Lake: Spectrum Scale, Ceph Storage, MinIO. In-Memory: Red Hat Data Grid. Relational Databases: PostgreSQL, MySQL, MariaDB. Red Hat AMQ Streams (Kafka Strimzi); Red Hat Ceph S3 API; Kafka Connect; Logstash; Fluentd; rsyslogd. Red Hat OpenShift: Kubernetes | RHEL | Hybrid Cloud. Red Hat OpenShift OAuth; Red Hat Single Sign-On (Keycloak); Red Hat Ceph Object Gateway; Red Hat 3scale. IBM Adversarial Robustness Toolbox. Prometheus; Grafana. Kubeflow Pipelines; Argo Workflows; Jenkins CI/CD. TensorBoard. Layers: DATA ANALYSIS; MACHINE LEARNING AND DEEP LEARNING; METADATA MANAGEMENT; STORAGE; DATA IN MOTION; CONTAINERS PLATFORM; SECURITY; GOVERNANCE; MONITORING; ORCHESTRATION.
  • 21. Accelerated Platforms: On-Prem Hardware Capabilities
  • 22. Partnership with IBM Research. Visual Insights: Auto-Deep Learning; Automatic Labeling; Large Model / High-Res Image Support; Auto-Hyperparameter Optimization; AI Model Optimizer & Compiler (FPGAs, SoCs). IBM Research innovations around computer vision; IBM Research innovation around analog inference chips; Elastic Inference for large-scale compute; GPU and cluster accelerated ML; Elastic Distributed Training. Section label: INNOVATION.
  • 23. Next-Generation AI Hardware (Source: IBM Research)
  • 24. Merging Memory and Processing (Source: IBM Research). In-memory computing using resistive memory devices is a promising non-von Neumann approach to building energy-efficient deep learning inference hardware. It avoids the constant shuttling of data between memory and processing units, which limits the maximum achievable energy efficiency.
  • 25. Enterprise AI infrastructure you need to deploy AI into production. TRAIN (Powering the Fastest Supercomputer): IBM Power AC922 • GPUs: NVIDIA Tesla V100 • 5.6–10x data throughput with advanced I/O architecture • PCIe Gen 4 • CAPI 2.0 • OpenCAPI 3.0 • NVLink 2.0 CPU-GPU and GPU-GPU. DATA (Big Data Workloads): IBM Power LC922/IC922 • Up to 120 TB of data storage • Superior I/O: PCIe Gen 4 • IBM Spectrum Scale / ESS Hadoop integration. INFERENCE, SIMULATION (Deploy AI into Production): AI Inference Platform. Hardware • IC922 2U • NVIDIA T4 GPUs or FPGAs*. Software • TensorRT • WMLCE • Maximo Visual Inspection.
  • 26. Accelerate AI pipelines: provide an ESS 3000 high-performance tier of storage to keep AI data pipelines and GPUs running at peak performance. Pipeline stages: DATA IN; INGEST; ORGANIZE; ANALYZE; ML/DL. Storage tiers and zones: ESS 3000 high-performance tier; ARCHIVE (high scalability, large/sequential I/O capacity tier); classification & metadata tagging (high-volume index & auto-tagging zone); transient storage (throughput-oriented work areas, landing zone); fast ingest / real-time analytics; accelerated ETL. Capabilities: 1. Single namespace 2. Global collaboration / hybrid cloud 3. Software RAID / erasure coding 4. Multi-protocol support. Products: Spectrum Scale; Cloud Object Storage; Elastic Storage Server; IBM Cloud Paks. Other diagram label: GRID.
  • 27. IBM Spectrum Storage for AI with Power Systems: a fully optimized, scalable and supported AI platform that delivers blazing-fast performance, proven dependability and resiliency. § Single Global Namespace (CIFS/NFS/iSCSI) § Cloud Integration (Object Storage, S3, etc.) § Hadoop Transparent Integration § NVMe-based storage. Products shown: IBM Spectrum Discover; IBM Spectrum Scale.
  • 28. IBM ESS 3000 NVMe Flash for AI: all-new storage solution. § Integrated scale-out advanced data management with end-to-end NVMe storage § Containerized software for ease of install and update § Deploy initial configuration in hours, not days § Fast and easy update and scale-out expansion § Performance, capacity and ease of integration for AI workflow
  • 29. IBM Elastic Storage System 3000 specification overview: scalable high-performance unified storage for files and objects. File management: IBM Spectrum Scale Version 5. Data protection: IBM Spectrum Scale erasure coding. Internal operating system: Red Hat Enterprise Linux 8.x. Protocols and interfaces: POSIX with Spectrum Scale client, NFS v4.0, SMB v3.0, Hadoop MapReduce, OpenStack Swift (object), S3 (object), CSI (Container Storage Interface). Controllers: highly available dual active-active controllers. Storage: NVMe flash drives (1.92 TB, 3.84 TB, 7.68 TB or 15.4 TB). Number of drives: 12 or 24 drives per 2U enclosure. Memory: 384 GB or 768 GB per controller. Network adapters: up to three PCIe host adapters per controller; Mellanox ConnectX-5 with InfiniBand EDR and 100 Gb/s Ethernet.
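The drive options on slide 29 translate directly into raw capacity per 2U enclosure. The sketch below multiplies drive count by drive size and then applies an erasure-coding ratio to estimate usable space. The 8 data + 2 parity layout is only an illustrative assumption, since the slide names erasure coding without giving a layout, and file-system overhead is ignored.

```python
def raw_capacity_tb(num_drives, drive_tb):
    """Raw flash capacity of one enclosure in TB."""
    return num_drives * drive_tb


def usable_capacity_tb(raw_tb, data_strips=8, parity_strips=2):
    """Usable capacity under a data+parity erasure code.

    The 8+2 default is an assumption for illustration, not a value from the slide.
    """
    return raw_tb * data_strips / (data_strips + parity_strips)


smallest = raw_capacity_tb(12, 1.92)   # 12 drives of 1.92 TB
largest = raw_capacity_tb(24, 15.4)    # 24 drives of 15.4 TB
print(f"raw capacity per 2U enclosure: {smallest:.2f} TB to {largest:.1f} TB")
print(f"usable at the assumed 8+2 layout: {usable_capacity_tb(largest):.1f} TB")
```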
  • 30. IBM Spectrum Storage for AI: Performance for Distributed Deep Learning. Near-linear scaling by adding 40 GB/s per 2U appliance. No need for downtime or reconfiguration. Best-in-class throughput potential.
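The scaling claim on slide 30 and the drive options on slide 29 lend themselves to a quick sizing estimate. The sketch below is a back-of-the-envelope calculation, not an IBM sizing tool: it assumes ideal linear scaling (the slide only says "near linear"), and the function names are hypothetical. The only figures taken from the slides are 40 GB/s per 2U appliance and the 12 or 24 drive counts with their listed drive sizes.

```python
import math

# Figure from slide 30: each 2U appliance adds about 40 GB/s.
GB_PER_SEC_PER_APPLIANCE = 40.0


def aggregate_throughput_gbps(appliances: int) -> float:
    """Upper-bound throughput in GB/s, assuming perfectly linear scaling."""
    if appliances < 0:
        raise ValueError("appliances must be non-negative")
    return appliances * GB_PER_SEC_PER_APPLIANCE


def appliances_needed(target_gbps: float) -> int:
    """Smallest number of appliances whose ideal throughput meets the target."""
    if target_gbps <= 0:
        return 0
    return math.ceil(target_gbps / GB_PER_SEC_PER_APPLIANCE)


def raw_capacity_tb(drives: int, drive_size_tb: float) -> float:
    """Raw (pre-erasure-coding) capacity of one enclosure in TB."""
    return drives * drive_size_tb
```

For example, under these assumptions `appliances_needed(100)` returns 3, and a fully populated enclosure with the largest listed drives has `raw_capacity_tb(24, 15.4)` of raw capacity, before erasure-coding overhead. Real deployments will land below the ideal throughput figure.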
  • 31. IBM Cognitive Systems On-Prem solutions / products as of today. Stages: DATA PRE-PROCESSING, TRAINING, INFERENCE § AI Fairness 360 § Adversarial Robustness Toolbox § Watson Studio Local § Watson ML Community Edition § Watson ML Accelerator § Visual Insights Training and Inference § Visual Inspector (iOS) § Engineering and Scientific Subroutine Library § Spectrum LSF for HPA § Spectrum Scale (Burst Buffer, LROC etc.), Discovery § Cloud Object Storage § OpenShift / Open Data Hub § Cloud Pak for Data § Driverless AI
  • 32. On-Prem Data Annotation Tool
  • 33. On-Prem Labeling for Deep Learning. Use cases shown: IVIS, Batch AI, ADAS. Tasks: § Object Detection § Action Detection § Gesture Detection § Etc. § Classification § Object Detection § Segmentation § Path Planning § Etc. Platform: § Collaboration platform for data annotation and training § K8s scalability § Auto labeling § Augmentation § Intuitive web interface § API-driven platform § Support for images and videos § Designed for productivity § Custom models support
  • 34. Architecture Components
  • 35. Intelligent data annotation / Object Detection: Image files
  • 36. Intelligent data annotation / Object Detection: Image files
  • 37. Intelligent data annotation / Object Detection: Image files
  • 38. Intelligent data annotation / Object Detection: Image files § Rectangular labeling § Object Detection
  • 39. Intelligent data annotation / Object Detection: Image files § Non-rectangular labeling (multi-point polygons) § Image Segmentation
  • 40. Intelligent data annotation / Object Detection: Video files
  • 41. Intelligent data annotation / Object Detection: Video files
  • 42. Training job results: Object Detection / Basic View § mAP (mean average precision): the mean, over all object categories, of the average precision for each category. § Precision: the percentage of images labeled as an object category that actually should be labeled as that category. It is calculated as true positives / (true positives + false positives). § Recall: the percentage of images that were labeled as an object category, compared to all images that contain that object. It is calculated as true positives / (true positives + false negatives). § IoU (intersection over union): the location accuracy of the image label boxes. It is calculated as the intersection (overlap) between a hand-drawn bounding box and a predicted bounding box, divided by the union (combined area) of both bounding boxes.
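The formulas on slide 42 can be written out directly. The sketch below is a minimal illustration of those definitions, not the code used by the IBM tool; the function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this example. The last function simply averages per-category values supplied by the caller, which is a simplification of how mAP is computed in practice.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def precision(true_pos: int, false_pos: int) -> float:
    """true positives / (true positives + false positives)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """true positives / (true positives + false negatives)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_over_categories(per_category: Sequence[float]) -> float:
    """Mean of per-category precision values supplied by the caller."""
    return sum(per_category) / len(per_category) if per_category else 0.0
```

As a usage example, a hand-drawn box `(0, 0, 2, 2)` and a predicted box `(1, 1, 3, 3)` overlap in a 1 x 1 square, so `iou` returns 1/7: the overlap of 1 divided by the combined area of 4 + 4 - 1.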
  • 43. Training job results: Object Detection / Advanced View
  • 44. Testing the model with new images: Object Detection. Single image testing via web portal
  • 45. Intelligent data annotation / Action Detection: Video files § Actions are used to identify specific moments occurring in a video for action detection.
  • 46. Intelligent data annotation / Action Detection: Video files
  • 47. Intelligent data annotation / Action Detection: Video files
  • 48. Data augmentation
  • 49. Exporting the dataset using the web portal (steps 1 to 3 shown in the screenshot). Exported files: .xml file, .jpg file
  • 50. Production Optimized ML/DL Frameworks and Libraries for Training: IBM Watson Machine Learning
  • 51. IBM Cognitive Systems On-Prem solutions / products as of today. Stages: DATA PRE-PROCESSING, TRAINING, INFERENCE § AI Fairness 360 § Adversarial Robustness Toolbox § Watson Studio Local § Watson ML Community Edition § Watson ML Accelerator § Visual Insights Training and Inference § Visual Inspector (iOS) § Engineering and Scientific Subroutine Library § Spectrum LSF for HPA § Spectrum Scale (Burst Buffer, LROC etc.), Discovery § Cloud Object Storage § OpenShift / Open Data Hub § Cloud Pak for Data § Driverless AI
  • 52. Watson ML Community Edition (WMLCE), delivered via bare metal or containers. Components shown: CUDA 10, cuDNN, NCCL, TensorRT, TensorFlow (Estimator, Probability, Serving, TensorBoard), PyTorch, APEX, Caffe2, ONNX, XGBoost, Bazel, Horovod, Spectrum MPI, Dask, RAPIDS.AI (cuDF, cuML). LIBS: Distributed Deep Learning (DDL), Large Model Support (LMSv2), SnapML (Local, MPI, Spark). Supporting packages: libevent, libgdf, libgdf_cffi, libopencv, libprotobuf, parquet-cpp, thrift-cpp, arrow-cpp, pyarrow, gflags, magma, cupy, py-opencv, etc. Version 1.7.0.
  • 53. Watson ML Community Edition (WMLCE), delivered via bare metal or containers. Training: CUDA 10, cuDNN, NCCL, TensorFlow (Estimator, Probability, Serving, TensorBoard), PyTorch, APEX, Caffe2, ONNX, XGBoost, Bazel, Horovod, Spectrum MPI, Dask, RAPIDS.AI (cuDF, cuML). LIBS: Distributed Deep Learning (DDL), Large Model Support (LMSv2), SnapML (Local, MPI, Spark). Supporting packages: libevent, libgdf, libgdf_cffi, libopencv, libprotobuf, parquet-cpp, thrift-cpp, arrow-cpp, pyarrow, gflags, magma, cupy, py-opencv, etc. Version 1.7.0. Inference: TensorFlow Serving Server, TensorRT, ONNX, Protobuf.
  • 54. Research innovations: optimized ML/DL frameworks and libraries (IBM Watson Machine Learning Community Edition / Accelerator). [Chart: Snap Machine Learning] Logistic regression in Snap ML (with GPUs) vs TensorFlow (CPU-only), runtime in minutes. Google CPU-only (90 x86 servers): 1.1 hours. Snap ML on Power + GPU (4 AC922 servers): 1.53 minutes. 46x faster. [Chart: Large Model Support] Caffe with LMS (Large Model Support), GoogleNet model on enlarged ImageNet dataset (2240x2240), time in seconds. Xeon x86 2640v4 w/ 4x V100 GPUs: 3.1 hours. Power AC922 w/ 4x V100 GPUs: 49 minutes. 3.8x faster. [Chart: Distributed Deep Learning] ResNet-101, ImageNet-22K, Caffe with PowerAI DDL, running on Minsky (S822LC) Power Systems. 1 system: 16 days. 64 systems: 7 hours. 58x faster.
  • 57. TensorFlow Large Model Support. Existing limits: • GPUs have limited memory • Neural networks are growing deeper and wider • Amount and size of data to process is always growing. IBM TFLMS advantages: • 10x image resolution: Keras ResNet50 and DeepLabV3 2D for image segmentation • Easier to enable in model code • Automatic tuning of swapping parameters • Faster graph modification times • Finer tuning of asynchronous compute and memory transfer • Serialization of operations in layers • More model and tensor information output
  • 58. TensorFlow Large Model Support. [Chart: epoch times at high resolution with swapping, in seconds (axis 0 to 2250), comparing a POWER9 server with NVLink 2.0, a GPU server with PCI, and a GPU server with PCI contention.]
  • 59. ADAS Client Use Case: developing AI technologies for advanced driver assistance systems (ADAS) or self-driving systems
  • 60. ZF Friedrichshafen. ZF Friedrichshafen and IBM are jointly building a data management system for ADAS, based on hybrid multi-cloud. Client Value - accelerated data reception and video transformation - continuous, company-wide process control. “Managing huge amounts of data in a hybrid multi-cloud environment is very important. Transparent access to files in data lakes with low latency is essential when developing autonomous vehicles, where we have to process images and information from many different data sources.” (Harald Holder, Director of IT Infrastructure Platforms at ZF) IBM Solutions: IBM Spectrum Scale, IBM Aspera, IBM AREMA, Red Hat OpenShift. Source: https://www.smart2zero.com/news/zf-manage-adas-development-data-cloud-together-ibm/page/0/1
  • 61. Robert Bosch. Robert Bosch, a well-known German Tier 1 supplier, manages ADAS labelling workflows to optimize performance and quality and to save costs. - Out-Think Competition - Improved Service Delivery - Enhanced Customer Experience. Client Value - Manages the complex label workflow so that all involved users can participate efficiently - Cost reduction: AREMA reduces the data to be transferred and thus saves transfer costs - Operational efficiency: the whole system is operated, configured and extended by few people. Details - AREMA is used for the management of ADAS/AD related test data (test drive recordings) - Allows full visibility (search, filter and preview) for millions of video recordings with AREMA Media Portal - AREMA manages the ingest process, extraction of relevant data and conversion to different formats, or to the same formats with reduced or compressed content - Management of a complex labelling workflow with support for human approval processes, dozens of user roles and automatic management of file transfers - Export to a labelling-as-a-service provider workflow. IBM Solutions: IBM AREMA, IBM Services
  • 62. ADAS Supplier in Europe. Major supplier to the automotive industry, supplying systems, components, electronics, and engineering services for vehicle safety, comfort and powertrain performance. Developing an ADAS solution with a large NVIDIA DGX cluster. - Out-Think Competition - Improved Service Delivery - Enhanced Customer Experience. Challenge: The existing storage infrastructure was unable to keep the DGX-1 GPUs busy. Data collected at the edge by ADAS test vehicles needs to be shared from multiple data centers in different countries to a central site. Client Value - Met the required performance KPIs, giving a better ROA for the NVIDIA DGX AI cluster - Reduced AI training times from weeks to days - Experiments per month increased by a factor of 14. IBM Solution: IBM Elastic Storage Server (ESS)
  • 63. Volkswagen AG. Volkswagen and Red Hat work together to automate integration testing leveraging Red Hat OpenShift. Red Hat OpenShift extended into third-party hardware: • Achieve Hardware in the Loop (HiL) for ICAS/ID3 • Virtual testing for Virtual Travel Assist 4.2, ACC and ID3 Infotainment • All test components, physical and virtual, managed and controlled by OpenShift. “My job at Volkswagen is to make sure all the electronic control units we have in our cars work together,” said Michael Denecke, Head of Test Technology at Volkswagen AG, speaking in the Red Hat Summit 2019 keynote. “Due to the new challenge of having autonomous driving connected cars (...), we got the idea to have all these tests we do with hardware on virtual test environments, and that’s why we’ve come to OpenShift and containers.”
  • 64. Mayflower Project. In October 2019 IBM announced that it had joined a global consortium of partners, led by marine research organization ProMare, that is building an unmanned, fully autonomous ship planned to cross the Atlantic on the fourth centenary of the original Mayflower voyage, in September 2020. IBM Solution: IBM PowerAI Vision, IBM Cloud, IBM Cloud Object Storage, IBM Operational Decision Manager, Red Hat Enterprise Linux (RHEL)
  • 65. Questions?
  • 66. Thank you. Ing. Florin Manaila, Senior Architect, NextGen Workloads and Distributed AI. florin.manaila@de.ibm.com, ibm.com