Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL™
Rajy Rawther
PMTS Software Architect
Advanced Micro Devices, Inc.
Agenda
• Introduction: Why do we need rocAL?
• rocAL pipeline and architecture
• Operators for data loading and augmentations
• Flexible pipelines: scalability across multiple devices
• How to use rocAL?
• Deep dive into MLPerf object detection example with rocAL
• rocAL performance advantages
• rocAL use case in inference
• Conclusion
© 2022 Advanced Micro Devices
The Problem
• Many data formats: FileList, TFRecord, RecordIO, LMDB
• Lots of frameworks
• What processor to choose? GPU, CPU, custom, FPGA
Each framework has its own data processing pipeline, and each needs to be optimized individually.
Needs a unified library.
Feeding the Beast: How to Fully Utilize GPUs?
• GPU performance increases by more than 2x with every new generation
• Native pipelines mostly use CPU cores to pre-process data before training
• As training gets faster with GPUs, pre-processing needs to catch up
• EPYC™ + MI100: 64 CPU cores, 8 GPUs, 8 CPU cores/GPU
• EPYC™ + MI200: 64 CPU cores, 8 GPUs (2x perf), 8 CPU cores/GPU, >300 TFLOPs (FP16)
• The falling ratio of CPU to GPU performance throttles overall speed
[Figure: GPU vs. CPU performance comparison. Images per second versus number of CPU/GPU cores; series: Training, Preprocessing, Overall.]
Needed: a high-performance pre-processing library that can load-balance between CPU and GPU.
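The bottleneck argument on this slide can be sketched with a toy model. This is a minimal illustration, assuming the end-to-end rate is simply the minimum of the pre-processing and training rates and that pre-processing scales linearly with CPU cores. The function names and all per-core and per-GPU rates are hypothetical, not measurements from the talk.

```python
import math


def overall_throughput(preprocess_ips: float, train_ips: float) -> float:
    """End-to-end images/sec is capped by the slower of the two stages."""
    return min(preprocess_ips, train_ips)


def cores_needed(train_ips: float, per_core_ips: float) -> int:
    """Smallest whole number of CPU cores whose combined rate keeps up with training."""
    return math.ceil(train_ips / per_core_ips)


# Hypothetical numbers, chosen only for illustration.
per_core_ips = 150.0    # images/sec one CPU core can decode and augment
cores_per_gpu = 8       # the 8 CPU cores/GPU split mentioned on the slide
gpu_train_ips = 2400.0  # images/sec one GPU could consume while training

cpu_ips = per_core_ips * cores_per_gpu
print(overall_throughput(cpu_ips, gpu_train_ips))  # 1200.0
print(cores_needed(gpu_train_ips, per_core_ips))   # 16
```

In this made-up case the GPU sits half idle unless part of the pre-processing moves off the CPU, which is the motivation for balancing work across both.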
Introducing ROCm Augmentation Library (rocAL™)
[Diagram: training dataset -> CPU/GPU decode -> CPU/GPU-based augmentations -> ready-to-train/infer data -> training or inference]
Key Features of rocAL
• CPU- and GPU-based implementations of each operator
• Python and C++ APIs for easy integration and testing
• Flexible graphs for building custom pipelines that use CPU cores or the GPU
• Supports many new augmentations such as fish-eye, non-linear blend, water, and RICAP
• Support for many workloads
  • Classification
  • Object detection
  • Pose estimation and segmentation
• Seamless interoperability with frameworks through rocAL framework plugins
• Optimized for maximum performance on AMD EPYC™ CPUs and AMD Instinct™ GPUs
What is a rocAL Pipeline?
A pipeline is a graph of data flow connected by node operators.
[Diagram stages: Loader, Decoder, Resize, Augmentations, Training. Data streams: JPEGs; meta-data (labels, bboxes).]
• Loaders: Image Loader, TFRecord Loader, LMDB Loader, Video Loader, Audio Loader, ...
• Decoders: JPEG Decoder, JPEG Cropped Decoder, Video Decoder, Audio Decoder, ...
• Augmentations: Resize, Crop, ColorTwist, Brightness, Contrast, ...
• Frameworks: PyTorch, TensorFlow, MXNet, Caffe, Caffe2, ...
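The idea of a pipeline as nodes connected by data flow can be sketched in a few lines of plain Python. This is a toy illustration and not rocAL code: it assumes the simplest possible graph (a linear chain), and the node functions `decode`, `resize_to_2`, and `brightness` are hypothetical stand-ins that operate on lists of numbers rather than images.

```python
from typing import Callable, List

Node = Callable[[list], list]  # a node maps a batch to a batch


def make_pipeline(nodes: List[Node]) -> Node:
    """Chain nodes so that each node's output feeds the next one."""
    def run(batch: list) -> list:
        for node in nodes:
            batch = node(batch)
        return batch
    return run


# Hypothetical stand-in nodes working on 1-D "images" (lists of pixel values).
def decode(batch):
    return [[int(v) for v in sample.split(",")] for sample in batch]


def resize_to_2(batch):
    return [img[:2] for img in batch]  # crude "resize": keep the first two pixels


def brightness(batch):
    return [[min(255, v + 10) for v in img] for img in batch]


pipe = make_pipeline([decode, resize_to_2, brightness])
print(pipe(["1,2,3", "250,5,6"]))  # [[11, 12], [255, 15]]
```

Swapping, adding, or removing a node changes the pipeline without touching the other nodes, which is the flexibility the graph structure is meant to provide.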
rocAL Pipeline Dataflow
[Diagram: dataset -> rocAL pipeline (augmentation nodes, executed on a multi-core host or GPU, driven through Python & C APIs) -> ready-to-train data]
rocAL Architecture
[Diagram components:]
• Loader module: reader, four decoders, internal thread, input prefetch buffer, batch count
• MetaData reader
• Processing module: rocAL pipeline augmentation nodes (ColorTwist, Brightness, Contrast, Flip, Rain, Resize)
• Output queue
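The loader thread, prefetch buffer, and processing stage in this architecture follow a producer/consumer pattern, which can be sketched with the Python standard library. This is a minimal illustration under assumed names (`loader`, `run`, `prefetch_depth`), not rocAL's implementation; the doubling step is only a stand-in for augmentation nodes.

```python
import queue
import threading


def loader(samples, batch_size, out_q):
    """Loader thread: group samples into batches and push them to the buffer."""
    for start in range(0, len(samples), batch_size):
        out_q.put(samples[start:start + batch_size])  # blocks while the buffer is full
    out_q.put(None)  # sentinel: no more batches


def run(samples, batch_size=2, prefetch_depth=2):
    out_q = queue.Queue(maxsize=prefetch_depth)  # bounded prefetch buffer
    worker = threading.Thread(target=loader, args=(samples, batch_size, out_q))
    worker.start()
    results = []
    while True:
        batch = out_q.get()  # blocks until the loader has a batch ready
        if batch is None:
            break
        results.append([x * 2 for x in batch])  # stand-in for augmentation nodes
    worker.join()
    return results


print(run([1, 2, 3, 4, 5]))  # [[2, 4], [6, 8], [10]]
```

Because the buffer is bounded, the loader runs ahead of the consumer by at most `prefetch_depth` batches, so loading overlaps with processing without unbounded memory growth.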
rocAL Operators
• Reader: File Reader, COCO Reader, TF Reader, RecordIO Reader, LMDB Reader
• Loader: ImageLoader, ImageLoaderSharded, Video Loader, Audio Loader, SequenceLoader
• Decoder: TJpeg Decoder, Cropped Decoder, HW Decoder, Video Decoder, Audio Decoder
• ImageAug: Resize, Crop, ColorTwist, Normalize, Rotate, Flip
rocAL Advantage
• One unified library that integrates with all frameworks
• Optimized augmentation operations shared across all of them
• Flexible enough to support different data formats (file folder reading, LMDB, TFRecord, RecordIO) and portable between frameworks
[Diagram: target devices: CPU, GPU, DLA, APU]
Using rocAL
• rocAL pipelines are used in three simple steps: define, build, and run

# Define
from amd.rocal.pipeline import pipeline_def
import amd.rocal.fn as fn

# Placeholder settings, assumed for illustration (the slide does not define them)
file_dir = "/path/to/images"
decoder_device = "cpu"
device = "gpu"
resize_w, resize_h = 224, 224

@pipeline_def
def example_pipeline():
    jpegs, labels = fn.readers.file(file_root=file_dir)
    images = fn.decoders.image(jpegs, device=decoder_device)
    # resize_x/resize_y follow the SSD example later in this deck; the slide passes these positionally
    resized_images = fn.resize(images, device=device, resize_x=resize_w, resize_y=resize_h)
    return resized_images, labels

# Build
pipe = example_pipeline(batch_size=8, num_threads=32, device_id=0)
pipe.build()

# Run
images, labels = pipe.run()
Sample Output From example_pipeline
Each sample is decoded and resized to 224x224.
[Diagram: JPEGs pass through File Reader, Decode, and Resize (augmentations) to the output tensor; labels pass through the Metadata Reader to the output labels.]
Flexible Pipelines: Scalability Across Devices
[Diagram: the input dataset is split into shards 0 to n; each shard feeds its own rocAL pipeline (0 to n), and each pipeline has its own allocated CPU cores and GPU.]
Each pipeline is configured with a GPU device_id and CPU core bindings.
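The shard-per-pipeline layout can be sketched as a small planning function. This is an illustration only: it assumes a strided split of sample indices and an even split of CPU cores, neither of which is stated in the deck, and `shard_indices` and `plan_pipelines` are hypothetical helper names rather than rocAL functions.

```python
def shard_indices(num_samples: int, num_shards: int, shard_id: int) -> list:
    """Return the sample indices for one shard using a strided split."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return list(range(shard_id, num_samples, num_shards))


def plan_pipelines(num_samples: int, num_gpus: int, total_cores: int) -> list:
    """Give each GPU its own shard and an equal, contiguous block of CPU cores."""
    cores_per_gpu = total_cores // num_gpus
    plans = []
    for device_id in range(num_gpus):
        first_core = device_id * cores_per_gpu
        plans.append({
            "device_id": device_id,
            "cpu_cores": list(range(first_core, first_core + cores_per_gpu)),
            "samples": shard_indices(num_samples, num_gpus, device_id),
        })
    return plans


plans = plan_pipelines(num_samples=10, num_gpus=2, total_cores=8)
print(plans[1])
# {'device_id': 1, 'cpu_cores': [4, 5, 6, 7], 'samples': [1, 3, 5, 7, 9]}
```

Every sample lands in exactly one shard, so the pipelines can run independently with no coordination during an epoch.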
rocAL’s Impact on Performance (MLPerf ResNet-50 Training)
[Figure: ResNet-50 MLPerf training throughput in images per second (AMD EPYC™ server with MI200, batch_size = 128) for native PyTorch, rocAL-cpu, and rocAL-gpu; series: Preprocessing, Training, Total.]
rocAL’s Core: AMD RPP Library
Radeon Performance Primitives (RPP) library: a hand-optimized, high-performance library under AMD’s ROCm stack.
RPP Advantage
[Figure: RPP performance compared to native processing for batch_size=128 on MI200, in images per millisecond by augmentation type (Resize, CropMirrorNormalize, ColorTwist, Rotate); series: Native, Host, HIP. Higher is better.]
Example: SSD Object Detection Training Augmentations
• Augmentation steps: SSDRandomCrop, Resize, ColorTwist, RandomMirror
[Figure panels: image with bboxes, good crop, bad crop, color twisted, resized, randomly flipped.]
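Two of these steps also have to transform the bounding boxes, not just the pixels. The sketch below is a plain-Python illustration, assuming normalized (left, top, right, bottom) boxes and assuming a crop is judged by its intersection-over-union (IoU) with a box. The helper names and the idea of comparing the IoU against a single threshold are illustrative choices, not rocAL's implementation.

```python
def flip_bboxes_horizontal(bboxes):
    """Mirror normalized (left, top, right, bottom) boxes about the vertical axis."""
    return [(1.0 - r, t, 1.0 - l, b) for (l, t, r, b) in bboxes]


def iou(a, b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def is_good_crop(crop, bbox, threshold):
    """Assumed rule: accept a crop only if it overlaps the box enough."""
    return iou(crop, bbox) >= threshold


box = (0.25, 0.5, 0.5, 0.75)
print(flip_bboxes_horizontal([box]))                            # [(0.5, 0.5, 0.75, 0.75)]
print(iou((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5)))          # 0.25
print(is_good_crop((0.0, 0.0, 0.1, 0.1), box, threshold=0.1))   # False
```

Flipping swaps and mirrors the left and right edges, so the image and its boxes stay aligned only when both are flipped with the same random decision.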
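MLPerf SSD Training With rocAL

# Define
from amd.rocal.pipeline import Pipeline
import amd.rocal.fn as fn
import amd.rocal.types as types

# Not defined on the slide and assumed to be set up beforehand: crop (output size),
# flip_coin, contrast_rand, brightness_rand, hue_rand (random-parameter nodes),
# rali_device (target device), and the import for rocALGenericIterator.
def COCOPipeline(batch_size, num_threads, local_rank, world_size, device_id, data_dir, ann_dir):
    pipe = Pipeline(batch_size, num_threads, device_id=device_id)
    with pipe:
        jpegs, bboxes, labels = fn.readers.coco(path=data_dir, random_shuffle=True)
        # Choose a random crop region constrained by the bounding boxes
        crop_begin, crop_size, bboxes, labels = fn.random_bbox_crop(
            bboxes, labels, device="cpu",
            aspect_ratio=[0.5, 2.0],
            thresholds=[0, 0.1, 0.3, 0.5, 0.7, 0.9])
        # Decode only the selected region
        images_decoded = fn.decoders.image_slice(jpegs, crop_begin, crop_size, device="cpu", type=types.RGB)
        res_images = fn.resize(images_decoded, device="gpu", resize_x=crop, resize_y=crop)
        # Keyword names here are assumed; the slide passes these three values positionally after device=
        cl_twist_images = fn.color_twist(res_images, device="gpu",
                                         contrast=contrast_rand, brightness=brightness_rand, hue=hue_rand)
        # Flip boxes and images with the same random decision
        bboxes = fn.bb_flip(bboxes, ltrb=True, horizontal=flip_coin)
        images = fn.crop_mirror_normalize(cl_twist_images, device="gpu",
                                          crop=(crop, crop),
                                          mirror=flip_coin,
                                          mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                          std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
        bboxes, labels = fn.box_encoder(bboxes, labels, device=rali_device)
        pipe.set_outputs(images, bboxes, labels)
    return pipe

# Build
pipe = COCOPipeline(batch_size, num_threads, local_rank, world_size, device_id, data_dir, ann_dir)
pipe.build()
train_loader = rocALGenericIterator(pipe)

# Run
for i, data in enumerate(train_loader):
    images, bboxes, labels = data
    # do model training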
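The mean and std values passed to crop_mirror_normalize are the usual ImageNet channel statistics scaled to the 0..255 pixel range. The arithmetic of the mirror and normalize steps can be sketched in plain Python; this is an illustration of the math only, assuming each output value is (pixel - mean) / std per channel, and `mirror_normalize` is a hypothetical helper that skips the crop step, not the rocAL operator.

```python
MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]  # per-channel mean, 0..255 scale
STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]   # per-channel standard deviation


def mirror_normalize(image, mirror, mean=MEAN, std=STD):
    """image: list of rows; each row is a list of (R, G, B) pixels with values in 0..255."""
    out = []
    for row in image:
        if mirror:
            row = row[::-1]  # horizontal flip: reverse the pixels in each row
        out.append([
            tuple((v - m) / s for v, m, s in zip(px, mean, std))
            for px in row
        ])
    return out


image = [[tuple(MEAN), (255, 255, 255)]]  # one row, two pixels
result = mirror_normalize(image, mirror=True)
print(result[0][1])               # (0.0, 0.0, 0.0): the mean pixel maps to zero
print(round(result[0][0][0], 3))  # 2.249: white red channel after normalization
```

After this step a pixel equal to the dataset mean maps to zero in every channel, which is the input range most ImageNet-pretrained backbones expect.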
rocAL Advantage in MLPerf SSD Training
[Figure: SSD training profiling comparison. Time in milliseconds by system tested (MI100-native, MI100-rocAL-GPU, MI200-native, MI200-rocAL-GPU); series: DataLoading, Training, Total.]
Overall time is significantly reduced.
rocAL Use Case In Inference: Inference Server
[Diagram: a client connected to an inference server node.]
• Client: 1. Choose model & parameters; 2. Choose dataset; 3. View results
• Setup phase: the client sends the model and parameters; the server initializes inference (compile, build inference graph, set up hardware) and reports status
• Inference execution: images from the image database go through rocAL (decode, resize, normalize) on CPU cores, then to inference on the GPUs; results are returned to the client
• Up to 8 GPUs on a single server node
Different Stages Of Inference Pipeline
[Figure: "Inference pipeline stages, potential FPS". Overall FPS versus number of CPU or GPU cores; stage groups: File Read (HDD, SSD, NVMe), JPEG Decode, Convert to Tensor, GPU Inference; core counts shown: 1, 4, 16, 32; 1, 4, 16, 32; 1, 2, 4, 8.]
Challenges
• Metadata augmentations and new data types were introduced to support bounding-box and other metadata transformations
• CPU-based decoding hurts performance, even with the TJpeg decoder
  • Hardware decoder using VCN
  • ROI-based decoding
• Memory management is tricky with mixed devices and a variable batch_size
• Image-processing transforms are inconsistent across frameworks: the same transform can produce different outputs
• Video processing needs a new data layout to represent sequences (NFHWC)
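The NFHWC layout adds a frame dimension to the familiar NHWC image layout. The sketch below shows how an element's position in a flat buffer would be computed for such a tensor; it assumes N is the batch size, F the number of frames per sequence, and H, W, C the height, width, and channels, stored row-major. `nfhwc_offset` is a hypothetical helper for illustration.

```python
def nfhwc_offset(shape, n, f, h, w, c):
    """Flat row-major offset of element (n, f, h, w, c) in an NFHWC tensor."""
    N, F, H, W, C = shape
    if not (0 <= n < N and 0 <= f < F and 0 <= h < H and 0 <= w < W and 0 <= c < C):
        raise IndexError("index out of range for shape")
    return (((n * F + f) * H + h) * W + w) * C + c


shape = (2, 3, 4, 5, 3)  # 2 sequences, 3 frames each, 4x5 pixels, 3 channels
print(nfhwc_offset(shape, 0, 1, 0, 0, 0))  # 60: one full frame (4*5*3) past the start
print(nfhwc_offset(shape, 1, 0, 0, 0, 0))  # 180: one full sequence (3*4*5*3) past the start
print(nfhwc_offset(shape, 1, 2, 3, 4, 2))  # 359: the last element
```

Keeping all frames of a sequence contiguous means a whole clip can be handed to an operator as one block, while each frame inside it still looks like an ordinary HWC image.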
Conclusion
• rocAL is AMD's open-source library for accelerated data loading and data augmentation
• It provides full pre-processing pipelines for training or inference
• It integrates easily with the frameworks used for today’s machine learning workloads
• rocAL’s hybrid pipelines enable intelligent load balancing between CPU and GPU
• It is portable across multiple frameworks with one underlying library
• AMD's open-source RPP library provides the backbone for rocAL
References
rocAL: https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/tree/master/rocAL
MIVisionX: https://gpuopen-professionalcompute-libraries.github.io/MIVisionX/
RPP: https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX
AMD ROCm: https://rocmdocs.amd.com/en/latest/
Disclaimer
• The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and
typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but
not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product
differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks
of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or
revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content
hereof without obligation of AMD to notify any person of such revisions or changes.
• THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS
HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY
PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF
THE POSSIBILITY OF SUCH DAMAGES.
• © 2022 Advanced Micro Devices, Inc. All rights reserved.
• AMD, the AMD Arrow logo, EPYC, Radeon, MI100, MI200, rocAL, RPP, MIVisionX, ROCm and combinations thereof are trademarks of Advanced
Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their
respective companies.
© 2022 Advanced Micro Devices
“Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL,” a Presentation from AMD

Contenu connexe

Similaire à “Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL,” a Presentation from AMD

Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
Roberto Brandao
 
High Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and FutureHigh Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and Future
karl.barnes
 
19564926 graphics-processing-unit
19564926 graphics-processing-unit19564926 graphics-processing-unit
19564926 graphics-processing-unit
Dayakar Siddula
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
Edge AI and Vision Alliance
 

Similaire à “Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL,” a Presentation from AMD (20)

HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
 
Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)
Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)
Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)
 
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU ServerModular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
 
Intel xeon e5v3 y sdi
Intel xeon e5v3 y sdiIntel xeon e5v3 y sdi
Intel xeon e5v3 y sdi
 
High Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and FutureHigh Performance Computing Infrastructure: Past, Present, and Future
High Performance Computing Infrastructure: Past, Present, and Future
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Architecting the Cloud Infrastructure for the Future with Intel
Architecting the Cloud Infrastructure for the Future with IntelArchitecting the Cloud Infrastructure for the Future with Intel
Architecting the Cloud Infrastructure for the Future with Intel
 
19564926 graphics-processing-unit
19564926 graphics-processing-unit19564926 graphics-processing-unit
19564926 graphics-processing-unit
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
 
BUD17 Socionext SC2A11 ARM Server SoC
BUD17 Socionext SC2A11 ARM Server SoCBUD17 Socionext SC2A11 ARM Server SoC
BUD17 Socionext SC2A11 ARM Server SoC
 
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
Inter connect2016 yps-2749_02232016_aspresented
Inter connect2016 yps-2749_02232016_aspresentedInter connect2016 yps-2749_02232016_aspresented
Inter connect2016 yps-2749_02232016_aspresented
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 

Plus de Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 

Plus de Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Dernier (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

“Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL,” a Presentation from AMD

  • 1. Is Your AI Data Pre-processing Fast Enough? Speed It Up Using rocAL™
      Rajy Rawther, PMTS Software Architect, Advanced Micro Devices, Inc.
  • 2. Agenda
      • Introduction: Why do we need rocAL?
      • rocAL pipeline and architecture
      • Operators for data loading and augmentations
      • Flexible pipelines: scalability across multiple devices
      • How to use rocAL?
      • Deep dive into MLPerf object detection example with rocAL
      • rocAL performance advantages
      • rocAL use case in inference
      • Conclusion
      © 2022 Advanced Micro Devices
  • 3. The Problem
      • Many data formats: FileList, TFRecord, RecordIO, LMDB
      • Lots of frameworks
      • What processor to choose? GPU, CPU, custom, FPGA
      • Each framework has its own data processing pipeline, and each needs to be optimized individually
      • Needs a unified library
  • 4. Feeding the Beast: How to Fully Utilize GPUs?
      • GPU performance increases >2x every new generation
      • Native pipelines mostly use CPU cores for pre-processing data before training
      • As training gets faster with GPUs, the pre-processing needs to catch up
      • EPYC™ + MI100: 64 CPU cores, 8 GPUs, 8 CPU cores/GPU
      • EPYC™ + MI200: 64 cores, 8 GPUs (2x perf), 8 CPU cores/GPU, >300 TFLOPs (FP16)
      • A falling CPU-to-GPU performance ratio throttles the overall speed
      • What is needed: a high-performance pre-processing library which can load balance between CPU and GPU
      [Chart: GPU vs CPU performance comparison. x-axis: #CPU/GPU cores; y-axis: images per sec (0 to 3000); series: Training, Preprocessing, Overall]
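The throttling effect described on this slide can be sketched in a few lines. This is a minimal illustration, assuming pre-processing and training overlap so that the slower stage bounds the overall rate; the function names and the per-core rate of 150 images/sec are hypothetical and are not figures from the presentation.

```python
import math


def overall_throughput(preprocess_ips: float, train_ips: float) -> float:
    """Overall images/sec when both stages overlap: the slower stage is the limit."""
    return min(preprocess_ips, train_ips)


def cpu_cores_needed(train_ips: float, per_core_ips: float) -> int:
    """CPU cores required for pre-processing to keep up with one GPU."""
    return math.ceil(train_ips / per_core_ips)


# Hypothetical numbers: one core pre-processes 150 images/sec,
# and each GPU has 8 CPU cores available (the ratio quoted on the slide).
cores_per_gpu = 8
per_core_ips = 150.0
for train_ips in (1000.0, 2000.0):  # GPU training rate doubling across generations
    preprocess_ips = cores_per_gpu * per_core_ips
    print(train_ips,
          overall_throughput(preprocess_ips, train_ips),
          cpu_cores_needed(train_ips, per_core_ips))
```

With these made-up rates, the slower GPU is fully fed by its 8 cores, while the faster one is capped by pre-processing and would need 14 cores to stay busy.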
  • 5. Introducing ROCm Augmentation Library (rocAL™)
      Training dataset → CPU/GPU decode → Augmentations (CPU/GPU based) → Ready to train/infer data → Training or Inference
  • 6. Key Features of rocAL
      • CPU- and GPU-based implementations for each operator
      • Python and C++ APIs for easy integration and testing
      • Flexible graphs to create custom pipelines utilizing CPU cores or GPU
      • Supports many new augmentations like fish-eye, non-linear blend, water, RICAP, etc.
      • Support for many workloads: classification, object detection, pose estimation and segmentation
      • Seamless interoperability with frameworks using rocAL framework plugins
      • Optimized to give maximum performance on AMD EPYC™ CPUs and AMD Instinct™ GPUs
  • 7. What is a rocAL Pipeline?
      A pipeline is a graph of data flow connected by node operators.
      • Loader: Image Loader, TFRecord Loader, LMDB Loader, Video Loader, Audio Loader, …
      • Decoder: JPEG Decoder, JPEG Cropped Decoder, Video Decoder, Audio Decoder, …
      • Augmentations: Resize, Crop, ColorTwist, Brightness, Contrast, …
      • Training: PyTorch, TensorFlow, MXNet, Caffe, Caffe2, …
      • Data carried through the graph: JPEGs, plus meta-data (labels, bboxes) with its own augmentations
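The idea of a pipeline as a graph of node operators can be illustrated without rocAL itself. The sketch below is a toy model in plain Python, not rocAL's implementation: the `Node` class and the string-based stand-in operators are hypothetical and exist only to show how a loader node can feed both an image branch and a metadata branch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Node:
    """One operator in the graph; `inputs` are the upstream nodes it consumes."""
    name: str
    op: Callable[..., Any]
    inputs: List["Node"] = field(default_factory=list)

    def run(self, cache: Optional[Dict[int, Any]] = None) -> Any:
        # Evaluate upstream nodes first; cache results so a shared node runs once per call.
        cache = {} if cache is None else cache
        if id(self) not in cache:
            args = [node.run(cache) for node in self.inputs]
            cache[id(self)] = self.op(*args)
        return cache[id(self)]


# Stand-in operators that work on strings instead of real images.
loader = Node("loader", lambda: ["cat.jpg", "dog.jpg"])
decoder = Node("decoder", lambda files: [f"decoded({f})" for f in files], [loader])
resize = Node("resize", lambda imgs: [f"resized({i})" for i in imgs], [decoder])
labels = Node("labels", lambda files: [f.split(".")[0] for f in files], [loader])

images_out, labels_out = resize.run(), labels.run()
```

Running the output nodes pulls data through every upstream node, which mirrors the Loader, Decoder, Augmentations chain listed on the slide.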
  • 8. rocAL Pipeline Dataflow
      Dataset → rocAL pipeline (augmentation nodes, Python & C APIs, executed on a multi-core host or GPU) → Ready to train data
  • 9. rocAL Architecture
      • Loader module: reader, internal thread, decoders, input prefetch buffer, metadata reader
      • Processing module: rocAL pipeline of augmentation nodes (ColorTwist, Brightness, Contrast, Flip, Rain, Resize)
      • Output queue and batch count
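The loader module's combination of an internal thread and a prefetch buffer can be sketched with the Python standard library. This is an illustration of the general prefetching pattern, not rocAL's code; `prefetching_loader`, `read_batch` and `depth` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Iterator


def prefetching_loader(read_batch: Callable[[int], Any],
                       num_batches: int,
                       depth: int = 2) -> Iterator[Any]:
    """Yield batches in order while a background thread loads ahead of the consumer."""
    buffer: "queue.Queue[Any]" = queue.Queue(maxsize=depth)  # bounded prefetch buffer
    done = object()  # sentinel marking the end of the dataset

    def worker() -> None:
        for index in range(num_batches):
            buffer.put(read_batch(index))  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()  # blocks until the loader thread has produced a batch
        if item is done:
            return
        yield item


# Stand-in for reading and decoding: each "batch" is just a list of sample ids.
batches = list(prefetching_loader(lambda i: [4 * i + k for k in range(4)], num_batches=3))
```

The bounded queue keeps the loader at most `depth` batches ahead, so memory use stays fixed while the consumer rarely has to wait.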
  • 10. rocAL Operators
      • Reader: File Reader, COCO Reader, TF Reader, RecordIO Reader, LMDB Reader
      • Loader: ImageLoader, ImageLoaderSharded, Video Loader, Audio Loader, SequenceLoader
      • Decoder: Tjpeg Decoder, Cropped Decoder, HW Decoder, Video Decoder, Audio Decoder
      • ImageAug: Resize, Crop, ColorTwist, Normalize, Rotate, Flip
  • 11. rocAL Advantage
      • One unified library that integrates with all the frameworks
      • Optimized augmentation operations shared among all of them
      • Flexible enough to support different data formats (file folder reading, LMDB, TFRecord, RecordIO)
      • Portable between frameworks
      • Devices shown: CPU, GPU, DLA, APU
  • 12. Using rocAL
      rocAL pipelines can be accessed using three simple steps (Define, Build, and Run):

          # Define
          # file_dir, decoder_device, device, resize_w and resize_h are set by the
          # caller; their definitions are not shown on the slide.
          from amd.rocal.pipeline import pipeline_def
          import amd.rocal.fn as fn

          @pipeline_def
          def example_pipeline():
              jpegs, labels = fn.readers.file(file_root=file_dir)
              images = fn.decoders.image(jpegs, device=decoder_device)
              resized_images = fn.resize(images, device, resize_w, resize_h)
              return resized_images, labels

          # Build
          pipe = example_pipeline(batch_size=8, num_threads=32, device_id=0)
          pipe.build()

          # Run
          images, labels = pipe.run()
  • 13. Sample Output From example_pipeline
      Each sample is decoded and resized to 224x224.
      • Image path: File Reader (JPEGs) → Decode → Resize → output tensor
      • Metadata path: Metadata Reader (labels) → metadata augmentations → labels
  • 14. Flexible Pipelines: Scalability Across Devices
      • The input dataset is split into shards: Shard 0, Shard 1, Shard 2, …, Shard n
      • Each shard feeds its own pipeline: rocAL Pipeline 0, rocAL Pipeline 1, rocAL Pipeline 2, …, rocAL Pipeline n
      • Each pipeline has its own allocated CPU cores and GPU
      • Each pipeline is configured with a GPU device_id and CPU core bindings
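One way to picture the sharding on this slide is to compute, for each pipeline, which samples and which CPU cores it owns. The sketch below assumes contiguous shards and an even split of cores; both helper functions are hypothetical and do not reflect how rocAL assigns shards or binds cores internally.

```python
from typing import List


def shard_indices(num_samples: int, num_shards: int, shard_id: int) -> range:
    """Contiguous range of sample indices owned by `shard_id`."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    begin = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return range(begin, end)


def cores_for_shard(total_cores: int, num_shards: int, shard_id: int) -> List[int]:
    """CPU core ids bound to the pipeline that processes `shard_id` (even split assumed)."""
    per_shard = total_cores // num_shards
    return list(range(shard_id * per_shard, (shard_id + 1) * per_shard))


# 64 CPU cores and 8 GPUs, as in the server configuration quoted on slide 4.
for device_id in range(8):
    samples = shard_indices(num_samples=1000, num_shards=8, shard_id=device_id)
    cores = cores_for_shard(total_cores=64, num_shards=8, shard_id=device_id)
    print(device_id, samples.start, samples.stop, cores[0], cores[-1])
```

Integer arithmetic on the shard boundaries ensures every sample lands in exactly one shard even when the dataset size is not a multiple of the shard count.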
  • 15. rocAL’s Impact In Performance (MLPerf ResNet-50 Training)
      [Chart: ResNet-50 MLPerf training throughput on an AMD EPYC™ server with MI200, batch_size = 128. x-axis: Native PyTorch, rocAL-cpu, rocAL-gpu; y-axis: images per sec (0 to 25000); series: Preprocessing, Training, Total]
  • 16. rocAL’s Core: AMD RPP Library
      Radeon Performance Primitives (RPP) library: a hand-optimized, high-performance library under AMD’s ROCm stack
  • 17. RPP Advantage
      [Chart: RPP performance compared to native processing for batch_size = 128 on MI200. x-axis: augmentation type (Resize, CropMirrorNormalize, ColorTwist, Rotate); y-axis: images per millisec (0 to 3500); series: Native, Host, HIP. Higher is better.]
  • 18. Example: SSD Object Detection Training Augmentations
      Image with bboxes → SSDRandomCrop (examples of a good crop and a bad crop) → Resize (resized) → ColorTwist (color twisted) → RandomMirror (randomly flipped)
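Mirroring an image also requires mirroring its bounding boxes, which is why the augmentation chain carries metadata alongside pixels. The following sketch assumes boxes are stored as [left, top, right, bottom] with coordinates normalized to the range 0 to 1; the function name is hypothetical and this is not rocAL's implementation.

```python
from typing import List

Box = List[float]  # [left, top, right, bottom], normalized to [0, 1]


def flip_boxes_horizontal(boxes: List[Box]) -> List[Box]:
    """Mirror each box about the vertical centre line of the image."""
    # The old right edge becomes the new left edge, and vice versa.
    return [[1.0 - right, top, 1.0 - left, bottom]
            for left, top, right, bottom in boxes]


boxes = [[0.0, 0.25, 0.5, 0.75]]          # a box covering the left half of the image
flipped = flip_boxes_horizontal(boxes)     # now covers the right half
restored = flip_boxes_horizontal(flipped)  # flipping twice returns the original
```

Swapping the two edges keeps left smaller than right after the flip, so downstream box encoding still sees well-formed boxes.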
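  • 19. MLPerf SSD Training With rocAL

          # Define
          # Imports are not shown on the slide; the module paths follow slide 12 and are
          # otherwise assumed. The import for rocALGenericIterator is also not shown.
          from amd.rocal.pipeline import Pipeline
          import amd.rocal.fn as fn
          import amd.rocal.types as types

          # crop, flip_coin, contrast_rand, brightness_rand, hue_rand and rali_device
          # are defined elsewhere; their definitions are not shown on the slide.
          def COCOPipeline(batch_size, num_threads, local_rank, world_size, device_id, data_dir, ann_dir):
              pipe = Pipeline(batch_size, num_threads, device_id=device_id)
              with pipe:
                  jpegs, bboxes, labels = fn.readers.coco(path=data_dir, random_shuffle=True)
                  crop_begin, crop_size, bboxes, labels = fn.random_bbox_crop(bboxes, labels, device="cpu", aspect_ratio=[0.5, 2.0], thresholds=[0, 0.1, 0.3, 0.5, 0.7, 0.9])
                  images_decoded = fn.decoders.image_slice(jpegs, crop_begin, crop_size, device="cpu", type=types.RGB)
                  res_images = fn.resize(images_decoded, device="gpu", resize_x=crop, resize_y=crop)
                  # The slide passes these three values positionally after device="gpu",
                  # which is not valid Python; the keyword names used here are assumed.
                  cl_twist_images = fn.color_twist(res_images, device="gpu", contrast=contrast_rand, brightness=brightness_rand, hue=hue_rand)
                  bboxes = fn.bb_flip(bboxes, ltrb=True, horizontal=flip_coin)
                  images = fn.crop_mirror_normalize(cl_twist_images, device="gpu", crop=(crop, crop), mirror=flip_coin,
                                                    mean=[0.485*255, 0.456*255, 0.406*255],
                                                    std=[0.229*255, 0.224*255, 0.225*255])
                  bboxes, labels = fn.box_encoder(bboxes, labels, device=rali_device)
                  pipe.set_outputs(images, bboxes, labels)
              return pipe

          # Build
          pipe = COCOPipeline(batch_size, num_threads, local_rank, world_size, device_id, data_dir, ann_dir)
          pipe.build()

          # Run
          train_loader = rocALGenericIterator(pipe)
          for i, data in enumerate(train_loader):
              images, bboxes, labels = data
              # do model training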
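The `mean` and `std` arguments in this pipeline are multiplied by 255 because the decoded pixels are 8-bit values rather than values in the range 0 to 1. The sketch below checks that equivalence for a single pixel in plain Python; the function names are hypothetical, and the constants are the ones that appear in the slide's code.

```python
from typing import List, Sequence

MEAN = [0.485, 0.456, 0.406]  # per-channel mean, expressed for pixels in [0, 1]
STD = [0.229, 0.224, 0.225]   # per-channel standard deviation, same convention


def normalize_8bit(rgb: Sequence[float]) -> List[float]:
    """Normalize a 0..255 pixel with mean/std scaled by 255, as in the slide's code."""
    return [(v - m * 255) / (s * 255) for v, m, s in zip(rgb, MEAN, STD)]


def normalize_unit(rgb: Sequence[float]) -> List[float]:
    """Scale the pixel to 0..1 first, then apply the unscaled mean/std."""
    return [(v / 255 - m) / s for v, m, s in zip(rgb, MEAN, STD)]


pixel = [200, 100, 50]
scaled = normalize_8bit(pixel)
unit = normalize_unit(pixel)
max_difference = max(abs(a - b) for a, b in zip(scaled, unit))
```

Both forms give the same result up to floating-point rounding, so scaling the constants once avoids a separate division of every pixel by 255.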
  • 20. rocAL Advantage in MLPerf SSD Training
      Overall time is significantly reduced.
      [Chart: SSD training profiling comparison. x-axis: system tested (MI100-native, MI100-rocAL-GPU, MI200-native, MI200-rocAL-GPU); y-axis: time in millisec (0 to 1000); series: DataLoading, Training, Total]
  • 21. rocAL Use Case In Inference: Inference Server
      • Client: 1. Choose model & parameters 2. Choose dataset 3. View results
      • Exchanged between client and server: model and parameters, images, status, results
      • Setup phase: initialize inference (compile, build inference graph, set up hardware)
      • Inference execution on the inference server node: image database, rocAL on CPU cores (decode, resize, normalize), inference on GPUs, results
      • Up to 8 GPUs on a single server node
  • 22. Different Stages Of Inference Pipeline
      [Chart: Inference pipeline stages, potential FPS. x-axis: number of CPU or GPU cores (tick groups 1/4/16/32, 1/4/16/32 and 1/2/4/8); y-axis: overall FPS (0 to 45000); stages: File Read (HDD, SSD, NVMe), JPEG Decode, Convert to Tensor, GPU Inference]
  • 23. Challenges
      • Metadata augmentations and new data types are introduced to help with bounding-box and other metadata augmentations
      • CPU-based decoding has a hit on performance even with the TJpeg decoder
          • Hardware decoder using VCN
          • ROI-based decoding
      • Memory management is tricky when we use mixed devices and variable batch_size
      • Discrepancies in image-processing transforms across different frameworks: the same transform can produce different outputs
      • Video processing needs a new data layout to represent sequences (NFHWC)
  • 24. Conclusion
      • rocAL is the AMD open-source accelerated data augmentation and data loading library
      • It provides full pre-processing pipelines to be used for training or inference
      • It has easy framework integration for today’s machine learning workloads
      • rocAL’s hybrid pipelines enable intelligent load balancing between CPU and GPU
      • It is portable across multiple frameworks with one underlying library
      • The AMD open-source RPP library provides the backbone for rocAL
  • 26. Disclaimer
      • The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      • THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      • © 2022 Advanced Micro Devices, Inc. All rights reserved.
      • AMD, the AMD Arrow logo, EPYC, Radeon, MI100, MI200, rocAL, RPP, MIVisionX, ROCm and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.