SlideShare une entreprise Scribd logo
1  sur  8
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
TensorFlow Study (Part II)
for GPU part
劉得彥
Danny Liu
資訊與通訊研究所 ICL
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
GPU Options
2
• We can change the GPU options as follows:
• message GPUOptions
▪ double per_process_gpu_memory_fraction = 1;
▪ string allocator_type = 2;
a. "BFC": A "Best-fit with coalescing" algorithm
▪ int64 deferred_deletion_bytes = 3;
a. Delay deletion of up to this many bytes to reduce the number of interactions with gpu driver code.
▪ bool allow_growth = 4;
▪ string visible_device_list = 5;
a. For instance:
» import os
» os.environ[“CUDA_VISIBLE_DEVICES”] = ‘0, 1’
▪ int32 polling_active_delay_usecs = 6;
a. In the event polling loop sleep this many microseconds between PollEvents calls, when the queue is
not empty.
▪ int32 polling_inactive_delay_msecs = 7;
a. In the event polling loop sleep this many millisconds between PollEvents calls, when the queue is
empty.
▪ bool force_gpu_compatible = 8;
a. Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow, enabling this option forces all
CPU tensors to be allocated with Cuda pinned memory.
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
BFC (best-fit with coalescing)
3
• Chunks point to memory.
▪ Their prev/next pointers form a doubly-linked list of addresses sorted by base
address that must be contiguous.
▪ Chunks contain information about whether they are in use or whether they are
free, and contain a pointer to the bin they are in.
GPU memory
AllocationRegion
size
requested_size
allocation_id
prev
next
bin_num
In_use
size
requested_size
allocation_id
prev
next
bin_num
In_use
size
requested_size
allocation_id
prev
next
bin_num
In_use
chunkhandle
chunkhandle
Order by
size
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
BFC (best-fit with coalescing)
4
• The BFC Concept:
▪ Operations for bin: Search, Insert, and Delete
http://blog.csdn.net/qq_33096883/article/details/77479647
Size:
256 * 2^0 =
256Bytes
bin 0
Size:
256 * 2^1
bin 1
Size:
256 * 2^2
bin 2
… Size:
256 * 2^20
= 256MB
bin 20
Chunk 0
Chunk 1
RegionManager ( manage all regions)
chunks_
free_chunks_list_
regions_
GPU memory
AllocationRegion
handle_[]
AllocationRegion
handle[]
4
Create one large chunk for the whole memory space
that will be chunked later
1
BFC tries to allocate
memory and cannot
find chunk in bins.
Do extend()
2
If curr_region_allocation_bytes_< the
allocation size, multiplying by a power of two until
that is sufficient.
3 Use a factor = 0.9 to reduce the allocation if it failed.
X 2
Chunk 2
Free memory
management
Search
Insert
X 0.9
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
When output tensor is allocated
5
• A customized operation (Op) wants to
allocate memory for its output tensor
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
Tensor GPU mem allocation
6
• When a operation try to create a output tensor during its calculation, a tensor gpu
memory allocation will happen.
• GPUBFCAllocator is an BFC implementation class.
Tensor A
Allocator*
Buffer*
BFCAllocator::AllocateRaw()
retry_helper_. AllocateRaw()
GPUBFCAllocator::
AllocateInternal()
BFCAllocate::Extend()
DeviceMemory*
stream_exec_->
AllocateArray().opaque()
• Allocate()
• DeviceMemory::MakeFromBy
teSize()
Suballocator_->Alloc()
It’s where the memory
allocation happens because
GPUMemAllocator inherits
SubAllocator.
return
GPUBFCAllocator
StreamExecutor
CUDAExecutor
CUDADriver
- DeviceAllocate()
GPU memory
StreamExecutor
- Allocate()
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
StreamExecutor Runtime Library
7
• A unified wrapper around the CUDA and OpenCL host-side
programming models (runtimes).
• Support cuBlas and cuDNN
(tensorflow/stream_executor/blas.h and dnn.h)
• It lets host code target either CUDA or OpenCL devices with
identically-functioning data-parallel kernels.
Stream Executor
工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE
StreamExecutor Runtime Library
8
• Contrast with OpenMP
▪ OpenMP generates both the kernel code that runs on the
device and the host-side code needed to launch the kernel
▪ StreamExecutor only generates the host-side code.
StreamExecutor
StreamExecutorImpl
StreamExecutorInterface

Contenu connexe

Tendances

Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019
Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019
Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019Unity Technologies
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionEntrepreneur / Startup
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSungjoon Choi
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IWanjin Yu
 
Moving object detection in video surveillance
Moving object detection in video surveillanceMoving object detection in video surveillance
Moving object detection in video surveillanceAshfaqul Haque John
 
HELLO AI WORLD - MEET JETSON NANO
HELLO AI WORLD - MEET JETSON NANOHELLO AI WORLD - MEET JETSON NANO
HELLO AI WORLD - MEET JETSON NANONVIDIA Japan
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detectionchettykulkarni
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023Jinwon Lee
 
Лекция №11 "Основы нейронных сетей"
Лекция №11 "Основы нейронных сетей" Лекция №11 "Основы нейронных сетей"
Лекция №11 "Основы нейронных сетей" Technosphere1
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoderJun Lang
 
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...Virot "Ta" Chiraphadhanakul
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNNNoura Hussein
 
Hierachical z Map Occlusion Culling
Hierachical z Map Occlusion CullingHierachical z Map Occlusion Culling
Hierachical z Map Occlusion CullingYEONG-CHEON YOU
 
Qiskit advocate demo qsvm
Qiskit advocate demo qsvmQiskit advocate demo qsvm
Qiskit advocate demo qsvmYuma Nakamura
 

Tendances (20)

CIFAR-10
CIFAR-10CIFAR-10
CIFAR-10
 
Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019
Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019
Unity asset workflows for Film and Animation pipelines - Unite Copenhagen 2019
 
Object detection
Object detectionObject detection
Object detection
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detection
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Deep learning
Deep learningDeep learning
Deep learning
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet I
 
Moving object detection in video surveillance
Moving object detection in video surveillanceMoving object detection in video surveillance
Moving object detection in video surveillance
 
HELLO AI WORLD - MEET JETSON NANO
HELLO AI WORLD - MEET JETSON NANOHELLO AI WORLD - MEET JETSON NANO
HELLO AI WORLD - MEET JETSON NANO
 
Yolo
YoloYolo
Yolo
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detection
 
Moving object detection
Moving object detectionMoving object detection
Moving object detection
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023
 
Лекция №11 "Основы нейронных сетей"
Лекция №11 "Основы нейронных сетей" Лекция №11 "Основы нейронных сетей"
Лекция №11 "Основы нейронных сетей"
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
 
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNN
 
Hierachical z Map Occlusion Culling
Hierachical z Map Occlusion CullingHierachical z Map Occlusion Culling
Hierachical z Map Occlusion Culling
 
Qiskit advocate demo qsvm
Qiskit advocate demo qsvmQiskit advocate demo qsvm
Qiskit advocate demo qsvm
 

Similaire à TensorFlow Studying Part II for GPU

2022 COSCUP - Let's speed up your PostgreSQL services!.pptx
2022 COSCUP - Let's speed up your PostgreSQL services!.pptx2022 COSCUP - Let's speed up your PostgreSQL services!.pptx
2022 COSCUP - Let's speed up your PostgreSQL services!.pptxJosé Lin
 
causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011bzanchet
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Caffe studying 2017
Caffe studying 2017Caffe studying 2017
Caffe studying 2017Te-Yen Liu
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFBrendan Gregg
 
ELC-E Linux Awareness
ELC-E Linux AwarenessELC-E Linux Awareness
ELC-E Linux AwarenessPeter Griffin
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLinaro
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness Peter Griffin
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Jian-Hong Pan
 
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCsw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCanSecWest
 
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuAn Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuDatabricks
 
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...Anne Nicolas
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFBrendan Gregg
 
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1Yukio Saito
 
HKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case studyHKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case studyLinaro
 
Virtualization Assessment Example
Virtualization Assessment ExampleVirtualization Assessment Example
Virtualization Assessment Examplekoesteruk22
 
PV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream QemuPV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream QemuThe Linux Foundation
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time OptimizationKan-Ru Chen
 

Similaire à TensorFlow Studying Part II for GPU (20)

2022 COSCUP - Let's speed up your PostgreSQL services!.pptx
2022 COSCUP - Let's speed up your PostgreSQL services!.pptx2022 COSCUP - Let's speed up your PostgreSQL services!.pptx
2022 COSCUP - Let's speed up your PostgreSQL services!.pptx
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Caffe studying 2017
Caffe studying 2017Caffe studying 2017
Caffe studying 2017
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 
ELC-E Linux Awareness
ELC-E Linux AwarenessELC-E Linux Awareness
ELC-E Linux Awareness
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel Awareness
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacketCsw2016 wheeler barksdale-gruskovnjak-execute_mypacket
Csw2016 wheeler barksdale-gruskovnjak-execute_mypacket
 
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuAn Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
 
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
Kernel Recipes 2017 - Performance analysis Superpowers with Linux BPF - Brend...
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
 
HKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case studyHKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case study
 
Virtualization Assessment Example
Virtualization Assessment ExampleVirtualization Assessment Example
Virtualization Assessment Example
 
PV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream QemuPV-Drivers for SeaBIOS using Upstream Qemu
PV-Drivers for SeaBIOS using Upstream Qemu
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time Optimization
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

TensorFlow Studying Part II for GPU

  • 1. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE TensorFlow Study (Part II) for GPU part 劉得彥 Danny Liu 資訊與通訊研究所 ICL
  • 2. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE GPU Options 2 • We can change the GPU options as follows: • message GPUOptions ▪ double per_process_gpu_memory_fraction = 1; ▪ string allocator_type = 2; a. "BFC": A "Best-fit with coalescing" algorithm ▪ int64 deferred_deletion_bytes = 3; a. Delay deletion of up to this many bytes to reduce the number of interactions with gpu driver code. ▪ bool allow_growth = 4; ▪ string visible_device_list = 5; a. For instance: » import os » os.environ[“CUDA_VISIBLE_DEVICES”] = ‘0, 1’ ▪ int32 polling_active_delay_usecs = 6; a. In the event polling loop sleep this many microseconds between PollEvents calls, when the queue is not empty. ▪ int32 polling_inactive_delay_msecs = 7; a. In the event polling loop sleep this many millisconds between PollEvents calls, when the queue is empty. ▪ bool force_gpu_compatible = 8; a. Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow, enabling this option forces all CPU tensors to be allocated with Cuda pinned memory.
  • 3. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE BFC (best-fit with coalescing) 3 • Chunks point to memory. ▪ Their prev/next pointers form a doubly-linked list of addresses sorted by base address that must be contiguous. ▪ Chunks contain information about whether they are in use or whether they are free, and contain a pointer to the bin they are in. GPU memory AllocationRegion size requested_size allocation_id prev next bin_num In_use size requested_size allocation_id prev next bin_num In_use size requested_size allocation_id prev next bin_num In_use chunkhandle chunkhandle Order by size
  • 4. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE BFC (best-fit with coalescing) 4 • The BFC Concept: ▪ Operations for bin: Search, Insert, and Delete http://blog.csdn.net/qq_33096883/article/details/77479647 Size: 256 * 2^0 = 256Bytes bin 0 Size: 256 * 2^1 bin 1 Size: 256 * 2^2 bin 2 … Size: 256 * 2^20 = 256MB bin 20 Chunk 0 Chunk 1 RegionManager ( manage all regions) chunks_ free_chunks_list_ regions_ GPU memory AllocationRegion handle_[] AllocationRegion handle[] 4 Create one large chunk for the whole memory space that will be chunked later 1 BFC tries to allocate memory and cannot find chunk in bins. Do extend() 2 If curr_region_allocation_bytes_< the allocation size, multiplying by a power of two until that is sufficient. 3 Use a factor = 0.9 to reduce the allocation if it failed. X 2 Chunk 2 Free memory management Search Insert X 0.9
  • 5. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE When output tensor is allocated 5 • A customized operation (Op) wants to allocate memory for its output tensor
  • 6. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE Tensor GPU mem allocation 6 • When a operation try to create a output tensor during its calculation, a tensor gpu memory allocation will happen. • GPUBFCAllocator is an BFC implementation class. Tensor A Allocator* Buffer* BFCAllocator::AllocateRaw() retry_helper_. AllocateRaw() GPUBFCAllocator:: AllocateInternal() BFCAllocate::Extend() DeviceMemory* stream_exec_-> AllocateArray().opaque() • Allocate() • DeviceMemory::MakeFromBy teSize() Suballocator_->Alloc() It’s where the memory allocation happens because GPUMemAllocator inherits SubAllocator. return GPUBFCAllocator StreamExecutor CUDAExecutor CUDADriver - DeviceAllocate() GPU memory StreamExecutor - Allocate()
  • 7. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE StreamExecutor Runtime Library 7 • A unified wrapper around the CUDA and OpenCL host-side programming models (runtimes). • Support cuBlas and cuDNN (tensorflow/stream_executor/blas.h and dnn.h) • It lets host code target either CUDA or OpenCL devices with identically-functioning data-parallel kernels. Stream Executor
  • 8. 工業技術研究院機密資料 禁止複製、轉載、外流 ITRI CONFIDENTIAL DOCUMENT DO NOT COPY OR DISTRIBUTE StreamExecutor Runtime Library 8 • Contrast with OpenMP ▪ OpenMP generates both the kernel code that runs on the device and the host-side code needed to launch the kernel ▪ StreamExecutor only generates the host-side code. StreamExecutor StreamExecutorImpl StreamExecutorInterface