A Lock-Free Algorithm of Tree-Based
Reduction for Large Scale Clustering on
GPGPU
National Institute of Informatics, Japan
Ruo Ando
2019 2nd International Conference on Artificial
Intelligence and Pattern Recognition
2019 年第二届人工智能和模式识别国际会议
North China University of Technology (NCUT) / 北方工业大学
August 17th, 2019 11.35-12.15
Slideshare version rev.2019.08.19
Abstract
• Recently, the art of concurrency and parallelism
has advanced rapidly. However, conventional
techniques still suffer from the drawback of lock
contention.
• This talk reports on the current state of massively
parallel computing.
• Against this background, a lock-free technique of
tree-based reduction for large scale clustering on
GPGPU is presented.
• In the experiments, the performance of a native GPU
kernel with atomic instructions, the CUDA Thrust
template library, and the proposed method is
compared and evaluated.
Bottlenecks for massive parallelism
• Lock contention: Threads should spend as little
time inside a critical section as possible to reduce
the amount of time other threads sit idle waiting to
acquire the lock, a state known as “lock
contention”.
• Using a multitude of small separate critical
sections introduces system overheads associated
with acquiring and releasing each separate lock.
In many cases, contention for
locks reduces parallel efficiency
and hurts scalability.
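❑ "The Tao gives birth to One, One gives birth to Two, Two gives birth to Three, and Three gives birth to all things." - Laozi (老子)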
❑ Unreasonable Effectiveness of Data
If a machine learning program cannot work with a training set of
a million examples, the intuitive conclusion is
that it cannot work at all.
However, it has become clear that machine learning using a
huge dataset with a trillion items can be highly effective in
tasks for which machine learning using a sanitized (clean)
dataset with only a million items is NOT useful.
Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta, “Revisiting
Unreasonable Effectiveness of Data in Deep Learning Era”, ICCV 2017
https://arxiv.org/abs/1707.02968
Scalability
Reduction pattern
A reduction combines every element
in a collection into a single element
using an associative combiner
function.
Because the combiner function is
associative, many different
orderings are possible, each with
a different span.
If the combiner function is also
commutative, additional orderings
are possible.
The tree structure relies on
reordering the combiner
operations by associativity.
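The difference in span between orderings can be sketched on the host. The following is a minimal, sequential C++ model, not the talk's GPU code: the names reduce_serial and reduce_tree are hypothetical, and the pairwise passes that a GPU would run in parallel are executed here one after another. It only illustrates that a tree ordering gives the same result as a left-to-right fold for an associative combiner, in about log2(n) passes instead of n - 1 dependent steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial (left-to-right) reduction: n - 1 combiner steps, each depending on
// the previous one. Requires a non-empty input.
template <typename T, typename Op>
T reduce_serial(const std::vector<T>& v, Op op) {
    assert(!v.empty());
    T acc = v[0];
    for (std::size_t i = 1; i < v.size(); ++i) acc = op(acc, v[i]);
    return acc;
}

// Tree-ordered reduction: each pass combines neighbouring pairs and halves
// the number of partial results. The pairs within one pass are independent,
// so parallel hardware could process them at once; here they run in a loop.
// Left-to-right order inside each pair is kept, so only associativity is
// needed, not commutativity. `passes` (optional) receives the tree depth.
template <typename T, typename Op>
T reduce_tree(std::vector<T> v, Op op, int* passes = nullptr) {
    assert(!v.empty());
    int depth = 0;
    while (v.size() > 1) {
        std::vector<T> next;
        next.reserve((v.size() + 1) / 2);
        for (std::size_t i = 0; i + 1 < v.size(); i += 2)
            next.push_back(op(v[i], v[i + 1]));
        if (v.size() % 2 == 1) next.push_back(v.back());  // odd element moves up unchanged
        v.swap(next);
        ++depth;
    }
    if (passes) *passes = depth;
    return v[0];
}
```

For eight integers and addition, both functions return the same sum, and reduce_tree reports three passes where the serial fold needs seven dependent steps.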
"2019/07/02 00:00:00.867","841","25846"
"2019/07/02 03:03:00.511","784","52326"
"2019/07/02 00:00:00.867","700","40000"
"2019/07/02 11:11:37.872","336","50346"
"2019/07/02 00:00:00.867","1541","65846"
Proposed method (2) - large scale clustering
• Fine reduction
- New cluster assignment
- Calculating sums of each cluster
const int fine_shared_memory = 3 * threads * sizeof(float);
fine_reduce<<<blocks, threads, fine_shared_memory>>>
• Coarse reduction
- Calculating centroids (new means)
const int coarse_shared_memory = 2 * k * blocks * sizeof(float);
coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
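The two launch configurations above can be worked through with concrete numbers. The sketch below is host-side C++ arithmetic only, with several assumptions that are not stated on the slide: the struct LaunchPlan and the functions plan_launch and coarse_fits_one_block are hypothetical names, the fine stage is assumed to use one thread per data point, and the 1024-thread block limit is a typical value that real code should query from the device. The factors 3 and 2 are copied from the slide; they presumably correspond to (x sum, y sum, count) and (x, y) per slot.

```cpp
#include <cassert>
#include <cstddef>

// Launch parameters for the two kernels, mirroring the slide's expressions.
struct LaunchPlan {
    int fine_blocks;                  // grid size of fine_reduce
    int fine_threads;                 // block size of fine_reduce
    std::size_t fine_shared_bytes;    // 3 * threads * sizeof(float)
    int coarse_threads;               // k * blocks, launched as a single block
    std::size_t coarse_shared_bytes;  // 2 * k * blocks * sizeof(float)
};

// Assumption: one thread per data point in the fine stage, so the grid size
// is the point count divided by the block size, rounded up.
inline LaunchPlan plan_launch(int num_points, int k, int threads) {
    assert(num_points > 0 && k > 0 && threads > 0);
    LaunchPlan p;
    p.fine_threads = threads;
    p.fine_blocks = (num_points + threads - 1) / threads;
    p.fine_shared_bytes = 3 * static_cast<std::size_t>(threads) * sizeof(float);
    p.coarse_threads = k * p.fine_blocks;
    p.coarse_shared_bytes =
        2 * static_cast<std::size_t>(p.coarse_threads) * sizeof(float);
    return p;
}

// The coarse kernel runs as one block, so k * blocks threads must fit into a
// single block. The default limit of 1024 is an assumption, not a queried value.
inline bool coarse_fits_one_block(const LaunchPlan& p,
                                  int max_threads_per_block = 1024) {
    return p.coarse_threads <= max_threads_per_block;
}
```

Under these assumptions, 100,000 points with k = 5 and 1024 threads per block give 98 fine blocks and a 490-thread coarse block, which fits. With 1,000,000 points the coarse block would need 4885 threads, so a single-block coarse launch would require fewer, larger units of work in the fine stage or an additional reduction level.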
Overview and Grid layout
[Figure: grid layout over axes X and Y. Fine stage: fine_reduce<<<blocks, threads, fine_shared_memory>>>, a grid of "blocks" blocks with "threads" threads each. Coarse stage: coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>, a single block.]
Input and output: fine reduction
fine_reduce<<<blocks, threads, fine_shared_memory>>>
Shared memory layout - Fine reduction
[Figure: shared memory layout of the fine stage over axes X and Y, shown for blockID = 1 to 4 and clusters(5).]
fine_reduce<<<blocks, threads, fine_shared_memory>>>
const int fine_shared_memory = 3 * threads * sizeof(float);
Input and output: coarse reduction
coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
Shared memory layout - coarse reduction
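[Figure: shared memory layout of the coarse stage over axes X and Y, with k clusters per block across "blocks" blocks.]
coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
const int coarse_shared_memory = 2 * k * blocks * sizeof(float);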
Experimental results
① By using the atomicAdd function, the programmer can rewrite the incr kernel.
This instruction atomically adds a value V[i] to the value stored at memory
location M.
__global__ void incr(int *ptr) { atomicAdd(ptr, 1); }  // atomicAdd returns the old value, which is not needed here
② Thrust provides two vector containers, host_vector and device_vector. A
host_vector is stored in host memory, while a device_vector lives in GPU device
memory.
③ Serial reductions, such as the averaging performed during the update step,
scale linearly with the number of points. Parallel reductions, however, can be
implemented efficiently as a two-stage tree reduction: a fine reduction followed by
a coarse reduction. The key point of fine-coarse reduction is that averaging is not
performed over all the data. Instead, for each cluster, only the points assigned to
that cluster are averaged.
thrust::device_vector<float> d_mean_x(h_x.begin(), h_x.begin() + k);
thrust::device_vector<float> d_mean_y(h_y.begin(), h_y.begin() + k);
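The fine and coarse stages described in ③ can be modeled on the host as follows. This is a sequential C++ sketch under stated assumptions, not the kernels from the talk: Partial, nearest, fine_stage and coarse_stage are hypothetical names, the points are 2-D, each simulated block covers `threads` consecutive points, and the layout partials[block * k + cluster] is chosen for illustration. It shows why no lock is needed: every block writes only its own k slots, and the coarse stage reads them after the fine stage has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-block, per-cluster partial result: coordinate sums and point count.
struct Partial {
    float sum_x = 0.0f;
    float sum_y = 0.0f;
    int count = 0;
};

// Index of the nearest centroid by squared Euclidean distance.
inline std::size_t nearest(float x, float y, const std::vector<float>& mx,
                           const std::vector<float>& my) {
    std::size_t best = 0;
    float best_d = (x - mx[0]) * (x - mx[0]) + (y - my[0]) * (y - my[0]);
    for (std::size_t c = 1; c < mx.size(); ++c) {
        const float d = (x - mx[c]) * (x - mx[c]) + (y - my[c]) * (y - my[c]);
        if (d < best_d) { best_d = d; best = c; }
    }
    return best;
}

// Fine stage: assign each point to a cluster and accumulate sums per block.
// Each simulated block touches only partials[block * k .. block * k + k - 1].
inline std::vector<Partial> fine_stage(const std::vector<float>& xs,
                                       const std::vector<float>& ys,
                                       const std::vector<float>& mx,
                                       const std::vector<float>& my,
                                       std::size_t threads) {
    assert(xs.size() == ys.size() && mx.size() == my.size());
    assert(!mx.empty() && threads > 0);
    const std::size_t k = mx.size();
    const std::size_t blocks = (xs.size() + threads - 1) / threads;
    std::vector<Partial> partials(blocks * k);
    for (std::size_t i = 0; i < xs.size(); ++i) {
        Partial& p = partials[(i / threads) * k + nearest(xs[i], ys[i], mx, my)];
        p.sum_x += xs[i];
        p.sum_y += ys[i];
        ++p.count;
    }
    return partials;
}

// Coarse stage: combine the per-block partials of each cluster, then average
// only over the points assigned to that cluster. Empty clusters keep their mean.
inline void coarse_stage(const std::vector<Partial>& partials,
                         std::vector<float>& mx, std::vector<float>& my) {
    const std::size_t k = mx.size();
    assert(k > 0 && my.size() == k && partials.size() % k == 0);
    for (std::size_t c = 0; c < k; ++c) {
        Partial total;
        for (std::size_t i = c; i < partials.size(); i += k) {
            total.sum_x += partials[i].sum_x;
            total.sum_y += partials[i].sum_y;
            total.count += partials[i].count;
        }
        if (total.count > 0) {
            mx[c] = total.sum_x / total.count;
            my[c] = total.sum_y / total.count;
        }
    }
}
```

On a GPU, the inner loops of both stages would themselves be tree reductions in shared memory; the plain loops here only stand in for them.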
Experimental results 1
Conclusion
• Recently, the art of concurrency and parallelism
has advanced rapidly. However, conventional
techniques still suffer from the drawback of lock
contention.
• This talk reported on the current state of massively
parallel computing.
• Against this background, a lock-free technique of
tree-based reduction for large scale clustering on
GPGPU was presented.
• In the experiments, the performance of a native GPU
kernel with atomic instructions, the CUDA Thrust
template library, and the proposed method was
compared and evaluated.

(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 

Dernier (20)

Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 

A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on GPGPU

  • 1. A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on GPGPU. Ruo Ando, National Institute of Informatics, Japan. 2019 2nd International Conference on Artificial Intelligence and Pattern Recognition, North China University of Technology (NCUT), August 17th, 2019, 11:35-12:15. Slideshare version rev.2019.08.19
  • 2. Abstract
      • Recently, the art of concurrency and parallelism has advanced rapidly. However, conventional techniques still suffer from lock contention.
      • This talk reports on the current state of massively parallel computing.
      • Based on this, a lock-free technique of tree-based reduction for large scale clustering on GPGPU is illustrated.
      • In the experiments, the performance of a native GPU kernel with atomic instructions, the CUDA Thrust template library, and the proposed method is compared and evaluated.
  • 3. Bottlenecks for massive parallelism
      • Lock contention: threads should spend as little time inside a critical section as possible, to reduce the amount of time other threads sit idle waiting to acquire the lock, a state known as "lock contention".
      • Using a multitude of small separate critical sections introduces system overheads associated with acquiring and releasing each separate lock.
      • In many cases, contention for locks reduces parallel efficiency and hurts scalability.
  • 4. Scalability
      ❑ "The Tao gives birth to One, One gives birth to Two, Two gives birth to Three, and Three gives birth to all things." - Laozi
      ❑ Unreasonable Effectiveness of Data: If a machine learning program cannot work with a training set of a million examples, the intuitive conclusion is that it cannot work at all. However, it has become clear that machine learning on a huge dataset with a trillion items can be highly effective in tasks for which machine learning on a sanitized (clean) dataset with only a million items is NOT useful.
      Chen Sun, Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta, "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era", ICCV 2017, https://arxiv.org/abs/1707.02968
  • 5. Reduction pattern
      A reduction combines every element in a collection into a single element using an associative combiner function. Because the combiner function is associative, many different orderings are possible, but with different spans. If the combiner function is also commutative, additional orderings are possible. The tree structure depends on reordering the combiner operations by associativity.
      Sample input records:
      "2019/07/02 00:00:00.867","841","25846"
      "2019/07/02 03:03:00.511","784","52326"
      "2019/07/02 00:00:00.867","700","40000"
      "2019/07/02 11:11:37.872","336","50346"
      "2019/07/02 00:00:00.867","1541","65846"
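The reduction pattern described in slide 5 can be sketched on the host in plain C++. This is a minimal illustration, not the author's kernel: the name `tree_reduce` is hypothetical, and the inner loop runs sequentially where a GPU would execute each level's combines in parallel. Operands are always combined in left-to-right order, so the combiner only needs to be associative, not commutative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise (tree-shaped) reduction. At each level, element i is combined with
// element i + stride, so the number of live partial results halves per level.
// The number of levels (the span) is about log2(n) instead of n - 1.
// tree_reduce is a hypothetical name used only for this sketch.
template <typename T, typename Combine>
T tree_reduce(std::vector<T> values, Combine combine) {
    assert(!values.empty());
    for (std::size_t stride = 1; stride < values.size(); stride *= 2) {
        // All combines within one level are independent of each other.
        for (std::size_t i = 0; i + stride < values.size(); i += 2 * stride) {
            values[i] = combine(values[i], values[i + stride]);
        }
    }
    return values[0];
}
```

Calling `tree_reduce` on `{1, 2, 3, 4, 5}` with addition evaluates `((1 + 2) + (3 + 4)) + 5`, which takes three levels rather than four sequential additions.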
  • 6. Proposed method (2) - large scale clustering
      • Fine reduction: new cluster assignment; calculating the sums of each cluster
          const int fine_shared_memory = 3 * threads * sizeof(float);
          fine_reduce<<<blocks, threads, fine_shared_memory>>>;
      • Coarse reduction: calculating centroids (new means)
          const int coarse_shared_memory = 2 * k * blocks * sizeof(float);
          coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
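The two stages in slide 6 can be sketched as a sequential, host-side C++ analogue. The slides give only the launch configurations, so everything here is an assumption made for illustration: the names `Partial`, `fine_stage`, and `coarse_stage`, the 2D points, the squared Euclidean distance, and the contiguous split of points across blocks. The point of the sketch is the data flow: each block writes only to its own per-cluster accumulators, and a single second pass combines them.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-block, per-cluster accumulator (not taken from the slides).
struct Partial {
    float sum_x = 0.0f;
    float sum_y = 0.0f;
    int count = 0;
};

// Fine stage: "block" b owns a contiguous chunk of the points. Each point is
// assigned to its nearest centroid and added to that block's private partial
// sum, so no two blocks ever write to the same accumulator.
std::vector<Partial> fine_stage(const std::vector<float>& xs,
                                const std::vector<float>& ys,
                                const std::vector<float>& mean_x,
                                const std::vector<float>& mean_y,
                                std::size_t blocks) {
    assert(xs.size() == ys.size() && mean_x.size() == mean_y.size());
    assert(blocks > 0 && !mean_x.empty());
    const std::size_t k = mean_x.size();
    const std::size_t chunk = (xs.size() + blocks - 1) / blocks;
    std::vector<Partial> partials(blocks * k);
    for (std::size_t i = 0; i < xs.size(); ++i) {
        std::size_t best = 0;
        float best_dist = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < k; ++c) {
            const float dx = xs[i] - mean_x[c];
            const float dy = ys[i] - mean_y[c];
            const float dist = dx * dx + dy * dy;
            if (dist < best_dist) {
                best_dist = dist;
                best = c;
            }
        }
        Partial& p = partials[(i / chunk) * k + best];
        p.sum_x += xs[i];
        p.sum_y += ys[i];
        p.count += 1;
    }
    return partials;
}

// Coarse stage: for each cluster, add up the per-block partials and divide by
// the point count to get the new centroid. Empty clusters keep their old mean.
void coarse_stage(const std::vector<Partial>& partials, std::size_t blocks,
                  std::vector<float>& mean_x, std::vector<float>& mean_y) {
    const std::size_t k = mean_x.size();
    assert(partials.size() == blocks * k);
    for (std::size_t c = 0; c < k; ++c) {
        Partial total;
        for (std::size_t b = 0; b < blocks; ++b) {
            const Partial& p = partials[b * k + c];
            total.sum_x += p.sum_x;
            total.sum_y += p.sum_y;
            total.count += p.count;
        }
        if (total.count > 0) {
            mean_x[c] = total.sum_x / total.count;
            mean_y[c] = total.sum_y / total.count;
        }
    }
}
```

The coarse stage reads `blocks * k` partial results, which matches the `k * blocks` threads in the `coarse_reduce` launch on the slide. How the author's kernels lay these values out in shared memory is not stated in the transcript.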
  • 7. Overview and grid layout
      [Figure: the X and Y data are split across `blocks` blocks of `threads` threads each for the fine stage, followed by the coarse stage.]
      Fine: fine_reduce<<<blocks, threads, fine_shared_memory>>>
      Coarse: coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
  • 8. Input and output: fine reduction fine_reduce<<<blocks, threads, fine_shared_memory>>>
  • 9. Shared memory layout - fine reduction
      [Figure: shared memory for the X and Y data, laid out per block in groups of `threads` entries across `blocks` blocks.]
      const int fine_shared_memory = 3 * threads * sizeof(float);
      fine_reduce<<<blocks, threads, fine_shared_memory>>>
  • 10. Input and output: coarse reduction
      [Figure: example values for blockID = 1 to blockID = 4 and the resulting clusters(5).]
      coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
  • 11. Shared memory layout - coarse reduction
      [Figure: shared memory for the X and Y data, laid out as k entries for each of `blocks` blocks.]
      const int coarse_shared_memory = 2 * k * blocks * sizeof(float);
      coarse_reduce<<<1, k * blocks, coarse_shared_memory>>>
  • 12. Experimental results
      ① By using the atomicAdd function, the programmer can rewrite the incr kernel. This instruction atomically adds a value to the value stored at memory location M.
          __global__ void incr(int *ptr) { int temp = atomicAdd(ptr, 1); }
      ② Thrust provides two vector containers, host_vector and device_vector. A host_vector is stored in host memory, while a device_vector lives in GPU device memory.
      ③ A serial reduction, such as the averaging performed during the update step, scales linearly with the input size. However, a parallel reduction can be implemented efficiently as a two-stage tree reduction: a fine reduction followed by a coarse reduction. The key point of the fine-coarse reduction is that averaging is not performed over all of the data. Instead, for each cluster, only the points assigned to that cluster are averaged.
          thrust::device_vector<float> d_mean_x(h_x.begin(), h_x.begin() + k);
          thrust::device_vector<float> d_mean_y(h_y.begin(), h_y.begin() + k);
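The contrast between item ① (every thread updates one location atomically) and item ③ (partial results combined afterwards) can be illustrated on the CPU with standard C++ threads. This is an analogy under stated assumptions, not the benchmark from the talk: `std::atomic::fetch_add` stands in for CUDA's `atomicAdd`, and the function names and the strided work split are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Every worker adds into one shared counter, the host-side analogue of all
// GPU threads calling atomicAdd on the same address.
long sum_with_shared_atomic(const std::vector<int>& data, std::size_t workers) {
    assert(workers > 0);
    std::atomic<long> total{0};
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &total, w, workers] {
            for (std::size_t i = w; i < data.size(); i += workers) {
                total.fetch_add(data[i]);  // all workers contend on one location
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return total.load();
}

// Each worker accumulates into its own slot. The slots are combined once
// after all workers finish, so nothing is shared during the main loop.
long sum_with_partials(const std::vector<int>& data, std::size_t workers) {
    assert(workers > 0);
    std::vector<long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, w, workers] {
            long local = 0;
            for (std::size_t i = w; i < data.size(); i += workers) {
                local += data[i];
            }
            partial[w] = local;  // one write per worker, to a private slot
        });
    }
    for (std::thread& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Both functions return the same sum. The second performs one write per worker instead of one atomic update per element, which is the same idea as accumulating per-block partial sums before a final combine.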
  • 14. Conclusion
      • Recently, the art of concurrency and parallelism has advanced rapidly. However, conventional techniques still suffer from lock contention.
      • This talk reported on the current state of massively parallel computing.
      • Based on this, a lock-free technique of tree-based reduction for large scale clustering on GPGPU was illustrated.
      • In the experiments, the performance of a native GPU kernel with atomic instructions, the CUDA Thrust template library, and the proposed method was compared and evaluated.