1. deep learning
Algorithms and Applications
Bernardete Ribeiro, bribeiro@dei.uc.pt
University of Coimbra, Portugal
INIT/AERFAI Summer School on Machine Learning, Benicassim 22-26 June 2015
4. outline
∙ Motivation
∙ Graphics Processing Units (GPUs) Computing
∙ Machine Learning (ML) GPU algorithms
∙ Advantages of Open-Source in the ML field
∙ Open-source GPU ML library (GPUMLib)
∙ Overview of GPUMLib algorithms
∙ Conclusions
5. motivation
∙ The volume of data is increasing at an exponential rate.
[Diagram: diversity of data sources, including low-cost sensors, high-bandwidth networks, robotic systems, high-capacity storage devices, remote sensing, and commodity computing]
7. big data
∙ Nowadays, there are projects that can generate several petabytes of data per day [Hey et al., 2009]:
∙ Australian Square Kilometre Array of radio telescopes
∙ CERN’s Large Hadron Collider
∙ Pan-STARRS array of celestial telescopes
10. data science
∙ Data is an asset, from which useful and valuable information can be extracted.
∙ Science is gradually moving toward being computational and data centric.
∙ Obtaining information represents only a fraction of the time and effort needed to analyze it.
11. challenges
[Diagram: data sources (real data; computer simulation models yielding artificial data) feed persistent repositories of accumulated data]
∙ The challenge: extracting useful and relevant information from large volumes of data that vastly exceed our capacity to analyze them.
12. potential solution
[Diagram: data sources feed persistent repositories of accumulated data, with Machine Learning algorithms bridging the large volumes of data and the extraction of useful and relevant information]
16. computational resources
∙ Machine Learning (ML) algorithms are computationally expensive.
∙ Their computational requirements are usually proportional to the amount of data being processed.
∙ ML algorithms often demand prohibitive computational resources.
20. advanced computing
∙ Problems are becoming increasingly challenging and demanding (in some cases intractable by traditional CPU architectures).
∙ Toolkits supporting ML software development fail to meet expectations in terms of computational performance.
∙ The scientific breakthroughs of the future will undoubtedly be powered by advanced computing capabilities that will allow researchers to manipulate and explore massive datasets [Hey et al., 2009].
∙ Pressure to shift development toward high-throughput parallel architectures (crucial for real-world applications).
22. graphics processing units (gpus)
∙ Highly parallel and programmable devices that can be used for general-purpose computing applications [Owens et al., 2008].
[Chart: peak GFLOPS per year, 2001 to 2009, for AMD GPUs, NVIDIA GPUs, and Intel CPUs (dual-core and quad-core)]
23. gpus strengths
∙ Remarkable performance gains (compared to CPUs).
∙ Relatively inexpensive (they serve the large gaming industry).
∙ Availability.
∙ Scalability.
24. gpu vs cpu performance
Disparity between the GPU and CPU peak floating-point performance:
∙ GPU performance doubles every 12 months, while CPU performance doubles every 18 months [Zhongwen et al., 2005].
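The compounding effect of those doubling rates can be checked with a few lines of arithmetic (a sketch; only the 12- and 18-month doubling times come from the slide, the rest is elementary algebra):

```python
import math

# Doubling times (in years) quoted on the slide [Zhongwen et al., 2005]
GPU_DOUBLING = 1.0   # GPU peak performance doubles every 12 months
CPU_DOUBLING = 1.5   # CPU peak performance doubles every 18 months

def performance_ratio(years):
    """GPU/CPU peak-performance ratio after `years`, assuming initial parity."""
    return 2 ** (years / GPU_DOUBLING) / 2 ** (years / CPU_DOUBLING)

# The ratio itself doubles every 3 years, since 2^(t - t/1.5) = 2^(t/3),
# so a 10x gap opens after t = 3 * log2(10), roughly one decade.
years_to_10x = 3 * math.log2(10)
print(round(performance_ratio(3), 1))   # 2.0
print(round(years_to_10x, 1))           # 10.0
```

This is why the chart's GPU and CPU curves diverge so quickly despite seemingly similar exponential growth.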
25. nvidia gpu architecture
[Diagram: an array of Streaming Multiprocessors, each with SIMT control and shared memory, connected via thread scheduling, a host interface, and a memory interface to off-chip DRAM]
29. speedups
∙ GPUs are responsible for dramatic speedups in a wide range of areas for many problems.
∙ It is not uncommon to obtain speedups of one or two orders of magnitude.
∙ Tasks that would take years on the CPU can now be completed in days.
∙ Weeks of processing can be transformed into hours [Lopes and Ribeiro, 2009].
∙ Computations that would otherwise take hours can now be completed in a few seconds.
34. machine learning tools
∙ Caffe: Framework for convolutional neural network
algorithms
∙ cuda-convnet: High performance C++/CUDA
implementation of convolutional neural networks
∙ Theano: Python library to define, optimize, and evaluate
mathematical expressions
∙ Torch7: Scientific computing framework for machine
learning algorithms
∙ cuBLAS: GPU-accelerated version of the complete standard
BLAS library
∙ MATLAB: Easy-to-use HPC language integrating
computation, visualization, and programming
∙ GPUMLib: GPU Machine Learning Library
35. companies using gpus for machine learning
http://www.nvidia.com/object/machine-learning.html
38. ml algorithms in gpu platform
∙ Large computational requirements.
∙ Algorithms should present a high degree of parallelism.
∙ Favor data throughput at the expense of the latency of individual operations.
39. gpu ml implementations
[Timeline, 2004 to 2012, of GPU ML implementations, grouped by closed- and open-source availability:]
∙ Multilayer Perceptrons (forward phase): Oh and Jung
∙ Self-Organizing Maps: Campbell et al.; Luo et al.
∙ Genetic Algorithms: Wong et al.; Yu et al.; Langdon and Banzhaf
∙ Back-Propagation (two layers): Steinkrau et al.
∙ Convolutional Neural Networks: Chellapilla et al.
∙ Spiking Neural Networks: Bernhard and Keriven; Nageswaran et al.
∙ Belief Propagation: Brunton et al.; Yang et al.
∙ Fuzzy ART neural networks: Martínez-Zarzuela et al.
∙ K-Means Clustering: Shalom et al.
∙ Recurrent networks: Trebatický and Pospíchal
∙ Decision Trees and Forests: Sharp
∙ Neural network based text detection: Jang et al.
∙ Linear Radial Basis Functions: Brandstetter and Artusi
∙ Deep Belief Networks and Sparse Coding: Raina et al.
∙ Back-Propagation (three layers): Guzhva et al.
∙ Support Vector Machines: Catanzaro et al.
∙ K-Nearest Neighbor: Garcia et al.
∙ Multiple Back-Propagation and Back-Propagation: Lopes and Ribeiro
∙ Non-negative Matrix Factorization: Lopes and Ribeiro
41. gpu implementations
∙ The number of GPU implementations of ML algorithms has
increased substantially over the last few years.
∙ However, most of the implementations are not openly
shared.
48. open source advantages [Sonnenburg et al., 2007]
∙ Better reproducibility of experimental results;
∙ Fair comparison of algorithms;
∙ Quicker detection of errors;
∙ Quicker adoption of algorithms;
∙ Innovative applications and easier combination of advances;
∙ Faster adoption of ML methods in other disciplines and in industry;
∙ Cooperation among researchers.
55. cuda
∙ CUDA represented a major step toward the simplification of the GPU programming model:
∙ Support for accessible programming interfaces and industry-standard languages, such as C and C++.
∙ Released by NVIDIA at the end of 2006; since then, numerous GPU implementations, spanning a wide range of applications, have been developed using this technology.
∙ While there are alternative options, such as OpenCL, Microsoft DirectCompute, and AMD Stream, so far CUDA is the only technology that has achieved wide adoption and usage [Stamatopoulos et al., 2012].
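A CUDA kernel is written from a single thread's point of view. No listing survives in this deck, so the following Python sketch merely emulates the shape of a 1-D launch: block_idx, block_dim, and thread_idx mirror CUDA's built-in blockIdx, blockDim, and threadIdx, and the bounds check is the usual idiom for when the grid overshoots the data.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    """Body of a CUDA-style kernel: one thread handles one element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # guard: the last block may extend past the end of the data
        out[i] = a[i] + b[i]

def launch_1d(kernel, n, threads_per_block, *args):
    """Emulate a 1-D grid launch by running the kernel for every thread."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceil division
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)

n = 1000
a = [float(i) for i in range(n)]
b = [2.0] * n
out = [0.0] * n
launch_1d(vector_add_kernel, n, 256, a, b, out, n)
```

On a real GPU the two nested loops disappear: every (block, thread) pair runs concurrently, which is exactly the data-throughput orientation discussed earlier.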
62. neural selective input model (nsim)
[Diagram: a selective input neuron multiplies each input x_j^p by a selector r_j^p, so that x̃_j^p = r_j^p x_j^p enters the network (weights w_jk, bias θ_k). Conceptual models: when x_3^p is missing, r_3^p = 0 and the corresponding input is effectively removed (model 1); when the value of x_3^p is known, r_3^p = 1 and the full network is used (model 2).]
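The gating on this slide can be written down directly: each input is multiplied by its selector before entering the usual weighted sum. A minimal sketch follows (the sigmoid activation, the weight values, and the single-neuron shape are illustrative assumptions, not NSIM's exact formulation):

```python
import math

def nsim_neuron(x, r, w, theta):
    """Selective-input neuron: inputs x_j are gated by selectors r_j.

    r_j = 0 marks x_j as missing (its connection is effectively removed);
    r_j = 1 uses the known value, as in the slide's two conceptual models.
    """
    net = sum(rj * xj * wj for rj, xj, wj in zip(r, x, w)) + theta
    return 1.0 / (1.0 + math.exp(-net))  # sigmoid activation

x = [0.2, 0.7, 0.5]      # x3 may or may not be observed
w = [1.0, -0.5, 2.0]     # illustrative weights
theta = 0.1              # illustrative bias

y_missing = nsim_neuron(x, [1, 1, 0], w, theta)  # model 1: r3 = 0
y_known   = nsim_neuron(x, [1, 1, 1], w, theta)  # model 2: r3 = 1
```

The same weights serve both conceptual models; only the selector vector changes, which is what makes the approach attractive for datasets with missing values.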
63. resource allocating network with long term memory
[Diagram: a network with input layer (x1 to x4), hidden layer, and output layer (z1, z2), coupled to a Long-Term Memory via generate/store and retrieve/learn operations.]
80. non-negative matrix factorization (nmf)
[Diagram: V ≈ WH, where V (D features × N samples) is approximated by basis vectors W (D × r) and encodings H (r × N); each sample with D original features is represented by r new features.]
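A minimal way to compute such a factorization is the classic multiplicative update rule of Lee and Seung (a numpy sketch, not GPUMLib's implementation; eps guards against division by zero):

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    """Factorize V (D x N) into non-negative W (D x r) and H (r x N)."""
    rng = np.random.default_rng(0)
    D, N = V.shape
    W = rng.random((D, r))
    H = rng.random((r, N))
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((40, 30))
W, H = nmf(V, r=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each iteration is dominated by dense matrix products, which is precisely why NMF maps so well onto the GPU.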
81. yale and orl image datasets
∙ Yale
∙ Vtrain is composed of 4096 rows (64 × 64 pixels) and 150 columns (face images).
∙ Vtest is composed of 4096 rows and 15 columns.
∙ AT&T (ORL)
∙ Vtrain is composed of 10304 rows (112 × 92 pixels) and 360 columns (face images).
∙ Vtest is composed of 10304 rows and 40 columns.
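With these dimensions, assembling Vtrain amounts to stacking flattened images as columns (a sketch in which random values stand in for the actual Yale face pixels):

```python
import numpy as np

rng = np.random.default_rng(0)

# 150 Yale training faces of 64 x 64 pixels (random values stand in for pixels)
images = rng.random((150, 64, 64))

# Each image becomes one column of V: D = 4096 features, N = 150 samples
V_train = images.reshape(150, -1).T
print(V_train.shape)  # (4096, 150)
```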
82. time to perform 10,000 nmf iterations on the yale database
[Log-scale chart: time in seconds (from 10 s up to 3h46m40s) versus r (20 to 120) for Vtrain and Vtest on the CPU and GPU. GPU speedups increase with r: 55.6×, 82.5×, 110.9×, 182.3×, 251.7× for one matrix and 6.6×, 12.9×, 21.5×, 44.1×, 74.1× for the other.]
100. conclusions
∙ Parallel implementations of ML algorithms are crucial for the development of real-world ML applications.
∙ The GPU is particularly well positioned to fulfil this need, given its availability, high performance, and relatively low cost.
∙ Experimental results with GPUMLib algorithms show the potential and usefulness of this library.
∙ Problems involving larger datasets benefit the most from this architecture.
∙ To promote cooperation among researchers and benefit the field, open-source GPU ML algorithms are fundamental.
101. references
∙ Hey, T., Tansley, S., and Tolle, K., editors (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.
∙ Lopes, N. and Ribeiro, B. (2009). Fast pattern classification of ventricular arrhythmias using graphics processing units. In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition (CIARP 2009), LNCS 5856, pages 603–610. Springer.
∙ Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., and Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5):879–899.
∙ Sonnenburg, S., Braun, M. L., Ong, C. S., Bengio, S., Bottou, L., Holmes, G., LeCun, Y., Müller, K.-R., Pereira, F., Rasmussen, C. E., Rätsch, G., Schölkopf, B., Smola, A., Vincent, P., Weston, J., and Williamson, R. C. (2007). The need for open source software in machine learning. Journal of Machine Learning Research, 8:2443–2466.
∙ Stamatopoulos, C., Chuang, T. Y., Fraser, C. S., and Lu, Y. Y. (2012). Fully automated image orientation in the absence of targets. In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (XXII ISPRS Congress), volume XXXIX-B5, pages 303–308.
∙ Zhongwen, L., Hongzhi, L., Zhengping, Y., and Xincai, W. (2005). Self-organizing maps computing on graphic process unit. In Proceedings of the 13th European Symposium on Artificial Neural Networks, pages 557–562.