deep learning
Algorithms and Applications
Bernardete Ribeiro, bribeiro@dei.uc.pt
University of Coimbra, Portugal
INIT/AERFAI Summer School on Machine Learning, Benicassim 22-26 June 2015
V - GPU Computing for Machine Learning
1
elements 5: gpu computing
outline
∙ Motivation
∙ Graphics Processing Units (GPUs) Computing
∙ Machine Learning (ML) GPU algorithms
∙ Advantages of Open-Source in the ML field
∙ Open-source GPU ML library (GPUMLib)
∙ Overview of GPUMLib algorithms
∙ Conclusions
3
motivation
∙ The volume of data is increasing at an exponential rate
Diversity of data sources: low-cost sensors, high-bandwidth networks, robotic systems, high-capacity storage devices, remote sensing, and commodity computing.
4
big data
∙ Nowadays, there are projects that can generate several
petabytes of data per day [Hey et al., 2009]:
∙ Australian Square Kilometre Array of radio telescopes
∙ CERN’s Large Hadron Collider
∙ Pan-STARRS array of celestial telescopes
6
data science
∙ Data is an asset, from which useful and valuable
information can be extracted.
∙ Science is gradually moving toward being computational
and data centric.
∙ Obtaining the data represents only a fraction of the time and effort needed to analyze it.
7
challenges
[Diagram: data sources (real data, plus artificial data from computer simulation models) feed persistent repositories of (accumulated) data; the challenge is that the large volume of data vastly exceeds our capacity to analyze it and extract useful and relevant information.]
8
potential solution
[Diagram: the same pipeline, with Machine Learning algorithms applied to the persistent repositories of (accumulated) data in order to extract useful and relevant information from the large volumes of data.]
9
Machine Learning (ML)
10
computational resources
∙ Machine Learning (ML) algorithms are computationally
expensive
∙ Their computational requirements are usually proportional
to the amount of data being processed
∙ ML algorithms often demand prohibitive computational
resources
11
advanced computing
∙ Problems are becoming increasingly challenging and
demanding (in some cases intractable by traditional CPU
architectures).
∙ Toolkits supporting ML software development fail to meet
the expectations in terms of computational performance.
∙ The scientific breakthroughs of the future will undoubtedly
be powered by advanced computing capabilities that will
allow researchers to manipulate and explore massive
datasets [Hey et al., 2009].
∙ Pressure to shift development toward high-throughput
parallel architectures (crucial for real-world applications).
12
computing with graphics processing units (gpus)
graphics processing units (gpus)
∙ Highly-parallel and
programmable devices
that can be used for
general-purpose
computing applications
[Owens et al., 2008].
[Chart: peak performance (GFLOPS, 0-1200) from 2001 to 2009 for AMD GPUs, NVIDIA GPUs, and Intel CPUs (dual-core, quad-core).]
14
gpus strengths
∙ Provide remarkable
performance gains
(compared to CPUs).
∙ Relatively inexpensive
(serve the large gaming
industry).
∙ Availability.
∙ Scalability.
15
gpu vs cpu performance
Disparity between the GPU and CPU peak floating-point performance
∙ GPU performance doubles every 12 months, while CPU performance doubles every 18 months [Zhongwen et al., 2005].
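At those rates the gap widens quickly: over a six-year span the GPU improves by a factor of 2^6 = 64, while the CPU improves by only 2^4 = 16, roughly a 4× relative gain for the GPU.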
16
nvidia gpu architecture
[Diagram: the GPU is organized as an array of Streaming Multiprocessors, each with its own SIMT control and shared memory, coordinated by thread scheduling and a host interface; a memory interface connects the chip to off-chip DRAM.]
17
Streaming Multiprocessor (SM)
[Diagram: each SM contains an instruction cache, a register file (32,768 × 32-bit), two warp schedulers each with a dispatch unit, 32 scalar processor (SP) cores, 16 load/store (LD/ST) units, 4 special function units (SFUs), an interconnect network, 64 KB of shared memory / L1 cache, and a uniform cache. Each SP core has a dispatch port, an operand collector, a floating-point unit, an integer unit, and a result queue.]
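To make the role of the SM's shared memory concrete, here is a generic block-level sum reduction written in plain CUDA (an illustrative idiom, not taken from GPUMLib): the threads of a block stage data in shared memory and cooperate through barrier synchronization.

```cuda
// Generic block-level sum reduction that stages data in the SM's shared
// memory. Launch with 256 threads per block; each block writes one
// partial sum, which the host (or a second kernel) can then combine.
__global__ void blockSum(const float *in, float *partial, int n) {
    __shared__ float buf[256];          // resides in the SM's shared memory
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;  // one element per thread
    __syncthreads();                    // wait until the block has staged its data

    // Tree reduction inside the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        partial[blockIdx.x] = buf[0];   // one partial sum per block
}
```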
18
speedups with gpus
speedups
∙ GPUs are responsible for dramatic speedups in a wide
range of areas for many problems.
∙ It is not uncommon to obtain speedups of one or two
orders of magnitude.
∙ Tasks that would take years on the CPU can now be completed
in days.
∙ Weeks of processing can be transformed into hours
[Lopes and Ribeiro, 2009]
∙ Computations that would otherwise take hours can now be
completed in a few seconds.
20
one order of magnitude (30× speedup)
∙ a year → 12 days
∙ a month → 1 day
∙ a week → 5:36 hours
∙ a day → 48 minutes
∙ an hour → 2 minutes
21
two orders of magnitude (300× speedup)
∙ a year → 29 hours
∙ a month → 2:24 hours
∙ a week → 34 minutes
∙ a day → 5 minutes
∙ an hour → 12 seconds
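As a quick check: at 300×, one year (about 8,760 hours) becomes 8760/300 ≈ 29 hours, and one hour (3,600 seconds) becomes 3600/300 = 12 seconds, matching the figures above.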
22
Intractable Problems
Become Manageable
23
gpu applications
http://www.nvidia.com/object/cuda-apps-flash-new.html
24
machine learning tools
∙ Caffe: Framework for convolutional neural network
algorithms
∙ cuda-convnet: High performance C++/CUDA
implementation of convolutional neural networks
∙ Theano: Python library to define, optimize, and evaluate
mathematical expressions
∙ Torch7: Scientific computing framework for machine
learning algorithms
∙ cuBLAS: GPU-accelerated version of the complete standard
BLAS library
∙ MATLAB: Easy-to-use HPC language integrating
computation, visualization, and programming
∙ GPUMLib: GPU Machine Learning Library
25
companies using gpus for machine learning
http://www.nvidia.com/object/machine-learning.html
26
ml algorithms in gpu platform
∙ Large computational requirements.
∙ Algorithms should present a high degree of parallelism.
∙ Favor data throughput at the expense of the latency of individual operations.
27
gpu ml implementations
[Timeline figure (2004-2012) of GPU implementations of ML algorithms, distinguishing closed-source from open-source ones:]
∙ Multilayer Perceptrons, forward phase (Oh and Jung)
∙ Self-Organizing Maps (Campbell et al.; Luo et al.)
∙ Genetic Algorithms (Wong et al.; Yu et al.)
∙ Back-Propagation, two layers (Steinkrau et al.)
∙ Convolutional Neural Networks (Chellapilla et al.)
∙ Spiking Neural Networks (Bernhard and Keriven)
∙ Belief Propagation (Brunton et al.; Yang et al.)
∙ Fuzzy ART neural networks (Martínez-Zarzuela et al.)
∙ K-Means Clustering (Shalom et al.)
∙ Recurrent networks (Trebatický and Pospíchal)
∙ Decision Trees and Forests (Sharp)
∙ Neural Network based text detection (Jang et al.)
∙ Linear Radial Basis Functions (Brandstetter and Artusi)
∙ Deep Belief Networks and Sparse Coding (Raina et al.)
∙ Back-Propagation, three layers (Guzhva et al.)
∙ Support Vector Machines (Catanzaro et al.)
∙ Genetic Algorithms (Langdon and Banzhaf)
∙ K-Nearest Neighbor (Garcia et al.)
∙ Spiking Neural Networks (Nageswaran et al.)
∙ Multiple Back-Propagation and Back-Propagation (Lopes and Ribeiro)
∙ Non-negative Matrix Factorization (Lopes and Ribeiro)
28
Open Source
29
gpu implementations
∙ The number of GPU implementations of ML algorithms has
increased substantially over the last few years.
∙ However, most of the implementations are not openly
shared.
30
open source advantages
∙ Better reproducibility of experimental results;
∙ Fair comparison of algorithms;
∙ Quicker detection of errors;
∙ Quicker adoption of algorithms;
∙ Innovative applications and easier combination of
advances;
∙ Faster adoption of ML methods in other disciplines and in
industry.
∙ Cooperation among researchers
[Sonnenburg et al., 2007]
31
GPUMLib - GPU Machine Learning Library
http://gpumlib.sourceforge.net/
32
gpumlib – http://gpumlib.sourceforge.net/
[Architecture diagram:]
∙ Host (CPU) and device (GPU) memory access framework: HostArray, HostMatrix, CudaArray, DeviceArray, DeviceMatrix, ...
∙ C++ classes (algorithms): Back-Propagation, Radial Basis Functions, Deep Belief Networks, Restricted Boltzmann Machines, Multiple Back-Propagation, Support Vector Machines, Non-Negative Matrix Factorization, ..., supported by common host (CPU) classes and common CUDA kernels
∙ CUDA (GPU) kernels: Multiple Back-Propagation, Support Vector Machines, Non-Negative Matrix Factorization, k-Nearest Neighbor, Radial Basis Functions, Restricted Boltzmann Machines, Genetic Algorithms, ..., supported by common device (GPU) functions
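The memory-access framework can be pictured as thin RAII wrappers around CUDA's allocation and transfer calls. The sketch below is a hypothetical illustration in that spirit; the class name DeviceBuffer and its interface are invented here and are not GPUMLib's actual API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical device-memory wrapper in the spirit of GPUMLib's
// DeviceArray (illustrative only, not the library's real interface).
template <typename T>
class DeviceBuffer {
public:
    explicit DeviceBuffer(size_t n) : size_(n), ptr_(nullptr) {
        cudaMalloc(reinterpret_cast<void **>(&ptr_), n * sizeof(T));  // allocate on the GPU
    }
    ~DeviceBuffer() { cudaFree(ptr_); }                               // release automatically

    DeviceBuffer(const DeviceBuffer &) = delete;                      // no accidental copies
    DeviceBuffer &operator=(const DeviceBuffer &) = delete;

    void copyFromHost(const T *host) {                                // host -> device
        cudaMemcpy(ptr_, host, size_ * sizeof(T), cudaMemcpyHostToDevice);
    }
    void copyToHost(T *host) const {                                  // device -> host
        cudaMemcpy(host, ptr_, size_ * sizeof(T), cudaMemcpyDeviceToHost);
    }

    T *data() { return ptr_; }
    size_t size() const { return size_; }

private:
    size_t size_;
    T *ptr_;
};
```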
33
CUDA (Compute Unified Device Architecture)
34
cuda
∙ represented a major step toward the simplification of the GPU programming model:
∙ Support for accessible programming interfaces and industry-standard languages, such as C and C++.
∙ released by NVIDIA at the end of 2006; since then, numerous GPU implementations, spanning a wide range of applications, have been developed using this technology.
∙ While there are alternative options, such as OpenCL, Microsoft DirectCompute, or AMD Stream, so far CUDA is the only technology that has achieved widespread adoption and usage [Stamatopoulos et al., 2012].
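As a minimal sketch of the CUDA programming model (plain CUDA C++, unrelated to any particular GPUMLib kernel), the program below launches a grid of thread blocks in which each thread updates one array element; the array size, kernel name, and scaling factor are arbitrary choices for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Each thread scales one element of the array.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                           // ~1M elements
    std::vector<float> host(n, 1.0f);

    float *device = nullptr;
    cudaMalloc(reinterpret_cast<void **>(&device), n * sizeof(float));
    cudaMemcpy(device, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;                         // threads per block
    const int blocks  = (n + threads - 1) / threads; // enough blocks to cover n
    scale<<<blocks, threads>>>(device, 2.0f, n);

    cudaMemcpy(host.data(), device, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("host[0] = %f\n", host[0]);               // prints 2.0

    cudaFree(device);
    return 0;
}
```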
35
Hardware and Software
36
systems’s main characteristics
Main Characteristics
System 1 (8600 GT) Intel Core 2 6600 (2.4GHz)
NVIDIA GeForce 8600 GT
Windows Vista (x64)
4GB memory
System 2 (GTX 280) Intel Core 2 Quad Q 9300 (2.5GHz)
NVIDIA GeForce GTX 280
Windows 7 (x64)
4GB memory
System 3 (GTX 460) Intel Dual-Core i5-2410M (2.7GHz)
NVIDIA GeForce GTX 460
Windows 7 (x64)
8GB memory
37
GPUMLib Algorithms Overview
38
neural networks
back-propagation (bp)
[Diagram: a feed-forward network mapping inputs x1-x5 to outputs y1-y3.]
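To illustrate why the forward phase parallelizes well, the sketch below computes one fully connected layer with sigmoid activations, assigning one thread to each (pattern, output neuron) pair. This is a simplified example with an assumed row-major memory layout, not GPUMLib's actual Back-Propagation kernel.

```cuda
// Simplified forward phase of one fully connected layer with sigmoid
// activation. Layout (assumed): inputs are (patterns x numIn), weights
// are (numOut x numIn), outputs are (patterns x numOut).
// Launch with a 2D grid: blockIdx.y selects the pattern.
__global__ void forwardLayer(const float *inputs, const float *weights,
                             const float *bias, float *outputs,
                             int patterns, int numIn, int numOut) {
    int p = blockIdx.y;                               // pattern index
    int j = blockIdx.x * blockDim.x + threadIdx.x;    // output neuron index
    if (p >= patterns || j >= numOut) return;

    float net = bias[j];
    for (int i = 0; i < numIn; ++i)
        net += inputs[p * numIn + i] * weights[j * numIn + i];

    outputs[p * numOut + j] = 1.0f / (1.0f + expf(-net));  // sigmoid
}
```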
40
multiple back-propagation (mbp)
[Diagram: a main feed-forward network with selective activation neurons (marked ×) maps inputs x_1^p, x_2^p, x_3^p to outputs y_1^p, y_2^p; a space network modulates the selective activation neurons of the main network.]
41
neural selective input model (nsim)
[Diagram: in the physical model, each selective input neuron multiplies its input x_j^p by a reliability value r_j^p (a multiplier), producing x̃_j^p = r_j^p · x_j^p, which is then propagated through the weights w_jk (with bias θ_k) to the outputs y_1^p, y_2^p. Conceptual models: when x_3^p is missing, r_3^p = 0 and the network behaves as a model that uses only x_1^p and x_2^p (Model 1); when the value of x_3^p is known, r_3^p = 1 and all three inputs are used (Model 2).]
42
resource allocating network with long term memory
[Diagram: a network with an input layer (x1-x4), a hidden layer, and an output layer (z1, z2), coupled to a Long-Term Memory from which items are retrieved for learning and into which new items are generated and stored.]
43
Datasets
44
uci benchmarks
Dataset (Benchmark) Samples Features Classes
f(x) = sin(x)/x 101 1 1
Two-spirals 194 2 1
Sonar 104 60 1
Covertype 11340 54 7
Poker Hand 25010 85 10
Ventricular Arrhythmias 19391 18 1
Yale face database 120 45 15
45
Performance
46
speedup (×) for the f(x) = sin(x)/x problem.
Topology Nh 8600 GT GTX 280
FF/BP
7 3.98 ± 0.19 5.48 ± 0.21
9 4.66 ± 0.16 7.15 ± 0.13
11 5.44 ± 0.13 8.43 ± 0.15
MFF/MBP
5 4.37 ± 0.10 6.08 ± 0.13
7 5.73 ± 0.07 7.99 ± 0.10
9 6.77 ± 0.09 10.24 ± 0.12
47
speedup (×) for the two-spirals problem.
Topology Nh 8600 GT GTX 280
FF/BP
25 7.68 ± 1.01 32.84 ± 4.78
30 7.96 ± 0.68 39.22 ± 3.36
35 7.55 ± 0.42 39.61 ± 2.49
MFF/MBP
15 9.89 ± 0.76 32.85 ± 2.59
20 9.75 ± 0.16 38.10 ± 0.70
25 10.01 ± 0.31 42.98 ± 1.27
48
two-spirals nns training time (mbp algorithm)
[Chart: training time in seconds (log scale, 10-1000 s) vs. number of first hidden layer neurons (15, 20, 25) for the GTX 280, 8600 GT, and Core 2 6600.]
49
speedup (×) for the covertype problem.
Topology Nh 8600 GT GTX 280
FF/BP
12 8.33 ± 0.24 59.67 ± 0.50
24 8.27 ± 0.78 56.37 ± 0.53
60 7.51 ± 0.07 57.87 ± 0.53
120 8.62 ± 0.04 57.92 ± 0.26
180 9.00 ± 0.38 58.40 ± 2.74
240 10.40 ± 0.12 62.88 ± 0.53
300 18.18 ± 0.11 112.56 ± 0.69
MFF/MBP
12 8.63 ± 0.16 64.59 ± 0.87
24 8.20 ± 0.09 59.63 ± 0.74
60 8.09 ± 0.32 60.08 ± 2.41
120 8.92 ± 0.09 59.00 ± 0.57
180 19.21 ± 0.15 121.95 ± 0.75
240 23.85 ± 0.11 141.83 ± 0.52
300 25.35 ± 0.37 153.23 ± 2.22
50
speedup (×) for the poker problem.
Topology Nh 8600 GT GTX 280
FF/BP
12 8.35 ± 0.07 57.49 ± 0.35
24 8.05 ± 2.15 54.79 ± 0.40
60 8.73 ± 0.05 59.51 ± 0.44
120 8.98 ± 0.08 57.55 ± 0.35
180 17.80 ± 0.20 111.30 ± 0.36
240 27.41 ± 0.50 159.03 ± 2.71
300 29.03 ± 1.58 174.91 ± 9.50
MFF/MBP
12 8.61 ± 0.07 61.50 ± 0.44
24 8.13 ± 0.05 58.42 ± 0.31
60 8.68 ± 0.05 58.61 ± 0.33
120 21.02 ± 0.16 132.67 ± 0.91
180 27.85 ± 1.31 171.73 ± 8.44
240 30.29 ± 0.22 174.45 ± 0.60
300 30.12 ± 1.15 178.64 ± 6.86
51
epochs trained using the mbp (poker problem)
[Chart: epochs trained per minute (log scale) vs. number of hidden layer neurons (0-300) for the GTX 280, 8600 GT, and Core 2 6600.]
52
speedup (×) for the arrhythmia’s problem.
Nh 8600 GT GTX 280
1 7.15 ± 0.22 33.16 ± 1.03
2 7.77 ± 0.16 42.39 ± 0.86
3 8.72 ± 0.19 49.12 ± 1.14
4 9.04 ± 0.18 48.90 ± 1.29
5 9.13 ± 0.14 49.86 ± 0.97
6 9.23 ± 0.14 53.20 ± 0.80
7 9.09 ± 0.10 53.04 ± 0.67
8 9.23 ± 0.09 53.94 ± 0.54
9 9.35 ± 0.12 53.64 ± 0.90
10 9.46 ± 0.07 53.74 ± 0.54
11 9.27 ± 0.06 52.38 ± 0.83
12 9.29 ± 0.10 51.85 ± 0.57
13 9.31 ± 0.06 51.04 ± 0.65
14 9.07 ± 0.11 50.29 ± 0.67
53
epochs trained using the mbp (arrhythmia’s)
[Chart: epochs trained per minute (log scale) vs. number of hidden layer neurons (0-14) for the GTX 280, 8600 GT, and Core 2 6600.]
54
speedups (×) versus average network connections per layer
[Chart: speedup (0-200×) vs. average number of threads per layer (10^2 to 10^9, log scale).]
55
radial basis function networks (rbf)
Dataset Samples Features CPU(s) GPU(s) Speedup
Iris 150 4 4.58 12.36 0.37
Breast 569 31 66.99 28.54 2.35
Vehicle 846 18 452.97 346.55 1.31
Vowel 990 10 994.42 866.70 1.15
CMC 1473 9 638.05 501.78 1.27
Satellite 6458 36 10011.50 2365.66 4.23
∙ GPU: 9800 GT (112 cores)
∙ CPU: Intel Core 2 Duo E8400 running at 3.0GHz
56
selecting neural network models
autonomous training system (ats)
[Chart: number of networks trained by the ATS (0-5000) vs. number of hidden neurons (1-8) for the Sonar problem; data labels include 10, 7, 5, 2445, 4930, and 2545.]
57
58
decomposition algorithms
non-negative matrix factorization (nmf)
[Diagram: V ≈ WH, where V (D features × N samples) contains the samples with their D original features, W (D × r) contains the basis vectors, and H (r × N) contains the samples represented by r new features.]
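For reference, a common way to compute the factorization is the Lee-Seung multiplicative update rule for the Euclidean cost (assumed here as one representative NMF variant; it maps well to the GPU because each step is built from dense matrix products):

```latex
H \leftarrow H \otimes \frac{W^{\top} V}{W^{\top} W H},
\qquad
W \leftarrow W \otimes \frac{V H^{\top}}{W H H^{\top}}
```

where ⊗ and the fraction bar denote element-wise multiplication and division.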
60
yale and orl image datasets
∙ Yale
∙ Vtrain is composed of 4096 rows (64 × 64 pixels) and 150
columns (face images)
∙ Vtest is composed of 4096 rows and 15 columns.
∙ AT&T (ORL)
∙ Vtrain is composed of 10304 (112 × 92) rows and 360 columns
(face images)
∙ Vtest is composed of 10304 rows and 40 columns.
61
time to perform 10,000 nmf iterations on the yale database.
[Chart: time (log scale, from about 10 s up to about 3 h 47 m) to perform 10,000 NMF iterations vs. r (20-120) for Vtrain and Vtest on the CPU and GPU; GPU speedups annotated on the chart range from 6.6× up to 251.7×, growing with r.]
62
time to perform 10,000 nmf iterations on the at&t (orl) database
[Chart: time (log scale, up to about 83 h) to perform 10,000 NMF iterations vs. r (50-300) for Vtrain and Vtest on the CPU and GPU; GPU speedups annotated on the chart range from 18.5× up to 706.8×, growing with r.]
63
statistical machine learning algorithms
support vector machines (svms)
[Diagram: two classes of points in the (x1, x2) plane and two candidate separating directions; the margin ρ is shown for one of them.]
65
gpumlib – support vector machines (svms)
Speedups
Dataset Samples Features Training Classification
Adult 32561 14 1.83× 2.17×
Breast Cancer 569 30 0.14× 1.10×
German 1000 59 1.42× 2.61×
Haberman 306 3 0.15× 0.14×
Heart 270 20 0.44× 0.23×
Ionosphere 351 34 0.32× 0.41×
Sonar 208 30 0.32× 0.38×
Tic-tac-toe 958 9 0.53× 0.78×
Two-Spiral 2097152 2 6.90× 3.88×
MP3 Steganalysis 1994 742 6.87× 29.53×
Peptidases 20778 24 3.04× 4.04×
66
GPU Computing for Deep Learning
67
deep neural networks
gpumlib deep learning algorithms
∙ Restricted Boltzmann Machines (RBMs)
∙ Deep Belief Networks (DBNs)
[Diagram: stacking RBMs to build a DBN: a first RBM connects the visible layer x to hidden layer h1 via p(h1|x) and p(x|h1); a second RBM adds h2 via p(h2|h1) and p(h1|h2); a third adds h3 via p(h3|h2) and p(h2|h3).]
69
mnist dataset
70
performance
Average time required to train one epoch
[Chart: average time per epoch (log scale) vs. number of hidden units (0-900) for N = 1,000 samples, comparing the GTX 460 (GPU) and a dual-core i5 (CPU); GPU speedups range from 21.86× to 29.79×.]
71
performance
Average time required to train one epoch
[Chart: average time per epoch (log scale) vs. number of hidden units (0-900) for N = 10,000 samples, comparing the GTX 460 (GPU) and a dual-core i5 (CPU); GPU speedups range from 28.59× to 38.16×.]
72
performance
Average time required to train one epoch
[Chart: average time per epoch (log scale) vs. number of hidden units (0-900) for N = 60,000 samples, comparing the GTX 460 (GPU) and a dual-core i5 (CPU); GPU speedups range from 38.64× to 46.07×.]
73
receptive fields
74
receptive fields
75
conclusions
∙ Parallel implementations of ML algorithms are crucial for the development of real-world ML applications.
∙ The GPU is particularly well positioned to fulfil this need, given its availability, high performance, and relatively low cost.
∙ Experimental results with GPUMLib algorithms show the potential and usefulness of this library.
∙ Problems involving larger datasets benefit the most from this architecture.
∙ To promote cooperation among researchers and benefit the field, open-source GPU ML algorithms are fundamental.
76
references
Hey, T., Tansley, S., and Tolle, K., editors (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.
Lopes, N. and Ribeiro, B. (2009). Fast pattern classification of ventricular arrhythmias using graphics processing units. In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition (CIARP 2009), LNCS 5856, pages 603-610. Springer.
Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., and Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5):879-899.
Sonnenburg, S., Braun, M. L., Ong, C. S., Bengio, S., Bottou, L., Holmes, G., LeCun, Y., Müller, K.-R., Pereira, F., Rasmussen, C. E., Rätsch, G., Schölkopf, B., Smola, A., Vincent, P., Weston, J., and Williamson, R. C. (2007). The need for open source software in machine learning. Journal of Machine Learning Research, 8:2443-2466.
Stamatopoulos, C., Chuang, T. Y., Fraser, C. S., and Lu, Y. Y. (2012). Fully automated image orientation in the absence of targets. In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (XXII ISPRS Congress), volume XXXIX-B5, pages 303-308.
Zhongwen, L., Hongzhi, L., Zhengping, Y., and Xincai, W. (2005). Self-organizing maps computing on graphic process unit. In Proceedings of the 13th European Symposium on Artificial Neural Networks, pages 557-562.
Contenu connexe

Tendances

Standardized Construction of HPC Clusters for Academic Usage
Standardized Construction of HPC Clusters for Academic UsageStandardized Construction of HPC Clusters for Academic Usage
Standardized Construction of HPC Clusters for Academic UsageBradford Bazemore
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsHéloïse Nonne
 
Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Grigori Fursin
 
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...inside-BigData.com
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Kenta Oono
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIJim Dowling
 
Computer architecture pptx
Computer architecture pptxComputer architecture pptx
Computer architecture pptxMDSHABBIR12
 
The next generation of the Montage image mosaic engine
The next generation of the Montage image mosaic engineThe next generation of the Montage image mosaic engine
The next generation of the Montage image mosaic engineG. Bruce Berriman
 
An NSA Big Graph experiment
An NSA Big Graph experimentAn NSA Big Graph experiment
An NSA Big Graph experimentTrieu Nguyen
 
Intel optimized tensorflow, distributed deep learning
Intel optimized tensorflow, distributed deep learningIntel optimized tensorflow, distributed deep learning
Intel optimized tensorflow, distributed deep learninggeetachauhan
 
Python for Earth
Python for EarthPython for Earth
Python for Earthzakiakhmad
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopExtremeEarth
 
Big Data Analytics for connected home
Big Data Analytics for connected homeBig Data Analytics for connected home
Big Data Analytics for connected homeHéloïse Nonne
 
Visualisation of Big Imaging Data
Visualisation of Big Imaging DataVisualisation of Big Imaging Data
Visualisation of Big Imaging DataSlava Kitaeff, PhD
 
MN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFMN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFPreferred Networks
 

Tendances (20)

Standardized Construction of HPC Clusters for Academic Usage
Standardized Construction of HPC Clusters for Academic UsageStandardized Construction of HPC Clusters for Academic Usage
Standardized Construction of HPC Clusters for Academic Usage
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for Physicists
 
Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...Collective Knowledge: python and scikit-learn based open research SDK for col...
Collective Knowledge: python and scikit-learn based open research SDK for col...
 
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...SX Aurora TSUBASA  (Vector Engine) a Brand-new Vector Supercomputing power in...
SX Aurora TSUBASA (Vector Engine) a Brand-new Vector Supercomputing power in...
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
All AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AIAll AI Roads lead to Distribution - Dot AI
All AI Roads lead to Distribution - Dot AI
 
APAN Cloud WG (2015/3/2)
APAN Cloud WG (2015/3/2)APAN Cloud WG (2015/3/2)
APAN Cloud WG (2015/3/2)
 
Google TPU
Google TPUGoogle TPU
Google TPU
 
Computer architecture pptx
Computer architecture pptxComputer architecture pptx
Computer architecture pptx
 
The next generation of the Montage image mosaic engine
The next generation of the Montage image mosaic engineThe next generation of the Montage image mosaic engine
The next generation of the Montage image mosaic engine
 
An NSA Big Graph experiment
An NSA Big Graph experimentAn NSA Big Graph experiment
An NSA Big Graph experiment
 
Intel optimized tensorflow, distributed deep learning
Intel optimized tensorflow, distributed deep learningIntel optimized tensorflow, distributed deep learning
Intel optimized tensorflow, distributed deep learning
 
Python for Earth
Python for EarthPython for Earth
Python for Earth
 
MATLAB and HDF-EOS
MATLAB and HDF-EOSMATLAB and HDF-EOS
MATLAB and HDF-EOS
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
 
Big Data Analytics for connected home
Big Data Analytics for connected homeBig Data Analytics for connected home
Big Data Analytics for connected home
 
Visualisation of Big Imaging Data
Visualisation of Big Imaging DataVisualisation of Big Imaging Data
Visualisation of Big Imaging Data
 
SciPy 2010 Review
SciPy 2010 ReviewSciPy 2010 Review
SciPy 2010 Review
 
MN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFMN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOF
 

En vedette

Introduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningIntroduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningTerry Taewoong Um
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer BuildPetteriTeikariPhD
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsBuhwan Jeong
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習台灣資料科學年會
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceLukas Masuch
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 

En vedette (7)

Introduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep LearningIntroduction to Machine Learning and Deep Learning
Introduction to Machine Learning and Deep Learning
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer Build
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 

Similaire à Dl2 computing gpu

Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsIJMER
 
Presentation
PresentationPresentation
Presentationbutest
 
FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...
FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...
FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...Christian Plessl
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...KTN
 
Hybrid Multicore Computing : NOTES
Hybrid Multicore Computing : NOTESHybrid Multicore Computing : NOTES
Hybrid Multicore Computing : NOTESSubhajit Sahu
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learningGanesan Narayanasamy
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionAkihiro Hayashi
 
Nikravesh big datafeb2013bt
Nikravesh big datafeb2013btNikravesh big datafeb2013bt
Nikravesh big datafeb2013btMasoud Nikravesh
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...ChangWoo Min
 
realtime_ai_systems_academia.pptx
realtime_ai_systems_academia.pptxrealtime_ai_systems_academia.pptx
realtime_ai_systems_academia.pptxgopikahari7
 
Nikravesh australia long_versionkeynote2012
Nikravesh australia long_versionkeynote2012Nikravesh australia long_versionkeynote2012
Nikravesh australia long_versionkeynote2012Masoud Nikravesh
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIinside-BigData.com
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale EraRealizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale EraMasaharu Munetomo
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...Bomm Kim
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 

Similaire à Dl2 computing gpu (20)

Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous Platforms
 
Presentation
PresentationPresentation
Presentation
 
FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...
FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...
FPGA-accelerated High-Performance Computing – Close to Breakthrough or Pipedr...
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
Implementing AI: Running AI at the Edge: Adapting AI to available resource in...
 
Hybrid Multicore Computing : NOTES
Hybrid Multicore Computing : NOTESHybrid Multicore Computing : NOTES
Hybrid Multicore Computing : NOTES
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
 
Nikravesh big datafeb2013bt
Nikravesh big datafeb2013btNikravesh big datafeb2013bt
Nikravesh big datafeb2013bt
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...A Survey on in-a-box parallel computing and its implications on system softwa...
A Survey on in-a-box parallel computing and its implications on system softwa...
 
realtime_ai_systems_academia.pptx
realtime_ai_systems_academia.pptxrealtime_ai_systems_academia.pptx
realtime_ai_systems_academia.pptx
 
Nikravesh australia long_versionkeynote2012
Nikravesh australia long_versionkeynote2012Nikravesh australia long_versionkeynote2012
Nikravesh australia long_versionkeynote2012
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale EraRealizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
Realizing Robust and Scalable Evolutionary Algorithms toward Exascale Era
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 

Plus de Armando Vieira

Improving Insurance Risk Prediction with Generative Adversarial Networks (GANs)
Improving Insurance  Risk Prediction with Generative Adversarial Networks (GANs)Improving Insurance  Risk Prediction with Generative Adversarial Networks (GANs)
Improving Insurance Risk Prediction with Generative Adversarial Networks (GANs)Armando Vieira
 
Predicting online user behaviour using deep learning algorithms
Predicting online user behaviour using deep learning algorithmsPredicting online user behaviour using deep learning algorithms
Predicting online user behaviour using deep learning algorithmsArmando Vieira
 
Boosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsBoosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsArmando Vieira
 
Seasonality effects on second hand cars sales
Seasonality effects on second hand cars salesSeasonality effects on second hand cars sales
Seasonality effects on second hand cars salesArmando Vieira
 
Visualizations of high dimensional data using R and Shiny
Visualizations of high dimensional data using R and ShinyVisualizations of high dimensional data using R and Shiny
Visualizations of high dimensional data using R and ShinyArmando Vieira
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithmsArmando Vieira
 
Extracting Knowledge from Pydata London 2015
Extracting Knowledge from Pydata London 2015Extracting Knowledge from Pydata London 2015
Extracting Knowledge from Pydata London 2015Armando Vieira
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Armando Vieira
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...Armando Vieira
 
Neural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective accelerationNeural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective accelerationArmando Vieira
 
Optimization of digital marketing campaigns
Optimization of digital marketing campaignsOptimization of digital marketing campaigns
Optimization of digital marketing campaignsArmando Vieira
 
Credit risk with neural networks bankruptcy prediction machine learning
Credit risk with neural networks bankruptcy prediction machine learningCredit risk with neural networks bankruptcy prediction machine learning
Credit risk with neural networks bankruptcy prediction machine learningArmando Vieira
 
Online democracy Armando Vieira
Online democracy Armando VieiraOnline democracy Armando Vieira
Online democracy Armando VieiraArmando Vieira
 
Invtur conference aveiro 2010
Invtur conference aveiro 2010Invtur conference aveiro 2010
Invtur conference aveiro 2010Armando Vieira
 
Tourism with recomendation systems
Tourism with recomendation systemsTourism with recomendation systems
Tourism with recomendation systemsArmando Vieira
 
Manifold learning for bankruptcy prediction
Manifold learning for bankruptcy predictionManifold learning for bankruptcy prediction
Manifold learning for bankruptcy predictionArmando Vieira
 
Artificial neural networks for ion beam analysis
Artificial neural networks for ion beam analysisArtificial neural networks for ion beam analysis
Artificial neural networks for ion beam analysisArmando Vieira
 

Plus de Armando Vieira (20)

Improving Insurance Risk Prediction with Generative Adversarial Networks (GANs)
Improving Insurance  Risk Prediction with Generative Adversarial Networks (GANs)Improving Insurance  Risk Prediction with Generative Adversarial Networks (GANs)
Improving Insurance Risk Prediction with Generative Adversarial Networks (GANs)
 
Predicting online user behaviour using deep learning algorithms
Predicting online user behaviour using deep learning algorithmsPredicting online user behaviour using deep learning algorithms
Predicting online user behaviour using deep learning algorithms
 
Boosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithmsBoosting conversion rates on ecommerce using deep learning algorithms
Boosting conversion rates on ecommerce using deep learning algorithms
 
Seasonality effects on second hand cars sales
Seasonality effects on second hand cars salesSeasonality effects on second hand cars sales
Seasonality effects on second hand cars sales
 
Visualizations of high dimensional data using R and Shiny
Visualizations of high dimensional data using R and ShinyVisualizations of high dimensional data using R and Shiny
Visualizations of high dimensional data using R and Shiny
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
 
Extracting Knowledge from Pydata London 2015
Extracting Knowledge from Pydata London 2015Extracting Knowledge from Pydata London 2015
Extracting Knowledge from Pydata London 2015
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
 
Neural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective accelerationNeural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective acceleration
 
Optimization of digital marketing campaigns
Optimization of digital marketing campaignsOptimization of digital marketing campaigns
Optimization of digital marketing campaigns
 
Credit risk with neural networks bankruptcy prediction machine learning
Credit risk with neural networks bankruptcy prediction machine learningCredit risk with neural networks bankruptcy prediction machine learning
Credit risk with neural networks bankruptcy prediction machine learning
 
Online democracy Armando Vieira
Online democracy Armando VieiraOnline democracy Armando Vieira
Online democracy Armando Vieira
 
Invtur conference aveiro 2010
Invtur conference aveiro 2010Invtur conference aveiro 2010
Invtur conference aveiro 2010
 
Tourism with recomendation systems
Tourism with recomendation systemsTourism with recomendation systems
Tourism with recomendation systems
 
Manifold learning for bankruptcy prediction
Manifold learning for bankruptcy predictionManifold learning for bankruptcy prediction
Manifold learning for bankruptcy prediction
 
Credit iconip
Credit iconipCredit iconip
Credit iconip
 
Requiem pelo ensino
Requiem pelo ensino Requiem pelo ensino
Requiem pelo ensino
 
Eurogen v
Eurogen vEurogen v
Eurogen v
 
Artificial neural networks for ion beam analysis
Artificial neural networks for ion beam analysisArtificial neural networks for ion beam analysis
Artificial neural networks for ion beam analysis
 

Dernier

Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.soniya singh
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Roomdivyansh0kumar0
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersDamian Radcliffe
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Delhi Call girls
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Servicegwenoracqe6
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Dernier (20)

  • 20. advanced computing ∙ Problems are becoming increasingly challenging and demanding (in some cases intractable by traditional CPU architectures). ∙ Toolkits supporting ML software development fail to meet the expectations in terms of computational performance. ∙ The scientific breakthroughs of the future will undoubtedly be powered by advanced computing capabilities that will allow researchers to manipulate and explore massive datasets [Hey et al., 2009]. ∙ Pressure to shift development toward high-throughput parallel architectures (crucial for real-world applications). 12
  • 21. computing with graphics processing units (gpus)
  • 22. graphics processing units (gpus) ∙ Highly parallel and programmable devices that can be used for general-purpose computing applications [Owens et al., 2008]. [Chart: peak GFLOPS of AMD and NVIDIA GPUs versus Intel CPUs (dual- and quad-core), 2001–2009] 14
  • 23. gpu strengths ∙ Provide remarkable performance gains (compared to CPUs). ∙ Relatively inexpensive (they serve the large gaming industry). ∙ Availability. ∙ Scalability. [Chart: same GPU vs. CPU peak GFLOPS comparison, 2001–2009] 15
  • 24. gpu vs cpu performance ∙ There is a growing disparity between GPU and CPU peak floating-point performance: GPU performance doubles every 12 months, whereas CPU performance doubles every 18 months [Zhongwen et al., 2005]. [Chart: same GPU vs. CPU peak GFLOPS comparison, 2001–2009] 16
  • 25. nvidia gpu architecture [Diagram: the GPU comprises several Streaming Multiprocessors, each with SIMT control and shared memory, a thread scheduler, a host interface, and a memory interface to off-chip DRAM] 17
  • 26. [Diagram: a Streaming Multiprocessor (SM): instruction cache; 32,768 × 32-bit register file; two warp schedulers, each with a dispatch unit; 32 scalar processor (SP) cores, each with a dispatch port, operand collector, floating-point and integer units, and a result queue; 16 load/store (LD/ST) units; 4 special function units (SFUs); an interconnect network; 64 KB of shared memory / L1 cache; and a uniform cache] 18
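The SM characteristics summarized above (warp size, register file, shared memory, number of multiprocessors) can be queried at run time. The following generic CUDA C snippet (not part of the slides or of GPUMLib) prints them for device 0; the register and shared-memory figures are reported per block, which is how the cudaDeviceProp structure exposes them:

    // Query the hardware characteristics that the SM diagram describes.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);               // properties of device 0
        printf("device:                  %s\n", prop.name);
        printf("streaming multiprocs:    %d\n", prop.multiProcessorCount);
        printf("warp size:               %d threads\n", prop.warpSize);
        printf("registers per block:     %d\n", prop.regsPerBlock);
        printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("max threads per block:   %d\n", prop.maxThreadsPerBlock);
        return 0;
    }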
  • 29. speedups ∙ GPUs are responsible for dramatic speedups in a wide range of areas for many problems. ∙ It is not uncommon to obtain speedups of one or two orders of magnitude. ∙ Tasks that would take years on the CPU can now be completed in days. ∙ Weeks of processing can be transformed into hours [Lopes and Ribeiro, 2009]. ∙ Computations that would otherwise take hours can now be completed in a few seconds. 20
  • 30. one order of magnitude (30× speedup): 1 year → 12 days; 1 month → 1 day; 1 week → 5 h 36 min; 1 day → 48 min; 1 hour → 2 min
  • 31. two orders of magnitude (300× speedup): 1 year → 29 hours; 1 month → 2 h 24 min; 1 week → 34 min; 1 day → 5 min; 1 hour → 12 s
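The arithmetic behind these two slides is simply that a speedup S divides the running time. Taking 1 year ≈ 8760 hours:

    T_{\text{GPU}} = \frac{T_{\text{CPU}}}{S}, \qquad
    \frac{8760\ \text{h}}{30} \approx 292\ \text{h} \approx 12\ \text{days}, \qquad
    \frac{8760\ \text{h}}{300} \approx 29\ \text{h}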
  • 34. machine learning tools ∙ Caffe: Framework for convolutional neural network algorithms ∙ cuda-convnet: High performance C++/CUDA implementation of convolutional neural networks ∙ Theano: Python library to define, optimize, and evaluate mathematical expressions ∙ Torch7: Scientific computing framework for machine learning algorithms ∙ cuBLAS: GPU-accelerated version of the complete standard BLAS library ∙ MATLAB: Easy-to-use HPC language integrating computation, visualization, and programming ∙ GPUMLib: GPU Machine Learning Library 25
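As a concrete illustration of how little host code a GPU-accelerated library call requires, the sketch below multiplies two small matrices with cuBLAS (a generic example, not taken from the slides; the matrix sizes and values are arbitrary and error checking is omitted for brevity):

    // Minimal cuBLAS sketch: C = A * B on the GPU (column-major storage).
    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <vector>
    #include <cstdio>

    int main() {
        const int m = 4, k = 3, n = 2;                         // illustrative sizes
        std::vector<float> A(m * k, 1.0f), B(k * n, 2.0f), C(m * n, 0.0f);

        float *dA, *dB, *dC;
        cudaMalloc(&dA, A.size() * sizeof(float));
        cudaMalloc(&dB, B.size() * sizeof(float));
        cudaMalloc(&dC, C.size() * sizeof(float));
        cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;
        // cuBLAS assumes column-major matrices; leading dimensions are m, k and m.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                    &alpha, dA, m, dB, k, &beta, dC, m);

        cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
        printf("C[0] = %f\n", C[0]);                           // expect 6.0 (sum of 1*2 over k = 3)

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }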
  • 35. companies using gpus for machine learning http://www.nvidia.com/object/machine-learning.html 26
  • 38. ml algorithms in gpu platform ∙ Large computational requirements. ∙ Algorithms should exhibit a high degree of parallelism. ∙ Favor data throughput at the expense of the latency of individual operations. 27
  • 39. gpu ml implementations (open- and closed-source GPU implementations, 2004–2012):
  Multilayer Perceptrons (forward phase): Oh and Jung; Self-Organizing Maps: Campbell et al., Luo et al.; Genetic Algorithms: Wong et al., Yu et al., Langdon and Banzhaf; Back-Propagation (two layers): Steinkrau et al.; Back-Propagation (three layers): Guzhva et al.; Convolutional Neural Networks: Chellapilla et al.; Spiking Neural Networks: Bernhard and Keriven, Nageswaran et al.; Belief Propagation: Brunton et al., Yang et al.; Fuzzy ART neural networks: Martínez-Zarzuela et al.; K-Means Clustering: Shalom et al.; Recurrent networks: Trebatický and Pospíchal; Decision Trees and Forests: Sharp; Neural-network-based text detection: Jang et al.; Linear Radial Basis Functions: Brandstetter and Artusi; Deep Belief Networks and Sparse Coding: Raina et al.; Support Vector Machines: Catanzaro et al.; K-Nearest Neighbor: Garcia et al.; Back-Propagation and Multiple Back-Propagation: Lopes and Ribeiro; Non-negative Matrix Factorization: Lopes and Ribeiro 28
  • 41. gpu implementations ∙ The number of GPU implementations of ML algorithms has increased substantially over the last few years. ∙ However, most of the implementations are not openly shared. 30
  • 48. open source advantages ∙ Better reproducibility of experimental results; ∙ Fair comparison of algorithms; ∙ Quicker detection of errors; ∙ Quicker adoption of algorithms; ∙ Innovative applications and easier combination of advances; ∙ Faster adoption of ML methods in other disciplines and in industry; ∙ Cooperation among researchers [Sonnenburg et al., 2007]. 31
  • 49. GPUMLib - GPU Machine Learning Library http://gpumlib.sourceforge.net/ 32
  • 50. gpumlib – http://gpumlib.sourceforge.net/ [Diagram: GPUMLib architecture: a host (CPU) and device (GPU) memory access framework (HostArray, HostMatrix, CudaArray, DeviceArray, DeviceMatrix, ...); C++ algorithm classes (Back-Propagation, Multiple Back-Propagation, Radial Basis Functions, Deep Belief Networks, Restricted Boltzmann Machines, Support Vector Machines, Non-Negative Matrix Factorization, ...) built on common host (CPU) classes; CUDA (GPU) kernels (Multiple Back-Propagation, Support Vector Machines, Non-Negative Matrix Factorization, k-Nearest Neighbor, Radial Basis Functions, Restricted Boltzmann Machines, Genetic Algorithms, ...) built on common CUDA kernels and common device (GPU) functions] 33
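The memory-access framework above wraps explicit host-to-device transfers behind array and matrix classes. The sketch below is a hypothetical, greatly simplified illustration of that idea in CUDA C++; the names DeviceArraySketch, CopyFromHost and CopyToHost are invented for this example and are not GPUMLib's actual API:

    // Hypothetical sketch of a device-side array wrapper with explicit transfers.
    #include <cuda_runtime.h>
    #include <vector>
    #include <cstdio>

    template <typename T>
    class DeviceArraySketch {
    public:
        explicit DeviceArraySketch(size_t n) : n_(n) { cudaMalloc(&d_, n * sizeof(T)); }
        ~DeviceArraySketch() { cudaFree(d_); }
        void CopyFromHost(const std::vector<T>& h) {
            cudaMemcpy(d_, h.data(), n_ * sizeof(T), cudaMemcpyHostToDevice);
        }
        void CopyToHost(std::vector<T>& h) const {
            cudaMemcpy(h.data(), d_, n_ * sizeof(T), cudaMemcpyDeviceToHost);
        }
        T* Pointer() { return d_; }        // raw device pointer handed to kernels
        size_t Length() const { return n_; }
    private:
        T* d_;
        size_t n_;
    };

    int main() {
        std::vector<float> host(1024, 1.0f);              // host-side data (e.g., a training batch)
        DeviceArraySketch<float> device(host.size());
        device.CopyFromHost(host);                        // explicit host -> device transfer
        // ... GPU kernels would operate on device.Pointer() here ...
        device.CopyToHost(host);                          // explicit device -> host transfer
        printf("first element back on the host: %f\n", host[0]);
        return 0;
    }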
  • 51. CUDA (Compute Unified Device Architecture) 34
  • 55. cuda ∙ Represented a major step toward the simplification of the GPU programming model: ∙ Support for accessible programming interfaces and industry-standard languages, such as C and C++. ∙ Released by NVIDIA at the end of 2006; since then, numerous GPU implementations, spanning a wide range of applications, have been developed using this technology. ∙ While there are alternative options, such as OpenCL, Microsoft DirectCompute, or AMD Stream, so far CUDA is the only technology that has achieved wide adoption and usage [Stamatopoulos et al., 2012]. 35
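For readers unfamiliar with the CUDA programming model mentioned above, a minimal, generic kernel (unrelated to GPUMLib) looks like this: a C function marked __global__ is executed by a grid of thread blocks, with each thread handling one element:

    // Minimal CUDA C example: element-wise vector addition on the GPU.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void VecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n) c[i] = a[i] + b[i];                   // guard against excess threads
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);                    // unified memory keeps the sketch short
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;                               // threads per block
        int blocks = (n + threads - 1) / threads;        // enough blocks to cover n elements
        VecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                     // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }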
  • 57. systems' main characteristics
  System 1 (8600 GT): Intel Core 2 6600 (2.4 GHz), NVIDIA GeForce 8600 GT, Windows Vista (x64), 4 GB memory
  System 2 (GTX 280): Intel Core 2 Quad Q9300 (2.5 GHz), NVIDIA GeForce GTX 280, Windows 7 (x64), 4 GB memory
  System 3 (GTX 460): Intel Dual-Core i5-2410M (2.7 GHz), NVIDIA GeForce GTX 460, Windows 7 (x64), 8 GB memory
  • 61. multiple back-propagation (mbp) [Diagram: an MBP architecture: a space network and a main network with selective activation neurons; inputs x_1^p, x_2^p, x_3^p and outputs y_1^p, y_2^p] 41
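A hedged sketch of the idea behind selective activation (a paraphrase of the MBP model, not a formula taken from the slides): for each pattern p, the space network computes an importance factor m_k^p that scales the activation of neuron k in the main network, roughly

    y_k^p \approx m_k^p \, f\!\Big(\sum_j w_{jk}\, y_j^p + b_k\Big)

so that the same main network can behave differently in different regions of the input space.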
  • 62. neural selective input model (nsim) [Diagram: a selective input neuron multiplies each input x_j^p by a reliability indicator r_j^p to obtain x̃_j^p, which is then weighted (w_jk) and thresholded (θ_k); conceptually, model 1 applies when x_3^p is missing (r_3^p = 0) and model 2 when its value is known (r_3^p = 1)] 42
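Written out (a reconstruction from the diagram, with θ_k the neuron threshold), the selective input neuron effectively switches off missing inputs:

    \tilde{x}_j^p = r_j^p\, x_j^p, \qquad r_j^p \in \{0, 1\}, \qquad
    y_k^p = f\!\Big(\sum_j w_{jk}\, \tilde{x}_j^p - \theta_k\Big)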
  • 63. resource allocating network with long-term memory [Diagram: an input layer (x1–x4), a hidden layer, and an output layer (z1, z2), coupled to a long-term memory through generate & store and retrieve & learn operations] 43
  • 65. uci benchmarks
  Dataset (Benchmark)        Samples   Features   Classes
  f(x) = sin(x)/x                101          1         1
  Two-spirals                    194          2         1
  Sonar                          104         60         1
  Covertype                    11340         54         7
  Poker Hand                   25010         85        10
  Ventricular Arrhythmias      19391         18         1
  Yale face database             120         45        15
  • 67. speedup (×) for the f(x) = sin(x)/x problem.
  Topology   Nh   8600 GT        GTX 280
  FF/BP       7   3.98 ± 0.19     5.48 ± 0.21
  FF/BP       9   4.66 ± 0.16     7.15 ± 0.13
  FF/BP      11   5.44 ± 0.13     8.43 ± 0.15
  MFF/MBP     5   4.37 ± 0.10     6.08 ± 0.13
  MFF/MBP     7   5.73 ± 0.07     7.99 ± 0.10
  MFF/MBP     9   6.77 ± 0.09    10.24 ± 0.12
  • 68. speedup (×) for the two-spirals problem.
  Topology   Nh   8600 GT        GTX 280
  FF/BP      25    7.68 ± 1.01   32.84 ± 4.78
  FF/BP      30    7.96 ± 0.68   39.22 ± 3.36
  FF/BP      35    7.55 ± 0.42   39.61 ± 2.49
  MFF/MBP    15    9.89 ± 0.76   32.85 ± 2.59
  MFF/MBP    20    9.75 ± 0.16   38.10 ± 0.70
  MFF/MBP    25   10.01 ± 0.31   42.98 ± 1.27
  • 69. two-spirals nns training time (mbp algorithm) [Plot: training time (seconds, log scale) versus number of first-hidden-layer neurons (15, 20, 25) for the GTX 280, the 8600 GT, and the Core 2 6600] 49
  • 70. speedup (×) for the covertype problem.
  Topology   Nh    8600 GT         GTX 280
  FF/BP       12    8.33 ± 0.24    59.67 ± 0.50
  FF/BP       24    8.27 ± 0.78    56.37 ± 0.53
  FF/BP       60    7.51 ± 0.07    57.87 ± 0.53
  FF/BP      120    8.62 ± 0.04    57.92 ± 0.26
  FF/BP      180    9.00 ± 0.38    58.40 ± 2.74
  FF/BP      240   10.40 ± 0.12    62.88 ± 0.53
  FF/BP      300   18.18 ± 0.11   112.56 ± 0.69
  MFF/MBP     12    8.63 ± 0.16    64.59 ± 0.87
  MFF/MBP     24    8.20 ± 0.09    59.63 ± 0.74
  MFF/MBP     60    8.09 ± 0.32    60.08 ± 2.41
  MFF/MBP    120    8.92 ± 0.09    59.00 ± 0.57
  MFF/MBP    180   19.21 ± 0.15   121.95 ± 0.75
  MFF/MBP    240   23.85 ± 0.11   141.83 ± 0.52
  MFF/MBP    300   25.35 ± 0.37   153.23 ± 2.22
  • 71. speedup (×) for the poker problem.
  Topology   Nh    8600 GT         GTX 280
  FF/BP       12    8.35 ± 0.07    57.49 ± 0.35
  FF/BP       24    8.05 ± 2.15    54.79 ± 0.40
  FF/BP       60    8.73 ± 0.05    59.51 ± 0.44
  FF/BP      120    8.98 ± 0.08    57.55 ± 0.35
  FF/BP      180   17.80 ± 0.20   111.30 ± 0.36
  FF/BP      240   27.41 ± 0.50   159.03 ± 2.71
  FF/BP      300   29.03 ± 1.58   174.91 ± 9.50
  MFF/MBP     12    8.61 ± 0.07    61.50 ± 0.44
  MFF/MBP     24    8.13 ± 0.05    58.42 ± 0.31
  MFF/MBP     60    8.68 ± 0.05    58.61 ± 0.33
  MFF/MBP    120   21.02 ± 0.16   132.67 ± 0.91
  MFF/MBP    180   27.85 ± 1.31   171.73 ± 8.44
  MFF/MBP    240   30.29 ± 0.22   174.45 ± 0.60
  MFF/MBP    300   30.12 ± 1.15   178.64 ± 6.86
  • 72. epochs trained using the mbp (poker problem) [Plot: epochs per minute (log scale) versus number of hidden-layer neurons (0–300) for the GTX 280, the 8600 GT, and the Core 2 6600] 52
  • 73. speedup (×) for the ventricular arrhythmias problem.
  Nh    8600 GT        GTX 280
   1    7.15 ± 0.22   33.16 ± 1.03
   2    7.77 ± 0.16   42.39 ± 0.86
   3    8.72 ± 0.19   49.12 ± 1.14
   4    9.04 ± 0.18   48.90 ± 1.29
   5    9.13 ± 0.14   49.86 ± 0.97
   6    9.23 ± 0.14   53.20 ± 0.80
   7    9.09 ± 0.10   53.04 ± 0.67
   8    9.23 ± 0.09   53.94 ± 0.54
   9    9.35 ± 0.12   53.64 ± 0.90
  10    9.46 ± 0.07   53.74 ± 0.54
  11    9.27 ± 0.06   52.38 ± 0.83
  12    9.29 ± 0.10   51.85 ± 0.57
  13    9.31 ± 0.06   51.04 ± 0.65
  14    9.07 ± 0.11   50.29 ± 0.67
  • 74. epochs trained using the mbp (ventricular arrhythmias) [Plot: epochs per minute (log scale) versus number of hidden-layer neurons (0–14) for the GTX 280, the 8600 GT, and the Core 2 6600] 54
  • 75. speedups (×) versus average network connections per layer [Plot: speedup (0–200×) versus average number of threads per layer (10^2 to 10^9, log scale)] 55
  • 76. radial basis function networks (rbf)
  Dataset     Samples   Features    CPU (s)    GPU (s)   Speedup
  Iris            150          4       4.58      12.36      0.37
  Breast          569         31      66.99      28.54      2.35
  Vehicle         846         18     452.97     346.55      1.31
  Vowel           990         10     994.42     866.70      1.15
  CMC            1473          9     638.05     501.78      1.27
  Satellite      6458         36   10011.50    2365.66      4.23
  ∙ GPU: 9800 GT (112 cores) ∙ CPU: Intel Core 2 Duo E8400 running at 3.0 GHz
  • 78. autonomous training system (ats) [Bar chart: number of networks trained by the ATS for each number of hidden neurons (1–8) on the Sonar problem: 10, 7, 5, 2445, 4930, 2545, 57, 1] 58
  • 80. non-negative matrix factorization (nmf) [Diagram: V ≈ WH, where V is a D (features) × N (samples) matrix, W is a D × r matrix whose columns are basis vectors, and H is an r × N matrix whose columns hold each sample's r new features] 60
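For reference (a standard formulation, not necessarily the exact variant benchmarked below), NMF seeks non-negative W and H that minimize the reconstruction error ||V − WH||_F^2, and the classical multiplicative update rules iterate (⊙ and the fraction denote element-wise multiplication and division):

    H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}, \qquad
    W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}

Both updates are dominated by dense matrix products, which is why this algorithm maps so well onto the GPU.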
  • 81. yale and orl image datasets ∙ Yale ∙ Vtrain is composed of 4096 rows (64 × 64 pixels) and 150 columns (face images) ∙ Vtest is composed of 4096 rows and 15 columns. ∙ AT&T (ORL) ∙ Vtrain is composed of 10304 (112 × 92) rows and 360 columns (face images) ∙ Vtest is composed of 10304 rows and 40 columns. 61
  • 82. time to perform 10,000 nmf iterations on the yale database [Plot: time (seconds, log scale) versus r (20–120) for Vtrain and Vtest on the CPU and GPU; annotated GPU speedups range from 6.6× to 251.7×, growing with r] 62
  • 83. time to perform 10,000 nmf iterations on the at&t (orl) database [Plot: time (seconds, log scale) versus r (50–300) for Vtrain and Vtest on the CPU and GPU; annotated GPU speedups range from 18.5× to 706.8×] 63
  • 85. support vector machines (svms) [Diagram: two candidate separating directions in a two-dimensional feature space (x1, x2); the margin ρ determines which direction is preferred] 65
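As background for the results that follow, the standard soft-margin SVM picks the direction that maximizes the margin ρ while tolerating some violations, solving (with regularization parameter C and feature map φ):

    \min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i
    \quad \text{s.t.} \quad y_i\big(\mathbf{w} \cdot \phi(\mathbf{x}_i) + b\big) \ge 1 - \xi_i, \qquad \xi_i \ge 0

and classifies new points with f(\mathbf{x}) = \operatorname{sign}\big(\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\big), where K is the kernel and the α_i come from the dual problem.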
  • 86. gpumlib – support vector machines (svms): speedups
  Dataset            Samples   Features   Training   Classification
  Adult                32561         14      1.83×       2.17×
  Breast Cancer          569         30      0.14×       1.10×
  German                1000         59      1.42×       2.61×
  Haberman               306          3      0.15×       0.14×
  Heart                  270         20      0.44×       0.23×
  Ionosphere             351         34      0.32×       0.41×
  Sonar                  208         30      0.32×       0.38×
  Tic-tac-toe            958          9      0.53×       0.78×
  Two-Spiral         2097152          2      6.90×       3.88×
  MP3 Steganalysis      1994        742      6.87×      29.53×
  Peptidases           20778         24      3.04×       4.04×
  • 87. GPU Computing for Deep Learning 67
  • 89. gpumlib deep learning algorithms ∙ Restricted Boltzmann Machines (RBMs) ∙ Deep Belief Networks (DBNs) [Diagram: stacking RBMs to build a DBN: a visible layer x and hidden layers h1, h2, h3 linked by the conditionals p(h1|x), p(x|h1), p(h2|h1), p(h1|h2), p(h3|h2), p(h2|h3)] 69
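For context (the standard binary RBM formulation; the specifics of GPUMLib's implementation may differ), an RBM over visible units v and hidden units h with weights W and biases a, b is defined by an energy function, and its factorized conditionals are what each training step samples (typically via contrastive divergence):

    E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}

    p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i W_{ij}\, v_i\Big), \qquad
    p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij}\, h_j\Big), \qquad
    \sigma(z) = \frac{1}{1 + e^{-z}}

These conditionals are computed for whole mini-batches as matrix products, which is what the GPU timings on the next slides measure.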
  • 91–93. performance [Plots: average time (seconds, log scale) required to train one epoch versus number of hidden units (100–900), on the GTX 460 (GPU) and a dual-core i5 (CPU), for N = 1,000, N = 10,000, and N = 60,000 training samples; annotated GPU speedups are roughly 22–30× for N = 1,000, 29–38× for N = 10,000, and 39–46× for N = 60,000]
  • 100. conclusions ∙ Parallel implementations of ML algorithms are crucial for the development of real-world ML applications. ∙ The GPU is particularly well positioned to fulfil this need, given its availability, high performance, and relatively low cost. ∙ Experimental results with GPUMLib algorithms show the potential and usefulness of this library. ∙ Problems involving larger datasets benefit the most from this architecture. ∙ To promote cooperation among researchers and benefit the field, open-source GPU ML algorithms are fundamental. 76
  • 101. references
  Hey, T., Tansley, S., and Tolle, K., editors (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.
  Lopes, N. and Ribeiro, B. (2009). Fast pattern classification of ventricular arrhythmias using graphics processing units. In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition (CIARP 2009), LNCS 5856, pages 603–610. Springer.
  Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., and Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5):879–899.
  Sonnenburg, S., Braun, M. L., Ong, C. S., Bengio, S., Bottou, L., Holmes, G., LeCun, Y., Müller, K.-R., Pereira, F., Rasmussen, C. E., Rätsch, G., Schölkopf, B., Smola, A., Vincent, P., Weston, J., and Williamson, R. C. (2007). The need for open source software in machine learning. Journal of Machine Learning Research, 8:2443–2466.
  Stamatopoulos, C., Chuang, T. Y., Fraser, C. S., and Lu, Y. Y. (2012). Fully automated image orientation in the absence of targets. In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (XXII ISPRS Congress), volume XXXIX-B5, pages 303–308.
  Zhongwen, L., Hongzhi, L., Zhengping, Y., and Xincai, W. (2005). Self-organizing maps computing on graphic process unit. In Proceedings of the 13th European Symposium on Artificial Neural Networks, pages 557–562.