Learning at Scale:
Deep, Distributed and Multi-dimensional
Anima Anandkumar
Amazon AI & Caltech
Deep Learning
Significantly improves many applications across multiple domains:
image understanding, speech recognition, natural language processing, autonomy, …
[Figure: the "deep learning" trend over the past 10 years]
Image Classification
[Figure: network with Layer 1, Layer 2, and Output]
Multilevel feature extraction, from raw pixels to semantic meaning
Exploits spatial information with convolution layers
Image Classification
§ Hard to define the network
§ The definition of the Inception network takes >1k lines of code in Caffe
§ A single image requires billions of floating-point operations
§ Intel i7: ~500 GFLOPS
§ Nvidia Titan X: ~5 TFLOPS
§ Memory consumption is linear in the number of layers
State-of-the-art networks have tens to hundreds of layers
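To put the peak rates above in perspective, here is a rough back-of-the-envelope sketch. The figure of 10 billion operations per image is an assumption chosen only for illustration (the slide says "billions"), and real hardware rarely sustains its peak rate, so these are lower bounds at best.

```python
# Peak rates quoted on the slide, in floating-point operations per second.
PEAK_FLOPS = {"Intel i7": 500e9, "Nvidia Titan X": 5e12}

# Hypothetical cost of one forward pass; the slide only says "billions".
FLOPS_PER_IMAGE = 10e9


def seconds_per_image(flops_per_image, peak_flops):
    """Lower bound on time per image, assuming the device runs at peak."""
    return flops_per_image / peak_flops


for device, peak in PEAK_FLOPS.items():
    t = seconds_per_image(FLOPS_PER_IMAGE, peak)
    print(f"{device}: at least {t * 1e3:.1f} ms per image")
```

Under these assumptions the GPU is ten times faster per image, which is simply the ratio of the two peak rates.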
Outline
1 Introduction
2 Distributed Deep Learning Using MXNet
3 Learning in Multiple Dimensions
4 Conclusion
3. MXNet
image credit - wikipedia
• Imperative and Declarative Programming
• Language Support
• Backend and Automatic Parallelization
Writing Parallel Programs is Painful
Each forward-backward-update step involves O(num_layer) tensor
computations and communications, where num_layer is often 100 to 1,000.
# Pseudocode: one data-parallel training step for a 2-layer network on 2 GPUs
data = next_batch()

# GPU 0: forward and backward pass on the first half of the batch
data[gpu0].copyfrom(data[0:50])
fc1[gpu0] = FullcForward(data[gpu0], fc1_weight[gpu0])
fc2[gpu0] = FullcForward(fc1[gpu0], fc2_weight[gpu0])
fc2_ograd[gpu0] = LossGrad(fc2[gpu0], label[0:50])
fc1_ograd[gpu0], fc2_wgrad[gpu0] = FullcBackward(fc2_ograd[gpu0], fc2_weight[gpu0])
_, fc1_wgrad[gpu0] = FullcBackward(fc1_ograd[gpu0], fc1_weight[gpu0])

# GPU 1: forward and backward pass on the second half of the batch
data[gpu1].copyfrom(data[50:100])
fc1[gpu1] = FullcForward(data[gpu1], fc1_weight[gpu1])
fc2[gpu1] = FullcForward(fc1[gpu1], fc2_weight[gpu1])
fc2_ograd[gpu1] = LossGrad(fc2[gpu1], label[50:100])
fc1_ograd[gpu1], fc2_wgrad[gpu1] = FullcBackward(fc2_ograd[gpu1], fc2_weight[gpu1])
_, fc1_wgrad[gpu1] = FullcBackward(fc1_ograd[gpu1], fc1_weight[gpu1])

# CPU: sum the gradients, update the weights, and copy them back to both GPUs
fc2_wgrad[cpu] = fc2_wgrad[gpu0] + fc2_wgrad[gpu1]
fc2_weight[cpu] -= lr * fc2_wgrad[cpu]
fc2_weight[cpu].copyto(fc2_weight[gpu0], fc2_weight[gpu1])
fc1_wgrad[cpu] = fc1_wgrad[gpu0] + fc1_wgrad[gpu1]
fc1_weight[cpu] -= lr * fc1_wgrad[cpu]
fc1_weight[cpu].copyto(fc1_weight[gpu0], fc1_weight[gpu1])
Dependency graph for 2-layer neural
networks with 2 GPUs
Auto Parallelization
Write serial programs, run in parallel
>>> import mxnet as mx
>>> A = mx.nd.ones((2,2)) *2
>>> C = A + 2
>>> B = A + 1
>>> D = B * C
>>> D.wait_to_read()
[Dependency graph: A = 2 feeds both C = A + 2 and B = A + 1; D = B ⨉ C depends on both]
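The idea behind the graph can be sketched in a few lines of plain Python. This is not MXNet's engine, only a toy illustration: the `deps` table and the `parallel_levels` helper are made up for this example. It groups the four operations of the snippet into waves, where every operation in a wave can run concurrently.

```python
# Each operation lists the operations whose results it reads.
deps = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}


def parallel_levels(deps):
    """Group operations into waves; ops in one wave do not depend on each other."""
    done, levels = set(), []
    remaining = dict(deps)
    while remaining:
        # An op is ready once everything it reads has been computed.
        ready = sorted(op for op, needs in remaining.items()
                       if all(n in done for n in needs))
        if not ready:
            raise ValueError("dependency cycle")
        levels.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return levels


levels = parallel_levels(deps)
print(levels)  # [['A'], ['B', 'C'], ['D']]
```

B and C land in the same wave, which is why a scheduler that tracks dependencies can run them in parallel even though the program was written serially.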
Data Parallelism
[Figure: a key-value store and partitions of training examples]
1. Read a data partition
2. Pull the parameters
3. Compute the gradient
4. Push the gradient
5. Update the parameters
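The five steps above can be sketched with a toy, in-process key-value store. Everything here is hypothetical and for illustration only: the `KVStore` class is not MXNet's kvstore API, the model is a one-parameter linear fit, and the two "workers" run one after the other rather than on separate devices. Updates are synchronized, meaning the store averages all pushed gradients before changing the parameter.

```python
class KVStore:
    """Toy in-process key-value store holding the shared parameters."""

    def __init__(self, params, lr):
        self.params = dict(params)
        self.lr = lr
        self.pending = {key: [] for key in params}

    def pull(self, key):
        return self.params[key]

    def push(self, key, grad):
        self.pending[key].append(grad)

    def update(self):
        # Synchronized update: average the gradients pushed by all workers.
        for key, grads in self.pending.items():
            if grads:
                self.params[key] -= self.lr * sum(grads) / len(grads)
                grads.clear()


def gradient(w, partition):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in partition) / len(partition)


data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
partitions = [data[:2], data[2:]]      # 1. each worker reads a data partition
store = KVStore({"w": 0.0}, lr=0.02)

for step in range(200):
    for part in partitions:            # the workers, run serially here
        w = store.pull("w")            # 2. pull the parameters
        g = gradient(w, part)          # 3. compute the gradient
        store.push("w", g)             # 4. push the gradient
    store.update()                     # 5. update the parameters

print(round(store.pull("w"), 3))
```

The data were generated with a slope of 3, so the pulled parameter should approach 3 as training proceeds.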
Scale to Multiple GPU Machines
[Figure: GPUs connect through a PCIe switch to the CPU, which connects to a network switch]
Link bandwidths:
• 4 PCIe 3.0 16x: 63 GB/s
• PCIe 3.0 16x: 15.75 GB/s
• 10 Gbit Ethernet: 1.25 GB/s
Hierarchical parameter server
[Figure labels: workers, level-1 servers, level-2 servers, GPUs, CPUs]
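A short calculation shows why the network link is the bottleneck in this hierarchy. The three bandwidths are the ones quoted on this slide; the 0.24 GB of gradient data per step is a hypothetical size chosen only to make the comparison concrete.

```python
# Bandwidths from the slide, in GB/s.
links = {
    "4 x PCIe 3.0 16x": 63.0,
    "PCIe 3.0 16x": 15.75,
    "10 Gbit Ethernet": 1.25,
}

# Sanity checks on the quoted numbers.
four_pcie_links = 4 * 15.75          # four 16x links side by side
ethernet_gb_per_s = 10 / 8           # 10 Gbit/s expressed in GB/s

# Hypothetical amount of gradient data exchanged per step, in GB.
gradient_gb = 0.24

transfer_seconds = {name: gradient_gb / bw for name, bw in links.items()}
for name, seconds in transfer_seconds.items():
    print(f"{name}: {seconds * 1e3:.1f} ms to move {gradient_gb} GB")
```

With these numbers, Ethernet is roughly 50 times slower than the aggregate PCIe bandwidth inside one machine, which motivates aggregating gradients locally before sending them over the network.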
Experiment Setup
✧ Dataset
✓ 1.2 million images with 1000 classes
✧ ResNet 152-layer model
✧ EC2 P2.16xlarge
[Figure: GPUs 0-15 connected through PCIe switches to the CPU]
✧ Minibatch SGD
✧ Synchronized updating
Scalability over Multiple Machines
[Plot: time (sec) per batch, 0 to 1, versus number of GPUs, 0 to 128; series: communication cost and batch size per GPU = 2, 4, 8, 16; annotation: 115x]
[Timeline from before 2012 to 2017; labels: imperative, symbolic, mxnet, gluon]
Back-end System
✧ Optimization
✓ Memory optimization
✓ Operator fusion
✧ Scheduling
✓ Auto-parallelization
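[Figure: two computation graphs: one combines a, b, and the constant 1 with ⨉ and + to produce c; the other applies fullc (with weight and bias) and then softmax]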
Back-end
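# Imperative: NDArray operations run as they are issued
import mxnet as mx
a = mx.nd.zeros((100, 50))
b = mx.nd.ones((100, 50))
c = a * b
c += 1

# Symbolic: declare the network first, then execute it
import mxnet as mx
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, num_hidden=128)
net = mx.symbol.SoftmaxOutput(data=net)
texec = mx.module.Module(net)
# Slide shorthand: a real Module must be bound and initialized first,
# and forward() takes a data batch rather than a bare array.
texec.forward(data=c)
texec.backward()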
Front-end
In summary
✦ Symbolic
❖ efficient & portable
❖ but hard to use
✦ Imperative
❖ flexible
❖ may be slow
✦ Gluon
❖ imperative for developing
❖ symbolic for deploying
Outline
1 Introduction
2 Distributed Deep Learning Using MXNet
3 Learning in Multiple Dimensions
4 Conclusion
Tensors: Beyond 2D world
Modern data is inherently multi-dimensional
[Figure: network with Input, Hidden 1, Hidden 2, and Output layers]
Tensor Contraction
Extends the notion of matrix product
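Matrix product
$Mv = \sum_j v_j M_j$
Tensor Contraction
$T(u, v, \cdot) = \sum_{i,j} u_i v_j T_{i,j,:}$
[Figures: each product drawn as a weighted sum of columns (matrix case) or of fibers (tensor case)]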
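Both formulas are easy to check on small examples. The sketch below uses plain nested lists and hypothetical helper names (`matvec`, `contract`); it is meant to mirror the two sums term by term, not to be fast.

```python
def matvec(M, v):
    """Mv as a weighted sum of the columns of M: sum_j v[j] * M[:, j]."""
    return [sum(v[j] * M[i][j] for j in range(len(v))) for i in range(len(M))]


def contract(T, u, v):
    """T(u, v, .) = sum_{i,j} u[i] * v[j] * T[i][j][:]."""
    depth = len(T[0][0])
    return [
        sum(u[i] * v[j] * T[i][j][k]
            for i in range(len(u)) for j in range(len(v)))
        for k in range(depth)
    ]


M = [[1, 2],
     [3, 4]]
print(matvec(M, [1, 1]))            # [3, 7]

# T[i][j][k] with shape 2 x 2 x 2
T = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]
print(contract(T, [1, 0], [0, 1]))  # [3, 4], the fiber T[0][1][:]
print(contract(T, [1, 1], [1, 1]))  # [16, 20], the sum of all four fibers
```

Contracting with unit vectors picks out a single fiber, just as multiplying a matrix by a unit vector picks out a single column.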
Employing Tensor Contractions in AlexNet
Replace the fully connected layer with a tensor contraction layer
Enabling the Tensor Contraction Layer in MXNet
Performance of the TCL
• Trained end-to-end
• On ImageNet with VGG:
  • 65.9% space savings
  • Performance drop of only 0.6%
• On ImageNet with AlexNet:
  • 56.6% space savings
  • Performance improvement of 0.5%
Low-rank tensor regression
Tensor Regression Networks, J. Kossaifi, Z.C. Lipton, A. Khanna, T. Furlanello and A. Anandkumar, arXiv preprint
Performance and rank
Speeding up Tensor Contractions
1 Tensor contractions are a core primitive of multilinear algebra.
2 BLAS 3: Unbounded compute intensity (no. of ops per I/O)
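Consider single-index contractions: $C_{\mathcal{C}} = A_{\mathcal{A}} B_{\mathcal{B}}$
e.g. $C_{mnp} = A_{mnk} B_{kp}$
[Figure: $A_{422}$, drawn as slices A(:,1,:) and A(:,2,:), contracted with $B_{21}$ to give $C_{421}$]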
Speeding up Tensor Contraction
Explicit permutation dominates,
especially for small tensors.
Consider Cmnp = Akm Bpkn.
1 Akm → Amk
2 Bpkn → Bkpn
3 Cmnp → Cmpn
4 Cm(pn) = Amk Bk(pn)
5 Cmpn → Cmnp
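The five steps can be followed on a small example in plain Python. This is only a sketch with nested lists and made-up helper names (`direct`, `matmul`, `via_gemm`); a real implementation would call an optimized GEMM. Because the output here starts empty, step 3 reduces to choosing the permuted layout for C, but every other step involves an explicit copy, which is the overhead the slide is pointing at.

```python
def direct(A, B, M, N, P, K):
    """C[m][n][p] = sum_k A[k][m] * B[p][k][n], straight from the definition."""
    return [[[sum(A[k][m] * B[p][k][n] for k in range(K))
              for p in range(P)] for n in range(N)] for m in range(M)]


def matmul(X, Y):
    """Plain matrix product, standing in for a GEMM call."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def via_gemm(A, B, M, N, P, K):
    # 1. A_km -> A_mk
    A_mk = [[A[k][m] for k in range(K)] for m in range(M)]
    # 2. B_pkn -> B_kpn, flattened into a K x (P*N) matrix B_k(pn)
    B_k_pn = [[B[p][k][n] for p in range(P) for n in range(N)]
              for k in range(K)]
    # 3-4. a single GEMM produces C in the permuted layout C_m(pn)
    C_m_pn = matmul(A_mk, B_k_pn)
    # 5. C_mpn -> C_mnp
    return [[[C_m_pn[m][p * N + n] for p in range(P)]
             for n in range(N)] for m in range(M)]


M, N, P, K = 2, 3, 4, 5
A = [[k - 2 * m for m in range(M)] for k in range(K)]               # A[k][m]
B = [[[(7 * p + 3 * k + n) % 5 - 2 for n in range(N)]
      for k in range(K)] for p in range(P)]                          # B[p][k][n]

assert via_gemm(A, B, M, N, P, K) == direct(A, B, M, N, P, K)
```

Only one of the steps does arithmetic; the rest just move data, so for small tensors the copies can easily cost more than the multiplication itself.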
[Figure: fraction of time (0 to 1) spent in copies/transpositions versus n (100 to 500). (Top) CPU. (Bottom) GPU. Lines are shown with 1, 2, 3, and 6 transpositions.]
Existing Primitives
GEMM
Suboptimal for many small matrices.
Pointer-to-Pointer BatchedGEMM
Available in MKL 11.3β and cuBLAS 4.1
C[p] = α op(A[p]) op(B[p]) + β C[p]
cublas<T>gemmBatched(cublasHandle_t handle,
cublasOperation_t transA, cublasOperation_t transB,
int M, int N, int K,
const T* alpha,
const T** A, int ldA,
const T** B, int ldB,
const T* beta,
T** C, int ldC,
int batchCount)
Tensor Contraction with Extended BLAS Primitives
Cmn[p] = AmkBkn[p]
cublasDgemmStridedBatched(handle,
CUBLAS_OP_N, CUBLAS_OP_N,
M, N, K,
&alpha,
A, ldA1, 0,
B, ldB1, ldB2,
&beta,
C, ldC1, ldC2,
P)
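To make the strides in this call concrete, here is a pure-Python reference of what a strided batched GEMM computes on flat, column-major buffers. It is a sketch of the semantics only, not cuBLAS: the function name and argument order are chosen to mirror the call above, and transposes are left out. A stride of 0 for A means every batch reuses the same matrix, so no copy or permutation of A or B is needed.

```python
def gemm_strided_batched(M, N, K, alpha, A, ldA, strideA,
                         B, ldB, strideB, beta, C, ldC, strideC, batch):
    """For each p: C[p] = alpha * A[p] @ B[p] + beta * C[p].

    Matrices are column-major; matrix p starts at offset p * stride.
    """
    for p in range(batch):
        for n in range(N):
            for m in range(M):
                acc = sum(A[m + k * ldA + p * strideA] *
                          B[k + n * ldB + p * strideB] for k in range(K))
                idx = m + n * ldC + p * strideC
                C[idx] = alpha * acc + beta * C[idx]


M, N, K, P = 2, 3, 4, 5
A = [i % 7 - 3 for i in range(M * K)]        # A_mk stored as A[m + k*M]
B = [i % 5 - 2 for i in range(K * N * P)]    # B_knp stored as B[k + n*K + p*K*N]
C = [0] * (M * N * P)                        # C_mnp stored as C[m + n*M + p*M*N]

# C_mn[p] = A_mk B_kn[p]: ldA1 = M, stride 0 for A; ldB1 = K, ldB2 = K*N;
# ldC1 = M, ldC2 = M*N; P batches.
gemm_strided_batched(M, N, K, 1, A, M, 0, B, K, K * N, 0, C, M, M * N, P)

# Check against the contraction written out index by index.
expected = [0] * (M * N * P)
for p in range(P):
    for n in range(N):
        for m in range(M):
            expected[m + n * M + p * M * N] = sum(
                A[m + k * M] * B[k + n * K + p * K * N] for k in range(K))
assert C == expected
```

The whole contraction is expressed through leading dimensions and strides into the original buffers, which is what lets the batched kernel avoid the explicit transpositions.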
Tensor Contraction with Extended BLAS Primitives
Cmnp = A∗∗ × B∗∗∗
Cmnp ≡ C[m + n · ldC1 + p · ldC2]
Case | Contraction | Kernel1 | Kernel2
1.1 | AmkBknp | Cm(np) = AmkBk(np) | Cmn[p] = AmkBkn[p]
1.2 | AmkBkpn | Cmn[p] = AmkBk[p]n | Cm[n]p = AmkBkp[n]
1.3 | AmkBnkp | Cmn[p] = AmkBnk[p] |
1.4 | AmkBpkn | Cm[n]p = AmkBpk[n] |
1.5 | AmkBnpk | Cm(np) = AmkB(np)k | Cmn[p] = AmkBn[p]k
1.6 | AmkBpnk | Cm[n]p = AmkBp[n]k |
2.1 | AkmBknp | Cm(np) = AkmBk(np) | Cmn[p] = AkmBkn[p]
2.2 | AkmBkpn | Cmn[p] = AkmBk[p]n | Cm[n]p = AkmBkp[n]
2.3 | AkmBnkp | Cmn[p] = AkmBnk[p] |
2.4 | AkmBpkn | Cm[n]p = AkmBpk[n] |
2.5 | AkmBnpk | Cm(np) = AkmB(np)k | Cmn[p] = AkmBn[p]k
2.6 | AkmBpnk | Cm[n]p = AkmBp[n]k |
3.1 | AnkBkmp | Cmn[p] = Bkm[p]Ank |
3.2 | AnkBkpm | Cmn[p] = Bk[p]mAnk |
3.3 | AnkBmkp | Cmn[p] = Bmk[p]Ank |
3.4 | AnkBpkm | |
3.5 | AnkBmpk | Cmn[p] = Bm[p]kAnk |
3.6 | AnkBpmk | |
4.1 | AknBkmp | Cmn[p] = Bkm[p]Akn |
4.2 | AknBkpm | Cmn[p] = Bk[p]mAkn |
4.3 | AknBmkp | Cmn[p] = Bmk[p]Akn |
4.4 | AknBpkm | |
4.5 | AknBmpk | Cmn[p] = Bm[p]kAkn |
4.6 | AknBpmk | |
5.1 | ApkBkmn | C(mn)p = Bk(mn)Apk | Cm[n]p = Bkm[n]Apk
5.2 | ApkBknm | Cm[n]p = Bk[n]mApk |
5.3 | ApkBmkn | Cm[n]p = Bmk[n]Apk |
5.4 | ApkBnkm | |
5.5 | ApkBmnk | C(mn)p = B(mn)kApk | Cm[n]p = Bm[n]kApk
5.6 | ApkBnmk | |
6.1 | AkpBkmn | C(mn)p = Bk(mn)Akp | Cm[n]p = Bkm[n]Akp
6.2 | AkpBknm | Cm[n]p = Bk[n]mAkp |
6.3 | AkpBmkn | Cm[n]p = Bmk[n]Akp |
6.4 | AkpBnkm | |
6.5 | AkpBmnk | C(mn)p = B(mn)kAkp | Cm[n]p = Bm[n]kAkp
6.6 | AkpBnmk | |
A new primitive: StridedBatchedGEMM
Performance on par with pure GEMM (P100 and beyond).
Applications: Tucker Decomposition
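$T_{mnp} = G_{ijk} A_{mi} B_{nj} C_{pk}$ (summation over repeated indices)
[Figure: tensor T (modes m, n, p) written as a core tensor G (modes i, j, k) multiplied by factor matrices A (mi), B (nj), and C (pk)]
Main steps in the algorithm
$Y_{mjk} = T_{mnp} B^{t}_{nj} C^{t}_{pk}$
$Y_{ink} = T_{mnp} A^{t+1}_{mi} C^{t}_{pk}$
$Y_{ijp} = T_{mnp} B^{t+1}_{nj} A^{t+1}_{mi}$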
Performance on Tucker decomposition:
[Plot: time (sec), log scale from 10^-2 to 10^6, versus n (20 to 120), comparing TensorToolbox, BTAS, Cyclops, CPU Batched, and GPU Batched]
Tensor Sketches
Randomized dimensionality reduction
through sketching.
◮ Complexity independent of tensor order:
exponential gain!
[Figure: entries of tensor T are combined with random signs (+1, +1, -1) into the sketch s]
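The slide does not spell out the construction, so the following is a sketch of the standard count-sketch idea under that assumption: each entry is hashed to a bucket and added with a random sign. The useful property for tensors is that the sketch of an outer product equals the circular convolution of the sketches of its factors, so the full tensor never has to be formed. All names here are made up for the example, and the convolution is written as a direct double loop; in practice it would be computed with FFTs.

```python
import random


def count_sketch(x, h, s, b):
    """Hash each entry x[i] into bucket h[i] with sign s[i]."""
    out = [0] * b
    for i, xi in enumerate(x):
        out[h[i]] += s[i] * xi
    return out


def circular_convolve(a, c):
    """Circular convolution of two length-b vectors (direct O(b^2) version)."""
    b = len(a)
    out = [0] * b
    for i in range(b):
        for j in range(b):
            out[(i + j) % b] += a[i] * c[j]
    return out


def sketch_outer_product(x, y, h1, s1, h2, s2, b):
    """Sketch of the full outer product, built entry by entry."""
    out = [0] * b
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[(h1[i] + h2[j]) % b] += s1[i] * s2[j] * xi * yj
    return out


rng = random.Random(0)
n, b = 6, 4                                   # input length, sketch length
x = [rng.randint(-3, 3) for _ in range(n)]
y = [rng.randint(-3, 3) for _ in range(n)]
h1 = [rng.randrange(b) for _ in range(n)]
h2 = [rng.randrange(b) for _ in range(n)]
s1 = [rng.choice((-1, 1)) for _ in range(n)]
s2 = [rng.choice((-1, 1)) for _ in range(n)]

slow = sketch_outer_product(x, y, h1, s1, h2, s2, b)   # touches all n*n entries
fast = circular_convolve(count_sketch(x, h1, s1, b),
                         count_sketch(y, h2, s2, b))   # never forms x (outer) y
assert slow == fast
```

The slow path grows with the number of tensor entries, while the fast path works only on the factors and the sketch length, which is where the gain for higher-order tensors comes from.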
Applications
Tensor Decomposition via Sketching
Visual Question and Answering
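[Figure: VQA pipeline. A CNN encodes the image (C × W × H) and an RNN encodes the question "What is the mustache made of?" (length L); MCT combines the two, followed by average pooling, FC, ReLU, BatchNorm, FC, and softmax, which outputs "Banana"]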
MCT in Visual Question & Answering
[Figure: the VQA pipeline with MCT combining the CNN and RNN features, followed by pooling, FC, ReLU, BatchNorm, FC, and softmax]
Multimodal Tensor Pooling
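[Figure: the image feature (C, W, H) goes through a spatial sketch (d1, d2, d3) and a 3D FFT; the text feature (L) goes through a count sketch and a 1D FFT; an optional 3D IFFT gives the pooled output (d1, d2, d3); d4 is also labeled]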
Tensor Decompositions
Extracting Topics from Documents
[Figure: documents with their topic proportions over the topics crime, Sports, and Education; example words: police, witness, campus]
A., D. P. Foster, D. Hsu, S.M. Kakade, Y.K. Liu.“Two SVDs Suffice: Spectral decompositions
for probabilistic topic modeling and latent Dirichlet allocation,” NIPS 2012.
Tensor Methods for Topic Modeling
[Figure: topic-word matrix over the words campus, police, witness]
Topic-word matrix P[word = i | topic = j]
Linearly independent columns
Moment Tensor: Co-occurrence of Word Triplets
[Figure: the moment tensor written as a sum of three terms, one per topic (crime, Sports, Education), over the words campus, police, witness]
Tensors vs. Variational Inference
Criterion: Perplexity = exp[−likelihood].
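As a concrete reading of this criterion, the sketch below assumes the common convention that "likelihood" means the average log-likelihood per held-out word; the slide's formula is shorthand and does not state the normalization. The `perplexity` helper is hypothetical.

```python
import math


def perplexity(word_probs):
    """exp of the negative average log-likelihood per word."""
    avg_log_likelihood = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_likelihood)


# A model that gives every held-out word probability 1/4 is as uncertain
# as a uniform choice among 4 words, so its perplexity is about 4.
print(perplexity([0.25] * 8))
```

Lower perplexity means the model assigns higher probability to the held-out words.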
Learning Topics from PubMed on Spark, 8 million articles
[Plots: running time (×10^4, 0 to 10) and perplexity (10^3 to 10^5, log scale) for Tensor versus Variational]
Learning network communities from social network data
Facebook n ∼ 20k, Yelp n ∼ 40k, DBLP-sub n ∼ 1e5, DBLP n ∼ 1e6.
[Plots: running time (10^2 to 10^6, log scale) and error (10^-2 to 10^1, log scale) on FB, YP, DBLPsub, and DBLP]
F. Huang, U.N. Niranjan, M. Hakeem, A, “Online tensor methods for training latent variable models,” JMLR 2014.
Tensors vs. Variational Inference
Orders of Magnitude Faster 
More Accurate
Outline
1 Introduction
2 Distributed Deep Learning Using MXNet
3 Learning in Multiple Dimensions
4 Conclusion
Conclusion
Distributed Deep Learning at Scale
MXNet has many attractive features
◮ Flexible programming
◮ Portable
◮ Highly efficient
Easy to deploy large-scale DL on AWS cloud
◮ Deep Learning AMI
◮ CloudFormation templates
Tensors are the future of ML
Tensor contractions: space savings in deep architectures.
New primitives speed up tensor contractions: extended BLAS
[Figures: the tensor contraction T(u, v, ·) and a tensor written as a sum of rank-one terms]

Slides: Anima Anandkumar, Principal Scientist, Amazon Web Services, Endowed Professor, Caltech, at MLconf SF 2017

Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
Anima Anandkumar, Principal Scientist, Amazon Web Services, Endowed Professor, Caltech at MLconf SF 2017

  • 1. Learning at Scale: Deep, Distributed and Multi-dimensional. Anima Anandkumar, Amazon AI & Caltech
  • 2. Deep Learning. The "deep learning" trend of the past 10 years has significantly improved many applications across multiple domains: image understanding, speech recognition, natural language processing, autonomy, …
  • 3. Image Classification. Multilevel feature extraction (Layer 1, Layer 2, Output) from raw pixels to semantic meaning. Convolution layers exploit spatial information.
  • 4. Image Classification. Hard to define the network: the definition of the Inception network has >1k lines of code in Caffe. A single image requires billions of floating-point operations (Intel i7: ~500 GFLOPS; Nvidia Titan X: ~5 TFLOPS). Memory consumption is linear in the number of layers. State-of-the-art networks have tens to hundreds of layers.
  • 5. Outline. 1 Introduction; 2 Distributed Deep Learning Using MXNet; 3 Learning in Multiple Dimensions; 4 Conclusion
  • 6. 3. MXNet (image credit: Wikipedia). Imperative and declarative programming; language support; back-end and automatic parallelization.
  • 7. Writing Parallel Programs is Painful. Each forward-backward-update involves O(num_layer) tensor computations and communications, where num_layer is often 100 to 1,000. Dependency graph for a 2-layer neural network with 2 GPUs, listed here in execution order:
        data = next_batch()
        # GPU 0: first half of the batch
        data[gpu0].copyfrom(data[0:50])
        fc1[gpu0] = FullcForward(data[gpu0], fc1_weight[gpu0])
        fc2[gpu0] = FullcForward(fc1[gpu0], fc2_weight[gpu0])
        fc2_ograd[gpu0] = LossGrad(fc2[gpu0], label[0:50])
        fc1_ograd[gpu0], fc2_wgrad[gpu0] = FullcBackward(fc2_ograd[gpu0], fc2_weight[gpu0])
        _, fc1_wgrad[gpu0] = FullcBackward(fc1_ograd[gpu0], fc1_weight[gpu0])
        # GPU 1: second half of the batch
        data[gpu1].copyfrom(data[50:100])
        fc1[gpu1] = FullcForward(data[gpu1], fc1_weight[gpu1])
        fc2[gpu1] = FullcForward(fc1[gpu1], fc2_weight[gpu1])
        fc2_ograd[gpu1] = LossGrad(fc2[gpu1], label[50:100])
        fc1_ograd[gpu1], fc2_wgrad[gpu1] = FullcBackward(fc2_ograd[gpu1], fc2_weight[gpu1])
        _, fc1_wgrad[gpu1] = FullcBackward(fc1_ograd[gpu1], fc1_weight[gpu1])
        # CPU: sum the gradients, update the weights, broadcast them back
        fc2_wgrad[cpu] = fc2_wgrad[gpu0] + fc2_wgrad[gpu1]
        fc2_weight[cpu] -= lr * fc2_wgrad[cpu]
        fc2_weight[cpu].copyto(fc2_weight[gpu0], fc2_weight[gpu1])
        fc1_wgrad[cpu] = fc1_wgrad[gpu0] + fc1_wgrad[gpu1]
        fc1_weight[cpu] -= lr * fc1_wgrad[cpu]
        fc1_weight[cpu].copyto(fc1_weight[gpu0], fc1_weight[gpu1])
  • 8. Auto Parallelization. Write serial programs, run in parallel.
        >>> import mxnet as mx
        >>> A = mx.nd.ones((2,2)) * 2
        >>> C = A + 2
        >>> B = A + 1
        >>> D = B * C
        >>> D.wait_to_read()
      Dependency graph: A = 2; C = A + 2 and B = A + 1 depend only on A, so they can run in parallel; D = B ⨉ C waits for both.
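The idea on this slide can be illustrated without MXNet. The sketch below is not MXNet's engine; it is a minimal, hypothetical scheduler (the names `GRAPH` and `run` are invented for illustration) that takes the same four operations, finds which ones have all their inputs ready, and submits those together to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# name -> (function, names of the inputs it depends on).
# Mirrors the slide: A = 2 (as a 2x2 array), B = A + 1, C = A + 2, D = B * C.
GRAPH = {
    "A": (lambda: [[2.0, 2.0], [2.0, 2.0]], ()),
    "B": (lambda a: [[x + 1 for x in row] for row in a], ("A",)),
    "C": (lambda a: [[x + 2 for x in row] for row in a], ("A",)),
    "D": (lambda b, c: [[x * y for x, y in zip(rb, rc)]
                        for rb, rc in zip(b, c)], ("B", "C")),
}


def run(graph):
    """Run the graph in waves; ops in the same wave are submitted concurrently."""
    done = {}
    pending = dict(graph)
    waves = []
    with ThreadPoolExecutor() as pool:
        while pending:
            # An op is ready once every input it reads has been computed.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in done for d in deps)]
            if not ready:
                raise ValueError("cycle in dependency graph")
            futures = {name: pool.submit(pending[name][0],
                                         *[done[d] for d in pending[name][1]])
                       for name in ready}
            for name, future in futures.items():
                done[name] = future.result()
                del pending[name]
            waves.append(sorted(ready))
    return done, waves


values, waves = run(GRAPH)
```

Here `waves` records which operations were dispatched together: B and C land in the same wave because neither reads the other's output, even though the program lists them serially.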
  • 9. Data Parallelism with a key-value store. For its share of the examples, each worker: 1. reads a data partition; 2. pulls the parameters; 3. computes the gradient; 4. pushes the gradient. Then: 5. the parameters are updated.
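The five steps can be sketched in plain Python. This is a toy, single-process simulation under stated assumptions: the `KVStore` class, the linear model, and the learning rate are all hypothetical and are not the MXNet KVStore API. Workers run one after another here, and the update averages the pushed gradients once every worker has pushed (a synchronized update).

```python
class KVStore:
    """Toy in-process key-value store (hypothetical, for illustration only)."""

    def __init__(self, lr):
        self.lr = lr
        self.params = {}
        self.grads = {}

    def init(self, key, value):
        self.params[key] = list(value)

    def pull(self, key):
        return list(self.params[key])

    def push(self, key, grad):
        self.grads.setdefault(key, []).append(grad)

    def update(self, key):
        # Step 5: average the pushed gradients and take one SGD step.
        pushed = self.grads.pop(key)
        for i in range(len(self.params[key])):
            avg = sum(g[i] for g in pushed) / len(pushed)
            self.params[key][i] -= self.lr * avg


def gradient(w, batch):
    """Gradient of the mean squared error for the model y = w[0] * x + w[1]."""
    gw = gb = 0.0
    for x, y in batch:
        err = w[0] * x + w[1] - y
        gw += 2 * err * x
        gb += 2 * err
    return [gw / len(batch), gb / len(batch)]


def train(data, num_workers=2, lr=0.1, steps=1000):
    store = KVStore(lr)
    store.init("w", [0.0, 0.0])
    # Step 1: each worker reads its own partition of the data.
    partitions = [data[i::num_workers] for i in range(num_workers)]
    for _ in range(steps):
        for part in partitions:           # workers, simulated sequentially
            w = store.pull("w")           # step 2: pull the parameters
            g = gradient(w, part)         # step 3: compute the gradient
            store.push("w", g)            # step 4: push the gradient
        store.update("w")                 # step 5: update the parameters
    return store.pull("w")


# Synthetic data drawn from y = 3x + 1.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(20)]
w = train(data)
```

With equally sized partitions, averaging the per-worker gradients gives the same step as computing the gradient over the whole batch on one device, which is why `w` approaches `[3.0, 1.0]`.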
  • 10. Scale to Multiple GPU Machines. Hierarchical parameter server with workers, level-1 servers, and level-2 servers running on GPUs and CPUs. Link bandwidths: GPUs to PCIe switch, 4 × PCIe 3.0 16x, 63 GB/s; PCIe switch to CPU, PCIe 3.0 16x, 15.75 GB/s; CPU to network switch, 10 Gbit Ethernet, 1.25 GB/s.
  • 11. Experiment Setup. 1.2 million images with 1,000 classes; ResNet 152-layer model; EC2 P2.16xlarge (GPUs 0-15, PCIe switches, CPU); minibatch SGD; synchronized updating.
  • 12. Scalability over Multiple Machines. [Plot: time (sec) per batch, 0 to 1, versus number of GPUs, 0 to 128; curves for communication cost and for batch size/GPU = 2, 4, 8, 16; annotated speedup: 115x.]
  • 13. [Timeline figure, before 2012 through 2017: MXNet front-ends imperative, symbolic, Gluon.]
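To make the bandwidth gap concrete, the following back-of-the-envelope sketch estimates how long one full copy of the gradients would take on each link. The bandwidths are the ones quoted on the slide. The model size (60 million float32 parameters) is an assumption chosen only for illustration, and latency and protocol overhead are ignored.

```python
# Bandwidths quoted on the slide, in GB/s.
LINKS_GB_PER_S = {
    "4 x PCIe 3.0 16x": 63.0,
    "PCIe 3.0 16x": 15.75,
    "10 Gbit Ethernet": 1.25,
}


def transfer_seconds(num_params, bytes_per_param=4):
    """Ideal time to move one copy of the parameters over each link."""
    size_gb = num_params * bytes_per_param / 1e9
    return {link: size_gb / bw for link, bw in LINKS_GB_PER_S.items()}


# Assumption: 60 million float32 parameters, i.e. 0.24 GB of gradients.
times = transfer_seconds(60_000_000)
for link, seconds in times.items():
    print(f"{link}: {seconds * 1000:.1f} ms")
```

Under these assumptions the Ethernet hop is about 50 times slower than the aggregate GPU-to-switch bandwidth, which suggests why gradients would be aggregated inside a machine before anything crosses the network.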
  • 14. Back-end System. Optimization: memory optimization, operator fusion. Scheduling: auto-parallelization.
      Imperative front-end (graph: c = a ⨉ b, then + 1):
        import mxnet as mx
        a = mx.nd.zeros((100, 50))
        b = mx.nd.ones((100, 50))
        c = a * b
        c += 1
      Symbolic front-end (graph: fullc with weight and bias, then softmax):
        import mxnet as mx
        net = mx.symbol.Variable('data')
        net = mx.symbol.FullyConnected(data=net, num_hidden=128)
        net = mx.symbol.SoftmaxOutput(data=net)
        texec = mx.module.Module(net)
        texec.forward(data=c)
        texec.backward()
  • 15. In summary. Symbolic: efficient and portable, but hard to use. Imperative: flexible, but may be slow. Gluon: imperative for developing, symbolic for deploying.
  • 16. Outline. 1 Introduction; 2 Distributed Deep Learning Using MXNet; 3 Learning in Multiple Dimensions; 4 Conclusion
  • 17. Tensors: Beyond the 2D World. Modern data is inherently multi-dimensional.
  • 18. Tensors: Beyond the 2D World. Modern data is inherently multi-dimensional. [Figure: network with Input, Hidden 1, Hidden 2, and Output layers.]
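  • 19. Tensor Contraction extends the notion of the matrix product. Matrix product: $Mv = \sum_j v_j M_j$, a weighted sum of the columns $M_j$. Tensor contraction: $T(u, v, \cdot) = \sum_{i,j} u_i v_j T_{i,j,:}$, a weighted sum of the fibers $T_{i,j,:}$.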
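Both formulas translate directly into loops. The sketch below uses nested Python lists so it has no dependencies; the function names `matvec` and `contract` and the sample values are chosen here for illustration.

```python
def matvec(M, v):
    """Mv = sum_j v[j] * M[:, j]: a weighted sum of the columns of M."""
    rows = len(M)
    out = [0.0] * rows
    for j, vj in enumerate(v):
        for r in range(rows):
            out[r] += vj * M[r][j]
    return out


def contract(T, u, v):
    """T(u, v, .) = sum_{i,j} u[i] * v[j] * T[i][j][:]: a weighted sum of fibers."""
    depth = len(T[0][0])
    out = [0.0] * depth
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            for k in range(depth):
                out[k] += ui * vj * T[i][j][k]
    return out


M = [[1.0, 2.0],
     [3.0, 4.0]]
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[2.0, 0.0], [0.0, 2.0]]]

mv = matvec(M, [1.0, 1.0])
tuv = contract(T, [1.0, 1.0], [1.0, 2.0])
```

The matrix case sums over one index and leaves a vector; the tensor case sums over two indices of a third-order tensor and also leaves a vector, which is the sense in which contraction generalizes the matrix product.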
  • 20. Employing Tensor Contractions in AlexNet. Replace the fully connected layer with a tensor contraction layer (TCL).
  • 21. Enabling the Tensor Contraction Layer in MXNet
  • 22. Performance of the TCL. Trained end-to-end. On ImageNet with VGG: 65.9% space savings with a performance drop of only 0.6%. On ImageNet with AlexNet: 56.6% space savings with a performance improvement of 0.5%.
  • 25. Speeding up Tensor Contractions. 1. Tensor contractions are a core primitive of multilinear algebra. 2. BLAS 3: unbounded compute intensity (number of operations per I/O). Consider single-index contractions, e.g. $C_{mnp} = A_{mnk} B_{kp}$. [Figure: C = A B computed slice by slice over A(:,1,:) and A(:,2,:), with A of size 4×2×2, B of size 2×1, and C of size 4×2×1.]
  • 26. Speeding up Tensor Contraction. Explicit permutation dominates, especially for small tensors. Consider $C_{mnp} = A_{km} B_{pkn}$, evaluated conventionally in five steps: (1) $A_{km} \to A_{mk}$; (2) $B_{pkn} \to B_{kpn}$; (3) $C_{mnp} \to C_{mpn}$; (4) $C_{m(pn)} = A_{mk} B_{k(pn)}$; (5) $C_{mpn} \to C_{mnp}$. [Figure: fraction of time spent in copies/transpositions, 0 to 1, versus n, 100 to 500; top panel CPU, bottom panel GPU; lines shown for 1, 2, 3, and 6 transpositions.]
  • 27. Existing Primitives. GEMM: suboptimal for many small matrices. Pointer-to-pointer BatchedGEMM: available in MKL 11.3β and cuBLAS 4.1; it computes $C[p] = \alpha\,\mathrm{op}(A[p])\,\mathrm{op}(B[p]) + \beta\,C[p]$ for each batch index p. Signature: cublas<T>gemmBatched(cublasHandle_t handle, cublasOperation_t transA, cublasOperation_t transB, int M, int N, int K, const T* alpha, const T** A, int ldA, const T** B, int ldB, const T* beta, T** C, int ldC, int batchCount)
  • 28. Tensor Contraction with Extended BLAS Primitives. The contraction $C_{mn[p]} = A_{mk} B_{kn[p]}$, with [p] as the batch index, becomes a single call: cublasDgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, N, K, &alpha, A, ldA1, 0, B, ldB1, ldB2, &beta, C, ldC1, ldC2, P)
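What the strided batched call computes can be spelled out as a reference loop. The sketch below is plain Python rather than cuBLAS, and it assumes column-major flat storage in which element (r, c) of batch p sits at offset `r + c * ld + p * stride`; the function `gemm_strided_batched` is a hypothetical stand-in that ignores transposes. Passing a stride of 0 for A reuses the same matrix in every batch, mirroring the `0` argument in the call above.

```python
def gemm_strided_batched(M, N, K, alpha, A, ldA, strideA,
                         B, ldB, strideB, beta, C, ldC, strideC, batch_count):
    """Reference semantics of a strided batched GEMM without transposes.

    Every matrix is column-major inside a flat list: element (r, c) of
    batch p lives at offset r + c * ld + p * stride.
    """
    for p in range(batch_count):
        for n in range(N):
            for m in range(M):
                acc = 0.0
                for k in range(K):
                    acc += (A[m + k * ldA + p * strideA]
                            * B[k + n * ldB + p * strideB])
                idx = m + n * ldC + p * strideC
                C[idx] = alpha * acc + beta * C[idx]
    return C


M, N, K, P = 2, 3, 2, 2
A = [float(i + 1) for i in range(M * K)]          # A_mk at m + k*M
B = [float(i % 5 - 2) for i in range(K * N * P)]  # B_knp at k + n*K + p*K*N
C = [0.0] * (M * N * P)                           # C_mnp at m + n*M + p*M*N

# C_mn[p] = A_mk B_kn[p]: A is shared across batches (stride 0).
gemm_strided_batched(M, N, K, 1.0, A, M, 0, B, K, K * N, 0.0, C, M, M * N, P)

# The same contraction written directly over the three output indices.
expected = [0.0] * (M * N * P)
for p in range(P):
    for n in range(N):
        for m in range(M):
            for k in range(K):
                expected[m + n * M + p * M * N] += (
                    A[m + k * M] * B[k + n * K + p * K * N])
```

No element of A, B, or C is copied or permuted here: the batch index is expressed purely through the stride arguments.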
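  • 29. Tensor Contraction with Extended BLAS Primitives. $C_{mnp} = A_{**} \times B_{***}$, stored as $C_{mnp} \equiv C[m + n \cdot ldC1 + p \cdot ldC2]$.
    Case | Contraction | Kernel1 | Kernel2
    -----|-------------|---------|--------
    1.1 | $A_{mk} B_{knp}$ | $C_{m(np)} = A_{mk} B_{k(np)}$ | $C_{mn[p]} = A_{mk} B_{kn[p]}$
    1.2 | $A_{mk} B_{kpn}$ | $C_{mn[p]} = A_{mk} B_{k[p]n}$ | $C_{m[n]p} = A_{mk} B_{kp[n]}$
    1.3 | $A_{mk} B_{nkp}$ | $C_{mn[p]} = A_{mk} B_{nk[p]}$ | -
    1.4 | $A_{mk} B_{pkn}$ | $C_{m[n]p} = A_{mk} B_{pk[n]}$ | -
    1.5 | $A_{mk} B_{npk}$ | $C_{m(np)} = A_{mk} B_{(np)k}$ | $C_{mn[p]} = A_{mk} B_{n[p]k}$
    1.6 | $A_{mk} B_{pnk}$ | $C_{m[n]p} = A_{mk} B_{p[n]k}$ | -
    2.1 | $A_{km} B_{knp}$ | $C_{m(np)} = A_{km} B_{k(np)}$ | $C_{mn[p]} = A_{km} B_{kn[p]}$
    2.2 | $A_{km} B_{kpn}$ | $C_{mn[p]} = A_{km} B_{k[p]n}$ | $C_{m[n]p} = A_{km} B_{kp[n]}$
    2.3 | $A_{km} B_{nkp}$ | $C_{mn[p]} = A_{km} B_{nk[p]}$ | -
    2.4 | $A_{km} B_{pkn}$ | $C_{m[n]p} = A_{km} B_{pk[n]}$ | -
    2.5 | $A_{km} B_{npk}$ | $C_{m(np)} = A_{km} B_{(np)k}$ | $C_{mn[p]} = A_{km} B_{n[p]k}$
    2.6 | $A_{km} B_{pnk}$ | $C_{m[n]p} = A_{km} B_{p[n]k}$ | -
    3.1 | $A_{nk} B_{kmp}$ | $C_{mn[p]} = B_{km[p]} A_{nk}$ | -
    3.2 | $A_{nk} B_{kpm}$ | $C_{mn[p]} = B_{k[p]m} A_{nk}$ | -
    3.3 | $A_{nk} B_{mkp}$ | $C_{mn[p]} = B_{mk[p]} A_{nk}$ | -
    3.4 | $A_{nk} B_{pkm}$ | - | -
    3.5 | $A_{nk} B_{mpk}$ | $C_{mn[p]} = B_{m[p]k} A_{nk}$ | -
    3.6 | $A_{nk} B_{pmk}$ | - | -
    4.1 | $A_{kn} B_{kmp}$ | $C_{mn[p]} = B_{km[p]} A_{kn}$ | -
    4.2 | $A_{kn} B_{kpm}$ | $C_{mn[p]} = B_{k[p]m} A_{kn}$ | -
    4.3 | $A_{kn} B_{mkp}$ | $C_{mn[p]} = B_{mk[p]} A_{kn}$ | -
    4.4 | $A_{kn} B_{pkm}$ | - | -
    4.5 | $A_{kn} B_{mpk}$ | $C_{mn[p]} = B_{m[p]k} A_{kn}$ | -
    4.6 | $A_{kn} B_{pmk}$ | - | -
    5.1 | $A_{pk} B_{kmn}$ | $C_{(mn)p} = B_{k(mn)} A_{pk}$ | $C_{m[n]p} = B_{km[n]} A_{pk}$
    5.2 | $A_{pk} B_{knm}$ | $C_{m[n]p} = B_{k[n]m} A_{pk}$ | -
    5.3 | $A_{pk} B_{mkn}$ | $C_{m[n]p} = B_{mk[n]} A_{pk}$ | -
    5.4 | $A_{pk} B_{nkm}$ | - | -
    5.5 | $A_{pk} B_{mnk}$ | $C_{(mn)p} = B_{(mn)k} A_{pk}$ | $C_{m[n]p} = B_{m[n]k} A_{pk}$
    5.6 | $A_{pk} B_{nmk}$ | - | -
    6.1 | $A_{kp} B_{kmn}$ | $C_{(mn)p} = B_{k(mn)} A_{kp}$ | $C_{m[n]p} = B_{km[n]} A_{kp}$
    6.2 | $A_{kp} B_{knm}$ | $C_{m[n]p} = B_{k[n]m} A_{kp}$ | -
    6.3 | $A_{kp} B_{mkn}$ | $C_{m[n]p} = B_{mk[n]} A_{kp}$ | -
    6.4 | $A_{kp} B_{nkm}$ | - | -
    6.5 | $A_{kp} B_{mnk}$ | $C_{(mn)p} = B_{(mn)k} A_{kp}$ | $C_{m[n]p} = B_{m[n]k} A_{kp}$
    6.6 | $A_{kp} B_{nmk}$ | - | -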
  • 30. A new primitive: StridedBatchedGEMM Performance on par with pure GEMM (P100 and beyond).
  • 31. Applications: Tucker Decomposition. $T_{mnp} = G_{ijk} A_{mi} B_{nj} C_{pk}$ (repeated indices are summed). Main steps in the algorithm: $Y_{mjk} = T_{mnp} B^{t}_{nj} C^{t}_{pk}$; $Y_{ink} = T_{mnp} A^{t+1}_{mi} C^{t}_{pk}$; $Y_{ijp} = T_{mnp} B^{t+1}_{nj} A^{t+1}_{mi}$. [Figure: performance on Tucker decomposition, time (sec) from $10^{-2}$ to $10^{6}$ versus n from 20 to 120, comparing TensorToolbox, BTAS, Cyclops, CPU Batched, and GPU Batched.]
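The Tucker formula and the first of the three contraction steps can be written out directly. This is a minimal sketch under two assumptions: tensors are nested lists, and the superscript t on the slide is read as an iteration counter, so the step uses whatever factor estimates B and C are current. The names `tucker_reconstruct` and `project_modes_2_and_3` are illustrative only, and the sketch does not cover how the factors are updated from Y.

```python
def tucker_reconstruct(G, A, B, C):
    """T[m][n][p] = sum_{i,j,k} G[i][j][k] * A[m][i] * B[n][j] * C[p][k]."""
    I, J, K = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[i][j][k] * A[m][i] * B[n][j] * C[p][k]
                  for i in range(I) for j in range(J) for k in range(K))
              for p in range(len(C))]
             for n in range(len(B))]
            for m in range(len(A))]


def project_modes_2_and_3(T, B, C):
    """Y[m][j][k] = sum_{n,p} T[m][n][p] * B[n][j] * C[p][k]."""
    J, K = len(B[0]), len(C[0])
    return [[[sum(T[m][n][p] * B[n][j] * C[p][k]
                  for n in range(len(B)) for p in range(len(C)))
              for k in range(K)]
             for j in range(J)]
            for m in range(len(T))]


# A rank-(1, 1, 1) example: a 1 x 1 x 1 core and one column per factor.
G = [[[2.0]]]
A = [[1.0], [2.0]]          # 2 x 1
B = [[1.0], [0.0], [-1.0]]  # 3 x 1
C = [[3.0], [1.0]]          # 2 x 1

T = tucker_reconstruct(G, A, B, C)   # 2 x 3 x 2 tensor
Y = project_modes_2_and_3(T, B, C)   # 2 x 1 x 1 tensor
```

Each of the three steps on the slide is a pair of single-index contractions of this kind, which is why the cost of the contraction primitive dominates the running time.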
  • 32. Tensor Sketches. Randomized dimensionality reduction through sketching. ◮ Complexity independent of tensor order: exponential gain! [Figure: tensor T reduced to sketch s using signs +1, +1, -1.] Applications: tensor decomposition via sketching; Visual Question and Answering. [Figure labels, in order: CNN; RNN with the question "What is the mustache made of?"; C, W, H; MCT; L; average pooling; FC; ReLU; BatchNorm; FC; softmax; answer "Banana".]
  • 33. MCT in Visual Question & Answering [Figure labels, in order: CNN; RNN with the question "What is the mustache made of?"; MCT; average pooling; FC; ReLU; BatchNorm; FC; softmax; answer "Banana".]
  • 34. Multimodal Tensor Pooling [Figure labels: image feature (C, W, H); text feature (L); spatial sketch; count sketch; 3D FFT; 1D FFT; 3D IFFT (optional); dimensions d1, d2, d3, d4.]
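The reason count sketches and FFTs appear together in this pipeline is a standard identity: the count sketch of an outer product equals the circular convolution of the count sketches of its factors, so the product never has to be formed. The sketch below illustrates that identity on two plain vectors, which is a simplification of the image and text features on the slide. It uses a direct O(b²) circular convolution in place of the FFT to stay within the standard library, and the helper names are hypothetical.

```python
import random


def make_hash(n, b, rng):
    """Draw a random bucket map h: [n] -> [b] and sign map s: [n] -> {+1, -1}."""
    h = [rng.randrange(b) for _ in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    return h, s


def count_sketch(x, h, s, b):
    """Count sketch: each coordinate adds its signed value to one bucket."""
    out = [0.0] * b
    for i, x_i in enumerate(x):
        out[h[i]] += s[i] * x_i
    return out


def circular_convolution(a, c):
    """Direct O(b^2) circular convolution; an FFT computes the same result faster."""
    size = len(a)
    out = [0.0] * size
    for i in range(size):
        for j in range(size):
            out[(i + j) % size] += a[i] * c[j]
    return out


rng = random.Random(0)
buckets = 8
u = [1.0, 2.0, -1.0, 0.5]
v = [3.0, -2.0, 4.0]
h1, s1 = make_hash(len(u), buckets, rng)
h2, s2 = make_hash(len(v), buckets, rng)

# Sketch the outer product u v^T directly, using the combined hash
# h(i, j) = (h1[i] + h2[j]) mod buckets and sign s(i, j) = s1[i] * s2[j].
direct = [0.0] * buckets
for i, u_i in enumerate(u):
    for j, v_j in enumerate(v):
        direct[(h1[i] + h2[j]) % buckets] += s1[i] * s2[j] * u_i * v_j

# The same sketch, computed without ever forming u v^T.
via_convolution = circular_convolution(count_sketch(u, h1, s1, buckets),
                                       count_sketch(v, h2, s2, buckets))
```

The sketch length stays fixed at `buckets` however many factors are multiplied together, which is the sense in which the cost does not grow with tensor order.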
  • 36. Extracting Topics from Documents [Figure: documents made of the words police, witness, and campus, shown with their topics (crime, Sports, Education) and topic proportions.] A., D. P. Foster, D. Hsu, S. M. Kakade, Y. K. Liu, “Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation,” NIPS 2012.
  • 37. Tensor Methods for Topic Modeling. Topic-word matrix: $P[\text{word} = i \mid \text{topic} = j]$, with linearly independent columns. Moment tensor: co-occurrence of word triplets. [Figure: the moment tensor, with axes labeled campus, police, witness, drawn as a sum of three terms labeled crime, Sports, Education.]
  • 39. Tensors vs. Variational Inference. Criterion: Perplexity = exp[−likelihood]. Learning topics from PubMed on Spark, 8 million articles. [Figure: perplexity, $10^{3}$ to $10^{5}$, versus running time, 0 to 10 × $10^{4}$, for Tensor and Variational.] Learning network communities from social network data: Facebook n ∼ 20k, Yelp n ∼ 40k, DBLP-sub n ∼ 1e5, DBLP n ∼ 1e6. [Figure: running time, $10^{2}$ to $10^{6}$, and error, $10^{-2}$ to $10^{1}$, on FB, YP, DBLPsub, and DBLP.] Annotation: orders of magnitude faster, more accurate. F. Huang, U. N. Niranjan, M. Hakeem, A, “Online tensor methods for training latent variable models,” JMLR 2014.
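The perplexity criterion can be made concrete with a short calculation. This sketch assumes that "likelihood" on the slide stands for the average per-word log-likelihood on held-out text, which is the usual convention but is not spelled out in the deck; the function name `perplexity` and the example probabilities are illustrative.

```python
import math


def perplexity(word_probabilities):
    """exp of the negative average log-likelihood of the held-out words."""
    log_likelihood = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_likelihood / len(word_probabilities))


# A model that assigns probability 1/4 to every held-out word has perplexity 4.
uniform = perplexity([0.25] * 10)

# Assigning higher probability to the observed words lowers the perplexity.
sharper = perplexity([0.5, 0.5, 0.25, 0.25])
```

Lower is better: a perplexity of k means the model is, on average, as uncertain as a uniform choice among k words.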
  • 40. Outline: 1. Introduction; 2. Distributed Deep Learning Using MXNet; 3. Learning in Multiple Dimensions; 4. Conclusion.
  • 41. Conclusion. Distributed deep learning at scale: MXNet has many attractive features (◮ flexible programming ◮ portable ◮ highly efficient). Easy to deploy large-scale DL on the AWS cloud: ◮ Deep Learning AMI ◮ CloudFormation templates. Tensors are the future of ML: tensor contractions give space savings in deep architectures, and new primitives (extended BLAS) speed up tensor contractions.