© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Alexander Smola
AWS Machine Learning
Personalization and Scalable Deep Learning with MXNET
Outline
• Personalization
• Latent Variable Models
• User Engagement and Return Times
• Deep Recommender Systems
• MXNet
• Basic concepts
• Launching a cluster in a minute
• Imagenet for beginners
Personalization
Latent Variable Models
• Temporal sequence of observations

Purchases, likes, app use, e-mails, ad clicks, queries, ratings
• Latent state to explain behavior
• Clusters (navigational, informational queries in search)
• Topics (interest distributions for users over time)
• Kalman Filter (trajectory and location modeling)
Action
Explanation
Are the parametric models really true?
Latent Variable Models
• Temporal sequence of observations

Purchases, likes, app use, e-mails, ad clicks, queries, ratings
• Latent state to explain behavior
• Nonparametric model / spectral
• Use data to determine shape
• Sidestep approximate inference
x
h
$h_t = f(x_{t-1}, h_{t-1})$
$x_t = g(x_{t-1}, h_t)$
Latent Variable Models
• Temporal sequence of observations

Purchases, likes, app use, e-mails, ad clicks, queries, ratings
• Latent state to explain behavior
• Plain deep network = RNN
• Deep network with attention = LSTM / GRU …

(learn when to update state, how to read out)
x
h
Long Short Term Memory
x
h
Hochreiter and Schmidhuber, 1997
$i_t = \sigma(W_i(x_t, h_t) + b_i)$
$f_t = \sigma(W_f(x_t, h_t) + b_f)$
$z_{t+1} = f_t \cdot z_t + i_t \cdot \tanh(W_z(x_t, h_t) + b_z)$
$o_t = \sigma(W_o(x_t, h_t, z_{t+1}) + b_o)$
$h_{t+1} = o_t \cdot \tanh z_{t+1}$
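The LSTM update on this slide can be written out as a minimal NumPy sketch. It assumes the gates use the logistic sigmoid (the usual LSTM choice) and that W(x, h) means a linear map applied to the concatenated vectors. The helper names `lstm_step` and `init_params`, the layer sizes and the random initialization are illustrative, not taken from the talk.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(params, z, h, x):
    """One step of the slide's recurrence: returns (z_next, h_next, o)."""
    xh = np.concatenate([x, h])                      # (x_t, h_t)
    i = sigmoid(params["Wi"] @ xh + params["bi"])    # input gate i_t
    f = sigmoid(params["Wf"] @ xh + params["bf"])    # forget gate f_t
    z_next = f * z + i * np.tanh(params["Wz"] @ xh + params["bz"])
    xhz = np.concatenate([x, h, z_next])             # (x_t, h_t, z_{t+1})
    o = sigmoid(params["Wo"] @ xhz + params["bo"])   # output gate o_t
    h_next = o * np.tanh(z_next)
    return z_next, h_next, o

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in ("Wi", "Wf", "Wz"):
        p[name] = 0.1 * rng.standard_normal((n_hidden, n_in + n_hidden))
    p["Wo"] = 0.1 * rng.standard_normal((n_hidden, n_in + 2 * n_hidden))
    for name in ("bi", "bf", "bz", "bo"):
        p[name] = np.zeros(n_hidden)
    return p

params = init_params(n_in=3, n_hidden=4)
z = np.zeros(4)
h = np.zeros(4)
for x in np.ones((5, 3)):        # five dummy observations
    z, h, o = lstm_step(params, z, h, x)
```

Once the step function exists, the rest of the model only ever calls it with the current state and input.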
Long Short Term Memory
x
h
Hochreiter and Schmidhuber, 1997
$(z_{t+1}, h_{t+1}, o_t) = \mathrm{LSTM}(z_t, h_t, x_t)$
Treat it as a black box
User Engagement
9:01 8:55 11:50
12:30
never
next week
?
(app frame toutiao.com)
User Engagement Modeling
• User engagement is gradual
• Daily average users?
• Weekly average users?
• Number of active users?
• Number of users?
• Abandonment is passive
• The last time you tweeted? Pin? Like? Skype?
• Churn models assume active abandonment 

(insurance, phone, bank)
9:01
User Engagement Modeling
• User engagement is gradual
• Model user returns
• Context of activity
• World events (elections, Super Bowl, …)
• User habits (morning reader, night owl)
• Previous reading behavior

(poor quality content will discourage return)
9:01
Survival Analysis 101
• Model population where something dramatic happens
• Cancer patients (death; efficacy of a drug)
• Atoms (radioactive decay)
• Japanese women (marriage)
• Users (opens app)
• Survival probability
TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 100, NO. 10, JA
It is well known that the differential equation can be solved by partial integration, i.e.
$\Pr(t_{\text{survival}} \geq T) = \exp\left(-\int_0^T \lambda(t)\,dt\right)$.  (2)
[Figure: hazard rate function (kernel) over time t]
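As a rough illustration of the survival probability, the sketch below integrates a hazard rate numerically and exponentiates the negative integral. The function name `survival_probability`, the trapezoidal rule and the constant hazard of 0.1 per hour are assumptions made for the example.

```python
import numpy as np

def survival_probability(hazard, T, num_steps=10_000):
    """Pr(t_survival >= T) = exp(-integral_0^T hazard(t) dt), trapezoidal rule."""
    t = np.linspace(0.0, T, num_steps + 1)
    lam = hazard(t)
    dt = T / num_steps
    integral = dt * (lam.sum() - 0.5 * (lam[0] + lam[-1]))
    return float(np.exp(-integral))

# Hypothetical constant hazard: 0.1 returns per hour
constant = lambda t: np.full_like(t, 0.1)
p24 = survival_probability(constant, 24.0)   # chance the user has not returned after 24 hours
```

With a constant hazard this reduces to the exponential distribution, which is the setting of the radioactive decay example.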
Session Model
• User activity is sequence of times
• b_i when app is opened
• e_i when app is closed
• In between wait for user return
• Model user activity likelihood
start
end
[Figure: network diagram. One-hot UserID, look-up table, user embedding; one-hot TimeID, look-up table, time embedding; external feature; Hidden1, Hidden2; output: rate]
Fig. 1. A Personalized Time-Aware architecture for Survival Analysis. Given the data from the previous session, we aim to predict the (quantized) rate values for the next session.
Session Model
start
end
Personalized LSTM
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 100, NO. 10, JANUARY 2016 8
[Figure: LSTM unfolded over Session s-2, Session s-1 and Session s; each session has Input, Hidden1 and Hidden2 layers]
Fig. 2. Unfolded LSTM network for 3 sessions. The input vector for session s is the concatenation of user embedding, time slot embedding and the …
• LSTM for global state update
• LSTM for individual state update
• Update both of them
• Learn using backprop and SGD
Jing and Smola, WSDM’17
Perplexity (quality of prediction)
[Figure: histograms over next visit time (hour)]
Fig. 6. The histogram of the time period between two sessions. The top one is from Toutiao and the bottom one is from Last.fm. The small bump around 24 hours corresponds to users having a daily habit of using the app at the same time.
global constant model. A static model with only one parameter, assuming that the rate is constant throughout the time frame for all users.
global+user constant model. A static model that assumes that the rate is an additive function of a global constant and a user-specific constant model.
piecewise constant model. A more flexible static model that learns parameters for each discretized bin.
Hawkes process. A self-exciting point process that respects past sessions.
integrated model. A combined model with all the above components.
DNN. A model that assumes that the rate is a function of time, user, session feature, parameterized by a deep neural network.
LSTM. A recurrent neural network that incorporates past activities.
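To make the "self-exciting" baseline concrete, here is a small sketch of a Hawkes intensity. The exponential kernel and the values of `mu`, `alpha` and `beta` are assumptions chosen for illustration; the slides do not say which kernel or parameters were used.

```python
import numpy as np

def hawkes_rate(t, past_events, mu=0.05, alpha=0.3, beta=0.5):
    """lambda(t) = mu + alpha * sum over t_i < t of exp(-beta * (t - t_i))."""
    past = np.asarray(past_events, dtype=float)
    past = past[past < t]                  # only earlier sessions excite the rate
    return mu + alpha * np.exp(-beta * (t - past)).sum()

sessions = [1.0, 2.0, 2.5]                 # hypothetical session times in hours
rate_soon = hawkes_rate(3.0, sessions)     # shortly after a burst of sessions
rate_later = hawkes_rate(30.0, sessions)   # long after: decays back toward mu
```

Each past session raises the rate of a return, and the effect fades with time, which is what separates this baseline from the static models above it.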
For completeness, we also report the result for Cox’s model, where the hazard rate is given by
$\lambda_u(t) = \lambda_0(t) \exp(\langle \beta, x_u(t) \rangle)$  (28)
$\mathrm{perp} = \exp\left(-\frac{1}{M} \sum_{u=1}^{m} \sum_{i=1}^{m_u} \log p(\{b_i, e_i\}; \theta)\right)$  (29)
where M is the total number of sessions in the test set. The
lower the value, the better the model is at explaining the
test data. In other words, perplexity measures the amount
of surprise in a user’s behavior relative to our prediction.
Obviously a good model can predict well, hence there will
be less surprise.
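The perplexity described here can be computed in a few lines. This sketch assumes the usual negative sign in the exponent (so that lower is better, as the text says) and takes the per-session log-likelihoods as given. The function name and the example numbers are hypothetical.

```python
import math

def perplexity(session_log_likelihoods):
    """One inner list per user, holding log p({b_i, e_i}) for each test session."""
    flat = [ll for user in session_log_likelihoods for ll in user]
    M = len(flat)                          # total number of test sessions
    return math.exp(-sum(flat) / M)

# Hypothetical log-likelihoods for two users with two and one test sessions
example = [[math.log(0.05), math.log(0.04)], [math.log(0.05)]]
perp = perplexity(example)
```

A model that assigns probability 1/20 to every observed session would score a perplexity of 20 under this definition.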
6.6 Model Comparison
The summarized results are shown in table 1. As can be seen
from the table, there is a big gap between linear models
and the two deep models. The Cox model is inferior to
our integrated model and significantly worse than the deep
networks.
TABLE 1. Average perplexity evaluated on the test set for different models.
model               Toutiao  Last.fm
Cox Model           27.13    28.31
global constant     45.29    59.98
user constant       28.74    45.44
piecewise constant  26.88    26.12
Hawkes process      22.58    30.80
integrated model    21.56    26.06
DNN                 18.87    20.62
LSTM                18.10    19.80
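The gap between the LSTM and the baselines can be expressed as a relative reduction in perplexity. The sketch below uses the numbers from Table 1. The definition of relative improvement as (baseline - model) / baseline is an assumption and may differ from the one behind the paper's plots.

```python
# Perplexities copied from Table 1: (Toutiao, Last.fm)
table = {
    "Cox": (27.13, 28.31),
    "integrated": (21.56, 26.06),
    "LSTM": (18.10, 19.80),
}

def relative_improvement(baseline, model):
    """Fractional reduction in perplexity of `model` relative to `baseline`."""
    return (baseline - model) / baseline

for name in ("integrated", "Cox"):
    for col, dataset in enumerate(("Toutiao", "Last.fm")):
        gain = relative_improvement(table[name][col], table["LSTM"][col])
        print(f"LSTM vs. {name} on {dataset}: {100 * gain:.1f}%")
```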
Perplexity (quality of prediction)
[Figure: plots for Toutiao and Last.fm. Perplexity against # of sessions (%) for global constant, user constant, piecewise constant, Hawkes process, integrated, Cox, DNN and LSTM; relative improvements (%) against # of sessions (%) for LSTM vs. integrated and LSTM vs. Cox]
Fig. 7. Top row: Average test perplexity as a function of the fraction of o… LSTMs over the integrated and the Cox model. Left column: Toutiao dat…
Jing and Smola, WSDM’17
[Figure: six pairs of plots over t (hour), from 0 to 80. Each pair shows the instantaneous rate λ(t) and the survival function Pr(return ≥ t), with the actual return time marked]
Fig. 9. Six randomly sampled learned predictive rate function. Three from Toutiao (left) and three from Last.fm (right). Each pair of figure denotes the instantaneous rate value λ(t) (purple), the survival function p(return ≥ t) in red, and the actual return time in blue. Clearly, our deep model is …
Recommender Systems
Recommender systems, not recommender archaeology
users
items
time
NOW
predict that
(future)
use this
(past)
don’t predict this
(archaeology)
The Netflix contest
got it wrong …
Getting it right
change in
taste and
expertise
change in
perception
and novelty
LSTM
LSTM
Wu et al, WSDM’17
Prizes
Sanity Check
Deep Learning with MXNet
Caffe
Torch
Theano
Tensorflow
CNTK
Keras
Paddle
(image - Banksy/wikipedia)
Why yet another deep networks tool?
• Frugality & resource efficiency

Engineered for cheap GPUs with smaller memory, slow networks
• Speed
• Linear scaling with #machines and #GPUs
• High efficiency on single machine, too (C++ backend)
• Simplicity

Mix declarative and imperative code
single implementation of backend system and common operators
performance guarantee regardless of which frontend language is used
frontend
backend
Imperative Programs
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
print(c)
d = c + 1
Easy to tweak with Python code
Pro
• Straightforward and flexible.
• Take advantage of language native
features (loop, condition, debugger)
Con
• Hard to optimize
Declarative Programs
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
      B=np.ones(10)*2)
Pro
• More chances for optimization
• Cross different languages
Con
• Less flexible
[Graph: A and B feed ⨉, giving C; C and the constant 1 feed +, giving D]
C can share memory with D,
because C is deleted later
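The declarative snippet on this slide is pseudocode. A toy version that actually runs might look like the sketch below, where `Node`, `as_node` and `compile_graph` are hypothetical stand-ins (the slide's `compile` is renamed to avoid shadowing Python's built-in). It only builds and evaluates the graph; a real backend would optimize at the compile step, for instance with the memory sharing noted on the slide.

```python
import numpy as np

class Node:
    """A symbolic expression; constructing it computes nothing."""
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op, self.inputs, self.name, self.value = op, inputs, name, value

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))

    def __add__(self, other):
        return Node("add", (self, as_node(other)))

def as_node(x):
    """Wrap plain numbers as constant nodes."""
    return x if isinstance(x, Node) else Node("const", value=x)

def Variable(name):
    return Node("var", name=name)

def compile_graph(root):
    """Return a function that evaluates the graph for given variable bindings."""
    def run(**bindings):
        def ev(node):
            if node.op == "var":
                return bindings[node.name]
            if node.op == "const":
                return node.value
            left, right = (ev(n) for n in node.inputs)
            return left * right if node.op == "mul" else left + right
        return ev(root)
    return run

A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile_graph(D)      # the whole graph is known before any data arrives
d = f(A=np.ones(10), B=np.ones(10) * 2)
```

Because nothing is computed until `f` is called, the full graph is visible up front, which is where the extra room for optimization comes from.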
Imperative vs. Declarative for Deep Learning
Computational Graph
of the Deep Architecture
forward backward
Needs heavy optimization,
fits declarative programs
Needs mutation and more
language native features, good for
imperative programs
Updates and Interactions
with the graph
• Iteration loops
• Parameter update

• Beam search
• Feature extraction …
$w \leftarrow w - \eta \partial_w f(w)$
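The parameter update is plain gradient descent, which is easy to write imperatively. The sketch below applies it to a hypothetical quadratic objective; `f`, `grad_f`, the step size `eta` and the iteration count are chosen only for illustration.

```python
import numpy as np

def f(w):
    """Hypothetical objective with its minimum at w = 3."""
    return 0.5 * np.sum((w - 3.0) ** 2)

def grad_f(w):
    """Gradient of f with respect to w."""
    return w - 3.0

w = np.zeros(2)
eta = 0.1                      # learning rate
for _ in range(200):
    w -= eta * grad_f(w)       # w <- w - eta * d/dw f(w), updated in place
```

The in-place mutation of `w` is the kind of operation that is natural in imperative code and awkward inside a static graph.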
Mixed Style Training Loop in MXNet
executor = neuralnetwork.bind()
for i in range(3):
    train_iter.reset()
    for dbatch in train_iter:
        args["data"][:] = dbatch.data[0]
        args["softmax_label"][:] = dbatch.label[0]
        executor.forward(is_train=True)
        executor.backward()
        for key in update_keys:
            args[key] -= learning_rate * grads[key]
Imperative NDArray can be set as input nodes to the graph
Executor is bound from declarative program that describes the network
Imperative parameter update on GPU
Mixed API for Quick Extensions
• Runtime switching between different graphs depending on input
• Useful for sequence modeling and image size reshaping
• Use of imperative code in Python, 10 lines of additional Python code
Bucketing: variable-length sentences
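The idea behind bucketing can be sketched in plain Python: group sequences by length so that each group can share one fixed-size graph. This is not MXNet's bucketing API. The helper names, the bucket sizes and the choice to drop over-long sentences are assumptions for illustration.

```python
from collections import defaultdict

def assign_bucket(length, bucket_sizes):
    """Smallest bucket that fits the sequence, or None if it is too long."""
    for size in sorted(bucket_sizes):
        if length <= size:
            return size
    return None

def bucket_sentences(sentences, bucket_sizes, pad=0):
    """Pad each sentence to its bucket size and group sentences by bucket."""
    buckets = defaultdict(list)
    for s in sentences:
        size = assign_bucket(len(s), bucket_sizes)
        if size is None:
            continue               # drop sentences longer than the largest bucket
        buckets[size].append(s + [pad] * (size - len(s)))
    return dict(buckets)

# Hypothetical token-id sentences of lengths 2, 5, 1, 3 and 11
sentences = [[4, 7], [1, 2, 3, 4, 5], [9], [3, 3, 3], list(range(1, 12))]
batches = bucket_sentences(sentences, bucket_sizes=[3, 6, 10])
```

At training time the graph matching each bucket size would be selected at runtime, which is the switching described in the bullets above.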
3D Image Construction
Deep3D
100 lines of Python code
https://github.com/piiswrong/deep3d
Distributed Deep Learning
## train
num_gpus = 4
gpus = [mx.gpu(i) for i in range(num_gpus)]
model = mx.model.FeedForward(
    ctx = gpus,
    symbol = softmax,
    num_round = 20,
    learning_rate = 0.01,
    momentum = 0.9,
    wd = 0.00001)
model.fit(X = train, eval_data = val,
          batch_end_callback = mx.callback.Speedometer(batch_size=batch_size))
2 lines for multi GPU
Scaling on p2.16xlarge
[Figure: two plots against number of GPUs for alexnet, inception-v3 and resnet-50, including GPU-GPU sync: average throughput per GPU and aggregate throughput. Annotated speedups: 108x and 75x]
Demo
Getting Started
• Website

http://mxnet.io/
• GitHub repository

git clone --recursive git@github.com:dmlc/mxnet.git
• Docker

docker pull dmlc/mxnet
• Amazon AWS Deep Learning AMI (with other toolkits & anaconda)

https://aws.amazon.com/marketplace/pp/B01M0AXXQB

http://bit.ly/deepami
• CloudFormation Template

https://github.com/dmlc/mxnet/tree/master/tools/cfn 

http://bit.ly/deepcfn
Acknowledgements
• User engagement

How Jing, Chao-Yuan Wu
• Temporal recommenders

Chao-Yuan Wu, Alex Beutel, Amr Ahmed
• MXNet & Deep Learning AMI

Mu Li, Tianqi Chen, Bing Xu, Eric Xie, Joseph Spisak,
Naveen Swamy, Anirudh Subramanian and many more …
We are hiring
{smola, thakerb, spisakj}@amazon.com

Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
 
Modeling adoptions and the stages of the diffusion of innovations
Modeling adoptions and the stages of the diffusion of innovationsModeling adoptions and the stages of the diffusion of innovations
Modeling adoptions and the stages of the diffusion of innovationsNicola Barbieri
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Pooyan Jamshidi
 
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...IRJET Journal
 
Controls Based Q Measurement Report
Controls Based Q Measurement ReportControls Based Q Measurement Report
Controls Based Q Measurement ReportLouis Gitelman
 
Md simulation and stochastic simulation
Md simulation and stochastic simulationMd simulation and stochastic simulation
Md simulation and stochastic simulationAbdulAhad358
 
Technical Trends_Study of Quantum
Technical Trends_Study of QuantumTechnical Trends_Study of Quantum
Technical Trends_Study of QuantumHardik Gohel
 
AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017
AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017
AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017Amazon Web Services Korea
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...Soumya Banerjee
 
A temporal classifier system using spiking neural networks
A temporal classifier system using spiking neural networksA temporal classifier system using spiking neural networks
A temporal classifier system using spiking neural networksDaniele Loiacono
 
RSC: Mining and Modeling Temporal Activity in Social Media
RSC: Mining and Modeling Temporal Activity in Social MediaRSC: Mining and Modeling Temporal Activity in Social Media
RSC: Mining and Modeling Temporal Activity in Social MediaAlceu Ferraz Costa
 

Similaire à Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016 (20)

Power System Simulation: History, State of the Art, and Challenges
Power System Simulation: History, State of the Art, and ChallengesPower System Simulation: History, State of the Art, and Challenges
Power System Simulation: History, State of the Art, and Challenges
 
Modeling & Simulation Lecture Notes
Modeling & Simulation Lecture NotesModeling & Simulation Lecture Notes
Modeling & Simulation Lecture Notes
 
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
Modeling and Simulation of Electrical Power Systems using OpenIPSL.org and Gr...
 
Entropy 12-02268-v2
Entropy 12-02268-v2Entropy 12-02268-v2
Entropy 12-02268-v2
 
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
 
NS-CUK Seminar: S.T.Nguyen, Review on "Continuous-Time Sequential Recommendat...
NS-CUK Seminar: S.T.Nguyen, Review on "Continuous-Time Sequential Recommendat...NS-CUK Seminar: S.T.Nguyen, Review on "Continuous-Time Sequential Recommendat...
NS-CUK Seminar: S.T.Nguyen, Review on "Continuous-Time Sequential Recommendat...
 
Dataworkz odsc london 2018
Dataworkz odsc london 2018Dataworkz odsc london 2018
Dataworkz odsc london 2018
 
What is the likely future of real-time transient stability?
What is the likely future of real-time transient stability?What is the likely future of real-time transient stability?
What is the likely future of real-time transient stability?
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
Modeling adoptions and the stages of the diffusion of innovations
Modeling adoptions and the stages of the diffusion of innovationsModeling adoptions and the stages of the diffusion of innovations
Modeling adoptions and the stages of the diffusion of innovations
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
 
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
IRJET- Two-Class Priority Queueing System with Restricted Number of Priority ...
 
Controls Based Q Measurement Report
Controls Based Q Measurement ReportControls Based Q Measurement Report
Controls Based Q Measurement Report
 
Md simulation and stochastic simulation
Md simulation and stochastic simulationMd simulation and stochastic simulation
Md simulation and stochastic simulation
 
Technical Trends_Study of Quantum
Technical Trends_Study of QuantumTechnical Trends_Study of Quantum
Technical Trends_Study of Quantum
 
main
mainmain
main
 
AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017
AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017
AWS 클라우드를 통한 쓰나미 연구 사례: 日츄오대 - AWS Summit Seoul 2017
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
 
A temporal classifier system using spiking neural networks
A temporal classifier system using spiking neural networksA temporal classifier system using spiking neural networks
A temporal classifier system using spiking neural networks
 
RSC: Mining and Modeling Temporal Activity in Social Media
RSC: Mining and Modeling Temporal Activity in Social MediaRSC: Mining and Modeling Temporal Activity in Social Media
RSC: Mining and Modeling Temporal Activity in Social Media
 

Plus de MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Plus de MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Dernier

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Dernier (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Alexander Smola AWS Machine Learning Personalization and Scalable Deep Learning with MXNET
  • 2. Outline • Personalization • Latent Variable Models • User Engagement and Return Times • Deep Recommender Systems • MXNet • Basic concepts • Launching a cluster in a minute • Imagenet for beginners
  • 4. Latent Variable Models • Temporal sequence of observations: purchases, likes, app use, e-mails, ad clicks, queries, ratings • Latent state to explain behavior • Clusters (navigational, informational queries in search) • Topics (interest distributions for users over time) • Kalman Filter (trajectory and location modeling) [Figure labels: Action, Explanation]
  • 5. Latent Variable Models • Temporal sequence of observations: purchases, likes, app use, e-mails, ad clicks, queries, ratings • Latent state to explain behavior • Clusters (navigational, informational queries in search) • Topics (interest distributions for users over time) • Kalman Filter (trajectory and location modeling) [Figure labels: Action, Explanation] • Are the parametric models really true?
  • 6. Latent Variable Models • Temporal sequence of observations: purchases, likes, app use, e-mails, ad clicks, queries, ratings • Latent state to explain behavior • Nonparametric model / spectral • Use data to determine shape • Sidestep approximate inference • $h_t = f(x_{t-1}, h_{t-1})$ • $x_t = g(x_{t-1}, h_t)$ [Figure: observations x, latent states h]
  • 7. Latent Variable Models • Temporal sequence of observations: purchases, likes, app use, e-mails, ad clicks, queries, ratings • Latent state to explain behavior • Plain deep network = RNN • Deep network with attention = LSTM / GRU … (learn when to update state, how to read out) [Figure: observations x, latent states h]
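  • 8. Long Short Term Memory (Hochreiter and Schmidhuber, 1997) • $i_t = \sigma(W_i(x_t, h_t) + b_i)$ • $f_t = \sigma(W_f(x_t, h_t) + b_f)$ • $z_{t+1} = f_t \cdot z_t + i_t \cdot \tanh(W_z(x_t, h_t) + b_z)$ • $o_t = \sigma(W_o(x_t, h_t, z_{t+1}) + b_o)$ • $h_{t+1} = o_t \cdot \tanh z_{t+1}$ [Figure: observations x, latent states h]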
  • 9. Long Short Term Memory (Hochreiter and Schmidhuber, 1997) • $(z_{t+1}, h_{t+1}, o_t) = \mathrm{LSTM}(z_t, h_t, x_t)$ • Treat it as a black box [Figure: observations x, latent states h]
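The black-box signature $(z_{t+1}, h_{t+1}, o_t) = \mathrm{LSTM}(z_t, h_t, x_t)$ can be sketched directly from the gate equations on the previous slide. This is a minimal pure-Python illustration, not the MXNet implementation. It assumes that $(x_t, h_t)$ means vector concatenation and that the unlabeled nonlinearity is the logistic sigmoid. The names `lstm_step`, `affine`, `zero_params` and the `params` dictionary layout are hypothetical.

```python
import math

def sigmoid(v):
    """Elementwise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(z, h, x, params):
    """One step (z_next, h_next, o) = LSTM(z, h, x) following the slide's equations."""
    xh = x + h  # assumption: (x_t, h_t) denotes concatenation
    i = sigmoid(affine(params["Wi"], params["bi"], xh))   # input gate
    f = sigmoid(affine(params["Wf"], params["bf"], xh))   # forget gate
    cand = [math.tanh(a) for a in affine(params["Wz"], params["bz"], xh)]
    z_next = [fk * zk + ik * ck for fk, zk, ik, ck in zip(f, z, i, cand)]
    o = sigmoid(affine(params["Wo"], params["bo"], xh + z_next))  # output gate sees z_{t+1}
    h_next = [ok * math.tanh(zk) for ok, zk in zip(o, z_next)]
    return z_next, h_next, o

def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero weights, for a quick sanity check only."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    p = {"W" + g: zeros(n_hidden, n_in + n_hidden) for g in "ifz"}
    p["Wo"] = zeros(n_hidden, n_in + 2 * n_hidden)
    p.update({"b" + g: [0.0] * n_hidden for g in "ifzo"})
    return p

# With zero weights every gate equals 0.5, so the memory cell is simply halved.
z1, h1, o = lstm_step([1.0, -2.0], [0.0, 0.0], [0.1, 0.2, 0.3], zero_params(3, 2))
```

In practice the weights are learned by backpropagation; the zero-weight call only checks that the shapes and the gate wiring line up.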
  • 10. User Engagement 9:01 8:55 11:50 12:30 never next week ? (app frame toutiao.com)
  • 11. User Engagement Modeling • User engagement is gradual • Daily average users? • Weekly average users? • Number of active users? • Number of users? • Abandonment is passive • The last time you tweeted? Pin? Like? Skype? • Churn models assume active abandonment 
 (insurance, phone, bank) 9:01
  • 12. User Engagement Modeling • User engagement is gradual • Model user returns • Context of activity • World events (elections, Super Bowl, …) • User habits (morning reader, night owl) • Previous reading behavior
 (poor quality content will discourage return) 9:01
  • 13. Survival Analysis 101 • Model population where something dramatic happens • Cancer patients (death; efficacy of a drug) • Atoms (radioactive decay) • Japanese women (marriage) • Users (opens app) • Survival probability: $\Pr(t_{\mathrm{survival}} \geq T) = \exp\left(-\int_0^T \lambda(t)\,dt\right)$ (2) • Hazard rate function $\lambda(t)$ [Excerpt: IEEE Transactions on Knowledge and Data Engineering, Vol. 100, No. 10, January 2016]
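The survival probability is the exponential of the negative integrated hazard rate. A minimal sketch of that relation, assuming the hazard rate is piecewise constant on equal-width time bins so the integral becomes a cumulative sum; the function name `survival_curve` and the example rates are made up for illustration.

```python
import numpy as np

def survival_curve(rates, dt=1.0):
    """Pr(t_survival >= T) at the right edge of each bin, for a piecewise-constant hazard."""
    cumulative_hazard = np.cumsum(np.asarray(rates, dtype=float) * dt)
    return np.exp(-cumulative_hazard)

# Constant hazard of 0.1 per hour over 24 hourly bins: plain exponential decay.
constant = survival_curve([0.1] * 24)

# A hazard that spikes in the last bin makes the survival curve drop sharply there.
spiky = survival_curve([0.01] * 23 + [1.0])
```

With a constant rate the curve reduces to exp(-rate · T); a time-varying rate simply changes how fast the curve falls in each bin.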
  • 14. Session Model • User activity is sequence of times • bi when app is opened • ei when app is closed • In between wait for user return • Model user activity likelihood start end
  • 15. Session Model [Figure: Fig. 1. A Personalized Time-Aware architecture for Survival Analysis. Given the data from the previous session, we aim to predict the (quantized) rate values for the next session. Diagram labels: One-hot UserID, Look up table, User Embedding, One-hot TimeID, Look up table, Time Embedding, External Feature, Hidden1, Hidden2, Rate.]
  • 16. Personalized LSTM • LSTM for global state update • LSTM for individual state update • Update both of them • Learn using backprop and SGD [Figure: Fig. 2. Unfolded LSTM network for 3 sessions (s-2, s-1, s), each with Input, Hidden1 and Hidden2 layers. The input vector for session s is the concatenation of user embedding, time slot embedding and the …] Jing and Smola, WSDM’17
  • 17. Perplexity (quality of prediction)
 [Figure: Fig. 6. The histogram of the time period between two sessions (x-axis: next visit time in hours). The top one is from Toutiao and the bottom one is from Last.fm. The small bump around 24 hours corresponds to users having a daily habit of using the app at the same time.]
 Models compared:
 • global constant model. A static model with only one parameter, assuming that the rate is constant throughout the time frame for all users.
 • global+user constant model. A static model that assumes that the rate is an additive function of a global constant and a user-specific constant.
 • piecewise constant model. A more flexible static model that learns parameters for each discretized bin.
 • Hawkes process. A self-exciting point process that respects past sessions.
 • integrated model. A combined model with all the above components.
 • DNN. A model that assumes that the rate is a function of time, user and session features, parameterized by a deep neural network.
 • LSTM. A recurrent neural network that incorporates past activities.
 For completeness, we also report the result for Cox’s model, where the hazard rate is given by
 $\lambda_u(t) = \lambda_0(t) \exp(\langle \beta, x_u(t) \rangle)$ (28)
 Perplexity:
 $\mathrm{perp} = \exp\left(-\frac{1}{M} \sum_{u=1}^{m} \sum_{i=1}^{m_u} \log p(\{b_i, e_i\}; \theta)\right)$ (29)
 where M is the total number of sessions in the test set. The lower the value, the better the model is at explaining the test data. In other words, perplexity measures the amount of surprise in a user’s behavior relative to our prediction. A good model predicts well, hence there will be less surprise.
 6.6 Model Comparison. The summarized results are shown in Table 1. As can be seen from the table, there is a big gap between linear models and the two deep models. The Cox model is inferior to our integrated model and significantly worse than the deep networks.
 Table 1. Average perplexity evaluated on the test set for different models.
 model                Toutiao   Last.fm
 Cox Model            27.13     28.31
 global constant      45.29     59.98
 user constant        28.74     45.44
 piecewise constant   26.88     26.12
 Hawkes process       22.58     30.80
 integrated model     21.56     26.06
 DNN                  18.87     20.62
 LSTM                 18.10     19.80
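Perplexity here is the exponential of the negative average log-likelihood over all test sessions, so it can be computed directly from per-session log-likelihoods. A minimal sketch under that reading; the function name `perplexity` and the toy log-likelihood values are hypothetical and not taken from the paper's data.

```python
import numpy as np

def perplexity(session_log_likelihoods):
    """exp(-(1/M) * sum of log p) over all M test sessions, pooled across users."""
    pooled = np.concatenate(
        [np.asarray(per_user, dtype=float) for per_user in session_log_likelihoods]
    )
    return float(np.exp(-pooled.mean()))

# Two hypothetical users with 2 and 1 test sessions; every session gets probability 1/20.
logliks = [[np.log(0.05), np.log(0.05)], [np.log(0.05)]]
perp = perplexity(logliks)
```

A model that assigns probability 1/20 to every observed session has perplexity 20, which gives a rough intuition for the scale of the values in the table.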
  • 18. Perplexity (quality of prediction) [Figure: Fig. 7. Top row: average test perplexity as a function of the fraction of … (x-axis: # of sessions (%), y-axis: perplexity) for Toutiao and Last.fm; curves: global constant, user constant, piecewise constant, Hawkes process, integrated, Cox, DNN, LSTM. Remaining panels: relative improvements (%) of LSTMs over the integrated and the Cox model. Left column: Toutiao …] Jing and Smola, WSDM’17
  • 19. [Figure: Fig. 9. Six randomly sampled learned predictive rate functions, three from Toutiao (left) and three from Last.fm (right). Each pair of figures shows the instantaneous rate value λ(t) (purple), the survival function Pr(return ≥ t) (red), and the actual return time (blue); x-axis: t (hour), 0 to 80. Clearly, our deep model is …]
  • 21. Recommender systems, not recommender archaeology users items time NOW predict that (future) use this (past) don’t predict this (archaeology)
  • 22. The Netflix contest got it wrong …
  • 23. Getting it right change in taste and expertise change in perception and novelty LSTM LSTM Wu et al, WSDM’17
  • 24. Wu et al, WSDM’17
  • 29. Why yet another deep networks tool? • Frugality & resource efficiency
 Engineered for cheap GPUs with smaller memory, slow networks • Speed • Linear scaling with #machines and #GPUs • High efficiency on single machine, too (C++ backend) • Simplicity
 Mix declarative and imperative code • Single implementation of backend system and common operators • Performance guarantee regardless of which frontend language is used
  • 30. Imperative Programs • Code: import numpy as np; a = np.ones(10); b = np.ones(10) * 2; c = b * a; print(c); d = c + 1 • Easy to tweak with Python code • Pro • Straightforward and flexible • Takes advantage of language-native features (loop, condition, debugger) • Con • Hard to optimize
  • 31. Declarative Programs • Code: A = Variable('A'); B = Variable('B'); C = B * A; D = C + 1; f = compile(D); d = f(A=np.ones(10), B=np.ones(10)*2) • Pro • More chances for optimization • Works across different languages • Con • Less flexible • Graph: inputs A, B and constant 1 feed the nodes ⨉ and + • C can share memory with D, because C is deleted later
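The declarative snippet on this slide is pseudocode: `Variable` and `compile` are not defined there. A toy sketch of the idea, assuming a minimal expression graph that only supports `*` and `+`; the `Node` class and the name `compile_graph` (used instead of `compile` to avoid shadowing the Python builtin) are hypothetical, and no optimization such as memory sharing is attempted.

```python
import numpy as np

class Node:
    """A node in a toy expression graph; building the graph performs no computation."""
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op = op
        self.inputs = inputs
        self.name = name
        self.value = value

    def __mul__(self, other):
        return Node("mul", (self, wrap(other)))

    def __add__(self, other):
        return Node("add", (self, wrap(other)))

def wrap(x):
    """Turn plain constants into graph nodes."""
    return x if isinstance(x, Node) else Node("const", value=x)

def Variable(name):
    return Node("var", name=name)

def compile_graph(node):
    """Return a function that evaluates the graph once values for its variables are given."""
    def evaluate(n, env):
        if n.op == "var":
            return env[n.name]
        if n.op == "const":
            return n.value
        left, right = (evaluate(child, env) for child in n.inputs)
        return left * right if n.op == "mul" else left + right
    return lambda **env: evaluate(node, env)

A = Variable("A")
B = Variable("B")
C = B * A                     # only records the operation
D = C + 1
f = compile_graph(D)          # a real system could optimize the whole graph here
d = f(A=np.ones(10), B=np.ones(10) * 2)
```

Because the full graph is known before any data arrives, a real backend can plan memory reuse and fuse operators, which is the optimization opportunity the slide refers to.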
  • 32. Imperative vs. Declarative for Deep Learning • Computational graph of the deep architecture (forward, backward): needs heavy optimization, fits declarative programs • Updates and interactions with the graph: need mutation and more language-native features, good for imperative programs • Iteration loops • Parameter update $w \leftarrow w - \eta \partial_w f(w)$
 • Beam search • Feature extraction …
  • 33. Mixed Style Training Loop in MXNet • Code: executor = neuralnetwork.bind() for i in range(3): train_iter.reset() for dbatch in train_iter: args["data"][:] = dbatch.data[0] args["softmax_label"][:] = dbatch.label[0] executor.forward(is_train=True) executor.backward() for key in update_keys: args[key] -= learning_rate * grads[key] • Executor is bound from the declarative program that describes the network • Imperative NDArray can be set as input nodes to the graph • Imperative parameter update on GPU
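The structure of this loop (write inputs imperatively, run the bound graph forward and backward, then update parameters imperatively) can be tried without MXNet. The sketch below is a NumPy-only stand-in, not MXNet code: `ToyExecutor` is a hypothetical class that plays the role of the bound executor for a softmax-regression "network", and the data, sizes and learning rate are made up.

```python
import numpy as np

class ToyExecutor:
    """Hypothetical stand-in for a bound graph: softmax regression on args['data']."""
    def __init__(self, batch_size, n_features, n_classes):
        self.args = {
            "data": np.zeros((batch_size, n_features)),
            "softmax_label": np.zeros(batch_size, dtype=int),
            "weight": np.zeros((n_features, n_classes)),
            "bias": np.zeros(n_classes),
        }
        self.grads = {"weight": None, "bias": None}
        self.probs = None

    def forward(self):
        logits = self.args["data"] @ self.args["weight"] + self.args["bias"]
        logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
        exp = np.exp(logits)
        self.probs = exp / exp.sum(axis=1, keepdims=True)
        return self.probs

    def backward(self):
        labels = self.args["softmax_label"]
        delta = self.probs.copy()
        delta[np.arange(len(labels)), labels] -= 1.0          # gradient w.r.t. logits
        delta /= len(labels)
        self.grads["weight"] = self.args["data"].T @ delta
        self.grads["bias"] = delta.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)                       # linearly separable toy labels
batches = [(X[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]

executor = ToyExecutor(batch_size=16, n_features=2, n_classes=2)
args, grads = executor.args, executor.grads
learning_rate = 0.5
for epoch in range(20):
    for data, label in batches:
        args["data"][:] = data                # imperative write into the graph input
        args["softmax_label"][:] = label
        executor.forward()
        executor.backward()
        for key in ("weight", "bias"):        # imperative parameter update
            args[key] -= learning_rate * grads[key]

predictions = (X @ args["weight"] + args["bias"]).argmax(axis=1)
accuracy = float((predictions == y).mean())
```

The update step is ordinary array arithmetic, so swapping in a different update rule only touches the innermost loop, not the graph.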
  • 34. Mixed API for Quick Extensions • Runtime switching between different graphs depending on input • Useful for sequence modeling and image size reshaping • Use of imperative code in Python, 10 lines of additional Python code • Bucketing: variable length sentences
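Bucketing groups variable-length sentences by padded length so that one graph per bucket size can be reused, with the graph chosen at runtime from the input length. A minimal, framework-free sketch of the grouping step only; the function name `assign_buckets`, the bucket sizes and the padding token are assumptions for illustration, and sentences longer than the largest bucket are simply dropped here.

```python
from collections import defaultdict

def assign_buckets(sentences, bucket_sizes, pad_token=0):
    """Pad each sentence to the smallest bucket that fits it and group by bucket size."""
    buckets = defaultdict(list)
    for sentence in sentences:
        fitting = [size for size in sorted(bucket_sizes) if size >= len(sentence)]
        if not fitting:
            continue                      # longer than the largest bucket: dropped in this sketch
        size = fitting[0]
        padded = list(sentence) + [pad_token] * (size - len(sentence))
        buckets[size].append(padded)
    return dict(buckets)

# Token-id sentences of lengths 2, 4, 1 and 7, with buckets of length 2, 4 and 8.
sentences = [[5, 3], [7, 1, 4, 2], [9], [8, 6, 2, 2, 1, 3, 4]]
batches = assign_buckets(sentences, bucket_sizes=[2, 4, 8])
```

Each key of `batches` would correspond to one unrolled graph of that length, so padding cost stays bounded by the gap between neighbouring bucket sizes.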
  • 35. 3D Image Construction Deep3D 100 lines of Python code https://github.com/piiswrong/deep3d
  • 38. Distributed Deep Learning • Code: ## train num_gpus = 4 gpus = [mx.gpu(i) for i in range(num_gpus)] model = mx.model.FeedForward( ctx = gpus, symbol = softmax, num_round = 20, learning_rate = 0.01, momentum = 0.9, wd = 0.00001) model.fit(X = train, eval_data = val, batch_end_callback = mx.callback.Speedometer(batch_size=batch_size)) • 2 lines for multi GPU
  • 39. Scaling on p2.16xlarge [Figure: two plots against number of GPUs for alexnet, inception-v3 and resnet-50: average throughput per GPU, and aggregate throughput with GPU-GPU sync. Annotations: 108x, 75x.]
  • 40. Demo
  • 41. Getting Started • Website
 http://mxnet.io/ • GitHub repository
 git clone --recursive git@github.com:dmlc/mxnet.git • Docker
 docker pull dmlc/mxnet • Amazon AWS Deep Learning AMI (with other toolkits & anaconda)
 https://aws.amazon.com/marketplace/pp/B01M0AXXQB
 http://bit.ly/deepami • CloudFormation Template
 https://github.com/dmlc/mxnet/tree/master/tools/cfn 
 http://bit.ly/deepcfn
  • 42. Acknowledgements • User engagement
 How Jing, Chao-Yuan Wu • Temporal recommenders
 Chao-Yuan Wu, Alex Beutel, Amr Ahmed • MXNet & Deep Learning AMI
 Mu Li, Tianqi Chen, Bing Xu, Eric Xie, Joseph Spisak, Naveen Swamy, Anirudh Subramanian and many more … We are hiring {smola, thakerb, spisakj}@amazon.com