Artificial Intelligence on Data Centric Platform
By: Fernando Velasco
Training Big Data Spain
http://xlic.es/v/7208D8
A man needs three names
● Data Scientist
● Mathematician
● Stratian
fvelasco@stratio.com
INDEX
1. Introduction
2. Data Centric Environment
● Distributed TensorFlow example. Keras
3. Neural Nets
● BackPropagation
4. Recurrent Neural Networks
● LSTM
5. Autoencoders
● Data Augmentation
● VAE
Who are we? Where do we come from? Where are we going?

Technical Environment
Data Centricity

[Diagram: DATA at the center, surrounded by Mobile APP, Campaign Management, E-commerce, Digital Marketing, Legacy Application, Call center, SAP: ERP, ATG, TPV APP and CRM, with a Data Intelligence layer and an API DaaS layer.]
● Unique data at the center, surrounded by applications that use it in real time, gaining maximum data intelligence
● To allow simultaneous updates, consistency is eventual
● Applications use the microservices in the DaaS layer to access the Data
● The Data Intelligence layer gives applications access to the Data Intelligence
● Applications are developed through microservices orchestration
Environment Summary

[Diagram: a Multiuser Environment manages users and provisions notebooks; in the Analytic Environment, each user gets a front-end and a back-end that runs the user code.]
tf.motivation
● Growing Community: one of the main reasons to use TensorFlow is the huge community behind it. TensorFlow is widely known and used.
● Great Technical Capabilities:
- Multi-GPU support
- Distributed training
- Queues for operations like data loading and preprocessing on the graph
- Graph visualization using TensorBoard
- Model checkpointing
- High performance and GPU memory usage optimization
● High-quality metaframeworks: Keras because of TensorFlow, and perhaps also TensorFlow because of Keras. Once again, both lead the list of Deep Learning libraries.
● Continuous release schedule and maintenance: new features and tests are integrated first, so that early adopters can try them before they are documented. This is great for such a big community and allows the framework to keep improving.
Distribution strategies: Data vs. Model Parallelism

When splitting the training of a neural network across multiple compute nodes, two strategies are commonly employed:
● Data parallelism: individual instances of the model are created on each node and fed different training samples; this allows for higher training throughput.
● Model parallelism: a single instance of the model is split across multiple nodes, allowing larger models, ones which may not necessarily fit in the memory of a single node, to be trained.
● Mixed: if desired, these two strategies can also be composed, resulting in multiple instances of a given model, with each instance spanning multiple nodes.
Distributed Computation Synchrony

There are many ways to specify a distributed structure in TensorFlow. Possible approaches include:
● Asynchronous training: each replica of the graph has an independent training loop that executes without coordination. It is compatible with both forms of replication above.
● Synchronous training: all of the replicas read the same values for the current parameters, compute gradients in parallel, and then apply them together. It is compatible with in-graph replication (e.g. using gradient averaging) and between-graph replication.
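To make this concrete, here is a minimal sketch of between-graph replication with asynchronous training in the TensorFlow 1.x API; the cluster addresses, sizes and the linear model are made-up placeholders, not code from the talk:

import tensorflow as tf  # TensorFlow 1.x-style API

# Hypothetical cluster layout: one parameter server, two workers.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0:2222"],
    "worker": ["worker0:2222", "worker1:2222"],
})

# Each process runs this script with its own job_name/task_index.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Pin variables to the ps task and ops to the local worker; every worker
# builds its own copy of the graph (between-graph replication) and trains
# asynchronously on its own data shard.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.placeholder(tf.float32, [None, 1])
    w = tf.Variable(tf.zeros([10, 1]))
    b = tf.Variable(tf.zeros([1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)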
Stimulating the brain
Let me introduce you to my friend Cajal. He knew something about neurons

[Neuron diagram: dendrite, axon, synapses: impulse transmission]
Building the structures: how can we define a neuron?
Layers, layers, layers
BackPropagation Basics

Forward Propagation: get a result
Error Estimation: evaluate performance
Backward Propagation: who's to blame?

[Diagram: Input → hidden → hidden → hidden → Output]

● A cost function C is defined
● Every parameter has its impact on the cost, given some training examples
● Impacts are computed in terms of derivatives
● Use the chain rule to propagate the error backwards
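To make the chain rule concrete, here is a minimal numpy sketch of forward and backward propagation; it assumes a two-layer network with sigmoid hidden units and MSE loss, and is not the talk's code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data and a hypothetical 2-layer network.
x = np.random.randn(4, 3)          # 4 examples, 3 features
y = np.random.randn(4, 1)          # regression targets
W1, W2 = np.random.randn(3, 5), np.random.randn(5, 1)

for step in range(100):
    # Forward propagation: get a result.
    h = sigmoid(x @ W1)            # hidden activations
    y_hat = h @ W2                 # network output
    # Error estimation: evaluate performance (MSE).
    loss = np.mean((y_hat - y) ** 2)
    # Backward propagation: chain rule, layer by layer.
    d_yhat = 2 * (y_hat - y) / len(y)       # dL/dy_hat
    dW2 = h.T @ d_yhat                      # dL/dW2
    d_h = d_yhat @ W2.T * h * (1 - h)       # back through the sigmoid
    dW1 = x.T @ d_h                         # dL/dW1
    # Gradient descent update.
    W2 -= 0.1 * dW2
    W1 -= 0.1 * dW1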
Activation Functions: Outputs
● Linear
● Binomial: sigmoid
● Multinomial: softmax
Sigmoid and ReLU functions

ReLU:
- Sparse activation
- Efficient computation
- "Differentiable"
- Unbounded
- Potential dying ReLU
- Convolutional-friendly

Sigmoid:
- Bounded
- Probability-like function
- Dense computation
- Differentiable
- On many examples of fully connected layers

We are too cool to speak about linear activators, aren't we? Not entirely...
Hyperbolic Tangent
- Bounded
- Positive/negative values
- Dense computation
- Differentiable
- Nice for LSTM-like thinking
Softmax
- Represents probability on a
categorical distribution
- Multiclass normalization
- Bounded
- Differentiable
- Used on final layers
Differentiation is the key
On the ease of Derivations
● Sigmoid
● Hyperbolic Tangent
● ReLU (its derivative at 0 is a hand-set value)
● Softmax
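For reference, those derivatives are the standard ones:

\[
\begin{aligned}
\sigma'(x) &= \sigma(x)\bigl(1-\sigma(x)\bigr) \\
\tanh'(x) &= 1-\tanh^2(x) \\
\mathrm{ReLU}'(x) &= \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}
\quad\text{(the value at } x=0 \text{ is set by convention)} \\
\frac{\partial\,\mathrm{softmax}(x)_i}{\partial x_j}
  &= \mathrm{softmax}(x)_i\bigl(\delta_{ij}-\mathrm{softmax}(x)_j\bigr)
\end{aligned}
\]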
Loss Functions
Regression error

[Comparison of regression losses; properties per loss:]
● The most classic measure; heavily penalizes big mistakes; less interpretable
● Scale invariant; symmetric; interpretable; harder differentiability and convergence
● Penalizes big mistakes less; interpretable; harder differentiability and convergence

The choice is always problem-dependent
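For reference, and assuming the slide compares the classic squared and absolute losses, the definitions are:

\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat y_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat y_i\rvert
\]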
Cost functions
● Regression
● Classification

The shortest way is not always the best one
Classification and Categorical Cross-Entropy

● Categorical Cross-Entropy:
L = -Σᵢ Σⱼ yᵢⱼ log(pᵢⱼ)
where indexes i and j run over examples and classes respectively, the y's stand for the true labels and the p's for their assigned probabilities.

With two classes it turns into the easy-to-understand, most common binary cross-entropy:
L = -Σᵢ [ yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ) ]

Compared to accuracy, cross-entropy turns out to be a more granular way to compute error, as it takes into account the closeness of each prediction. Its derivative also eases calculus compared with RMSE.

[Example: Classifier 1 vs. Classifier 2]
Regularization
Regularization: Norm penalties
● Add a penalty to the loss function:
● L2:
○ Keep weights near zero.
○ Simplest one, differentiable.
● L1:
○ Sparse results, feature selection.
○ Not differentiable, slower.
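In Keras, for instance, norm penalties are attached per layer; a minimal sketch (layer sizes are arbitrary):

from keras import regularizers
from keras.layers import Dense
from keras.models import Sequential

model = Sequential([
    # L2 penalty keeps weights near zero...
    Dense(64, activation="relu", input_shape=(20,),
          kernel_regularizer=regularizers.l2(0.01)),
    # ...while L1 pushes weights to exactly zero (sparsity).
    Dense(1, kernel_regularizer=regularizers.l1(0.01)),
])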
Regularization: Dropout
● Randomly drop neurons (along with
their connections) during training.
● Acts like adding noise.
● Very effective, computationally
inexpensive.
● Ensemble of all sub-networks
generated.
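In Keras, dropout is just another layer; a minimal sketch with arbitrary sizes (active during training, disabled at inference):

from keras.layers import Dense, Dropout
from keras.models import Sequential

model = Sequential([
    Dense(128, activation="relu", input_shape=(20,)),
    Dropout(0.5),   # randomly drops half the units, training only
    Dense(10, activation="softmax"),
])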
Optimization
Optimization: Challenges
● The difficulty in training neural
networks is mainly attributed to their
optimization part.
● Plateaus, saddle points and local minima grow exponentially with the dimension.
● Classical convex optimization
algorithms don’t perform well.
Optimization: Batch Gradient descent
● Goes over the whole training set.
● Very expensive.
● There isn't an easy way to incorporate new data into the training set.
Optimization: Mini-Batch Gradient descent
● Stochastic Gradient Descent (SGD)
● Randomly sample a small number of
examples (minibatch)
● Estimate cost function and gradient:
● Batch size: Length of the minibatch
● Iteration: Every time we update the
weights
● Epoch: One pass over the whole training
set.
● k = 1 => online learning
● Small batches => regularization
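A minimal numpy sketch of the minibatch loop, assuming a generic grad(w, X, y) gradient function (illustrative, not the talk's code):

import numpy as np

def sgd(w, X, y, grad, lr=0.01, batch_size=32, epochs=10):
    n = len(X)
    for epoch in range(epochs):            # one epoch = one pass over the data
        idx = np.random.permutation(n)     # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]   # sample a minibatch
            w -= lr * grad(w, X[batch], y[batch])   # one iteration = one update
    return w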
Optimization: Variants
● Momentum: the momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction.
● AdaGrad: the learning rate is adapted component-wise, dividing by the square root of the sum of squares of the historical gradients.
● RMSProp: modifies AdaGrad to perform better in the non-convex setting by changing the gradient accumulation into an exponentially weighted moving average.
● ADAM (Adaptive Moment): a combination of RMSProp and momentum.
Momentum basics

[Diagram: negative of the gradient vs. momentum vs. real movement]
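In symbols, the classic momentum update reads:

\[
v_{t} = \beta\, v_{t-1} - \eta\, \nabla_{w} L(w_{t-1}), \qquad
w_{t} = w_{t-1} + v_{t}
\]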
A fistful of cool applications

Not all that wander are lost

● Object Classification and Detection
● RBM on Recommender Systems
● Instant Visual Translation
● Generative Models (GAN/VAE)
Introducing Keras
Welcome to the jungle!
● Me Tarzán, you Cheetah. Human-friendly interface. User actions are minimized in order to ease the process, isolating users from the backend.
● Territorial behaviors are allowed. Several backends can be used: TensorFlow, CNTK and Theano (poor Theano!). There is also another interesting property, modularization: every model is a sequence of standalone modules plugged together with as few restrictions as possible, allowing us to fully configure cost functions, optimizers, initializations, activation functions...
● Keeps your model herd a-growin'. New modules are simple to add, and existing modules provide ample examples.
● Kaa is our friend. We love Python! It makes the lives of data scientists easier: the code is compact, easier to debug, and allows for ease of extensibility.
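As a flavor of that interface, a minimal sketch with arbitrary layer sizes (TensorFlow backend assumed):

from keras.layers import Dense
from keras.models import Sequential

# Standalone modules plugged together: layers, optimizer, loss.
model = Sequential([
    Dense(64, activation="relu", input_shape=(784,)),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=5)  # given some data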
Ever felt lost in Automatic Translation?
What do we say to those who think
machine translation sucks?
Not today!
Neural Machine Translation Idea

Encoder: words => hidden state
Decoder: hidden state => words

Hidden states are not entirely universal languages!!
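A bare-bones encoder-decoder sketch in the Keras functional API; the vocabulary size and hidden dimension are made up:

from keras.layers import Dense, Input, LSTM
from keras.models import Model

VOCAB, HIDDEN = 10000, 256          # hypothetical sizes

# Encoder: words => hidden state.
enc_in = Input(shape=(None, VOCAB))
_, state_h, state_c = LSTM(HIDDEN, return_state=True)(enc_in)

# Decoder: hidden state => words.
dec_in = Input(shape=(None, VOCAB))
dec_out = LSTM(HIDDEN, return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])
probs = Dense(VOCAB, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")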
Attention Basics

● Not every word is a one-to-one translation
● Whole-weighting combination increases computation time
● Some other, more human approaches can be taken (e.g. reinforcement learning)
Sequential Data
Sequence Statement

● Most machine learning algorithms are designed for independent, unordered data.
● Many real problems use sequential data:
○ Time series, behavior, audio signals…
○ t does not have to be time: it can be a spatial measure (images) or any order measure (recommender systems)
● Sequences are a natural way of representing reality: vision, hearing, action-reaction, words, sentences, etc.
● Don't forget: order matters!!
Introducing Recurrent Neural Networks

● Neural networks with recurrent connections, specialized in processing sequential data.
● Recurrent connections allow a 'memory' of previous inputs.
● They can scale to long sequences (of variable length), which is not practical for other types of nets.
● Same parameters for every timestep (t) => generalization

RNN images by Christopher Olah
Recurrent Neural Networks Architecture

[Unrolled RNN diagram over timesteps 0, 1, 2, …, t]
Looping the loop: Backpropagation Through Time
● Same idea as in standard backpropagation, but the recurrent net needs to be unfolded through time for a certain number of timesteps.
● The weight changes calculated for each network copy are summed before individual weights are adapted.
● The set of weights for each copy (timestep) always remains the same.
BackPropagation Through Time

● Cost function: L = Σₜ Lₜ, where each Lₜ stands for the usual cost on one timestep (e.g. MSE on regression)
● Network parameters depend on the parameters at the previous timestep; so do their derivatives during backprop
● Chain rule application leads to a long product of derivatives
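Spelling out that chain rule for a vanilla RNN with hidden state h_t = f(W h_{t-1} + U x_t):

\[
\frac{\partial L}{\partial W}
  = \sum_{t} \frac{\partial L_t}{\partial W}
  = \sum_{t} \sum_{k \le t}
    \frac{\partial L_t}{\partial h_t}
    \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
    \frac{\partial h_k}{\partial W}
\]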
Beware of the Vanishing Gradient!!
Gradients in time

● Backpropagating the error in time involves as many recurrent derivative terms as timesteps in the net.
● This can be problematic if the values of the matrix W are too large or too small: the product of many such terms explodes or vanishes.
● When it vanishes, the very first terms have no influence on the result, as there is no memory related to them.

[Short-term modulation vs. long-term modulation]
LSTM Briefing (Sepp Hochreiter and Jürgen Schmidhuber, 1997)

● Timesteps are still the key
● From now on, we are going to have two connections (states)
● Each timestep receives an input, and up to three outputs; two of them are states: Long and Short
● The third output (if it exists or is considered) is similar to the classic output

LSTM images also by Christopher Olah
LSTM Briefing (II)

● Each timestep may have one or more units
● Each state corresponds to one kind of memory at play: Long and Short
● Inside each cell, there are four questions asked:
○ Which part of the Long memory has to be deleted? (forget gate, f)
○ From the new info, is there anything interesting to be remembered? (input gate, i)
○ If there is, how do we combine it along with the Long memory? (candidate gate, c)
○ What is the Short-term impression for this step? (output gate, o)
Inside a LSTM Cell (I)

Focusing on the forget gate, the question is answered as follows:

h_t^f = σ(W_f · [h_{t-1}, x_t] + b_f)

where h is the activation, b the associated bias and W the weight matrix of the forget gate. Or, in a more explicit way:

h_t^f = σ(W_fx · x_t + W_hh · h_{t-1} + b_f)

where W_fx is the input weight matrix (the classic one) and W_hh is the hidden-state matrix between timesteps.

In a similar way, one can express the input and output gate equations: h_t^i and h_t^o.

The candidate gate differs mainly in its activation function, the hyperbolic tangent; in the same notation:

h_t^c = tanh(W_cx · x_t + W_hh · h_{t-1} + b_c)

tanh takes values in a [-1, 1] range; this way we are able to add and subtract on the Long-term memory.
Inside a LSTM Cell (II)

And finally, we can update the states, including the output:

C_t = h_t^f ⊙ C_{t-1} + h_t^i ⊙ h_t^c

Or, in simpler words: we forget what is to be forgotten and we add what is to be added.

At the very end, with the same tanh idea, we put Short and Long terms together:

h_t = h_t^o ⊙ tanh(C_t)

So, LSTM nets are… that easy?
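In Keras, all of this plumbing collapses into a single layer; a minimal sketch with arbitrary shapes:

from keras.layers import Dense, LSTM
from keras.models import Sequential

model = Sequential([
    # 32 LSTM units over sequences of 10 timesteps with 8 features each;
    # gates, candidate and both states are handled internally.
    LSTM(32, input_shape=(10, 8)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")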
Cool Applications

Not all that wander are lost

● CNN + LSTM to describe pictures
● Film scripts. Yes, it's for real
The man who creates the network should write the code

Demo Time!!
AutoEncoders
Autoencoders (Idea)

[Diagram: Input → hidden → hidden → hidden → Output]
● Supervised neural networks try to predict
labels from input data
● It is not always possible to obtain labels
● Unsupervised learning can help obtain data
structure.
● What if we turn the output into the input?
This is not the Generative Model you are looking for

[Input image → Output image]
It tries to predict x from x, but no labels are needed.
The idea is learning an approximation of the identity
function.
Along the way, some restrictions are placed:
typically the hidden layers compress the data.
The original input is represented at the output, even
if it comes from noisy or corrupted data.
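A minimal dense autoencoder sketch in Keras (dimensions are arbitrary; this is not the demo code):

from keras.layers import Dense, Input
from keras.models import Model

inp = Input(shape=(784,))
# Encoder: compress the data into a narrow latent space.
latent = Dense(32, activation="relu")(inp)
# Decoder: reconstruct the input from the latent code.
out = Dense(784, activation="sigmoid")(latent)

autoencoder = Model(inp, out)
# The target is the input itself: x from x, no labels needed.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)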
Autoencoders (Encoder and decoder)

[Input image → Encode → Latent Space → Decode → Output image]
The latent space is commonly a narrow hidden layer
between encoder and decoder
It learns the data structure
Encoder and decoder can share the same (inverted) structure or be different.
Each one can have its own depth (number of layers)
and complexity.
Autoencoders BackPropagation

[Input image → Encode → Latent Space → Decode → Output image; BackPropagation runs back through both]
A cost function can be defined taking into account the differences between the input and Decoded(Encoded(input)).
This allows BackProp to be carried along the Encoder and the Decoder.
To prevent the composed function from being the identity, some regularizations can be applied.
One of the most common is simply reducing the latent space dimension (i.e. compressing the data in the encoding).
Autoencoders Applications
Reduction of dimensionality
Data Structure/Feature learning
Denoising or data cleaning
Pre-training deep networks
Data Augmentation
● Specialized image and video classification tasks often
have insufficient data.
● Traditional transformations consist of using a
combination of affine transformations to manipulate the
training data
● Data augmentation has been shown to produce promising
ways to increase the accuracy of classification tasks.
● While traditional augmentation is very effective alone, other
techniques enabled by generative models have proved to be
even better
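Traditional affine augmentation is, for instance, one object away in Keras; the parameters below are illustrative:

from keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # random rotations
    width_shift_range=0.1,    # horizontal shifts
    height_shift_range=0.1,   # vertical shifts
    shear_range=0.1,          # shear (affine) transform
    zoom_range=0.1,           # random zoom
    horizontal_flip=True,     # mirror images
)
# flow() yields endless batches of transformed copies of the training set:
# generator = augmenter.flow(x_train, y_train, batch_size=32)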
Generative Models (Idea)
“What I cannot create, I do
not understand.”
—Richard Feynman
● They model how the data was generated in
order to categorize a signal.
● Instead of modeling P(y|x) as the usual
discriminative models, the distribution under
the hood is P(x, y)
● The number of parameters is significantly
smaller than the amount of data on which they
are trained.
● This forces the models to discover the data
essence
● What the model does is understand the world around the data and provide good representations of it
Generative Models Applications

● Generate potentially unfeasible examples for Reinforcement Learning
● Denoising/Pretraining
● Structured prediction exploration in RL
● Entirely plausible generation of images to depict image/video
● Feature understanding
Variational Autoencoder Idea

[Input image → Encoder Network → Latent Space (Mean Vector, Standard Deviation Vector) → Decoder Network → Output image]
Sample on the Latent Space => Generate new representations

[Prior distribution → Latent Space (Mean Vector, Standard Deviation Vector) → Decoder Network → Output image]
Demogorgon smile generation is beyond the state of the art
Latent Space Distribution (I)

[Encoder Network → Latent Space (Mean Vector, Standard Deviation Vector) → Decoder Network]
Latent Space Distribution (II): VAE Loss function

● Encoder and decoder can be denoted as conditional probability representations of the data: the encoder as q(z|x) and the decoder as p(x|z).
● Typically the encoder reduces dimensions and the decoder increases them again, so some information is lost when reconstructing the inputs. This information loss can be measured using the reconstruction log-likelihood: E_q[ log p(x|z) ].
● In order to keep the latent image distribution under control, we can introduce a regularizer into the loss function: the Kullback-Leibler divergence between the encoder distribution and a given, known distribution, such as the standard Gaussian: KL( q(z|x) ‖ N(0, I) ).
● With this penalty in the loss, encoder outputs are forced to be sufficiently diverse: similar inputs will be kept close (smoothly) together in the latent space.

Loss = Reconstruction Loss + Distribution Divergence (K-L)
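A hedged sketch of that loss in Keras backend terms, assuming z_mean and z_log_var are the encoder outputs and x, x_decoded the input and its reconstruction:

from keras import backend as K

def vae_loss(x, x_decoded, z_mean, z_log_var):
    # Reconstruction term: how well do we rebuild the input?
    reconstruction = K.sum(K.binary_crossentropy(x, x_decoded), axis=-1)
    # KL divergence between q(z|x) = N(z_mean, exp(z_log_var)) and N(0, I).
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                      axis=-1)
    return K.mean(reconstruction + kl)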
Latent Space Distribution (III): Probability overview

[Encoder Network → Latent Space (Mean Vector, Standard Deviation Vector) → Decoder Network]

● The VAE contains a specific probability model of
data x and latent variables z.
● We can write the joint probability of the model as
p(x,z): “how likely is observation x under the joint
distribution”.
● By definition, p(x, z)=p(x∣z)p(z)
● In order to generate the data, the process is as
follows:
For each datapoint i:
- Draw latent variables z_i ∼ p(z)
- Draw datapoint x_i ∼ p(x|z_i)
● We need to figure out p(z) and p(x|z)
● The likelihood is the representation to be learnt from
the decoder
● Encoder likelihood can be used to estimate
parameters from the prior.
Variational Autoencoder: BackProp + reparametrization trick

● VAEs are built by using Backpropagation on the previously defined loss function.
● Mean and variance estimations don't give us Z itself, but its distribution parameters.
● In order to get Z we could sample directly from the true posterior given the parameters, but sampling cannot be differentiated.
● Instead, a trick can be applied so that the non-differentiable part is left outside the network.
● By stating z = μ + σ ⊙ ε, with ε ∼ N(0, I), we can remove the sampling from the backprop part.
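In Keras this is typically done with a Lambda layer; a minimal sketch where latent_dim is an assumed size:

from keras import backend as K
from keras.layers import Lambda

latent_dim = 2  # assumed size of the latent space

def sampling(args):
    z_mean, z_log_var = args
    # eps carries the randomness; gradients do not flow through it.
    eps = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * eps  # z = mu + sigma * eps

# z = Lambda(sampling)([z_mean, z_log_var])  # plugs into the encoder outputs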
Any Questions?
THANK YOU!
Artificial Intelligence on Data Centric Platform

Contenu connexe

Tendances

Tendances (20)

On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
DataStax Enterprise in Practice (Field Notes)
DataStax Enterprise in Practice (Field Notes)DataStax Enterprise in Practice (Field Notes)
DataStax Enterprise in Practice (Field Notes)
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
 
Building and Maintaining Bulletproof Systems with DataStax
Building and Maintaining Bulletproof Systems with DataStaxBuilding and Maintaining Bulletproof Systems with DataStax
Building and Maintaining Bulletproof Systems with DataStax
 
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsBlue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
 
Why spark by Stratio - v.1.0
Why spark by Stratio - v.1.0Why spark by Stratio - v.1.0
Why spark by Stratio - v.1.0
 
Building a Digital Bank
Building a Digital BankBuilding a Digital Bank
Building a Digital Bank
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGate
 
Debunking "Purpose-Built Data Systems:": Enter the Universal Database
Debunking "Purpose-Built Data Systems:": Enter the Universal DatabaseDebunking "Purpose-Built Data Systems:": Enter the Universal Database
Debunking "Purpose-Built Data Systems:": Enter the Universal Database
 
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
Webinar: Bitcoins and Blockchains - Emerging Financial Services Trends and Te...
 
SciDB
SciDBSciDB
SciDB
 
What is DataStax Enterprise?
What is DataStax Enterprise?What is DataStax Enterprise?
What is DataStax Enterprise?
 
Massively Scalable Computational Finance with SciDB
 Massively Scalable Computational Finance with SciDB Massively Scalable Computational Finance with SciDB
Massively Scalable Computational Finance with SciDB
 
The Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big DataThe Synapse IoT Stack: Technology Trends in IOT and Big Data
The Synapse IoT Stack: Technology Trends in IOT and Big Data
 
How to Successfully Visualize DSE Graph data
How to Successfully Visualize DSE Graph dataHow to Successfully Visualize DSE Graph data
How to Successfully Visualize DSE Graph data
 
Big Analytics Without Big Hassles
Big Analytics Without Big HasslesBig Analytics Without Big Hassles
Big Analytics Without Big Hassles
 
Pythian operational visibility
Pythian operational visibilityPythian operational visibility
Pythian operational visibility
 
Technology behind-real-time-log-analytics
Technology behind-real-time-log-analytics Technology behind-real-time-log-analytics
Technology behind-real-time-log-analytics
 
Obfuscating LinkedIn Member Data
Obfuscating LinkedIn Member DataObfuscating LinkedIn Member Data
Obfuscating LinkedIn Member Data
 

Similaire à Artificial Intelligence on Data Centric Platform

BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple Spaces
Srinath Perera
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Johnny Miller
 
Distributed Multi-device Execution of TensorFlow – an Outlook
Distributed Multi-device Execution of TensorFlow – an OutlookDistributed Multi-device Execution of TensorFlow – an Outlook
Distributed Multi-device Execution of TensorFlow – an Outlook
Sebnem Rusitschka
 

Similaire à Artificial Intelligence on Data Centric Platform (20)

Refactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesRefactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid Architectures
 
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Amsterdam 2018
 
Neuromation.io AI Ukraine Presentation
Neuromation.io AI Ukraine PresentationNeuromation.io AI Ukraine Presentation
Neuromation.io AI Ukraine Presentation
 
Webinar: Deep Learning Pipelines Beyond the Learning
Webinar: Deep Learning Pipelines Beyond the LearningWebinar: Deep Learning Pipelines Beyond the Learning
Webinar: Deep Learning Pipelines Beyond the Learning
 
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2
 
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep LearningTensorFlow on Spark: A Deep Dive into Distributed Deep Learning
TensorFlow on Spark: A Deep Dive into Distributed Deep Learning
 
FIWARE Global Summit - Big Data and Machine Learning with FIWARE
FIWARE Global Summit - Big Data and Machine Learning with FIWAREFIWARE Global Summit - Big Data and Machine Learning with FIWARE
FIWARE Global Summit - Big Data and Machine Learning with FIWARE
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple Spaces
 
distributed system lab materials about ad
distributed system lab materials about addistributed system lab materials about ad
distributed system lab materials about ad
 
introduction to advanced distributed system
introduction to advanced distributed systemintroduction to advanced distributed system
introduction to advanced distributed system
 
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018 Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
Deep learning beyond the learning - Jörg Schad - Codemotion Rome 2018
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Graph Gurus Episode 7: Connecting the Dots in Real-Time: Deep Link Analysis w...
Graph Gurus Episode 7: Connecting the Dots in Real-Time: Deep Link Analysis w...Graph Gurus Episode 7: Connecting the Dots in Real-Time: Deep Link Analysis w...
Graph Gurus Episode 7: Connecting the Dots in Real-Time: Deep Link Analysis w...
 
Deep Learning Demystified
Deep Learning DemystifiedDeep Learning Demystified
Deep Learning Demystified
 
Distributed Multi-device Execution of TensorFlow – an Outlook
Distributed Multi-device Execution of TensorFlow – an OutlookDistributed Multi-device Execution of TensorFlow – an Outlook
Distributed Multi-device Execution of TensorFlow – an Outlook
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks Summit
 

Plus de Stratio

Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
Stratio
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
Stratio
 

Plus de Stratio (20)

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka Meetup
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science Meetup
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph Datasources
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] Sparkta
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in Cassandra
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
 
Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming Spark Summit - Stratio Streaming
Spark Summit - Stratio Streaming
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Artificial Intelligence on Data Centric Platform

  • 1. By: Fernando Velasco Training Big Data Spain http://xlic.es/v/7208D8
  • 2. © Stratio 2016. Confidential, All Rights Reserved. 2 A man needs three names
  • 3. © Stratio 2016. Confidential, All Rights Reserved. 3 ● Mathematician A man needs three names
  • 4. © Stratio 2016. Confidential, All Rights Reserved. 4 ● Data Scientist ● Mathematician A man needs three names
  • 5. © Stratio 2016. Confidential, All Rights Reserved. 5 ● Data Scientist ● Mathematician ● Stratian A man needs three names
  • 6. © Stratio 2016. Confidential, All Rights Reserved. 6 ● Data Scientist ● Mathematician ● Stratian fvelasco@stratio.com A man needs three names
  • 7. 1 2 3 4 5 © Stratio 2016. Confidential, All Rights Reserved. INDEX Introduction Data Centric Environment ● Distributed TensorFlow example. Keras Neural Nets ● BackPropagation Recurrent Neural Networks ● LSTM Autoencoders ● Data Augmentation ● VAE
  • 8. 1 2 3 4 5 © Stratio 2016. Confidential, All Rights Reserved. INDEX Introduction Data Centric Environment ● Distributed TensorFlow example. Keras Neural Nets ● BackPropagation Recurrent Neural Networks ● LSTM Autoencoders ● Data Augmentation ● VAE
  • 9. © Stratio 2016. Confidential, All Rights Reserved. Who are we? Where do we come from? Where are we going? 9
  • 10. © Stratio 2016. Confidential, All Rights Reserved. Who are we? Where do we come from? Where are we going? 10
  • 11. © Stratio 2016. Confidential, All Rights Reserved. 11
  • 12. © Stratio 2016. Confidential, All Rights Reserved. 12
  • 13. © Stratio 2016. Confidential, All Rights Reserved. 13
  • 15. © Stratio 2016. Confidential, All Rights Reserved. Data Centricity 15 Mobile APP Campaign Management E-commerce Digital Marketing Legacy Application Call centerSAP : ERP ATG TPV APP CRM
  • 16. © Stratio 2016. Confidential, All Rights Reserved. Data Centricity 16 DATA Mobile APP Campaign Management E-commerce Digital Marketing Legacy Application Call centerSAP : ERP ATG TPV APP CRM Data Intelligence API DaaS
  • 17. © Stratio 2016. Confidential, All Rights Reserved. Data Centricity 17 DATA Mobile APP Campaign Management E-commerce Digital Marketing Legacy Application Call center SAP: ERP ATG TPV APP CRM Data Intelligence API DaaS ● Unique data at the center, surrounded by applications that use it in real time, gaining maximum data intelligence ● In order to allow simultaneous updates, consistency is eventual ● Applications use the microservices in the DaaS layer to access the data ● The Data Intelligence layer gives applications access to the data intelligence capabilities ● Applications are developed through microservices orchestration
  • 18. Environment Summary Multiuser Environment Manage users and provision of notebooks Analytic Environment User 1 front-end User N front-end User 1 back-end User code Analytic Environment User N back-end
  • 19. © Stratio 2016. Confidential, All Rights Reserved. tf.motivation 19 ● Growing Community: One of the main reasons to use TensorFlow is the huge community behind it. TensorFlow is widely known and used. ● Great Technical Capabilities: - Multi-GPU support - Distributed training - Queues for operations like data loading and preprocessing on the graph. - Graph visualization using TensorBoard. - Model checkpointing. - High Performance and GPU memory usage optimization ● High-quality metaframeworks: Keras because of TensorFlow and perhaps also TensorFlow because of Keras. Once again both lead the list of Deep Learning libraries. Continuous release schedule and maintenance. New features and tests are integrated first so that early adopters can try them before documentation. This is great for such a big community and allows the framework to keep improving.
  • 20. Distribution strategies: Data vs. Model Parallelism When splitting the training of a neural network across multiple compute nodes, two strategies are commonly employed: ● Data parallelism: individual instances of the model are created on each node and fed different training samples; this allows for higher training throughput. ● Model parallelism: a single instance of the model is split across multiple nodes, allowing larger models, ones which may not necessarily fit in the memory of a single node, to be trained. ● Mixed: if desired, these two strategies can also be composed, resulting in multiple instances of a given model, with each instance spanning multiple nodes.
  • 21. Distributed Computation Synchrony There are many ways to specify a distributed structure in TensorFlow. Possible approaches include: Asynchronous training: each replica of the graph has an independent training loop that executes without coordination. It is compatible with both forms of replication above. Synchronous training: all of the replicas read the same values for the current parameters, compute gradients in parallel, and then apply them together. It is compatible with in-graph replication (e.g. using gradient averaging) and between-graph replication.
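To make this concrete, here is a minimal sketch of between-graph replication using the TF 1.x-era tf.train API (the host addresses, task index and toy linear model are illustrative assumptions, not part of the deck):

```python
# A minimal sketch of between-graph, asynchronous replication (TF 1.x API).
# Host:port addresses and the toy model are placeholder assumptions.
import numpy as np
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],                 # parameter server(s)
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],         # training replicas
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Variables live on the ps job; ops run on this worker.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    x = tf.placeholder(tf.float32, [None, 10])
    y = tf.placeholder(tf.float32, [None, 1])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    global_step = tf.train.get_or_create_global_step()
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=global_step)

# Each worker runs its own independent (asynchronous) training loop.
hooks = [tf.train.StopAtStepHook(last_step=1000)]
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=True, hooks=hooks) as sess:
    while not sess.should_stop():
        bx = np.random.randn(32, 10).astype("float32")  # synthetic minibatch
        by = np.random.randn(32, 1).astype("float32")
        sess.run(train_op, feed_dict={x: bx, y: by})
```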
  • 22. © Stratio 2016. Confidential, All Rights Reserved. tf.motivation 22 ● Growing Community: One of the main reasons to use TensorFlow is the huge community behind it. TensorFlow is widely known and used. ● Great Technical Capabilities: - Multi-GPU support - Distributed training - Queues for operations like data loading and preprocessing on the graph. - Graph visualization using TensorBoard. - Model checkpointing. - High Performance and GPU memory usage optimization ● High-quality metaframeworks: Keras because of TensorFlow and perhaps also TensorFlow because of Keras. Once again both lead the list of Deep Learning libraries. Continuous release schedule and maintenance. New features and tests are integrated first so that early adopters can try them before documentation. This is great for such a big community and allows the framework to keep improving.
  • 23. © Stratio 2016. Confidential, All Rights Reserved. Who are we? Where do we come from? Where are we going? 23 Stimulating the brain
  • 24. © Stratio 2016. Confidential, All Rights Reserved. Let me introduce you to my friend Cajal. He knew something about neurons 24
  • 25. © Stratio 2016. Confidential, All Rights Reserved. Let me introduce you to my friend Cajal. He knew something about neurons 25 dendrite
  • 26. © Stratio 2016. Confidential, All Rights Reserved. Let me introduce you to my friend Cajal. He knew something about neurons 26 dendrite axon
  • 27. © Stratio 2016. Confidential, All Rights Reserved. Let me introduce you to my friend Cajal. He knew something about neurons 27 dendrite axon synapses: impulse transmission
  • 28. © Stratio 2016. Confidential, All Rights Reserved. Building the structures: how can we define a neuron? 28
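One possible answer, sketched in NumPy (the numbers are arbitrary, for illustration only): an artificial neuron aggregates its inputs like dendrites, weights them like synapses, and fires through an activation like an axon.

```python
# A minimal sketch of one artificial neuron: dendrites ~ weighted inputs,
# synapse strengths ~ weights, axon output ~ activation.
import numpy as np

def neuron(x, w, b):
    """Weighted sum of inputs followed by a nonlinear activation."""
    z = np.dot(w, x) + b             # aggregate the incoming impulses
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid "firing rate"

x = np.array([0.5, -1.2, 3.0])       # incoming signals
w = np.array([0.4, 0.1, -0.6])       # synaptic weights
print(neuron(x, w, b=0.1))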
  • 29. © Stratio 2016. Confidential, All Rights Reserved. Layers, layers, layers 29 Activation Functions
  • 30. © Stratio 2016. Confidential, All Rights Reserved. Layers, layers, layers 30 Activation Functions
  • 31. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Basics 31 Input hidden hidden hidden Output
  • 32. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Basics 32 Forward Propagation: get a result Input hidden hidden hidden Output
  • 33. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Basics 33 Forward Propagation: get a result Input hidden hidden hidden Output Error Estimation: evaluate performances
  • 34. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Basics 34 Forward Propagation: get a result Backward Propagation: who’s to blame? Input hidden hidden hidden Output Error Estimation: evaluate performances
  • 35. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Basics 35 Forward Propagation: get a result Backward Propagation: who’s to blame? Input hidden hidden hidden Output Error Estimation: evaluate performances ● A cost function C is defined ● Every parameter has its impact on the cost given some training examples ● Impacts are computed in terms of derivations ● Use the chain rule to propagate error backwards
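The four steps above fit in a few lines of NumPy. This is a toy one-hidden-layer network on synthetic data (shapes and learning rate are illustrative assumptions), with the chain rule written out explicitly:

```python
# Toy illustration of forward prop, error estimation and backward prop
# on a 1-hidden-layer net with an MSE cost C.
import numpy as np

rng = np.random.RandomState(0)
X, Y = rng.randn(64, 3), rng.randn(64, 1)      # synthetic data
W1, W2 = rng.randn(3, 4) * 0.1, rng.randn(4, 1) * 0.1

sigmoid = lambda z: 1 / (1 + np.exp(-z))
for step in range(100):
    # Forward propagation: get a result
    h = sigmoid(X @ W1)
    y_hat = h @ W2
    # Error estimation: evaluate performance
    C = np.mean((y_hat - Y) ** 2)
    # Backward propagation: who's to blame? (chain rule)
    d_yhat = 2 * (y_hat - Y) / len(X)          # dC/dy_hat
    dW2 = h.T @ d_yhat                         # dC/dW2
    d_h = d_yhat @ W2.T * h * (1 - h)          # propagate through sigmoid
    dW1 = X.T @ d_h                            # dC/dW1
    W1 -= 0.1 * dW1                            # gradient step
    W2 -= 0.1 * dW2
```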
  • 36. Activation Functions: Outputs ● Linear ● Binomial: sigmoid ● Multinomial: softmax
  • 37. Sigmoid and ReLU functions - Bounded - Probability-like function - Dense computation - Differentiable - Used in many examples of fully connected layers
  • 38. Sigmoid and ReLU functions - Bounded - Probability-like function - Dense computation - Differentiable - Used in many examples of fully connected layers We are too cool to speak about linear activators, aren’t we? Not entirely...
  • 39. Sigmoid and ReLU functions - Sparse activation - Efficient computation - “Differentiable” - Unbounded - Potential “dying ReLU” problem - Convolutional-friendly - Bounded - Probability-like function - Dense computation - Differentiable - Used in many examples of fully connected layers We are too cool to speak about linear activators, aren’t we? Not entirely...
  • 40. Hyperbolic Tangent - Bounded - Positive/negative values - Dense computation - Differentiable - Well suited to LSTM-like architectures
  • 41. Softmax - Represents probability on a categorical distribution - Multiclass normalization - Bounded - Differentiable - Used on final layers
  • 42. Activation Functions: Outputs. Differentiation is the key
  • 43. On the ease of Derivations ● Sigmoid ● Hyperbolic Tangent ● ReLU ● Softmax
  • 44. On the ease of Derivations ● Sigmoid ● Hyperbolic Tangent ● ReLU ● Softmax Hand-set value (e.g. the ReLU derivative at the origin)
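A short NumPy sketch of why these derivations are cheap: each derivative reuses the already-computed forward value (and, as noted above, the ReLU derivative at 0 is a value set by hand):

```python
# Activations and their cheap derivatives; each derivative is expressed
# in terms of the forward output itself.
import numpy as np

def sigmoid(z):   return 1 / (1 + np.exp(-z))
def d_sigmoid(s): return s * (1 - s)             # s = sigmoid(z)
def d_tanh(t):    return 1 - t ** 2              # t = tanh(z)
def relu(z):      return np.maximum(0, z)
def d_relu(z):    return (z > 0).astype(float)   # value at 0 set by hand
def softmax(z):
    e = np.exp(z - z.max())                      # numerically stable
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(d_sigmoid(sigmoid(z)), d_tanh(np.tanh(z)), d_relu(z), softmax(z))
```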
  • 45. Activation Functions: Outputs ● Linear ● Binomial: sigmoid ● Multinomial: softmax. Loss Functions
  • 46. Regression error ● Squared error (e.g. MSE/RMSE): the most classic measure; penalizes big mistakes heavily; less interpretable ● Relative (percentage-style) error: scale invariant; symmetric; interpretable; harder differentiability and convergence ● Absolute error: penalizes less on higher mistakes; interpretable; harder differentiability and convergence
  • 47. Regression error ● Squared error (e.g. MSE/RMSE): the most classic measure; penalizes big mistakes heavily; less interpretable ● Relative (percentage-style) error: scale invariant; symmetric; interpretable; harder differentiability and convergence ● Absolute error: penalizes less on higher mistakes; interpretable; harder differentiability and convergence. The choice is always problem-dependent
  • 48. Cost Functions ● Regression ● Classification. The shortest way is not always the best one
  • 49. Classification and Categorical Cross-Entropy ● Categorical Cross-Entropy: L = -Σ_i Σ_j y_ij log(p_ij), where the indexes i and j stand for each example and its respective class, the y are the true labels and the p their assigned probabilities. On two classes it turns into the easy-to-understand, most common binary form: L = -Σ_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)]. When compared to accuracy, cross-entropy turns out to be a more granular way to compute error, as it takes into account the closeness of a prediction. Its derivative also eases the calculus compared with RMSE.
  • 50. Classification and Categorical Cross-Entropy ● Categorical Cross-Entropy: L = -Σ_i Σ_j y_ij log(p_ij), where the indexes i and j stand for each example and its respective class, the y are the true labels and the p their assigned probabilities. On two classes it turns into the easy-to-understand, most common binary form: L = -Σ_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)]. When compared to accuracy, cross-entropy turns out to be a more granular way to compute error, as it takes into account the closeness of a prediction. Its derivative also eases the calculus compared with RMSE. Classifier 1 Classifier 2
  • 51. Classification and Categorical Cross-Entropy ● Categorical Cross-Entropy: L = -Σ_i Σ_j y_ij log(p_ij), where the indexes i and j stand for each example and its respective class, the y are the true labels and the p their assigned probabilities. On two classes it turns into the easy-to-understand, most common binary form: L = -Σ_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)]. When compared to accuracy, cross-entropy turns out to be a more granular way to compute error, as it takes into account the closeness of a prediction. Its derivative also eases the calculus compared with RMSE. Classifier 1 Classifier 2
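The granularity point can be seen with two hypothetical classifiers (the probabilities below are made up for illustration): both have the same accuracy, but cross-entropy tells them apart.

```python
# Same accuracy, different cross-entropy: CE rewards being confidently right.
import numpy as np

y = np.array([1, 0, 1])                # true binary labels
p1 = np.array([0.60, 0.40, 0.60])      # classifier 1: barely right
p2 = np.array([0.95, 0.05, 0.95])      # classifier 2: confidently right

def binary_ce(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print((p1 > 0.5).astype(int), (p2 > 0.5).astype(int))  # same predictions
print(binary_ce(y, p1), binary_ce(y, p2))              # CE: ~0.51 vs ~0.05
```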
  • 53. Regularization: Norm penalties ● Add a penalty to the loss function: ● L2: ○ Keep weights near zero. ○ Simplest one, differentiable. ● L1: ○ Sparse results, feature selection. ○ Not differentiable, slower.
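In Keras, adding the penalty to the loss is one argument per layer. A minimal sketch (the input shape and the 0.01 penalty weight are arbitrary choices for illustration):

```python
# Norm penalties added to the loss via kernel_regularizer.
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,),
          kernel_regularizer=regularizers.l2(0.01)),  # keeps weights near zero
    Dense(1, activation='sigmoid',
          kernel_regularizer=regularizers.l1(0.01)),  # encourages sparsity
])
model.compile(optimizer='sgd', loss='binary_crossentropy')
```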
  • 54. Regularization: Dropout ● Randomly drop neurons (along with their connections) during training. ● Acts like adding noise. ● Very effective, computationally inexpensive. ● Ensemble of all sub-networks generated.
  • 55. Regularization: Dropout ● Randomly drop neurons (along with their connections) during training. ● Acts like adding noise. ● Very effective, computationally inexpensive. ● Ensemble of all sub-networks generated.
  • 56. Regularization: Dropout ● Randomly drop neurons (along with their connections) during training. ● Acts like adding noise. ● Very effective, computationally inexpensive. ● Ensemble of all sub-networks generated.
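In Keras, dropout is a layer that is only active at training time. A minimal sketch (0.5 is the classic drop rate, chosen here for illustration):

```python
# Dropout as computationally inexpensive regularization.
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(20,)),
    Dropout(0.5),                  # randomly drop half the activations
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```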
  • 58. Optimization: Challenges ● The difficulty in training neural networks is mainly attributed to their optimization part. ● Plateaus, saddle points and local minima grow exponentially with the dimension. ● Classical convex optimization algorithms don’t perform well.
  • 59. Optimization: Batch Gradient descent ● Goes over the whole training set. ● Very expensive. ● There isn’t an easy way to incorporate new data into the training set.
  • 60. Optimization: Mini-Batch Gradient descent ● Stochastic Gradient Descent (SGD) ● Randomly sample a small number of examples (a minibatch) ● Estimate the cost function and its gradient on the minibatch ● Batch size: length of the minibatch ● Iteration: every time we update the weights ● Epoch: one pass over the whole training set ● k = 1 => online learning ● Small batches => regularization (see the sketch below)
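The vocabulary in code, on a toy linear model with synthetic data: 1500 examples with batch size 500 means one epoch takes exactly 3 iterations.

```python
# Minibatch SGD: epochs, iterations and batch size made explicit.
import numpy as np

rng = np.random.RandomState(0)
X, y = rng.randn(1500, 5), rng.randn(1500)      # synthetic training set
w, lr, batch_size = np.zeros(5), 0.1, 500

for epoch in range(10):                         # one pass over the data
    idx = rng.permutation(len(X))               # random sampling
    for start in range(0, len(X), batch_size):  # 3 iterations per epoch
        b = idx[start:start + batch_size]       # one minibatch
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient estimate
        w -= lr * grad                          # one iteration = one update
```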
  • 61. Optimization: Variants ● Momentum: the momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction. ● AdaGrad: the learning rate is adapted component-wise, dividing by the square root of the sum of squares of the historical gradients. ● RMSProp: modifies AdaGrad to perform better in the non-convex setting by changing the gradient accumulation into an exponentially weighted moving average. ● ADAM (Adaptive Moment): a combination of RMSProp and momentum.
  • 62. Momentum basics (diagram): the real movement combines the negative of the gradient with the accumulated momentum.
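The momentum update itself is two lines. A toy sketch on f(w) = w², with an assumed decay of 0.9:

```python
# Momentum: v accumulates an exponentially decaying average of gradients.
w, v = 5.0, 0.0
beta, lr = 0.9, 0.1
grad = lambda w: 2 * w             # gradient of f(w) = w**2

for _ in range(50):
    v = beta * v - lr * grad(w)    # decayed gradient history
    w = w + v                      # real movement = momentum step
print(w)                           # approaches the minimum at 0
```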
  • 63. © Stratio 2016. Confidential, All Rights Reserved. A fistful of cool applications 63 Not all that wander are lost
  • 64. © Stratio 2016. Confidential, All Rights Reserved. A fistful of cool applications 64 Not all that wander are lost Object Classification and Detection
  • 65. © Stratio 2016. Confidential, All Rights Reserved. A fistful of cool applications 65 Not all that wander are lost Object Classification and Detection RBM on Recommender Systems
  • 66. © Stratio 2016. Confidential, All Rights Reserved. A fistful of cool applications 66 Not all that wander are lost Object Classification and Detection Instant Visual translation RBM on Recommender Systems
  • 67. © Stratio 2016. Confidential, All Rights Reserved. A fistful of cool applications 67 Not all that wander are lost Object Classification and Detection Instant Visual translation RBM on Recommender Systems Generative Models (GAN/VAE)
  • 69. © Stratio 2016. Confidential, All Rights Reserved. tf.motivation 69 ● Growing Community: One of the main reasons to use TensorFlow is the huge community behind it. TensorFlow is widely known and used. ● Great Technical Capabilities: - Multi-GPU support - Distributed training - Queues for operations like data loading and preprocessing on the graph. - Graph visualization using TensorBoard. - Model checkpointing. - High Performance and GPU memory usage optimization ● High-quality metaframeworks: Keras because of TensorFlow and perhaps also TensorFlow because of Keras. Once again both lead the list of Deep Learning libraries. Continuous release schedule and maintenance. New features and tests are integrated first so that early adopters can try them before documentation. This is great for such a big community and allows the framework to keep improving.
  • 71. Welcome to the jungle! ● Me Tarzan, you Cheetah. Human-friendly interface: user actions are minimized in order to ease the process, isolating users from the backend. ● Territorial behaviors are allowed. Several backends can be used: TensorFlow, CNTK and Theano (poor Theano!), but there is also another interesting property in modularization: every model is a sequence of standalone modules plugged together with as few restrictions as possible, allowing us to fully configure cost functions, optimizers, initializations, activation functions... ● Keeps your model herd a-growin’. New modules are simple to add, and existing modules provide ample examples. ● Kaa is our friend. We love Python! It makes the lives of data scientists easier: the code is compact, easier to debug, and allows for easy extensibility.
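The modularity claim in a minimal sketch: standalone layers plugged together, with activation, initialization, optimizer and cost all configurable (sizes are arbitrary for illustration):

```python
# Keras modularity: every piece of the model is a configurable module.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

model = Sequential()
model.add(Dense(64, activation='tanh',
                kernel_initializer='glorot_uniform', input_shape=(100,)))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=RMSprop(lr=0.001),          # configurable optimizer
              loss='categorical_crossentropy',      # configurable cost
              metrics=['accuracy'])
model.summary()
```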
  • 72. © Stratio 2016. Confidential, All Rights Reserved. Ever felt lost in Automatic Translation? 72
  • 73. Not all that wander are lost. What do we say to those who think machine translation sucks? Not today!
  • 74. © Stratio 2016. Confidential, All Rights Reserved. Neural Machine Translation Idea 74 Not all that wander are lost. Encoder: words => hidden state. Decoder: hidden state => words. Hidden states are not entirely universal languages!!
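A minimal Keras sketch of that encoder-decoder idea (token counts and the latent dimension are illustrative assumptions): the encoder compresses the source sentence into its final states, which initialize the decoder.

```python
# Seq2seq skeleton: encoder states become the decoder's initial state.
from keras.models import Model
from keras.layers import Input, LSTM, Dense

num_src_tokens, num_tgt_tokens, latent_dim = 1000, 1200, 256

enc_in = Input(shape=(None, num_src_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_in)

dec_in = Input(shape=(None, num_tgt_tokens))
dec_seq, _, _ = LSTM(latent_dim, return_sequences=True,
                     return_state=True)(dec_in,
                                        initial_state=[state_h, state_c])
dec_out = Dense(num_tgt_tokens, activation='softmax')(dec_seq)

model = Model([enc_in, dec_in], dec_out)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```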
  • 75. © Stratio 2016. Confidential, All Rights Reserved. Attention Basics 75 ● Not every word is a one-to-one translation ● Weighting the whole input sequence increases computation time ● Some other, more human, approaches can be taken (e.g. reinforcement learning)
  • 76. © Stratio 2016. Confidential, All Rights Reserved. 76 Sequential Data
  • 77. © Stratio 2016. Confidential, All Rights Reserved. Sequence Statement 77 ● Most machine learning algorithms are designed for independent, unordered data. ● Many real problems use sequential data: ○ Time series, behavior, audio signals… ○ t does not have to be time: it can be a spatial measure (images) or any order measure (recommender systems) ● Sequences are a natural way of representing reality: vision, hearing, action-reaction, words, sentences, etc. ● Don’t forget: order matters!!
  • 78. © Stratio 2016. Confidential, All Rights Reserved. Introducing Recurrent Neural Networks 78 ● Neural Networks with recurrent connections, specialized in processing sequential data. ● Recurrent connections allow a ‘memory’ of previous inputs. ● They can scale to long sequences (variable length), which is not practical for other types of nets. ● Same parameters for every timestep (t) => generalization RNN images by Christopher Olah
  • 79. © Stratio 2016. Confidential, All Rights Reserved. Recurrent Neural Networks Architecture 79 0 1 2 t Looping the loop: Backpropagation Through Time ● Same idea as in standard backpropagation, but the recurrent net needs to be unfolded through time for a certain number of timesteps. ● The weight changes calculated for each network copy are summed before individual weights are adapted. ● The set of weights for each copy (timestep) always remains the same.
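The unfolding as a toy NumPy forward pass (dimensions are illustrative): note that the same W, U and b are reused at every timestep, which is exactly what BPTT exploits.

```python
# Unfolding a vanilla RNN: the same parameters at every timestep.
import numpy as np

rng = np.random.RandomState(0)
T, n_in, n_hid = 5, 3, 4
W = rng.randn(n_hid, n_hid) * 0.1   # hidden-to-hidden weights (shared)
U = rng.randn(n_hid, n_in) * 0.1    # input-to-hidden weights (shared)
b = np.zeros(n_hid)

h = np.zeros(n_hid)                 # initial state
xs = rng.randn(T, n_in)             # a length-T input sequence
for t in range(T):                  # one "network copy" per timestep
    h = np.tanh(W @ h + U @ xs[t] + b)
print(h)                            # final hidden state (the 'memory')
```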
  • 80. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Through Time 80 0 1 2 t ● Cost function: L = Σ_i L_i, where each L_i stands for the usual cost on one timestep (e.g. MSE on regression) ● Network parameters depend on the parameters of the previous timestep. So do derivatives during backprop. ● Chain rule application leads to a lot of derivative products.
  • 81. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Through Time 81 0 1 2 t ● Cost function: L = Σ_i L_i, where each L_i stands for the usual cost on one timestep (e.g. MSE on regression) ● Network parameters depend on the parameters of the previous timestep. So do derivatives during backprop. ● Chain rule application leads to a lot of derivative products.
  • 82. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Through Time 82 0 1 2 t ● Cost function: L = Σ_i L_i, where each L_i stands for the usual cost on one timestep (e.g. MSE on regression) ● Network parameters depend on the parameters of the previous timestep. So do derivatives during backprop. ● Chain rule application leads to a lot of derivative products. BackProp
  • 83. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Through Time 83 0 1 2 t ● Cost function: L = Σ_i L_i, where each L_i stands for the usual cost on one timestep (e.g. MSE on regression) ● Network parameters depend on the parameters of the previous timestep. So do derivatives during backprop. ● Chain rule application leads to a lot of derivative products. BackProp
  • 84. © Stratio 2016. Confidential, All Rights Reserved. BackPropagation Through Time 84 0 1 2 t ● Cost function: L = Σ_i L_i, where each L_i stands for the usual cost on one timestep (e.g. MSE on regression) ● Network parameters depend on the parameters of the previous timestep. So do derivatives during backprop. ● Chain rule application leads to a lot of derivative products. BackProp
  • 85. © Stratio 2016. Confidential, All Rights Reserved. 85 Beware of the Vanishing Gradient!!
  • 86. © Stratio 2016. Confidential, All Rights Reserved. Gradients in time 86 ● Backpropagating the error in time involves as many recurrent derivative terms as timesteps in the net. ● This becomes problematic if the values of the recurrent matrix W are too large (gradients explode) or too small (gradients vanish). ● In the vanishing case, the very first timesteps have no influence on the result, as no memory related to them survives.
  • 88. © Stratio 2016. Confidential, All Rights Reserved. LSTM Briefing (Sepp Hochreiter and Jürgen Schmidhuber, 1997) 88 ● Timesteps are still the key ● Each timestep receives an input ● From now on, we are going to have two connections (states) ● And up to three outputs, two of them states: Long and Short ● The third output (if it exists or is considered) is similar to the classic output LSTM images also by Christopher Olah
  • 90. © Stratio 2016. Confidential, All Rights Reserved. LSTM Briefing (II) 90 ● Each timestep may have one or more units ● Each state corresponds to each kind of memory at play: Long and Short ● Inside each cell, there are four questions asked: ○ Which part of the Long memory has to be deleted? ○ From the new info, is there anything interesting to be remembered? ○ If there is, How do we combine it along with the Long memory? ○ What is the Short term impression for this step?
  • 91. © Stratio 2016. Confidential, All Rights Reserved. LSTM Briefing (II) 91 ● Each timestep may have one or more units ● Each state corresponds to each kind of memory at play: Long and Short ● Inside each cell, there are four questions asked: ○ Which part of the Long memory has to be deleted? ○ From the new info, is there anything interesting to be remembered? ○ If there is, How do we combine it along with the Long memory? ○ What is the Short term impression for this step? forget gate f
  • 92. © Stratio 2016. Confidential, All Rights Reserved. LSTM Briefing (II) 92 ● Each timestep may have one or more units ● Each state corresponds to each kind of memory at play: Long and Short ● Inside each cell, there are four questions asked: ○ Which part of the Long memory has to be deleted? ○ From the new info, is there anything interesting to be remembered? ○ If there is, How do we combine it along with the Long memory? ○ What is the Short term impression for this step? input gate i, forget gate f
  • 93. © Stratio 2016. Confidential, All Rights Reserved. LSTM Briefing (II) 93 ● Each timestep may have one or more units ● Each state corresponds to each kind of memory at play: Long and Short ● Inside each cell, there are four questions asked: ○ Which part of the Long memory has to be deleted? ○ From the new info, is there anything interesting to be remembered? ○ If there is, How do we combine it along with the Long memory? ○ What is the Short term impression for this step? input gate i, forget gate f, candidate gate c
  • 94. © Stratio 2016. Confidential, All Rights Reserved. LSTM Briefing (II) 94 ● Each timestep may have one or more units ● Each state corresponds to each kind of memory at play: Long and Short ● Inside each cell, there are four questions asked: ○ Which part of the Long memory has to be deleted? ○ From the new info, is there anything interesting to be remembered? ○ If there is, How do we combine it along with the Long memory? ○ What is the Short term impression for this step? input gate i, forget gate f, candidate gate c, output gate o
  • 95. © Stratio 2016. Confidential, All Rights Reserved. Inside a LSTM Cell (I) 95 Focusing on the forget gate, the question is answered as follows: f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h is the activation, b is the associated bias and W_f is the weight matrix of the forget gate. Or, in a more explicit way: f_t = σ(W_fx · x_t + W_fh · h_{t-1} + b_f), where W_fx is the input weight matrix (the classic one) and W_fh is the hidden state matrix between timesteps. In a similar way, one can express the input and output gate equations: i_t = σ(W_ix · x_t + W_ih · h_{t-1} + b_i) and o_t = σ(W_ox · x_t + W_oh · h_{t-1} + b_o). There are some differences in the candidate gate, mainly related to its activation function, the hyperbolic tangent. In the same notation: c̃_t = tanh(W_cx · x_t + W_ch · h_{t-1} + b_c)
  • 96. © Stratio 2016. Confidential, All Rights Reserved. Inside a LSTM Cell (I) 96 Focusing on the forget gate, the question is answered as follows: f_t = σ(W_f · [h_{t-1}, x_t] + b_f), where h is the activation, b is the associated bias and W_f is the weight matrix of the forget gate. Or, in a more explicit way: f_t = σ(W_fx · x_t + W_fh · h_{t-1} + b_f), where W_fx is the input weight matrix (the classic one) and W_fh is the hidden state matrix between timesteps. In a similar way, one can express the input and output gate equations: i_t = σ(W_ix · x_t + W_ih · h_{t-1} + b_i) and o_t = σ(W_ox · x_t + W_oh · h_{t-1} + b_o). There are some differences in the candidate gate, mainly related to its activation function, the hyperbolic tangent. In the same notation: c̃_t = tanh(W_cx · x_t + W_ch · h_{t-1} + b_c). tanh takes values in the [-1, 1] range; this way we are able to add and subtract on the Long term memory
  • 97. © Stratio 2016. Confidential, All Rights Reserved. Inside a LSTM Cell (II) 97 And finally, we can update the states, including the output: C_t = f_t ⊙ C_{t-1} + i_t ⊙ c̃_t. Or, in simpler words: we forget what is to be forgotten and we add what is to be added. At the very end, with the same tanh idea, we put Short and Long terms together: h_t = o_t ⊙ tanh(C_t)
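The whole cell fits in a short NumPy sketch implementing the gate equations above (toy dimensions; concatenating [h_{t-1}, x_t] into a single weight matrix per gate is an assumption made for brevity):

```python
# One LSTM timestep: four gates, then the two state updates.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    z = np.concatenate([h_prev, x])   # [h_{t-1}, x_t]
    f = sigmoid(Wf @ z + bf)          # which Long memory to erase?
    i = sigmoid(Wi @ z + bi)          # what new info to remember?
    c_tilde = np.tanh(Wc @ z + bc)    # candidate values in [-1, 1]
    o = sigmoid(Wo @ z + bo)          # Short term impression gate
    C = f * C_prev + i * c_tilde      # forget, then add
    h = o * np.tanh(C)                # Short and Long terms together
    return h, C

n_in, n_hid = 3, 4
rng = np.random.RandomState(0)
Ws = [rng.randn(n_hid, n_hid + n_in) * 0.1 for _ in range(4)]
bs = [np.zeros(n_hid) for _ in range(4)]
h, C = lstm_step(rng.randn(n_in), np.zeros(n_hid), np.zeros(n_hid), *Ws, *bs)
print(h, C)
```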
  • 98. So, LSTM nets are... that easy?
  • 99. © Stratio 2016. Confidential, All Rights Reserved. Cool Applications 99 Not all that wander are lost CNN + LSTM to describe pictures Film scripts. Yes, it’s for real
  • 100. © Stratio 2016. Confidential, All Rights Reserved. 100 The man who creates the network should write the code. Demo Time!!
  • 101. © Stratio 2016. Confidential, All Rights Reserved. 101
  • 102. © Stratio 2016. Confidential, All Rights Reserved. 102 AutoEncoders
  • 103. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 103 Input hidden hidden hidden Output ● Supervised neural networks try to predict labels from input data ● It is not always possible to obtain labels ● Unsupervised learning can help capture the data structure ● What if we make the output be the input?
  • 104. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 104 This is not the Generative Model you are looking for Input image
  • 105. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 105 This is not the Generative Model you are looking for Input image
  • 106. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 106 This is not the Generative Model you are looking for Input image
  • 107. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 107 This is not the Generative Model you are looking for Input image
  • 108. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 108 This is not the Generative Model you are looking for Input image Output image
  • 109. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Idea) 109 This is not the Generative Model you are looking for Input image Output image It tries to predict x from x, but no labels are needed. The idea is learning an approximation of the identity function. Along the way, some restrictions are placed: typically the hidden layers compress the data. The original input is represented at the output, even if it comes from noisy or corrupted data.
  • 110. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Encoder and decoder) 110 This is not the Generative Model you are looking for Input image Output image
  • 111. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Encoder and decoder) 111 This is not the Generative Model you are looking for Input image Output image Encode Decode
  • 112. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders (Encoder and decoder) 112 This is not the Generative Model you are looking for Input image Output image The latent space is commonly a narrow hidden layer between encoder and decoder. It learns the data structure. Encoder and decoder can share the same (inverted) structure or be different: each one can have its own depth (number of layers) and complexity. Encode Decode Latent Space
  • 113. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders BackPropagation 113 This is not the Generative Model you are looking for Input image Output image Encode Decode Latent Space
  • 114. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders BackPropagation 114 This is not the Generative Model you are looking for Input image Output image A cost function can be defined taking into account the differences between the input and Decoded(Encoded(input)). This allows BackProp to be carried along encoder and decoder. To prevent the learned composition from being the identity, some regularizations are applied; one of the most common is simply reducing the latent space dimension (i.e. compressing the data in the encoding), as in the sketch below. Encode Decode Latent Space BackPropagation
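A minimal Keras sketch of that setup (the 784-dimensional input, e.g. flattened 28x28 images, and the 32-unit bottleneck are illustrative assumptions):

```python
# Autoencoder: a narrow latent layer as regularization, with the loss
# comparing the input against Decoded(Encoded(input)).
from keras.models import Model
from keras.layers import Input, Dense

inp = Input(shape=(784,))                   # e.g. flattened 28x28 images
z = Dense(32, activation='relu')(inp)       # narrow latent space (encoder)
out = Dense(784, activation='sigmoid')(z)   # reconstruction (decoder)

autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, ...)    # x is both input and target
```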
  • 115. © Stratio 2016. Confidential, All Rights Reserved. Autoencoders Applications 115 Reduction of dimensionality Data Structure/Feature learning Denoising or data cleaning Pre-training deep networks
  • 116. © Stratio 2016. Confidential, All Rights Reserved. Data Augmentation 116
  • 117. © Stratio 2016. Confidential, All Rights Reserved. Data Augmentation 117
  • 118. © Stratio 2016. Confidential, All Rights Reserved. Data Augmentation 118 ● Specialized image and video classification tasks often have insufficient data. ● Traditional transformations consist of applying a combination of affine transformations to manipulate the training data (see the sketch below). ● Data augmentation has been shown to produce promising increases in the accuracy of classification tasks. ● While traditional augmentation alone is very effective, other techniques enabled by generative models have proved to be even better.
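Traditional affine augmentation, sketched with Keras' ImageDataGenerator (the parameter values below are illustrative choices, not prescriptions):

```python
# Affine transformations for data augmentation in Keras.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,        # random rotations
    width_shift_range=0.1,    # horizontal translations
    height_shift_range=0.1,   # vertical translations
    shear_range=0.1,          # shearing
    zoom_range=0.1,           # zooming
    horizontal_flip=True)     # mirroring

# model.fit_generator(datagen.flow(x_train, y_train, batch_size=32), ...)
```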
  • 119. © Stratio 2016. Confidential, All Rights Reserved. Generative Models (Idea) 119 Generative Models “What I cannot create, I do not understand.” —Richard Feynman
  • 120. © Stratio 2016. Confidential, All Rights Reserved. Generative Models (Idea) 120 ● They model how the data was generated in order to categorize a signal. ● Instead of modeling P(y|x), as the usual discriminative models do, the distribution under the hood is P(x, y). ● The number of parameters is significantly smaller than the amount of data on which they are trained. ● This forces the models to discover the data essence. ● What the model does is understand the world around the data, and provide good representations of it.
  • 121. © Stratio 2016. Confidential, All Rights Reserved. Generative Models Applications 121 ● Generate potentially unfeasible examples for Reinforcement Learning ● Denoising/Pretraining ● Structured prediction exploration in RL ● Entirely plausible generation of images to depict image/video ● Feature understanding
  • 122. © Stratio 2016. Confidential, All Rights Reserved. Generative Models Applications 122 ● Generate potentially unfeasible examples for Reinforcement Learning ● Denoising/Pretraining ● Structured prediction exploration in RL ● Entirely plausible generation of images to depict image/video ● Feature understanding
  • 123. © Stratio 2016. Confidential, All Rights Reserved. Variational Autoencoder Idea (I) 123 Input image Output image Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network
  • 124. © Stratio 2016. Confidential, All Rights Reserved. Variational Autoencoder Idea (II) 124 Input image Output image Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network
  • 125. © Stratio 2016. Confidential, All Rights Reserved. Variational Autoencoder Idea (II) 125 Output image Latent Space Mean Vector Standard Deviation Vector Decoder Network
  • 126. © Stratio 2016. Confidential, All Rights Reserved. Variational Autoencoder Idea (II) 126 Latent Space Mean Vector Standard Deviation Vector Decoder Network
  • 127. © Stratio 2016. Confidential, All Rights Reserved. Variational Autoencoder Idea (II) 127 Output image Latent Space Mean Vector Standard Deviation Vector Decoder Network Sample on Latent Space => Generate new representations Prior distribution
  • 129. © Stratio 2016. Confidential, All Rights Reserved. Latent Space Distribution (I) 129 Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network
  • 130. © Stratio 2016. Confidential, All Rights Reserved. Latent Space Distribution (II): VAE Loss function 130 Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network ● Encoder and decoder can be denoted as conditional probability representations of the data: q(z|x) for the encoder and p(x|z) for the decoder
  • 131. © Stratio 2016. Confidential, All Rights Reserved. Latent Space Distribution (II): VAE Loss function 131 Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network ● Encoder and decoder can be denoted as conditional probability representations of the data: q(z|x) for the encoder and p(x|z) for the decoder ● Typically the encoder reduces dimensions while the decoder increases them, so some information is lost when reconstructing the inputs. This information loss can be measured using the reconstruction log-likelihood: E_q(z|x)[log p(x|z)]
  • 132. © Stratio 2016. Confidential, All Rights Reserved. Latent Space Distribution (II): VAE Loss function 132 Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network ● Encoder and decoder can be denoted as conditional probability representations of the data: q(z|x) for the encoder and p(x|z) for the decoder ● Typically the encoder reduces dimensions while the decoder increases them, so some information is lost when reconstructing the inputs. This information loss can be measured using the reconstruction log-likelihood: E_q(z|x)[log p(x|z)] ● In order to keep the latent distribution under control, we can introduce a regularizer into the loss function: the Kullback-Leibler divergence between the encoder distribution and a given, known distribution such as the standard Gaussian: KL(q(z|x) || p(z)), with p(z) = N(0, I)
  • 133. © Stratio 2016. Confidential, All Rights Reserved. Latent Space Distribution (II): VAE Loss function 133 Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network ● Encoder and decoder can be denoted as conditional probability representations of the data: q(z|x) for the encoder and p(x|z) for the decoder ● Typically the encoder reduces dimensions while the decoder increases them, so some information is lost when reconstructing the inputs. This information loss can be measured using the reconstruction log-likelihood: E_q(z|x)[log p(x|z)] ● In order to keep the latent distribution under control, we can introduce a regularizer into the loss function: the Kullback-Leibler divergence between the encoder distribution and a given, known distribution such as the standard Gaussian: KL(q(z|x) || p(z)), with p(z) = N(0, I). The full loss is L = -E_q(z|x)[log p(x|z)] + KL(q(z|x) || p(z)) ● With this penalty in the loss, encoder outputs are forced to be sufficiently diverse: similar inputs will be kept close (smoothly) together in the latent space.
  • 135. © Stratio 2016. Confidential, All Rights Reserved. Latent Space Distribution (III): Probability overview 135 Latent Space Mean Vector Standard Deviation Vector Encoder Network Decoder Network ● The VAE contains a specific probability model of the data x and the latent variables z. ● We can write the joint probability of the model as p(x, z): “how likely is observation x under the joint distribution”. ● By definition, p(x, z) = p(x|z)p(z). ● In order to generate the data, the process is as follows: for each datapoint i, draw latent variables z_i ∼ p(z), then draw the datapoint x_i ∼ p(x|z). ● We need to figure out p(z) and p(x|z). ● The likelihood p(x|z) is the representation to be learnt by the decoder. ● The encoder likelihood can be used to estimate the parameters of the prior.
  • 136. © Stratio 2016. Confidential, All Rights Reserved. Variational Autoencoder: BackProp + reparametrization trick 136 ● VAEs are built by using Backpropagation on the previously defined loss function. ● Mean and variance estimations don’t get us z itself but its distribution parameters. ● In order to get z we could sample directly from the true posterior given those parameters, but sampling cannot be differentiated. ● Instead, a trick can be applied so that the non-differentiable part is left outside the network. ● By stating z = μ + σ ⊙ ε, with ε ∼ N(0, I), we can remove the sampling from the backprop part.
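The trick in Keras is a Lambda layer: epsilon is sampled outside the differentiable path, while mu and sigma stay inside it. A minimal sketch (input and latent dimensions are illustrative assumptions; the KL term from the previous slides would still need to be added to the loss):

```python
# Reparametrization trick: z = mu + sigma * eps, with eps ~ N(0, I).
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

latent_dim = 2
x = Input(shape=(784,))
h = Dense(256, activation='relu')(x)
z_mean = Dense(latent_dim)(h)        # mean vector
z_log_var = Dense(latent_dim)(h)     # (log-variance) std-dev vector

def sampling(args):
    mu, log_var = args
    eps = K.random_normal(shape=K.shape(mu))   # non-differentiable part
    return mu + K.exp(0.5 * log_var) * eps     # differentiable in mu, sigma

z = Lambda(sampling)([z_mean, z_log_var])
decoded = Dense(784, activation='sigmoid')(Dense(256, activation='relu')(z))
vae = Model(x, decoded)
# vae.compile(...) with reconstruction loss + the KL regularizer
```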
  • 137. Not all that wander are lost. Any Questions?

Editor's Notes

  1. Old theory, but now is the moment for representation learning.
  2. Two main software modules: PyStratio is a Python package providing complete access to all SparkML distributed algorithms via PySpark, as well as to Stratio Crossdata. RStratio is an R package that relies on SparkR and provides wrappers for SparkML distributed algorithms, a feature not supported by the official SparkR releases. Integration with other distributed libraries such as H2O and TensorSpark. TensorFlow over Spark. A REST API eases and speeds up the process of moving a trained model from development to production environments: the model becomes accessible through a web REST interface, and can be applied to get real-time predictions or to massive batch data.
  3. Distributed cluster: each user launches its own environment through JupyterHub, and each notebook runs independently. The data is kept locally to ease access and because it is small, but access to a complete datastore would be more data-centric.
  4. We will execute an algorithm as simple as approximating Pi via Monte Carlo. Each node executes its tasks in an independent way. It is in-graph and synchronous. The graph can be viewed via TensorBoard.
  5. Without going into detail: the output * (1 - output) term cancels out with cross-entropy, but not with RMSE, so there are no problems when we output very high probabilities.
  6. Setting α to 0 results in no regularization; larger values of α correspond to more regularization. Different norms, different results: the L2 derivative increases linearly with w, while in L1 it is the constant sign(w).
  7. Optimization algorithms that use the entire training set are called batch or deterministic gradient methods.
  8. For instance, if our train set has 1500 examples, and our batch size is 500, then it will take 3 iterations to complete 1 epoch.
  9. https://indico.io/blog/the-good-bad-ugly-of-tensorflow/
  10. Keras (κέρας) means horn in Greek. It is a reference to a literary image from ancient Greek and Latin literature, first found in the Odyssey, where dream spirits (Oneiroi, singular Oneiros) are divided between those who deceive men with false visions, who arrive to Earth through a gate of ivory, and those who announce a future that will come to pass, who arrive through a gate of horn. It's a play on the words κέρας (horn) / κραίνω (fulfill), and ἐλέφας (ivory) / ἐλεφαίρομαι (deceive). "Oneiroi are beyond our unravelling --who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them." Homer, Odyssey 19. 562 ff (Shewring translation).
  11. Example: red juxtaposition.
  12. Short term: neurotransmitters from one neuron to the next. Medium term: activation of terminals that previously did nothing. Long term: new terminals, which implies variations in gene expression, cell remodeling, etc.
  13. This is our programme. You will be able to see information and updates + our schedule by scanning this QR code.