https://www.youtube.com/watch?v=5ZUlVlumIQo&list=PLqJzTtkUiq54DDEEZvzisPlSGp_BadhNJ&index=10
In recent years, deep learning has advanced rapidly, with impressive results in several areas including computer vision, machine translation, and speech recognition. Deep learning attempts to learn complex functions through hierarchical representations of data. A deep learning model is composed of non-linear modules, each of which transforms the representation from a lower layer into a higher, more abstract one. Very complex functions can be learned by composing enough of these non-linear modules. Furthermore, the need for manual feature engineering can be obviated by learning the features themselves through representation learning. In this talk, we first explain how deep learning architectures in particular, and neural networks in general, are loosely inspired by the mammalian visual cortex and the nervous system, respectively. We also discuss the reasons for the big and successful comeback of neural networks in the form of deep learning models. Finally, we give a brief introduction to various deep architectures and their applications to several domains.
References:
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
Socher, Richard, Yoshua Bengio, and Christopher Manning. "Deep learning for NLP." Tutorial at the Association for Computational Linguistics (ACL), 2012, and the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
Lee, Honglak. "Tutorial on deep learning and applications." NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning. 2010.
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International Conference on Machine Learning (ICML’13). 2013.
Socher, Richard, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." Proceedings of the conference on empirical methods in natural language processing (EMNLP). Vol. 1631. 2013.
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
https://www.udacity.com/course/deep-learning--ud730
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Semantic, Cognitive and Perceptual Computing - Deep Learning
1. Brief Overview of Deep Networks
Monireh Ebrahimi
Semantic Cognitive Perceptual Computing Course, July 2016.
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis),
Wright State University, USA
2. What is deep learning?
• “Representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.”
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553
(2015): 436-444.
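The layer-by-layer composition in this definition can be sketched in a few lines of NumPy. This is an illustrative toy, not code from the talk: the sizes, weights, and ReLU choice are assumptions, and the weights are random rather than learned.

```python
import numpy as np

def module(x, W, b):
    """One simple non-linear module: affine transform followed by ReLU."""
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                      # raw input
W1, b1 = rng.standard_normal((16, 8)), np.zeros(16)
W2, b2 = rng.standard_normal((4, 16)), np.zeros(4)

h1 = module(x, W1, b1)                          # first-level representation
h2 = module(h1, W2, b2)                         # higher, more abstract representation
print(h2.shape)                                 # (4,)
```

Each call to `module` transforms the representation at one level into the one at the next, exactly in the sense of the quoted definition; training would adjust the weight matrices.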
3. What is deep learning? Learning Hierarchical Representations
• Successive model layers learn deeper intermediate representations.
Lee, Honglak. "Tutorial on deep learning and applications." NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning. 2010.
Socher, Richard, Yoshua Bengio, and Christopher Manning. "Deep learning for NLP." Tutorial at the Association for Computational Linguistics (ACL), 2012, and the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
4. What is deep learning? Learning Hierarchical Representations
• Image recognition: pixel → edge → texton → motif → part → object
• Text: character → word → word group → clause → sentence → story
• Speech: sample → spectral band → sound → … → phone → phoneme → word
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International
Conference on Machine Learning (ICML’13). 2013.
5. Why go deep?
• Does not require any manual feature engineering
• Deep architectures work well (vision, audio, NLP, etc.)!
– Speech recognition (2009)
– Computer vision (2012)
• In early 2015, a machine surpassed human-level performance on an object recognition challenge for the first time in the history of AI.
– Machine translation (2014)
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553
(2015): 436-444.
6. Biologically inspired: how does the cortex learn perception?
• Loosely inspired by biological neural networks (the central nervous system of animals), particularly the brain
7. “Let's be inspired by nature, but not too much”
• Which details are important?
• For airplanes, feathers and wing flapping weren't crucial.
• What is the equivalent of aerodynamics for understanding intelligence?
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International Conference
on Machine Learning (ICML’13). 2013.
8. Biologically Inspired: The Mammalian Visual Cortex is Hierarchical
• Retina → LGN → V1 → V2 → V4 → PIT → AIT
• Lots of intermediate representations
[picture from Simon Thorpe]
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International
Conference on Machine Learning (ICML’13). 2013.
12. RBM (Restricted Boltzmann Machine)
• A solution to the vanishing gradient problem
• Reconstructs the input and learns the features in the process.
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
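The reconstruction idea can be sketched as a single encode/decode pass of a binary RBM. A minimal sketch with assumed toy sizes and random (untrained) weights; real training, e.g. contrastive divergence, is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = 0.1 * rng.standard_normal((n_hidden, n_visible))   # shared weights
b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)     # hidden/visible biases

v = rng.integers(0, 2, size=n_visible).astype(float)   # a binary input vector

# forward pass: encode the input into hidden-unit activation probabilities
p_h = sigmoid(W @ v + b_h)
h = (rng.random(n_hidden) < p_h).astype(float)         # sample hidden states

# backward pass: decode the hidden states into a reconstruction of the input
p_v = sigmoid(W.T @ h + b_v)

# training would adjust W, b_h, b_v so the reconstruction p_v
# gets closer to the original input v; the learned W rows are the features
print(p_v.shape)                                        # (6,)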
14. Deep Learning for NLP
• Use of vectors
– Dense, low-dimensional, real-valued vectors
• Continuous bag-of-words (CBOW)
• Skip-gram model
• Two popular tools: word2vec, GloVe
– One-hot vector
• Size of the entire vocabulary
• Very large, sparse vector
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
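The contrast between the two representations can be shown concretely. A toy sketch with an assumed 5-word vocabulary and a random 3-dimensional embedding matrix; tools like word2vec or GloVe would learn these values from text.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
V = len(vocab)                         # vocabulary size

# one-hot representation: a sparse vector as large as the whole vocabulary
one_hot = np.zeros(V)
one_hot[vocab.index("cat")] = 1.0

# dense representation: a small real-valued vector looked up in an
# embedding matrix (3 dimensions here purely for illustration)
rng = np.random.default_rng(0)
E = rng.standard_normal((V, 3))        # one row per vocabulary word
dense = E[vocab.index("cat")]

# multiplying the one-hot vector by E selects exactly that row
assert np.allclose(one_hot @ E, dense)
print(one_hot.shape, dense.shape)      # (5,) (3,)
```

For a real vocabulary V is in the hundreds of thousands, which is why the dense, low-dimensional vectors are preferred.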
17. Deep Belief Net
• A stack of RBMs
• Identical to an MLP in terms of network structure
• Different training:
– Pre-training
– Fine-tuning
• Small labeled dataset
• Reasonable training time
• Very accurate
• Image recognition
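The two-phase training can be traced in a short sketch. All sizes are assumptions, and random weights stand in for the greedy layer-wise RBM pre-training, so only the data flow is shown, not the learning itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [8, 6, 4]                        # visible -> hidden1 -> hidden2

# pre-training: in the real algorithm each weight matrix is trained
# greedily as an RBM on the previous layer's activations; random
# weights stand in here
weights = [0.1 * rng.standard_normal((m, n))
           for n, m in zip(layer_sizes, layer_sizes[1:])]

x = rng.random(layer_sizes[0])
h = x
for W in weights:                              # propagate up the RBM stack
    h = sigmoid(W @ h)

# fine-tuning: the pre-trained stack is treated as an ordinary MLP, a
# task-specific output layer is added, and the whole network is trained
# with backpropagation on the (small) labeled dataset
W_out = 0.1 * rng.standard_normal((2, layer_sizes[-1]))
scores = W_out @ h
print(scores.shape)                            # (2,)
```

Because pre-training already puts the weights in a sensible region, the supervised phase needs only a small labeled dataset and a local search, matching the slide's claims.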
18. Convolutional Neural Networks
1. Convolutional layer
2. ReLU layer
3. Pooling layer
4. Fully connected layer
• Supervised
• Large amount of labeled data needed for training
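The four layers listed above can be sketched end to end on a toy input. A minimal sketch with assumed sizes (6×6 "image", one 3×3 filter, 2×2 pooling) and random untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((6, 6))                 # toy grayscale "image"
kernel = rng.standard_normal((3, 3))     # one convolutional filter

# 1. convolutional layer: slide the 3x3 filter over the image
#    (valid convolution, stride 1) -> 4x4 feature map
fmap = np.array([[np.sum(img[i:i + 3, j:j + 3] * kernel)
                  for j in range(4)] for i in range(4)])

# 2. ReLU layer: elementwise non-linearity
fmap = np.maximum(0.0, fmap)

# 3. pooling layer: 2x2 max pooling for dimensionality reduction -> 2x2
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))

# 4. fully connected layer: flatten and map to class scores
W_fc = rng.standard_normal((3, 4))
scores = W_fc @ pooled.ravel()
print(scores.shape)                      # (3,)
```

A real CNN stacks many such filters and repeats the conv/ReLU/pool pattern several times before the fully connected classifier.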
19. Convolutional Neural Networks
– CNNs perform quite well on NLP problems,
• although we do not have the nice intuition that we have for image recognition.
– Text processing (sentiment analysis and text categorization)
• Word-level
• Character-level:
– Very attractive for user-generated content with typos and new vocabulary
– Models can be fine-tuned from a task A with a large corpus to a more targeted task with a smaller corpus
– Learning directly from character-level input (needs millions of examples)
– Learning from pre-trained character embeddings
21. Recurrent Neural Nets
• Extremely difficult to train
– Exponentially vanishing gradient problem
• An RNN unrolled over n time steps is equivalent to an n-layer MLP with shared weights
– Solution:
• LSTM/GRU: helps the net decide when to forget the current input and when to remember it for future time steps.
• Good for:
– Time series analysis (forecasting)
– Machine translation
– Text processing (parsing, NER, sentiment analysis)
• Word-level
• Character-level
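The unrolling and the vanishing gradient can both be made concrete. A sketch with assumed toy sizes and random weights; the final line only bounds the gradient norm, it is not backpropagation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = 0.1 * rng.standard_normal((4, 4))  # recurrent weights, shared across steps
W_x = rng.standard_normal((4, 3))        # input weights
h = np.zeros(4)                          # initial hidden state

xs = rng.random((10, 3))                 # a sequence of 10 input vectors
for x in xs:                             # unrolled: one "layer" per time step
    h = np.tanh(W_h @ h + W_x @ x)

# backpropagation through these 10 steps multiplies 10 Jacobians that each
# involve W_h; when its largest singular value is below 1 the product, and
# hence the gradient, shrinks geometrically: the vanishing gradient problem
factor = np.linalg.norm(W_h, 2)          # largest singular value of W_h
print(factor ** len(xs))
```

LSTM/GRU cells replace this plain `tanh` update with gated additive updates, which is what lets gradients survive over many time steps.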
22. Recursive Neural Tensor Network
• Leaf group: input
• Root group: class and score
Socher, Richard, et al. "Recursive deep models for semantic compositionality
over a sentiment treebank." Proceedings of the conference on empirical
methods in natural language processing (EMNLP). Vol. 1631. 2013.
23. References
1. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature
521.7553 (2015): 436-444.
2. Socher, Richard, Yoshua Bengio, and Christopher Manning. "Deep learning for NLP." Tutorial at the Association for Computational Linguistics (ACL), 2012, and the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
3. Lee, Honglak. "Tutorial on deep learning and applications." NIPS 2010 Workshop
on Deep Learning and Unsupervised Feature Learning. 2010.
4. LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International
Conference on Machine Learning (ICML’13). 2013.
5. Socher, Richard, et al. "Recursive deep models for semantic compositionality over
a sentiment treebank." Proceedings of the conference on empirical methods in natural
language processing (EMNLP). Vol. 1631. 2013.
6. https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
7. https://www.udacity.com/course/deep-learning--ud730
8. http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-
introduction-to-rnns/
24. Thank you
Thank you, and please visit us at http://knoesis.org
monireh@knoesis.org
Editor's notes
With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations.
It's nice to imitate nature, but which details are merely the result of evolution and the constraints of biochemistry?
Vanishing gradient problem: one of the reasons that NNs were not very successful before.
Solved in 2006-2007 by three papers from Bengio, LeCun, and Hinton: the breakthrough in deep learning.
Neural networks made a big comeback with deep learning.
1. Forward pass: an RBM takes inputs and translates them into a set of numbers that encodes the inputs.
2. Backward pass: it takes this set of numbers and translates them back to form the reconstructed inputs.
3. At the visible layer, the reconstruction is compared against the original input.
Deep autoencoders are extremely useful tools for dimensionality reduction.
An autoencoder is a neural net that takes a set of typically unlabeled inputs and, after encoding and decoding them, tries to reconstruct them as accurately as possible. As a result, the net must decide which of the data features are the most important, essentially acting as a feature extraction engine.
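The encode/reconstruct loop described above can be sketched in a few lines. Sizes and weights here are assumed and untrained; only the forward computation and the reconstruction loss are shown.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((2, 6))      # encoder: 6-d input -> 2-d code
W_dec = rng.standard_normal((6, 2))      # decoder: 2-d code -> 6-d output

x = rng.random(6)                        # a typically unlabeled input
code = np.tanh(W_enc @ x)                # compressed internal representation
x_hat = W_dec @ code                     # attempted reconstruction

# training would minimize this reconstruction error, forcing the 2-d code
# to keep only the most important features of the input
loss = np.mean((x - x_hat) ** 2)
print(code.shape)                        # (2,)
```

Because the 2-d bottleneck cannot hold everything, minimizing the loss performs dimensionality reduction, which is the use case mentioned above.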
The fundamental difference between deep learning and traditional NLP methods is the use of dense vectors.
word2vec maps each word to a 1×N real-valued vector, where N is a fixed size chosen empirically and is also the number of nodes in the hidden layer. After training the neural network, each word in the input layer has a learned set of N weights to the hidden layer; this 1×N weight vector is the word's dense, low-dimensional, real-valued representation.
The neural net takes as input a one-hot vector of size V×1, so in each iteration only one input word is active. In that iteration, the network updates all the output vectors (the 1×N weight vectors between the hidden layer and the output layer) so that the output words that can co-occur with the input word become more similar to it, and the words that cannot appear in the input word's context become more dissimilar. Similarly, the input word's vector (its 1×N weights between the input layer and the hidden layer) is updated so that the input word becomes more similar to its context words.
After running the algorithm many times, we have two choices: use the 1×N vector of weights from each input-layer word to the hidden layer as its representation, or use the 1×N vector of weights from that word in the output layer to the hidden layer. Empirically, the first choice is used. So the word2vec representation of a word is nothing but the 1×N vector of weights from that word in the input layer to the hidden layer.
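The two weight matrices and the row-lookup described above can be shown directly. A toy sketch with an assumed 3-word vocabulary and N = 4; the weights are random stand-ins for trained ones.

```python
import numpy as np

vocab = ["king", "queen", "apple"]
V, N = len(vocab), 4                     # vocabulary size, hidden-layer size

rng = np.random.default_rng(0)
W_in = rng.standard_normal((V, N))       # input-to-hidden weights (V x N)
W_out = rng.standard_normal((N, V))      # hidden-to-output weights (N x V)

# the one-hot input activates a single word, so the hidden layer simply
# selects that word's row of W_in: after training, this 1 x N row is the
# word's dense vector representation (the first of the two choices above)
one_hot = np.zeros(V)
one_hot[vocab.index("queen")] = 1.0
vec = one_hot @ W_in
assert np.allclose(vec, W_in[vocab.index("queen")])

# pre-softmax scores over the vocabulary for predicting context words;
# training pushes up the scores of words that co-occur with the input word
scores = vec @ W_out
print(vec.shape, scores.shape)           # (4,) (3,)
```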
We do not start backpropagation until we already have sensible weights that already do well at the task. – So the initial gradients are sensible and backprop only needs to perform a local search. [https://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf]
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
ReLU: for the vanishing gradient problem
Pooling layer: for dimensionality reduction
Words in the source language: input
Words in the target language: output