2. ● What is deep learning (DL)?
● In short, DL refers to work driven by very deep
neural networks.
3. Overview of Deep Learning
[Figure: example applications of deep learning, e.g. Prisma and PaintsChainer]
4. Meaning of Deep Learning
● Why do we need such a giant model (millions of
parameters)?
● To solve ill-defined problems
● To handle ambiguous features
● Perhaps the giant is not so scary?
5. Meaning of Deep Learning
● Solving ill-defined problems
How would you describe this mapping by hand?
[Figure: a 32 × 32 input image]
8. Meaning of Deep Learning
● Perhaps the giant is not so scary?
● There are many excellent open-source frameworks.
● Caffe / TensorFlow / MXNet / Torch <- My favorite
● The end-to-end fashion doesn't involve much extra machinery.
● A state-of-the-art model can be implemented in just 30
lines!
● Models can easily be generalized to other cases.
● Design once, deploy everywhere.
9. Prerequisites
● Math
● Calculus (derivatives / maxima and minima)
● Linear algebra (basic vector / matrix operations)
● Coding
● Any course on C/C++/Java
● Python basics
10. The First Step — Linear Regression
Given data (x1, y1), (x2, y2), …, (xn, yn),
find a line that minimizes the overall
distance
Example: You are about to rent an apartment and worry about leasing an
overpriced one, so you write a web crawler to collect information on nearby
apartments. Linear regression can help you find proper parameters to estimate
apartment prices.
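A minimal sketch of the idea in plain Python. The rent numbers below are made up for illustration, and this uses the closed-form least-squares solution rather than any particular library:

```python
# Least-squares fit of y = w*x + b, minimizing the sum of squared
# vertical distances between the line and the data points.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form: w = cov(x, y) / var(x), b = mean_y - w * mean_x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Hypothetical data: apartment size in square meters vs. monthly rent.
sizes = [20, 35, 50, 65, 80]
rents = [620, 1010, 1480, 1940, 2400]

w, b = fit_line(sizes, rents)
print(f"w = {w:.2f}, b = {b:.2f}")  # w ≈ 29.93, b ≈ -6.67
```

With the fitted w and b, an unlisted apartment's price can be estimated as `w * size + b` and compared against the asking rent.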
17. Limits of Linear Regression
● How to deal with big data?
● Stochastic gradient descent (a small batch per loop)
● How to choose a proper learning rate (step size)?
● Cross-validation
● Advanced algorithm — Adagrad
● What if we get stuck at a local optimum?
● A good initialization
● Advanced algorithms — Momentum, Nesterov
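The momentum idea from the last bullet can be sketched on a toy 1-D objective. All constants here (objective, learning rate, decay) are arbitrary choices for illustration, not values from the talk:

```python
# Gradient descent with momentum on the toy objective f(w) = (w - 3)**2,
# whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
# Momentum keeps a decaying "velocity" of past gradients, which smooths
# the updates and helps roll through shallow bumps in the loss surface.

def momentum_descent(w0=0.0, lr=0.1, beta=0.9, steps=200):
    w, v = w0, 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)
        v = beta * v - lr * grad  # velocity: decayed history + new gradient
        w = w + v                 # move along the velocity, not the raw gradient
    return w

w = momentum_descent()  # converges near the minimum w = 3
```

Adagrad follows the same update pattern but instead scales the learning rate per parameter by the accumulated squared gradients.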
22. Neural Network
● The history of neural networks
● Created in the late 1940s by a psychologist (Hebbian
learning).
● A short resurgence around 1975, powered by
backpropagation.
● Extremely popular after 2012, when AlexNet
outperformed all traditional models.
26. Propagation and Back Propagation
W_ij : the weight of the edge from the j-th node of the
previous layer to the i-th node
a_k : the pre-activation value of the k-th node
z_k : the value of the k-th node after activation
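Using the slide's notation, a forward and backward pass for a single sigmoid neuron might look like this. It is a sketch with made-up weights and inputs, and the squared-error loss is an assumed choice:

```python
import math

# Forward pass: a = sum_j W_j * x_j is the pre-activation value,
# z = sigma(a) the value after activation. Backpropagation is just
# the chain rule applied from the loss back toward the weights.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, x):
    a = sum(w_j * x_j for w_j, x_j in zip(W, x))  # weighted sum of inputs
    z = sigmoid(a)
    return a, z

def backward(W, x, z, dL_dz):
    # Chain rule: dL/da = dL/dz * dz/da, with dz/da = z * (1 - z) for sigmoid
    dL_da = dL_dz * z * (1 - z)
    # dL/dW_j = dL/da * da/dW_j = dL/da * x_j
    return [dL_da * x_j for x_j in x]

W = [0.5, -0.3]
x = [1.0, 2.0]
a, z = forward(W, x)
t = 1.0                               # target output
grads = backward(W, x, z, 2 * (z - t))  # loss L = (z - t)**2
```

A gradient step would then update each weight as `W[j] -= lr * grads[j]`; stacking layers repeats the same chain rule layer by layer.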
30. Going Deeper! Deep Neural
Network
● Why couldn't previous neural networks go
deeper?
● Too many parameters!
● Vanishing gradients
● Backpropagating through tons of parameters
● takes weeks to train
● easy to overfit
● stuck at local optima
33. Deep Neural Network
● The convolution operation
● Different kernel matrices bring different results
● Can be viewed as a powerful preprocessing tool
● But how to choose a proper kernel matrix?
● Let the network learn the kernel parameters itself!
● Convolutional layers!
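A minimal sketch of the convolution operation in plain Python. The vertical-edge kernel below is hand-picked for illustration; in a convolutional layer, these kernel entries are exactly the parameters the network learns:

```python
# 2-D convolution (really cross-correlation, as in most DL frameworks):
# slide a small kernel matrix over the image and take weighted sums.

def conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):          # valid positions, no padding
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

# A hand-picked kernel that responds to vertical edges (dark -> bright).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # → [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output peaks exactly where the dark-to-bright edge sits, which is the sense in which different kernel matrices bring different results.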