Turning Data into Value
1. Turning Data into Value
2. Olivier Duchenne
Co-founder | Chief Machine Learning Scientist
8 years of experience in Machine Learning, Computer Vision, and Big Data
Ph.D. in Computer Science at ENS Paris/INRIA
Postdoctoral Fellow at Carnegie Mellon University
>500 citations, Best Paper Award at CVPR 2009
NEC Labs (Bell Labs) in Cupertino (Silicon Valley)
Senior Researcher at Intel (3 pending patents)
- Developed ML algorithms for face recognition
Invited speaker at CMU, Samsung, Tokyo Univ., SNU, etc.
Co-Founder of Solidware
3. Guidelines for Using Machine Learning on Real Data
Avoid Common Mistakes
Better Understand the Data
1. Big Enough Data?
2. Changing Data
Machine Learning and Data Science
4. From Computer Vision Experience
to Solving Companies' Issues
Ex: car accident prediction (insurance),
default prediction (bank),
stock value prediction
Machine Learning and Data Science
5. Machine-Learning-Based Predictive Modeling
ML algorithms analyze historical data to detect patterns.
PAST DATA (Training Data Set): internal data (ex: age, gender) and external data (ex: web crawl), each with a known target value.
The learned prediction function is then applied to newly incoming data (internal + external) to produce a predicted target value, since the true target value is still unknown.
6. 1. Prediction function. Ex: a linear function, a neural net, …
2. The prediction function is parametrized. Ex: f_α(X) = Σ_i α_i · X_i
3. The goal is to find the best prediction function, i.e. the best parameters.
4. We build an objective function that represents how good a prediction function is.
5. The objective function always has a data term. Ex: obj(α) = Σ_s (f_α(X_s) − Y_s)²
6. The algorithm searches for the best parameters, the ones that optimize this objective function. Ex: closed-form solution, stochastic gradient descent, …
Basic Explanation of Machine Learning
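The six steps above can be sketched in code. A minimal pure-Python sketch, assuming the linear prediction function and squared-error data term from the slide; the toy data, learning rate, and step count are made-up illustration values.

```python
# Minimal sketch: fit a linear prediction function f_alpha(X) = sum_i alpha_i * X_i
# by minimizing obj(alpha) = sum_s (f_alpha(X_s) - Y_s)^2 with gradient descent.
# The training data below is made up for illustration.

def predict(alpha, x):
    return sum(a * xi for a, xi in zip(alpha, x))

def fit(samples, targets, lr=0.01, steps=2000):
    alpha = [0.0] * len(samples[0])
    for _ in range(steps):
        # Gradient of the squared-error data term w.r.t. each alpha_i
        grad = [0.0] * len(alpha)
        for x, y in zip(samples, targets):
            err = predict(alpha, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha

# Toy data generated by the "true" function y = 2*x1 + 3*x2
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
Y = [2.0, 3.0, 5.0, 7.0]
alpha = fit(X, Y)
print([round(a, 2) for a in alpha])  # close to [2.0, 3.0]
```

Stochastic gradient descent would update after each sample instead of each full pass; the closed-form alternative solves the normal equations directly.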
7. History of Machine Learning for Computer Vision
Model-Driven → Mixed → Data-Driven
1970s: Hand-designed Model
1980s: Alignment Method
1990s: Grid Model
2000s: Deformable Model
2010s: Conv. Network
8. Why didn't people use ML from the beginning?
Commonly assumed reasons:
1. "Better Computers" available now
2. "Better Algorithms"
3. "Amount of Data"
"We create so much data that 90% of the data in the world today has been created in the last two years alone"
- Petter Bae Brandtzæg, SINTEF ICT
9. How much data did CV researchers use?
2004: Caltech 101, 10K images (image source: http://www.vision.caltech.edu/)
2005-2010: Pascal VOC, 2K → 30K objects (image source: http://doi.ieeecomputersociety.org/)
2010-2015: ImageNet, 10M → 15M images (http://www.image-net.org/)
10. The answer is… "Amount of Data"
• Even the most advanced machine learning cannot be applied if there is not enough data
• A critical mass of data is necessary to use, for example, deep learning
• When the amount of data increases, the machine learning models, and therefore the prediction model, become more complex and better
(Image source: Smartdatacollective.com)
11. With enough data, is ANY algorithm okay?
Support Vector Machines, Bayesian Networks,
Regression Forest, Sparse Dictionary Learning,
Artificial Neural Networks, K-Nearest Neighbors,
Deep Learning, Boosting, Log. Regression
No, it depends on the company and the problem you are trying to solve.
12. What Changed in the Machine Learning Domain
From the Past to the Present
13. Why do we need lots of data? Overfitting
Synonym: over-generalizing
It is like visiting a new place for one day, seeing a mountain fire, and believing that there are fires there every day.
In real life, we do not have many chances of getting clean & BIG data.
14. An example: overfitting due to lack of data
[Bar chart: probability to default for Seoul, Busan, Daejeon, Gwangju, … (many more cities); y-axis 0 to 0.14]
As there are many categories, some categories with little data show outlier results.
16. You want to detect an event which occurs on average with probability p = 5%.
Say you have many cities with ~50 samples each.
On average, 1 in 13 cities will record this event 0 times (0.95^50 ≈ 0.077).
Without proper handling, these extreme cases will be all wrong.
This kind of error can happen often.
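A quick sanity check of the 1-in-13 figure, assuming the event is independent across the ~50 samples of a city:

```python
# With p = 5% per sample and 50 independent samples per city,
# the chance that a city records the event 0 times is (1 - p)^50.
p = 0.05
n = 50
prob_zero = (1 - p) ** n
print(round(prob_zero, 3))   # about 0.077, i.e. roughly 1 city in 13
print(round(1 / prob_zero))  # about 13
```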
17. How to fight against overfitting
Data:
• More samples
• Fewer variables
• Artificial data extension
Algorithm:
• Simpler objective function
• Regularization
• Bagging
Modeling:
• Feature engineering
• Data normalization
18. Data
In Computer Vision, it is possible to extend the data.
Ex: hiring annotators, Amazon Mechanical Turk, Google reCAPTCHA
Companies often have a limited number of samples and cannot extend it.
Ex: a Korean bank that grants ~100K loans per year
19. 1. Count only positives (detecting rare events requires more data)
Ex: image detection: it is easy to find an infinite number of negatives.
Companies often want to detect rare events (few positives).
Ex: predicting car accidents / ad clicks / defaults / online purchases
How to count your data?
20. 2. Difficulty of the task
• Learning addition (y = 1·X_1 + 1·X_2) requires ~100 samples
• Learning object recognition requires ~10M samples
How to count your data?
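As a toy version of the "learning addition" example, a closed-form least-squares sketch in pure Python; the 100 random samples and the two-feature setup are made up for illustration:

```python
import random

random.seed(0)
# ~100 synthetic samples of the addition task y = 1*x1 + 1*x2
X = [(random.random(), random.random()) for _ in range(100)]
Y = [x1 + x2 for x1, x2 in X]

# Closed-form least squares for two features: solve the 2x2 normal equations
s11 = sum(x1 * x1 for x1, _ in X)
s12 = sum(x1 * x2 for x1, x2 in X)
s22 = sum(x2 * x2 for _, x2 in X)
t1 = sum(x1 * y for (x1, _), y in zip(X, Y))
t2 = sum(x2 * y for (_, x2), y in zip(X, Y))
det = s11 * s22 - s12 * s12
a1 = (s22 * t1 - s12 * t2) / det
a2 = (s11 * t2 - s12 * t1) / det
print(round(a1, 3), round(a2, 3))  # recovers weights close to 1.0 and 1.0
```

With clean, exactly linear data even 100 samples pin the two weights down; object recognition has vastly more parameters and noise, hence the ~10M figure.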
21. 3. Probabilistic event detection is harder.
"What is in this image?" vs. "Will this user click on a car advertisement?"
Client #1: male, 27 y.o., lives in Seoul, salaried employee in the construction sector, previously clicked on a car advertisement → Yes
Client #2: male, 27 y.o., lives in Seoul, salaried employee in the construction sector, previously clicked on a car advertisement → No
Identical inputs can produce different outcomes, so estimating a probability needs many samples.
How to count your data?
22. Algorithm
1. Many algorithms exist: GLM, Boosting, Lasso, Regression Forest, SVM, Gaussian Process, Bayesian Networks, Deep Learning, …
2. The complexity of their prediction functions differs.
3. The more complex the prediction function is, the more closely it fits the data.
[Three plots of purchase probability vs. age: underfitting (too simple), a good fit, overfitting (too complex)]
23. 1. Fewer parameters → less overfitting
2. More parameters → less underfitting
3. Ex: best of both worlds: Deep Conv Nets
Algorithm
24. Avoiding the "Too Many Categories" problem
[Diagram: eight separate city categories: Busan, Seoul, Daejeon, Daegu, Pohang, Incheon, Suwon, Ulsan]
25. Avoiding the "Too Many Categories" problem
[Diagram: the same eight cities (Busan, Seoul, Daejeon, Daegu, Pohang, Incheon, Suwon, Ulsan) grouped and merged into fewer categories]
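The grouping/merging idea above might be sketched as follows; the city counts and the MIN_SAMPLES threshold are made-up values:

```python
from collections import Counter

# Made-up counts of samples per city category
city_counts = Counter({"Seoul": 5000, "Busan": 1800, "Incheon": 900,
                       "Daejeon": 80, "Pohang": 40, "Ulsan": 30})

MIN_SAMPLES = 100  # categories below this are merged into "Other"

def group_city(city):
    # Keep frequent categories; merge rare ones so they share statistics
    return city if city_counts[city] >= MIN_SAMPLES else "Other"

print(group_city("Seoul"))   # Seoul
print(group_city("Pohang"))  # Other
```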
26. Avoiding the "Too Many Categories" problem
[Bar chart: probability to default vs. log10(population), x-axis 1 to 6, y-axis 0 to 0.14]
Replacing the city category with a numeric variable such as log10(population) avoids the category explosion.
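The chart suggests replacing the city category with a single numeric variable. A sketch, with made-up population figures (a real pipeline would pull these from official statistics):

```python
import math

# Made-up populations; the real feature would come from public statistics
population = {"Seoul": 9_700_000, "Busan": 3_400_000, "Pohang": 500_000}

def city_feature(city):
    # Replace a high-cardinality category with one numeric variable,
    # so cities of similar size share statistical strength
    return math.log10(population[city])

print(round(city_feature("Seoul"), 2))   # 6.99
print(round(city_feature("Pohang"), 2))  # 5.7
```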
28. Data Normalization
Removing variance that has no impact on the target value helps the ML system focus on meaningful variance.
Ex: DeepFace (Facebook, 2014), DB size: 120M images
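One common form of data normalization is standardization, i.e. removing each feature's mean and scale. A minimal sketch with made-up sample ages; this is a generic technique, not necessarily the normalization DeepFace used (which involved face alignment):

```python
def standardize(values):
    # Remove mean and scale so the model sees only meaningful variance
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # guard against constant features
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50]
print(standardize(ages))  # zero mean, unit variance
```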
29. Bagging
1. Randomly modify the training set slightly (ex: bootstrap resampling).
2. Train on it.
3. Repeat.
4. Average all the resulting prediction functions.
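The four steps can be sketched with bootstrap resampling and a deliberately trivial "model" that just predicts the mean of its training targets; all data here is made up:

```python
import random

random.seed(1)

def train_mean_model(samples):
    # Toy "model": predicts the mean of its training targets
    mean = sum(samples) / len(samples)
    return lambda: mean

def bag(samples, n_models=50):
    models = []
    for _ in range(n_models):
        # 1. Randomly modify the training set (bootstrap resample)
        boot = [random.choice(samples) for _ in samples]
        # 2. Train on the modified set; 3. repeat
        models.append(train_mean_model(boot))
    # 4. Average all prediction functions
    return lambda: sum(m() for m in models) / len(models)

targets = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with an outlier
predictor = bag(targets)
print(round(predictor(), 2))  # averaged prediction of 50 bagged models
```

Averaging over resampled training sets reduces the variance that a single overfit model would show.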
30. Changing Data
Data change through time:
• Market changes
• Law/regulation changes
• Collected data changes
• Client filtering / marketing changes
Representation of data change:
• Variable names change
• Category names change
Cyclic data changes:
• Seasonality
• Trends have to be handled separately
Interpolation vs. Extrapolation
31. Why is time so different from other variables?
[Left plot: probability to buy a smartphone vs. age, with the unknown point lying between observed data (interpolation). Right plot: probability to buy a smartphone vs. time, with the unknown point lying beyond observed data (extrapolation)]
32. Time is correlated with hidden variables
[Plot: cost of car insurance (one type of insurance) over time, with a jump when a new law takes effect]
33. Change causes can be unknown, but consistent
[Plot: cost of car insurance (one type of insurance) over time]