3. This talk will
- introduce machine learning
- make you ML-aware
- walk through examples
4. This talk will not
- give you a PhD
- implement algorithms
- cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...
5. What is Machine Learning?
Many different algorithms that predict data from other data, using applied statistics.
12. Classification
• Documents
o Sort email (Gmail's importance filter)
o Route questions to appropriate expert (Aardvark)
o Categorize reviews (Amazon)
• Users
o Expertise; interests; pro vs. free; likelihood of paying; expected future karma
• Events
o Abnormal vs. normal
14. Algorithms: Decision Tree Learning
[Diagram: a decision tree over email features. The root node asks whether the email contains the word "viagra":
  no → does it contain the word "Ruby"?
    no → P(Spam) = 10%
    yes → P(Spam) = 5%
  yes → does it have an attachment?
    no → P(Spam) = 70%
    yes → P(Spam) = 95%
Internal nodes test features; the leaves give the labels.]
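In code, the tree above amounts to a chain of feature tests. A minimal Python sketch (the function name and inputs are made up for illustration; the tests and leaf probabilities are copied from the diagram, and in practice both would be learned from labeled training data):

    def p_spam(body, has_attachment):
        # Root: does the email contain the word "viagra"?
        if "viagra" in body.lower():
            # Yes branch: does it have an attachment?
            return 0.95 if has_attachment else 0.70
        # No branch: does it contain the word "Ruby"?
        return 0.05 if "ruby" in body.lower() else 0.10

    print(p_spam("Cheap viagra, act now!", has_attachment=True))   # 0.95
    print(p_spam("New Ruby gem released", has_attachment=False))   # 0.05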
15. Algorithms: Support Vector Machines (SVMs)
Graphics from Wikipedia
16. Algorithms: Support Vector Machines (SVMs)
Graphics from Wikipedia
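These two slides lean on the graphics; the idea they illustrate is that an SVM finds the separating boundary with the largest margin between the two classes. A minimal sketch of training one, assuming scikit-learn and toy data (not from the talk):

    from sklearn import svm

    X = [[0, 0], [1, 0], [3, 3], [4, 3]]   # toy 2-D feature vectors
    y = [0, 0, 1, 1]                       # toy class labels

    clf = svm.SVC(kernel="linear")   # fit the maximum-margin separating line
    clf.fit(X, y)
    print(clf.predict([[4, 4]]))     # -> [1]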
17. Algorithms: Naive Bayes
• Break documents into words and treat each word as an independent feature
• Surprisingly effective on simple text and document classification
• Works well when you have lots of data
Graphics from Wikipedia
18. Algorithms: Naive Bayes
You received 100 emails, 70 of which were spam.

Word     Spam emails containing it   Ham emails containing it
viagra   42 (60% of spam)            1 (3.3% of ham)
ruby     7 (10% of spam)             15 (50% of ham)
hello    35 (50% of spam)            24 (80% of ham)

A new email contains "hello" and "viagra". The probability that it is spam:

P(S | hello, viagra) = P(S) * P(hello, viagra | S) / P(hello, viagra)
                     ≈ P(S) * P(hello | S) * P(viagra | S) / (P(hello) * P(viagra))
                     = 0.7 * (0.5 * 0.6) / (0.59 * 0.43)
                     ≈ 83%
Graphics from Wikipedia
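The same arithmetic in a few lines of Python (counts copied from the table above; note that, as on the slide, the denominator also treats the two words as independent across all email):

    p_spam = 70 / 100                 # P(S): 70 of 100 emails were spam
    p_hello_given_spam = 35 / 70      # 50%
    p_viagra_given_spam = 42 / 70     # 60%
    p_hello = (35 + 24) / 100         # 59% of all email contains "hello"
    p_viagra = (42 + 1) / 100         # 43% of all email contains "viagra"

    # "Naive" step: treat the words as independent, so the joint
    # probabilities factor into products.
    posterior = (p_spam * p_hello_given_spam * p_viagra_given_spam
                 / (p_hello * p_viagra))
    print(posterior)   # ~0.83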
19. Algorithms: Neural Nets
[Diagram: a feed-forward network with an input layer (the features), a hidden layer, and an output layer (the classification).]
Graphics from Wikipedia
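Not from the talk: a toy forward pass through a network shaped like the diagram, with made-up random weights, just to show how features flow from the input layer to the output:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 0.0, 1.0])           # input layer: 3 features
    W_hidden = np.random.randn(4, 3) * 0.1  # weights into 4 hidden units
    W_output = np.random.randn(1, 4) * 0.1  # weights into 1 output unit

    h = sigmoid(W_hidden @ x)   # hidden-layer activations
    y = sigmoid(W_output @ h)   # output in (0, 1): the classification score
    print(y)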
20. Curse of Dimensionality
The more features and labels you have, the more data you need.
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
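One quick way to see why (a sketch, not from the slides): with d binary features there are 2**d distinct feature combinations, so the data needed to cover the feature space grows exponentially with d.

    for d in (5, 10, 20, 30):
        print(f"{d} binary features -> {2 ** d:,} possible inputs")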
21. Overfitting
• With enough parameters, anything is possible.
• We want our algorithms to generalize and infer, not memorize specific training examples.
• Therefore, we test our algorithms on different data than we train them on.
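A common way to enforce this, sketched with scikit-learn's train_test_split (X and y here are placeholders for real features and labels):

    from sklearn.model_selection import train_test_split

    X = [[i] for i in range(100)]     # placeholder feature vectors
    y = [i % 2 for i in range(100)]   # placeholder labels

    # Hold out 25% of the examples; the model never sees them in training.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    # ...fit on (X_train, y_train), then score on (X_test, y_test).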