4. Definition
Field of study that gives computers the ability
to learn without being explicitly programmed
--Arthur Samuel
A more formal one
A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P, if its performance at
tasks in T, as measured by P, improves with
experience E --Tom M. Mitchell
5. Related Disciplines
Sub-Field of Artificial Intelligence
Deals with Design and Development of Algorithms
Closely related to Data Mining
Uses techniques from Statistics, Probability Theory
and Pattern Recognition
Not new but growing fast because of Big Data
6. Types of Machine Learning
Supervised Machine Learning
  Provide the right answers (labels) for a set of questions (inputs)
  Underlying algorithm learns/infers from these examples over a period of time
  Tries to return correct answers for similar questions
Unsupervised Machine Learning
  Provide data only, without answers
  Let the underlying algorithm find some structure in it
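The difference between the two types can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the data points, the labels "small"/"large", and the helper names `predict_nearest` and `two_means_1d` are made up for this sketch, and nearest-neighbour and k-means are just two convenient stand-ins for "a supervised algorithm" and "an unsupervised algorithm".

```python
# Supervised: every example comes with its right answer (label).
# The points and labels below are made-up illustration data.
labelled = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
            ((5.0, 5.2), "large"), ((4.8, 5.1), "large")]


def predict_nearest(point, examples):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(examples, key=lambda ex: sq_dist(ex[0], point))
    return label


# Unsupervised: only data, no labels; the algorithm looks for structure.
def two_means_1d(values, iterations=10):
    """Split 1-D values into two groups with a tiny k-means (k = 2)."""
    lo, hi = min(values), max(values)            # initial cluster centres
    group_lo, group_hi = list(values), []
    for _ in range(iterations):
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not group_hi:                         # all values identical
            break
        lo = sum(group_lo) / len(group_lo)       # move centres to group means
        hi = sum(group_hi) / len(group_hi)
    return sorted(group_lo), sorted(group_hi)


answer = predict_nearest((1.1, 0.9), labelled)
groups = two_means_1d([1, 2, 3, 10, 11, 12])
```

In the supervised half the labels drive the prediction; in the unsupervised half the grouping emerges from the data alone.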
7. Popular Use Cases
Recommendation Systems
Amazon, Netflix, iTunes Genius, IMDb...
Up-Selling & Churn Analysis
Customer Sentiment Analysis
Market Segmentation
...
10. Typical Machine Learning Algorithm
[Diagram: Training Set -> Learning Algorithm -> Hypothesis; Input Features -> Hypothesis -> Expected Output]
11. Let's Simplify a bit
➢ Goal is to draw a straight line which fits our data set reasonably well
➢ Our hypothesis can be hθ(x) = θ0 + θ1 x
➢ Such that hθ(x) ≃ y
[Figure: House Sizes vs Prices; x-axis: House Sizes (Sq Yards), y-axis: Prices (1000 USD)]
12. In Mathematical Terms
➢ Hypothesis: hθ(x) = θ0 + θ1 x
➢ Parameters: θ0, θ1
➢ Cost Function: J(θ0, θ1)
➢ We would like to minimize J(θ0, θ1)
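The formula for J did not survive on this slide. A minimal sketch, assuming the usual squared-error cost for this kind of straight-line fit, J(θ0, θ1) = 1/(2m) · Σ (hθ(x) − y)², could look as follows. The function names and the three data points are made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Assumed squared-error cost: half the mean of squared prediction errors."""
    m = len(xs)
    squared_errors = ((hypothesis(theta0, theta1, x) - y) ** 2
                      for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


# Made-up points lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit = cost(0.0, 2.0, xs, ys)   # line passes through every point
poor_fit = cost(0.0, 0.0, xs, ys)      # flat line at zero
```

A line through every point gives a cost of zero, and the cost grows as the line moves away from the data, which is why minimizing J yields a reasonable fit.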
13. Solution : Gradient Descent
➢ Start with initial values of θ0, θ1
➢ Keep changing θ0, θ1 until we end up at a minimum
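The two steps above can be sketched as a short loop. This is a hedged illustration, not the presenter's code: it assumes a squared-error cost, and the learning rate `alpha`, the fixed number of `steps`, and the data points are assumptions introduced here.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.

    alpha (step size) and steps (iteration count) are illustrative choices.
    """
    theta0, theta1 = 0.0, 0.0                    # start with initial values
    m = len(xs)
    for _ in range(steps):                       # keep changing theta0, theta1
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                  # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # Update both parameters together, stepping against the gradient.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up points lying exactly on y = 1 + 2x.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(sizes, prices)
```

On this toy data the parameters settle close to θ0 = 1 and θ1 = 2. With real house sizes in the hundreds or thousands, the inputs would typically need scaling or a much smaller `alpha` for the loop to converge.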
22. What is WEKA?
Developed by Machine Learning Group,
University of Waikato, New Zealand
Collection of Machine Learning Algorithms
Contains tools for
Data Pre-Processing
Classification & Regression
Clustering
Visualization
Can be embedded inside your application
Implemented in Java
24. Terminology
Training DataSet == Instances
Each Row in DataSet == Instance
Instance is Collection of Attributes (Features)
Types of Attributes
Nominal (True, False, Malignant, Benign, Cloudy...)
Real values (6, 2.34, 0...)
String (“Interesting”, “Really like it”, “Hate It”...)
...
25. Sample DataSets
@RELATION house
@ATTRIBUTE houseSize real
@ATTRIBUTE lotSize real
@ATTRIBUTE bedrooms real
@ATTRIBUTE granite real
@ATTRIBUTE bathroom real
@ATTRIBUTE sellingPrice real
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000

@RELATION CPU
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
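To make the file layout concrete, here is a minimal sketch of reading such a file in plain Python. It is not WEKA's own loader: `parse_arff` is a hypothetical helper written for this illustration, and it assumes a small, dense file with only numeric and nominal attributes (no quoted names, string or date types, or missing values).

```python
def parse_arff(text):
    """Parse a small, dense ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):     # skip blank lines and comments
            continue
        lower = line.lower()                     # keywords are case-insensitive
        if in_data:
            # Each data line is one instance: one value per attribute.
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                row[name] = float(value) if kind == "numeric" else value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            numeric = spec.lower() in ("real", "numeric", "integer")
            attributes.append((name, "numeric" if numeric else "nominal"))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# First three instances of the second sample data set on this slide.
sample = """
@RELATION CPU
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
"""

relation, attributes, rows = parse_arff(sample)
```

Each parsed row is one instance, keyed by attribute name, which mirrors the instance/attribute terminology used for WEKA data sets.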
28. Apache Mahout
➢ Collection of Machine Learning Algorithms
➢ Map-Reduce Enabled (most cases)
➢ DataSources
  ➢ Database
  ➢ File-System
➢ Lucene Integration
➢ Very Active Community
➢ Apache License
29. WEKA vs Apache Mahout
WEKA
➢ Lot of Algorithms
➢ Tools for
  ➢ Modeling
  ➢ Comparison
  ➢ Data-Flow
➢ May need work for running on large data-sets
➢ License Issues

Apache Mahout
➢ Fewer Algorithms, but growing
➢ Lack of tools for Modeling
➢ Ready by Design for Large Scale
➢ Vibrant Community
➢ Apache License
31. Google Prediction API 101
➢ Cloud Based Web Service for Machine Learning
➢ Exposed as REST API
➢ Does not require any Machine Learning knowledge
➢ Capabilities
  ➢ Categorical
  ➢ Regression
37. Resources
➢ Online Machine Learning Course - Prof. Andrew Ng, Stanford University
➢ WEKA Wiki and API docs
➢ Apache Mahout Wiki
➢ IBM developerWorks Articles
➢ Google Prediction API Web Site
➢ Data Mining: Practical Machine Learning Tools and Techniques - Ian H. Witten, Eibe Frank, Mark Hall
➢ Machine Learning Forums