Slides from a lecture given on October 14, 2015 for the Data Mining the City class at Columbia University's Graduate School of Architecture, Planning, and Preservation (GSAPP)
8. What is Learning?
1. To get knowledge of something by study, experience, or being taught.
2. To become aware by information or from observation
3. To commit to memory
4. To be informed of or to ascertain
5. To receive instruction
Witten, Frank, Hall. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition.
9. What is Learning?
Things learn when they change their behavior in a way that makes them perform better in the future.
Witten, Frank, Hall. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition.
10. “Telling the future, when it comes right down
to it, is not solely a human yearning. It is the
fundamental nature of any organism, and
perhaps any complex system. Telling the future
is what organisms are for.”
- Kevin Kelly, “Out of Control”
12. [Diagram: the basic prediction pipeline. Training data, with features (X1, X2, ...) and a known value (y), feeds a 'learning' step that produces a trained predictor model; new data, with features (X1, X2, ...) only, is run through the trained model to produce a predicted value (yp).]
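A minimal sketch of this pipeline in code, assuming scikit-learn (the slides do not name a library); the data and the choice of model are illustrative:

    # A minimal sketch of the train -> predict pipeline, assuming scikit-learn.
    # The feature matrix, values, and model choice are illustrative only.
    from sklearn.tree import DecisionTreeRegressor

    X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]   # features (X1, X2, ...)
    y_train = [10.0, 12.0, 18.0]                     # known values (y)

    model = DecisionTreeRegressor()      # the 'learning' step
    model.fit(X_train, y_train)          # produces the trained predictor model

    X_new = [[2.5, 2.5]]                 # new data: features only
    y_pred = model.predict(X_new)        # predicted value (yp)
    print(y_pred)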
13. Name Gender Height Income HS Degree
Bob Male 5’5” $44,000 No
John Male 6’0” $60,000 Yes
Susan Female 5’10” $40,000 No
Betty Female 5’6” $55,000 Yes
14. (Same table as slide 13.) The columns fall into three kinds: Name is description data; Gender and HS Degree are categorical data; Height and Income are continuous data.
15. (Same table as slide 13.) Problem 1: predict Income [regression]. Income is the value (y) to predict; the remaining columns supply the features (X).
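A minimal sketch of Problem 1, assuming scikit-learn and a hypothetical numeric encoding of the table (Gender: Male=0/Female=1, Height in inches, HS Degree: No=0/Yes=1). Four rows is far too little data for a real model; this only shows the mechanics:

    # Regression sketch: predict Income from the other columns.
    from sklearn.linear_model import LinearRegression

    #     gender, height_in, hs_degree   (hypothetical encoding)
    X = [[0, 65, 0],   # Bob,   5'5"
         [0, 72, 1],   # John,  6'0"
         [1, 70, 0],   # Susan, 5'10"
         [1, 66, 1]]   # Betty, 5'6"
    y = [44000, 60000, 40000, 55000]    # Income: continuous target

    model = LinearRegression().fit(X, y)
    print(model.predict([[0, 68, 1]]))  # income guess for a new person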
16. (Same table as slide 13.) Problem 2: predict HS Degree [classification]. HS Degree is the value (y) to predict; the remaining columns supply the features (X).
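A minimal sketch of Problem 2, assuming scikit-learn and the same hypothetical encoding; the target is now the categorical HS Degree column:

    # Classification sketch: predict HS Degree from the other columns.
    from sklearn.tree import DecisionTreeClassifier

    #     gender, height_in, income
    X = [[0, 65, 44000],
         [0, 72, 60000],
         [1, 70, 40000],
         [1, 66, 55000]]
    y = ['No', 'Yes', 'No', 'Yes']        # HS Degree: categorical target

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[1, 68, 50000]]))  # 'Yes' or 'No' for a new person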
17. [Diagram: the prediction pipeline from slide 12, here labeled SUPERVISED LEARNING MODEL; the training data carries a known value (y) alongside the features.]
18. [Diagram: the pipeline labeled UNSUPERVISED LEARNING MODEL; here the training data carries features (X1, X2, ...) only, with no known value (y) for the learning step to target.]
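A minimal sketch of the unsupervised case, assuming scikit-learn (not named in the slides) and using k-means clustering as one common unsupervised method:

    # Unsupervised sketch: features only, no value (y); k-means finds groups.
    from sklearn.cluster import KMeans

    X = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]]  # features, no labels
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)                  # group assigned to each training point
    print(km.predict([[0.95, 1.05]]))  # group assigned to a new point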
19. “Not everything that can be counted counts,
and not everything that counts can be counted.”
- William Bruce Cameron, 1967
20. Machine Learning Applications
1. Web mining (search engine)
2. Screening (loan customers)
3. Image analysis (geographic detection)
4. Load forecasting (energy companies)
5. Diagnosis (medical and mechanical failure)
6. Marketing and sales (retaining customers, targeting advertising, recommender systems)
7. Science (gene detection, galaxy detection, predicting the structure of organic compounds)
8. City design and planning?
21. DATA RECEIVED
• Image data for 5,328 colonies over 6 days (~32,000 images) at 550x550 resolution
• Table of information for 145 colonies processed by hand
• Time-lapse video of growth for one colony
[Image grid: sample photographs of colonies for Day 1 through Day 6]
37. [Diagram: the SUPERVISED LEARNING MODEL from slide 17, repeated as a recap before validation is introduced.]
38. [Diagram: SUPERVISED LEARNING MODEL WITH VALIDATION. The labeled data is split in two: roughly 70% becomes training data, whose features (X1, X2, ...) and values (y) drive the 'learning' step and produce trained predictor models; the remaining roughly 30% becomes validation data, whose known values (y) are used to check the candidates and select a validated model. New data, with features only, then runs through the validated model to produce a predicted value (yp).]
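A minimal sketch of this split, assuming scikit-learn (the deck does not name a library); the synthetic dataset and the exact 70/30 ratio are illustrative:

    # Train/validation split sketch: learn on ~70%, check on the held-out ~30%.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.30, random_state=0)   # 70% train, 30% validation

    model = SVC().fit(X_train, y_train)         # learn on the training split
    print(model.score(X_val, y_val))            # validate on unseen data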
42. SUPPORT VECTOR MACHINES
http://en.wikipedia.org/wiki/Support_vector_machine
[Figure: maximum-margin hyperplane and margins for an SVM trained with samples from two classes. H1 does not separate the classes; H2 does, but only with a small margin; H3 separates them with the maximum margin. Samples on the margin are called the support vectors.]
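A minimal sketch of fitting a maximum-margin linear SVM, assuming scikit-learn; the synthetic blobs stand in for the two classes in the figure:

    # Linear SVM sketch: a large C approximates a hard (maximum) margin.
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=40, centers=2, random_state=6)
    clf = SVC(kernel='linear', C=1000).fit(X, y)
    print(clf.support_vectors_)   # the samples that sit on the margin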
43. SUPPORT VECTOR MACHINES
http://en.wikipedia.org/wiki/Support_vector_machine
Non-linear Classification
Non-linear models are useful for data that cannot be separated by a hyperplane in its original feature space. They are built with the 'kernel trick': the data is implicitly mapped into a higher-dimensional space where it can be separated linearly, and that linear separator corresponds to a non-linear boundary back in the original feature space.
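A minimal sketch of the kernel trick in practice, assuming scikit-learn; the interleaved half-moons are a standard example of data that no straight line can separate:

    # Kernel trick sketch: an RBF kernel separates data a linear SVM cannot.
    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=100, noise=0.1, random_state=0)
    linear = SVC(kernel='linear').fit(X, y)
    rbf = SVC(kernel='rbf').fit(X, y)
    print(linear.score(X, y))  # imperfect: no separating line exists
    print(rbf.score(X, y))     # near 1.0: a non-linear boundary fits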
45. SUPPORT VECTOR MACHINES
Non-linear soft-margin SVM classification used to classify non-separable data
http://en.wikipedia.org/wiki/Support_vector_machine
48. SUPPORT VECTOR MACHINES
http://www.svms.org/parameters/
Penalty Factor
The penalty factor in an SVM penalizes the model (adds cost in the optimization) for wrong guesses. It is driven by two parameters, which become inputs to the model:
C: a multiplier that controls the strength of the penalty. Higher values of C produce larger relative penalties for misclassified points and can lead to over-fitting (high variance).
ε (epsilon): controls the margin of error or 'gray area' of the model (how wrong an example has to be before it is counted as an error). Higher values produce simpler models but may result in under-fitting (high bias).
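A minimal sketch of these two parameters, assuming scikit-learn; note that epsilon appears in the regression form of the SVM (SVR), so the sketch uses SVR on illustrative sine-curve data:

    # C and epsilon sketch: a loose model vs. a tightly penalized one.
    import numpy as np
    from sklearn.svm import SVR

    X = np.linspace(0, 6, 50).reshape(-1, 1)
    y = np.sin(X).ravel()

    loose = SVR(C=0.1, epsilon=0.5).fit(X, y)     # weak penalty, wide gray
                                                  # area: may under-fit
    tight = SVR(C=100.0, epsilon=0.01).fit(X, y)  # strong penalty, narrow
                                                  # gray area: may over-fit
    print(loose.score(X, y), tight.score(X, y))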