Slides from a lecture given on October 14, 2015 for the Data Mining the City class at Columbia University's Graduate School of Architecture, Planning, and Preservation (GSAPP)
8. What is Learning?
1. To get knowledge of something by study, experience, or being taught.
2. To become aware by information or from observation
3. To commit to memory
4. To be informed of or to ascertain
5. To receive instruction
Witten, Frank, Hall. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition.
9. What is Learning?
Things learn when they change their behavior in a way that makes them perform better in the future.
Witten, Frank, Hall. Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition.
10. “Telling the future, when it comes right down
to it, is not solely a human yearning. It is the
fundamental nature of any organism, and
perhaps any complex system. Telling the future
is what organisms are for.”
- Kevin Kelly, “Out of Control”
12. [Diagram: the basic prediction pipeline. Training data, with features (X1, X2, ...) and a known value (y), feeds a 'learning' step that produces a trained predictor model; new data, with features (X1, X2, ...) only, is run through the trained model to produce a predicted value (yp).]
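A minimal sketch of this pipeline in code, assuming scikit-learn (the slides do not name a library); the data and the choice of model are illustrative:

    # A minimal sketch of the train -> predict pipeline, assuming scikit-learn.
    # The feature matrix, values, and model choice are illustrative only.
    from sklearn.tree import DecisionTreeRegressor

    X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]   # features (X1, X2, ...)
    y_train = [10.0, 12.0, 18.0]                     # known values (y)

    model = DecisionTreeRegressor()      # the 'learning' step
    model.fit(X_train, y_train)          # produces the trained predictor model

    X_new = [[2.5, 2.5]]                 # new data: features only
    y_pred = model.predict(X_new)        # predicted value (yp)
    print(y_pred)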
13. Name Gender Height Income HS Degree
Bob Male 5’5” $44,000 No
John Male 6’0” $60,000 Yes
Susan Female 5’10” $40,000 No
Betty Female 5’6” $55,000 Yes
14. (Same table as slide 13.) The columns fall into three kinds: Name is description data; Gender and HS Degree are categorical data; Height and Income are continuous data.
15. (Same table as slide 13.) Problem 1: predict Income [regression]. Income is the value (y) to predict; the remaining columns supply the features (X).
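A minimal sketch of Problem 1, assuming scikit-learn and a hypothetical numeric encoding of the table (Gender: Male=0/Female=1, Height in inches, HS Degree: No=0/Yes=1). Four rows is far too little data for a real model; this only shows the mechanics:

    # Regression sketch: predict Income from the other columns.
    from sklearn.linear_model import LinearRegression

    #     gender, height_in, hs_degree   (hypothetical encoding)
    X = [[0, 65, 0],   # Bob,   5'5"
         [0, 72, 1],   # John,  6'0"
         [1, 70, 0],   # Susan, 5'10"
         [1, 66, 1]]   # Betty, 5'6"
    y = [44000, 60000, 40000, 55000]    # Income: continuous target

    model = LinearRegression().fit(X, y)
    print(model.predict([[0, 68, 1]]))  # income guess for a new person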
16. (Same table as slide 13.) Problem 2: predict HS Degree [classification]. HS Degree is the value (y) to predict; the remaining columns supply the features (X).
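A minimal sketch of Problem 2, assuming scikit-learn and the same hypothetical encoding; the target is now the categorical HS Degree column:

    # Classification sketch: predict HS Degree from the other columns.
    from sklearn.tree import DecisionTreeClassifier

    #     gender, height_in, income
    X = [[0, 65, 44000],
         [0, 72, 60000],
         [1, 70, 40000],
         [1, 66, 55000]]
    y = ['No', 'Yes', 'No', 'Yes']        # HS Degree: categorical target

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[1, 68, 50000]]))  # 'Yes' or 'No' for a new person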
17. [Diagram: the prediction pipeline from slide 12, here labeled SUPERVISED LEARNING MODEL; the training data carries a known value (y) alongside the features.]
18. [Diagram: the pipeline labeled UNSUPERVISED LEARNING MODEL; here the training data carries features (X1, X2, ...) only, with no known value (y) for the learning step to target.]
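A minimal sketch of the unsupervised case, assuming scikit-learn (not named in the slides) and using k-means clustering as one common unsupervised method:

    # Unsupervised sketch: features only, no value (y); k-means finds groups.
    from sklearn.cluster import KMeans

    X = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]]  # features, no labels
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)                  # group assigned to each training point
    print(km.predict([[0.95, 1.05]]))  # group assigned to a new point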
19. “Not everything that can be counted counts,
and not everything that counts can be counted.”
- William Bruce Cameron, 1967
20. Machine Learning Applications
1. Web mining (search engine)
2. Screening (loan customers)
3. Image analysis (geographic detection)
4. Load forecasting (energy companies)
5. Diagnosis (medical and mechanical failure)
6. Marketing and sales (retaining customers, targeting advertising, recommender systems)
7. Science (gene detection, galaxy detection, predicting the structure of organic compounds)
8. City design and planning?
21. DATA RECEIVED
• Image data for 5,328 colonies over 6 days (~32,000 images) at 550x550 resolution
• Table of information for 145 colonies processed by hand
• Time-lapse video of growth for one colony
[Image grid: sample photographs of colonies for Day 1 through Day 6]
37. [Diagram: the SUPERVISED LEARNING MODEL from slide 17, repeated as a recap before validation is introduced.]
38. [Diagram: SUPERVISED LEARNING MODEL WITH VALIDATION. The labeled data is split in two: roughly 70% becomes training data, whose features (X1, X2, ...) and values (y) drive the 'learning' step and produce trained predictor models; the remaining roughly 30% becomes validation data, whose known values (y) are used to check the candidates and select a validated model. New data, with features only, then runs through the validated model to produce a predicted value (yp).]
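A minimal sketch of this split, assuming scikit-learn (the deck does not name a library); the synthetic dataset and the exact 70/30 ratio are illustrative:

    # Train/validation split sketch: learn on ~70%, check on the held-out ~30%.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.30, random_state=0)   # 70% train, 30% validation

    model = SVC().fit(X_train, y_train)         # learn on the training split
    print(model.score(X_val, y_val))            # validate on unseen data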
42. SUPPORT VECTOR MACHINES
http://en.wikipedia.org/wiki/Support_vector_machine
[Figure: maximum-margin hyperplane and margins for an SVM trained with samples from two classes. H1 does not separate the classes; H2 does, but only with a small margin; H3 separates them with the maximum margin. Samples on the margin are called the support vectors.]
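A minimal sketch of fitting a maximum-margin linear SVM, assuming scikit-learn; the synthetic blobs stand in for the two classes in the figure:

    # Linear SVM sketch: a large C approximates a hard (maximum) margin.
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=40, centers=2, random_state=6)
    clf = SVC(kernel='linear', C=1000).fit(X, y)
    print(clf.support_vectors_)   # the samples that sit on the margin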
43. SUPPORT VECTOR MACHINES
http://en.wikipedia.org/wiki/Support_vector_machine
Non-linear Classification
Non-linear models are useful for data that cannot be separated by a hyperplane in its original feature space. They are built with the 'kernel trick': the data is implicitly mapped into a higher-dimensional space where it can be separated linearly, and that linear separator corresponds to a non-linear boundary back in the original feature space.
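A minimal sketch of the kernel trick in practice, assuming scikit-learn; the interleaved half-moons are a standard example of data that no straight line can separate:

    # Kernel trick sketch: an RBF kernel separates data a linear SVM cannot.
    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=100, noise=0.1, random_state=0)
    linear = SVC(kernel='linear').fit(X, y)
    rbf = SVC(kernel='rbf').fit(X, y)
    print(linear.score(X, y))  # imperfect: no separating line exists
    print(rbf.score(X, y))     # near 1.0: a non-linear boundary fits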
45. SUPPORT VECTOR MACHINES
Non-linear soft-margin SVM classification used to classify non-separable data
http://en.wikipedia.org/wiki/Support_vector_machine
48. SUPPORT VECTOR MACHINES
http://www.svms.org/parameters/
Penalty Factor
The penalty factor in an SVM penalizes the model (adds cost in the optimization) for wrong guesses. It is driven by two parameters, which become inputs to the model:
C: a multiplier that controls the strength of the penalty. Higher values of C produce larger relative penalties for misclassified points and can lead to over-fitting (high variance).
ε (epsilon): controls the margin of error or 'gray area' of the model (how wrong an example has to be before it is counted as an error). Higher values produce simpler models but may result in under-fitting (high bias).
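A minimal sketch of these two parameters, assuming scikit-learn; note that epsilon appears in the regression form of the SVM (SVR), so the sketch uses SVR on illustrative sine-curve data:

    # C and epsilon sketch: a loose model vs. a tightly penalized one.
    import numpy as np
    from sklearn.svm import SVR

    X = np.linspace(0, 6, 50).reshape(-1, 1)
    y = np.sin(X).ravel()

    loose = SVR(C=0.1, epsilon=0.5).fit(X, y)     # weak penalty, wide gray
                                                  # area: may under-fit
    tight = SVR(C=100.0, epsilon=0.01).fit(X, y)  # strong penalty, narrow
                                                  # gray area: may over-fit
    print(loose.score(X, y), tight.score(X, y))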