4. Data Analysis
1. Preparing the data (munging)
2. Running the model (analysis)
3. Interpreting the results
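A minimal sketch of the three steps in R. It uses the built-in airquality data set as a stand-in; the data set and the choice of a linear model are illustrative assumptions, not part of the original slides:

```r
# 1. Munging: keep the columns of interest and drop rows with missing values
aq <- na.omit(airquality[, c("Ozone", "Temp")])

# 2. Analysis: fit a simple linear model of ozone (ppb) on temperature (degrees F)
fit <- lm(Ozone ~ Temp, data = aq)

# 3. Interpretation: read off the estimated slope and the share of variance explained
slope <- coef(fit)[["Temp"]]
r2 <- summary(fit)$r.squared
cat(sprintf("Each extra degree F goes with %.2f ppb more ozone (R^2 = %.2f)\n", slope, r2))
```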
5. Machine Learning
Black-box, algorithmic approach to producing predictions or
classifications from data
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E
Tom Mitchell (1998)
11. Regression
Predict one set of numbers given another set of numbers
Given the number of friends x, predict how many likes ("goods")
I will receive on each Facebook post
12. Scatter Plot
# fbgood.txt is tab-separated: the separator must be '\t', not 't'
dataset <- read.csv('fbgood.txt', header = TRUE, sep = '\t', row.names = 1)
x <- dataset$friends
y <- dataset$getgoods
plot(x, y)
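Since fbgood.txt is not included here, the sketch below simulates a friends/likes data set (synthetic numbers, not real Facebook data) and fits a straight line as a first-order baseline before trying polynomial fits:

```r
set.seed(1)
# Synthetic stand-in for fbgood.txt: likes grow with friends, with diminishing returns
friends <- round(runif(200, min = 50, max = 1500))
likes <- pmax(round(5 * sqrt(friends) + rnorm(200, sd = 10)), 0)

# First-order (linear) fit: likes = b0 + b1 * friends
linfit <- lm(likes ~ friends)
print(coef(linfit))
print(summary(linfit)$r.squared)
```

After `plot(friends, likes)`, calling `abline(linfit, col = 2, lwd = 3)` draws the fitted line. On this synthetic data a straight line cannot follow the bend of the square-root trend, which is the motivation for the higher-order fits.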
14. 2nd order polynomial fit
plot(x,y)
polyfit2 <- lm(y ~ poly(x, 2))
# Draw the fitted curve in x order so lines() does not zig-zag
lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
15. 3rd order polynomial fit
plot(x,y)
polyfit3 <- lm(y ~ poly(x, 3))
lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
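One way to decide whether the extra polynomial degree is worth keeping is to compare the nested fits with F-tests (`anova()`) and with AIC. A sketch on synthetic data with a known quadratic trend (an assumption, since fbgood.txt is not available):

```r
set.seed(2)
x <- runif(150, 0, 10)
# Synthetic data with a genuinely quadratic trend plus noise
y <- 3 + 2 * x - 0.3 * x^2 + rnorm(150, sd = 1)

# Fits of degree 1, 2 and 3; orthogonal polynomials of increasing degree are nested
fits <- lapply(1:3, function(d) lm(y ~ poly(x, d)))

# Nested F-tests: does each extra degree significantly reduce the residual error?
print(anova(fits[[1]], fits[[2]], fits[[3]]))

# Lower AIC is better; AIC penalises each additional coefficient
aics <- sapply(fits, AIC)
print(aics)
best <- which.min(aics)
```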
16. Other Regression Packages
MASS rlm - Robust Regression
stats glm - Generalized Linear Models
gam, mgcv - Generalized Additive Models
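A short sketch of the first two on synthetic data (the data, the outliers, and the Poisson count model are illustrative assumptions): `rlm()` from MASS resists outliers that drag an ordinary least-squares line, and `glm()` with a Poisson family suits count outcomes such as numbers of likes.

```r
library(MASS)  # provides rlm(); MASS ships with standard R installations
set.seed(3)
x <- 1:50
y <- 2 * x + rnorm(50, sd = 3)
y[c(45, 50)] <- c(0, 5)  # two gross outliers at high leverage

ols <- lm(y ~ x)
rob <- rlm(y ~ x)  # M-estimation (Huber psi by default) down-weights large residuals
print(rbind(OLS = coef(ols), Robust = coef(rob)))

# GLM: Poisson family with log link for count data
counts <- rpois(50, lambda = exp(0.5 + 0.04 * x))
pois <- glm(counts ~ x, family = poisson)
print(coef(pois))
```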
17. Classification
Identifying to which of a set of categories a new observation belongs,
on the basis of a training set of data
Given the features of a bank customer, predict whether
the client will subscribe to a term deposit
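A minimal classification sketch using logistic regression (`glm()` with a binomial family), a simpler technique than the SVM on the next slides, on R's built-in mtcars data: predict whether a car has a manual transmission (am = 1) from its weight. The data set and the 0.5 cut-off are illustrative assumptions.

```r
# am: 0 = automatic, 1 = manual; wt: weight in 1000 lbs
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of "manual" for each training car, then a 0.5 cut-off
prob <- predict(fit, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix and training accuracy
print(table(predicted = pred, actual = mtcars$am))
accuracy <- mean(pred == mtcars$am)

# Classify a new, unseen observation: a car weighing 2500 lbs
p_new <- predict(fit, newdata = data.frame(wt = 2.5), type = "response")
```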
19. Classify Data With LibSVM
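library(e1071)
# bank.csv is semicolon-separated; read y ("yes"/"no") as a factor so svm() does classification
dataset <- read.csv('bank.csv', header = TRUE, sep = ';', stringsAsFactors = TRUE)
# Random 70/30 train/test split
set.seed(42)
idx <- sample(nrow(dataset), size = round(0.7 * nrow(dataset)))
train <- dataset[idx, ]
test <- dataset[-idx, ]
model <- svm(y ~ ., data = train, probability = TRUE)
# Predict from every column except the label y; class probabilities come back as an attribute
pred <- predict(model, test[, names(test) != "y"], probability = TRUE)
probs <- attr(pred, "probabilities")
table(predicted = pred, actual = test$y)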
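To score the predictions, one option is a confusion matrix plus accuracy, precision and recall for the "yes" class. The helper below is a hypothetical base-R sketch (`evaluate_classifier` is not part of e1071), shown on a toy pair of label vectors so it runs without bank.csv:

```r
# Hypothetical helper: score predicted vs. actual labels for a binary outcome
evaluate_classifier <- function(pred, actual, positive = "yes") {
  lev <- union(levels(factor(actual)), levels(factor(pred)))
  cm <- table(predicted = factor(pred, levels = lev),
              actual = factor(actual, levels = lev))
  tp <- cm[positive, positive]
  list(
    confusion = cm,
    accuracy = sum(diag(cm)) / sum(cm),
    precision = tp / sum(cm[positive, ]),  # of the predicted "yes", the share that was right
    recall = tp / sum(cm[, positive])      # of the actual "yes", the share that was found
  )
}

# Toy example; with the SVM above the call would be evaluate_classifier(pred, test$y)
pred <- c("no", "yes", "yes", "no", "no", "yes")
actual <- c("no", "yes", "no", "no", "yes", "yes")
res <- evaluate_classifier(pred, actual)
print(res$confusion)
```

On a skewed outcome like term-deposit subscription, precision and recall for "yes" are usually more informative than accuracy alone.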
23. Support Vector Machines and Kernel Methods
e1071 - LIBSVM
kernlab - SVM, RVM and other kernel learning algorithms
klaR - SVMlight
rdetools - Model selection and prediction
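To make "kernel" concrete: the radial basis (Gaussian) kernel, the default kernel of e1071's `svm()`, scores the similarity of two points u and v as exp(-gamma * ||u - v||^2). A base-R sketch computing the full kernel (Gram) matrix; `rbf_kernel` is a hypothetical helper, not a package function:

```r
# Hypothetical helper: Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
rbf_kernel <- function(X, gamma = 1 / ncol(X)) {
  X <- as.matrix(X)
  sq_dist <- as.matrix(dist(X))^2  # pairwise squared Euclidean distances
  exp(-gamma * sq_dist)
}

X <- matrix(c(0, 0,
              1, 0,
              3, 4), ncol = 2, byrow = TRUE)
# gamma defaults to 1/2 here, mirroring e1071's default of 1/(number of features)
K <- rbf_kernel(X)
print(round(K, 4))
```

Nearby points get similarity close to 1 and distant points close to 0; the SVM only ever sees the data through such pairwise similarities.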
24. Dimension Reduction
Seeks linear combinations of the columns of X with maximal variance
Example: calculate a new index that measures the economic strength
of each city/county in Taiwan
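In R, `prcomp()` finds these linear combinations (principal components). A sketch on the built-in USArrests data (an illustrative choice) checking that the first component's scores have the largest variance, equal to the top eigenvalue of the correlation matrix when the columns are standardized:

```r
# Standardize columns so variables on different scales contribute equally
pca <- prcomp(USArrests, scale. = TRUE)

w1 <- pca$rotation[, 1]  # PC1 loadings: a unit-length weight vector
scores1 <- pca$x[, 1]    # each state's PC1 score = standardized data %*% w1

# Variance of the PC1 scores equals the largest eigenvalue of the correlation matrix
top_eig <- eigen(cor(USArrests))$values[1]
print(c(var_pc1 = var(scores1), top_eigenvalue = top_eig))

# Share of total variance captured by each component
print(summary(pca)$importance["Proportion of Variance", ])
```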
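25. Economic Index of Taiwan Counties
County/city (縣市)
Sales revenue of profit-seeking enterprises (營利事業銷售額)
Economic development expenditure as a share of total expenditure (經濟發展支出佔歲出比例)
Average disposable income per income earner (得收入者平均每人可支配所得)
Source: CommonWealth Magazine (天下雜誌), 2012 Happy City Survey, issue 505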
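A sketch of how such an index could be built: standardize the three indicators and take each county's score on the first principal component. The helper `pca_index`, the column names, and the values are hypothetical placeholders with synthetic data, not the magazine's figures:

```r
# Hypothetical helper: PCA-based composite index from numeric indicator columns
pca_index <- function(df) {
  pca <- prcomp(df, scale. = TRUE)
  score <- pca$x[, 1]
  # The sign of a principal component is arbitrary; flip so the loadings are mostly positive
  if (sum(pca$rotation[, 1]) < 0) score <- -score
  score
}

set.seed(4)
# Synthetic placeholder data for five counties (NOT real survey figures)
latent <- seq(-2, 2, length.out = 5)  # hidden "economic strength" driving all three columns
counties <- data.frame(
  sales = 100 + 30 * latent + rnorm(5, sd = 5),
  econ_spending_share = 0.10 + 0.03 * latent + rnorm(5, sd = 0.005),
  disposable_income = 600 + 80 * latent + rnorm(5, sd = 10),
  row.names = c("A", "B", "C", "D", "E")
)
index <- pca_index(counties)
print(sort(index, decreasing = TRUE))
```

Standardizing first matters here: without `scale. = TRUE`, the income column (hundreds) would swamp the expenditure share (fractions).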
38. Machine Learning Diagnostics
1. Get more training examples
2. Try smaller sets of features
3. Try getting additional features
4. Try adding polynomial features
5. Try increasing or decreasing model parameters (e.g., the regularization strength)
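Choosing among these options usually starts with a learning curve: training and validation error as the training set grows. If both errors level off at a similar, high value (high bias), more examples (1) help little, while additional or polynomial features (3, 4) may help; a large gap between them (high variance) points to more examples (1) or fewer features (2). A base-R sketch on synthetic data (the data and the cubic model are assumptions):

```r
set.seed(5)
n <- 400
x <- runif(n, 0, 10)
y <- sin(x) + 0.1 * x^2 + rnorm(n, sd = 0.5)  # synthetic non-linear target
dat <- data.frame(x, y)
valid <- dat[301:400, ]                       # fixed validation set, never used for training

mse <- function(model, d) mean((d$y - predict(model, newdata = d))^2)

sizes <- c(10, 25, 50, 100, 200, 300)
lc <- t(sapply(sizes, function(m) {
  train <- dat[1:m, ]
  fit <- lm(y ~ poly(x, 3), data = train)  # change the degree to explore bias vs. variance
  c(size = m, train_mse = mse(fit, train), valid_mse = mse(fit, valid))
}))
print(lc)
```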