22. Data preparation
• Data Cleaning
• Missing Data
• Feature Engineering
• Normalization
• Categorical data → numerical features
• Log-based features or target
• Date/time-related features
• Combine features, e.g. by +, -, x, /
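The preparation steps listed above can be sketched on a toy data set. This is a minimal, standard-library-only illustration: the records, the field names (`age`, `income`, `city`, `signup`) and the choice of mean imputation and min-max scaling are all assumptions made for the example, not something the slides prescribe.

```python
import math
from datetime import date
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "city": "Berlin", "signup": date(2017, 3, 14)},
    {"age": None, "income": 61000.0, "city": "Paris", "signup": date(2017, 11, 2)},
    {"age": 51, "income": 250000.0, "city": "Berlin", "signup": date(2016, 7, 30)},
]

# Missing data: impute with the mean of the observed values (one option of many).
age_mean = mean([r["age"] for r in rows if r["age"] is not None])
for r in rows:
    if r["age"] is None:
        r["age"] = age_mean

# Normalization: min-max scale age to the range [0, 1].
lo = min(r["age"] for r in rows)
hi = max(r["age"] for r in rows)
for r in rows:
    r["age_scaled"] = (r["age"] - lo) / (hi - lo)

# Categorical data -> numerical features: one-hot encode the city.
cities = sorted({r["city"] for r in rows})
for r in rows:
    for c in cities:
        r[f"city_{c}"] = 1 if r["city"] == c else 0

for r in rows:
    # Log-based feature: compresses the long tail of incomes.
    r["log_income"] = math.log(r["income"])
    # Date/time-related features.
    r["signup_month"] = r["signup"].month
    r["signup_weekday"] = r["signup"].weekday()  # Monday = 0
    # Combined feature: ratio of two existing columns.
    r["income_per_age_year"] = r["income"] / r["age"]
```

Each step adds new keys next to the raw values, so the original columns stay available for inspection.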
24. ML Algorithms: by Representation
Collection of candidate models/programs, aka hypothesis space
• Decision trees
• Instance-based
• Neural networks
• Model ensembles
25. ML Algorithms: by Evaluation
Evaluation: Quality measure for a model
Regression
Example metric: Root Mean Squared Error
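RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
where y_i is the true value, \hat{y}_i the predicted value, and n the number of examples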
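Root Mean Squared Error is the square root of the average squared difference between true and predicted values. A minimal sketch follows; the function name `rmse` and the sample numbers are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
error = rmse(y_true, y_pred)  # squared errors 1, 0, 1, 4 -> sqrt(6 / 4)
```

Because the errors are squared before averaging, a single large miss (here the last prediction) dominates the result.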
Binary classification: confusion matrix
Example: medical test for a disease

                 Predicted positive      Predicted negative
True class P     True positives (TP)     False negatives (FN)
True class N     False positives (FP)    True negatives (TN)

Accuracy: (8 + 971) / 1000 = 97.9%
Better evaluation metrics:
• Precision: 8 / (8 + 19)
• Recall: 8 / (8 + 2)
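The numbers in the medical-test example can be checked with a few lines of code. The four counts below are read off the slide's formulas (TP = 8, FP = 19 from the precision, FN = 2 from the recall, TN = 971 from the accuracy); the "always negative" baseline is an added illustration of why accuracy alone can mislead on imbalanced data.

```python
# Counts inferred from the slide's accuracy, precision and recall formulas.
tp, fn, fp, tn = 8, 2, 19, 971
total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # share of all cases classified correctly
precision = tp / (tp + fp)     # share of positive predictions that are correct
recall = tp / (tp + fn)        # share of actual positives that are found

# Baseline that never predicts the disease: every actual negative is correct,
# every actual positive is missed.
baseline_accuracy = (tn + fp) / total
baseline_recall = 0 / (tp + fn)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"baseline accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f}")
```

The baseline reaches a higher accuracy (99.0%) than the test (97.9%) while finding no patients at all, which is why precision and recall are the more informative metrics here.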
26. ML Algorithms: by Optimization
Optimization: how the algorithm ‘learns’; depends on representation and evaluation
• Greedy search (an example of combinatorial optimization)
• Gradient descent (or in general: convex optimization)
• Linear programming (or in general: constrained/nonlinear optimization)
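To make the link between representation, evaluation and optimization concrete, here is a minimal gradient descent sketch. The assumptions are all illustrative: the representation is a one-parameter linear model y = w * x, the evaluation is mean squared error, and the data, learning rate and step count are made up.

```python
# Hypothetical training data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse(w):
    """Evaluation: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(w0, learning_rate=0.01, steps=1000):
    """Optimization: repeatedly step against the gradient."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w)
    return w

w_fit = gradient_descent(w0=0.0)
print(round(w_fit, 6), mse(w_fit))
```

The loss is convex in w, so with a small enough learning rate the iteration converges to the single minimum at w = 2.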
28. Choice of ML-algorithm, considerations
• Size & Dimensionality of training set
• Computational efficiency
• Model building, number of parameters
• Eager vs lazy learning
• Online vs batch
• Interpretability
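The eager vs lazy distinction can be illustrated with two tiny classifiers on one-dimensional data. This is a sketch under assumptions: the training points and labels are invented, 1-nearest-neighbour stands in for a lazy learner, and a nearest-centroid model stands in for an eager one.

```python
from statistics import mean

# Hypothetical labelled training examples: (feature value, class label).
train = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

# Lazy learning: no model is built; all work happens at prediction time
# and the full training set must be kept around.
def predict_lazy(x, examples=train):
    nearest = min(examples, key=lambda ex: abs(ex[0] - x))
    return nearest[1]

# Eager learning: summarise the data once into a small model (one centroid
# per class); the training set can then be discarded.
def fit_centroids(examples):
    labels = sorted({label for _, label in examples})
    return {
        label: mean([value for value, l in examples if l == label])
        for label in labels
    }

centroids = fit_centroids(train)

def predict_eager(x, model=centroids):
    return min(model, key=lambda label: abs(model[label] - x))

print(predict_lazy(3.0), predict_eager(3.0))
```

The lazy variant has zero training cost but a prediction cost that grows with the training set; the eager variant pays once up front and predicts from two numbers.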
40. Data Science for Business
• Focuses more on general principles than specific algorithms
• Not math-heavy, but does contain some math
• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do
• Book website: http://data-science-for-biz.com/DSB/Home.html
41. Agenda
1. Introduction: Hype or Hit?!
2. Machine Learning
   1. Demo, SAP ICN
   2. Skill set for aspiring ML experts
3. Take-aways
42. What has NOT been covered
• Deep learning / Neural Networks
• Specifics of ML-algorithms
• Tools / Libraries / Code
• SAP Products, like HANA / Predictive Analytics / Vora / …
• Hardware
• …
43. Take-aways
• Goal of ML: generalize from training data (not optimization!!)
• Part of ‘Data Mining Process’, not a goal in and of itself
• No magic! Just some clever algorithms…
• Increasingly important non-technical aspects:
  • Ethics
  • Algorithmic transparency