LR1. Summary Day 1
Morning class summary
Mercè Martín
BigML
Day 1
State of the Art in ML
Poul Petersen (BigML)
• History
Explicit rules: difficult to find and re-train
Implicit rules (data rules): easy to re-train
• Machine Learning problems and tasks
➔ Supervised Learning: Classification, Regression, Multi-label classification
➔ Unsupervised Learning: Clusters, Anomaly Detectors
➔ Semi-supervised Learning: Inference from partially labeled data
• Features: numeric, categorical, date-time, text
Text analysis: frequency-weighted bag of words
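The slide does not say which weighting the "frequency-weighted bag of words" uses; TF-IDF is one common choice, assumed here. This minimal sketch (the function name `tf_idf` is illustrative, not a library API) weights each term count by log(N / df), so words that appear in every document get zero weight.

```python
import math
from collections import Counter


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    weight = (count of term in document) * log(N / df), where N is the number
    of documents and df the number of documents containing the term.
    Tokenization is a naive lowercase whitespace split, for illustration only.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


docs = ["the cat sat", "the dog sat", "the cat ran"]
for weights in tf_idf(docs):
    print(weights)
```

"the" appears in all three documents and gets weight 0, while "dog", which appears once, gets the highest weight.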
• Technology
Storage: low prices, big data
APIs: combination and accessibility
Cloud: computational power
➔ Predictive APIs
• Teaching computers to learn:
too general vs. too specific (under-fitting vs. over-fitting)
Missing values handling: new category, averages, multiple choices
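Two of the missing-value strategies listed above can be sketched in a few lines: replacing a missing numeric value with the average of the observed values, and treating "missing" as a new category for categorical features. The `None` marker and the `<missing>` label are assumptions of this sketch.

```python
from statistics import mean

MISSING = None  # how a missing value is represented in this sketch


def impute_numeric_with_mean(values):
    """Replace missing numeric values with the mean of the observed ones."""
    observed = [v for v in values if v is not MISSING]
    fill = mean(observed)
    return [fill if v is MISSING else v for v in values]


def impute_categorical_as_new_category(values, label="<missing>"):
    """Treat 'missing' as a category of its own, so a model can learn from it."""
    return [label if v is MISSING else v for v in values]


print(impute_numeric_with_mean([1.0, None, 3.0]))         # [1.0, 2.0, 3.0]
print(impute_categorical_as_new_category(["red", None]))  # ['red', '<missing>']
```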
Evaluating ML Algorithms
Cèsar Ferri (UPV)
• Supervised learning:
Classification (output in a set of classes)
Regression (output is a number)
• Unsupervised learning: no output info
• Training / test separation: partitioning data, bootstrap or cross-validation
• Classification: confusion matrix
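A minimal sketch of two ideas from this slide: splitting data for k-fold cross-validation and counting a confusion matrix. The function names are illustrative; real projects would normally use a library's implementation.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Indices are shuffled once; each example lands in exactly one test fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


for train, test in k_fold_indices(n=6, k=3):
    print("train:", train, "test:", test)

actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]
print(confusion_matrix(actual, predicted, ["spam", "ham"]))  # [[1, 1], [1, 2]]
```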
• Classification metrics: Accuracy, Precision, Recall, F-measure
Extending to multi-class problems (averaging)
• Regression metrics:
Mean Absolute Error
Mean Squared Error (more sensitive to extreme errors)
Root Mean Squared Error
Normalized for comparing models:
Relative Mean Squared Error
Relative Mean Absolute Error
R²
• Unsupervised evaluation: no estimations; association rules (support)
• Clustering: distance- and shape-based evaluation (border, centers, distribution)
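The metrics above can be computed directly from predictions. In this sketch, multi-class extension uses macro-averaging (the mean of per-class F-measures), and the relative errors divide the model's error by that of a baseline that always predicts the mean; both are common conventions assumed here, not necessarily the exact definitions used in the talk. With these definitions, R² = 1 − relative MSE.

```python
import math


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(actual, predicted, positive):
    """Binary metrics for one class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(actual, predicted):
    """Multi-class extension by averaging: mean of the per-class F-measures."""
    classes = sorted(set(actual) | set(predicted))
    return sum(precision_recall_f1(actual, predicted, c)[2] for c in classes) / len(classes)


def regression_metrics(actual, predicted):
    """Absolute and relative regression errors; the relative ones compare the
    model against a baseline that always predicts the mean of `actual`."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n  # squaring makes large errors weigh more
    mean_a = sum(actual) / n
    base_mae = sum(abs(a - mean_a) for a in actual) / n
    base_mse = sum((a - mean_a) ** 2 for a in actual) / n
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "RelMAE": mae / base_mae,
        "RelMSE": mse / base_mse,
        "R2": 1 - mse / base_mse,
    }


print(accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.6
print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])["MAE"])  # 1.0
```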
Decision Trees
Gonzalo Martinez (UAM)
• History
Expert-based systems: human experts' rules (MYCIN: 600 rules, XCON: 2500 rules)
➔ Automated knowledge acquisition: mining archives of cases (scalable): CHAID, CART, ID3, C4.5
• Classification and Regression Trees
Structure where data is repeatedly separated into groups according to attribute values to minimize error / maximize information gain (split criterion, e.g. Gini impurity)
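To make the split criterion concrete, this sketch computes Gini impurity, 1 − Σ pₖ², and searches every threshold of one numeric attribute for the split with the lowest weighted impurity of the two resulting groups, in the style of CART. The data and function names are illustrative.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity: 1 - sum_k p_k^2 (0.0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold_split(values, labels):
    """Find the threshold t for 'x <= t' that minimises the weighted Gini
    impurity of the two groups. Returns (threshold, impurity) or None."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


ages = [22, 25, 47, 52, 46, 56]
bought = ["no", "no", "yes", "yes", "yes", "yes"]
print(best_threshold_split(ages, bought))  # (35.5, 0.0)
```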
PROs
● Convertible to rules
● Categorical and numeric attributes
● Handle uninformative or redundant attributes
● Handle missing values
● Non-parametric (no predefined idea of the concept to learn)
● Easy to tune (small number of parameters)
CONs
● Complex feature interactions are hard to capture
● Replication problem (the same subtree can be repeated in different branches)
Predicates: rules are based on the split predicates; missing values; oblique splits (compare features)
Stopping criteria: all instances in one class, no split found, small number of instances, gain below threshold, maximum depth
Pruning (to avoid over-fitting): CART is slower (more trees needed, avoids complexity); C4.5 is faster but has no confidence threshold (avoids small nodes)
Parameters: number of nodes, depth, pruning (on/off and confidence), minimum number of instances to split
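A minimal sketch of recursive tree growth on numeric features that stops on the criteria listed above: all instances in one class, too few instances, maximum depth, no split found, or gain below a threshold. The parameter names `max_depth`, `min_instances` and `min_gain` are hypothetical, and pruning is omitted.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=3, min_instances=2, min_gain=1e-9):
    """Grow a binary tree on numeric features until a stopping criterion fires.

    Leaves are {"leaf": class}; internal nodes are
    {"feature": j, "threshold": t, "left": subtree, "right": subtree}.
    """
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: all instances in one class, too few instances, or maximum depth reached.
    if len(set(labels)) == 1 or len(labels) < min_instances or depth >= max_depth:
        return {"leaf": majority}
    parent = gini(labels)
    best = None  # (gain, feature index, threshold)
    for j in range(len(rows[0])):
        # Every distinct value except the largest leaves both sides non-empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or parent - child > best[0]:
                best = (parent - child, j, t)
    # Stop: no split found, or gain below threshold.
    if best is None or best[0] < min_gain:
        return {"leaf": majority}
    _, j, t = best
    go_left = [r[j] <= t for r in rows]
    kwargs = dict(depth=depth + 1, max_depth=max_depth, min_instances=min_instances, min_gain=min_gain)
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [y for y, g in zip(labels, go_left) if g], **kwargs),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [y for y, g in zip(labels, go_left) if not g], **kwargs),
    }


def predict(tree, row):
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


tree = build_tree([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"])
print(tree)  # {'feature': 0, 'threshold': 2.0, 'left': {'leaf': 'a'}, 'right': {'leaf': 'b'}}
```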
Ensembles of Decision Trees
Gonzalo Martinez (UAM)
• Ensembles of models
Randomizing to decrease errors and over-fitting: data, features or algorithms
[Figure: class votes (1 or 2) of the individual models for a new instance x]
Combined with voting or non-voting strategies (aggregators)
Best overall performance (SVM)
Almost parameter-less
On trees, very fast to train and test
Slower than a single classifier (mitigated with pruning)
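Two possible aggregators are sketched below: plurality voting over the predicted classes, and, as one example of a non-voting strategy (an assumption, since the slide does not name one), averaging the members' class-probability estimates. The vote list is illustrative.

```python
from collections import Counter


def majority_vote(predictions):
    """Voting aggregator: the class predicted by the most ensemble members wins."""
    return Counter(predictions).most_common(1)[0][0]


def average_probabilities(prob_dicts):
    """Alternative aggregator: average each member's class probabilities, take the best class."""
    classes = {c for d in prob_dicts for c in d}
    avg = {c: sum(d.get(c, 0.0) for d in prob_dicts) / len(prob_dicts) for c in classes}
    return max(avg, key=avg.get)


votes = [1, 1, 2, 1, 2, 1, 1]  # illustrative: one predicted class per tree for instance x
print(majority_vote(votes))  # 1
print(average_probabilities([{"a": 0.6, "b": 0.4}, {"a": 0.2, "b": 0.8}]))  # b
```

The two aggregators can disagree: in the second call, "a" and "b" would tie 1-1 under voting, but averaging favours "b" because the second model is much more confident.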
BAGGING
• Robust
• Improves error
• Parallelizable
[Figure: bootstrap samples 1 … T drawn from the original dataset; in each sample some examples are repeated and others removed]
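This sketch shows bagging. Each model is trained on a bootstrap sample (n draws with replacement, so some examples repeat and others are left out), and the predictions are combined by majority vote. The toy 1-nearest-neighbour learner `train_1nn` is a hypothetical stand-in for a decision tree, used only to keep the example short.

```python
import random
from collections import Counter


def bootstrap_sample(n, rng):
    """Draw n indices with replacement: some examples repeat, others are left out."""
    chosen = [rng.randrange(n) for _ in range(n)]
    left_out = sorted(set(range(n)) - set(chosen))  # the "out-of-bag" examples
    return chosen, left_out


def train_1nn(rows, labels):
    """Toy base learner for the demo (hypothetical): 1-nearest neighbour."""
    data = list(zip(rows, labels))

    def model(row):
        return min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], row)))[1]

    return model


def bagging(train_fn, rows, labels, n_models=10, seed=0):
    """Train one model per bootstrap sample; the models are independent, so this parallelizes."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx, _ = bootstrap_sample(len(rows), rng)
        models.append(train_fn([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagged_predict(models, row):
    return Counter(m(row) for m in models).most_common(1)[0][0]


rows = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
labels = ["a", "a", "a", "b", "b", "b"]
models = bagging(train_1nn, rows, labels)
print(bagged_predict(models, [0.5]), bagged_predict(models, [11.5]))
```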
BOOSTING
[Figure: the original dataset reweighted over iterations 1, 2, …]
Good average generalization error
Not robust (noise)
Can increase the error of the base classifier
Not parallelizable
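The slide does not name a specific boosting algorithm. This sketch assumes AdaBoost with one-dimensional decision stumps and labels in {+1, −1}. Each iteration fits a stump to the current example weights and then increases the weights of the misclassified examples. Because each round depends on the previous one, the loop cannot be parallelized, which matches the slide.

```python
import math


def train_stump(xs, ys, weights):
    """Best weighted rule 'predict sign if x <= t else -sign'. Returns (error, t, sign)."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if (sign if x <= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best


def adaboost(xs, ys, rounds=5):
    """AdaBoost with stumps; ys must be +1/-1. Returns a list of (alpha, t, sign)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, sign = train_stump(xs, ys, weights)
        if err >= 0.5:
            break  # weak learner no better than chance: stop boosting
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Misclassified examples (y * h(x) = -1) gain weight; correct ones lose weight.
        weights = [w * math.exp(-alpha * y * (sign if x <= t else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble, x):
    score = sum(alpha * (sign if x <= t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump classifies all six correctly
ensemble = adaboost(xs, ys, rounds=3)
print([boosted_predict(ensemble, x) for x in xs])
```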
RANDOM FORESTS
• Robust
• Improves error
• Parallelizable
• Better than boosting
• Very fast to train
[Figure: bootstrap samples 1 … T drawn from the original dataset (with repeated and removed examples), each combined with a random feature subset]
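The extra randomization in random forests is the feature subset. This sketch only shows the sampling plan, a bootstrap sample plus a random feature subset for each tree, following the slide's figure. In Breiman's formulation the subset is redrawn at every split rather than once per tree. The default subset size of about √(number of features) is a common convention and an assumption here, and the function names are hypothetical.

```python
import math
import random
from typing import Optional


def random_feature_subset(n_features: int, rng: random.Random, k: Optional[int] = None) -> list:
    """k distinct feature indices; default k = floor(sqrt(n_features))."""
    if k is None:
        k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


def forest_sampling_plan(n_rows: int, n_features: int, n_trees: int, seed: int = 0):
    """For each tree: (bootstrap row indices, random feature subset)."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # bootstrap, as in bagging
        plan.append((rows, random_feature_subset(n_features, rng)))
    return plan


for rows, features in forest_sampling_plan(n_rows=8, n_features=16, n_trees=3):
    print("rows:", rows, "features:", features)
```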
CLASS SWITCHING
[Figure: T copies of the original dataset, each with random class noise, p = 30%]
Can improve results for cases where normal decision trees are not especially good
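This sketch shows the noise-injection step of class switching. Each ensemble member receives its own copy of the training labels, in which every label is switched to a different class with probability p (30% on the slide). It uses the simplest uniform variant and omits any adjustment to keep class proportions stable. `switch_classes` is a hypothetical name.

```python
import random


def switch_classes(labels, classes, p=0.3, rng=None):
    """Class switching: each label is replaced, with probability p, by a different
    class chosen uniformly at random. Returns a new list; the input is unchanged."""
    if rng is None:
        rng = random.Random(0)
    noisy = []
    for y in labels:
        if rng.random() < p:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


labels = ["a"] * 10 + ["b"] * 10
# Each ensemble member gets its own independently switched copy of the labels.
noisy_copies = [switch_classes(labels, ["a", "b"], p=0.3, rng=random.Random(t)) for t in range(3)]
for copy in noisy_copies:
    print("".join(copy))
```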
Data Transformations and FE
Charles Parker (BigML)
• Human knowledge used to compensate for data problems: broken data (remove corner cases, defaults), missing values (may have meaning), reduce complexity (grouping classes), distances
• Discretization: significant bins instead of concrete values
• Delta: the difference or distance between features can be significant
• Standardization: mean of zero and standard deviation of one
• Normalizing: feature vectors with unit norm
• Windowing: previous points distributed in time
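The column and row transformations listed above are small functions. This sketch assumes equal-width bins for discretization and uses the population standard deviation for standardization; the function names are illustrative.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Z-scores: subtract the mean, divide by the (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def l2_normalize(vector):
    """Scale one feature vector so its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def equal_width_bins(column, n_bins):
    """Discretization: map each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in column]


def delta(a, b):
    """Delta feature: element-wise difference between two existing features."""
    return [x - y for x, y in zip(a, b)]


def lag_windows(series, size):
    """Windowing: pair each point with the `size` points that precede it in time."""
    return [(series[t - size:t], series[t]) for t in range(size, len(series))]


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(l2_normalize([3.0, 4.0]))               # [0.6, 0.8]
print(lag_windows([1, 2, 3, 4, 5], 2))        # [([1, 2], 3), ([2, 3], 4), ([3, 4], 5)]
```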
• Projections: combining features into a new feature basis (lowering dimensionality)
New axes: Principal Component Analysis
Keep neighbours: spectral embeddings, combination methods (Large Margin Nearest Neighbor, Xing's method)
• Sparsity: compressing sparse text and image data by sampling and grouping
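A new axis from Principal Component Analysis is an eigenvector of the covariance matrix. Without a numerics library, the first component can be approximated by power iteration, as in this sketch (names are illustrative). Projecting onto that axis reduces two correlated features to one.

```python
import math


def first_principal_component(rows, iterations=100):
    """First PCA axis (direction of maximum variance) by power iteration on the
    sample covariance matrix. Returns (unit vector, column means).

    Assumes the data are not all identical and that the all-ones start vector
    is not orthogonal to the top eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, means


def project(rows, component, means):
    """New 1-D feature: coordinate of each centred row along the component."""
    return [sum((x - m) * c for x, m, c in zip(r, means, component)) for r in rows]


rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
component, means = first_principal_component(rows)
print(component)  # points roughly along the line y ≈ 2x
print(project(rows, component, means))
```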
Unbalanced Datasets
Poul Petersen (BigML)
• Sub-sampling and over-sampling: restore balance by eliminating instances of over-represented categories or giving higher weight to under-represented categories
• Evaluating unbalanced datasets
Good accuracy is not enough: look at precision and recall
Precision vs. recall trade-off: you must define the cost of each error (letting out positives vs. letting in negatives)
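This sketch covers both halves of the slide. Random under-sampling or over-sampling equalizes class counts. Then, once a cost is assigned to each kind of error, the decision threshold on a model's positive-class score is chosen to minimise total cost. The function names and the cost values in the example are hypothetical.

```python
import random
from collections import defaultdict


def rebalance(rows, labels, mode="under", seed=0):
    """Random under-sampling (drop majority-class rows) or over-sampling
    (duplicate minority-class rows) until every class has the same count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    sizes = [len(members) for members in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)
        else:
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([y] * target)
    return out_rows, out_labels


def cheapest_threshold(scores, actual, cost_fn, cost_fp):
    """Threshold on a positive-class score minimising
    cost_fn * (positives let out) + cost_fp * (negatives let in)."""
    def cost(t):
        fn = sum(s < t and a for s, a in zip(scores, actual))
        fp = sum(s >= t and not a for s, a in zip(scores, actual))
        return cost_fn * fn + cost_fp * fp
    return min(sorted(set(scores)), key=cost)


scores = [0.1, 0.4, 0.35, 0.8]
is_fraud = [False, False, True, True]
print(cheapest_threshold(scores, is_fraud, cost_fn=10, cost_fp=1))  # 0.35
print(cheapest_threshold(scores, is_fraud, cost_fn=1, cost_fp=10))  # 0.8
```

Raising the cost of missed positives pushes the threshold down (higher recall), while raising the cost of false alarms pushes it up (higher precision).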
[Chart: number of instances per class, Fraud vs. Not Fraud]
• Automatic balancing: equal representation per class
• Weighting: which instances are more important. Adds new information to the dataset. Per class or per instance.
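One way to get "equal representation per class" through weighting rather than resampling is to give each class a weight inversely proportional to its frequency, w_c = n / (k · n_c), so that every class contributes the same total weight. This sketch assumes that formula. Per-instance weights can then be read off from each instance's class, or set individually from domain knowledge.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weights w_c = n / (k * n_c): each class gets the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


labels = ["not fraud"] * 9 + ["fraud"]
weights = balanced_class_weights(labels)
print(weights)  # {'not fraud': 0.5555555555555556, 'fraud': 5.0}
instance_weights = [weights[y] for y in labels]  # per-instance weights derived from the class
```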

Contenu connexe

Tendances

Ensemble methods for modeling financial data
Ensemble methods for modeling financial dataEnsemble methods for modeling financial data
Ensemble methods for modeling financial dataGaurav Chakravorty
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part) Marina Santini
 
VSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly DetectionVSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly DetectionBigML, Inc
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine LearningPranav Challa
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learningbutest
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learningBabu Priyavrat
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forestsMarc Garcia
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data萍華 楊
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Marina Santini
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsParinaz Ameri
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forestsViet-Trung TRAN
 
Understanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsUnderstanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsRupak Roy
 
Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsananth
 
Overview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboostOverview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboostTakami Sato
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 

Tendances (20)

Decision trees
Decision treesDecision trees
Decision trees
 
Ensemble methods for modeling financial data
Ensemble methods for modeling financial dataEnsemble methods for modeling financial data
Ensemble methods for modeling financial data
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)
 
VSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly DetectionVSSML16 L3. Clusters and Anomaly Detection
VSSML16 L3. Clusters and Anomaly Detection
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
Ppt shuai
Ppt shuaiPpt shuai
Ppt shuai
 
Understanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsUnderstanding the Machine Learning Algorithms
Understanding the Machine Learning Algorithms
 
Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
 
Overview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboostOverview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboost
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 

En vedette

En vedette (11)

LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
L9. Real World Machine Learning - Cooking Predictions
L9. Real World Machine Learning - Cooking PredictionsL9. Real World Machine Learning - Cooking Predictions
L9. Real World Machine Learning - Cooking Predictions
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
L14. Anomaly Detection
L14. Anomaly DetectionL14. Anomaly Detection
L14. Anomaly Detection
 
L6. Unbalanced Datasets
L6. Unbalanced DatasetsL6. Unbalanced Datasets
L6. Unbalanced Datasets
 
L7. A developers’ overview of the world of predictive APIs
L7. A developers’ overview of the world of predictive APIsL7. A developers’ overview of the world of predictive APIs
L7. A developers’ overview of the world of predictive APIs
 
L15. Machine Learning - Black Art
L15. Machine Learning - Black ArtL15. Machine Learning - Black Art
L15. Machine Learning - Black Art
 
L1. State of the Art in Machine Learning
L1. State of the Art in Machine LearningL1. State of the Art in Machine Learning
L1. State of the Art in Machine Learning
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
 

Similaire à LR1. Summary Day 1

Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning IntroductionDong Guo
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programmingSoumya Mukherjee
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.pptchatbot9
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Lukas Mandrake
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.pptcongtran88
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data PreparationUmair Shafique
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balanceAlex Henderson
 
Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ngsaranya12345
 
Preprocessing
PreprocessingPreprocessing
Preprocessingmmuthuraj
 
ppt slides
ppt slidesppt slides
ppt slidesbutest
 
Data preprocessing 2
Data preprocessing 2Data preprocessing 2
Data preprocessing 2extraganesh
 

Similaire à LR1. Summary Day 1 (20)

Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.ppt
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.ppt
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.ppt
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.ppt
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.ppt
 
Datapreprocessing
DatapreprocessingDatapreprocessing
Datapreprocessing
 
Decision tree
Decision treeDecision tree
Decision tree
 
DT.pptx
DT.pptxDT.pptx
DT.pptx
 
Data reduction
Data reductionData reduction
Data reduction
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data Preparation
 
To bag, or to boost? A question of balance
To bag, or to boost? A question of balanceTo bag, or to boost? A question of balance
To bag, or to boost? A question of balance
 
Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
Preprocessing
PreprocessingPreprocessing
Preprocessing
 
ppt slides
ppt slidesppt slides
ppt slides
 
Data preprocessing 2
Data preprocessing 2Data preprocessing 2
Data preprocessing 2
 

Dernier

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 

Dernier (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 

LR1. Summary Day 1

  • 3. State of the Art in ML • History • Machine Learning problems and Tasks ➔ Supervised Learning: Classi$cation, Regression, Multi-label classi$cation ➔ Unsupervised Learning: Clusters, Anomaly Detectors ➔ Semi-supervised Learning: Inference from partially labeled • Features: numeric, categorical, date-time, text text analysis: frequency-weighted bag of words Poul Petersen (BigML) Explicit rules Di1cult to $nd and re-train Explicit rules Di1cult to $nd and re-train Explicit rules Di1cult to $nd and re-train Implicit rules (data rules) Easy to re-train
  • 4. • Technology • Teaching computers to learn: too general vs. too speci$c (under-$tting vs. over-$tting) Missing values handling: new category, averages, mutiple choices State of the Art in ML Storage low prices, big data APIs Combination and accessibility Cloud Computational power Predictive APIs
  • 5. • Supervised learning: Classi$cation (output in a set of classes) Regression (output is a number) • Unsupervised learning: no output info • Training / Test separation: partioning data, boostrap or cross-validation • Classi$cation: Confusion Matrix  Evaluating ML Algorithms Cèsar Ferri (UPV)
  • 6. • Classi$cation metrics: Accuracy, Precision, Recall, F-measure Extending to multi-class problems (averaging) • Regression metrics: Mean Absolute error Mean Squared error (more sensitive to extreme errors) Root Mean Squared Error Normalized for classi$ers comparison: Relative Mean Squared Error Relative Mean Absolute error R2 • Unsupervised evaluation: no estimations, association rules, support • Clustering: distance and shape based evaluation (border, centers, distribution) Evaluating ML Algorithms Cèsar Ferri (UPV)
  • 7. Decision Trees (Gonzalo Martinez, UAM)
      • History
          [Diagram: from expert-based systems built on human experts' rules (MYCIN: 600 rules; XCON: 2500 rules) to automated knowledge acquisition by mining archives of cases (scalable): CHAID, CART, ID3, C4.5.]
      • Classification and Regression Trees: a structure in which the data is repeatedly split into groups according to attribute values, to minimize error / maximize information gain (split criterion: Gini impurity)
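Gini impurity is 1 − Σ p_c², where p_c is the fraction of examples in class c. A sketch of choosing a threshold on a single numeric attribute by minimizing the size-weighted Gini impurity of the two children (full CART implementations also search over all features and handle categorical attributes):

```python
from collections import Counter


def gini(labels):
    """1 - sum of squared class proportions; 0.0 for a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, labels):
    """Return (threshold, weighted_gini) for the split x <= threshold with lowest impurity."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split yet: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


print(gini(["a", "a", "b", "b"]))                          # 0.5
print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2.5, 0.0)
```

Applying the same search recursively to each child produces the repeated splitting described on the slide.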
  • 8. Decision Trees
      • PROs
          ➔ Convertible to rules
          ➔ Categorical and numeric attributes
          ➔ Handle uninformative or redundant attributes
          ➔ Handle missing values
          ➔ Non-parametric (no predefined idea of the concept to learn)
          ➔ Easy to tune (small number of parameters)
      • CONs
          ➔ Complex feature interactions
          ➔ Replication problem
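Because each path from the root to a leaf is a conjunction of split predicates, a tree converts directly into rules. A sketch over a hypothetical nested-dict tree format (not any particular library's model format):

```python
def tree_to_rules(node, conditions=()):
    """Yield (conditions, prediction) for every root-to-leaf path of a binary tree.

    Hypothetical node format: leaves are {"prediction": value}; internal nodes are
    {"feature": name, "threshold": t, "left": node, "right": node}, left meaning <= t.
    """
    if "prediction" in node:
        yield list(conditions), node["prediction"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from tree_to_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from tree_to_rules(node["right"], conditions + (f"{feature} > {threshold}",))


tree = {"feature": "age", "threshold": 30,
        "left": {"prediction": "yes"},
        "right": {"feature": "income", "threshold": 50000,
                  "left": {"prediction": "no"}, "right": {"prediction": "yes"}}}
for conds, pred in tree_to_rules(tree):
    print("IF " + " AND ".join(conds) + f" THEN {pred}")
# IF age <= 30 THEN yes
# IF age > 30 AND income <= 50000 THEN no
# IF age > 30 AND income > 50000 THEN yes
```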
  • 9. Decision Trees
      • Predicates: rules are based on the split predicates; missing values; oblique splits (comparing features)
      • Stopping criteria: all instances in one class, no split found, small number of instances, gain below threshold, maximum depth
      • Pruning, to avoid over-fitting
          ➔ CART: slower (more trees needed, avoids complexity)
          ➔ C4.5: faster, but no confidence threshold (avoids small nodes)
      • Parameters: number of nodes, depth, pruning (on/off and confidence), minimum number of instances to split
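The stopping criteria can be collected into one check made before splitting a node. The parameter names and default values below are hypothetical, chosen only for illustration:

```python
def should_stop(labels, depth, best_gain, *, max_depth=10, min_instances=2, min_gain=1e-3):
    """Return True if the node should become a leaf.

    best_gain is the gain of the best candidate split, or None if no split was found.
    max_depth, min_instances and min_gain are hypothetical defaults.
    """
    if len(set(labels)) == 1:         # all instances in one class
        return True
    if best_gain is None:             # no split found
        return True
    if len(labels) < min_instances:   # small number of instances
        return True
    if depth >= max_depth:            # maximum depth reached
        return True
    return best_gain < min_gain       # gain below threshold


print(should_stop(["yes", "no", "yes"], depth=3, best_gain=0.2))  # False
```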
  • 10. Ensembles of Decision Trees (Gonzalo Martinez, UAM)
      • Ensembles of models: randomizing the data, the features or the algorithms to decrease errors and over-fitting
      [Figure: each model in the ensemble classifies a new instance x; the individual class votes are combined into one prediction.]
      • Combined with voting or non-voting strategies (aggregators)
      • Best overall performance (SVM)
      • Almost parameter-less
      • On trees, very fast to train and test
      • Slower than a single classifier (mitigated with pruning)
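Two aggregators as a sketch: plain majority voting and, as one example of a non-voting strategy (an assumption; the slide does not name one), averaging each model's class probabilities:

```python
from collections import Counter


def majority_vote(predictions):
    """Most frequent class; ties go to the class seen first."""
    return Counter(predictions).most_common(1)[0][0]


def average_probabilities(prob_dicts):
    """Average per-class probabilities across models and return the top class."""
    classes = {c for d in prob_dicts for c in d}
    avg = {c: sum(d.get(c, 0.0) for d in prob_dicts) / len(prob_dicts) for c in classes}
    return max(avg, key=avg.get)


print(majority_vote([1, 1, 2, 1, 2]))                                       # 1
print(average_probabilities([{"a": 0.6, "b": 0.4}, {"a": 0.2, "b": 0.8}]))  # b
```

Averaging lets a confident model outweigh a hesitant one, which a simple vote cannot express.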
  • 11. Ensembles of Decision Trees: BAGGING
      • Robust
      • Improves error
      • Parallelizable
      [Figure: the original dataset is resampled into bootstrap samples 1 … T; in each sample some examples are repeated and others removed.]
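Bootstrap sampling, the core of bagging, sketched with the standard library; training one tree per sample is left out:

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement: some repeat, others are left out."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def bagging_samples(data, t, seed=0):
    """Build T bootstrap samples, one per ensemble member."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(t)]


data = list(range(10))
for sample in bagging_samples(data, t=3):
    left_out = sorted(set(data) - set(sample))  # the "removed" (out-of-bag) examples
    print(sorted(sample), left_out)
```

Since the samples are independent, the T trees can be trained in parallel, as the slide notes.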
  • 12. Ensembles of Decision Trees: BOOSTING
      [Figure: original dataset, iteration 1, iteration 2, …]
      • Good average generalization error
      • Not robust (noise)
      • Can increase the error of the base classifier
      • Not parallelizable
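The slide does not name a boosting algorithm. The sketch below shows one reweighting round in the style of discrete AdaBoost, assuming the weighted error lies strictly between 0 and 1. Misclassified examples gain weight, which is also why noisy labels hurt boosting, and each round depends on the previous one, so the rounds cannot run in parallel:

```python
import math


def adaboost_round(weights, correct):
    """One discrete-AdaBoost-style reweighting step.

    weights: current example weights; correct: whether the new base model got each example right.
    Returns (alpha, new_weights); alpha is the model's vote weight in the final ensemble.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # shrink weights of correctly classified examples, grow the misclassified ones
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return alpha, [w / total for w in new]


alpha, w = adaboost_round([0.25] * 4, [True, True, True, False])
print(round(alpha, 3), [round(x, 3) for x in w])  # 0.549 [0.167, 0.167, 0.167, 0.5]
```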
  • 13. Ensembles of Decision Trees: RANDOM FORESTS
      • Robust
      • Improves error
      • Parallelizable
      • Better than boosting
      • Very fast to train
      [Figure: bootstrap samples 1 … T drawn from the original dataset, with repeated and removed examples, each combined with a random feature subset.]
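What random forests add on top of bagging is a random subset of candidate features, typically drawn anew at each split. A sketch, assuming the common default of about √(number of features) candidates for classification:

```python
import math
import random


def random_feature_subset(features, rng, k=None):
    """Pick k distinct candidate features for one split (default: about sqrt(#features))."""
    if k is None:
        k = max(1, int(math.sqrt(len(features))))
    return rng.sample(features, k)


rng = random.Random(42)
features = [f"f{i}" for i in range(16)]
print(random_feature_subset(features, rng))  # 4 of the 16 features
```

Evaluating only a few features per split is one reason forests are fast to train.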
  • 14. Ensembles of Decision Trees: CLASS SWITCHING
      [Figure: copies 1 … T of the original dataset, each with random noise, p = 30%.]
      • Can improve results in cases where plain decision trees are not especially good
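A simplified sketch of class switching: each label is replaced, with probability p, by a different class drawn uniformly, and each tree is trained on its own noisy copy. This uniform switching rule is an assumption and may differ from the exact procedure presented in the talk:

```python
import random


def switch_classes(labels, classes, p, rng):
    """Copy of labels where each one is swapped, with probability p, for a different class."""
    switched = []
    for y in labels:
        if rng.random() < p:
            switched.append(rng.choice([c for c in classes if c != y]))
        else:
            switched.append(y)
    return switched


rng = random.Random(0)
labels = ["a", "b", "a", "a", "b"]
# one noisy copy per ensemble member (p = 0.3, as on the slide)
noisy_copies = [switch_classes(labels, ["a", "b"], p=0.3, rng=rng) for _ in range(3)]
print(noisy_copies)
```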
  • 15. Data Transformations and Feature Engineering (Charles Parker, BigML)
      • Human knowledge used to compensate for data problems: broken data (remove corner cases, defaults), missing values (which can have meaning), reducing complexity (grouping classes), distances
      • Discretization: significant bins instead of concrete values
      • Delta: the difference or distance between features can be significant
      • Standardization: mean of zero and standard deviation of one
      • Normalization: feature vectors with unit norm
      • Windowing: previous points distributed in time
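Standardization, normalization, deltas and windowing, sketched for plain Python lists. The population standard deviation is assumed, and each window holds the `size` previous points followed by the current one:

```python
import math


def standardize(xs):
    """Shift and scale to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def normalize(vec):
    """Scale a feature vector to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def deltas(series):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]


def windows(series, size):
    """Rows holding the `size` previous values followed by the current one."""
    return [series[i - size:i + 1] for i in range(size, len(series))]


print(normalize([3.0, 4.0]))     # [0.6, 0.8]
print(deltas([1, 4, 9, 16]))     # [3, 5, 7]
print(windows([1, 2, 3, 4], 2))  # [[1, 2, 3], [2, 3, 4]]
```

Standardization works per feature (column), while normalization works per instance (row).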
  • 16. Data Transformations and Feature Engineering
      • Projections: combining features into a new feature basis (lowering dimensionality)
          ➔ New axes: principal component analysis
          ➔ Keeping neighbours: spectral embeddings, combination methods (Large Margin Nearest Neighbor, Xing's method)
      • Sparsity: compressing sparse text and image data by sampling and grouping
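For two numeric features, the first principal axis has a closed form (the top eigenvector of the 2×2 covariance matrix), so PCA can be sketched without a linear-algebra library. This is restricted to 2-D by assumption; higher dimensions would call a library eigen-solver:

```python
import math


def first_principal_component(points):
    """Project 2-D points onto their first principal axis (1-D coordinates)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / n  # var(x)
    c = sum(y * y for _, y in centered) / n  # var(y)
    b = sum(x * y for x, y in centered) / n  # cov(x, y)
    # largest eigenvalue of the covariance matrix [[a, b], [b, c]]
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0:
        vx, vy = b, lam - a  # eigenvector for lam
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [x * vx + y * vy for x, y in centered]


pts = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.0)]
print(first_principal_component(pts))  # one coordinate per point along the main axis
```

Two correlated features collapse into one new feature that keeps most of their variance.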
  • 17. Unbalanced Datasets (Poul Petersen, BigML)
      • Sub-sampling and over-sampling: restore balance by removing instances of over-represented categories or by giving higher weight to under-represented ones
      • Evaluating unbalanced datasets: good accuracy is not enough; look at precision and recall
      • Precision vs. recall trade-off: you must define the cost of each kind of error (letting positives out vs. letting negatives in)
      [Chart: number of instances per class, Fraud vs. Not Fraud]
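Random sub-sampling of the larger classes down to the size of the smallest one, as a sketch; `label_of` is a hypothetical accessor that returns each row's class:

```python
import random
from collections import defaultdict


def undersample(rows, label_of, rng):
    """Randomly keep, from every class, as many rows as the smallest class has."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


rows = [("fraud", i) for i in range(10)] + [("not fraud", i) for i in range(90)]
balanced = undersample(rows, label_of=lambda r: r[0], rng=random.Random(0))
print(len(balanced))  # 20: 10 fraud rows + 10 not-fraud rows
```

The price is discarded data: 80 of the 90 majority rows are never seen by the model.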
  • 18. Unbalanced Datasets
      • Automatic balancing: equal representation per class
      • Weighting: which instances are more important; adds new information to the dataset; per class or per instance
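The slide does not say how automatic balancing computes its weights. One common scheme, assumed here, gives class c the weight n / (k · n_c), where n is the number of instances, k the number of classes and n_c the size of class c, so every class ends up with the same total weight:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight n / (k * n_c) per class, so each class sums to n / k."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


labels = ["fraud"] * 10 + ["not fraud"] * 90
weights = balanced_class_weights(labels)
print(weights["fraud"])  # 5.0
```

Per-instance weights work the same way, except that each row carries its own weight instead of inheriting its class's.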