Machine Learning
Data science for beginners, session 6
Machine Learning: your 5-7 things
Defining machine learning
The Scikit-Learn library
Machine learning algorithms
Choosing an algorithm
Measuring algorithm performance
Defining Machine Learning
Machine Learning = learning models from data
Which advert is the user most likely to click on?
Who’s most likely to win this election?
Which wells are most likely to fail in the next 6 months?
Machine Learning as Predictive Analytics...
Machine Learning Process
● Get data
● Select a model
● Select hyperparameters for that model
● Fit model to data
● Validate model (and change model, if necessary)
● Use the model to predict values for new data
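The steps above can be sketched end to end in a few lines. This is only an illustration under assumptions not stated on the slide: the Iris data, a k-nearest-neighbours classifier, an 80/20 split, and the made-up measurements in `new_flower`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Get data
iris = load_iris()
X, y = iris.data, iris.target

# Hold back part of the data so the model can be validated later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 2-3. Select a model and its hyperparameters (n_neighbors is a hyperparameter)
model = KNeighborsClassifier(n_neighbors=5)

# 4. Fit model to data
model.fit(X_train, y_train)

# 5. Validate model on data it has not seen
accuracy = model.score(X_test, y_test)
print("Held-out accuracy: {:.2f}".format(accuracy))

# 6. Use the model to predict values for new data
new_flower = [[5.0, 3.4, 1.5, 0.2]]  # made-up measurements in cm
predicted_name = iris.target_names[model.predict(new_flower)[0]]
print("Predicted variety: {}".format(predicted_name))
```

If the held-out accuracy is poor, step 5 sends you back to steps 2 and 3 to try a different model or different hyperparameters.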
Today’s library: Scikit-Learn (sklearn)
Scikit-Learn’s example datasets
● Iris
● Digits
● Diabetes
● Boston (removed from scikit-learn in version 1.2)
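A quick way to get a feel for these datasets is to load each one and look at its size. The sketch below covers Iris, Digits and Diabetes; Boston is left out because recent scikit-learn releases no longer ship `load_boston`. The `loaders` and `shapes` dictionaries are just names chosen for this example.

```python
from sklearn import datasets

# Map a readable name to each bundled loader function
loaders = {
    "iris": datasets.load_iris,
    "digits": datasets.load_digits,
    "diabetes": datasets.load_diabetes,
}

shapes = {}
for name, loader in loaders.items():
    bunch = loader()  # returns an object with .data (features) and .target
    shapes[name] = bunch.data.shape
    print("{}: {} samples, {} features".format(name, *bunch.data.shape))
```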
Select a Model
Algorithm Types
Supervised learning
Regression: learning numbers
Classification: learning classes
Unsupervised learning
Clustering: finding groups
Dimensionality reduction: finding efficient representations
Linear Regression: fit a line to (numerical) data
Linear Regression: First, get your data
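import numpy as np
import pandas as pd
# Seeded generator so the synthetic data is repeatable
gen = np.random.RandomState(42)
num_samples = 40
x = 10 * gen.rand(num_samples)
# True relationship: slope 3, intercept 7, plus Gaussian noise
y = 3 * x + 7 + gen.randn(num_samples)
# scikit-learn expects a 2-D feature table
X = pd.DataFrame(x)
# Jupyter notebook magic; omit outside a notebook
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(x, y)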
Linear Regression: Fit model to data
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)
model.fit(X, y)
print('Slope: {}, Intercept: {}'.format(model.coef_, model.intercept_))
Linear Regression: Check your model
Xtest = pd.DataFrame(np.linspace(-1, 11))
predicted = model.predict(Xtest)
plt.scatter(x, y)
plt.plot(Xtest, predicted)
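The plot gives a visual check; a numerical check is also possible because the data was generated with a known slope (3) and intercept (7). The sketch below recreates the same synthetic data without plotting and compares the fitted values to the true ones. Using `reshape` instead of a DataFrame, and reporting R^2 on the training data, are choices made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Recreate the synthetic data: y = 3x + 7 plus Gaussian noise
gen = np.random.RandomState(42)
num_samples = 40
x = 10 * gen.rand(num_samples)
y = 3 * x + 7 + gen.randn(num_samples)
X = x.reshape(-1, 1)  # sklearn expects a 2-D feature array

model = LinearRegression(fit_intercept=True).fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
r2 = model.score(X, y)  # R^2 on the training data
print("Slope: {:.2f} (true 3), intercept: {:.2f} (true 7), R^2: {:.3f}".format(
    slope, intercept, r2))
```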
Reality can be a little more like this…
Classification: Predict classes
● Well pump: [working, broken]
● CV: [accept, reject]
● Gender: [male, female, others]
● Iris variety: [iris setosa, iris virginica, iris versicolor]
Classification: The Iris Dataset
[Figure: labels “Petal” and “Sepal”]
Classification: first get your data
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
Y = iris.target
Classification: Split your data
ntest = 10  # number of samples held out for testing
np.random.seed(0)
indices = np.random.permutation(len(X))
iris_X_train = X[indices[:-ntest]]
iris_Y_train = Y[indices[:-ntest]]
iris_X_test = X[indices[-ntest:]]
iris_Y_test = Y[indices[-ntest:]]
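The manual shuffle-and-slice above can also be done with scikit-learn's `train_test_split` helper. This is an alternative sketch, not the split used in the slides: it holds out the same number of samples, but the rows chosen will generally differ from the `np.random.permutation` version.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# test_size=10 holds out 10 samples; random_state fixes the shuffle
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=10, random_state=0
)
print(X_train.shape, X_test.shape)
```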
Classifier: Fit Model to Data
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski')
knn.fit(iris_X_train, iris_Y_train)
Classifier: Check your model
predicted_classes = knn.predict(iris_X_test)
print('kNN predicted classes: {}'.format(predicted_classes))
print('Real classes: {}'.format(iris_Y_test))
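Comparing the two printed arrays by eye works for 10 samples, but a single number is easier to track. The sketch below repeats the split and the fit so that it runs on its own, then computes accuracy as the fraction of matching labels. The names `accuracy` and `n_wrong` are introduced for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# Same manual split as in the slides
ntest = 10
np.random.seed(0)
indices = np.random.permutation(len(X))
iris_X_train, iris_Y_train = X[indices[:-ntest]], Y[indices[:-ntest]]
iris_X_test, iris_Y_test = X[indices[-ntest:]], Y[indices[-ntest:]]

knn = KNeighborsClassifier(n_neighbors=5).fit(iris_X_train, iris_Y_train)
predicted_classes = knn.predict(iris_X_test)

# Fraction of test flowers whose predicted class matches the real class
accuracy = np.mean(predicted_classes == iris_Y_test)
n_wrong = int(np.sum(predicted_classes != iris_Y_test))
print("Accuracy: {:.2f} ({} of {} wrong)".format(accuracy, n_wrong, ntest))
```

`knn.score(iris_X_test, iris_Y_test)` returns the same accuracy in one call.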
Clustering: Find groups in your data
Clustering: get your data
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
Y = iris.target
print("Xs: {}".format(X))
Clustering: Fit model to data
from sklearn import cluster
k_means = cluster.KMeans(n_clusters=3)
k_means.fit(iris.data)
Clustering: Check your model
print("Generated labels: \n{}".format(k_means.labels_))
print("Real labels: \n{}".format(Y))
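When comparing the two label arrays, keep in mind that cluster ids are arbitrary: cluster 0 need not correspond to class 0. One way to compare the groupings regardless of naming is the adjusted Rand index, sketched below. The choice of this metric, and the explicit `n_init` and `random_state` values, are assumptions made for this example.

```python
from sklearn import cluster, datasets
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()

# n_init and random_state are set explicitly so the run is repeatable
k_means = cluster.KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(iris.data)

# Score is 1.0 for identical groupings and close to 0 for random ones,
# no matter which id each cluster was given
ari = adjusted_rand_score(iris.target, k_means.labels_)
print("Adjusted Rand index: {:.2f}".format(ari))
```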
Dimensionality Reduction
Dimensionality reduction: Get your data
Dimensionality reduction: Fit model to data
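The code for these two slides did not survive in this transcript. As a stand-in, here is a minimal sketch that assumes principal component analysis (PCA) on the Iris data; the original slides may have used a different method or dataset.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

# Get your data: 4 measurements per flower
iris = datasets.load_iris()
X = iris.data

# Fit model to data: project onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("Reduced shape: {}".format(X_2d.shape))
print("Variance kept per component: {}".format(pca.explained_variance_ratio_))
```

The two columns of `X_2d` can be passed straight to `plt.scatter` to look at the data in 2-D.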
Recap: Choosing an Algorithm
Have: data and expected outputs
Want numbers? Try regression algorithms
Want classes? Try classification algorithms
Have: just data
Want to find structure? Try clustering algorithms
Want to look at it? Try dimensionality reduction
Model Validation
How well does the model fit new data?
“Holdout sets”:
split your data into training and test sets
learn your model with the training set
get a validation score for your test set
Models are rarely perfect… you might have to change parameters or model
● underfitting: model not complex enough to fit the training data
● overfitting: model too complex; it fits the training data well but does badly on the test data
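A holdout set makes both problems visible: compare the score on the training set with the score on the test set as model complexity changes. The sketch below assumes a k-nearest-neighbours classifier on Iris with a 70/30 split, where a small `n_neighbors` gives a very flexible model and a very large one gives a very rigid model. The values 1, 5 and 100 are picked for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

scores = {}
for k in (1, 5, 100):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Compare accuracy on data the model has seen with data it has not
    scores[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))
    print("k={:>3}: train={:.2f}, test={:.2f}".format(k, *scores[k]))
```

A training score far above the test score suggests overfitting; low scores on both suggest underfitting. On a dataset as easy as Iris the overfitting gap may be small.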
Overfitting and underfitting
The Confusion Matrix
True positive
False positive
False negative
True negative
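The four cells can be counted with scikit-learn's `confusion_matrix`. The labels below are made up for illustration, with 1 meaning "true" (positive) and 0 meaning "false" (negative).

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# For 0/1 labels scikit-learn orders the matrix as [[tn, fp], [fn, tp]]:
# rows are the real classes, columns are the predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("tp={}, fp={}, fn={}, tn={}".format(tp, fp, fn, tn))
```

Note that scikit-learn's row and column order may differ from the layout drawn on a slide, so check which axis is "real" and which is "predicted" before reading off the cells.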
Test Metrics
Precision:
of all the results the classifier marked as “true”, how many were actually “true”?
Precision = tp / (tp + fp)
Recall:
how many of the things that were really “true” were marked as “true” by the classifier?
Recall = tp / (tp + fn)
F1 score:
harmonic mean of precision and recall
F1_score = 2 * precision * recall / (precision + recall)
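The three formulas can be worked through by hand on a small example and cross-checked against scikit-learn. The labels below are made up for illustration (1 = "true", 0 = "false").

```python
from sklearn import metrics

# Made-up binary labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Count the confusion-matrix cells directly
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

# Apply the formulas from the slide
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall)
print("precision={:.3f}, recall={:.3f}, F1={:.3f}".format(precision, recall, f1_score))

# Cross-check against scikit-learn's implementations
sk_precision = metrics.precision_score(y_true, y_pred)
sk_recall = metrics.recall_score(y_true, y_pred)
sk_f1 = metrics.f1_score(y_true, y_pred)
```

Here tp = 2, fp = 1 and fn = 2, so precision is 2/3, recall is 1/2 and F1 is 4/7.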
Iris classification: metrics
from sklearn import metrics
print(metrics.classification_report(iris_Y_test, predicted_classes))
Exercises
Explore some algorithms
Notebooks 6.x contain examples of machine learning algorithms. Run them,
play with the numbers in them, break them, think about why they might have
broken.

Session 06 machine learning.pptx

  • 1. Machine Learning Data science for beginners, session 6
  • 2. Machine Learning: your 5-7 things Defining machine learning The Scikit-Learn library Machine learning algorithms Choosing an algorithm Measuring algorithm performance
  • 4. Machine Learning = learning models from data Which advert is the user most likely to click on? Who’s most likely to win this election? Which wells are most likely to fail in the next 6 months?
  • 5. Machine Learning as Predictive Analytics...
  • 6. Machine Learning Process ● Get data ● Select a model ● Select hyperparameters for that model ● Fit model to data ● Validate model (and change model, if necessary) ● Use the model to predict values for new data
  • 8. Scikit-Learn’s example datasets ● Iris ● Digits ● Diabetes ● Boston
  • 10. Algorithm Types Supervised learning Regression: learning numbers Classification: learning classes Unsupervised learning Clustering: finding groups Dimensionality reduction: finding efficient representations
  • 11. Linear Regression: fit a line to (numerical) data
  • 12. Linear Regression: First, get your data
      import numpy as np
      import pandas as pd
      gen = np.random.RandomState(42)
      num_samples = 40
      x = 10 * gen.rand(num_samples)
      y = 3 * x + 7 + gen.randn(num_samples)
      X = pd.DataFrame(x)
      %matplotlib inline
      import matplotlib.pyplot as plt
      plt.scatter(x, y)
  • 13. Linear Regression: Fit model to data
      from sklearn.linear_model import LinearRegression
      model = LinearRegression(fit_intercept=True)
      model.fit(X, y)
      print('Slope: {}, Intercept: {}'.format(model.coef_, model.intercept_))
  • 14. Linear Regression: Check your model
      Xtest = pd.DataFrame(np.linspace(-1, 11))
      predicted = model.predict(Xtest)
      plt.scatter(x, y)
      plt.plot(Xtest, predicted)
  • 15. Reality can be a little more like this…
  • 16. Classification: Predict classes ● Well pump: [working, broken] ● CV: [accept, reject] ● Gender: [male, female, others] ● Iris variety: [iris setosa, iris virginica, iris versicolor]
  • 17. Classification: The Iris Dataset (figure: iris flower with the petal and sepal labelled)
  • 18. Classification: first get your data
      import numpy as np
      from sklearn import datasets
      iris = datasets.load_iris()
      X = iris.data
      Y = iris.target
  • 19. Classification: Split your data
      ntest = 10
      np.random.seed(0)
      indices = np.random.permutation(len(X))
      iris_X_train = X[indices[:-ntest]]
      iris_Y_train = Y[indices[:-ntest]]
      iris_X_test = X[indices[-ntest:]]
      iris_Y_test = Y[indices[-ntest:]]
  • 20. Classifier: Fit Model to Data
      from sklearn.neighbors import KNeighborsClassifier
      knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski')
      knn.fit(iris_X_train, iris_Y_train)
  • 21. Classifier: Check your model
      predicted_classes = knn.predict(iris_X_test)
      print('kNN predicted classes: {}'.format(predicted_classes))
      print('Real classes: {}'.format(iris_Y_test))
  • 22. Clustering: Find groups in your data
  • 23. Clustering: get your data
      from sklearn import datasets
      iris = datasets.load_iris()
      X = iris.data
      Y = iris.target
      print("Xs: {}".format(X))
  • 24. Clustering: Fit model to data
      from sklearn import cluster
      k_means = cluster.KMeans(n_clusters=3)
      k_means.fit(iris.data)
  • 25. Clustering: Check your model
      print("Generated labels: \n{}".format(k_means.labels_))
      print("Real labels: \n{}".format(Y))
  • 29. Recap: Choosing an Algorithm ● Have: data and expected outputs. Want numbers? Try regression algorithms. Want classes? Try classification algorithms. ● Have: just data. Want to find structure? Try clustering algorithms. Want to look at it? Try dimensionality reduction.
  • 31. How well does the model fit new data? “Holdout sets”: split your data into training and test sets; learn your model with the training set; get a validation score for your test set. Models are rarely perfect… you might have to change parameters or model ● underfitting: model not complex enough to fit the training data ● overfitting: model too complex: fits the training data well, does badly on test data
  • 33. The Confusion Matrix
      True positive  | False positive
      False negative | True negative
  • 34. Test Metrics ● Precision: of all the results the classifier marked as “true”, how many were actually “true”? Precision = tp / (tp + fp) ● Recall: how many of the things that were really “true” were marked as “true” by the classifier? Recall = tp / (tp + fn) ● F1 score: harmonic mean of precision and recall. F1_score = 2 * precision * recall / (precision + recall)
  • 35. Iris classification: metrics
      from sklearn import metrics
      print(metrics.classification_report(iris_Y_test, predicted_classes))
  • 37. Explore some algorithms Notebooks 6.x contain examples of machine learning algorithms. Run them, play with the numbers in them, break them, think about why they might have broken.
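
The confusion matrix on slide 33 and the formulas on slide 34 can be checked by hand. The following is a minimal sketch in plain Python, not the deck's own code: the function names `confusion_counts` and `precision_recall_f1` are hypothetical, the labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than something the slides specify.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # marked "true" and really "true"
        elif predicted == positive:
            fp += 1  # marked "true" but really "false"
        elif actual == positive:
            fn += 1  # marked "false" but really "true"
        else:
            tn += 1  # marked "false" and really "false"
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Apply the slide 34 formulas; return 0.0 where a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels for illustration: 1 is the "true" class, 0 is everything else.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print("tp={} fp={} fn={} tn={}".format(tp, fp, fn, tn))
print("precision={:.2f} recall={:.2f} f1={:.2f}".format(precision, recall, f1))
```

For a three-class problem such as Iris, the same counts can be taken once per class by passing each class label in turn as `positive`, which is roughly what a per-class report shows.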

Editor's notes

  1. What you’re learning isn’t the data, but a model that will help you understand (and possibly also explain) it.
  2. We bother making models because we want to start asking questions, and (hopefully) making changes in our world. Image from http://www.rosebt.com/blog/descriptive-diagnostic-predictive-prescriptive-analytics
  3. AKA import-instantiate-fit-predict. Hyperparameter: things like “how many clusters of data do I think there are in this dataset?”
  4. Lots of great tutorials on http://scikit-learn.org/stable/ You import from this library, which is called “sklearn” in Python code.
  5. Iris image from Nociveglia https://www.flickr.com/photos/40385177@N07/.
  6. Supervised versus unsupervised learning: supervised = give the algorithm both input data and the answers for that data (kinda like teaching), and it learns the connection between data and answers; unsupervised = give the algorithm just the data, and it finds the structure in that data. Semi-supervised learning (where you only have a few answers) does exist, but isn’t talked about much. There’s also reinforcement learning, where you know if a result is better or worse, but not how much it’s better or worse.
  7. Fit a line to a set of datapoints. Use that line to predict new values
  8. This will give you 40 random samples around the line y = 3x + 7. gen.rand draws from a uniform distribution on [0, 1); gen.randn draws from a standard normal distribution (mean 0, variance 1).
  9. Note the hyperparameter (fit_intercept). Setting it to True tells the model to fit an intercept as well as a slope, so the fitted line doesn’t have to pass through (0,0).
  10. predicted_slope = model.coef_
      predicted_intercept = model.intercept_
  11. 1-feature linear regression on the Diabetes dataset. This is where you need to change your model. In this case, you’d start by trying more features, then adapting the model hyperparameters (e.g. it might not be a straight line that you need to fit) or the model that you use (e.g. linear regression might not be the best model type to use on this dataset).
  12. When there are just two classifications, it’s called binary classification.
  13. Classification: finding the link between data and classes. This is the Iris dataset. It’s one of Scikit-learn’s example datasets.
  14. print("Targets: {}".format(iris['target_names']))
      print("Target data: {}".format(Y))
      print("Features: {}".format(iris['feature_names']))
      print("Feature data: {}".format(X))
  15. Why do we split into training and test sets? This is called a “holdout” set… we save some of our data, so we can use it to check how well our classifier does on data it hasn’t seen before.
      print('{} training points, {} test points'.format(len(iris_X_train), len(iris_X_test)))
  16. This is the k nearest neighbours algorithm. For every new datapoint, it looks at the k nearest datapoints it has classifications for, and assigns the new datapoint the class that’s most common amongst them. Here, we’re using 5 neighbours. We’re also using the Minkowski distance (https://machinelearning1.wordpress.com/2013/03/25/three-famous-metrics-manhattan-euclidean-minkowski/): this tells the algorithm how to compute the distance between two points, so we can define which points are ‘closest’. Common distance metrics you’ll see in machine learning include: Manhattan, or “city block” distance: add the distance along the x axis to the distance along the y axis (“city block” because that’s how you navigate in Manhattan); Euclidean distance: calculate the straight-line distance between the two points (e.g. sqrt(dx^2 + dy^2), where dx and dy are the differences along each axis); Minkowski distance: a generalisation of both, with an exponent p, where p = 1 gives Manhattan distance and p = 2 gives Euclidean distance. Scikit-learn’s KNeighborsClassifier defaults to p = 2, so the classifier here is using Euclidean distance.
  17. This is the digits example dataset.
  18. This is all in notebook 6.5
  19. There’s no “best” algorithm for every problem. This is also known as the “no free lunch” theorem. If you have data and an estimate of better/worse: try reinforcement learning. There are lots of variants on these algorithms: the Scikit-learn cheat sheet will help you choose between them: http://scikit-learn.org/stable/tutorial/machine_learning_map/
  20. Overfitting: matches the training data well, performs badly on new data… has high variance. Underfitting: doesn’t match the training data well, and usually does no better on new data… has high bias. Bias/variance tradeoff: adjust your hyperparameters until the model performs well on the test data. See e.g. http://scott.fortmann-roe.com/docs/BiasVariance.html
  21. This is all about your parameters, e.g. the difference between fitting a straight line, a quadratic curve or an nth-degree polynomial. Figures from Jake VanderPlas’ Python Data Science Handbook. We’ll talk about the bias-variance tradeoff later.
  22. False positive is also known as a “type 1 error”; false negative is also known as a “type 2 error”.
  23. These numbers are always between 0 and 1. If you want to play with F1, try it in Python, e.g.:
      import numpy as np
      p = np.array([.25, .25, .125, .5, .75])
      r = np.array([.001, .10, .7, .9, .3])
      2*p*r / (p + r)
  24. Support: how many things that are actually this class did we use to calculate these metrics? Precision: of all the results the classifier marked as “true”, how many were actually “true”? Recall: how many of the things that were really “true” were marked as “true” by the classifier? F1: harmonic mean of precision and recall.
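
The distance metrics described in note 16 can be made concrete with a small example. This is a sketch in plain Python under stated assumptions, not scikit-learn's implementation: `minkowski_distance` is a hypothetical helper name and the two points are made up. It computes the Minkowski distance with exponent p, which reduces to Manhattan distance at p = 1 and Euclidean distance at p = 2.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance: (sum over features of |a_i - b_i|^p)^(1/p), for p >= 1."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Two made-up points with two features each.
point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

manhattan = minkowski_distance(point_a, point_b, p=1)  # |3| + |4|
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(3^2 + 4^2)
print("Manhattan: {}, Euclidean: {}".format(manhattan, euclidean))
```

In scikit-learn, `KNeighborsClassifier` takes the exponent as its `p` parameter, which defaults to 2, so `metric='minkowski'` on its own means Euclidean distance.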