@SrcMinistry @MariuszGil
Machine Learning for Developers
Data processing
My story
Data classification
Bot detection
Minimize risk of error
Predictions
Click probability
Maximize CTR or eCPM
A lot of data
data + algo = result
Real problem
+ value estimator
+ chance of sale
+ $ optimization
Tens of thousands of historical transactions
Tens of data components
Hundreds of data components
HOW?
Machine Learning
Theory
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." (Tom M. Mitchell)
Task
Typical ML techniques
Classification
Regression
Clustering
Dimensionality reduction
Association learning
[Figure: three scatter plots of sample points, axes feature 1 and feature 2]
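Clustering is one of the techniques listed above, and it can be tried with base R alone. The sketch below is minimal and rests on a few assumptions. It uses the built-in iris data (the same data as Demo #1 below) and the stats function kmeans(). The choice of three clusters is ours, and the species labels are withheld from the algorithm.

```r
# k-means clustering of the four iris measurements; species labels are NOT used
data(iris)
set.seed(42)                          # kmeans() picks random starting centres
features <- iris[, 1:4]
km <- kmeans(features, centers = 3, nstart = 25)  # keep the best of 25 random starts

# Cross-tabulate the discovered clusters against the held-back species labels
print(table(cluster = km$cluster, species = iris$Species))
print(km$size)
```

Setosa typically lands in a cluster of its own, while versicolor and virginica overlap. The cluster numbers themselves are arbitrary.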
Experience
Typical ML paradigms
Supervised learning
Unsupervised learning
Reinforcement learning
Accuracy
[Figure: Venn diagram of Hacking skills, Math & Statistics Knowledge and Substantive Expertise, with regions labeled Machine Learning, Traditional Research, Danger Zone! and Data Science]
[Figure: second version of the same diagram, with regions labeled Evil Outside Committee Member; Not that dangerous, in retrospect; Machine Learning; James Bond Villain; NSA; Data Science; That Guy Who Stole Your Online Identity; Thesis Advisor; Grad School Mate]
Machine Learning
Practice
data + algo = result
+-------+--------+------+---------+---------+-------+
| brand | model  | year | mileage | service | price |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 | 123000  | 9900    | 67000 |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 | 175000  | 9900    | 30000 |
+-------+--------+------+---------+---------+-------+
| ford  | focus  | 2010 | 45000   | 6700    | 30000 |
+-------+--------+------+---------+---------+-------+
…
[Diagram: Learning data → Learning algorithm → Classifier model; Real data → Classifier model → Classification]
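The diagram describes two phases. In the first, a learning algorithm turns labelled learning data into a classifier model. In the second, that model is applied to real data. The base-R sketch below shows both phases. It uses a nearest-centroid classifier as a deliberately simple stand-in for the learning algorithm, because the talk does not name one here. The helpers train_nearest_centroid() and classify() are hypothetical, and so are the two "real" flowers.

```r
# Learning data: labelled examples (the built-in iris measurements)
data(iris)
learning_data <- iris

# Learning algorithm (hypothetical helper): nearest centroid,
# i.e. store the mean feature vector of each class
train_nearest_centroid <- function(x, y) {
  centroids <- aggregate(x, by = list(class = y), FUN = mean)
  list(classes = as.character(centroids$class),
       centres = as.matrix(centroids[, -1]))
}

# Classifier model applied to new rows: pick the class with the closest centre
classify <- function(model, newdata) {
  newdata <- as.matrix(newdata)
  # squared Euclidean distance from every row to every class centre (rows x classes)
  d <- matrix(sapply(seq_len(nrow(model$centres)), function(k)
         rowSums(sweep(newdata, 2, model$centres[k, ])^2)),
       nrow = nrow(newdata))
  model$classes[apply(d, 1, which.min)]
}

model <- train_nearest_centroid(learning_data[, 1:4], learning_data$Species)

# Real data: two hypothetical, unlabelled flowers
real_data <- data.frame(Sepal.Length = c(5.0, 6.7), Sepal.Width  = c(3.4, 3.0),
                        Petal.Length = c(1.5, 5.6), Petal.Width  = c(0.2, 2.1))
print(classify(model, real_data))
```

Swapping in a stronger learning algorithm changes only the first function. The split between "learn from labelled data" and "apply to new data" stays the same.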
Failure recipe
+-------+--------+------+---------+---------+-------+
| brand | model  | year | mileage | service | price |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 | 123000  | 9900    | 67000 |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 | 175000  | 9900    | 30000 |
+-------+--------+------+---------+---------+-------+
| ford  | focus  | 2010 | 45000   | 6700    | 30000 |
+-------+--------+------+---------+---------+-------+
…
+-------+--------+------+---------+---------+--------+-------+
| brand | model  | year | mileage | service | repair | price |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 123000  | 9000    | 900    | 67000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 175000  | 900     | 9000   | 30000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | focus  | 2010 | 45000   | 3700    | 3000   | 30000 |
+-------+--------+------+---------+---------+--------+-------+
…
+-------+--------+------+---------+---------+--------+-------+
| brand | model  | year | mileage | service | repair | price |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 123000  | 9000    | 900    | 67000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 175000  | 900     | 9000   | 30000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 175000  | 900     | 9000   | 45000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | focus  | 2010 | 45000   | 3700    | 3000   | 30000 |
+-------+--------+------+---------+---------+--------+-------+
…
+-------+--------+-----+------+---------+---------+--------+-------+
| brand | model  | gen | year | mileage | service | repair | price |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo | 4   | 2005 | 123000  | 9000    | 900    | 67000 |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo | 3   | 2005 | 175000  | 900     | 9000   | 30000 |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo | 4   | 2005 | 175000  | 900     | 9000   | 45000 |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | focus  | 4   | 2010 | 45000   | 3700    | 3000   | 30000 |
+-------+--------+-----+------+---------+---------+--------+-------+
…
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| brand | model  | gen | year | mileage | service | repair | igla | crying German | price |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo | 4   | 2005 | 123000  | 9000    | 900    | 0    | 0             | 67000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo | 3   | 2005 | 175000  | 900     | 9000   | 1    | 1             | 30000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo | 4   | 2005 | 175000  | 900     | 9000   | 0    | 0             | 45000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | focus  | 4   | 2010 | 45000   | 3700    | 3000   | 1    | 0             | 30000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
…
(igla: Polish used-car slang, literally "needle", for a car in mint condition; crying German: the seller's cliché that the previous German owner cried when selling the car)
Understand your data first
Exploratory analysis
http://blogs.adobe.com/digitalmarketing/wp-content/uploads/2013/08/aq2.jpg
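A quick exploratory pass would have caught a problem in the failure-recipe tables above: two Mondeos with identical features but different prices. The base-R sketch below assumes the data are already in a data frame. It contains only the four rows shown on the slides, typed in by hand. The helper find_conflicts() is hypothetical, not a library function.

```r
# The four rows shown in the "failure recipe" tables, typed in by hand;
# the slides' "…" rows are unknown and deliberately not invented here
listings <- data.frame(
  brand   = "ford",
  model   = c("mondeo", "mondeo", "mondeo", "focus"),
  gen     = c(4, 3, 4, 4),
  year    = c(2005, 2005, 2005, 2010),
  mileage = c(123000, 175000, 175000, 45000),
  service = c(9000, 900, 900, 3700),
  repair  = c(900, 9000, 9000, 3000),
  price   = c(67000, 30000, 45000, 30000)
)

# Quick profile: column types, ranges and missing values
str(listings)
print(summary(listings))
print(colSums(is.na(listings)))

# Hypothetical helper: rows whose features are identical but whose target differs.
# No model can tell such rows apart, which hints at a missing feature.
find_conflicts <- function(df, features, target = "price") {
  key <- do.call(paste, c(df[features], sep = "|"))
  n_targets <- ave(df[[target]], key, FUN = function(p) length(unique(p)))
  df[n_targets > 1, ]
}

without_gen <- c("brand", "model", "year", "mileage", "service", "repair")
print(find_conflicts(listings, without_gen))            # the two 175000 km Mondeos
print(find_conflicts(listings, c(without_gen, "gen")))  # no conflict once gen is added
```

Without the gen column, the two 175000 km Mondeos collide. Adding gen separates them, which mirrors how the tables above grow one column at a time.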
ML pipeline
[Diagram: Raw data collection → Pre-processing (missing data, feature extraction) → Sampling (data split into a training dataset and a test dataset) → Pre-processing (feature selection, feature scaling, dimensionality reduction) → Algorithm training (model selection, cross-validation, performance metrics, optimization) → Post-processing → Final model evaluation on the test dataset → Final model → Classification of data]
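The pipeline stages map onto a few dozen lines of base R. This minimal sketch makes several assumptions of its own:

- the iris data,
- a 70/30 split,
- z-score scaling,
- a hand-written k-nearest-neighbour classifier (the diagram does not prescribe an algorithm; knn_predict() is a hypothetical helper),
- a small grid of candidate k values.

The key design point is that the scaling statistics and the choice of k come only from the training dataset. The test dataset is touched once, at the end.

```r
data(iris)
set.seed(1)
features <- 1:4
ks <- c(1, 3, 5, 7, 9)                       # candidate values of k (assumption)

# Sampling: split into training (70%) and test (30%) datasets
idx   <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Pre-processing: feature scaling with statistics from the training set only
mu <- colMeans(train[, features])
s  <- apply(train[, features], 2, sd)
scale_with <- function(df) scale(df[, features], center = mu, scale = s)
x_train <- scale_with(train); y_train <- train$Species
x_test  <- scale_with(test);  y_test  <- test$Species

# Learning algorithm (hypothetical helper): k-nearest neighbours, majority vote;
# ties go to the first class level
knn_predict <- function(x_tr, y_tr, x_new, k) {
  apply(x_new, 1, function(row) {
    d <- colSums((t(x_tr) - row)^2)          # squared distances to all training rows
    votes <- table(y_tr[order(d)[1:k]])
    names(votes)[which.max(votes)]
  })
}

# Model selection: 5-fold cross-validation on the training set only
# (for brevity the scaling above is fitted once, not re-fitted per fold)
folds <- sample(rep(1:5, length.out = nrow(x_train)))
cv_error <- sapply(ks, function(k) {
  mean(sapply(1:5, function(f) {
    pred <- knn_predict(x_train[folds != f, ], y_train[folds != f],
                        x_train[folds == f, , drop = FALSE], k)
    mean(pred != y_train[folds == f])
  }))
})
best_k <- ks[which.min(cv_error)]

# Final model evaluation: the test dataset is used exactly once
test_pred  <- knn_predict(x_train, y_train, x_test, best_k)
test_error <- mean(test_pred != y_test)
print(data.frame(k = ks, cv_error = cv_error))
cat("chosen k:", best_k, "test error:", round(test_error, 3), "\n")
```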
Classification algorithms
Linear Classification
Logistic Regression
Linear Discriminant Analysis
PLS Discriminant Analysis
Non-Linear Classification
Mixture Discriminant Analysis
Quadratic Discriminant Analysis
Regularized Discriminant Analysis
Neural Networks
Flexible Discriminant Analysis
Support Vector Machines
k-Nearest Neighbor
Naive Bayes
Decision Trees for Classification
Classification and Regression Trees
C4.5
PART
Bagging CART
Random Forest
Gradient Boosting Machines
Boosted C5.0
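Logistic regression is the first entry under linear classification, and base R provides it through glm(). The minimal sketch below assumes a binary version of the iris task, chosen for illustration: versicolor vs virginica, using petal measurements. The accuracy it prints is measured on the training data, so it is optimistic.

```r
data(iris)
# Binary task chosen for illustration: versicolor vs virginica on petal size
two <- droplevels(subset(iris, Species != "setosa"))
two$is_virginica <- as.integer(two$Species == "virginica")

# Logistic regression: linear decision boundary, fitted by maximum likelihood
fit <- glm(is_virginica ~ Petal.Length + Petal.Width, data = two, family = binomial)
print(coef(fit))

# Predicted probabilities, thresholded at 0.5 to obtain class labels
prob <- predict(fit, type = "response")
pred <- ifelse(prob > 0.5, "virginica", "versicolor")
print(table(predicted = pred, actual = two$Species))
accuracy <- mean(pred == two$Species)   # measured on the training data: optimistic
print(accuracy)
```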
Regression algorithms
Linear Regression
Ordinary Least Squares Regression
Stepwise Linear Regression
Principal Component Regression
Partial Least Squares Regression
Non-Linear Regression /
Penalized Regression
Ridge Regression
Least Absolute Shrinkage and Selection Operator (LASSO)
ElasticNet
Multivariate Adaptive Regression Splines
Support Vector Machines
k-Nearest Neighbor
Neural Network
Decision Trees for Regression
Classification and Regression Trees
Conditional Decision Tree
Rule System
Bagging CART
Random Forest
Gradient Boosted Machine
Cubist
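Two entries from the list, ordinary least squares and ridge regression, differ only by a penalty term. A few lines of matrix algebra make the difference visible. The minimal sketch below assumes the built-in mtcars data and an arbitrary penalty, lambda = 10. The helper ridge() is hypothetical and implements the textbook closed form; it is not a package function.

```r
data(mtcars)
# Predict fuel consumption (mpg) from three correlated car features
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg

# Standardise predictors and centre the response so no intercept is needed
xs <- scale(x)
yc <- y - mean(y)

# Ridge regression closed form: beta = (X'X + lambda * I)^-1 X'y
ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge(xs, yc, lambda = 0)    # lambda = 0 reduces to ordinary least squares
beta_ridge <- ridge(xs, yc, lambda = 10)   # penalised: coefficients shrink towards zero
print(rbind(ols = beta_ols, ridge = beta_ridge))
```

Setting lambda to zero recovers the lm() slopes. Any positive lambda trades a little bias for lower variance when predictors are correlated.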
The algorithm is only one element in the ML chain
Demo #1
> data(iris)
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> tail(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
> plot(iris[,1:4])
> library(mclust)
> class = iris$Species
> mod2 = MclustDA(iris[,1:4], class, modelType = "EDDA")
> table(class)
class
setosa versicolor virginica
50 50 50
> summary(mod2)
------------------------------------------------
Gaussian finite mixture model for classification
------------------------------------------------
EDDA model summary:
log.likelihood n df BIC
-187.7097 150 36 -555.8024
Classes n Model G
setosa 50 VEV 1
versicolor 50 VEV 1
virginica 50 VEV 1
Training classification summary:
Predicted
Class setosa versicolor virginica
setosa 50 0 0
versicolor 0 47 3
virginica 0 0 50
Training error = 0.02
> plot(mod2, what = "scatterplot")
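The training error reported by summary(mod2) follows directly from the confusion matrix: three versicolor flowers were predicted as virginica, out of 150 in total. The worked check below uses base R and only the numbers printed above. This is error on the training data, so an estimate for unseen flowers needs a held-out test set.

```r
# Confusion matrix copied from the summary(mod2) output above
# (rows = true class, columns = predicted class)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(50,  0,  0,
                0, 47,  3,
                0,  0, 50),
             nrow = 3, byrow = TRUE, dimnames = list(Class = lv, Predicted = lv))

training_error   <- 1 - sum(diag(cm)) / sum(cm)  # misclassified / total = 3 / 150
per_class_recall <- diag(cm) / rowSums(cm)       # share of each true class recovered
print(training_error)
print(round(per_class_recall, 3))
```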
Demo #2
> head(titanic.raw)
Class Sex Age Survived
1 3rd Male Child No
2 3rd Male Child No
3 3rd Male Child No
4 3rd Male Child No
5 3rd Male Child No
6 3rd Male Child No
> tail(titanic.raw)
Class Sex Age Survived
2196 Crew Female Adult Yes
2197 Crew Female Adult Yes
2198 Crew Female Adult Yes
2199 Crew Female Adult Yes
2200 Crew Female Adult Yes
2201 Crew Female Adult Yes
> summary(titanic.raw)
Class Sex Age Survived
1st :325 Female: 470 Adult:2092 No :1490
2nd :285 Male :1731 Child: 109 Yes: 711
3rd :706
Crew:885
> library(arules)
Loading required package: Matrix
Attaching package: ‘arules’
> rules <- apriori(titanic.raw)
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
0.8 0.1 1 none FALSE TRUE 0.1 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 220
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[10 item(s), 2201 transaction(s)] done [0.00s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [27 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> rules <- apriori(titanic.raw,
+ parameter = list(minlen=2, supp=0.005, conf=0.8),
+ appearance = list(rhs=c("Survived=No", "Survived=Yes"),
+ default="lhs"),
+ control = list(verbose=FALSE))
> rules.sorted <- sort(rules, by="lift")
> subset.matrix <- as.matrix(is.subset(rules.sorted, rules.sorted))
> subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
> redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1
> which(redundant)
{Class=2nd,Sex=Female,Age=Child,Survived=Yes} 2
{Class=1st,Sex=Female,Age=Adult,Survived=Yes} 4
{Class=Crew,Sex=Female,Age=Adult,Survived=Yes} 7
{Class=2nd,Sex=Female,Age=Adult,Survived=Yes} 8
> rules.pruned <- rules.sorted[!redundant]
> inspect(rules.pruned)
lhs rhs support confidence lift
1 {Class=2nd,Age=Child} => {Survived=Yes} 0.010904134 1.0000000 3.095640
4 {Class=1st,Sex=Female} => {Survived=Yes} 0.064061790 0.9724138 3.010243
2 {Class=2nd,Sex=Female} => {Survived=Yes} 0.042253521 0.8773585 2.715986
5 {Class=Crew,Sex=Female} => {Survived=Yes} 0.009086779 0.8695652 2.691861
9 {Class=2nd,Sex=Male,Age=Adult} => {Survived=No} 0.069968196 0.9166667 1.354083
3 {Class=2nd,Sex=Male} => {Survived=No} 0.069968196 0.8603352 1.270871
12 {Class=3rd,Sex=Male,Age=Adult} => {Survived=No} 0.175829169 0.8376623 1.237379
6 {Class=3rd,Sex=Male} => {Survived=No} 0.191731031 0.8274510 1.222295
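Demo #2 assumes titanic.raw is already in the workspace, but it does not ship with base R or arules. The sketch below assumes titanic.raw can be rebuilt from the built-in datasets::Titanic contingency table. Rebuilt this way, the first and last rows match the head() and tail() output above, though the factor level order may differ. The sketch then computes support, confidence and lift for the top rule by hand, reproducing the first line of inspect(rules.pruned).

```r
# titanic.raw is not part of base R; rebuild it from the built-in 4-way table
tab <- as.data.frame(datasets::Titanic)   # columns Class, Sex, Age, Survived, Freq
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL             # one row per person, 2201 rows

# Recompute the top rule {Class=2nd, Age=Child} => {Survived=Yes} by hand
lhs <- titanic.raw$Class == "2nd" & titanic.raw$Age == "Child"
rhs <- titanic.raw$Survived == "Yes"

support    <- mean(lhs & rhs)            # P(lhs and rhs)
confidence <- sum(lhs & rhs) / sum(lhs)  # P(rhs | lhs)
lift       <- confidence / mean(rhs)     # confidence relative to P(rhs)
print(c(support = support, confidence = confidence, lift = lift))
```

A lift above 1 means the left-hand side raises the chance of survival above the base rate of 711/2201.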
Applications
Tools
Benefits & Problems
[Figure: scatter plot of sample points, axes feature 1 and feature 2]
Does it do well on the training data?
  No: better features / better parameters
  Yes: does it do well on the test data?
    No: more data
    Yes: Done!
by Andrew Ng
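The first two questions in the flowchart compare error on the training data with error on the test data. The minimal sketch below assumes synthetic data (a noisy sine curve) and uses polynomial regression of increasing degree as the knob being turned. Training error can only fall as the degree grows. Test error usually stops improving and may rise, which is the gap the "more data" branch addresses.

```r
set.seed(7)
# Synthetic data (an assumption for illustration): a sine curve plus noise
n <- 60
d <- data.frame(x = runif(n, 0, 3))
d$y <- sin(2 * d$x) + rnorm(n, sd = 0.3)
train <- sample(n, 40)                        # 40 training rows, 20 test rows

# Mean squared error of a fitted model on the given rows of d
mse <- function(fit, rows) mean((d$y[rows] - predict(fit, newdata = d[rows, ]))^2)

# Fit polynomials of increasing degree on the training rows only
errors <- t(sapply(1:8, function(p) {
  fit <- lm(y ~ poly(x, p), data = d[train, ])
  c(degree = p, train = mse(fit, train), test = mse(fit, -train))
}))
print(round(errors, 3))
```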
Understand your needs first
Tools will change
Ideas are immortal
@SrcMinistry
Thanks!
@MariuszGil

Contenu connexe

Similaire à Machine learning for developers

4Developers: Mariusz Gil- Holistyczne ujęcie machine learning
4Developers: Mariusz Gil- Holistyczne ujęcie machine learning4Developers: Mariusz Gil- Holistyczne ujęcie machine learning
4Developers: Mariusz Gil- Holistyczne ujęcie machine learningPROIDEA
 
Automating Networks by Converting into API/Webs
Automating Networks by Converting into API/WebsAutomating Networks by Converting into API/Webs
Automating Networks by Converting into API/WebsAPNIC
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQLDag H. Wanvik
 
Automating Networks by using API
Automating Networks by using APIAutomating Networks by using API
Automating Networks by using API一清 井上
 
Windowing Functions - Little Rock Tech Fest 2019
Windowing Functions - Little Rock Tech Fest 2019Windowing Functions - Little Rock Tech Fest 2019
Windowing Functions - Little Rock Tech Fest 2019Dave Stokes
 
Windowing Functions - Little Rock Tech fest 2019
Windowing Functions - Little Rock Tech fest 2019Windowing Functions - Little Rock Tech fest 2019
Windowing Functions - Little Rock Tech fest 2019Dave Stokes
 
Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013Milan Guenther (eda.c)
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Mydbops
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0Mydbops
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesAdrian Nuta
 
Sangam 2019 - The Latest Features
Sangam 2019 - The Latest FeaturesSangam 2019 - The Latest Features
Sangam 2019 - The Latest FeaturesConnor McDonald
 
MongoDB user group israel May
MongoDB user group israel MayMongoDB user group israel May
MongoDB user group israel MayAlon Horev
 
Neo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole WhiteNeo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole WhiteNeo4j
 
Spring MVC - Wiring the different layers
Spring MVC -  Wiring the different layersSpring MVC -  Wiring the different layers
Spring MVC - Wiring the different layersIlio Catallo
 

Similaire à Machine learning for developers (20)

4Developers: Mariusz Gil- Holistyczne ujęcie machine learning
4Developers: Mariusz Gil- Holistyczne ujęcie machine learning4Developers: Mariusz Gil- Holistyczne ujęcie machine learning
4Developers: Mariusz Gil- Holistyczne ujęcie machine learning
 
Automating Networks by Converting into API/Webs
Automating Networks by Converting into API/WebsAutomating Networks by Converting into API/Webs
Automating Networks by Converting into API/Webs
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQL
 
Automating Networks by using API
Automating Networks by using APIAutomating Networks by using API
Automating Networks by using API
 
Windowing Functions - Little Rock Tech Fest 2019
Windowing Functions - Little Rock Tech Fest 2019Windowing Functions - Little Rock Tech Fest 2019
Windowing Functions - Little Rock Tech Fest 2019
 
Windowing Functions - Little Rock Tech fest 2019
Windowing Functions - Little Rock Tech fest 2019Windowing Functions - Little Rock Tech fest 2019
Windowing Functions - Little Rock Tech fest 2019
 
Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searches
 
Sangam 2019 - The Latest Features
Sangam 2019 - The Latest FeaturesSangam 2019 - The Latest Features
Sangam 2019 - The Latest Features
 
ZIPPGEAR Gearing solution catalog - 2020 Update
ZIPPGEAR Gearing solution catalog - 2020 UpdateZIPPGEAR Gearing solution catalog - 2020 Update
ZIPPGEAR Gearing solution catalog - 2020 Update
 
ZIPP Gear Reducer General Catalog
ZIPP Gear Reducer General CatalogZIPP Gear Reducer General Catalog
ZIPP Gear Reducer General Catalog
 
MongoDB user group israel May
MongoDB user group israel MayMongoDB user group israel May
MongoDB user group israel May
 
Neo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole WhiteNeo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole White
 
Window functions
Window functionsWindow functions
Window functions
 
Spring MVC - Wiring the different layers
Spring MVC -  Wiring the different layersSpring MVC -  Wiring the different layers
Spring MVC - Wiring the different layers
 
Explain
ExplainExplain
Explain
 
ZIPP Gear Reducer Catalog
ZIPP Gear Reducer CatalogZIPP Gear Reducer Catalog
ZIPP Gear Reducer Catalog
 
2020 ZIPP Gear reducer catalog
2020 ZIPP Gear reducer catalog2020 ZIPP Gear reducer catalog
2020 ZIPP Gear reducer catalog
 

Dernier

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Dernier (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

Machine learning for developers

  • 1. @SrcMinistry @MariuszGil Machine Learning for Developers Data processing
  • 10. A lot of data
  • 11. data + algo = result
  • 13.
  • 15. + chance of sell
  • 20. HOW?
  • 22. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E Tom M. Mitchell
  • 23. Task
  • 25. o oo o oo o oo o o o o oo o o o o oo o oo o o o feature 1 feature2
  • 26. o oo o oo o oo o o o o oo o o o o oo o oo o o o feature 1 feature2
  • 27. o oo o oo o oo o o o o oo o o o o oo o oo o o o feature 1 feature2
  • 29. Typical ML paradigms Supervised learning Unsupervised learning Reinforcement learning
  • 32. Substantive Expertise Hacking skills M ath & Statistics KnowledgeEvil Outside Committee
 Member Not that dangerous,
 in retrospect Machine
 Learning James Bond
 Villain NSA Data Science That Guy Who Stole
 Your Online Identity Thesis Advisor Grad School Mate
  • 34.
  • 35. data + algo = result
  • 36. +-------+--------+------+--------+---------+-------+ | brand | model | year | milage | service | price | +-------+--------+------+--------+---------+-------+ | ford | mondeo | 2005 | 123000 | 9900 | 67000 | +-------+--------+------+--------+---------+-------+ | ford | mondeo | 2005 | 175000 | 9900 | 30000 | +-------+--------+------+--------+---------+-------+ | ford | focus | 2010 | 45000 | 6700 | 30000 | +-------+--------+------+--------+---------+-------+ …
  • 37. Learning Data Algorithm Learning Classifier ModelReal Data Classification
  • 39. +-------+--------+------+--------+---------+-------+ | brand | model | year | milage | service | price | +-------+--------+------+--------+---------+-------+ | ford | mondeo | 2005 | 123000 | 9900 | 67000 | +-------+--------+------+--------+---------+-------+ | ford | mondeo | 2005 | 175000 | 9900 | 30000 | +-------+--------+------+--------+---------+-------+ | ford | focus | 2010 | 45000 | 6700 | 30000 | +-------+--------+------+--------+---------+-------+ …
  • 40. +-------+--------+------+--------+---------+--------+-------+ | brand | model | year | milage | service | repair | price | +-------+--------+------+--------+---------+--------+-------+ | ford | mondeo | 2005 | 123000 | 9000 | 900 | 67000 | +-------+--------+------+--------+---------+--------+-------+ | ford | mondeo | 2005 | 175000 | 900 | 9000 | 30000 | +-------+--------+------+--------+---------+--------+-------+ | ford | focus | 2010 | 45000 | 3700 | 3000 | 30000 | +-------+--------+------+--------+---------+--------+-------+ …
  • 41. +-------+--------+------+--------+---------+--------+-------+ | brand | model | year | milage | service | repair | price | +-------+--------+------+--------+---------+--------+-------+ | ford | mondeo | 2005 | 123000 | 9000 | 900 | 67000 | +-------+--------+------+--------+---------+--------+-------+ | ford | mondeo | 2005 | 175000 | 900 | 9000 | 30000 | +-------+--------+------+--------+---------+--------+-------+ | ford | mondeo | 2005 | 175000 | 900 | 9000 | 45000 | +-------+--------+------+--------+---------+--------+-------+ | ford | focus | 2010 | 45000 | 3700 | 3000 | 30000 | +-------+--------+------+--------+---------+--------+-------+ …
  • 42. +-------+--------+-----+------+--------+---------+--------+-------+ | brand | model | gen | year | milage | service | repair | price | +-------+--------+-----+------+--------+---------+--------+-------+ | ford | mondeo | 4 | 2005 | 123000 | 9000 | 900 | 67000 | +-------+--------+-----+------+--------+---------+--------+-------+ | ford | mondeo | 3 | 2005 | 175000 | 900 | 9000 | 30000 | +-------+--------+-----+------+--------+---------+--------+-------+ | ford | mondeo | 4 | 2005 | 175000 | 900 | 9000 | 45000 | +-------+--------+-----+------+--------+---------+--------+-------+ | ford | focus | 4 | 2010 | 45000 | 3700 | 3000 | 30000 | +-------+--------+-----+------+--------+---------+--------+-------+ …
  • 43. +-------+--------+-----+------+--------+---------+--------+------+---------------+-------+ | brand | model | gen | year | milage | service | repair | igla | crying German | price | +-------+--------+-----+------+--------+---------+--------+------+---------------+-------+ | ford | mondeo | 4 | 2005 | 123000 | 9000 | 900 | 0 | 0 | 67000 | +-------+--------+-----+------+--------+---------+--------+------+---------------+-------+ | ford | mondeo | 3 | 2005 | 175000 | 900 | 9000 | 1 | 1 | 30000 | +-------+--------+-----+------+--------+---------+--------+------+---------------+-------+ | ford | mondeo | 4 | 2005 | 175000 | 900 | 9000 | 0 | 0 | 45000 | +-------+--------+-----+------+--------+---------+--------+------+---------------+-------+ | ford | focus | 4 | 2010 | 45000 | 3700 | 3000 | 1 | 0 | 30000 | +-------+--------+-----+------+--------+---------+--------+------+---------------+-------+ …
  • 48. [Figure: supervised machine-learning workflow. Stages: raw data collection; pre-processing (missing data, feature extraction, feature selection, feature scaling, dimensionality reduction); sampling (data split into a training dataset and a test dataset); learning algorithm training (model selection, cross-validation, performance metrics, optimization); post-processing; final model evaluation on the test dataset; final model used for classification of new data]
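The workflow above can be sketched end to end in base R on the built-in `iris` data. This is a minimal illustration under assumptions: a random 70/30 split, z-score scaling fitted on the training rows only (so no test information leaks into pre-processing), and a nearest-centroid classifier standing in for the learning algorithm; that classifier is not a method from the talk, and `predict_centroid()` is a hypothetical helper.

```r
set.seed(42)

# Raw data: the built-in iris data frame
features <- iris[, 1:4]
labels   <- iris$Species

# Sampling: split rows into a training set (70%) and a test set (30%)
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))

# Pre-processing: feature scaling with statistics from the training set only
centre <- colMeans(features[train_idx, ])
spread <- apply(features[train_idx, ], 2, sd)
scaled <- scale(features, center = centre, scale = spread)

# Training: one mean vector (centroid) per class, rows = classes
centroids <- apply(scaled[train_idx, ], 2,
                   function(col) tapply(col, labels[train_idx], mean))

# Prediction: assign each row to the class whose centroid is closest
predict_centroid <- function(x, centroids) {
  d <- apply(centroids, 1, function(cen) rowSums(sweep(x, 2, cen)^2))
  factor(colnames(d)[max.col(-d)], levels = rownames(centroids))
}

# Evaluation on the held-out test set
test_pred <- predict_centroid(scaled[-train_idx, ], centroids)
table(actual = labels[-train_idx], predicted = test_pred)
accuracy <- mean(test_pred == labels[-train_idx])
accuracy
```

Swapping the centroid step for any algorithm from the next two slides leaves the split, scaling and evaluation steps unchanged, which is the point of treating them as a pipeline.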
  • 50. Classification algorithms
    Linear classification: Logistic Regression, Linear Discriminant Analysis, PLS Discriminant Analysis
    Non-linear classification: Mixture Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Neural Networks, Flexible Discriminant Analysis, Support Vector Machines, k-Nearest Neighbor, Naive Bayes
    Decision trees for classification: Classification and Regression Trees (CART), C4.5, PART, Bagging CART, Random Forest, Gradient Boosted Machines, Boosted C5.0
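To make one entry of the classification list concrete, here is a minimal base-R sketch of k-nearest neighbor: a test row gets the majority class among the k closest training rows by Euclidean distance. The helper `knn_predict()`, the choice k = 5, and holding out every fifth iris row are assumptions for illustration; the `class` package's `knn()` provides a ready-made version.

```r
# k-nearest-neighbor classifier: majority vote among the k closest training rows
knn_predict <- function(train_x, train_y, test_x, k = 5) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    # Euclidean distance from this test row to every training row
    dist <- sqrt(colSums((t(train_x) - row)^2))
    nearest <- train_y[order(dist)[seq_len(k)]]
    votes <- table(nearest)
    names(votes)[which.max(votes)]  # on a tie, the first class in level order wins
  })
}

# Hold out every 5th row of iris as a test set
test_rows <- seq(5, nrow(iris), by = 5)
pred <- knn_predict(iris[-test_rows, 1:4], iris$Species[-test_rows],
                    iris[test_rows, 1:4], k = 5)
accuracy <- mean(pred == iris$Species[test_rows])
accuracy
```

There is no training step at all: the "model" is the stored training set, which is why kNN gets slow as the data grows.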
  • 51. Regression algorithms
    Linear regression: Ordinary Least Squares Regression, Stepwise Linear Regression, Principal Component Regression, Partial Least Squares Regression
    Non-linear / penalized regression: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), ElasticNet, Multivariate Adaptive Regression Splines (MARS), Support Vector Machines, k-Nearest Neighbor, Neural Network
    Decision trees for regression: Classification and Regression Trees (CART), Conditional Decision Tree, Rule System, Bagging CART, Random Forest, Gradient Boosted Machine, Cubist
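Two entries from the regression list differ only by one term. Ordinary least squares solves the normal equations (X'X)b = X'y, and ridge regression adds a penalty lambda to the diagonal of X'X, which shrinks the coefficients. A minimal base-R sketch on the built-in `mtcars` data; the choice of predictors, lambda = 100, and the helpers `ols()` and `ridge()` are assumptions for illustration. Features are left unscaled for brevity; in practice they are usually standardized first so the penalty treats them equally.

```r
# Design matrix with an intercept column, and the response
X <- cbind(intercept = 1, as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- mtcars$mpg

# Ordinary least squares: solve the normal equations (X'X) b = X'y
ols <- function(X, y) solve(crossprod(X), crossprod(X, y))

# Ridge regression: add lambda to the diagonal of X'X, intercept unpenalized
ridge <- function(X, y, lambda) {
  penalty <- diag(lambda, ncol(X))
  penalty[1, 1] <- 0
  solve(crossprod(X) + penalty, crossprod(X, y))
}

beta_ols   <- ols(X, y)
beta_ridge <- ridge(X, y, lambda = 100)

coefs <- cbind(beta_ols, beta_ridge)
colnames(coefs) <- c("ols", "ridge")
coefs

# The OLS coefficients should match R's lm() up to numerical tolerance
all.equal(as.vector(beta_ols),
          unname(coef(lm(mpg ~ wt + hp + disp, data = mtcars))),
          tolerance = 1e-6)
```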
  • 52. The algorithm is only one element of the ML chain
  • 54.
    > data(iris)          # load the built-in iris dataset
    > head(iris)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    > tail(iris)
        Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    145          6.7         3.3          5.7         2.5 virginica
    146          6.7         3.0          5.2         2.3 virginica
    147          6.3         2.5          5.0         1.9 virginica
    148          6.5         3.0          5.2         2.0 virginica
    149          6.2         3.4          5.4         2.3 virginica
    150          5.9         3.0          5.1         1.8 virginica
    > plot(iris[,1:4])    # scatterplot matrix of the four measurements
  • 56.
    > library(mclust)
    > class = iris$Species
    > # one Gaussian per class (EDDA) fitted on the four measurements
    > mod2 = MclustDA(iris[,1:4], class, modelType = "EDDA")
    > table(class)
    class
        setosa versicolor  virginica
            50         50         50
  • 57.
    > summary(mod2)
    ------------------------------------------------
    Gaussian finite mixture model for classification
    ------------------------------------------------

    EDDA model summary:

     log.likelihood   n df       BIC
          -187.7097 150 36 -555.8024

    Classes       n Model G
      setosa     50   VEV 1
      versicolor 50   VEV 1
      virginica  50   VEV 1

    Training classification summary:

                Predicted
    Class        setosa versicolor virginica
      setosa         50          0         0
      versicolor      0         47         3
      virginica       0          0        50

    Training error = 0.02
    > plot(mod2, what = "scatterplot")
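EDDA models each class with a single Gaussian (G = 1 in the summary above) and assigns a point to the class with the highest prior-weighted density. A from-scratch base-R sketch of that idea follows. It assumes an unconstrained covariance matrix per class (what mclust calls VVV, i.e. quadratic discriminant analysis) instead of the VEV model mclust selected, and class priors proportional to class sizes, so its numbers need not match the summary exactly. The helper names are hypothetical.

```r
# Log density of a multivariate normal N(mu, sigma) for every row of x
gaussian_logdens <- function(x, mu, sigma) {
  logdet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -0.5 * (mahalanobis(x, mu, sigma) + logdet + ncol(x) * log(2 * pi))
}

# Fit one Gaussian per class: mean vector, covariance matrix and class size
fit_gaussian_classes <- function(x, y) {
  lapply(split(as.data.frame(x), y), function(d) {
    d <- as.matrix(d)
    list(mu = colMeans(d), sigma = cov(d), n = nrow(d))
  })
}

# Assign each row to the class with the highest log prior + log density
predict_gaussian_classes <- function(model, x) {
  total  <- sum(sapply(model, function(m) m$n))
  scores <- sapply(model, function(m)
    log(m$n / total) + gaussian_logdens(x, m$mu, m$sigma))
  factor(colnames(scores)[max.col(scores)], levels = names(model))
}

x     <- as.matrix(iris[, 1:4])
model <- fit_gaussian_classes(x, iris$Species)
pred  <- predict_gaussian_classes(model, x)

table(Class = iris$Species, Predicted = pred)
training_error <- mean(pred != iris$Species)
training_error
```

Like the mclust summary, this reports training error, which is measured on the same rows used for fitting and is therefore optimistic.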
  • 60.
    > head(titanic.raw)
      Class  Sex   Age Survived
    1   3rd Male Child       No
    2   3rd Male Child       No
    3   3rd Male Child       No
    4   3rd Male Child       No
    5   3rd Male Child       No
    6   3rd Male Child       No
    > tail(titanic.raw)
         Class    Sex   Age Survived
    2196  Crew Female Adult      Yes
    2197  Crew Female Adult      Yes
    2198  Crew Female Adult      Yes
    2199  Crew Female Adult      Yes
    2200  Crew Female Adult      Yes
    2201  Crew Female Adult      Yes
    > summary(titanic.raw)
      Class         Sex          Age        Survived
     1st :325   Female: 470   Adult:2092   No :1490
     2nd :285   Male  :1731   Child: 109   Yes: 711
     3rd :706
     Crew:885
  • 61.
    > library(arules)
    Loading required package: Matrix
    Attaching package: ‘arules’
    > rules <- apriori(titanic.raw)
    Apriori

    Parameter specification:
     confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
            0.8    0.1    1 none FALSE            TRUE     0.1      1     10  rules FALSE

    Algorithmic control:
     filter tree heap memopt load sort verbose
        0.1 TRUE TRUE  FALSE TRUE    2    TRUE

    Absolute minimum support count: 220

    set item appearances ...[0 item(s)] done [0.00s].
    set transactions ...[10 item(s), 2201 transaction(s)] done [0.00s].
    sorting and recoding items ... [9 item(s)] done [0.00s].
    creating transaction tree ... done [0.00s].
    checking subsets of size 1 2 3 4 done [0.00s].
    writing ... [27 rule(s)] done [0.00s].
    creating S4 object ... done [0.00s].
  • 62.
    > # keep only rules whose right-hand side says whether the passenger survived
    > rules <- apriori(titanic.raw,
    +   parameter = list(minlen=2, supp=0.005, conf=0.8),
    +   appearance = list(rhs=c("Survived=No", "Survived=Yes"),
    +                     default="lhs"),
    +   control = list(verbose=F))
    > rules.sorted <- sort(rules, by="lift")
    > # a rule is redundant if a more general rule with equal or higher lift exists
    > subset.matrix <- is.subset(rules.sorted, rules.sorted)
    > subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA
    > redundant <- colSums(subset.matrix, na.rm=T) >= 1
    > which(redundant)
    {Class=2nd,Sex=Female,Age=Child,Survived=Yes}    2
    {Class=1st,Sex=Female,Age=Adult,Survived=Yes}    4
    {Class=Crew,Sex=Female,Age=Adult,Survived=Yes}   7
    {Class=2nd,Sex=Female,Age=Adult,Survived=Yes}    8
    > rules.pruned <- rules.sorted[!redundant]
  • 63.
    > inspect(rules.pruned)
       lhs                               rhs            support     confidence lift
    1  {Class=2nd,Age=Child}          => {Survived=Yes} 0.010904134 1.0000000  3.095640
    4  {Class=1st,Sex=Female}         => {Survived=Yes} 0.064061790 0.9724138  3.010243
    2  {Class=2nd,Sex=Female}         => {Survived=Yes} 0.042253521 0.8773585  2.715986
    5  {Class=Crew,Sex=Female}        => {Survived=Yes} 0.009086779 0.8695652  2.691861
    9  {Class=2nd,Sex=Male,Age=Adult} => {Survived=No}  0.069968196 0.9166667  1.354083
    3  {Class=2nd,Sex=Male}           => {Survived=No}  0.069968196 0.8603352  1.270871
    12 {Class=3rd,Sex=Male,Age=Adult} => {Survived=No}  0.175829169 0.8376623  1.237379
    6  {Class=3rd,Sex=Male}           => {Survived=No}  0.191731031 0.8274510  1.222295
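The three columns printed by inspect() are simple ratios: support is the share of all passengers matching both sides of the rule, confidence is support divided by the share matching the left-hand side, and lift is confidence divided by the share matching the right-hand side. A base-R sketch that recomputes them without arules follows. The slides do not show where `titanic.raw` comes from, so as an assumption the passenger-level data is rebuilt here from R's built-in `Titanic` contingency table; `rule_metrics()` is a hypothetical helper.

```r
# Rebuild one row per person from the built-in 4-way Titanic table
counts  <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]

# Support, confidence and lift of the rule lhs => rhs; each side is a
# named character vector of column = value conditions
rule_metrics <- function(data, lhs, rhs) {
  matches <- function(conds) {
    Reduce(`&`, Map(function(col, val) data[[col]] == val, names(conds), conds))
  }
  in_lhs <- matches(lhs)
  in_rhs <- matches(rhs)
  support    <- mean(in_lhs & in_rhs)      # P(lhs and rhs)
  confidence <- support / mean(in_lhs)     # P(rhs | lhs)
  lift       <- confidence / mean(in_rhs)  # P(rhs | lhs) / P(rhs)
  c(support = support, confidence = confidence, lift = lift)
}

rule_metrics(titanic, lhs = c(Class = "2nd", Age = "Child"), rhs = c(Survived = "Yes"))
rule_metrics(titanic, lhs = c(Class = "1st", Sex = "Female"), rhs = c(Survived = "Yes"))
```

The two calls reproduce the first two rows of the pruned rule list: support 0.0109, confidence 1 and lift 3.0956 for 2nd-class children, and 0.0641, 0.9724 and 3.0102 for 1st-class women. A lift above 1 means the left-hand side makes survival more likely than the base rate.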
  • 69. Tools
  • 79. [Figure: scatter plot of data points, feature 1 vs. feature 2]
  • 80. Flowchart (by Andrew Ng):
    Does it do well on the training data? No: better features / better parameters.
    Yes: does it do well on the test data? No: more data.
    Yes: done!
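The flowchart above is a small decision procedure, and encoding it makes the order of the checks explicit: training performance is checked first, test performance second. A minimal sketch; `next_step()` and the acceptance threshold `target_error` are hypothetical, since the slide does not say how "does well" is measured.

```r
# Encode the flowchart: check training error first, then test error.
# target_error is a hypothetical acceptance threshold chosen by the developer.
next_step <- function(train_error, test_error, target_error) {
  if (train_error > target_error) {
    "better features / better parameters"  # fails even on data it was trained on
  } else if (test_error > target_error) {
    "more data"                            # fits training data, does not generalize
  } else {
    "done"
  }
}

next_step(train_error = 0.20, test_error = 0.25, target_error = 0.05)
next_step(train_error = 0.02, test_error = 0.15, target_error = 0.05)
next_step(train_error = 0.02, test_error = 0.04, target_error = 0.05)
```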
  • 82. Tools will change. Ideas are immortal.