David Callender
• Finished in the top 2% (18th out of >1300) in a 3-year,
$3 million machine learning competition.
• Studied disease propagation in an urban setting
using probabilistic graphical models at Dartmouth
College
• Studied computational protein design at the
University of Washington
• Studied Mathematical foundations of Quantum
Mechanics at Macalester College
Machine Learning in R
circa 2013
David Callender
a.k.a. Using R on Kaggle
• Who will end up in the hospital
• Drug effectiveness
• Computer security: determining employee access needs
• What will the salary be for a given job advertisement
Not Just Kaggle
• Movie recommendations
• Popular productions
• Product recommendations
• Good business opportunities
• The entire Internet
• Probably a lot more too
Talk Outline
• Motivation
• Concepts
• Algorithms
  • Decision trees and forests
  • Neural networks
• Kaggle
• Interactive session with R packages
  • randomForest
  • gbm
  • neuralnet
Supervised Learning
Survived Pclass Sex Age SibSp Parch Fare Embarked
0 3 male 22 1 0 7.25 S
1 1 female 38 1 0 71.2833 C
1 3 female 26 0 0 7.925 S
1 1 female 35 1 0 53.1 S
0 3 male 35 0 0 8.05 S
0 3 male 33 0 0 8.4583 Q
0 1 male 54 0 0 51.8625 S
0 3 male 2 3 1 21.075 S
1 3 female 27 0 2 11.1333 S
1 2 female 14 1 0 30.0708 C
Survived Pclass Sex Age SibSp Parch Fare Embarked
? 3 male 34.5 0 0 7.8292 Q
? 3 female 47 1 0 7 S
? 2 male 62 0 0 9.6875 Q
? 3 male 27 0 0 8.6625 S
? 3 female 22 1 1 12.2875 S
? 3 male 14 0 0 9.225 S
? 3 female 30 0 0 7.6292 Q
? 2 male 26 1 1 29 S
? 3 female 18 0 0 7.2292 C
? 3 male 21 2 0 24.15 S
Train the model with examples where you know the value of “Survived”.
Use the model to predict the value of “Survived”.
Predicting survival for passengers of the Titanic.
Column types: binary, numeric, categorical.
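The train-then-predict workflow can be sketched in a few lines. What follows is a minimal Python illustration, not the talk's R code: it keeps only the Pclass, Sex, and Age columns from the two tables, and the `train` and `predict` helpers and the majority-by-Sex rule are assumptions chosen to keep the example tiny.

```python
from collections import defaultdict

# Labeled examples from the first table: (Survived, Pclass, Sex, Age)
TRAIN = [
    (0, 3, "male", 22), (1, 1, "female", 38), (1, 3, "female", 26),
    (1, 1, "female", 35), (0, 3, "male", 35), (0, 3, "male", 33),
    (0, 1, "male", 54), (0, 3, "male", 2), (1, 3, "female", 27),
    (1, 2, "female", 14),
]
# Unlabeled examples from the second table: (Pclass, Sex, Age)
TEST = [
    (3, "male", 34.5), (3, "female", 47), (2, "male", 62), (3, "male", 27),
    (3, "female", 22), (3, "male", 14), (3, "female", 30), (2, "male", 26),
    (3, "female", 18), (3, "male", 21),
]

def train(rows):
    """Learn the majority Survived label for each value of Sex."""
    counts = defaultdict(lambda: [0, 0])  # sex -> [died, survived]
    for survived, _pclass, sex, _age in rows:
        counts[sex][survived] += 1
    return {sex: int(c[1] > c[0]) for sex, c in counts.items()}

def predict(model, rows):
    """Fill in the unknown Survived column using the learned rule."""
    return [model[sex] for _pclass, sex, _age in rows]

model = train(TRAIN)
predictions = predict(model, TEST)
print(model)
print(predictions)
```

A rule this simple ignores most of the columns; the point is only the split between fitting on labeled rows and predicting on unlabeled ones.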
Overfitting
http://en.wikipedia.org/wiki/File:Overfitting_on_Training_Set_Data.pdf Tomaso Poggio
Decision Trees
http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png | Stephen Milborrow | Made using R
Survived Pclass Sex Age SibSp Parch Fare Embarked
? 3 male 34.5 0 0 7.8292 Q
? 3 female 47 1 0 7 S
? 2 male 62 0 0 9.6875 Q
? 3 male 27 0 0 8.7 S
? 3 female 22 1 1 12.2875 S
? 3 male 14 0 0 9.225 S
? 3 female 30 0 0 7.6292 Q
? 2 male 26 1 1 29 S
? 3 female 18 0 0 7.2292 C
? 3 male 21 2 0 24.15 S
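A decision tree is built by repeatedly choosing the split that best separates the classes. The sketch below is a hedged Python illustration of a single split (a "stump"), assuming Gini impurity as the criterion and using only the Pclass, Sex (encoded as `is_male`), and Age columns of the ten training rows shown earlier; the helper names are hypothetical.

```python
# Training rows from the earlier table: (Survived, Pclass, is_male, Age)
ROWS = [
    (0, 3, 1, 22), (1, 1, 0, 38), (1, 3, 0, 26), (1, 1, 0, 35),
    (0, 3, 1, 35), (0, 3, 1, 33), (0, 1, 1, 54), (0, 3, 1, 2),
    (1, 3, 0, 27), (1, 2, 0, 14),
]
FEATURES = ["Pclass", "is_male", "Age"]  # columns 1..3 of each row

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows):
    """Try every 'feature <= threshold' split; return the purest one."""
    best = None
    for col, name in enumerate(FEATURES, start=1):
        for threshold in sorted({r[col] for r in rows}):
            left = [r[0] for r in rows if r[col] <= threshold]
            right = [r[0] for r in rows if r[col] > threshold]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best

print(best_split(ROWS))
```

On these ten rows the purest split is on sex, because every woman in the sample survived and every man did not. A full tree would apply the same search recursively to each side of the split.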
Random Forest (RF)
Survived Pclass Sex Age SibSp Parch Fare Embarked
0 3 male 22 1 0 7.25 S
1 1 female 38 1 0 71.2833 C
1 3 female 26 0 0 7.925 S
1 1 female 35 1 0 53.1 S
0 3 male 35 0 0 8.05 S
0 3 male 33 0 0 8.4583 Q
0 1 male 54 0 0 51.8625 S
0 3 male 2 3 1 21.075 S
1 3 female 27 0 2 11.1333 S
1 2 female 14 1 0 30.0708 C
[Figure: Training uses bagging and random sub-spaces; prediction uses voting/averaging.]
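The three ingredients named on this slide (bagging, random sub-spaces, and voting) can be sketched as follows. This is a simplified Python illustration, not the randomForest package: each "tree" is a one-split stump, the feature subset is drawn once per tree, and `fit_stump`, `fit_forest`, and `predict_forest` are hypothetical helper names.

```python
import random

# Training rows from the slide, reduced to (Pclass, is_male, Age) features.
X = [(3, 1, 22), (1, 0, 38), (3, 0, 26), (1, 0, 35), (3, 1, 35),
     (3, 1, 33), (1, 1, 54), (3, 1, 2), (3, 0, 27), (2, 0, 14)]
Y = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # Survived

def gini(labels):
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    return int(2 * sum(labels) >= len(labels))  # ties go to class 1

def fit_stump(X, Y, features):
    """Best single split restricted to the given feature columns."""
    best = None
    for f in features:
        for t in sorted({x[f] for x in X}):
            left = [y for x, y in zip(X, Y) if x[f] <= t]
            right = [y for x, y in zip(X, Y) if x[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority class
        label = majority(Y)
        return lambda x: label
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

def fit_forest(X, Y, n_trees=25, n_sub=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: sample rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Random sub-space: each tree sees only some of the columns.
        feats = rng.sample(range(len(X[0])), n_sub)
        forest.append(fit_stump([X[i] for i in idx], [Y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    votes = sum(tree(x) for tree in forest)
    return int(2 * votes > len(forest))

forest = fit_forest(X, Y)
predictions = [predict_forest(forest, x) for x in X]
print(predictions)
```

For regression the vote would be replaced by an average of the trees' outputs.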
AdaBoost & Gradient Boosting
• Initialize a set of weights, one for each training example, with equal value
• Train a tree with the weighted training examples
• Add the tree to the set of trees
• Make predictions with the set of trees
• Adjust the weights so that the training examples you got wrong have more weight
• Repeat
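The loop above can be sketched as discrete AdaBoost with one-split stumps. This Python illustration rests on several assumptions that are not in the slides: the 1-D toy data, labels coded as +1/-1, and the standard AdaBoost vote weight `alpha = 0.5 * ln((1 - err) / err)`. As in standard AdaBoost, the weights are adjusted using the mistakes of the newest stump.

```python
import math

# Hypothetical 1-D toy data (not from the slides); labels are +1 / -1.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def stump_predict(stump, x):
    threshold, sign = stump
    return sign if x < threshold else -sign

def fit_stump(X, Y, weights):
    """Return the (threshold, sign) stump with the lowest weighted error."""
    best_err, best = None, None
    for threshold in [x + 0.5 for x in X[:-1]]:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(X, Y, weights)
                      if stump_predict((threshold, sign), x) != y)
            if best_err is None or err < best_err:
                best_err, best = err, (threshold, sign)
    return best, best_err

def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps collected so far."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

def adaboost(X, Y, rounds=3):
    n = len(X)
    weights = [1.0 / n] * n                      # equal initial weights
    ensemble = []
    for _ in range(rounds):                      # repeat
        stump, err = fit_stump(X, Y, weights)    # train on weighted examples
        err = min(max(err, 1e-12), 1 - 1e-12)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))          # add it to the set
        # Up-weight the examples this stump got wrong, down-weight the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(X, Y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

ensemble = adaboost(X, Y)
errors = sum(ensemble_predict(ensemble, x) != y for x, y in zip(X, Y))
print([stump for _, stump in ensemble], errors)
```

No single stump can classify this toy data perfectly, but the weighted vote of three stumps does. Gradient boosting follows the same add-one-tree-at-a-time pattern but fits each new tree to the gradient of a loss function instead of reweighting examples.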
Logistic Regression
a.k.a. the Perceptron
[Figure: a weighted sum of the inputs passed through an activation function]
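Both models share the structure named on this slide: a weighted sum of the inputs followed by an activation function. The Python sketch below is illustrative only; the weights, bias, and inputs are hypothetical values. With a step activation the unit behaves like a classic perceptron, and with a sigmoid activation it gives the probability output of logistic regression.

```python
import math

def weighted_sum(weights, bias, inputs):
    """bias + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def step(z):
    """Perceptron-style activation: hard 0/1 decision."""
    return 1 if z >= 0 else 0

def sigmoid(z):
    """Logistic activation: squashes z into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(weights, bias, inputs, activation):
    return activation(weighted_sum(weights, bias, inputs))

# Hypothetical parameters and inputs, chosen so the weighted sum is 1.0.
weights, bias, inputs = [2.0, -1.0], -0.5, [1.0, 0.5]

hard = unit(weights, bias, inputs, step)
soft = unit(weights, bias, inputs, sigmoid)
print(hard, soft)
```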
Multilayer Feed-forward Neural Network
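A multilayer feed-forward network stacks such units: the outputs of one layer become the inputs of the next. The Python sketch below shows only the forward pass, with hand-picked (hypothetical, untrained) weights that make a two-layer network compute XOR, a function a single unit cannot represent.

```python
def step(z):
    return 1 if z >= 0 else 0

def layer(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one unit."""
    return [activation(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

# Hand-picked weights (an assumption for illustration, not learned):
# hidden unit 0 computes OR, hidden unit 1 computes AND.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
# Output unit fires when OR is true but AND is not, which is XOR.
OUT_W = [[1.0, -1.0]]
OUT_B = [-0.5]

def network(inputs):
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B, step)
    return layer(hidden, OUT_W, OUT_B, step)[0]

outputs = [network([a, b]) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

In practice the weights are learned from data, and smooth activations such as the sigmoid are used so that gradients can be computed.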
R’s Popularity
Tools mentioned in Kaggle user profiles
From a blog entry by Ben Hamner
http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/
Summary of Recent Competition Winners
Competition    Position  Algorithm  Other Algs.        Tools
Adzuna Salary  1st       NN*        -                  Python GPU
Adzuna Salary  2nd       NN         -                  C++
Adzuna Salary  3rd       NN         NB, SVM, LR        Python
Merck          1st       NN*        -                  Python GPU
Merck          2nd       GBM & SVM  RF, PCA, KNN, SVM  R & Python
Merck          3rd       RF & SVM   GBM, NN            R
Learning More
• Pedro Domingos at University of Washington
• www.coursera.org/course/machlearning
• www.coursera.org/uw
• A Few Useful Things to Know about Machine Learning. Communications of the ACM
• homes.cs.washington.edu/~pedrod
• blog.kaggle.com
• ufldl.stanford.edu/wiki/
