1. David Callender
• Finished in the top 2% (18th out of more than 1,300) in a 3-year, $3 million machine learning competition
• Studied disease propagation in an urban setting using probabilistic graphical models at Dartmouth College
• Studied computational protein design at the University of Washington
• Studied the mathematical foundations of quantum mechanics at Macalester College
3. a.k.a. Using R on Kaggle
• Who will end up in the hospital?
• Drug effectiveness
• Computer security: determining employee access needs
• What will the salary be for a given job advertisement?
5. Talk Outline
• Motivation
• Concepts
• Algorithms
  • Decision Trees and Forests
  • Neural networks
• Kaggle
• Interactive session with R packages
  • randomForest
  • gbm
  • neuralnet
6. Supervised Learning
Survived Pclass Sex Age SibSp Parch Fare Embarked
0 3 male 22 1 0 7.25 S
1 1 female 38 1 0 71.2833 C
1 3 female 26 0 0 7.925 S
1 1 female 35 1 0 53.1 S
0 3 male 35 0 0 8.05 S
0 3 male 33 0 0 8.4583 Q
0 1 male 54 0 0 51.8625 S
0 3 male 2 3 1 21.075 S
1 3 female 27 0 2 11.1333 S
1 2 female 14 1 0 30.0708 C
Survived Pclass Sex Age SibSp Parch Fare Embarked
? 3 male 34.5 0 0 7.8292 Q
? 3 female 47 1 0 7 S
? 2 male 62 0 0 9.6875 Q
? 3 male 27 0 0 8.6625 S
? 3 female 22 1 1 12.2875 S
? 3 male 14 0 0 9.225 S
? 3 female 30 0 0 7.6292 Q
? 2 male 26 1 1 29 S
? 3 female 18 0 0 7.2292 C
? 3 male 21 2 0 24.15 S
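Predicting survival for passengers of the Titanic
Train the model on examples where the value of “Survived” is known (first table). Use the model to predict the value of “Survived” where it is unknown (second table).
Column types: binary (Survived), numeric (e.g. Age, Fare), categorical (e.g. Sex, Embarked)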
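The train-then-predict workflow can be sketched in Python (one of the tools in the competition-winners table). The sketch uses a deliberately trivial model that is not from the talk. It learns the survival rate for each value of Sex from the labelled rows. It then predicts 1 for unlabelled passengers whose group rate exceeds 0.5. The function names and the 0.5 threshold are illustrative assumptions.

```python
from collections import defaultdict

# (Survived, Sex) pairs from the labelled table; this toy model ignores the other columns.
TRAIN = [(0, "male"), (1, "female"), (1, "female"), (1, "female"), (0, "male"),
         (0, "male"), (0, "male"), (0, "male"), (1, "female"), (1, "female")]
# Sex column of the unlabelled table, where "Survived" is "?".
TEST = ["male", "female", "male", "male", "female",
        "male", "female", "male", "female", "male"]


def fit_survival_rate_by_sex(rows):
    """Training step: estimate the fraction of survivors for each value of Sex."""
    counts = defaultdict(lambda: [0, 0])  # sex -> [survivors, passengers]
    for survived, sex in rows:
        counts[sex][0] += survived
        counts[sex][1] += 1
    return {sex: s / n for sex, (s, n) in counts.items()}


def predict(model, sexes, threshold=0.5):
    """Prediction step: output 1 ("survived") when the learned rate exceeds the threshold."""
    return [1 if model[sex] > threshold else 0 for sex in sexes]


model = fit_survival_rate_by_sex(TRAIN)
print(model)                 # {'male': 0.0, 'female': 1.0}
print(predict(model, TEST))  # [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
```

A real model would use more columns, but the two phases stay the same. Fitting sees only labelled rows, and prediction is applied to rows whose label is unknown.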
8. Decision Trees
[Figure: CART decision tree for Titanic survivors, by Stephen Milborrow, made using R. Source: http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png]
Survived Pclass Sex Age SibSp Parch Fare Embarked
? 3 male 34.5 0 0 7.8292 Q
? 3 female 47 1 0 7 S
? 2 male 62 0 0 9.6875 Q
? 3 male 27 0 0 8.6625 S
? 3 female 22 1 1 12.2875 S
? 3 male 14 0 0 9.225 S
? 3 female 30 0 0 7.6292 Q
? 2 male 26 1 1 29 S
? 3 female 18 0 0 7.2292 C
? 3 male 21 2 0 24.15 S
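A CART-style tree is grown by repeatedly choosing the split that best separates the classes. The sketch below is a hypothetical helper, `best_split`, written for illustration. It searches every feature and every midpoint threshold on the ten labelled rows from slide 6, and keeps the split with the lowest weighted Gini impurity. To keep it short, Sex is encoded as IsMale (1 = male) and Embarked is dropped.

```python
FEATURES = ["Pclass", "IsMale", "Age", "SibSp", "Parch", "Fare"]
# Each row: (Survived, Pclass, IsMale, Age, SibSp, Parch, Fare)
TRAIN = [
    (0, 3, 1, 22, 1, 0, 7.25),
    (1, 1, 0, 38, 1, 0, 71.2833),
    (1, 3, 0, 26, 0, 0, 7.925),
    (1, 1, 0, 35, 1, 0, 53.1),
    (0, 3, 1, 35, 0, 0, 8.05),
    (0, 3, 1, 33, 0, 0, 8.4583),
    (0, 1, 1, 54, 0, 0, 51.8625),
    (0, 3, 1, 2, 3, 1, 21.075),
    (1, 3, 0, 27, 0, 2, 11.1333),
    (1, 2, 0, 14, 1, 0, 30.0708),
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows):
    """Return (weighted impurity, feature name, threshold) of the best single split."""
    labels = [r[0] for r in rows]
    best = (gini(labels), None, None)  # baseline: no split at all
    for j, name in enumerate(FEATURES, start=1):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [r[0] for r in rows if r[j] <= t]
            right = [r[0] for r in rows if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best[0]:
                best = (score, name, t)
    return best


print(best_split(TRAIN))  # (0.0, 'IsMale', 0.5)
```

On these ten rows, splitting on sex separates survivors from non-survivors perfectly. A full tree applies the same search recursively to each side of the split until a stopping rule, such as a minimum node size, is reached.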
9. Random Forest (RF)
Survived Pclass Sex Age SibSp Parch Fare Embarked
0 3 male 22 1 0 7.25 S
1 1 female 38 1 0 71.2833 C
1 3 female 26 0 0 7.925 S
1 1 female 35 1 0 53.1 S
0 3 male 35 0 0 8.05 S
0 3 male 33 0 0 8.4583 Q
0 1 male 54 0 0 51.8625 S
0 3 male 2 3 1 21.075 S
1 3 female 27 0 2 11.1333 S
1 2 female 14 1 0 30.0708 C
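[Figure: during training, each tree is built from a random sample of the rows (bagging) and a random subset of the columns (random sub-spaces). For prediction, the trees' outputs are combined by voting (classification) or averaging (regression).]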
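The bagging, random sub-space and voting steps can be sketched as follows, under two simplifying assumptions. First, each "tree" is a one-split stump, to keep the code short. Second, the column subset is drawn once per tree, matching the slide's "random sub-spaces". Breiman's random forest, as implemented in the randomForest package, instead draws a fresh subset of columns at every split. All names (`fit_stump`, `fit_forest`, `n_features`, ...) are hypothetical.

```python
import random
from collections import Counter

# Labelled rows from the table above: (Survived, (Pclass, IsMale, Age, SibSp, Parch, Fare)).
# Sex is encoded as IsMale (1 = male); Embarked is omitted for brevity.
TRAIN = [
    (0, (3, 1, 22, 1, 0, 7.25)), (1, (1, 0, 38, 1, 0, 71.2833)),
    (1, (3, 0, 26, 0, 0, 7.925)), (1, (1, 0, 35, 1, 0, 53.1)),
    (0, (3, 1, 35, 0, 0, 8.05)), (0, (3, 1, 33, 0, 0, 8.4583)),
    (0, (1, 1, 54, 0, 0, 51.8625)), (0, (3, 1, 2, 3, 1, 21.075)),
    (1, (3, 0, 27, 0, 2, 11.1333)), (1, (2, 0, 14, 1, 0, 30.0708)),
]
# Unlabelled rows from the slide 6 test table, same feature order.
TEST = [
    (3, 1, 34.5, 0, 0, 7.8292), (3, 0, 47, 1, 0, 7), (2, 1, 62, 0, 0, 9.6875),
    (3, 1, 27, 0, 0, 8.6625), (3, 0, 22, 1, 1, 12.2875), (3, 1, 14, 0, 0, 9.225),
    (3, 0, 30, 0, 0, 7.6292), (2, 1, 26, 1, 1, 29), (3, 0, 18, 0, 0, 7.2292),
    (3, 1, 21, 2, 0, 24.15),
]


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, feature_idx):
    """Fit a one-split tree using only the columns in feature_idx.

    Returns (column, threshold, left_label, right_label). column is None when
    no split beats predicting the majority label everywhere."""
    labels = [y for y, _ in rows]
    const = majority(labels)
    best_err = sum(y != const for y in labels)
    best = (None, None, const, const)
    for j in feature_idx:
        values = sorted({x[j] for _, x in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for y, x in rows if x[j] <= t]
            right = [y for y, x in rows if x[j] > t]
            l_lab, r_lab = majority(left), majority(right)
            err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if err < best_err:
                best_err, best = err, (j, t, l_lab, r_lab)
    return best


def predict_stump(stump, x):
    j, t, l_lab, r_lab = stump
    if j is None:
        return l_lab
    return l_lab if x[j] <= t else r_lab


def fit_forest(rows, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    n_cols = len(rows[0][1])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]     # bagging: bootstrap sample of rows
        cols = rng.sample(range(n_cols), n_features)  # random sub-space of columns
        forest.append(fit_stump(sample, cols))
    return forest


def predict_forest(forest, x):
    # Voting for classification; a regression forest would average instead.
    return majority([predict_stump(s, x) for s in forest])


forest = fit_forest(TRAIN)
print([predict_forest(forest, x) for x in TEST])
```

Each tree sees different rows and columns, so the trees tend to make partly different errors, and the vote can cancel some of them out. That is the usual intuition for why a forest beats a single tree.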
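10. AdaBoost & Gradient Boosting
• Initialize a set of weights, one for each training example, all with equal value
• Train a tree on the weighted training examples
• Add the tree to the set of trees
• Make predictions with the set of trees
• Adjust the weights so that the training examples the trees got wrong have more weight
• Repeat
• (Gradient boosting replaces the reweighting: each new tree is fit to the errors of the current set of trees, i.e. the negative gradient of the loss, such as the residuals for squared error.)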
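The loop above is essentially discrete AdaBoost. The minimal sketch below assumes one-split stumps as the trees. It also uses a made-up one-dimensional dataset with labels coded as +1/−1, chosen so that no single stump fits it. Each round does three things:

• Fit the stump with the lowest weighted error.
• Give that stump a vote α = ½·ln((1 − error)/error).
• Multiply each example's weight by exp(−α·y·h(x)), which raises the weight of misclassified examples.

All function names are hypothetical.

```python
import math


def stump_predict(t, s, x):
    """A one-split tree: predict s when x <= t, otherwise -s."""
    return s if x <= t else -s


def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(t, s, xi) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, n_rounds=10):
    n = len(xs)
    w = [1.0 / n] * n                            # one equal weight per example
    ensemble = []                                # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, t, s = fit_stump(xs, ys, w)         # train a tree on the weighted examples
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # this tree's vote
        ensemble.append((alpha, t, s))           # add the tree to the set
        # Misclassified examples (y * h(x) = -1) are scaled by exp(alpha), correct ones
        # by exp(-alpha), so mistakes gain relative weight whenever alpha > 0.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, s, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]             # renormalize to sum to 1
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted vote of all trees in the set."""
    score = sum(alpha * stump_predict(t, s, x) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1


XS = [1, 2, 3, 4, 5, 6]      # made-up feature values
YS = [1, 1, -1, -1, 1, 1]    # +1/-1 labels; no single threshold separates them
model = adaboost(XS, YS, n_rounds=10)
print([ensemble_predict(model, x) for x in XS])
```

The multiplicative updates accumulate, so after each round the weights are proportional to exp(−y·F(x)), where F is the weighted vote of all trees so far. Reweighting from the newest tree is therefore the same as reweighting from the predictions of the whole set.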
13. R’s Popularity
[Figure: tools mentioned in Kaggle user profiles. From a blog entry by Ben Hamner: http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/]
14. Summary of Recent Competition Winners
Competition     Position  Algorithm  Other Algs.        Tools
Adzuna Salary   1st       NN*        -                  Python, GPU
Adzuna Salary   2nd       NN         -                  C++
Adzuna Salary   3rd       NN         NB, SVM, LR        Python
Merck           1st       NN*        -                  Python, GPU
Merck           2nd       GBM & SVM  RF, PCA, KNN, SVM  R & Python
Merck           3rd       RF & SVM   GBM, NN            R
15. Learning More
• Pedro Domingos at the University of Washington
• www.coursera.org/course/machlearning
• www.coursera.org/uw
• "A Few Useful Things to Know about Machine Learning," Communications of the ACM
• homes.cs.washington.edu/~pedrod
• homes.cs.washington.edu/~pedrod
• blog.kaggle.com
• ufldl.stanford.edu/wiki/