A PyATL talk on machine learning. It gives an introduction to machine learning and shows how to apply it with Python, including simple examples with code and results.
2. Machine Learning (ML):
• Finding patterns in data
• Modeling patterns
• Use models to make predictions
Slide #2
Intro to Machine Learning with Python
matt@liveramp.com
3. ML can be easy*
• You already have ML applications!
• You can start applying ML methods now with Python & scikit-learn
• Theoretical knowledge of ML not needed (initially)*
*Gaining more background, theory, and experience will help
8. Variance/Bias Trade-Off
• Need models that can adapt to relationships in our data
• Highly adaptable models can over-fit the training data and then fail to generalize to new data
• Regularization: a common strategy to address the variance/bias trade-off
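The trade-off above can be sketched with a small experiment. This is only an illustration under stated assumptions: the noisy sine data, the polynomial degree of 12, and the Ridge penalty `alpha=1e-3` are all made up for the example and do not come from the talk. It fits the same highly adaptable polynomial model with and without an L2 (ridge) penalty and compares coefficient sizes and held-out error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, size=15).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(0, 0.2, size=15)
x_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel()


def fit_polynomial(regressor, degree=12):
    """Fit a degree-12 polynomial model using the given linear regressor."""
    model = make_pipeline(PolynomialFeatures(degree), regressor)
    return model.fit(x_train, y_train)


def held_out_mse(model):
    """Mean squared error against the noise-free curve on unseen points."""
    return float(np.mean((model.predict(x_test) - y_test) ** 2))


plain = fit_polynomial(LinearRegression())   # no regularization
ridge = fit_polynomial(Ridge(alpha=1e-3))    # L2-regularized

# Regularization shrinks the polynomial coefficients.
plain_norm = np.linalg.norm(plain.steps[-1][1].coef_)
ridge_norm = np.linalg.norm(ridge.steps[-1][1].coef_)

print("coefficient norm (plain, ridge):", plain_norm, ridge_norm)
print("held-out MSE     (plain, ridge):", held_out_mse(plain), held_out_mse(ridge))
```

The penalized fit has much smaller coefficients, which is what keeps the curve from chasing noise in the 15 training points. The held-out errors depend on the random data and on `alpha`, so in practice the penalty strength is usually chosen by cross-validation.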
13. Example: Image Classification
• Classify handwritten digits with ML models
• Each input is an entire image
• Output is the digit in the image
15. Example: Image Classification (code)
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Load the training images and their digit labels
with np.load('train.npz') as data:
    pixels_train = data['pixels']
    labels_train = data['labels']

# Load the unlabeled test images
with np.load('test.npz') as data:
    pixels_test = data['pixels']

# Flatten each image into a 1-D feature vector
X_train = pixels_train.reshape(pixels_train.shape[0], -1)
X_test = pixels_test.reshape(pixels_test.shape[0], -1)

# Train a random forest and predict a digit for each test image
model = RandomForestClassifier(n_estimators=50)
model.fit(X_train, labels_train)
labels_test = model.predict(X_test)
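The test set in the slide code has no labels, so that code alone cannot tell you how well the model works. A minimal sketch of one way to check, assuming you hold out part of the labeled data for validation: the example below uses scikit-learn's bundled `load_digits` dataset (8x8 digit images) as a stand-in, since the talk's `train.npz` and `test.npz` files are not available here. The 25% split and the fixed `random_state` are choices made for the example.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): 1,797 labeled 8x8 images of handwritten digits.
digits = load_digits()

# Flatten each image into a 1-D feature vector, as in the slide code.
X = digits.images.reshape(digits.images.shape[0], -1)
y = digits.target

# Hold out a quarter of the labeled images to estimate accuracy on unseen data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {accuracy:.3f}")
```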
16. Predicting the tags of Stack Overflow questions with machine learning
Kaggle Data Science Competition
• Given 6 million training questions labeled with tags
• Predict the tags for 2 million unlabeled test questions
www.users.globalnet.co.uk/~slocks/instructions.html
stackoverflow.com/questions/895371/bubble-sort-homework
17. Text Classification Overview
Raw Posts → Feature Extraction & Selection → Vector Space → Model Selection & Training → Machine Learning Model
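The stages in this overview map fairly directly onto a scikit-learn `Pipeline`. The sketch below is a toy version under stated assumptions: the six posts and their tags are invented, each post gets a single tag (the Kaggle task allowed several tags per question), and `CountVectorizer`, `SelectKBest` with `chi2`, and `LogisticRegression` are just one reasonable choice for each stage, not necessarily what the talk used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical raw posts and tags, made up for this example.
posts = [
    "How do I sort a list in Python?",
    "Python list comprehension with a condition",
    "Reverse a list in Python without a loop",
    "NullPointerException when calling a Java method",
    "How to declare an interface in Java",
    "Java ArrayList throws an exception",
]
tags = ["python", "python", "python", "java", "java", "java"]

pipeline = Pipeline([
    # Raw posts -> vector space: one column of word counts per vocabulary term.
    ("extract", CountVectorizer(stop_words="english")),
    # Feature selection: keep the 5 terms most associated with the tags.
    ("select", SelectKBest(chi2, k=5)),
    # Model training on the reduced vectors.
    ("model", LogisticRegression()),
])
pipeline.fit(posts, tags)

predicted = pipeline.predict(["Sorting a Python list of tuples"])
print(predicted[0])
```

Wrapping the stages in one object means the vocabulary and the selected features learned from the training posts are reused unchanged when predicting on new posts.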
18. Term Frequency Feature Extraction
Characterize text by the frequency of specific words in each text entry
Example Title: “Why is processing a sorted array faster than processing an array that is not sorted?”
Term Frequencies:
why 1
processing 2
sorted 2
array 2
faster 1
Ignore common words (i.e. stop words)
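The term counts for the example title can be reproduced with a few lines of standard-library Python. This is a minimal sketch: the stop-word set below is a hypothetical hand-picked list that covers only this one title, whereas real stop-word lists are much longer (scikit-learn's built-in English list, for instance, also treats "why" as a stop word).

```python
import re
from collections import Counter

title = ("Why is processing a sorted array faster than "
         "processing an array that is not sorted?")

# Hypothetical stop-word list, just large enough for this title.
stop_words = {"is", "a", "than", "an", "that", "not"}

# Lowercase the text, split it into words, and drop the stop words.
tokens = [word for word in re.findall(r"[a-z]+", title.lower())
          if word not in stop_words]

term_frequencies = Counter(tokens)
print(dict(term_frequencies))
# {'why': 1, 'processing': 2, 'sorted': 2, 'array': 2, 'faster': 1}
```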
22. ML can be easy*
• You already have ML problems!
• You can start applying ML methods now with Python & scikit-learn
• Theoretical knowledge of ML not needed (initially)*
scikit-learn.org
github.com/scikit-learn
23. Helping companies use their marketing data to delight customers
Opportunities
• Backend Engineers
• Data Scientists
• Full-Stack Engineers
Tools
• Java
• Hadoop (Map/Reduce)
• Ruby
Build and work with large distributed systems that process massive data sets.
Check out: liveramp.com/careers