Machine Learning in easy pieces

Machine Learning
In easy pieces
Sakshi Ganeriwal

Behavioral Analytics
©2018 PayPal Inc. Confidential and proprietary.
I am planning to buy
Hmm! It takes a lot of time
Why do I get weird suggestions
Got it! Let me place the order
Transaction Failed

Narrow AI
A computer system that uses
learning or other methods to solve
a particular problem.
• Specific
• Learning mechanisms
• Not Extensive
General AI
A computer system that operates
like a human brain. Solves new
problems on the spot.
• Learn Context
• Consciousness
• Adaptive
Personal Assistants

ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DEEP MACHINE LEARNING
NATURAL LANGUAGE PROCESSING
SPEECH RECOGNITION
EXPERT SYSTEMS

Over the years
1950 - TURING TEST: a test to judge whether machines exhibit human intelligence
1956 - DARTMOUTH CONFERENCE: first Artificial Intelligence conference
1997 - DEEP BLUE: IBM's computer beat the world chess champion after losing the match the year before
2004 - SPIRIT AND OPPORTUNITY: NASA's robotic exploration rovers autonomously navigate the surface of Mars
2011 - WATSON WINS JEOPARDY: Watson beats two previous winners of the contest
2012 - LET'S FIND CATS: Google Brain's neural network was able to identify cats with about 75% accuracy after being fed 10 million YouTube videos
2016 - LITERATURE: screenplay Sunspring, Daddy's Car pop song, Japanese AI novel and dark poems

Why is AI & ML famous now?
3 PILLARS:
COMPUTATION
DATA
STATISTICAL MODELS

What is Machine Learning?
Humans: learn from experience
Computers: follow instructions
Machine learning: computers learn from experience (data)
Source: https://www.youtube.com/watch?v=IpGxLWOIZy4

How does machine learning work?
BREAK DOWN THE PROCESS INTO THREE COMPONENTS:
INPUT (AKA DATA SET)
ALGORITHM (AKA MODELS)
OUTPUT (AKA TARGET LABELS / GROUPED OUTPUT)

Inputs: the data that powers ML
FROM SOURCE CODE TO STATISTICS, DATA SETS CAN CONTAIN JUST ABOUT ANYTHING
GSA / data - Assorted data from the General Services Administration.
GoogleTrends / data - An index of all open-source data
nationalparkservice / data - An unofficial repository of National Park Service data.
fivethirtyeight / data - Data and code behind the stories and interactives at FiveThirtyEight
beamandrew / medical-data
src-d / awesome-machine-learning-on-source-code - Interesting links & research papers related to Machine Learning applied to source code
ImageNet - large visual database designed for use in visual object recognition software research

Algorithms: how data is processed and analyzed
CATEGORIES:
SUPERVISED LEARNING
UNSUPERVISED LEARNING
REINFORCEMENT LEARNING

Supervised Learning
A target output is aimed for, and the system learns from the example data provided.
We can give examples, but we cannot give an algorithm to get from input to output.

Unsupervised Learning
No particular end goal. Complex structured data is provided to the system, which extracts insights from it.
We have some data, but we have no idea where to start looking for useful/interesting stuff.

Reinforcement Learning
Feedback is provided on the actions taken by the system, which uses it to learn further.
We have no idea how to do something, but we can say whether it has been done right or wrong.

Price of House
[Figure: price (in units of 10,00,000) plotted against size of the house (1000 ft squared). Two houses have known prices, 70 lakhs and 1.6 crore; the price of a third house is marked "?".]
What is the best estimate for the
price of the house?

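Linear Regression
[Figure: the house price plot again, with the points labelled 70 lakhs, 1.6 crore and "?", and the value 11 marked.]

Linear Regression
ERROR: sum of the errors of the individual points (drawn graphically on the slide)
Gradient Descent (used to reduce the error)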
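
To make the "error + gradient descent" idea concrete, here is a minimal sketch of fitting a straight line. It assumes the error is the mean squared error, which is a common choice but is not stated on the slide, and the data points, the function name fit_line and the learning rate are invented for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Signed error of the current line at every data point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step downhill, against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1 (not the prices from the slide).
sizes = [1, 2, 3, 4]
prices = [3, 5, 7, 9]
slope, intercept = fit_line(sizes, prices)
estimate = slope * 2.5 + intercept  # price estimate for an unseen size
```

With real data the points will not lie on one line, and the learning rate usually needs tuning to the scale of the inputs.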

Detecting Spam e-mails
100 emails: 25 spam, 75 non-spam
Emails containing the word "Cheap": 20 spam, 5 non-spam

If an email contains the word "cheap", what is the probability of it being spam?
20 of the 25 emails containing "cheap" are spam: 80% spam, 20% non-spam

Naive Bayes Algorithm
"Cheap": 80%
Spelling Mistake: 70%
Missing title: 95%
etc.
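
The 80% figure can be reproduced with Bayes' rule, as in the sketch below. The counts for "cheap" come from the slide; the counts for the second feature (a spelling mistake) are invented, because the slide gives only percentages, and the function name posterior_spam is hypothetical. The "naive" part is the assumption that features are independent once the class is known.

```python
from fractions import Fraction


def posterior_spam(prior_spam, likelihoods_spam, likelihoods_ham):
    """P(spam | features), treating features as independent given the class."""
    spam_score = prior_spam
    for p in likelihoods_spam:
        spam_score *= p
    ham_score = 1 - prior_spam
    for p in likelihoods_ham:
        ham_score *= p
    return spam_score / (spam_score + ham_score)


# From the slide: 100 emails, 25 spam, 75 non-spam; "cheap" in 20 spam and 5 non-spam.
prior = Fraction(25, 100)
cheap_given_spam = Fraction(20, 25)
cheap_given_ham = Fraction(5, 75)
p_cheap = posterior_spam(prior, [cheap_given_spam], [cheap_given_ham])  # 4/5, i.e. 80%

# Hypothetical second feature: spelling mistakes in 15 spam and 10 non-spam emails.
typo_given_spam = Fraction(15, 25)
typo_given_ham = Fraction(10, 75)
p_both = posterior_spam(
    prior,
    [cheap_given_spam, typo_given_spam],
    [cheap_given_ham, typo_given_ham],
)
```

Exact fractions are used only to keep the arithmetic easy to check by hand; floats would work equally well.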

Acceptance at a University
Inputs: Test, Grades
Student 1 - Test: 9/10, Grades: 8/10
Student 2 - Test: 3/10, Grades: 4/10
Student 3 - Test: 7/10, Grades: 6/10

Acceptance at a University
[Figure: students plotted with Test (1-10) on the horizontal axis and Grades (1-10) on the vertical axis.]
Student 3 - Test: 7/10, Grades: 6/10
Quiz: Does the student get accepted? Yes / No

Logistic Regression
[Figure: the same Test vs. Grades plot, with the same quiz about Student 3.]

Logistic Regression
ERROR: 2
Gradient Descent
Log-loss function
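
A minimal sketch of logistic regression trained by gradient descent on the log-loss follows. The training set is hypothetical: Students 1 and 2 use the scores from the slides, but their accepted/rejected labels and all the other rows are assumptions made for illustration, as are the function names and the learning rate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, test, grades):
    """Probability of acceptance; scores out of 10 are scaled to 0..1."""
    w_test, w_grades, bias = weights
    return sigmoid(w_test * test / 10 + w_grades * grades / 10 + bias)


def train(points, labels, learning_rate=1.0, steps=10000):
    """Minimize the mean log-loss by batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    n = len(points)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (test, grades), label in zip(points, labels):
            # Derivative of the log-loss with respect to the pre-sigmoid score.
            error = predict(weights, test, grades) - label
            grad[0] += error * test / 10 / n
            grad[1] += error * grades / 10 / n
            grad[2] += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical training data: (test, grades) pairs, 1 = accepted, 0 = rejected.
points = [(9, 8), (8, 5), (6, 7), (7, 9), (3, 4), (2, 5), (4, 2), (5, 3)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights = train(points, labels)
student_3 = predict(weights, 7, 6)  # probability for Test 7/10, Grades 6/10
```

On this made-up data the model puts Student 3 on the accepted side of the line; a different training set could give a different answer.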

[Figure series: Grades (0-10) plotted against Test (0-10). One panel places Student 3 (Test: 9/10, Grades: 6/10), another places Student 4 (Test: 9/10, Grades: 1/10); the final panel is labelled Gradient Descent.]

Neural Network

Logistic Regression & Neural Networks

Convolutional NN
• Fixed-size inputs and outputs.
• Feed-forward artificial neural network.
• Uses a connectivity pattern between neurons.
• Learns to recognize patterns across space, e.g. when studying images.
• Breaks a component into subcomponents.

Recurrent NN
• Handles arbitrary input/output lengths.
• Uses internal memory to process arbitrary sequences of inputs.
• Uses time-series information, i.e. what I spoke last will impact what I will speak next.
• Ideal for text and speech analysis.
• Creates combinations of subcomponents (image captioning, text generation, language translation, etc.)
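
The contrast can be sketched with two toy functions, a deliberately simplified illustration and not a full network. conv1d slides one small shared kernel over its input (the local connectivity pattern), and run_recurrent carries a single state value through a sequence of any length (the internal memory). Both function names and the weight values are assumptions.

```python
import math


def conv1d(signal, kernel):
    """Slide one shared kernel over the signal; each output sees only a local window."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def run_recurrent(sequence, w_input=0.5, w_state=0.8):
    """Fold a sequence of any length into one state value, step by step."""
    state = 0.0
    for x in sequence:
        # The new state depends on the current input and on the previous state.
        state = math.tanh(w_input * x + w_state * state)
    return state


edges = conv1d([1, 2, 3, 4], [1, 0, -1])  # [-2, -2]
short = run_recurrent([1, 0])
longer = run_recurrent([0, 1, 0, 1, 1])   # a different length works unchanged
```

Because the recurrent state depends on order, run_recurrent([1, 0]) and run_recurrent([0, 1]) give different results, which is the "what I spoke last will impact what I will speak next" behaviour.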

Did anyone order pizza?

K-means clustering
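
A minimal k-means sketch, reading the pizza slide as "where should k pizza parlors go so that each house is close to one?" (that reading is an assumption, as are the coordinates). For reproducibility it starts from the first k points instead of a random choice.

```python
def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    centroids = list(points[:k])  # simplistic deterministic start, an assumption
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical house locations forming two obvious groups.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
parlors, assignment = kmeans(houses, 2)  # assignment == [0, 0, 0, 1, 1, 1]
```

The number of clusters k has to be chosen in advance, and a poor starting position can lead to a worse grouping.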

Did anyone order pizza?
STOP
Too Big

Hierarchical clustering
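
A minimal sketch of agglomerative (bottom-up) hierarchical clustering: keep merging the two closest groups and stop once the closest pair is too far apart. Reading the "STOP / Too Big" slide as that stopping rule is an assumption, as are the single-linkage distance, the threshold and the sample points.

```python
import math


def hierarchical_clusters(points, max_distance):
    """Merge the two closest clusters until the closest pair exceeds max_distance."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # the next merge would be "too big", so stop here
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical locations: two tight pairs and one isolated point.
homes = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
groups = hierarchical_clusters(homes, max_distance=2)
```

Unlike k-means, the number of clusters is not fixed up front; it falls out of the distance threshold.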

Supervised learning VS. Unsupervised learning
Source: Quora

Supervised learning PLUS Unsupervised learning
• Unsupervised learning as feature engineering
  E.g.: clustering + KNN, Matrix Factorization
• One of the "tricks" in Deep Learning is how it combines unsupervised/supervised learning
  E.g.: Stacked Autoencoders, training of CNN
Source: Quora

OUTPUT
A FEW APPROACHES TO FINDING OUTPUTS INCLUDE:
CLASSIFICATION: GENERATE AN OUTPUT VALUE (A CLASS LABEL) FOR EACH ITEM IN A DATA SET
REGRESSION: GIVEN THE DATA, PREDICT THE MOST LIKELY VALUE FOR THE VARIABLE UNDER CONSIDERATION
CLUSTERING: GROUP THE DATA INTO SIMILAR PATTERNS

Machine Learning Infrastructure
Source: Quora

ML infrastructure: Experimentation & Production
• Option 1: Favor experimentation and only invest in productionizing once something shows results. E.g. have ML researchers use R and then ask engineers to implement things in production when they work.
• Option 2: Favor production and have "researchers" struggle to figure out how to run experiments. E.g. implement highly optimized code and have ML researchers experiment only through data available in logs/DB.
Source: Quora

The two faces of your ML infrastructure
Optimal solution:
• Have ML "researchers" experiment on IPython Notebooks using Python tools (scikit-learn, Theano…). Use the same tools in production whenever possible; implement optimized versions only when needed.
• Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools.
Source: Quora

The untold story of
Data Science and ML engineering
Is ML at a point at which you don't have to be a data scientist to take advantage of it?
There are good tools to get started, BUT
for state-of-the-art performance, one needs rigorous quantitative understanding
Source: Quora

The data-driven ML innovation funnel
Data Research: data research & hypothesis building -> Data Science
AB Testing: online experimentation, AB Testing analysis -> Data Science
ML Exploration - Product Design: ML solution building & implementation -> ML Engineering
Source: Quora

Coursera Deep Learning Specialization by deeplearning.ai
Luis Serrano YouTube video lecture [Udacity course]
Demystifying Deep Learning & AI
Data Skeptic Podcast
Talking Machines Podcast
ML Hackerearth
Kaggle Tutorial
https://github.com/collections/machine-learning

scikit-learn / scikit-learn - machine learning in Python
tensorflow / tensorflow - Computation using data flow graphs for scalable machine learning
Theano / Theano - Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
davisking / dlib - A toolkit for making real world machine learning and data analysis applications in C++
apache / predictionio - a machine learning server for developers and ML engineers. Built on Apache Spark, HBase and Spray.
Machine Learning Models&Algorithms | Amazon SageMaker on AWS - Build, train, and deploy machine learning models & algorithms at scale
KNIME - open source data analytics, reporting and integration platform.
MLlib | Apache Spark
Word2vec - group of related models that are used to produce word embeddings
GloVe - Unsupervised learning algorithm for obtaining vector representations for words
shogun-toolbox / shogun

Supervised Learning
Most industrialized.
igrigorik / decisiontree - ID3-based implementation of the ML Decision Tree algorithm

Unsupervised Learning
Popular approach in natural language processing (NLP).
keon / awesome-nlp - A curated list of resources dedicated to NLP

Reinforcement Learning
Used to develop self-driving cars or teach a robot how to manufacture an item.
openai / gym - A toolkit for developing and comparing reinforcement learning algorithms.
aikorea / awesome-rl - Reinforcement learning resources curated

EXAMPLES IN PRACTICE:
umutisik / Eigentechno - Principal Component Analysis on music loops
jpmckinney / tf-idf-similarity - Ruby gem to calculate the similarity between texts using tf*idf
scikit-learn-contrib / lightning - Large-scale linear classification, regression and ranking in Python
gwding / draw_convnet

Certificate in Statistics and Computational Data Science
Certificate programs in R-Programming and Statistics.
Certificate Programs in Data Science (Microsoft Professional Program)
Certificate Programs in Machine Learning
CSCI E-81 Machine Learning and Data Mining (Harvard University)
Certificate in Machine Learning (University of Washington)

Artificial Intelligence Certifications

Artificial Intelligence Graduate Certificate (Stanford University)
Machine Learning at Columbia University (free Content, certification option)
Machine Learning at Georgia Tech (Free Content, certification option)
IBM Watson Certifications
Microsoft Machine Learning & AI Certification
PG Diploma in Machine Learning and AI – Upgrad and IIIT-B
Certified Artificial Intelligence Professional (Govt. of India with V Skills)
NVIDIA Deep Learning Programs

Getting started
Machine Learning
josephmisiti / awesome-machine-learning - A curated list of awesome Machine Learning frameworks, libraries and software.
ujjwalkarn / Machine-Learning-Tutorials - machine learning and deep learning tutorials, articles and other resources
Deep Learning
ChristosChristofidis / awesome-deep-learning - A curated list of awesome Deep Learning tutorials, projects and communities.
fastai / courses - fast.ai Courses
TensorFlow
jtoy / awesome-tensorflow - TensorFlow - A curated list of dedicated resources http://tensorflow.org
nlintz / TensorFlow-Tutorials - Simple tutorials using Google's TensorFlow Framework
pkmital / tensorflow_tutorials - From the basics to slightly more interesting applications of Tensorflow

Reading Resources
Keras Document
Learn about image classification and neural networks
Visualizing and Understanding CNNs
AlexNet
VGGNet
GoogLeNet
Text Classification with Keras

IF YOU ONLY REMEMBER
ONE THING FROM THIS TALK

JUST BUILD
SOMETHING WITH DATA
AND EXPECT TO GET STUCK AT IT FOR A WHILE

THE
(SOMEWHAT)
UNFORTUNATE
TRUTH

Math & Machine Learning
• Linear Algebra
• Calculus
• Statistics
• Probability
• MIT Linear Algebra Open Course
• MIT Calculus Open Course
• MIT Stats and Probability Course

Thank You!!
sganeriwal@paypal.com
saganeriwal@gmail.com
