Continuous Online
Learners
Anuj Gupta
Saurabh Arora
(Freshdesk)
Agenda
1. Problem v 1.0
2. Solution
3. Issues
a. Drift
b. Evolving Vocab
c. Feedback loop
4. Problem v 2.0
5. Our Solution
a. Global
b. Local
c. glocal
d. Drift Detection
6. Local – pros and cons
7. Way Forward
8. Conclusion/takeaway
Problem Statement – v 1.0
• Build a spam filter for twitter
• Use case: In customer service, we listen to Twitter on behalf of brands and figure out what it is that
brands can respond to.
• Examples:
To filter spam from the actionable in real-time twitter stream of brands.
Twitter is noisy
There is ~65-70% noise in consumer-to-business communication
[and 100% noise in business-to-consumer ].
The percentage of noise is even higher if you are a big B2C company
Solution
• Model it as a (binary) classification problem.
• Acquire a good-quality dataset.
• Engineer features – there are some very good indicators.
• Select an algorithm.
• Train-test-tune, ~85% accuracy.
• Deploy.
Actionable Spam
Paradise lost
In production the model started out very well; however, as time* went by, we found that the running accuracy of our
model started falling.
*within a couple of weeks of deployment
• Our data was changing and changing fast.
Behind the Scene
Non-stationary distributions
A stationary process is time-independent: the averages remain more or less constant.
A non-stationary process is said to drift: the distribution generating the data changes over time.
• Vocabulary of our dataset was increasing.
o Twitter vocabulary evolves faster than that of any other language, significantly faster.
Behind the Scene
• Not learning from mistakes: In our system, the user (brand agent) has the option to let the system know if
the classification done by the system is wrong.
• The model was not utilizing these signals to improve.
Behind the Scene
In a Nutshell
• Based on the last few slides, degradation (with time) in the prediction accuracy
of our model shouldn’t come as a surprise.
• This is not specific to Twitter data. In general, these problems are likely
to occur in the following domains:
o Monitoring & Anomaly detection (one-class classification) in adversarial setting
o Recommendations (where the user preferences are continuously changing; evolving labels)
o Stock market predictions (concept drift; evolving distributions).
• Build a spam filter for twitter which can:
o Handle drift in data.
o Learn (and improve) using feedbacks.
o Handle fast evolving vocabulary.
Problem Statement – v 2.0
• Build a classifier which can:
o Handle drift in data.
o Learn (and improve) using feedbacks.
o Handle fast evolving vocabulary.
Possible Solutions
• Frequently retrain your model on the updated data and deploy the same.
o Training, testing, fine-tuning – lot of work. Doesn’t scale at all.
o Lose all old learnings
• Continuous Learning : Model adapts to the new incoming data.
What worked for us
Deep Learning Model
Batch trained
Large Corpus
No short term updates
Per-brand model
Fast learner
Instant feedback
Detect drift
Text Representation
• Preprocess the tweets – replace mentions,
hashtags, urls, emojis, dates, numbers,
currency by relevant constants. Remove
stop words.
• How good is your preprocessing?
- Zipf’s Law
• Given a large corpus, if t1, t2, t3, … are the
terms ranked from most to least common in
the corpus and cf_i is the collection
frequency of the i-th most common term,
then cf_i ∝ 1/i
Raw dataset - Zipf’s (mis)fit
Preprocessed dataset - Zipf’s fit
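One way to quantify the fit is to regress log-frequency on log-rank: a slope near -1 suggests the corpus follows cf_i ∝ 1/i. The sketch below is only an illustration under that assumption; the function name `zipf_slope` and the toy corpus are hypothetical and not from the talk.

```python
from collections import Counter
import math

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 means the corpus is close to cf_i proportional to 1/i.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy corpus (hypothetical): the i-th term occurs about 1200 / i times.
toy_tokens = []
for i in range(1, 21):
    toy_tokens.extend([f"term{i}"] * round(1200 / i))

slope = zipf_slope(toy_tokens)  # expected to be close to -1 for this toy corpus
```

Running the same function on the raw and the preprocessed token streams gives a single number to compare instead of eyeballing the two plots.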
Text Representation
• Word embeddings:
o Use Google’s pre-trained word2vec model to replace each word by its corresponding embedding (300
dimensions).
o For a tweet, we average the embedding vectors of its constituent words.
o For missing words, we draw a random number in (-0.25, 0.25) for each of the 300 dimensions. (Yann
LeCun 2014)
o Final representation:
Tweet = 300-dimensional vector of real numbers
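The averaging scheme above can be sketched as follows. This is a minimal illustration, not the production code: the `pretrained` dictionary is a hypothetical stand-in for the word2vec lookup, and caching the random vector of a missing word (so it stays the same across tweets) is an assumption of this sketch.

```python
import numpy as np

DIM = 300
rng = np.random.default_rng(0)

# Hypothetical stand-in for the pre-trained word2vec lookup table.
pretrained = {
    "refund": rng.normal(size=DIM),
    "please": rng.normal(size=DIM),
}
oov_cache = {}  # assumption: a missing word keeps the same random vector

def word_vector(word):
    if word in pretrained:
        return pretrained[word]
    if word not in oov_cache:
        # Missing word: one random draw in (-0.25, 0.25) per dimension.
        oov_cache[word] = rng.uniform(-0.25, 0.25, size=DIM)
    return oov_cache[word]

def tweet_vector(tokens):
    """Average the word vectors of a preprocessed, tokenized tweet."""
    if not tokens:
        return np.zeros(DIM)
    return np.mean([word_vector(t) for t in tokens], axis=0)

vec = tweet_vector(["refund", "plzzz", "please"])  # "plzzz" is out of vocabulary
```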
● DeepNet
○ CNN
○ Trained over a corpus of ~8 million tweets
○ Off-the-shelf architecture gave us ~86% CV accuracy.
Global model
Local
• Goals
o Strictly improves with every feedback.
o Higher retention of older concepts
• Desired properties
o Online learner
o Fast learner; aggressive model update
■ Incorporates (every last) feedback successfully
(After a model update, if the same data point is presented, it must correctly predict its class label.)
o Doesn’t forget the last N data points
(After a model update, if the last N data points are presented, it must predict their class labels with higher accuracy.)
Building feedback loop
ML model
<Tweet, Yp>
<Tweet, Y>
If Y ≠ Yp
Possible Approaches
• Reinforcement Learning
o Reward/punish if the prediction is right/wrong.
o For a binary classification problem, the underlying MDP is too small (2 states). Doesn’t learn much.
• Mini-batches
o Works fine if the velocity of feedback data is high (don’t have to wait long to accumulate a mini-batch of feedbacks).
o Many applications don’t have high velocity.
• Instant feedback, tiny batches
o Just 1 data point can skew the model.
Building feedback loop
• We model a feedback point <Tweet, Y> as a datapoint presented to local model
in online setting.
• Thus, a bunch of feedback points = an incoming data stream
• Thus, we use an online learner.
• Online method in ML:
Data is modeled as stream.
Model makes a prediction (y’), when presented with data point (x).
Environment reveals the correct class label (y)
If y ≠ y’, update the model.
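The protocol above fits in a few lines. In this sketch a plain perceptron is used only as a stand-in learner (it is not the model from the talk), and the class name, `run_online` and the toy stream are all hypothetical.

```python
import numpy as np

class Perceptron:
    """Stand-in online learner; any model with predict/update would do."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def update(self, x, y):
        self.w += y * x

def run_online(model, stream):
    """Predict, observe the true label, and update only on mistakes."""
    mistakes = 0
    for x, y in stream:
        y_pred = model.predict(x)   # model predicts y'
        if y_pred != y:             # environment reveals the correct label y
            model.update(x, y)      # mistake: feed it back into the model
            mistakes += 1
    return mistakes

# Toy stream of (feature vector, label) pairs with labels in {+1, -1}.
stream = [
    (np.array([1.0, 0.0]), 1),
    (np.array([0.0, 1.0]), -1),
    (np.array([1.0, 0.2]), 1),
    (np.array([0.1, 1.0]), -1),
]
model = Perceptron(dim=2)
mistakes = run_online(model, stream)
```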
Online Algorithms
http://scikit-learn.org/stable/auto_examples/linear_model/plot_sgd_comparison.html
You can try various on-line classifiers on
your dataset. We chose Crammer’s PA-II
as our local model.
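For reference, the PA-II update from Crammer et al. uses the hinge loss and the step size tau = loss / (||x||² + 1/(2C)). The sketch below is a bare NumPy version of that rule; the class name `PA2`, the value of C and the example point are illustrative assumptions, not the tuned settings from the talk.

```python
import numpy as np

class PA2:
    """Passive-Aggressive II binary classifier (labels are +1 / -1)."""
    def __init__(self, dim, C=1.0):
        self.w = np.zeros(dim)
        self.C = C  # aggressiveness: larger C allows bigger steps

    def predict(self, x):
        return 1 if self.w @ x >= 0 else -1

    def update(self, x, y):
        loss = max(0.0, 1.0 - y * (self.w @ x))      # hinge loss on this point
        if loss == 0.0:
            return                                    # passive: margin already >= 1
        tau = loss / (x @ x + 1.0 / (2.0 * self.C))   # PA-II step size
        self.w += tau * y * x                         # aggressive: move toward the point

model = PA2(dim=2, C=100.0)
x, y = np.array([1.0, 2.0]), -1      # hypothetical feedback point
model.update(x, y)
margin_after = y * (model.w @ x)     # slightly below 1 because C is finite
```

With a large C the step almost reaches unit margin on the feedback point; a smaller C makes each update more conservative, which trades instant correction for stability.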
• Dataset – 160K tweets from 2015, time sequenced
• Feedback incorporation improves accuracy:
o Trained the model (offline batch mode) on the first 100K data points.
o On the test set (last 60K data points) it gave 74% accuracy (offline batch mode).
o Then ran the model on the test data (60K data points) in online fashion:
■ The model made a total of 9028 mistakes.
■ These mistakes were instantaneously fed into the local model as feedback.
■ This gives an accuracy of ~85% across the test set (9028 mistakes out of 60K ≈ 15% error).
o We gained ~11 percentage points of accuracy by incorporating feedback.
Results of Local :
PA-II parameter tuning
Improving accuracy
It’s no fluke
We tested the local model by feeding it wrong feedback:
glocal : Ensembling global and local
• We use online stacking to ensemble our continuously adapting local model and
our erudite DeepNet (global) model.
• Outputs of the global and local models go to an OnlineSVM.
• We train the ensemble in batch offline mode but continue to train it further on
feedback points in an online fashion.
• We get a CV accuracy of 82%
Global
Local
Online SVM
glocal
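A rough sketch of the stacking step: the two base-model outputs become the features of a small linear model that keeps learning from feedback. The talk does not specify the OnlineSVM implementation, so the SGD-on-hinge-loss learner below is an assumed stand-in, and the class name, `stack_features` and the score triples are hypothetical.

```python
import numpy as np

class OnlineLinearSVM:
    """Linear SVM trained by SGD on the L2-regularized hinge loss (stand-in meta-learner)."""
    def __init__(self, dim, lr=0.1, reg=0.01):
        self.w = np.zeros(dim)
        self.lr, self.reg = lr, reg

    def decision(self, z):
        return self.w @ z

    def partial_fit(self, z, y):
        self.w *= (1.0 - self.lr * self.reg)   # shrink weights (regularizer)
        if y * self.decision(z) < 1.0:         # inside the margin: hinge gradient step
            self.w += self.lr * y * z

def stack_features(global_score, local_score):
    # Outputs of both base models plus a constant bias term.
    return np.array([global_score, local_score, 1.0])

meta = OnlineLinearSVM(dim=3)
# Hypothetical (global score, local score, true label) triples; here the local
# model agrees with the label more often than the global one.
feedback = [(0.9, 0.8, 1), (-0.7, -0.9, -1), (0.6, -0.8, -1), (-0.5, 0.9, 1)] * 50
for g, l, y in feedback:
    meta.partial_fit(stack_features(g, l), y)

# Global says "actionable", local says "spam": the ensemble sides with local.
pred = 1 if meta.decision(stack_features(0.6, -0.8)) >= 0 else -1
```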
● Handle Drift
○ Periodically replace the model.
■ Shooting in the dark, especially when drifts are few and far between
○ Find if a drift has indeed occurred or not
■ If it has, adapt to the changes.
■ 3 main algorithms:
● DDM (Gama et al., 2004)
● EDDM
● DDD
■ What about the old model - it knows the old concept, so keep it if the old distribution
lingers.
Last but not the Least
Handle Drift
We borrow the Drift Detection Method (Gama et al., 2004)
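DDM tracks the running error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers the point where p + s was lowest, and raises a warning at p_min + 2·s_min and a drift at p_min + 3·s_min. The sketch below follows that description; the 30-sample warm-up is the commonly used default, and the class layout and the synthetic mistake stream are assumptions for illustration.

```python
import math

class DDM:
    """Drift Detection Method: monitors the running error rate of a classifier."""
    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warn_level, self.drift_level = warn_level, drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, mistake):
        """Feed True if the prediction was wrong. Returns 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                   # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)      # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:        # best operating point so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                           # start tracking the new concept
            return "drift"
        if p + s > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"

# Synthetic stream: 10% error rate for 200 points, then 50% after a drift.
detector = DDM()
statuses = []
for i in range(1, 401):
    mistake = (i % 10 == 0) if i <= 200 else (i % 2 == 0)
    statuses.append(detector.update(mistake))

first_drift = statuses.index("drift") + 1   # 1-based position of the first alarm
```

A warning is a natural point to start buffering recent data; a drift signal is where the local model can be re-initialized or retrained on that buffer.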
Pros
• Improves running accuracy
• Personalization : The notion of spam varies from brand to brand. Some
brands treat ‘Hi’, ‘Hello’ as spam while some treat them as actionable.
The local model serves well as a per-user statistical model, thus bringing in user
personalization. By learning from feedback, the model adapts to the
notions of the brand.
• It’s lightweight and fast, thus easy to bootstrap, deploy and scale.
Cons
• The PA-II decision boundary is a hyperplane that divides the feature space into 2 half-spaces.
• The margin of a data point is the distance between the data point and the hyperplane.
• An update on the model results in a new hyperplane that remains as close as
possible to the current one while achieving at least a unit margin on the most
recent data point.
• Thus, incorporating a feedback is nothing but shifting the hyperplane to a unit
margin on the feedback point.
• Let’s see this visually.
Cons
• This shifting of the hyperplane increases the model’s accuracy on one class (the correct
label of the feedback point) while decreasing the model’s accuracy on the other class.
• To verify the above, split the test set into 2 chunks by class and run the
local model on only 1 chunk. If the above hypothesis is true then:
• #feedbacks should be very small and occur only in the initial part of the data set
• The running accuracy should only increase.
• Changing the algorithm doesn’t help much – all online learning classifiers in current literature are linear
Way Forward
• Instead of modeling the problem as classification, model it as ranking
(Gmail’s priority inbox does this).
• Actionable tweets are high in ranking, spam tweets are low in ranking.
• Actionable vs Spam = finding a cut-off in the ranking.
• Incorporating feedback = updating the algorithm to get a better ranking
without getting biased towards one class.
• This is a work in progress.
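To make the cut-off idea concrete: given ranking scores and a few labeled tweets, one simple option is to pick the score threshold that maximizes accuracy. This is only a sketch of that one option, since the talk leaves the method open; `best_cutoff` and the scored examples are hypothetical.

```python
def best_cutoff(scored):
    """scored: list of (score, is_actionable). Returns (cutoff, accuracy).

    Tweets with score >= cutoff are treated as actionable, the rest as spam.
    Assumes distinct scores; ties would need to be grouped first.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    # Start with the cut-off above everything: all tweets predicted spam.
    correct = sum(1 for _, actionable in ranked if not actionable)
    best_acc, best_cut = correct / total, float("inf")
    for score, actionable in ranked:
        # Lowering the cut-off to this score flips this tweet to "actionable".
        correct += 1 if actionable else -1
        if correct / total > best_acc:
            best_acc, best_cut = correct / total, score
    return best_cut, best_acc

# Hypothetical ranking scores with ground-truth labels.
scored = [(0.9, True), (0.8, True), (0.6, False),
          (0.55, True), (0.3, False), (0.1, False)]
cutoff, accuracy = best_cutoff(scored)
```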
Take Home
• Incorporating feedback is an important step in improving your model’s
performance.
• Global + Local is a great way to introduce personalization in ML.
• PA-II does well as local provided your data is such that most data points are far
from the decision hyperplane.
• For domains where distributions are continuously evolving, handling drift is
a must.
References
1. “Online Passive-Aggressive Algorithms” - Crammer et al., JMLR 2006
2. “The learning behind gmail priority inbox” – Aberdeen et al., LCCC: NIPS Workshop 2010
3. “Learning with drift detection” – Gama et al., BSAI 2004
4. "Early drift detection method" - Baena-García et al., IWKDSD 2006
5. "DDD: A new ensemble approach for dealing with concept drift" - Minku et al., IEEE Transactions, 2012
6. "Adaptive regularization of weight vectors" - Crammer et al., NIPS 2009
7. Soft Confidence Weighted algorithms - Wang et al., 2012
8. LIBOL - A Library for Online Learning Algorithms. https://github.com/LIBOL/LIBOL
Thank You
Please feel free to reach out post this talk or on the interwebs.
@anujgupta82, @tanish2k
Anuj Gupta Saurabh Arora

More Related Content

What's hot

Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningBigDataCloud
 
Generating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaGenerating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaAndre Pemmelaar
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)Thomas da Silva Paula
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Turi, Inc.
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017Shuai Zhang
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskSaurabh Saxena
 
Introduction of Machine learning and Deep Learning
Introduction of Machine learning and Deep LearningIntroduction of Machine learning and Deep Learning
Introduction of Machine learning and Deep LearningMadhu Sanjeevi (Mady)
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP Textkernel
 
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorRoelof Pieters
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakPyData
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in PythonImry Kissos
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesTuri, Inc.
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingSangwoo Mo
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...Balázs Hidasi
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksJonathan Mugan
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep LearningAsim Jalis
 
Memory Networks, Neural Turing Machines, and Question Answering
Memory Networks, Neural Turing Machines, and Question AnsweringMemory Networks, Neural Turing Machines, and Question Answering
Memory Networks, Neural Turing Machines, and Question AnsweringAkram El-Korashy
 

What's hot (20)

Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
 
Generating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaGenerating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in julia
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Introduction of Machine learning and Deep Learning
Introduction of Machine learning and Deep LearningIntroduction of Machine learning and Deep Learning
Introduction of Machine learning and Deep Learning
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP
 
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
 
Memory Networks, Neural Turing Machines, and Question Answering
Memory Networks, Neural Turing Machines, and Question AnsweringMemory Networks, Neural Turing Machines, and Question Answering
Memory Networks, Neural Turing Machines, and Question Answering
 

Viewers also liked

Continuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal PaperContinuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal Papertjb910
 
Continuous Learning
Continuous LearningContinuous Learning
Continuous LearningDashlane
 
Creating competitive advantage
Creating competitive advantageCreating competitive advantage
Creating competitive advantageShanskrite Eshita
 
25 Biggest Company and Product Failures
25 Biggest Company and Product Failures25 Biggest Company and Product Failures
25 Biggest Company and Product FailuresJesse Daniel
 
Lean Analytics Cycle
Lean Analytics CycleLean Analytics Cycle
Lean Analytics CycleHiten Shah
 

Viewers also liked (7)

Continuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal PaperContinuous Learning Algorithms - a Research Proposal Paper
Continuous Learning Algorithms - a Research Proposal Paper
 
BCG Creating People Advantage (2008)
BCG Creating People Advantage (2008)BCG Creating People Advantage (2008)
BCG Creating People Advantage (2008)
 
Continuous Learning
Continuous LearningContinuous Learning
Continuous Learning
 
Creating competitive advantage
Creating competitive advantageCreating competitive advantage
Creating competitive advantage
 
25 Biggest Company and Product Failures
25 Biggest Company and Product Failures25 Biggest Company and Product Failures
25 Biggest Company and Product Failures
 
Lean Analytics Cycle
Lean Analytics CycleLean Analytics Cycle
Lean Analytics Cycle
 
Big Brand Failures
Big Brand FailuresBig Brand Failures
Big Brand Failures
 

Similar to Continuous Online Learners

ODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systemsODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systemsAnuj Gupta
 
Continuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakesContinuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakesAnuj Gupta
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyAlon Bochman, CFA
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Alok Singh
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemPierre Gutierrez
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...Edge AI and Vision Alliance
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilitiesAllan D. Butler
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
 
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLBuilding Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLsparktc
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLJen Aman
 
Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Venturesmicrosoftventures
 

Similar to Continuous Online Learners (20)

ODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systemsODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systems
 
Continuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakesContinuous Learning Systems: Building ML systems that learn from their mistakes
Continuous Learning Systems: Building ML systems that learn from their mistakes
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilities
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLBuilding Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemML
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Ventures
 

More from Anuj Gupta

Sarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysisSarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysisAnuj Gupta
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLPAnuj Gupta
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer ConnectAnuj Gupta
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLPAnuj Gupta
 
Representation Learning for NLP
Representation Learning for NLPRepresentation Learning for NLP
Representation Learning for NLPAnuj Gupta
 

More from Anuj Gupta (8)

Sarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysisSarcasm Detection: Achilles Heel of sentiment analysis
Sarcasm Detection: Achilles Heel of sentiment analysis
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLP
 
Talk from NVidia Developer Connect
Talk from NVidia Developer ConnectTalk from NVidia Developer Connect
Talk from NVidia Developer Connect
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLP
 
DLBLR talk
DLBLR talkDLBLR talk
DLBLR talk
 
Representation Learning for NLP
Representation Learning for NLPRepresentation Learning for NLP
Representation Learning for NLP
 

Continuous Online Learners

  • 2. Agenda 1. Problem v 1.0 2. Solution 3. Issues a. Drift b. Evolving Vocab c. Feedback loop 4. Problem v 2.0 5. Our Solution a. Global b. Local c. glocal d. Drift Detection 6. Local – pros and cons 7. Way Forward 8. Conclusion/takeaway
  • 3. Problem Statement – v 1.0 • Build a spam filter for Twitter. • Use case: in customer service, we listen to Twitter on behalf of brands and figure out what it is that the brands can respond to. • Examples: To filter spam from the actionable in the real-time Twitter stream of brands.
  • 4. Twitter is noisy There is ~65-70% noise in consumer-to-business communication [and 100% noise in business-to-consumer]. The percentage of noise is even higher if you are a big B2C company.
  • 5. Solution • Model it as (binary) classification problem. • Acquire good quality dataset. • Engineer features – there are some very good indicators. • Select an algorithm. • Train-test-tune, ~85% accuracy. • Deploy. Actionable Spam
  • 6. Paradise lost In production the model started very well; however, as time* went by, we found that the running accuracy of our model started falling. *within a couple of weeks of deployment
  • 7. Behind the Scene • Our data was changing, and changing fast. Non-stationary distributions: a stationary process is time-independent ⇒ the averages remain more or less constant. When the distribution generating the data changes over time, this is called drift.
  • 8. Behind the Scene • The vocabulary of our dataset was increasing. o Twitter vocabulary evolves significantly faster than that of any other language.
  • 9.
  • 10. Behind the Scene • Not learning from mistakes: in our system, the user (brand agent) has the option to let the system know if the classification done by the system is wrong. • The model was not utilizing these signals to improve.
  • 11. In a Nutshell • Based on the last few slides, degradation (with time) in the prediction accuracy of our model shouldn’t come as a surprise. • This is not specific to Twitter data. In general, these problems are likely to occur in the following domains: o Monitoring & anomaly detection (one-class classification) in an adversarial setting o Recommendations (where the user preferences are continuously changing; evolving labels) o Stock market predictions (concept drift; evolving distributions).
  • 12. Problem Statement – v 2.0 • Build a spam filter for Twitter (more generally, a classifier) which can: o Handle drift in data. o Learn (and improve) using feedback. o Handle a fast-evolving vocabulary.
  • 13. Possible Solutions • Frequently retrain your model on the updated data and deploy it. o Training, testing, fine-tuning – a lot of work. Doesn’t scale at all. o Lose all old learnings. • Continuous learning: the model adapts to the new incoming data.
  • 14. What worked for us • Deep learning model: batch trained, large corpus, no short-term updates. • Per-brand model: fast learner, instant feedback. • Detect drift.
  • 15. Text Representation • Preprocess the tweets – replace mentions, hashtags, URLs, emojis, dates, numbers and currency by relevant constants. Remove stop words. • How good is your preprocessing? – Zipf’s law • Given a large corpus, if t_1, t_2, t_3, ... are the most common terms (in decreasing order of frequency) and cf_i is the collection frequency of the i-th most common term, then cf_i ∝ 1/i.
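One way to check the Zipf fit mentioned on this slide is to fit a straight line to log(frequency) against log(rank); a slope close to -1 indicates cf_i ∝ 1/i. The following is a minimal sketch, not the authors' code: the function name `zipf_slope` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Hypothetical helper for illustration; needs at least two distinct terms.
    """
    # Collection frequencies, most common term first (rank 1).
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy corpus (an assumption) whose frequencies follow cf_i = 120 / i exactly.
toy_tokens = []
for rank in (1, 2, 3, 4, 5, 6):
    toy_tokens += [f"term{rank}"] * (120 // rank)

slope = zipf_slope(toy_tokens)
print(f"log-log slope: {slope:.3f}")
```

On real tweets the slope would be computed once on the raw tokens and once on the preprocessed tokens; the closer the second value is to -1, the better the preprocessing under this heuristic.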
  • 16. Raw dataset - Zipf’s (mis)fit
  • 17. Preprocessed dataset - Zipf’s fit
  • 18. Text Representation • Word embeddings: o Use Google’s pre-trained word2vec model to replace a word by its corresponding embedding (300 dimensions). o For a tweet, we average the word embedding vectors of its constituent words. o For missing words, we generate a random number in (-0.25, 0.25) for each of the 300 dimensions. (Yann LeCun 2014) o Final representation: Tweet = 300-dim vector of real numbers
  • 19. Global model ● DeepNet ○ CNN ○ Trained over a corpus of ~8 million tweets ○ An off-the-shelf architecture gave us ~86% CV accuracy.
  • 20. Local • Goals o Strictly improves with every feedback. o Higher retention of older concepts • Desired properties o Online learner o Fast learner; aggressive model update ⇒ incorporates (every last) feedback successfully (after a model update, if the same data point is presented, it must correctly predict its class label). o Doesn’t forget the recent N data points (after a model update, if the last N data points are presented, it must predict their class labels with higher accuracy).
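The averaging scheme on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `embeddings` is a plain dict standing in for the pre-trained word2vec lookup, `tweet_vector` is a hypothetical helper, and caching the random vector of an unseen word (so that it stays the same across tweets) is a design choice the slides do not specify.

```python
import random

DIM = 300  # dimensionality of the pre-trained word2vec vectors


def tweet_vector(tokens, embeddings, oov_cache, rng):
    """Average the word vectors of a tokenised tweet.

    embeddings: dict word -> list of DIM floats (stand-in for word2vec).
    oov_cache:  dict remembering the random vector drawn for each unseen word
                (assumption: one fixed random vector per missing word).
    """
    vectors = []
    for tok in tokens:
        if tok in embeddings:
            vectors.append(embeddings[tok])
        else:
            if tok not in oov_cache:
                # Missing word: uniform random values in (-0.25, 0.25).
                oov_cache[tok] = [rng.uniform(-0.25, 0.25) for _ in range(DIM)]
            vectors.append(oov_cache[tok])
    if not vectors:
        return [0.0] * DIM  # empty tweet after preprocessing
    # Component-wise mean over all word vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy lookup table (hypothetical values, only to make the sketch runnable).
toy_embeddings = {"refund": [1.0] * DIM, "please": [0.0] * DIM}
cache = {}
rng = random.Random(0)

vec = tweet_vector(["refund", "please"], toy_embeddings, cache, rng)
oov_vec = tweet_vector(["lolz"], toy_embeddings, cache, rng)
```

Averaging discards word order, which keeps the representation cheap enough for a fast per-brand learner.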
  • 21. Building feedback loop • The ML model outputs <Tweet, Yp> (the predicted label). • The user's feedback <Tweet, Y> is sent back to the model if Y ≠ Yp.
  • 22. Possible Approaches • Reinforcement learning: reward/punish the model if the prediction is right/wrong. For a binary classification problem, the underlying MDP is too small (2 states), so it doesn’t learn much. • Mini-batches: works fine if the velocity of feedback data is high (you don’t have to wait long to accumulate a mini-batch of feedback). Many applications don’t have high velocity. • Instant feedback, tiny batches: just 1 data point can skew the model.
  • 23. Building feedback loop • We model a feedback point <Tweet, Y> as a data point presented to the local model in an online setting. • Thus, a bunch of feedback points = an incoming data stream. • Thus, we use an online learner. • Online method in ML: o Data is modeled as a stream. o The model makes a prediction (y’) when presented with a data point (x). o The environment reveals the correct class label (y). o If y ≠ y’, update the model.
  • 24. Online Algorithms http://scikit-learn.org/stable/auto_examples/linear_model/plot_sgd_comparison.html You can try various on-line classifiers on your dataset. We chose Crammer’s PA-II as our local model.
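The online protocol and the PA-II choice can be sketched together in a few lines. This is a minimal pure-Python sketch, not the authors' implementation: the class name `PA2`, the helper `run_online`, the aggressiveness value `C=1.0` and the toy stream are assumptions. Labels are encoded as +1 / -1, and the update step follows the PA-II rule of Crammer et al. (2006): tau = loss / (||x||^2 + 1/(2C)). The standard algorithm updates whenever the hinge loss is positive; here the update is triggered only on a wrong prediction, mirroring the feedback loop described in the slides.

```python
class PA2:
    """Passive-Aggressive II binary classifier (labels are +1 / -1)."""

    def __init__(self, dim, C=1.0):
        self.w = [0.0] * dim
        self.C = C  # aggressiveness parameter (assumed value)

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        # Hinge loss with unit margin on the feedback point.
        loss = max(0.0, 1.0 - y * self.score(x))
        if loss == 0.0:
            return  # passive: margin already satisfied
        sq_norm = sum(xi * xi for xi in x)
        tau = loss / (sq_norm + 1.0 / (2.0 * self.C))  # PA-II step size
        self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]


def run_online(model, stream):
    """Predict each point, then feed mistakes back as feedback."""
    mistakes = 0
    for x, y in stream:
        if model.predict(x) != y:  # environment reveals y; y != y'
            mistakes += 1
            model.update(x, y)
    return mistakes


# Toy stream (hypothetical 2-d features): two points repeated three times.
toy_stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1)] * 3
model = PA2(dim=2)
mistakes = run_online(model, toy_stream)
print("mistakes:", mistakes)
```

In the setting of the talk, `x` would be the 300-dimensional tweet vector and a separate `PA2` instance would be kept per brand.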
  • 25. Results of Local: • Dataset – 160K tweets from 2015, time sequenced • Feedback incorporation improves accuracy: o Trained the model (offline batch mode) on the first 100K data points. o On the test set (last 60k data points) it gave 74% accuracy (offline batch mode). o Then ran the model on the test data (50k data points) in online fashion. The model made a total of 9028 mistakes. These mistakes were instantaneously fed into the local model as feedback. This gives an accuracy of ~85% across the test set. ○ We gained ~11% accuracy by incorporating feedback.
  • 28. It’s no fluke We tested the local model by feeding it wrong feedback:
  • 29. glocal: Ensembling global and local • We use online stacking to ensemble our continuously adapting local model and the erudite DeepNet (global) model. • Outputs of the global and local models go to an online SVM. • We train the ensemble in batch mode offline but continue to train it further on feedback points in an online fashion. • We get a CV accuracy of 82%. (Diagram: Global + Local → Online SVM → glocal)
  • 30.
  • 31. Last but not the Least ● Handle Drift ○ Periodically replace the model. ■ Shooting in the dark, esp. when drifts are few and far between ○ Find out whether a drift has indeed occurred ■ If it has, adapt to the changes. ■ 3 main algorithms: ● DDM (Gama et al. 2004) ● EDDM ● DDD ■ What about the old model? It knows the old concept, so keep it if the old distribution lingers.
  • 32. Handle Drift We borrow the Drift Detection Method (Gama et al. 2004)
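The Drift Detection Method tracks the running error rate p of the classifier and its standard deviation s = sqrt(p(1 - p)/n), remembers the point where p + s was lowest, and raises a warning at p_min + 2·s_min and a drift at p_min + 3·s_min. The sketch below is a simplified illustration, not the authors' code: the class name `DDM`, the 30-sample warm-up and the synthetic stream are assumptions, and strict `>` comparisons are used so the detector does not fire in the degenerate case where the deviation is zero.

```python
import math


class DDM:
    """Simplified Drift Detection Method (after Gama et al. 2004)."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # warm-up before any decision
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def add(self, mistake):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) p + s seen so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3.0 * self.s_min:
            self.reset()  # start learning the statistics of the new concept
            return "drift"
        if p + s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "stable"


# Synthetic outcomes (assumption): 10% errors for 200 points, then all wrong.
outcomes = [i % 10 == 0 for i in range(200)] + [True] * 100
ddm = DDM()
statuses = [ddm.add(m) for m in outcomes]
first_drift = statuses.index("drift")
print("first drift signalled at index", first_drift)
```

A warning can be used to start buffering recent data, and a drift signal to retrain or replace the model on that buffer.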
  • 33. Pros • Improves running accuracy. • Personalization: the notion of spam varies from brand to brand. Some brands treat ‘Hi’, ‘Hello’ as spam while some treat them as actionable. The local model serves well as a per-user statistical model, thus bringing in user personalization. By learning from feedback, the model adapts to the notions of the brand. • It’s lightweight and fast, thus easy to bootstrap, deploy and scale.
  • 34. Cons • The PA-II decision boundary is a hyperplane that divides the feature space into 2 half-spaces. • The margin of a data point is the distance between the data point and the hyperplane. • An update to the model yields a new hyperplane that remains as close as possible to the current one while achieving at least a unit margin on the most recent data point. • Thus, incorporating a feedback is nothing but shifting the hyperplane to a unit margin on the feedback point. • Let’s see this visually.
  • 35. Cons • This shifting of the hyperplane increases the model’s accuracy on one class (the correct label of the feedback point) while decreasing its accuracy on the other class. • To verify the above, split the test set into 2 chunks by class and run the local model on only 1 chunk. If the above hypothesis is true, then: • #feedbacks should be very small and occur only in the initial part of the data set. • The running accuracy should only increase.
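The update described on this slide can be written out explicitly. The block below restates the formulation from Crammer et al. (2006) as a reading aid; the symbols (w_t for the current weight vector, (x_t, y_t) with y_t ∈ {-1, +1} for the feedback point, C for the aggressiveness parameter, ξ for the slack) follow that paper rather than the slides. Strictly, the "at least a unit margin" constraint is the hard-margin PA variant; PA-II relaxes it with a squared slack term, so the hyperplane moves towards a unit margin on the feedback point without always reaching it.

```latex
% Hinge loss with unit margin on the current point
\ell_t = \max\bigl(0,\; 1 - y_t\,(w_t \cdot x_t)\bigr)

% PA-II update as an optimisation problem
w_{t+1} = \arg\min_{w,\;\xi}\; \tfrac{1}{2}\,\lVert w - w_t \rVert^2 + C\,\xi^2
\quad \text{s.t.} \quad 1 - y_t\,(w \cdot x_t) \le \xi

% Closed-form solution
\tau_t = \frac{\ell_t}{\lVert x_t \rVert^2 + \frac{1}{2C}},
\qquad
w_{t+1} = w_t + \tau_t\, y_t\, x_t
```

Because the step is always taken along y_t x_t, every feedback point pushes the hyperplane in favour of its own class, which is the source of the class-bias issue discussed next.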
  • 36. • Changing the algorithm doesn’t help much – all online learning classifiers in current literature are linear
  • 37. Way Forward • Instead of modeling the problem as classification, model it as ranking (Gmail’s priority inbox does this). • Actionable tweets are high in the ranking, spam tweets are low in the ranking. • Actionable vs. spam = finding a cutoff in the ranking. • Incorporating feedback = updating the algorithm to get a better ranking without getting biased towards one class. • This is a work in progress.
  • 38. Take Home • Incorporating feedback is an important step in improving your model’s performance. • Global + Local is a great way to introduce personalization in ML. • PA-II does well as the local model provided your data is such that most data points are far from the decision hyperplane. • For domains where distributions are continuously evolving, handling drift is a must.
  • 39. References 1. Crammer et al., “Online Passive-Aggressive Algorithms”, JMLR 2006 2. Aberdeen et al., “The Learning Behind Gmail Priority Inbox”, LCCC: NIPS Workshop 2010 3. Gama et al., “Learning with Drift Detection”, SBIA 2004 4. Baena-García et al., “Early Drift Detection Method”, IWKDSD 2006 5. Minku et al., “DDD: A New Ensemble Approach for Dealing with Concept Drift”, IEEE Transactions (2012) 6. Crammer et al., “Adaptive Regularization of Weight Vectors”, NIPS 2009 7. Wang et al., Soft Confidence-Weighted algorithms, 2012 8. LIBOL – A Library for Online Learning Algorithms. https://github.com/LIBOL/LIBOL
  • 40. Thank You Please feel free to reach out post this talk or on the interwebs. @anujgupta82, @tanish2k Anuj Gupta Saurabh Arora

Editor's Notes

  1. Data points are often non-stationary or have means, variances and covariances that change over time. Non-stationary behaviors can be trends, cycles, random walks or combinations of the three.
  2. If t_1, t_2, t_3, ... are the most common terms in the corpus (in decreasing order of frequency) and cf_i is the collection frequency of the i-th most common term, then cf_i is proportional to 1/i.