SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
An introduction to weakly supervised learning.
Best practices.
Kristina Khvatova
Software Developer
Softec S.p.A.
Master in Computer Science and Applied Mathematics
Saint-Petersburg State University
Master in Computer Science and Data Analysis
Milano-Bicocca University
kristina.a.khvatova@gmail.com
https://www.linkedin.com/in/kristina-khvatova-a529b21
Overview
➢ Introduction to weak supervision
➢ Three types of weakly supervised learning:
● incomplete
● inexact
● inaccurate
➢ Snorkel
➢ Brexit tweets classification with weak supervised learning
4

Problem Definition
5

Weak supervision
Weak supervision is the technique
of building models based on new
generated data.
Types:
- incomplete
- inexact
- inaccurate
6

Incomplete weak supervision
● Active learning
● Semi - supervised learning
7
Active learning
Incomplete weak supervision
● High accuracy
● Low costs
8
Active learning
Incomplete weak supervision
High costs for the project
and high precision (90%)
Decrease costs and
precision of the project
(70%)
Cost of query labels is
the same as in (b), but
the precision is much
more higher (90%) the
same as (a)
9

Semi-supervised learning
Incomplete weak supervision
1
0
Generative models Label propagation TSVM
Semi-supervised learning
Incomplete weak supervision
1
1
Inexact weak supervision
1
2
Inexact weak supervision
“Is object localization for free? – Weakly-supervised learning with convolutional neural networks.” (CVPR2015)
1
3
Inaccurate weak supervision
http://www.scholarpedia.org/article/Ensemble_learning
1
4
Snorkel: The System for Programmatically
Building and Managing Training Data
Snorkel is a system for programmatically building and managing training datasets to rapidly and flexibly fuel machine
learning models.
● Data Programming with DDLite: Putting Humans in a Different Part of the Loop (June 2016)
● Conversational agents at IBM: Bootstrapping Conversational Agents With Weak Supervision (AAAI 2019)
● Web content & event classification at Google: Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial
Scale (SIGMOD Industry 2019), and Google AI blog post
● Business intelligence at Intel: Osprey: Non-Programmer Weak Supervision of Imbalanced Extraction Problems (SIGMOD
DEEM 2019)
● Anti-semitic tweet classification w/ Snorkel + transfer learning.
● Clinical text classification: A clinical text classification paradigm using weak supervision and deep representation (BMC
MIDM 2019)
● Social media text mining: Deep Text Mining of Instagram Data without Strong Supervision (ICWI 2018)
● Cardiac MRI classification with Stanford Medicine: Weakly supervised classification of rare aortic valve malformations
using unlabeled cardiac MRI sequences (BioArxiv 2018)
● Medical image triaging at Stanford Radiology: Cross-Modal Data Programming for Medical Images (NeurIPS ML4H 2017)
● GWAS KBC with Stanford Genomics: A Machine-Compiled Database of Genome-Wide Association Studies (NeurIPS
ML4H 2016)
1
5
Weak Supervision Formulation
A high-level schematic of the basic weak supervision “pipeline”: We start with one or more weak supervision sources:
crowdsourced data, heuristic rules, distant supervision, and/or weak classifiers provided by a subject matter expert. The
core technical challenge is to unify and model these sources. Then, this must be used to train the end model.
https://dawn.cs.stanford.edu/2017/07/16/weak-supervision/
1
6
Snorkel: data programming
“Prime Minister Lee
Hsien Loong and his wife
Ho Ching leave a polling
station after casting their
votes in Singapore”
(NYTimes.com)
“Prime Minister Lee
Hsien Loong and his wife
Ho Ching leave a polling
station after casting their
votes in Singapore”
(NYTimes.com)
1
7
Demo: Step-By-Step Guide for Building a
Brexit Tweet Classifier
https://github.com/HazyResearch/snorkel
https://github.com/HazyResearch/metal
1
8
Demo: Step-By-Step Guide for Building a
Brexit Tweet Classifier
➔ Collecting unlabeled data: 3184
(tweets that contain #Brexit)
➔ Label 500 examples: 250 - ‘leave’, 250 - ‘stay’
➔ Create 5 LFs, apply on 2684 unlabeled tweets.
“Predicting Brexit:Classifying Agreement is Better than Sentiment and Pollsters”
1
9
Demo: Step-By-Step Guide for Building a
Brexit Tweet Classifier
Safer In #EU? No! No! No! Terrorists want
the UK to STAY Remember 7/7 Paris
#EUreferendum #VoteLeave
#Liverpool have broke the #Spanish
dominance in Europe... #English #football
says Yes We Belong in #Europe! #Stay
#strongerin
Tweet Label functions
“Predicting Brexit:Classifying Agreement is Better than Sentiment and Pollsters”
@StrongerIn so if we stay in eu that means we get more zero hours contracts and employers can say 'we dont need to
now, fuck off' 󾓪 #TakeControl #VoteLeave
2
0
Result: Brexit Tweet Classifier
Tweet Classifier on 500 labeled examples Tweet Classifier with Snorkel
LR ACCURACY: 0.52 LR ACCURACY: 0.78
Summary
➢ Weak supervision
■ incomplete
■ inexact
■ inaccurate
➢ Snorkel and Snorkel metal
➢ Demo application: Brexit Tweet Classifier

Contenu connexe

Tendances

A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaPreferred Networks
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object DetectionTaegyun Jeon
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxDeep Learning Italia
 
Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examplesFelipe
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)Universitat Politècnica de Catalunya
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNNNoura Hussein
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution OverviewLEE HOSEONG
 
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper reviewtaeseon ryu
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...Virot "Ta" Chiraphadhanakul
 
Computer Vision image classification
Computer Vision image classificationComputer Vision image classification
Computer Vision image classificationWael Badawy
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 

Tendances (20)

A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
 
Yol ov2
Yol ov2Yol ov2
Yol ov2
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptx
 
Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examples
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Densenet CNN
Densenet CNNDensenet CNN
Densenet CNN
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNN
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
 
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...Image Classification with Deep Learning  |  DevFest + GDay, George Town, Mala...
Image Classification with Deep Learning | DevFest + GDay, George Town, Mala...
 
Computer Vision image classification
Computer Vision image classificationComputer Vision image classification
Computer Vision image classification
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 

Similaire à Weak supervised learning - Kristina Khvatova

Correctness in Data Science - Data Science Pop-up Seattle
Correctness in Data Science - Data Science Pop-up SeattleCorrectness in Data Science - Data Science Pop-up Seattle
Correctness in Data Science - Data Science Pop-up SeattleDomino Data Lab
 
Trustworthy Computational Science: Lessons Learned and Next Steps
Trustworthy Computational Science: Lessons Learned and Next StepsTrustworthy Computational Science: Lessons Learned and Next Steps
Trustworthy Computational Science: Lessons Learned and Next StepsVon Welch
 
IntroML_3_DataVisualisation
IntroML_3_DataVisualisationIntroML_3_DataVisualisation
IntroML_3_DataVisualisationElio Laureano
 
IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science ChallengeIronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science ChallengePurdue RCODI
 
From AirBox to Smart City: where are we and what's next?
From AirBox to Smart City: where are we and what's next?From AirBox to Smart City: where are we and what's next?
From AirBox to Smart City: where are we and what's next?Ling-Jyh Chen
 
A Feedback Systems Approach to Digital Transformation
A Feedback Systems Approach to Digital TransformationA Feedback Systems Approach to Digital Transformation
A Feedback Systems Approach to Digital TransformationMichael von Kutzschenbach
 
The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...
The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...
The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...Heather Miller
 
Predicting cyber bullying on t witter using machine learning
Predicting cyber bullying on t witter using machine learningPredicting cyber bullying on t witter using machine learning
Predicting cyber bullying on t witter using machine learningMirXahid1
 
Some New Directions in the Economics of AI
Some New Directions in the Economics of AISome New Directions in the Economics of AI
Some New Directions in the Economics of AIJuan Mateos-Garcia
 
2019_11_21 世新大學|資料視覺化課程
2019_11_21 世新大學|資料視覺化課程2019_11_21 世新大學|資料視覺化課程
2019_11_21 世新大學|資料視覺化課程彭其捷 Jack
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesdgarijo
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfArmyTrilidiaDevegaSK
 
Good Riddance: Academic Publishers are Abandoning Publishing
Good Riddance: Academic Publishers are Abandoning PublishingGood Riddance: Academic Publishers are Abandoning Publishing
Good Riddance: Academic Publishers are Abandoning PublishingBjörn Brembs
 
Reveal the Security Risks in the software Development Lifecycle Meetup 060320...
Reveal the Security Risks in the software Development Lifecycle Meetup 060320...Reveal the Security Risks in the software Development Lifecycle Meetup 060320...
Reveal the Security Risks in the software Development Lifecycle Meetup 060320...lior mazor
 
Exploring What a Typical Data Science Project Looks Like
Exploring What a Typical Data Science Project Looks LikeExploring What a Typical Data Science Project Looks Like
Exploring What a Typical Data Science Project Looks LikeProduct School
 
What your employees need to learn to work with data in the 21 st century
What your employees need to learn to work with data in the 21 st century What your employees need to learn to work with data in the 21 st century
What your employees need to learn to work with data in the 21 st century Human Capital Media
 
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...Chris Hammerschmidt
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 

Similaire à Weak supervised learning - Kristina Khvatova (20)

Correctness in Data Science - Data Science Pop-up Seattle
Correctness in Data Science - Data Science Pop-up SeattleCorrectness in Data Science - Data Science Pop-up Seattle
Correctness in Data Science - Data Science Pop-up Seattle
 
Gutenberg H4D Stanford 2019
Gutenberg H4D Stanford 2019Gutenberg H4D Stanford 2019
Gutenberg H4D Stanford 2019
 
Trustworthy Computational Science: Lessons Learned and Next Steps
Trustworthy Computational Science: Lessons Learned and Next StepsTrustworthy Computational Science: Lessons Learned and Next Steps
Trustworthy Computational Science: Lessons Learned and Next Steps
 
IntroML_3_DataVisualisation
IntroML_3_DataVisualisationIntroML_3_DataVisualisation
IntroML_3_DataVisualisation
 
IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science ChallengeIronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
 
From AirBox to Smart City: where are we and what's next?
From AirBox to Smart City: where are we and what's next?From AirBox to Smart City: where are we and what's next?
From AirBox to Smart City: where are we and what's next?
 
A Feedback Systems Approach to Digital Transformation
A Feedback Systems Approach to Digital TransformationA Feedback Systems Approach to Digital Transformation
A Feedback Systems Approach to Digital Transformation
 
The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...
The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...
The Times They Are a-Changin’: A Data-Driven Portrait of New Trends in How We...
 
Predicting cyber bullying on t witter using machine learning
Predicting cyber bullying on t witter using machine learningPredicting cyber bullying on t witter using machine learning
Predicting cyber bullying on t witter using machine learning
 
Some New Directions in the Economics of AI
Some New Directions in the Economics of AISome New Directions in the Economics of AI
Some New Directions in the Economics of AI
 
2019_11_21 世新大學|資料視覺化課程
2019_11_21 世新大學|資料視覺化課程2019_11_21 世新大學|資料視覺化課程
2019_11_21 世新大學|資料視覺化課程
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
 
Good Riddance: Academic Publishers are Abandoning Publishing
Good Riddance: Academic Publishers are Abandoning PublishingGood Riddance: Academic Publishers are Abandoning Publishing
Good Riddance: Academic Publishers are Abandoning Publishing
 
On data literacy by Marek Danis
On data literacy by Marek Danis On data literacy by Marek Danis
On data literacy by Marek Danis
 
Reveal the Security Risks in the software Development Lifecycle Meetup 060320...
Reveal the Security Risks in the software Development Lifecycle Meetup 060320...Reveal the Security Risks in the software Development Lifecycle Meetup 060320...
Reveal the Security Risks in the software Development Lifecycle Meetup 060320...
 
Exploring What a Typical Data Science Project Looks Like
Exploring What a Typical Data Science Project Looks LikeExploring What a Typical Data Science Project Looks Like
Exploring What a Typical Data Science Project Looks Like
 
What your employees need to learn to work with data in the 21 st century
What your employees need to learn to work with data in the 21 st century What your employees need to learn to work with data in the 21 st century
What your employees need to learn to work with data in the 21 st century
 
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 

Plus de Data Science Milan

ML & Graph algorithms to prevent financial crime in digital payments
ML & Graph  algorithms to prevent  financial crime in  digital paymentsML & Graph  algorithms to prevent  financial crime in  digital payments
ML & Graph algorithms to prevent financial crime in digital paymentsData Science Milan
 
How to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansHow to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansData Science Milan
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsData Science Milan
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companiesData Science Milan
 
Question generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AIQuestion generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AIData Science Milan
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSData Science Milan
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
Reinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del PraReinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del PraData Science Milan
 
Time Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraTime Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraData Science Milan
 
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AILudwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AIData Science Milan
 
Audience projection of target consumers over multiple domains a ner and baye...
Audience projection of target consumers over multiple domains  a ner and baye...Audience projection of target consumers over multiple domains  a ner and baye...
Audience projection of target consumers over multiple domains a ner and baye...Data Science Milan
 
GANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex HoncharGANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex HoncharData Science Milan
 
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoContinual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoData Science Milan
 
3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep LearningData Science Milan
 
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Data Science Milan
 
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...Data Science Milan
 
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyPricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyData Science Milan
 
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig..."How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...Data Science Milan
 
A view of graph data usage by Cerved
A view of graph data usage by CervedA view of graph data usage by Cerved
A view of graph data usage by CervedData Science Milan
 

Plus de Data Science Milan (20)

ML & Graph algorithms to prevent financial crime in digital payments
ML & Graph  algorithms to prevent  financial crime in  digital paymentsML & Graph  algorithms to prevent  financial crime in  digital payments
ML & Graph algorithms to prevent financial crime in digital payments
 
How to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plansHow to use the Economic Complexity Index to guide innovation plans
How to use the Economic Complexity Index to guide innovation plans
 
Robustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning MethodsRobustness Metrics for ML Models based on Deep Learning Methods
Robustness Metrics for ML Models based on Deep Learning Methods
 
"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies"You don't need a bigger boat": serverless MLOps for reasonable companies
"You don't need a bigger boat": serverless MLOps for reasonable companies
 
Question generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AIQuestion generation using Natural Language Processing by QuestGen.AI
Question generation using Natural Language Processing by QuestGen.AI
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWS
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Reinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del PraReinforcement Learning Overview | Marco Del Pra
Reinforcement Learning Overview | Marco Del Pra
 
Time Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del PraTime Series Classification with Deep Learning | Marco Del Pra
Time Series Classification with Deep Learning | Marco Del Pra
 
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AILudwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
 
Audience projection of target consumers over multiple domains a ner and baye...
Audience projection of target consumers over multiple domains  a ner and baye...Audience projection of target consumers over multiple domains  a ner and baye...
Audience projection of target consumers over multiple domains a ner and baye...
 
GANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex HoncharGANs beyond nice pictures: real value of data generation, Alex Honchar
GANs beyond nice pictures: real value of data generation, Alex Honchar
 
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo LomonacoContinual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
 
3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning3D Point Cloud analysis using Deep Learning
3D Point Cloud analysis using Deep Learning
 
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...Deep time-to-failure: predicting failures, churns and customer lifetime with ...
Deep time-to-failure: predicting failures, churns and customer lifetime with ...
 
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
 
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data ReplyPricing Optimization: Close-out, Online and Renewal strategies, Data Reply
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply
 
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig..."How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrig...
 
A view of graph data usage by Cerved
A view of graph data usage by CervedA view of graph data usage by Cerved
A view of graph data usage by Cerved
 

Dernier

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 

Dernier (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 

Weak supervised learning - Kristina Khvatova

  • 1. An introduction to weakly supervised learning. Best practices.
  • 2. Kristina Khvatova Software Developer Softec S.p.A. Master in Computer Science and Applied Mathematics Saint-Petersburg State University Master in Computer Science and Data Analysis Milano-Bicocca University kristina.a.khvatova@gmail.com https://www.linkedin.com/in/kristina-khvatova-a529b21
  • 3. Overview ➢ Introduction to weak supervision ➢ Three types of weakly supervised learning: ● incomplete ● inexact ● inaccurate ➢ Snorkel ➢ Brexit tweets classification with weak supervised learning
  • 5. 5  Weak supervision Weak supervision is the technique of building models based on new generated data. Types: - incomplete - inexact - inaccurate
  • 6. 6  Incomplete weak supervision ● Active learning ● Semi - supervised learning
  • 7. 7 Active learning Incomplete weak supervision ● High accuracy ● Low costs
  • 8. 8 Active learning Incomplete weak supervision High costs for the project and high precision (90%) Decrease costs and precision of the project (70%) Cost of query labels is the same as in (b), but the precision is much more higher (90%) the same as (a)
  • 10. 1 0 Generative models Label propagation TSVM Semi-supervised learning Incomplete weak supervision
  • 12. 1 2 Inexact weak supervision “Is object localization for free? – Weakly-supervised learning with convolutional neural networks.” (CVPR2015)
  • 14. 1 4 Snorkel: The System for Programmatically Building and Managing Training Data Snorkel is a system for programmatically building and managing training datasets to rapidly and flexibly fuel machine learning models. ● Data Programming with DDLite: Putting Humans in a Different Part of the Loop (June 2016) ● Conversational agents at IBM: Bootstrapping Conversational Agents With Weak Supervision (AAAI 2019) ● Web content & event classification at Google: Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (SIGMOD Industry 2019), and Google AI blog post ● Business intelligence at Intel: Osprey: Non-Programmer Weak Supervision of Imbalanced Extraction Problems (SIGMOD DEEM 2019) ● Anti-semitic tweet classification w/ Snorkel + transfer learning. ● Clinical text classification: A clinical text classification paradigm using weak supervision and deep representation (BMC MIDM 2019) ● Social media text mining: Deep Text Mining of Instagram Data without Strong Supervision (ICWI 2018) ● Cardiac MRI classification with Stanford Medicine: Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences (BioArxiv 2018) ● Medical image triaging at Stanford Radiology: Cross-Modal Data Programming for Medical Images (NeurIPS ML4H 2017) ● GWAS KBC with Stanford Genomics: A Machine-Compiled Database of Genome-Wide Association Studies (NeurIPS ML4H 2016)
  • 15. 1 5 Weak Supervision Formulation A high-level schematic of the basic weak supervision “pipeline”: We start with one or more weak supervision sources: crowdsourced data, heuristic rules, distant supervision, and/or weak classifiers provided by a subject matter expert. The core technical challenge is to unify and model these sources. Then, this must be used to train the end model. https://dawn.cs.stanford.edu/2017/07/16/weak-supervision/
  • 16. 1 6 Snorkel: data programming “Prime Minister Lee Hsien Loong and his wife Ho Ching leave a polling station after casting their votes in Singapore” (NYTimes.com) “Prime Minister Lee Hsien Loong and his wife Ho Ching leave a polling station after casting their votes in Singapore” (NYTimes.com)
  • 17. 1 7 Demo: Step-By-Step Guide for Building a Brexit Tweet Classifier https://github.com/HazyResearch/snorkel https://github.com/HazyResearch/metal
  • 18. 1 8 Demo: Step-By-Step Guide for Building a Brexit Tweet Classifier ➔ Collecting unlabeled data: 3184 (tweets that contain #Brexit) ➔ Label 500 examples: 250 - ‘leave’, 250 - ‘stay’ ➔ Create 5 LFs, apply on 2684 unlabeled tweets. “Predicting Brexit:Classifying Agreement is Better than Sentiment and Pollsters”
  • 19. 1 9 Demo: Step-By-Step Guide for Building a Brexit Tweet Classifier Safer In #EU? No! No! No! Terrorists want the UK to STAY Remember 7/7 Paris #EUreferendum #VoteLeave #Liverpool have broke the #Spanish dominance in Europe... #English #football says Yes We Belong in #Europe! #Stay #strongerin Tweet Label functions “Predicting Brexit:Classifying Agreement is Better than Sentiment and Pollsters” @StrongerIn so if we stay in eu that means we get more zero hours contracts and employers can say 'we dont need to now, fuck off' 󾓪 #TakeControl #VoteLeave
  • 20. 2 0 Result: Brexit Tweet Classifier Tweet Classifier on 500 labeled examples Tweet Classifier with Snorkel LR ACCURACY: 0.52 LR ACCURACY: 0.78
  • 21. Summary ➢ Weak supervision ■ incomplete ■ inexact ■ inaccurate ➢ Snorkel and Snorkel metal ➢ Demo application: Brexit Tweet Classifier