Applications of Multimodal Learning in media search engines
Dmitry Voitekh
Proxet
1
Media content search. Information retrieval problem
To search media content (gifs, images, videos), one can’t simply use the
original files (sets of pixels, frames, etc.), since they cannot be indexed efficiently
2
Media content search. Information retrieval problem
Media documents are usually converted into more compact representations
(textual or vectorized) to which well-known search strategies can be applied.
Search = Content + Candidate Generation + Ranking
3
Media search engines. Textual representations
Media content can be converted into textual data via the following approaches:
1) OCR (Optical Character Recognition)
2) ASR (Automatic Speech Recognition)
3) Tags annotation (either manual or automatic via ML model)
4) Video summarization models
4
Media search engines. Textual representations
Search can be organized in one of the following ways:
1) Use a full-text search solution to rank the generated text documents for a given
search query
2) Train an LTR (Learning To Rank) model that predicts relevance for each
(text document, search query) pair. A training dataset is needed!
5
Media search engines. Textual representations
Issues with the “textual” approach:
1) A visual (or audio) signal cannot be converted into text without information loss
(discretization problem)
2) To represent the content better, several models/signals have to be combined =>
a more complicated system
6
Media search engines. Vector representations
Images and videos can be converted into meaningful, compact
vector representations via CV models.
We can build a similarity index over all documents, cluster documents into
categories that can be used for search, etc.
7
Media search engines. Vector representations
NLP models can be used to represent the search query.
To match a search query against documents:
1) LTR - predict relevance for a given pair of vectors
2) Mapper model - map both search query and document vectors into a single
vector space
8
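Once queries and documents live in one vector space, matching reduces to a similarity lookup. The sketch below is only an illustration: the toy vectors are made up and assumed to be already mapped into the shared space, and cosine similarity is an assumed choice of metric, not one stated on the slide.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical vectors, assumed to share one embedding space
doc_vecs = {
    "gif_cat": [0.9, 0.1, 0.0],
    "gif_dog": [0.1, 0.9, 0.0],
    "gif_car": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

At production scale the exhaustive scan would normally be replaced by an approximate nearest-neighbour index.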
Dataset
Pairs (document, search query) with relevance scores.
1) Manual annotation (e.g. via a crowdsourcing job)
a) Takes time to collect
b) Can be expensive because it should cover a large part of the search space
2) Online. Based on engagement data (logged events)
a) Approximates relevance with some noise
b) With substantial traffic, a large and diverse dataset can be rebuilt on a periodic
basis (to follow trends and seasonality)
9
Case Study. Gifs platform
10
Dataset. Engagement data (training/validation)
Billions of anonymized events per day
are logged to capture:
1) views
2) clicks
3) shares
4) favourites
for each gif and search query.
Events can be grouped into “sessions” using
client-specific details
11
Dataset. Engagement data (training/validation)
“Sessions” can be unfolded into sequences of gifs clicked by each user:
session 1: gif_1, gif_2, gif_3
...
Or we can incorporate both search queries and gifs:
session 1: hello, gif_1, gif_2, good_morning, gif_3
...
12
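A minimal sketch of how logged events might be unfolded into the two session formats shown above. The event tuple layout, the 30-minute inactivity gap, and the function name are all assumptions made for illustration; the slides only say that sessions are derived from client-specific details.

```python
from collections import defaultdict

# Hypothetical event log rows: (client_id, timestamp_seconds, event_type, value)
EVENTS = [
    ("c1", 0, "search", "hello"),
    ("c1", 5, "click", "gif_1"),
    ("c1", 9, "click", "gif_2"),
    ("c1", 40, "search", "good morning"),
    ("c1", 44, "click", "gif_3"),
    ("c2", 3, "click", "gif_7"),
    ("c1", 5000, "click", "gif_9"),
]


def build_sessions(events, gap_seconds=1800, include_queries=True):
    """Group events per client and split on inactivity gaps (assumed rule)."""
    by_client = defaultdict(list)
    for client_id, ts, kind, value in events:
        by_client[client_id].append((ts, kind, value))

    sessions = []
    for client_id in sorted(by_client):
        current, last_ts = [], None
        for ts, kind, value in sorted(by_client[client_id]):
            # Start a new session after a long pause
            if last_ts is not None and ts - last_ts > gap_seconds and current:
                sessions.append(current)
                current = []
            if kind == "click":
                current.append(value)
            elif kind == "search" and include_queries:
                current.append(value.replace(" ", "_"))
            # Other event types (views, shares, favourites) are ignored here
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions


gif_sessions = build_sessions(EVENTS, include_queries=False)
joint_sessions = build_sessions(EVENTS)
print(gif_sessions)
print(joint_sessions)
```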
Dataset. Engagement data (training/validation)
To address positional bias for different grids:
1) shuffle search results for a small percentage
of traffic
2) use probabilistic modeling based on hierarchical
pooling to estimate the effect of positional bias on CTR
For content safety: both the search query and gif
datasets are filtered via maintained blacklists and
NSFW models
13
Dataset. Manually labeled (benchmark)
Human judgements obtained via
crowdsourcing tasks that estimate:
1) query-gif relevance
2) gif-gif relevance
● Complex relevance criteria defined by
the business
● Rarely updated and relatively compact
14
Dataset. Manually labeled (benchmark)
Triplets dataset (anchor, positive, negative)
Metric - % of triplets for which
(anchor, positive) relevancy >
(anchor, negative) relevancy
15
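The triplet metric can be computed as below. This is a sketch under stated assumptions: the embeddings and triplets are invented toy values, and a dot product stands in for whatever relevance score the model produces.

```python
def triplet_accuracy(triplets, similarity):
    """Share of triplets where the positive scores above the negative."""
    if not triplets:
        raise ValueError("at least one triplet is required")
    correct = sum(
        1 for anchor, pos, neg in triplets
        if similarity(anchor, pos) > similarity(anchor, neg)
    )
    return correct / len(triplets)


# Hypothetical embeddings for one query and three gifs
EMBEDDINGS = {
    "hello": [1.0, 0.0],
    "gif_wave": [0.9, 0.1],
    "gif_cat": [0.1, 0.9],
    "gif_kitten": [0.2, 0.8],
}


def dot_similarity(a, b):
    """Dot product of the two items' vectors (assumed relevance score)."""
    return sum(x * y for x, y in zip(EMBEDDINGS[a], EMBEDDINGS[b]))


triplets = [
    ("hello", "gif_wave", "gif_cat"),       # query-gif triplet
    ("gif_cat", "gif_kitten", "gif_wave"),  # gif-gif triplet
    ("hello", "gif_cat", "gif_wave"),       # deliberately violated by the toy vectors
]

accuracy = triplet_accuracy(triplets, dot_similarity)
print(f"{accuracy:.2%}")
```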
Baseline. Word2Vec model
MVP. Gif embeddings for the Recommender System.
Train a Gensim Skip-Gram model on gifs only:
session 1: gif_1, gif_2, gif_3
where gif_* is the identifier of a gif that was clicked during a session.
For inference: kNN search in the embedding space (nmslib).
16
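To make the MVP concrete, the sketch below shows the (center, context) pairs a skip-gram model is trained on when sessions are treated as sentences, followed by an exact nearest-neighbour lookup. It does not train anything: the vectors are hand-written placeholders, the window size is an assumption, and the brute-force search is a stand-in for the approximate index (nmslib) named on the slide. In Gensim, skip-gram mode is selected with `sg=1` on `Word2Vec`.

```python
import math


def skipgram_pairs(sessions, window=2):
    """Enumerate (center, context) pairs within a sliding window."""
    pairs = []
    for session in sessions:
        for i, center in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            pairs.extend((center, session[j]) for j in range(lo, hi) if j != i)
    return pairs


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(item, vectors, k=2):
    """Exact kNN by cosine similarity (stand-in for an approximate index)."""
    scored = [
        (cosine(vectors[item], vec), other)
        for other, vec in vectors.items() if other != item
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


sessions = [["gif_1", "gif_2", "gif_3"]]
pairs = skipgram_pairs(sessions, window=1)

# Placeholder vectors, not the output of a trained model
vectors = {"gif_1": [1.0, 0.0], "gif_2": [0.9, 0.2], "gif_3": [0.0, 1.0]}
recommended = nearest_neighbors("gif_1", vectors, k=1)
print(pairs)
print(recommended)
```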
Baseline. Word2Vec model
V1. Joint embeddings for search queries and gifs:
session 1: query_1, gif_1, gif_2, query_2, query_3
...
where query_* is the identifier of a search query issued by a user
and gif_* is the identifier of a gif that was clicked during a session
17
18
Baseline. Word2Vec model
Pros: Search queries and gifs live in a single space. Gif tags can also be
incorporated. Applications:
1) Search (query -> relevant gifs)
2) Recommender System (gif -> relevant gifs)
3) Tag Suggestion (query -> relevant tags)
Cons: Identifiers (not gif/query content) are used => cold start problem.
The less frequent an identifier is, the less accurately it is positioned in the
embedding space
19
Baseline. Word2Vec model
Search prototype
20
Baseline. Word2Vec model
Tag Suggestion for gifs
21
Baseline. Word2Vec model
Search. Implicit usage. Features for ElasticSearch
1) Query Expansion
love you to the moon and back => love, adore you, couple, happy
2) Tag Suggestion for gifs
gif_1 => love, happy, couple
Results:
+10% relative change in CTR
22
Baseline. Word2Vec model
Recommender System. kNN index
+9% relative change in CTR compared to the MVP version
23
Baseline. Word2Vec model
Tag Suggestion. kNN index
+40% relative change in CTR compared to the previous version
24
Cold start. Part 1. StarSpace
Extend each search query with the identifiers of its word n-grams:
how_are_you_id, gif_1, doing_good_id, gif_2
becomes:
how_are_you_id, how_id, are_id, you_id, gif_1, doing_good_id, doing_id, good_id, gif_2
● The model additionally learns to compare word n-grams with document identifiers
● Unseen search query vector = average of the vectors of its available tokens
25
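The expansion and the fallback for unseen queries can be sketched as follows. This is a plain-Python illustration of the data preparation only, not the StarSpace API; the function names, the `max_n` parameter, and the toy vectors are assumptions. With `max_n=1` it reproduces the example on the slide.

```python
def expand_query(query, max_n=1):
    """Return the full-query id followed by ids of its word n-grams."""
    words = query.lower().split()
    tokens = ["_".join(words) + "_id"]
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            token = "_".join(words[i:i + n]) + "_id"
            if token not in tokens:  # a one-word query equals its own unigram
                tokens.append(token)
    return tokens


def embed_unseen_query(query, vectors, max_n=1):
    """Average the vectors of whichever expanded tokens are known."""
    known = [vectors[t] for t in expand_query(query, max_n) if t in vectors]
    if not known:
        return None  # nothing to fall back on
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


expanded = expand_query("how are you")

# Hypothetical learned vectors for two word-level tokens
vectors = {"how_id": [1.0, 0.0], "you_id": [0.0, 1.0]}
unseen = embed_unseen_query("how about you", vectors)
print(expanded)
print(unseen)
```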
Cold start. Part 2. Word2Vec + BERT
Take a pre-trained BERT model and fine-tune it jointly with Word2Vec
BERT learns a mapping from search query tokens to the Word2Vec gif space
The cold start problem is solved for queries, but is still an issue for gifs ;(
26
Mixture of Embedding Experts
The key point is that we haven’t really
utilized gif data (e.g. visual representation,
tags, etc.) yet.
What if we extend the BERT+Word2Vec
approach to all available signals?
27
https://arxiv.org/pdf/1804.02516.pdf
28
29
30
Mixture of Embedding Experts
We still have the same unified embedding space, but without the cold start
problem
Leverage all available gif metadata:
1) Visual representation
2) Tags representation
3) OCR representation
31
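A rough sketch of the scoring idea: each signal ("expert") contributes its own query-gif similarity, and the weights are renormalized over the experts that are actually available for a given gif, so a gif with missing tags or OCR text can still be scored. The linked paper predicts the per-expert weights from the text; here they are fixed constants, which is an assumption of this sketch, as are the names and the toy unit vectors.

```python
def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mee_similarity(query_by_expert, gif_by_expert, weights):
    """Weighted sum of per-expert similarities over available experts."""
    available = [
        name for name in weights
        if name in query_by_expert and gif_by_expert.get(name) is not None
    ]
    total = sum(weights[name] for name in available)
    if total == 0:
        return 0.0
    score = sum(
        weights[name] * dot(query_by_expert[name], gif_by_expert[name])
        for name in available
    )
    return score / total  # renormalize so missing experts do not lower the score


# Assumed fixed weights; one query embedding per expert space
weights = {"visual": 0.5, "tags": 0.3, "ocr": 0.2}
query = {"visual": [1.0, 0.0], "tags": [1.0, 0.0], "ocr": [1.0, 0.0]}

gif_full = {"visual": [1.0, 0.0], "tags": [0.0, 1.0], "ocr": [1.0, 0.0]}
gif_no_tags = {"visual": [1.0, 0.0], "tags": None, "ocr": [1.0, 0.0]}

score_full = mee_similarity(query, gif_full, weights)
score_no_tags = mee_similarity(query, gif_no_tags, weights)
print(score_full, score_no_tags)
```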
Bonus. Expand a search query
32
33
34
Summary
1) Embeddings are great for various IR tasks
2) The ideal application is a candidate generation step
3) Start with a simple baseline with recall as high as possible
4) Careful collection of implicit user feedback is vital for good embeddings
5) Use human-verified datasets for benchmarks
6) The more data sources you have, the better the quality of the representations
35
Links
1) Word2Vec illustration: http://jalammar.github.io/illustrated-word2vec
2) nmslib. Efficient aNN search: https://github.com/nmslib/nmslib
3) StarSpace for space fusion: https://github.com/facebookresearch/StarSpace
4) DSSM: https://www.microsoft.com/en-us/research/project/dssm
5) Pinterest multimodal learning: https://labs.pinterest.com/user/themes/pin_labs/assets/paper/training-and-evaluating.pdf
6) Mixture of embedding experts: https://arxiv.org/pdf/1804.02516.pdf
36
