Search Ranking
A Deep Dive
Venkata Vineel Yalamarthi (u0881808)
Our interests
• Scalability
• Machine Learning
• Natural Language Understanding
• Java, Python
• VLSI and Scripting Languages
What is Information Retrieval?
• Data now comes in many forms (structured and unstructured
text, images, videos), computing is spread across more
devices and media, and consumer demand keeps growing.
• In this setting, IR is the study of algorithms, tools and
techniques that draw on several disciplines of computer
science (Data Mining, Machine Learning, Computer Vision,
Visualization and Natural Language Processing) to bring
users the most relevant information with minimal
cognitive effort.
What did we do and learn?
• Different commercial vertical search engines
• Elasticsearch
• Java plugin for Elasticsearch
• Search re-ranking: an NLP approach
• Expedia Personalized Search Ranking – 2013
• Computer Vision and Visualization examples
An example from
Computational Advertising
• "Night-stand" has different meanings
• If search engines don't understand the meaning
properly, customers lose money
• How do they understand the context?
• Different signals
• User history and query understanding
• NLP is crucial
Query Understanding
Night stand at a friend’s place
VS
Night stand for my dorm
Used night-stands on discount
Simplest Search Engines
• Narrow down the search by department
• Entity matching using Levenshtein distance or the Soundex
algorithm
• Smyth vs. Smith
• Bare string matching
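A minimal sketch of the entity-matching idea, using plain Levenshtein (edit) distance. The helper name `is_probable_match` and the threshold of one edit are illustrative assumptions, not something the slides specify.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def is_probable_match(a: str, b: str, max_distance: int = 1) -> bool:
    """Hypothetical helper: treat two names as the same entity when they
    are within max_distance edits of each other (threshold is an assumption)."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(levenshtein("Smyth", "Smith"))        # one substitution: y -> i
print(is_probable_match("Smyth", "Smith"))
```

A small threshold keeps false matches down on short names; longer strings would likely need a distance normalized by length.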
Commercial Search Engines
• Yelp, Foursquare or eBay
Multiple signals
1. Is the user looking for a hotel or a salon?
2. What options are available? If there are several, use
sentiment analysis and click-rate analysis
3. Location and social network analysis
4. We need VERY good query understanding
What we DON'T care about
• The search (grep) algorithm, PageRank
vs.
• Search ranking / relevance
Distributional Hypothesis
A word is characterized
by the company it keeps
--- Firth (1957)
Bag of Words Model
• Doesn't preserve semantics
• Rama went to Lanka in search of Seetha
• Seetha went to Lanka in search of Rama
• Dict = {in, Lanka, of, Rama, search, Seetha, to, went}
• [1 1 1 1 1 1 1 1]
• [1 1 1 1 1 1 1 1]
• Both sentences get the same vector, although they mean different things
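A small sketch of why the bag-of-words model loses meaning here: with a binary encoding over a sorted vocabulary, the two sentences contain exactly the same words and so get identical vectors. Lowercasing and whitespace tokenization are simplifying assumptions.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(sentences):
    """Sorted list of every distinct token across the sentences."""
    return sorted({token for s in sentences for token in tokenize(s)})


def binary_bow(sentence, vocabulary):
    """1 if the vocabulary word occurs in the sentence, else 0."""
    present = set(tokenize(sentence))
    return [1 if word in present else 0 for word in vocabulary]


s1 = "Rama went to Lanka in search of Seetha"
s2 = "Seetha went to Lanka in search of Rama"

vocabulary = build_vocabulary([s1, s2])
v1 = binary_bow(s1, vocabulary)
v2 = binary_bow(s2, vocabulary)

print(vocabulary)
print(v1)
print(v2)
print("identical vectors:", v1 == v2)  # word order, and so who searched for whom, is lost
```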
Can you do sentiment analysis?
Positive, Negative, Neutral
• The shutter lag of this digital camera is annoying sometimes, especially when capturing a cute
baby.
• I received the camera as a Christmas present from relatives and enjoyed it a lot.
• The presence or absence of words alone doesn't help sentiment analysis
We need Better Representations
• C vs. Java: object-oriented modeling
• Properties + methods:
    class Student {
        Float getcGPA;
        boolean isHeEligibleToTakeGradCourses() { }
    }
Good structures to represent and play with, and to get
meaningful results
Related work – Structured Learning
TF-IDF approach
• Purely statistical
• Doesn't preserve semantics
Query: When Lady Gaga sings
• R1: lady gaga sings and katy perry dances
• R2: lady gaga dances and katy perry sings
• The n-gram or TF-IDF approach works here.
• Why?
Query: When Lady Gaga sings
• R1: lady gaga dances and katy perry sings
• R2: lady gaga dances and sings and katy perry dances
• Does the TF-IDF / Bag of Words / Vector
Space Model work here?
• Yes / No?
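One way to check the question above is to score both results against the query with a term-count vector space model and cosine similarity. This is a sketch under simple assumptions (whitespace tokenization, raw counts, no IDF weighting), not the scoring of any particular engine.

```python
import math
from collections import Counter


def term_counts(text):
    """Bag of words as a term -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)  # Counter gives 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "when lady gaga sings"
r1 = "lady gaga dances and katy perry sings"             # Gaga does NOT sing here
r2 = "lady gaga dances and sings and katy perry dances"  # Gaga sings here

score_r1 = cosine(term_counts(query), term_counts(r1))
score_r2 = cosine(term_counts(query), term_counts(r2))

print(f"R1: {score_r1:.3f}")
print(f"R2: {score_r2:.3f}")
```

Both results share the same three query words (lady, gaga, sings), so the model cannot tell who is singing. R1 scores higher only because it is shorter, which puts the less relevant result first.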
How can we solve this?
Current search engines = mostly
keyword match
Same query on Bing
We need a plug-and-play
solution
• Create parse tree representations T1, T2,
T3, …, T10 for R1, R2, R3, …, R10 respectively.
• Create a parse tree representation for the query Q.
• Find the similarity score of each result tree T
with that of Q.
• Sort the results by score and present them to the user.
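The steps above can be sketched as follows. The slides do not say which parser or tree-similarity measure is used, so this example makes two loud assumptions: each parse tree is represented by a hand-written set of dependency edges (head, relation, dependent) standing in for real parser output, and similarity is the fraction of the query's edges found in the result.

```python
def tree_similarity(query_edges, result_edges):
    """Fraction of the query's dependency edges that also occur in the result.
    This overlap measure is an illustrative assumption."""
    if not query_edges:
        return 0.0
    return len(query_edges & result_edges) / len(query_edges)


def rerank(query_edges, results):
    """results: list of (text, edges) pairs. Returns (score, text) pairs, best first."""
    scored = [(tree_similarity(query_edges, edges), text) for text, edges in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hand-written edges (hypothetical parser output) for the Lady Gaga example.
query_edges = {("sings", "nsubj", "gaga")}
results = [
    ("lady gaga dances and katy perry sings",
     {("dances", "nsubj", "gaga"), ("sings", "nsubj", "perry")}),
    ("lady gaga dances and sings and katy perry dances",
     {("dances", "nsubj", "gaga"), ("sings", "nsubj", "gaga"),
      ("dances", "nsubj", "perry")}),
]

ranked = rerank(query_edges, results)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

Because the edges record who is the subject of "sings", the result in which Lady Gaga actually sings moves to the top, which plain keyword overlap cannot achieve.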
Elasticsearch
• Distributed search server based on Lucene
• Is it a database?
• Is it SQL / NoSQL?
• When we have lots of databases, why should we care about it?
Let's look at it in action.
How does Elasticsearch work
today?
• It uses TF-IDF for search ranking (the default at the time of this talk; BM25 has been the default since Elasticsearch 5.0)
• It assigns a score to each and every document
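To make "a score for every document" concrete, here is a textbook TF-IDF scorer. It is a sketch only: Lucene's actual formula adds normalization and boosting terms that are not reproduced here, and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Return one TF-IDF score per document for the given query.
    tf = term count / document length, idf = ln(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if doc_freq[term] == 0:
                continue  # term appears nowhere in the collection
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores.append(score)
    return scores


documents = [  # made-up sample documents
    "elasticsearch is a distributed search server",
    "lucene is a search library",
    "kibana draws charts",
]
scores = tf_idf_scores("distributed search", documents)
for doc, score in sorted(zip(documents, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```

The rarer query term ("distributed") contributes more than the common one ("search"), which is the whole point of the IDF factor.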
Data Mining approach
• Not everything is natural-language text
• We may have a lot of features, and the interdependencies among
them may not be known to us.
• Big Data doesn't always mean huge data; it can also be
small data with a huge number of features, which may require
statistics and data mining
Expedia Personalized Hotel
Ranking
• Used a Random Forest.
How to measure search
ranking? Precision / Recall
are not convenient
NDCG
• Normalized Discounted Cumulative Gain
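A short sketch of how NDCG can be computed for one ranked result list. It assumes graded relevance labels, linear gain and a log2 position discount; some evaluations use an exponential gain (2^rel - 1) instead, and the slides do not say which variant was used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ordering divided by the DCG of the ideal
    (descending-relevance) ordering. Returns 0.0 when nothing is relevant."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance grades of the results in the order the engine returned them.
ranking = [3, 2, 3, 0, 1, 2]
print(f"NDCG = {ndcg(ranking):.4f}")
```

Unlike precision and recall, this measure rewards putting the most relevant results near the top, which is what a ranking task actually cares about.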
Twitter Streaming with
Elasticsearch
• River plugins can be built for Elasticsearch.
• Let's look at the demo.
Visualization/Computer Vision
• Kibana - Elasticsearch
• Flow - an app from Amazon, in research phase.
