Vector Space Model
By: Tharuka Vishwajith
Boolean Model
• Based on set theory and Boolean logic
• Exact matching of documents to a user query
• Uses the Boolean AND, OR and NOT operators
Term      D1  D2  D3  D4  D5  D6
Cat        1   1   0   1   0   1
Dog        1   1   1   1   1   0
Rat        0   1   0   1   0   1
Apple      0   0   0   0   1   0
Orange     0   0   1   1   0   1
Computer   0   0   0   1   1   1
• Query: Dog AND Cat AND NOT Computer
• Computation: Dog = 111110, Cat = 110101, NOT Computer = NOT 000111 = 111000, so 111110 AND 110101 AND 111000 = 110000
• Result: document set {D1, D2}
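The bit-vector computation above can be sketched in Python with integers used as bitmasks. This is a minimal illustration, assuming each term's incidence row is packed into one integer with D1 as the leftmost bit; the names `incidence`, `ALL_DOCS`, and `matching_docs` are made up for this example.

```python
# Term-document incidence rows from the table, one bit per document (D1 is the leftmost bit).
incidence = {
    "Cat": 0b110101,
    "Dog": 0b111110,
    "Rat": 0b010101,
    "Apple": 0b000010,
    "Orange": 0b001101,
    "Computer": 0b000111,
}
NUM_DOCS = 6
ALL_DOCS = (1 << NUM_DOCS) - 1  # 0b111111, keeps NOT within six bits


def matching_docs(mask):
    """Turn a result bitmask into a list of document names."""
    return [f"D{i + 1}" for i in range(NUM_DOCS) if mask & (1 << (NUM_DOCS - 1 - i))]


# Dog AND Cat AND NOT Computer
result = incidence["Dog"] & incidence["Cat"] & (~incidence["Computer"] & ALL_DOCS)
print(format(result, "06b"), matching_docs(result))
```

Masking the negation with `ALL_DOCS` matters because Python integers are unbounded, so a bare `~` would produce a negative number rather than a six-bit complement.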
Boolean Model ...
Advantages
• Relatively easy to implement and scalable
• Fast query processing based on parallel scanning of indexes
Disadvantages
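• Ignores synonymy: documents that use a different word for the same concept are not matched
• Ignores polysemy: a query term matches regardless of which sense of the word the document uses
• No ranking of output: a document either matches the query or it does not
• Users often have to learn a special query syntax, such as double quotes for phrase search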
Vector Space Model
• Algebraic model that represents text documents and queries as vectors over the index terms
• One dimension for each term
• Similarity between the query vector and each document vector is computed from the angle between them
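[Figure: document vectors D1 and D2 and the query vector plotted in a two-dimensional term space with axes Dog and Computer; θ1 and θ2 mark the angles between the query and the document vectors.]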
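As a rough illustration of comparing a query to documents by angle, the sketch below ranks two documents against a query by cosine similarity. The vectors are hypothetical counts on the (Dog, Computer) axes chosen for this example, not values read from the figure, and `cosine` is a helper defined here.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical (Dog, Computer) term counts, for illustration only.
docs = {"D1": [3, 1], "D2": [1, 4]}
query = [2, 1]

# A smaller angle means a larger cosine, so sort in descending order.
ranking = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranking)
```

Unlike the Boolean model, every document receives a score, so the output is an ordering rather than a set.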
Cosine similarity among 3 documents
Term frequency (tf) count:
Term        SaS  PaP   WH
affection   115   58   20
jealous      10    7   11
gossip        2    0    6
wuthering     0    0   38
Log normalization: $w = 1 + \log_{10}(\mathrm{tf})$ for $\mathrm{tf} > 0$, and $w = 0$ for $\mathrm{tf} = 0$
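The log normalization step can be sketched as follows. The sketch assumes a base-10 logarithm and a weight of zero for a term that does not occur; `tf`, `log_weight`, and `weights` are names chosen for this example.

```python
import math

# Raw term counts from the table; column order is (SaS, PaP, WH).
tf = {
    "affection": [115, 58, 20],
    "jealous": [10, 7, 11],
    "gossip": [2, 0, 6],
    "wuthering": [0, 0, 38],
}


def log_weight(count):
    """Return 1 + log10(count) for a positive count, and 0 for an absent term."""
    return 1 + math.log10(count) if count > 0 else 0.0


# Round to two decimals for display only.
weights = {term: [round(log_weight(c), 2) for c in counts] for term, counts in tf.items()}
for term, row in weights.items():
    print(term, row)
```

The logarithm dampens large counts: a term that occurs 115 times gets a weight of about 3, not 115 times the weight of a term that occurs once.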
Log Frequency Weighting
Term        SaS   PaP   WH
affection   3.06  2.76  2.30
jealous     2.00  1.85  2.04
gossip      1.30  0     1.78
wuthering   0     0     2.58

Length of each document vector (rounded to two decimals):
Length of SaS = $\sqrt{3.06^2 + 2.00^2 + 1.30^2 + 0^2} \approx 3.88$
Length of PaP = $\sqrt{2.76^2 + 1.85^2 + 0^2 + 0^2} \approx 3.32$
Length of WH = $\sqrt{2.30^2 + 2.04^2 + 1.78^2 + 2.58^2} \approx 4.39$

Length Normalization: divide each weight by the length of its document vector
Term        SaS          PaP          WH
affection   3.06 / 3.88  2.76 / 3.32  2.30 / 4.39
jealous     2.00 / 3.88  1.85 / 3.32  2.04 / 4.39
gossip      1.30 / 3.88  0 / 3.32     1.78 / 4.39
wuthering   0 / 3.88     0 / 3.32     2.58 / 4.39
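Length normalization can be sketched as dividing a vector by its Euclidean length. The helper name `l2_normalize` is an assumption for this example, and the input is the SaS column of two-decimal log-frequency weights.

```python
import math


def l2_normalize(vector):
    """Divide each component by the vector's Euclidean length."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / length for x in vector]


# Log-frequency weights for SaS in term order (affection, jealous, gossip, wuthering).
sas = [3.06, 2.00, 1.30, 0.0]
sas_unit = l2_normalize(sas)
print([round(x, 3) for x in sas_unit])
```

After this step every document vector has length 1, so long and short documents become directly comparable.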
After Length Normalization (weights rounded to three decimals)
Term        SaS    PaP    WH
affection   0.789  0.832  0.524
jealous     0.515  0.555  0.465
gossip      0.335  0      0.405
wuthering   0      0      0.588

For unit-length vectors the cosine equals the dot product (terms with a zero weight are omitted):
$\cos(\mathrm{SaS}, \mathrm{PaP}) \approx 0.789 \times 0.832 + 0.515 \times 0.555 \approx 0.94$
$\cos(\mathrm{PaP}, \mathrm{WH}) \approx 0.832 \times 0.524 + 0.555 \times 0.465 \approx 0.69$
$\cos(\mathrm{SaS}, \mathrm{WH}) \approx 0.789 \times 0.524 + 0.515 \times 0.465 + 0.335 \times 0.405 \approx 0.79$
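The whole calculation, from raw counts to pairwise cosine similarity, can be sketched in a few lines. This is an illustrative sketch, assuming base-10 log weighting with zero weight for absent terms; `counts`, `unit_log_vector`, and `similarity` are names chosen for this example.

```python
import math

# Raw term counts in term order (affection, jealous, gossip, wuthering).
counts = {
    "SaS": [115, 10, 2, 0],
    "PaP": [58, 7, 0, 0],
    "WH": [20, 11, 6, 38],
}


def unit_log_vector(tfs):
    """Apply 1 + log10(tf) to nonzero counts, then scale to unit length."""
    weights = [1 + math.log10(tf) if tf > 0 else 0.0 for tf in tfs]
    length = math.sqrt(sum(w * w for w in weights))
    return [w / length for w in weights]


vectors = {name: unit_log_vector(tfs) for name, tfs in counts.items()}


def similarity(a, b):
    """Dot product of two unit vectors, which equals their cosine."""
    return sum(x * y for x, y in zip(vectors[a], vectors[b]))


for a, b in [("SaS", "PaP"), ("PaP", "WH"), ("SaS", "WH")]:
    print(a, b, round(similarity(a, b), 2))
```

Because no intermediate value is rounded here, the printed figures can differ in the last digit from arithmetic done by hand on two-decimal table entries.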