OTHER MODELS FOR INFORMATION RETRIEVAL: BM25, QUERY LIKELIHOOD RANKING, COMPARING LANGUAGE MODELS
Andrea Schiavinato, June 2008
THE PROBABILISTIC MODEL
THE BM25 MODEL (BEST MATCH 25, 1994)
Typical parameter values: α = 1.2; β = 100; b = 0.75
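
The scoring formula on this slide was not preserved. As a sketch, the standard BM25 form without relevance information is given below. It assumes that α and β play the roles usually written k1 and k2. The symbols N (number of documents), n_t (documents containing term t), f_t (frequency of t in the document), qf_t (frequency of t in the query), |D| (document length) and avgdl (average document length) are introduced here for illustration.

```latex
\[
\mathrm{BM25}(Q, D) = \sum_{t \in Q}
  \log\frac{N - n_t + 0.5}{n_t + 0.5}
  \cdot \frac{(\alpha + 1)\, f_t}{K + f_t}
  \cdot \frac{(\beta + 1)\, qf_t}{\beta + qf_t},
\qquad
K = \alpha \left( (1 - b) + b\, \frac{|D|}{\mathrm{avgdl}} \right)
\]
```

With b = 0.75, K grows with the length ratio |D| / avgdl, so the same term frequency counts for less in a longer document.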
THE BM25 MODEL – SOME EXAMPLES

      gold           silver        truck          TOT
D1    -log(2)*1.02   0             0              -0.71
D2    0              log(2)*1.33   -log(2)*0.96   0.26
D3    -log(2)*1.02   0             -log(2)*1.02   -1.42
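
The documents and the query behind this table were not preserved. The sketch below assumes the classic three-document "gold silver truck" collection, which is consistent with the vocabulary of the relevance model example later in the deck. It also assumes an idf of log((N - n) / n), inferred from the log(2) factors in the table. Under these assumptions the totals come out close to the table; small differences are rounding.

```python
import math
from collections import Counter

# Assumed toy collection and query (not preserved on the slide).
DOCS = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
}
QUERY = "gold silver truck"

ALPHA, BETA, B = 1.2, 100.0, 0.75  # typical values quoted in the deck


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(docs, query):
    """Return a dict mapping document name to its BM25 score for the query."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(toks) for toks in tokens.values()) / n_docs
    query_tf = Counter(tokenize(query))
    # Document frequency of each query term.
    doc_freq = {q: sum(1 for toks in tokens.values() if q in toks) for q in query_tf}

    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        k = ALPHA * ((1 - B) + B * len(toks) / avg_len)
        total = 0.0
        for q, qf in query_tf.items():
            if tf[q] == 0:
                continue  # a term absent from the document contributes nothing
            # Assumed idf variant; undefined if the term occurs in every
            # document, which does not happen in this toy collection.
            idf = math.log((n_docs - doc_freq[q]) / doc_freq[q])
            doc_part = (ALPHA + 1) * tf[q] / (k + tf[q])
            query_part = (BETA + 1) * qf / (BETA + qf)
            total += idf * doc_part * query_part
        scores[name] = total
    return scores


if __name__ == "__main__":
    for name, score in bm25_scores(DOCS, QUERY).items():
        print(name, round(score, 2))
```

Two of the three idf values are negative because "gold" and "truck" each occur in two of the three documents, which is why only D2 ends up with a positive score.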
THE BM25 MODEL – SOME EXAMPLES
Lincoln appears in 300 documents
Lincoln appears 25 times in the document
THE BM25 MODEL – SOME EXAMPLES

Frequency of "president"   Frequency of "lincoln"   Score
15                         25                       20.66
25                         15                       20.36
1                          25                       18.2
0                          25                       15.66
25                         1                        12.95
15                         1                        12.74
15                         0                        5
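
The table above can be approximately reproduced with the sketch below. Only the figure of 300 documents for "lincoln" appears on the slides. The collection size (500,000 documents), the document frequency of "president" (40,000) and the length ratio (0.9) are assumptions taken from the usual textbook version of this example, and the idf with 0.5 corrections is likewise assumed.

```python
import math

# Assumed statistics from the usual textbook version of this example.
# Only "lincoln appears in 300 documents" is stated on the slides.
N_DOCS = 500_000
DOC_FREQ = {"president": 40_000, "lincoln": 300}
LENGTH_RATIO = 0.9  # assumed document length / average document length
ALPHA, BETA, B = 1.2, 100.0, 0.75


def bm25(term_freqs, query_freq=1):
    """BM25 score of a document, given its frequency for each query term."""
    k = ALPHA * ((1 - B) + B * LENGTH_RATIO)
    score = 0.0
    for term, tf in term_freqs.items():
        n = DOC_FREQ[term]
        idf = math.log((N_DOCS - n + 0.5) / (n + 0.5))
        doc_part = (ALPHA + 1) * tf / (k + tf)
        query_part = (BETA + 1) * query_freq / (BETA + query_freq)
        score += idf * doc_part * query_part
    return score


if __name__ == "__main__":
    rows = [(15, 25), (25, 15), (1, 25), (0, 25), (25, 1), (15, 1), (15, 0)]
    for f_president, f_lincoln in rows:
        score = bm25({"president": f_president, "lincoln": f_lincoln})
        print(f"{f_president:>3} {f_lincoln:>3} {score:6.2f}")
```

Under these assumptions the computed scores agree with the table to within about 0.1. The rarer term dominates: a document with "lincoln" alone scores far higher than one with "president" alone.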
THE BM25 MODEL – COMMENTS
LANGUAGE MODELS – EXAMPLE 1

Word        Prob
white       0.2
house       0.19
USA         0.10
president   0.09
golf        0.05
…
LANGUAGE MODELS – EXAMPLE 2
Query: the user's information need (real/perceived/expressed)

Word        Prob
lincoln     0.3
president   0.3
america     0.15
president   0.15
war         0.15
…
LANGUAGE MODELS
QUERY LIKELIHOOD RANKING – IDEA
[Diagram: Document → Language model derived from the document; Query → DOCUMENT RANK]

Word        Prob
america     0.2
lincoln     0.19
USA         0.10
president   0.09
golf        0.05
…
QUERY LIKELIHOOD RANKING – FORMULA
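
The formula on this slide was not preserved. A sketch of the usual unigram query likelihood is given below, under the assumption that query terms are treated as independent draws from the document's language model. The symbols q_i (the i-th query term), f_{q_i,D} (its frequency in D) and |D| (document length) are introduced here for illustration.

```latex
\[
P(Q \mid D) = \prod_{i=1}^{n} P(q_i \mid D),
\qquad
P(q_i \mid D) = \frac{f_{q_i, D}}{|D|}
\]
\[
\log P(Q \mid D) = \sum_{i=1}^{n} \log \frac{f_{q_i, D}}{|D|}
\]
```

With this maximum likelihood estimate, a single query term missing from the document makes the product zero and the logarithm undefined.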
IMPROVEMENT: SMOOTHING
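
The smoothing formulas on this slide were not preserved. Two common estimates are sketched below as an assumption about what was shown: Jelinek-Mercer interpolation and Dirichlet smoothing. The symbols f_{q,D} (frequency of q in D), c_q (frequency of q in the collection), |D| and |C| (document and collection lengths), λ and μ are introduced here for illustration.

```latex
\[
P_{\mathrm{JM}}(q \mid D) = (1 - \lambda)\, \frac{f_{q, D}}{|D|} + \lambda\, \frac{c_q}{|C|}
\]
\[
P_{\mathrm{Dir}}(q \mid D) = \frac{f_{q, D} + \mu\, \dfrac{c_q}{|C|}}{|D| + \mu}
\]
```

In both cases a term that is absent from the document still receives a small probability proportional to its collection frequency, so the query likelihood never drops to zero for terms that occur somewhere in the collection.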
QUERY LIKELIHOOD RANKING – SOME EXAMPLES

      gold    silver   truck   TOT
D1    -1.98   -4.70    -4.70   -11.38
D2    -4.70   -1.45    -2.10   -8.25
D3    -1.98   -4.70    -1.98   -8.66
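
The table above is consistent with natural logarithms and Jelinek-Mercer smoothing with λ = 0.1. This is an inference from the numbers, not something stated on the slide, and the three documents are again the assumed "gold silver truck" collection. A minimal sketch under those assumptions:

```python
import math
from collections import Counter

# Assumed toy collection and query (not preserved on the slide).
DOCS = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
}
QUERY = "gold silver truck"
LAMBDA = 0.1  # assumed weight of the collection model, inferred from the table


def query_log_likelihood(docs, query, lam=LAMBDA):
    """Return a dict mapping document name to log P(Q|D) with Jelinek-Mercer smoothing."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    collection = Counter(word for toks in tokens.values() for word in toks)
    collection_len = sum(collection.values())

    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        total = 0.0
        for q in query.lower().split():
            # Interpolate the document estimate with the collection estimate.
            # A query term absent from the whole collection would give p = 0
            # and make the logarithm undefined; that does not happen here.
            p = (1 - lam) * tf[q] / len(toks) + lam * collection[q] / collection_len
            total += math.log(p)
        scores[name] = total
    return scores


if __name__ == "__main__":
    for name, score in query_log_likelihood(DOCS, QUERY).items():
        print(name, round(score, 2))
```

Under these assumptions the totals match the table to within about 0.01, with D2 ranked first, then D3, then D1.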
QUERY LIKELIHOOD RANKING – COMMENTS

Frequency of "president"   Frequency of "lincoln"   Score    BM25 score
15                         25                       -10.53   20.66
25                         15                       -10.55   20.36
1                          25                       -12.99   18.2
25                         1                        -13.25   12.95
0                          25                       -14.40   15.66
15                         0                        -19.05   5.00
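
The query likelihood column can be approximately reproduced with Dirichlet smoothing. None of the statistics below appear on the slides: μ = 2000, a document length of 1,800 words, a collection of 10^9 words, and collection frequencies of 160,000 for "president" and 2,400 for "lincoln" are assumptions taken from the usual textbook version of this example.

```python
import math

# Assumed values from the usual textbook version of this example.
MU = 2000
DOC_LEN = 1800
COLLECTION_LEN = 10**9
COLLECTION_FREQ = {"president": 160_000, "lincoln": 2_400}


def query_log_likelihood(term_freqs):
    """log P(Q|D) with Dirichlet smoothing, given the document's term frequencies."""
    score = 0.0
    for term, tf in term_freqs.items():
        p_collection = COLLECTION_FREQ[term] / COLLECTION_LEN
        score += math.log((tf + MU * p_collection) / (DOC_LEN + MU))
    return score


if __name__ == "__main__":
    rows = [(15, 25), (25, 15), (1, 25), (25, 1), (0, 25), (15, 0)]
    for f_president, f_lincoln in rows:
        score = query_log_likelihood({"president": f_president, "lincoln": f_lincoln})
        print(f"{f_president:>3} {f_lincoln:>3} {score:7.2f}")
```

Under these assumptions the scores agree with the table to within about 0.05. The ordering of the rows (25, 1) and (0, 25) is where the two models disagree: query likelihood penalizes a completely missing term more heavily than BM25 does.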
IR BY COMPARING LANGUAGE MODELS
[Diagram: Document; Query; Set of relevant documents; DIVERGENCE; DOCUMENT RANK]

Word        Prob
america     0.2
lincoln     0.19
USA         0.10
president   0.09
…

Word        Prob
lincoln     0.3
president   0.3
america     0.15
president   0.15
…
COMPARING LANGUAGE MODELS
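
The content of this slide was not preserved. A common way to compare two language models is the Kullback-Leibler divergence, sketched below as an assumption about the measure used. The two small models are hypothetical and only illustrate the call.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) = sum over w of P(w) * log(P(w) / Q(w)).

    Both arguments map words to probabilities. Q must give a non-zero
    probability to every word with P(w) > 0, which is why the document
    model is usually smoothed before the comparison.
    """
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


if __name__ == "__main__":
    # Hypothetical two-word models, for illustration only.
    model_a = {"lincoln": 0.5, "president": 0.5}
    model_b = {"lincoln": 0.9, "president": 0.1}
    print(kl_divergence(model_a, model_b))
    print(kl_divergence(model_b, model_a))
```

The divergence is zero when the two models are identical and is not symmetric, so the order of the arguments matters.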
IR BY COMPARING LANGUAGE MODELS
HOW TO COMPUTE THE RELEVANCE MODEL?
HOW TO COMPUTE THE RELEVANCE MODEL (BETTER)?
Query likelihood ranking for document D (without the logarithm)
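
The formula on this slide was not preserved. One reading of the surviving note is that each document's word probabilities are weighted by that document's query likelihood (without the logarithm) and the result is normalized. The sketch below follows that reading. The toy collection, the unsmoothed estimate of P(w|D) and the Jelinek-Mercer smoothing with λ = 0.1 for the weights are all assumptions, and the code is not claimed to reproduce the figures in the example table.

```python
import math
from collections import Counter

# Assumed toy collection and query (not preserved on the slides).
DOCS = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
}
QUERY = "gold silver truck"


def relevance_model(docs, query, lam=0.1):
    """Estimate P(w|R) as the normalized sum over documents of P(w|D) * P(Q|D)."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    collection = Counter(word for toks in tokens.values() for word in toks)
    collection_len = sum(collection.values())
    query_terms = query.lower().split()

    unnormalized = Counter()
    for toks in tokens.values():
        tf = Counter(toks)
        # Document weight: query likelihood without the logarithm, smoothed
        # so that a missing query term does not force the weight to zero.
        weight = math.prod(
            (1 - lam) * tf[q] / len(toks) + lam * collection[q] / collection_len
            for q in query_terms
        )
        for word, count in tf.items():
            unnormalized[word] += (count / len(toks)) * weight

    total = sum(unnormalized.values())
    return {word: value / total for word, value in unnormalized.items()}


if __name__ == "__main__":
    model = relevance_model(DOCS, QUERY)
    for word, prob in sorted(model.items(), key=lambda item: -item[1]):
        print(f"{word:<10} {prob:.3f}")
```

Documents that explain the query well contribute more, so words frequent in the best-matching documents rise to the top of the model.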
HOW TO RANK THE DOCUMENTS?
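
The content of this slide was not preserved. A sketch of the usual argument, assuming documents are ranked by the negative KL divergence between the relevance model R and the document model D: the divergence splits into two sums, and only one depends on the document.

```latex
\[
-\mathrm{KL}(R \,\|\, D)
  = -\sum_{w} P(w \mid R) \log \frac{P(w \mid R)}{P(w \mid D)}
  = \sum_{w} P(w \mid R) \log P(w \mid D)
    - \sum_{w} P(w \mid R) \log P(w \mid R)
\]
\[
\mathrm{score}(D) = \sum_{w} P(w \mid R) \log P(w \mid D)
\]
```

The second sum is the same for every document, so dropping it leaves the ranking unchanged. P(w|D) has to be smoothed, otherwise any word of R missing from D makes the score undefined.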
COMPUTING THE RELEVANCE MODEL – EXAMPLE

w          P(w|D1)*weight(D1)   P(w|D2)*weight(D2)   P(w|D3)*weight(D3)   Tot      Tot norm
silver     0.000                -1.939               -0.808               -2.748   0.148
a          -0.615               -0.970               -0.808               -2.393   0.129
in         -0.615               -0.970               -0.808               -2.393   0.129
of         -0.615               -0.970               -0.808               -2.393   0.129
arrived    0.000                -0.970               -0.808               -1.778   0.096
truck      0.000                -0.970               -0.808               -1.778   0.096
gold       -0.615               0.000                -0.808               -1.423   0.077
Shipment   -0.615               0.000                -0.808               -1.423   0.077
delivery   0.000                -0.970               0.000                -0.970   0.052
damaged    -0.615               0.000                0.000                -0.615   0.033
fire       -0.615               0.000                0.000                -0.615   0.033
DIVERGENCE SCORE

w          P(w|R)   P(w|D1)   Log(P(w|R))*Log(P(w|D1))
silver     0.148    0.143     3.714
a          0.129    0.000     0.000
in         0.129    0.143     3.983
of         0.129    0.000     0.000
arrived    0.096    0.143     4.561
truck      0.096    0.143     4.561
gold       0.077    0.143     4.994
Shipment   0.077    0.143     4.994
delivery   0.052    0.143     5.741
damaged    0.033    0.000     0.000
fire       0.033    0.000     0.000
TOT                           32.547
COMPARING LANGUAGE MODELS – COMMENTS
HOW THE SCORE VARIES WITH DOCUMENT LENGTH (BM25)
[Figure: curves for L=0.5, L=0.9, L=1.4]
