NAME : DEEPALI RAIKAR ROLL NO : 11150157 MSC.IT(PART – I )
SIGNATURE FILES
A signature file is essentially a "bag of words" representation: it records which words occur in a document, not their order. Signature files are a technique for document retrieval. The main idea is to provide a quick way to reach the documents that match the queries passed by the user. This is done by creating a signature for each document.
A signature is an "abstraction" of a document: a compressed representation of its contents (not of a whole database). The signatures of all documents in the collection are kept in a file called the "signature file". Signatures are built with hash functions, so a query can be turned into a signature in the same way and compared against the file to retrieve candidate documents quickly.
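The quick filtering step can be sketched as follows. This is a minimal sketch, assuming every signature is a fixed-length bit string and the signature file is a plain list of (document id, signature) pairs that is scanned in order; the ids and bit strings are hypothetical sample data. A document survives only if its signature has a 1 wherever the query signature has a 1. Survivors may still be false drops (documents whose bits match by coincidence), so they must be checked against the documents themselves.

```python
# A signature file modelled as (document id, signature) pairs.
SignatureFile = list[tuple[int, str]]


def candidates(signature_file: SignatureFile, query_sig: str) -> list[int]:
    """Return the ids of documents whose signature has a 1 in every
    position where the query signature has a 1."""
    q = int(query_sig, 2)
    result = []
    for doc_id, sig in signature_file:
        if len(sig) != len(query_sig):
            raise ValueError("all signatures must have the same length")
        # The query bits must be a subset of the document bits.
        if int(sig, 2) & q == q:
            result.append(doc_id)
    return result


# Hypothetical sample signature file with 12-bit signatures.
signature_file = [
    (1, "111110111001"),
    (2, "010011000110"),
    (3, "111000111001"),
]
print(candidates(signature_file, "111000111001"))  # [1, 3]
```

Each test is a single AND and comparison per document, which is why scanning the signature file is cheap compared with reading the documents.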
Characteristics of signature files:
- Word-oriented index structure
- Low space overhead
- Suitable for moderately sized text collections
- Suitable for conventional databases
- For most applications, inverted files outperform signature files.
There are several types of signatures:
- Word signatures: a fixed-length bit-string representation of a word
- Document signatures
- Query signatures
How word signatures are generated, using "triplets" of the word:
1. Divide each word into overlapping triplets of characters.
2. Give each triplet a numeric value.
3. Use that number as the input to a hash function.
4. The hash function produces a number that represents the bit position of the triplet in the word signature; that bit is set to 1.
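The steps above can be sketched in Python. This is a minimal sketch, not the exact scheme from the slides: the 12-bit signature length, the "*" padding at both ends of the word, and the use of CRC-32 (`zlib.crc32`) as the triplet's numeric value are all assumptions, so the bits it produces will not match the slide's example.

```python
import zlib

SIG_BITS = 12  # assumed signature length F


def triplets(word: str) -> list[str]:
    """Split a word into overlapping character triplets, using '*' as an
    (assumed) boundary marker at both ends."""
    padded = f"*{word.upper()}*"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def triplet_value(triplet: str) -> int:
    """Assumed numeric value of a triplet: its CRC-32 checksum."""
    return zlib.crc32(triplet.encode("utf-8"))


def bit_position(value: int, f: int = SIG_BITS) -> int:
    """Hash a numeric value to a bit position in 1..f (1 = leftmost bit)."""
    return value % f + 1


def word_signature(word: str, f: int = SIG_BITS) -> str:
    """Set the bit chosen for each triplet; collisions simply reuse a bit."""
    bits = ["0"] * f
    for t in triplets(word):
        bits[bit_position(triplet_value(t), f) - 1] = "1"
    return "".join(bits)


print(triplets("signature"))
# ['*SI', 'SIG', 'IGN', 'GNA', 'NAT', 'ATU', 'TUR', 'URE', 'RE*']
print(word_signature("signature"))  # some 12-bit string of 0s and 1s
```

Any function that maps a triplet deterministically to a position would do; CRC-32 is used only because it ships with Python's standard library.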
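Example of a word signature
The word "SIGNATURE" is split into the triplets below ("*" marks the word boundary). The number given for each triplet acts as its bit position, counted from 1 at the left of a 12-bit signature:

Triplet: RE*  *SI  SIG  IGN  GNA  NAT  ATU  TUR  URE
Value:   12   3    7    3    2    9    1    12   8

Setting bits 1, 2, 3, 7, 8, 9 and 12 gives the final word signature 111000111001. Two positions (3 and 12) are each hit by two triplets, so only 7 of the 12 bits are set.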
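The slide's example can be checked mechanically. This sketch takes the nine numbers from the example as bit positions (1 = leftmost bit of a 12-bit signature) and sets each one; the function name is hypothetical.

```python
def signature_from_positions(positions: list[int], f: int = 12) -> str:
    """Build an f-bit signature by setting each (1-based) bit position."""
    bits = ["0"] * f
    for p in positions:
        if not 1 <= p <= f:
            raise ValueError(f"bit position {p} is outside 1..{f}")
        bits[p - 1] = "1"
    return "".join(bits)


# Numbers for the nine triplets of "SIGNATURE", in the order given above.
positions = [12, 3, 7, 3, 2, 9, 1, 12, 8]
sig = signature_from_positions(positions)
print(sig)             # 111000111001
print(sig.count("1"))  # 7, because positions 3 and 12 each appear twice
```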
Document signatures
A document signature can be created using two methods:
- Concatenation of the word signatures
- Superimposed coding (bitwise OR of the word signatures)

Characteristics of document signatures:
- The length can vary.
- A fixed number of bits may precede the signature.
- The length of document signatures can be fixed: it can be set to that of the longest document in the collection, and shorter documents are padded with extra "0" bits.
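Both construction methods and the zero-padding rule can be sketched on bit strings. This is a minimal sketch under stated assumptions: signatures are strings of "0"/"1", padding is appended at the right, and the second word signature in the usage example is hypothetical.

```python
def concatenate(word_sigs: list[str]) -> str:
    """Method 1: join word signatures end to end (length grows with the document)."""
    return "".join(word_sigs)


def superimpose(word_sigs: list[str]) -> str:
    """Method 2 (superimposed coding): OR the word signatures bit by bit."""
    if not word_sigs:
        raise ValueError("need at least one word signature")
    f = len(word_sigs[0])
    if any(len(s) != f for s in word_sigs):
        raise ValueError("superimposed coding needs equal-length signatures")
    return "".join(
        "1" if any(s[i] == "1" for s in word_sigs) else "0" for i in range(f)
    )


def pad_to_longest(doc_sigs: list[str]) -> list[str]:
    """Fix the length at the longest signature by appending '0' bits."""
    longest = max(len(s) for s in doc_sigs)
    return [s.ljust(longest, "0") for s in doc_sigs]


words = ["111000111001", "000110000001"]  # second signature is hypothetical
print(superimpose(words))              # 111110111001
print(concatenate(words))              # 111000111001000110000001
print(pad_to_longest(["101", "11110"]))  # ['10100', '11110']
```

Superimposed coding keeps every document signature at the same length F, while concatenation preserves each word's bits separately at the cost of variable length, which is why padding is needed to fix the length.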
[Figure: example of a signature file]
Which is better: inverted file or signature file?
- Inverted files: accurate; easy to maintain.
- Signature files: slow retrieval, since the whole signature file must be scanned.
Inverted files are the most popular storage structure for information retrieval.
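For comparison, a minimal inverted file can be sketched as a map from each word to the documents that contain it. This assumes whitespace tokenization and AND semantics for multi-word queries, and the sample documents are hypothetical. Because it intersects exact lists of document ids, it never returns false drops, unlike a signature scan.

```python
from collections import defaultdict


def build_inverted_file(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each (lower-cased) word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the documents containing every query word (AND query)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


docs = {
    1: "signature files for text",
    2: "inverted files for retrieval",
    3: "text retrieval",
}
index = build_inverted_file(docs)
print(lookup(index, "files text"))  # {1}
print(lookup(index, "retrieval"))   # {2, 3}
```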