Henrique Tiggemann.et al. Int. Journal of Engineering Research and Application www.ijera.com
ISSN : 2248-9622, Vol. 6, Issue 7, ( Part -4) July 2016, pp.20-22
Document Retrieval System, a Case Study
Mahmood Alfathe, Safa Al-Taie
Electrical and Computer Engineering Department, Florida Institute of Technology, Melbourne, FL 32901, USA
ABSTRACT
In this work we propose a method for automatic indexing and retrieval. Given an input query, the method
returns the document most likely to be related to it. The technique used in this project is known as
singular-value decomposition: a large term-by-document matrix is analyzed and decomposed into 100
factors. Documents are represented by 100-item vectors of factor weights. Queries, on the other hand,
are represented as pseudo-document vectors, which are formed from weighted combinations of terms.
Keywords–Indexing, Retrieval, Speech, Text
I. INTRODUCTION
This approach tries to overcome the
problems of term-matching retrieval by treating the
unreliable term-document association data
statistically. The proposed method assumes that there
is some underlying latent semantic structure in the
data that may have been obscured by the random
distribution of words. This latent structure is
estimated using statistical techniques that eliminate
the "noise".
The Latent Semantic Indexing (LSI) technique
used in this project is based on singular-value
decomposition. We construct a "semantic
space" from a large matrix of term-document
association data, in which documents and terms that
are closely associated are placed near one another.
Singular-value decomposition lets us arrange the
space so that it reflects the major associative
patterns in the data and ignores the minor, less
important effects. As a result, terms that do not
appear in a document may still end up close to that
document. Position in the space then serves as a
kind of indexing: the position of a term relative to
a document in the space determines whether the
term is close enough to the document or not.
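The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the toy term-document matrix, the helper names `lsi_fit`, `fold_in_query` and `cosine`, and the choice of k = 2 factors (instead of the 100 factors used in this work) are assumptions made for the example. The query is folded in as a pseudo-document with the commonly used projection q_hat = q^T U_k S_k^{-1} and compared with the document vectors by cosine similarity.

```python
import numpy as np

# Toy data (hypothetical): rows are terms, columns are documents,
# entries are raw term counts.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 1, 0],   # engine
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

def lsi_fit(X, k):
    """Truncated SVD: keep only the k largest singular factors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each row of the third result is one document's k factor weights.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(q, U_k, s_k):
    """Project a term-count query vector into the k-factor space."""
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

U_k, s_k, D_k = lsi_fit(X, k=2)
q = np.array([1, 0, 0, 0, 0], dtype=float)   # the one-word query "car"
q_hat = fold_in_query(q, U_k, s_k)
sims = [cosine(q_hat, d) for d in D_k]
print(sims)
```

In this toy collection the second document never contains "car", yet it lies close to the query because it shares "engine" with documents that do; the unrelated fourth document stays far away.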
Insufficiency Of Current Automatic Indexing And
Retrieval Methods
A fundamental deficiency of current
information retrieval methods is that the words
searchers use often are not the same as those by
which the information they seek has been indexed.
There are actually two sides to the issue; we will call
them broadly synonymy and polysemy. We use
synonymy in a very general sense to describe the
fact that there are many ways to refer to the same
object. Users in different contexts, or with different
needs, knowledge, or linguistic habits will describe
the same information using different terms. Indeed,
we have found that the degree of variability in
descriptive term usage is much greater than is
commonly suspected. For example, two people
choose the same main key word for a single well-
known object less than 20% of the time [5].
Comparably poor agreement has been reported in
studies of inter-indexer consistency [6] and in the
generation of search terms by either expert
intermediaries [7] or less experienced searchers [8]
[9]. The prevalence of synonyms tends to decrease
the "recall" performance of retrieval systems. By
polysemy we refer to the general fact that most
words have more than one distinct meaning
(homography). In different contexts or when used by
different people the same term (e.g. "chip") takes on
varying referential significance. Thus the use of a
term in a search query does not necessarily mean
that a document containing or labeled by the same
term is of interest. Polysemy is one factor underlying
poor "precision". The failure of current automatic
indexing to overcome these problems can be largely
traced to three factors. The first factor is that the way
index terms are identified is incomplete. The terms
used to describe or index a document typically
contain only a fraction of the terms that users as a
group will try to look it up under. This is partly
because the documents themselves do not contain all
the terms users will apply, and sometimes because
term selection procedures intentionally omit many of
the terms in a document. Attempts to deal with the
synonymy problem have relied on intellectual or
automatic term expansion, or the construction of a
thesaurus. These are presumably advantageous for
conscientious and knowledgeable searchers who can
use such tools to suggest additional search terms.
The drawback for fully automatic methods is that
some added terms may have different meaning from
that intended (the polysemy effect) leading to rapid
degradation of precision [1]. It is worth noting in
passing that experiments with small interactive data
bases have shown monotonic improvements in recall
rate without overall loss of precision as more
indexing terms, either taken from the documents or
from large samples of actual users' words are added
[2] [3]. Whether this "unlimited aliasing" method,
which we have described elsewhere, will be
effective in very large data bases remains to be
determined. Not only is there a potential issue of
ambiguity and lack of precision, but the problem of
identifying index terms that are not in the text of
documents grows cumbersome. This was one of the
motives for the approach to be described here. The
second factor is the lack of an adequate automatic
method for dealing with polysemy. One common
approach is the use of controlled vocabularies and
human intermediaries to act as translators. Not only
is this solution extremely expensive, but it is not
necessarily effective. Another approach is to allow
Boolean intersection or coordination with other
terms to disambiguate meaning. Success is severely
hampered by users' inability to think of appropriate
limiting terms if they do exist, and by the fact that
such terms may not occur in the documents or may
not have been included in the indexing. The third
factor is somewhat more technical, having to do with
the way in which current automatic indexing and
retrieval systems actually work. In such systems
each word type is treated as independent of any
other [4]. Thus matching (or not) both of two terms
that almost always occur together is counted as
heavily as matching two that are rarely found in the
same document. Thus the scoring of success, in
either straight Boolean or coordination level
searches, fails to take redundancy into account,
and as a result may distort results to an unknown
degree. This problem exacerbates a user's difficulty
in using compound-term queries effectively to
expand or limit a search.
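The independence assumption can be made concrete with a small sketch. The function names `coordination_level` and `boolean_and` and the toy documents are hypothetical; the scoring rule shown (count the query terms present in the document) is the usual reading of coordination level matching and is used here only for illustration.

```python
def coordination_level(query_terms, doc_terms):
    """Score a document by the number of distinct query terms it contains."""
    return len(set(query_terms) & set(doc_terms))

def boolean_and(query_terms, doc_terms):
    """Straight Boolean AND: match only if every query term occurs."""
    return set(query_terms) <= set(doc_terms)

# Hypothetical documents, reduced to their sets of index terms.
doc_a = {"new", "york", "weather"}
doc_b = {"solar", "banking", "report"}

# "new" and "york" almost always occur together, so the second match adds
# little evidence; "solar" and "banking" rarely co-occur, so matching both
# is far more informative. Neither scoring rule sees this difference.
score_a = coordination_level(["new", "york"], doc_a)
score_b = coordination_level(["solar", "banking"], doc_b)
match_a = boolean_and(["new", "york"], doc_a)
match_b = boolean_and(["solar", "banking"], doc_b)
print(score_a, score_b, match_a, match_b)
```

Both documents receive the same score and the same Boolean outcome, although the redundant pair of terms carries much less information than the independent pair.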
II. THE CHOICE OF METHOD FOR
UNCOVERING LATENT SEMANTIC
STRUCTURE
The goal is to find and fit a useful model of
the relationships between terms and documents. We
want to use the matrix of observed occurrences of
terms applied to documents to estimate parameters
of that underlying model. With the resulting model
we can then estimate what the observed occurrences
really should have been. In this way, for example,
we might predict that a given term should be
associated with a document, even though, because of
variability in word use, no such association was
observed. The first question is what sort of model to
choose. A notion of semantic similarity, between
documents and between terms, seems central to
modeling the patterns of term usage across
documents. This led us to restrict consideration to
proximity models, i.e., models that try to put similar
items near each other in some space or structure.
Such models include: hierarchical, partition and
overlapping clustering; ultra-metric and additive
trees; and factor-analytic and multidimensional
distance models, as shown in the survey in [10].
Aiding information retrieval by discovering latent
proximity structure has at least two lines of
precedent in the literature. Hierarchical
classification analyses are frequently used for term
and document clustering [11]. Latent class analysis
and factor analysis have also been explored before
for automatic document indexing and retrieval.
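One way to make the idea of estimating what the observed occurrences "really should have been" concrete is a low-rank reconstruction of the term-document matrix. The sketch below uses a truncated singular-value decomposition as the model, in line with the technique named in the abstract; the toy matrix, the function name `low_rank_estimate`, and k = 2 are assumptions chosen for illustration.

```python
import numpy as np

def low_rank_estimate(X, k):
    """Rebuild X from its k largest singular factors only."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Toy data (hypothetical): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 1, 0],   # engine
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

X_hat = low_rank_estimate(X, k=2)

# Observed versus estimated association for two unobserved pairs.
print("automobile / doc 0:", X[1, 0], "->", round(X_hat[1, 0], 3))
print("flower     / doc 0:", X[3, 0], "->", round(X_hat[3, 0], 3))
```

Here "automobile" was never observed in the first document, but the estimate assigns it a clearly positive weight because that document shares "engine" with documents that do use the term, while "flower" remains unassociated.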
III. CONCLUSION
Assume there is a set of "correct answers"
to a query. The documents in this set are called
relevant to the query. The set of documents
returned by the system is called the retrieved
documents. Precision is the percentage of the
retrieved documents that are relevant. Recall is
the percentage of all relevant documents that are
retrieved. Noisy input is a common problem,
especially with OCR techniques, where errors
such as misspellings can occur. The suggested
method is intended to cope with such input by
recovering the correct spelling of letters or words.
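The two measures defined above can be computed directly from the retrieved and relevant sets. The function name `precision_recall` and the document identifiers below are hypothetical and serve only as an example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions between 0 and 1."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)   # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result and ground truth.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this example two of the four retrieved documents are relevant, giving a precision of 0.5, and two of the three relevant documents are retrieved, giving a recall of about 0.67.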
REFERENCES
[1]. Tougas, Jane E. and Raymond J. Spiteri.
"Updating The Partial Singular Value
Decomposition In Latent Semantic
Indexing". Computational Statistics & Data
Analysis 52.1 (2007): 174-183. Web.
[2]. Aswani Kumar, Ch., M. Radvansky, and J.
Annapurna. "Analysis Of A Vector Space
Model, Latent Semantic Indexing And
Formal Concept Analysis For Information
Retrieval". Cybernetics and Information
Technologies 12.1 (2012): n. pag. Web.
[3]. Atreya, Avinash and Charles Elkan. "Latent
Semantic Indexing (LSI) Fails For TREC
Collections". SIGKDD Explor. Newsl. 12.2
(2011): 5. Web.
[4]. Phadnis, Neelam and Jayant Gadge.
"Framework For Document Retrieval Using
Latent Semantic Indexing". International
Journal of Computer Applications 94.14
(2014): 37-41. Web.
[5]. Papadimitriou, Christos H. et al. "Latent
Semantic Indexing: A Probabilistic
Analysis". Journal of Computer and System
Sciences 61.2 (2000): 217-235. Web.
[6]. Landauer, Thomas K. Handbook Of Latent
Semantic Analysis. New York: Routledge,
2011. Print.
[7]. Foltz, P. W. "Using Latent Semantic
Indexing For Information
Filtering". SIGOIS Bull. 11.2-3 (1990): 40-
47. Web.
[8]. Aswani Kumar, Ch., M. Radvansky, and J.
Annapurna. "Analysis Of A Vector Space
Model, Latent Semantic Indexing And
Formal Concept Analysis For Information
Retrieval". Cybernetics and Information
Technologies 12.1 (2012): n. pag. Web.
[9]. Papadimitriou, Christos H. et al. "Latent
Semantic Indexing: A Probabilistic
Analysis". Journal of Computer and System
Sciences 61.2 (2000): 217-235. Web.
[10]. "Latent Semantic Indexing Analysis Of K-
Means Document Clustering For Changing
Index Terms Weighting". The KIPS
Transactions: Part B 10B.7 (2003): 735-742.
Web.
[11]. Martínez-Torres, M. R. "Content Analysis
Of Open Innovation Communities Using
Latent Semantic Indexing". Technology
Analysis & Strategic Management 27.7
(2015): 859-875. Web.

Contenu connexe

Tendances

Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm IJECEIAES
 
P036401020107
P036401020107P036401020107
P036401020107theijes
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
 
an efficient approach for co extracting opinion targets based in online revie...
an efficient approach for co extracting opinion targets based in online revie...an efficient approach for co extracting opinion targets based in online revie...
an efficient approach for co extracting opinion targets based in online revie...INFOGAIN PUBLICATION
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarityrahulmonikasharma
 
Framework for opinion as a service on review data of customer using semantics...
Framework for opinion as a service on review data of customer using semantics...Framework for opinion as a service on review data of customer using semantics...
Framework for opinion as a service on review data of customer using semantics...IJECEIAES
 
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...Aleksi Aaltonen
 
Query formulation process
Query formulation processQuery formulation process
Query formulation processmalathimurugan
 
A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...
A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...
A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...IJARBEST JOURNAL
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search EngineIRJET Journal
 
Review of Various Text Categorization Methods
Review of Various Text Categorization MethodsReview of Various Text Categorization Methods
Review of Various Text Categorization Methodsiosrjce
 
Sentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersSentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersMOHDSAIFWAJID1
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationIJMER
 

Tendances (19)

Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm
 
P036401020107
P036401020107P036401020107
P036401020107
 
A0210110
A0210110A0210110
A0210110
 
Cl4201593597
Cl4201593597Cl4201593597
Cl4201593597
 
N15-1013
N15-1013N15-1013
N15-1013
 
Ijetcas14 446
Ijetcas14 446Ijetcas14 446
Ijetcas14 446
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with Idioms
 
D1802023136
D1802023136D1802023136
D1802023136
 
an efficient approach for co extracting opinion targets based in online revie...
an efficient approach for co extracting opinion targets based in online revie...an efficient approach for co extracting opinion targets based in online revie...
an efficient approach for co extracting opinion targets based in online revie...
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarity
 
Framework for opinion as a service on review data of customer using semantics...
Framework for opinion as a service on review data of customer using semantics...Framework for opinion as a service on review data of customer using semantics...
Framework for opinion as a service on review data of customer using semantics...
 
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
 
Query formulation process
Query formulation processQuery formulation process
Query formulation process
 
A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...
A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...
A SURVEY ON QUESTION AND ANSWER SYSTEM BY RETRIEVING THE DESCRIPTIONS USING L...
 
In3415791583
In3415791583In3415791583
In3415791583
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search Engine
 
Review of Various Text Categorization Methods
Review of Various Text Categorization MethodsReview of Various Text Categorization Methods
Review of Various Text Categorization Methods
 
Sentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersSentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clusters
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 

En vedette

Structural Design and Rehabilitation of Reinforced Concrete Structure
Structural Design and Rehabilitation of Reinforced Concrete StructureStructural Design and Rehabilitation of Reinforced Concrete Structure
Structural Design and Rehabilitation of Reinforced Concrete StructureIJERA Editor
 
Development and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI ImagesDevelopment and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI ImagesIJERA Editor
 
An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...
An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...
An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...IJERA Editor
 
Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...
Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...
Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...IJERA Editor
 
Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...
Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...
Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...IJERA Editor
 
A Review of Intel Galileo Development Board’s Technology
A Review of Intel Galileo Development Board’s TechnologyA Review of Intel Galileo Development Board’s Technology
A Review of Intel Galileo Development Board’s TechnologyIJERA Editor
 
Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...
Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...
Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...IJERA Editor
 
Model formulation and Analysis of Total Weight of Briquettes after mixing for...
Model formulation and Analysis of Total Weight of Briquettes after mixing for...Model formulation and Analysis of Total Weight of Briquettes after mixing for...
Model formulation and Analysis of Total Weight of Briquettes after mixing for...IJERA Editor
 
Number of iterations needed in Monte Carlo Simulation using reliability analy...
Number of iterations needed in Monte Carlo Simulation using reliability analy...Number of iterations needed in Monte Carlo Simulation using reliability analy...
Number of iterations needed in Monte Carlo Simulation using reliability analy...IJERA Editor
 
Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...
Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...
Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...IJERA Editor
 
Direct Kinematic modeling of 6R Robot using Robotics Toolbox
Direct Kinematic modeling of 6R Robot using Robotics ToolboxDirect Kinematic modeling of 6R Robot using Robotics Toolbox
Direct Kinematic modeling of 6R Robot using Robotics ToolboxIJERA Editor
 
Investigation of Different Types of Cement Material on Thermal Properties of ...
Investigation of Different Types of Cement Material on Thermal Properties of ...Investigation of Different Types of Cement Material on Thermal Properties of ...
Investigation of Different Types of Cement Material on Thermal Properties of ...IJERA Editor
 
Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...
Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...
Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...IJERA Editor
 
Collation of Mobile operatives
Collation of Mobile operativesCollation of Mobile operatives
Collation of Mobile operativesIJERA Editor
 
An Evaluation System of Surface Water Quality in Algeria (Application on the ...
An Evaluation System of Surface Water Quality in Algeria (Application on the ...An Evaluation System of Surface Water Quality in Algeria (Application on the ...
An Evaluation System of Surface Water Quality in Algeria (Application on the ...IJERA Editor
 
Data Ware House System in Cloud Environment
Data Ware House System in Cloud EnvironmentData Ware House System in Cloud Environment
Data Ware House System in Cloud EnvironmentIJERA Editor
 
Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...
Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...
Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...IJERA Editor
 
Effect of Petrophysical Parameters on Water Saturation in Carbonate Formation
Effect of Petrophysical Parameters on Water Saturation in Carbonate FormationEffect of Petrophysical Parameters on Water Saturation in Carbonate Formation
Effect of Petrophysical Parameters on Water Saturation in Carbonate FormationIJERA Editor
 

En vedette (18)

Structural Design and Rehabilitation of Reinforced Concrete Structure
Structural Design and Rehabilitation of Reinforced Concrete StructureStructural Design and Rehabilitation of Reinforced Concrete Structure
Structural Design and Rehabilitation of Reinforced Concrete Structure
 
Development and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI ImagesDevelopment and Comparison of Image Fusion Techniques for CT&MRI Images
Development and Comparison of Image Fusion Techniques for CT&MRI Images
 
An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...
An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...
An Improved Bandwidth for Electromagnetic Gap Coupled Rhombus Shaped Microstr...
 
Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...
Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...
Evaluation of Effect of Lateral Forces on Multi-Storeyed Rcc Frame by Conside...
 
Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...
Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...
Experimental studies on PCM filled Flat Plate Solar Water Heater without and ...
 
A Review of Intel Galileo Development Board’s Technology
A Review of Intel Galileo Development Board’s TechnologyA Review of Intel Galileo Development Board’s Technology
A Review of Intel Galileo Development Board’s Technology
 
Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...
Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...
Determination of Proton Energy and Dosage to Obtain SOBP Curve in the Proton ...
 
Model formulation and Analysis of Total Weight of Briquettes after mixing for...
Model formulation and Analysis of Total Weight of Briquettes after mixing for...Model formulation and Analysis of Total Weight of Briquettes after mixing for...
Model formulation and Analysis of Total Weight of Briquettes after mixing for...
 
Number of iterations needed in Monte Carlo Simulation using reliability analy...
Number of iterations needed in Monte Carlo Simulation using reliability analy...Number of iterations needed in Monte Carlo Simulation using reliability analy...
Number of iterations needed in Monte Carlo Simulation using reliability analy...
 
Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...
Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...
Transport properties of Gum mediated synthesis of Indium Oxide (In2O3) Nano f...
 
Direct Kinematic modeling of 6R Robot using Robotics Toolbox
Direct Kinematic modeling of 6R Robot using Robotics ToolboxDirect Kinematic modeling of 6R Robot using Robotics Toolbox
Direct Kinematic modeling of 6R Robot using Robotics Toolbox
 
Investigation of Different Types of Cement Material on Thermal Properties of ...
Investigation of Different Types of Cement Material on Thermal Properties of ...Investigation of Different Types of Cement Material on Thermal Properties of ...
Investigation of Different Types of Cement Material on Thermal Properties of ...
 
Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...
Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...
Mechanistic Aspects of Oxidation of P-Bromoacetophen one by Hexacyanoferrate ...
 
Collation of Mobile operatives
Collation of Mobile operativesCollation of Mobile operatives
Collation of Mobile operatives
 
An Evaluation System of Surface Water Quality in Algeria (Application on the ...
An Evaluation System of Surface Water Quality in Algeria (Application on the ...An Evaluation System of Surface Water Quality in Algeria (Application on the ...
An Evaluation System of Surface Water Quality in Algeria (Application on the ...
 
Data Ware House System in Cloud Environment
Data Ware House System in Cloud EnvironmentData Ware House System in Cloud Environment
Data Ware House System in Cloud Environment
 
Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...
Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...
Comparison of Shell and Tube Heat Exchanger using Theoretical Methods, HTRI, ...
 
Effect of Petrophysical Parameters on Water Saturation in Carbonate Formation
Effect of Petrophysical Parameters on Water Saturation in Carbonate FormationEffect of Petrophysical Parameters on Water Saturation in Carbonate Formation
Effect of Petrophysical Parameters on Water Saturation in Carbonate Formation
 

Similaire à Document Retrieval System, a Case Study

Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewINFOGAIN PUBLICATION
 
Challenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document ClusteringChallenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document ClusteringIOSR Journals
 
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : A C...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : A C...ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : A C...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : A C...IJNSA Journal
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learningcsandit
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplicationidescitation
 
International Journal of Computational Engineering Research(IJCER)
Document Retrieval System, a Case Study

Henrique Tiggemann et al. Int. Journal of Engineering Research and Application, www.ijera.com
ISSN: 2248-9622, Vol. 6, Issue 7, (Part -4) July 2016, pp. 20-22

Mahmood Alfathe, Safa Al-Taie
Electrical and Computer Engineering Department, Florida Institute of Technology, Melbourne, FL 32901, USA

ABSTRACT
In this work we propose a method for automatic indexing and retrieval. The method returns the document most likely to be related to the input query. The technique used in this project is known as singular-value decomposition: a large term-by-document matrix is analyzed and decomposed into 100 factors. Documents are represented by 100-item vectors of factor weights. Queries, on the other hand, are represented as pseudo-document vectors, which are formed from weighted combinations of terms.
Keywords–Indexing, Retrieval, Speech, Text

I. INTRODUCTION
This approach tries to overcome the problems of term-matching retrieval by statistically treating the unreliable term-document association data. The proposed method assumes that there is some underlying latent semantic structure in the data that may have been obscured by the random distribution of words. This latent structure is estimated by using statistical techniques to eliminate the “noise”.
The Latent Semantic Indexing (LSI) technique used in this project is the singular-value decomposition. We construct a “semantic space” from a massive matrix of term-document association data, in which documents and terms that are closely associated are placed near each other. With the singular-value decomposition we can arrange the space to reflect the major associative patterns of the data and ignore the minor, less important effects. As a result, terms that do not appear in a document may still end up close to that document in the space. The position in the space serves as a kind of indexing.
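The decomposition and query representation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the toy term-by-document matrix, the choice of k = 2 factors (the paper uses 100), and the helper names `fold_in_query`, `cosine`, and `rank_documents` are all hypothetical. The fold-in step uses the common LSI formulation, in which a query's term vector is projected through the truncated term factors and scaled by the inverse singular values; the source does not spell out its exact formula.

```python
import numpy as np

# Toy term-by-document count matrix (rows: terms, columns: documents).
# The data are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

k = 2  # number of retained factors (assumption; the paper keeps 100)

# Thin SVD, then keep only the k largest singular values and their vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document is a k-item vector of factor weights (one row per document).
doc_vectors = Vt_k.T


def fold_in_query(query_terms):
    """Represent a query as a pseudo-document vector in the factor space."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_documents(query_terms):
    """Return document indices sorted by decreasing similarity, plus scores."""
    q_hat = fold_in_query(query_terms)
    scores = [cosine(q_hat, d) for d in doc_vectors]
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranking, scores


ranking, scores = rank_documents({"car"})
print(ranking, scores)
```

In this toy setup, document 1 contains "automobile" and "engine" but not "car", yet it still receives a high similarity score for the query "car", while the unrelated document 3 scores near zero. That is the behaviour the text attributes to the semantic space: a term can end up close to a document it never appears in.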
In the end, the position of a term in the space determines whether it is close enough to a document or not.

Insufficiency Of Current Automatic Indexing And Retrieval Methods
A fundamental deficiency of current information retrieval methods is that the words searchers use are often not the same as those by which the information they seek has been indexed. There are two sides to the issue; we will call them broadly synonymy and polysemy.
We use synonymy in a very general sense to describe the fact that there are many ways to refer to the same object. Users in different contexts, or with different needs, knowledge, or linguistic habits, will describe the same information using different terms. Indeed, we have found that the degree of variability in descriptive term usage is much greater than is commonly suspected. For example, two people choose the same main key word for a single well-known object less than 20% of the time [5]. Comparably poor agreement has been reported in studies of inter-indexer consistency [6] and in the generation of search terms by either expert intermediaries [7] or less experienced searchers [8] [9]. The prevalence of synonyms tends to decrease the "recall" performance of retrieval systems.
By polysemy we refer to the general fact that most words have more than one distinct meaning (homography). In different contexts, or when used by different people, the same term (e.g. "chip") takes on varying referential significance. Thus the use of a term in a search query does not necessarily mean that a document containing or labeled by the same term is of interest. Polysemy is one factor underlying poor "precision".
The failure of current automatic indexing to overcome these problems can be largely traced to three factors. The first factor is that the way index terms are identified is incomplete.
The terms used to describe or index a document typically contain only a fraction of the terms that users as a group will try to look it up under. This is partly because the documents themselves do not contain all the terms users will apply, and sometimes because term selection procedures intentionally omit many of the terms in a document. Attempts to deal with the synonymy problem have relied on intellectual or automatic term expansion, or the construction of a thesaurus. These are presumably advantageous for conscientious and knowledgeable searchers who can use such tools to suggest additional search terms. The drawback for fully automatic methods is that some added terms may have a different meaning from that intended (the polysemy effect), leading to rapid degradation of precision [1].
It is worth noting in passing that experiments with small interactive databases have shown monotonic improvements in recall rate without overall loss of precision as more indexing terms, either taken from the documents or from large samples of actual users' words, are added [2] [3]. Whether this "unlimited aliasing" method, which we have described elsewhere, will be effective in very large databases remains to be determined. Not only is there a potential issue of ambiguity and lack of precision, but the problem of identifying index terms that are not in the text of documents grows cumbersome. This was one of the motives for the approach to be described here.
The second factor is the lack of an adequate automatic method for dealing with polysemy. One common approach is the use of controlled vocabularies and human intermediaries to act as translators. Not only is this solution extremely expensive, but it is not necessarily effective. Another approach is to allow Boolean intersection or coordination with other terms to disambiguate meaning. Success is severely hampered by users' inability to think of appropriate limiting terms, if they do exist, and by the fact that such terms may not occur in the documents or may not have been included in the indexing.
The third factor is somewhat more technical, having to do with the way in which current automatic indexing and retrieval systems actually work. In such systems each word type is treated as independent of any other [4]. Thus matching (or not) both of two terms that almost always occur together is counted as heavily as matching two that are rarely found in the same document. The scoring of success, in either straight Boolean or coordination-level searches, therefore fails to take redundancy into account, and as a result may distort results to an unknown degree.
This problem exacerbates a user's difficulty in using compound-term queries effectively to expand or limit a search.

II. THE CHOICE OF METHOD FOR UNCOVERING LATENT SEMANTIC STRUCTURE
The goal is to find and fit a useful model of the relationships between terms and documents. We want to use the matrix of observed occurrences of terms applied to documents to estimate the parameters of that underlying model. With the resulting model we can then estimate what the observed occurrences really should have been. In this way, for example, we might predict that a given term should be associated with a document, even though, because of variability in word use, no such association was observed.
The first question is what sort of model to choose. A notion of semantic similarity, between documents and between terms, seems central to modeling the patterns of term usage across documents. This led us to restrict consideration to proximity models, i.e., models that try to put similar items near each other in some space or structure. Such models include hierarchical, partition and overlapping clustering; ultra-metric and additive trees; and factor-analytic and multidimensional distance models, as shown in the survey in [10].
Aiding information retrieval by discovering latent proximity structure has at least two lines of precedent in the literature. Hierarchical classification analyses are frequently used for term and document clustering [11]. Latent class analysis and factor analysis have also been explored before for automatic document indexing and retrieval.

III. CONCLUSION
Assume there is a set of "correct answers" to the query. The documents in this set are called relevant to the query. The set of documents returned by the system are called retrieved documents. Precision is the percentage of the retrieved documents that are relevant. Recall is the percentage of all relevant documents that are retrieved.
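The two measures defined above reduce to simple set arithmetic. The following is a minimal sketch, assuming documents are identified by hashable IDs; the function name `precision_recall` and the example IDs are hypothetical, and the values are returned as fractions rather than percentages.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: iterable of document IDs returned by the system
    relevant:  iterable of document IDs that are correct answers
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Example: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, r)
```

Returning 0.0 when either set is empty is a convention chosen here to avoid division by zero; other conventions are possible.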
Noisy input is a common problem, especially with OCR techniques, where many errors such as misspellings can occur; the suggested method is expected to recover the correctly spelled letters or words.

REFERENCES
[1]. Tougas, Jane E. and Raymond J. Spiteri. "Updating The Partial Singular Value Decomposition In Latent Semantic Indexing". Computational Statistics & Data Analysis 52.1 (2007): 174-183. Web.
[2]. Aswani Kumar, Ch., M. Radvansky, and J. Annapurna. "Analysis Of A Vector Space Model, Latent Semantic Indexing And Formal Concept Analysis For Information Retrieval". Cybernetics and Information Technologies 12.1 (2012): n. pag. Web.
[3]. Atreya, Avinash and Charles Elkan. "Latent Semantic Indexing (LSI) Fails For TREC Collections". SIGKDD Explor. Newsl. 12.2 (2011): 5. Web.
[4]. Phadnis, Neelam and Jayant Gadge. "Framework For Document Retrieval Using Latent Semantic Indexing". International Journal of Computer Applications 94.14 (2014): 37-41. Web.
[5]. Papadimitriou, Christos H. et al. "Latent Semantic Indexing: A Probabilistic Analysis". Journal of Computer and System Sciences 61.2 (2000): 217-235. Web.
[6]. Landauer, Thomas K. Handbook Of Latent Semantic Analysis. New York: Routledge, 2011. Print.
[7]. Foltz, P. W. "Using Latent Semantic Indexing For Information Filtering". SIGOIS Bull. 11.2-3 (1990): 40-47. Web.
[8]. Aswani Kumar, Ch., M. Radvansky, and J. Annapurna. "Analysis Of A Vector Space Model, Latent Semantic Indexing And Formal Concept Analysis For Information Retrieval". Cybernetics and Information Technologies 12.1 (2012): n. pag. Web.
[9]. Papadimitriou, Christos H. et al. "Latent Semantic Indexing: A Probabilistic Analysis". Journal of Computer and System Sciences 61.2 (2000): 217-235. Web.
[10]. "Latent Semantic Indexing Analysis Of K-Means Document Clustering For Changing Index Terms Weighting". The KIPS Transactions: Part B 10B.7 (2003): 735-742. Web.
[11]. Martínez-Torres, M. R. "Content Analysis Of Open Innovation Communities Using Latent Semantic Indexing". Technology Analysis & Strategic Management 27.7 (2015): 859-875. Web.