International Association of Scientific Innovation and Research (IASIR)
(An Association Unifying the Sciences, Engineering, and Applied Research)
International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)
www.iasir.net
ISSN (Print): 2279-0047, ISSN (Online): 2279-0055
IJETCAS 14-624; © 2014, IJETCAS All Rights Reserved

A Survey on String Similarity Matching Search Techniques
S. Balan1, Dr. P. Ponmuthuramalingam2
1Ph.D. Research Scholar, 2Associate Professor & Head, Department of Computer Science, Government Arts College (Autonomous), Coimbatore, Tamilnadu, INDIA.

Abstract: The string similarity matching search problem is mainly concerned with finding text that is present in documents. Although many search facilities are available in the modern world, people still struggle to find information correctly because of the huge amount of information stored on the World Wide Web. The field of information retrieval was born in the 1950s, and in 1957 H.P. Luhn proposed the basic idea of searching text with a computer. String matching has to cope with errors: in online searching, for example, users face various problems and irrelevant results. The goal of this survey is to present an overview of string similarity matching and to compare different algorithms in order to determine which performs better when searching text. The problem appears in many areas; one of the most demanding is information retrieval, where relevant information must be found in a text collection and string matching is the key tool.

Keywords: Information retrieval, String Matching, Similarity Search, Approximate String Match

I. Introduction
In recent years the problem has attracted growing interest in the information retrieval and computational biology communities. The information retrieval problem can be addressed from different points of view. A string is a sequence of characters over a finite alphabet.
Similarity search returns a list of data items similar to an input query. In search engines such as Google or Yahoo, search is based on document similarity and query similarity. Document similarity is the overall similarity of an entire document to the given query. Query similarity suggests alternative query strings during a search and is based on machine learning [Thomas Bocek, et al., 2007].

The first Text REtrieval Conference (TREC) [Harman 1993], sponsored by the US government, was held in 1992 with the aim of encouraging research in information retrieval from large text collections. As a result, many old techniques were modified and many new techniques were devised for retrieval over large text collections. Algorithms developed in information retrieval were the first to be employed for searching the World Wide Web, from 1996 to 1998.

Various models and implementations of information retrieval systems were available early on. A Boolean system lets users specify their information need as a combination of AND, OR, and NOT operators. Such systems have shortcomings in producing relevant information. Several models have been proposed to address this; the three most widely used are the vector space model, the probabilistic models, and the inference network model [Amit Singhal 2001].

In the vector space model, a text is represented by a vector of terms [Gerard Salton, 1975]. Terms are typically words or phrases. Any text can be represented by a vector in a high-dimensional space, in which every term that belongs to the text gets a non-zero value. Most vector-based systems assign only positive term values, and a document receives a numeric score for a query from the similarity between the query vector and the document vector. Probabilistic models, first proposed by Maron and Kuhns in 1960, are based on the general principle that documents in a collection should be ranked by decreasing probability of their relevance to a query [Amit Singhal 2001]. Estimating these probabilities is the key part of the model. The inference network model treats document retrieval as an inference process in an inference network.
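The vector space scoring described above can be illustrated with a small Python sketch. It is only a minimal example under simplifying assumptions that are not taken from the cited papers: terms are whitespace-separated lower-cased words, term weights are raw counts rather than a tuned weighting scheme, and the function names (term_vector, cosine, rank) are chosen here purely for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector: term -> raw count (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document) pairs sorted by decreasing similarity to the query."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "string matching algorithms",
        "probabilistic retrieval models",
        "approximate string matching",
    ]
    for score, doc in rank("string matching", docs):
        print(f"{score:.3f}  {doc}")
```

Documents that share no term with the query receive a score of zero and fall to the bottom of the ranking. A practical system would usually replace the raw counts with a term weighting scheme, which this sketch deliberately leaves out.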
Most retrieval techniques can be implemented under the inference network model [Van Rijsbergen 1979].

Similarity search is important for time-sensitive applications. With the increasing amount of electronic information available on the web, it is needed to improve data quality and to find all information relevant to a user request. A similarity search whose cost grows with the dictionary size may be too slow for many applications. Various methods exist for fast similarity search; for example, search performance on an English dictionary and on a randomly generated dictionary has been compared for dynamic programming, a keyword tree, neighborhood generation, and n-grams with index lookup [Amit Chandel, 2006]. Extraction from structured and unstructured text is a challenging problem in many applications such as data warehousing, web data integration, and bioinformatics. For example, to identify book authors in HTML pages, text strings are matched against book author names and the accuracy of the extraction is measured [Amit Chandel, 2006].

This paper is organized into four sections. Section 1 contains the introduction to information retrieval and string similarity search, Section 2 contains the literature survey, Section 3 contains the analysis of string similarity search techniques, and Section 4 presents the conclusion, followed by the references.

S. Balan et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 9(3), June-August, 2014, pp. 286-288

II. Literature Survey
The Aho-Corasick algorithm constructs a finite state pattern matching machine from the keywords and uses it to process the text string in a single pass. It improved the speed of a library bibliographic search program by a factor of 5 to 10. The main purpose of this technique is to allow a bibliographer to find in a citation index all titles satisfying some Boolean function of keywords and phrases. A pattern matching machine m is a program that takes a text string s as input and outputs the locations in s at which the keywords appear as substrings. The machine consists of a set of states, each represented by a number. Its behavior is determined by three functions: a goto function go, a failure function fa, and an output function out [Alfred V. Aho, et al., 1975].

Edit distance [Levenshtein V.I, 1966] is the minimum number of operations required to transform one string into another, where each operation is a deletion, an insertion, or a replacement of a character. Navarro's NR-grep [Navarro.G, 2000] is an exhaustive online similarity search algorithm; NR stands for non-deterministic reverse pattern matching. It uses bit-parallelism together with forward and backward searching. N-grams are created by sliding a window of length n over the data and noting the content and position of each window. An extension of this approach for large text collections uses cosine similarity [Koudas, et al., 2004], a global measure computed over vectors of n-gram frequencies.

Approximate similarity search based on hashing hashes the points from the database so that the probability of collision is much higher for objects that are close to one another than for objects that are far apart. It is an alternative to methods based on hierarchical tree decomposition, which scale poorly to a large number of dimensions. The work covers locality-sensitive hashing, an analysis of locality-sensitive hashing, and nearest neighbor search. Approximate string matching is about finding a pattern in a text where one or both of them have suffered some kind of undesirable corruption.
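Two of the techniques surveyed in this section can be made concrete with a short Python sketch: a pattern matching machine with goto, failure, and output functions in the style of Aho and Corasick, and the edit distance computed by dynamic programming. This is an illustrative sketch rather than the implementation from the cited papers; the function names (build_machine, find_keywords, edit_distance) and the list-and-dictionary representation of states are assumptions made here.

```python
from collections import deque


def build_machine(keywords):
    """Build the goto, failure, and output functions for a set of keywords."""
    goto = [{}]   # goto[state][char] -> next state; state 0 is the start state
    fail = [0]    # failure function: state to fall back to on a mismatch
    out = [[]]    # output function: keywords recognised on entering a state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: failure links of depth d only need those of depth < d.
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] = out[s] + out[fail[s]]  # inherit keywords ending at the fallback
    return goto, fail, out


def find_keywords(text, machine):
    """Scan text once and return (start_index, keyword) for every occurrence."""
    goto, fail, out = machine
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


def edit_distance(a, b):
    """Minimum number of deletions, insertions, and replacements turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # replace or keep
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    machine = build_machine(["he", "she", "his", "hers"])
    print(find_keywords("ushers", machine))
    print(edit_distance("kitten", "sitting"))
```

With the keywords he, she, his, and hers, scanning the text "ushers" reports "she" at index 1 and both "he" and "hers" at index 2, which shows how overlapping keywords are found in a single pass. The edit distance routine keeps only two rows of the dynamic programming table, so its memory use grows with the length of the second string only.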
Existing schemes are classified by data structure into suffix tree, suffix array, Q-grams, and Q-samples. By search approach, they are classified as partitioning into exact searching or intermediate partitioning, based on the text and patterns [Kaushik Chakrabarti, et al., 2008]. Other surveys on string similarity matching cover Hamming distance, reversals, block distance, Q-gram distance, allowing swaps, approximate searching in multidimensional texts and in graphs, multi-pattern approximate matching, non-standard algorithms such as approximate or parallel algorithms, and indexed searching.

There are various types of string matching, namely multiple string matching, extended string matching, regular expression matching, and approximate matching. Approximate matching comprises various algorithms for finding the similarity of a given string: dynamic programming algorithms, computing edit distance, text searching, improving the average case, other algorithms based on dynamic programming, algorithms based on automata, bit-parallel algorithms, parallelizing the NFA, parallelizing the DP matrix, algorithms for fast filtering of the text, partitioning into k + 1 pieces, approximate BNDM, other filtration algorithms, multi-pattern approximate searching, a hashing-based algorithm for one error, and searching for extended strings and regular expressions.

III. Analysis of String Similarity Matching Techniques

| S.No | Author Name | Title | Methods | Advantages | Disadvantages |
|---|---|---|---|---|---|
| 1 | Alfred V. Aho and Margaret J. Corasick | Efficient String Matching: An Aid to Bibliographic Search | Pattern matching algorithm; Construction of goto, output, and failure functions; Time complexity of algorithms | Locates keywords in a text string; Directed graph begins at state 0 | Time complexity is large; Substrings may overlap with one another; Partially computed output function; Failure function stored in a one-dimensional array |
| 2 | Arvind Arasu, Venkatesh Ganti, et al. | Efficient Exact-Set Similarity Joins | Threshold-based SSJoin; Hamming SSJoin; Jaccard SSJoin | Threshold parameter is high; Vector representation between two sets; Similarity value is 0 or 1 | Different similarity sets; Dimensions differ; Common elements |
| 3 | Thomas Bocek, Burkhard Stiller, et al. | Fast Similarity Search in Large Dictionaries | Edit distance; NR-grep; N-grams and cosine similarity | Minimum number of operations required to transform one string into another; Reverse pattern matching; Offline approach | Dictionary size is low; Avoids a number of searching words in the NR-grep method; Similarity is shared |
| 4 | Kaushik Chakrabarti, Dong Xin, et al. | An Efficient Filter for Approximate Membership Checking | Pruning condition; Filtering by ISH; Weighted signatures | Three similarity measures are identified; Substring search is quick; Weighted signatures are in decreasing order | Lower bound value is not identified; String similarity is less; Different number of signatures |
| 5 | Amit Chandel, P.C. Nagesh, et al. | Efficient Batch Top-k for Dictionary-based Entity Recognition | Batch Top-K; Simple Top-K; Segmented algorithm | Finds the top-k scores; Decreasing IDF values; A token of the sub-query is strong or weak | Increasing run time for threshold values; Upper bound scoreless is removed; Existing tight features are not unique |
| 6 | Aristides Gionis, Piotr Indyk, et al. | Similarity Search in High Dimensions via Hashing | Locality-sensitive hashing; Color histograms; Texture features | Better run time; Dependence on data size; To measure the performance | Value is small and a resort is needed; One index is not sufficient; Comparison with SR-tree is low |
| 7 | Daniel Karch, Dennis Luxen, et al. | Improved Fast Similarity Search in Dictionaries | Preprocessing space; Preprocessing time; Query performance | String split parameter based on query time; Ten times faster; Maximum distance calculated | Speed is low; Does not store any information; Query time and search space size are average |
| 8 | Amit Singhal | Modern Information Retrieval: A Brief Overview | Vector space model; Probabilistic model; Inference network model | Calculated using term weighting; Relevance feedback based on user queries; Retrieval effectiveness | Boolean systems are less effective; Poor stemming; Style of phrase generation is not critical |

IV. Conclusion
This survey focuses on various algorithms for string similarity matching based on search techniques. Some of the set similarity algorithms yield a similarity value of 0 or 1. This indicates that the earlier algorithms produce matches in many cases. The performance of the algorithms is analyzed and presented in tabular form. Additionally, the survey covers information retrieval and search engines on the World Wide Web.
To improve the quality of word similarity search, exact similarity can next be refined using the semantic relationships of a word. This would further reduce the search time for a large database.

V. References
[1]. Alfred V. Aho and Margaret J. Corasick, Bell Laboratories, Efficient String Matching: An Aid to Bibliographic Search, Communications of the ACM, Vol. 18, No. 6, June 1975.
[2]. Amit Chandel, P.C. Nagesh, Sunita Sarawagi, Efficient Batch Top-k for Dictionary-based Entity Recognition, Proc. 22nd International Conference on Data Engineering, pp. 28, 2006.
[3]. Amit Singhal, Modern Information Retrieval: A Brief Overview, IEEE Computer Society Technical Committee on Data Engineering, pp. 1-9, 2001.
[4]. Aristides Gionis, Piotr Indyk, Rajeev Motwani, Similarity Search in High Dimensions via Hashing, Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, pp. 518, 1999.
[5]. Arvind Arasu, Venkatesh Ganti, Raghav Kaushik, Efficient Exact-Set Similarity Joins, VLDB '06, September 12-15, 2006, Seoul, Korea, VLDB Endowment, ACM 1-59593-385-9/06/09.
[6]. Daniel Karch, Dennis Luxen, Peter Sanders, Improved Fast Similarity Search in Dictionaries, presented at the 17th Symposium on String Processing and Information Retrieval, 2010.
[7]. Gerard Salton, A. Wong, and C. S. Yang, A vector space model for automatic indexing, Communications of the ACM, 18(11):613–620, November 1975.
[8]. Harman D.K, Overview of the first Text REtrieval Conference (TREC-1), in Proceedings of the First Text REtrieval Conference (TREC-1), pages 1–20, NIST Special Publication 500-207, March 1993.
[9]. Kaushik Chakrabarti, Dong Xin, et al., An Efficient Filter for Approximate Membership Checking, SIGMOD '08, June 9–12, 2008, Vancouver, BC, Canada, 2008, ACM 9781605581026/08/06.
[10]. N. Koudas, A. Marathe, D. Srivastava, Flexible String Matching Against Large Databases in Practice, in VLDB, pages 1078–1086, 2004.
[11]. Levenshtein V.I, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady, 10:707–710, 1966.
[12]. Navarro G., NR-grep: A Fast and Flexible Pattern Matching Tool, Technical Report TR/DCC-2000-3, University of Chile, Departamento de Ciencias de la Computación, Santiago, 2000, http://www.dcc.uchile.cl/gnavarro.
[13]. Thomas Bocek, Burkhard Stiller, et al., Fast Similarity Search in Large Dictionaries, University of Zurich, Department of Informatics (IFI), Binzmühlestrasse 14, CH-8050 Zürich, Switzerland, 2007.
[14]. Van Rijsbergen C.J, Information Retrieval, Butterworths, London, 1979.