SlideShare une entreprise Scribd logo
1  sur  4
Télécharger pour lire hors ligne
ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011



              Cross Lingual Information Retrieval Using
                  Search Engine and Data Mining
                           Mallamma V Reddy1, Dr. M. Hanumanthappa2, Manish Kumar3
                                     1,2
                                      Department of Computer Science and Applications,
                                           Bangalore University, Bangalore, INDIA
                                              1
                                                mallamma_vreddy@yahoo.co.in
                                                   2
                                                     hanu6572@hotmail.com
                                      3
                                        Department of Master of Computer Applications,
                              M. S. Ramaiah Institute of Technology, Bangalore-560 054, INDIA
                                                3
                                                  manishkumarjsr@yahoo.com

Abstract:-With the explosive growth of international users,            translation of summaries and/or information extraction from
distributed information and the number of linguistic                   the target language. CLIR has three basic approaches [3] : a)
resources, accessible throughout the World Wide Web,                   document translation - where the queries are posed to existing
information retrieval has become crucial for users to find,
                                                                       document repositories, b) query translation - where the
retrieve and understand relevant information, in any language
                                                                       queries are translated into the target language and results
and form. Cross- Language Information Retrieval (CLIR) is a
subfield of Information Retrieval which provides a query in            displayed, and c) inter lingual translations - where queries
one language and searches document collections in one or               and results are translated. Our approach is a variation of the
many languages but it also has a specific meaning of cross-            third category.
language information retrieval where a document collection
                                                                       Objectives of our research work are to:-
is multilingual. In the present research, we focus on query
translation, disambiguation of multiple translation candidates          Develop an approach for cross lingual information retrieval
and query expansion with various combinations, in order to                for queries expressed in the native language.
improve the effectiveness of retrieval. Extracting, selecting           Use data mining techniques to cluster the results and
and adding terms that emphasize query concepts are performed              retrieve a resultant set closest to the user’s query, and
using expansion techniques such as, pseudo-relevance                    Present the results in various display methods to the user.
feedback, domain-based feedback and thesaurus-based
                                                                       The key aspect of the proposed approach is as follows. It is
expansion. A method for information retrieval for a query
expressed in a native language is presented in this paper. It          composed of two distinct aspects: preprocessing the query
uses insights from data mining and intelligent search for              to identify the query’s meaning and post-processing the
formulating the query and parsing the results.                         results for relevance match. In the preprocessing stage, the
                                                                       query will be expanded and the expanded query is presented
Keywords: Cross Lingual Information Retrieval, Heuristic               to the search engine. In the post processing stage, based on
Method, Text Categorization                                            the relevance match of the retrieved content, the resultant
                                                                       documents will be reordered and presented to the user. The
                     I. INTRODUCTION                                   initial feedback from the users seems to indicate that the
    Cross-Language Information Retrieval (CLIR) is where               relevance of the retrieved documents is higher than the
the user request and the document collection against which             conventional approach. However, the time needed to perform
the request is to be matched are in two different human                the processing is a significant factor.
languages. The aim of CLIR is to match the request against             Introduction to Information Retrieval
the collection as if the request had been issued in the
                                                                           An Information Retrieval System is defined as any system
document collection language to begin with. This kind of
                                                                       that matches a user request against a document collection,
system is useful in the situation where a user who can read
                                                                       returning a list of documents considered relevant to the
several different languages wants to find information in a
                                                                       request. The user request is an expression of a user
collection containing documents in many languages, while
                                                                       information need. For example, the user might issue the
avoiding the work involved in formulating multiple requests.
                                                                       following request. Have you got any documents pertaining
Cross-language information retrieval [1] enables users to enter
                                                                       to the Clinton Lewinsky scandal, particularly regarding his
queries in languages they are fluent in, and uses language
                                                                       testimony before Congress?
translation methods to retrieve documents originally written
                                                                           An automatic IR system then usually carries out some
in other languages. The scope for Cross-Language
                                                                       processing on the user request to derive a form of the re-
Information Access [2] goes beyond the Cross-Lingual
                                                                       quest that it can match directly against the document collec-
Information Retrieval (CLIR) paradigm by incorporating query
                                                                       tion using some form of matching algorithm. The processed
disambiguation as well as post search processing. The key
                                                                       request, which may take many forms, is known as the query.
emphasis is on the relevance of the results. The cross lingual
                                                                       . Query formats commonly employed in the IR world include
information access paradigm may take the form of machine
                                                                       the natural language query, where the request is not pro-
translation of snippets, summarization and subsequent
                                                                       cessed much at all, and the bag of words format, where
                                                                  10
© 2011 ACEEE
DOI: 01.IJIT.01.02.114
ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011


                                                                       possible in this list, while avoiding the retrieval of irrelevant
                                                                       ones. IR systems are generally evaluated using two metrics,
                                                                       precision and recall. Precision is defined as the proportion of
                                                                       retrieved documents which are actually relevant to the query
                                                                       derived from the user request.
                                                                               Precision = Not Relevant / Total Retrieved
                                                                       Recall is the proportion of documents known to be relevant
                                                                       to the query in the entire collection that have been retrieved
                                                                       in the retrieved document list for that query.
                                                                       Recall = Not Relevant Retrieved / Total Known Relevant

                                                                                        II. PROPOSED APPROACH
      Figure 1: Processing a Request to Extract a Key Words
function words, punctuation and phrases like “on the sub-                  The proposed approach “Fig. 3” is composed of two
ject of” are removed from the request, suffixes stripped, and          distinct and complementary stages, namely, preprocessing
a selection of what are known as keywords extracted to form            and post processing. In the preprocessing stage, the search
the query “Fig. 1”. For example, the user request above, when          query in an Indian language is parsed and disambiguated
processed in this manner, would yield the bag of words query:          using the lexicons available for that Indian language and the
clinton lewinsky scandal testimony congress Commonly used              initial search terms are translated into English. These English
search engines, such as Google ask the user to enter the bag           terms may be further disambiguated using WordNet and other
of words query directly, thereby circumventing part of the             ontologies and the expanded query is submitted to the search
request processing stage by getting the user to do some pre-           engine. In the post processing stage, the results from the
processing in her head. The document collection consists of            search engine are summarized, mapped to the target language
a set of individual documents, each of which identifies a              and the results presented to the user. The results in English
single text, such as a book, journal article, or web page. The         need to be summarized for relevance and displayed to the
documents in the collection can consist of the texts them-             user.
selves, such as, for example, the document database of a web
search engine, or of summaries that point the user toward the
texts, such as book titles in a conventional electronic library
catalogue.




Figure 2:- Document Scoring and Extraction Based on User Query
The former case is known as full text retrieval and is the type
of document collection we are interested in our experiments.                       Figure 3:- Complete Process Architecture
The document collection is also usually processed in an                Preprocessing stage: The preprocessing stage is composed
identical manner to user requests when it is being compiled.           of four distinct steps. These steps have been adapted from
The query is matched against the document collection using             the work of Storey [6].
a matching algorithm which calculates a score for each                 Step 1 - Query Parsing: It involves parsing the natural
document in the collection reacting its perceived similarity to        language query specified by the user in regional language.
the query Fig 2. Similarity scores may be based simply on the          First the query is segmented and the words are disambiguated
frequency of individual query terms (words or phrases), or             using an ontology and/or other lexicons available in that
may exploit term weights (scores per term) calculated using            language. Then the initial query is translated into English.
frequency data. The Vector Space Method and the
                                                                       Step 2 – Query Expansion: The output of the parsing step is
Probabilistic Model of Information Retrieval (which is the
                                                                       a set of initial translated query terms which become the input
model employed by the IR system used in the experiments
                                                                       to the query expansion step. The query expansion process
discussed in subsequent chapters) provide well-founded
                                                                       involves expanding the initial query using lexicons and
ways of doing this. Generally, a list of the N most closely
                                                                       ontologies. It also includes adding appropriate personal
matching documents is returned to the user. This list of
                                                                       information as well as contextual information relevant to the
returned documents is often called the retrieved document
                                                                       query. Since ontologies contain domain specific concepts,
list. The aim is to retrieve as many relevant documents as
                                                                  11
© 2011 ACEEE
DOI: 01.IJIT.01.02.114
ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011


appropriate hypernym(s) and hyponym(s) are added as                      i)     The information gain of keywords is calculated using
mandatory terms to improve precision.                                           statistical methods. Information gain is defined as the
Step 3 – Query Formulation: The expanded set becomes the                        number of times the word (meaning) occurs in the
input to the query formulation step. In this step, the query is                 document.
formulated according to the syntax of the search engine. For             ii)    For words with maximum information gain, the pair wise
each query term, the synonym, Hypernym and hyponym and                          relationship of each keyword is also observed.
the negative knowledge is added with an OR, AND & NOT                    iii)   The mutual correlation between each word and all the
operator.                                                                       other candidate words are found. This is continued till
                                                                                the features with maximum correlations are identified
Step 4 – Search Knowledge Sources: This step submits the
                                                                                and sorted. The cluster relationship is the measure of
query to one or more search engines (in their required syntax)
                                                                                distribution of the word in the document.
for processing. The query construction heuristics can work
with most search engines.                                                B) Categorization:
Post processing stage                                                        The words, which satisfy the threshold limits for each of
                                                                         the three parameters, are selected as candidate words. These
    In this stage, the results from the search engine (URLs
                                                                         three measures help establish relationships between the
and ‘snippets’ provided from the web pages) are retrieved,
                                                                         words and give a measure of the relationships in the
and translated to the target language. Available lexicons and
                                                                         documents. The document map (intermediate structure) is an
ontologies are also used in the translation. Further, a heuristic
                                                                         intermediate representation displaying the key candidate
result processing mechanism described below is used to
                                                                         words at the place of occurrence. The document maps help
identify the relevance of results retrieved with respect to the
                                                                         in characterization, differentiating and grouping content.
source language. Finally, these are aggregated and the
resultant summarized content is presented to the user. The               C) Aggregation:
approach used is from the information retrieval [4] perspective,             The same method (steps a and b) is applied for all the
which also integrates the insights gained from data mining.              documents and the results are aggregated. Based on this,
Our approach visualizes the problem of categorizing the                  the documents “Fig. 4” are re-ordered using the parametric
results akin to the results merging approach [5]. The objective          results and the categorization stage. This stage eliminates
of the post processing is to aggregate the results for the               irrelevant documents, aggregates different versions of the
given query, rank and reorder the results, and present the               same document and organizes the overall display to be of
results in a new sorted order based not only on the output of            documents that are relevant and closest to the user query.
the search engine but also on the content of the retrieved
documents. Thus, these downloaded documents are indexed                  D) Display processing:
and comparable document scores for the downloaded                           At this stage, along with the document map, a word-by-
documents are calculated. The metrics utilized are word                  word translated map in user’s language is also shown. Thus
relevance, word to document relationship and clustering                  the users can select documents that they feel are relevant
strategies. Finally, all the returned documents are sorted into          from the map in their own language.
a single ranked list along with display mechanisms for helping
                                                                         E) Learning:
the user view and decide on the results. These measures
have different connotations in traditional data mining .                    Based on the results of the above stages, the learning
a) Candidate word: words that have high information gain                 algorithm assigns weights to the parameters and learns (using
    in a given document.                                                 machine learning methods) from the selection options made
b) Information gain: the parametric importance of a word in              by the users. The feedback and suggestions from the users
    the retrieved document.                                              are collected for document mapping. Simultaneously, the
c) Pair-wise measure: the correlations between the candidate             above method is applied for documents in the native language
    word and the input keywords are found using the pair                 where parallel corpora are available.
    wise measure. They also give the relationship between
    words and help disambiguate similar words.
d) Cluster relationship: this gives a measure of the degree of
    clustering of candidate words in a given document. The
    post processing stage of the proposed information
    retrieval method has the following five distinct steps:
A) Parameter estimation:
   This step scans the document and identifies at a high                              Figure 4:- Results Transformation Method
level the various keywords that are present. Two methods
                                                                         The key aspects in this system are the mapping mechanism
have been applied in the pre-processing stage 1) removal of
                                                                         between words in different languages, aggregation method
stop words (stop word list) and 2) stemming algorithm:
                                                                         at document/repository level, structure of document maps
porter’s stemming algorithm. After the pre-processing,
                                                                         and the learning system. For each candidate word the pair
                                                                         wise measure gives a measure of correlation. However, these
                                                                    12
© 2011 ACEEE
DOI: 01.IJIT.01.02.114
ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011

correlations are not available in dictionary representations              2. CLIA Research Report, “Development of Cross Lingual
and must be generated by use of appropriate ontological                   Information Access (CLIA) System” funded by Government of
systems. Thus, a search and traversal method that navigates               India, Ministry of Communications & Information Technology,
the ontology for distance measure is crucial for finding the              Department of Information Technology (No. 14(5)/2006 – HCC
pair wise and relationship measures.                                      (TDIL) Dated 29-08-2006) retrieved from www.mt- archive.info/
                                                                          IJCNLP-2008-CLIA.pdf on Feb 21, 2009.
                                                                          3. Maeda, A., Sadat, F., Yoshikawa, M., and Uemura, S. (2000)
                         CONCLUSION                                       Query term disambiguation for Web cross- language information
   This paper has outlined an approach for Cross Lingual                  retrieval using a search engine, Proceedings of the fifth international
Information Retrieval which emphasizes pre and post                       workshop on Information retrieval with Asian languages, pp.25-
                                                                          32, September 30-October 01, Hong Kong, China.
processing strategies for the queries entered in a source
                                                                          4. R. Shriram, Vijayan Sugumaran, AMCIS 2009 Proceedings
language. Mechanisms for displaying the results have been                 Americas Conference on Information Systems (AMCIS) Cross
outlined which give the users a better idea of what the                   Lingual Information Retrieval Using Data Mining Methods.
documents contain. The approach works on top of existing                  5. Si, L., Callan, J., Cetintas, S., Yuan, H. (2008) An effective and
search engines and helps refine the search process further.               efficient results merging strategy for multilingual information
                                                                          retrieval in federated search environments, Information Retrieval,
                         REFERENCES                                       Vol. 11 , No. 1, February, pp. 1 – 24.
                                                                          6. Storey, V.C., Burton-Jones, A., Sugumaran, V., Purao, S.
1. Ballesteros, L. and Croft, W. B. (1997) Phrasal Translation and        “CONQUER: A Methodology for Context-Aware Query
Query Expansion Techniques for Cross-Language, Information                Processing on the World Wide Web,” Information Systems Research,
Retrieval, Proceedings of SIGIR’97.                                       Vol. 19, No. 1, March 2008, pp. 3 – 25.




                                                                     13
© 2011 ACEEE
DOI: 01.IJIT.01.02.114

Contenu connexe

Tendances

An unsupervised approach to develop ir system the case of urdu
An unsupervised approach to develop ir system  the case of urduAn unsupervised approach to develop ir system  the case of urdu
An unsupervised approach to develop ir system the case of urduijaia
 
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...CSCJournals
 
A Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor LanguageA Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor LanguageSravanthi Mullapudi
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGEScscpconf
 
Speech Recognition Application for the Speech Impaired using the Android-base...
Speech Recognition Application for the Speech Impaired using the Android-base...Speech Recognition Application for the Speech Impaired using the Android-base...
Speech Recognition Application for the Speech Impaired using the Android-base...TELKOMNIKA JOURNAL
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir systemcsandit
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMcscpconf
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Word Format.doc
Word Format.docWord Format.doc
Word Format.docbutest
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language InterfaceIRJET Journal
 
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATANAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATAijnlc
 
An analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAn analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAM Publications
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationIOSR Journals
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageacijjournal
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Performance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyPerformance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyjournalBEEI
 
USER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDS
USER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDSUSER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDS
USER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDSIJNSA Journal
 

Tendances (20)

An unsupervised approach to develop ir system the case of urdu
An unsupervised approach to develop ir system  the case of urduAn unsupervised approach to develop ir system  the case of urdu
An unsupervised approach to develop ir system the case of urdu
 
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...
 
C4 balajiprasath
C4 balajiprasathC4 balajiprasath
C4 balajiprasath
 
A Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor LanguageA Dialogue System for Telugu, a Resource-Poor Language
A Dialogue System for Telugu, a Resource-Poor Language
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
 
Speech Recognition Application for the Speech Impaired using the Android-base...
Speech Recognition Application for the Speech Impaired using the Android-base...Speech Recognition Application for the Speech Impaired using the Android-base...
Speech Recognition Application for the Speech Impaired using the Android-base...
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir system
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Word Format.doc
Word Format.docWord Format.doc
Word Format.doc
 
Convolutional Neural Networks
Convolutional Neural Networks Convolutional Neural Networks
Convolutional Neural Networks
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
 
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATANAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
 
An analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAn analysis on Filter for Spam Mail
An analysis on Filter for Spam Mail
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri language
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Performance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyPerformance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganography
 
USER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDS
USER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDSUSER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDS
USER AUTHENTICATION USING NATIVE LANGUAGE PASSWORDS
 

Similaire à Cross Lingual Information Retrieval Using Search Engine and Data Mining

MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESMULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESijcseit
 
INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMijcsa
 
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVALA SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVALIJCI JOURNAL
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemIJTET Journal
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageEditor IJCATR
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYcscpconf
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
 
A Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesA Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesFulvio Rotella
 
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...University of Bari (Italy)
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrievaldannyijwest
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringKelly Lipiec
 
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDMarathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDIJERA Editor
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageIJERA Editor
 
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...IJASCSE
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
 
Multilingualism in Information Retrieval System
Multilingualism in Information Retrieval SystemMultilingualism in Information Retrieval System
Multilingualism in Information Retrieval SystemAriel Hess
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET Journal
 

Similaire à Cross Lingual Information Retrieval Using Search Engine and Data Mining (20)

MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESMULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
 
A SURVEY ON VARIOUS CLIR TECHNIQUES
A SURVEY ON VARIOUS CLIR TECHNIQUESA SURVEY ON VARIOUS CLIR TECHNIQUES
A SURVEY ON VARIOUS CLIR TECHNIQUES
 
INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAM
 
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVALA SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrieval
 
A Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesA Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital Libraries
 
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel...
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrieval
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDMarathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
QUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu LanguageQUrdPro: Query processing system for Urdu Language
QUrdPro: Query processing system for Urdu Language
 
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of ...
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews
 
Multilingualism in Information Retrieval System
Multilingualism in Information Retrieval SystemMultilingualism in Information Retrieval System
Multilingualism in Information Retrieval System
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
 

Plus de IDES Editor

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A ReviewIDES Editor
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...IDES Editor
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...IDES Editor
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...IDES Editor
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCIDES Editor
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...IDES Editor
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingIDES Editor
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...IDES Editor
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsIDES Editor
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...IDES Editor
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...IDES Editor
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkIDES Editor
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetIDES Editor
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’sIDES Editor
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...IDES Editor
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance AnalysisIDES Editor
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesIDES Editor
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...IDES Editor
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...IDES Editor
 

Plus de IDES Editor (20)

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A Review
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFC
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive Thresholds
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability Framework
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Dernier

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Cross Lingual Information Retrieval Using Search Engine and Data Mining

  • 1. ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011 Cross Lingual Information Retrieval Using Search Engine and Data Mining Mallamma V Reddy1, Dr. M. Hanumanthappa2, Manish Kumar3 1,2 Department of Computer Science and Applications, Bangalore University, Bangalore, INDIA 1 mallamma_vreddy@yahoo.co.in 2 hanu6572@hotmail.com 3 Department of Master of Computer Applications, M. S. Ramaiah Institute of Technology, Bangalore-560 054, INDIA 3 manishkumarjsr@yahoo.com Abstract:-With the explosive growth of international users, translation of summaries and/or information extraction from distributed information and the number of linguistic the target language. CLIR has three basic approaches [3] : a) resources, accessible throughout the World Wide Web, document translation - where the queries are posed to existing information retrieval has become crucial for users to find, document repositories, b) query translation - where the retrieve and understand relevant information, in any language queries are translated into the target language and results and form. Cross- Language Information Retrieval (CLIR) is a subfield of Information Retrieval which provides a query in displayed, and c) inter lingual translations - where queries one language and searches document collections in one or and results are translated. Our approach is a variation of the many languages but it also has a specific meaning of cross- third category. language information retrieval where a document collection Objectives of our research work are to:- is multilingual. In the present research, we focus on query translation, disambiguation of multiple translation candidates  Develop an approach for cross lingual information retrieval and query expansion with various combinations, in order to for queries expressed in the native language. improve the effectiveness of retrieval. Extracting, selecting  Use data mining techniques to cluster the results and and adding terms that emphasize query concepts are performed retrieve a resultant set closest to the user’s query, and using expansion techniques such as, pseudo-relevance  Present the results in various display methods to the user. feedback, domain-based feedback and thesaurus-based The key aspect of the proposed approach is as follows. It is expansion. A method for information retrieval for a query expressed in a native language is presented in this paper. It composed of two distinct aspects: preprocessing the query uses insights from data mining and intelligent search for to identify the query’s meaning and post-processing the formulating the query and parsing the results. results for relevance match. In the preprocessing stage, the query will be expanded and the expanded query is presented Keywords: Cross Lingual Information Retrieval, Heuristic to the search engine. In the post processing stage, based on Method, Text Categorization the relevance match of the retrieved content, the resultant documents will be reordered and presented to the user. The I. INTRODUCTION initial feedback from the users seems to indicate that the Cross-Language Information Retrieval (CLIR) is where relevance of the retrieved documents is higher than the the user request and the document collection against which conventional approach. However, the time needed to perform the request is to be matched are in two different human the processing is a significant factor. languages. The aim of CLIR is to match the request against Introduction to Information Retrieval the collection as if the request had been issued in the An Information Retrieval System is defined as any system document collection language to begin with. This kind of that matches a user request against a document collection, system is useful in the situation where a user who can read returning a list of documents considered relevant to the several different languages wants to find information in a request. The user request is an expression of a user collection containing documents in many languages, while information need. For example, the user might issue the avoiding the work involved in formulating multiple requests. following request. Have you got any documents pertaining Cross-language information retrieval [1] enables users to enter to the Clinton Lewinsky scandal, particularly regarding his queries in languages they are fluent in, and uses language testimony before Congress? translation methods to retrieve documents originally written An automatic IR system then usually carries out some in other languages. The scope for Cross-Language processing on the user request to derive a form of the re- Information Access [2] goes beyond the Cross-Lingual quest that it can match directly against the document collec- Information Retrieval (CLIR) paradigm by incorporating query tion using some form of matching algorithm. The processed disambiguation as well as post search processing. The key request, which may take many forms, is known as the query. emphasis is on the relevance of the results. The cross lingual . Query formats commonly employed in the IR world include information access paradigm may take the form of machine the natural language query, where the request is not pro- translation of snippets, summarization and subsequent cessed much at all, and the bag of words format, where 10 © 2011 ACEEE DOI: 01.IJIT.01.02.114
  • 2. ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011 possible in this list, while avoiding the retrieval of irrelevant ones. IR systems are generally evaluated using two metrics, precision and recall. Precision is defined as the proportion of retrieved documents which are actually relevant to the query derived from the user request. Precision = Not Relevant / Total Retrieved Recall is the proportion of documents known to be relevant to the query in the entire collection that have been retrieved in the retrieved document list for that query. Recall = Not Relevant Retrieved / Total Known Relevant II. PROPOSED APPROACH Figure 1: Processing a Request to Extract a Key Words function words, punctuation and phrases like “on the sub- The proposed approach “Fig. 3” is composed of two ject of” are removed from the request, suffixes stripped, and distinct and complementary stages, namely, preprocessing a selection of what are known as keywords extracted to form and post processing. In the preprocessing stage, the search the query “Fig. 1”. For example, the user request above, when query in an Indian language is parsed and disambiguated processed in this manner, would yield the bag of words query: using the lexicons available for that Indian language and the clinton lewinsky scandal testimony congress Commonly used initial search terms are translated into English. These English search engines, such as Google ask the user to enter the bag terms may be further disambiguated using WordNet and other of words query directly, thereby circumventing part of the ontologies and the expanded query is submitted to the search request processing stage by getting the user to do some pre- engine. In the post processing stage, the results from the processing in her head. The document collection consists of search engine are summarized, mapped to the target language a set of individual documents, each of which identifies a and the results presented to the user. The results in English single text, such as a book, journal article, or web page. The need to be summarized for relevance and displayed to the documents in the collection can consist of the texts them- user. selves, such as, for example, the document database of a web search engine, or of summaries that point the user toward the texts, such as book titles in a conventional electronic library catalogue. Figure 2:- Document Scoring and Extraction Based on User Query The former case is known as full text retrieval and is the type of document collection we are interested in our experiments. Figure 3:- Complete Process Architecture The document collection is also usually processed in an Preprocessing stage: The preprocessing stage is composed identical manner to user requests when it is being compiled. of four distinct steps. These steps have been adapted from The query is matched against the document collection using the work of Storey [6]. a matching algorithm which calculates a score for each Step 1 - Query Parsing: It involves parsing the natural document in the collection reacting its perceived similarity to language query specified by the user in regional language. the query Fig 2. Similarity scores may be based simply on the First the query is segmented and the words are disambiguated frequency of individual query terms (words or phrases), or using an ontology and/or other lexicons available in that may exploit term weights (scores per term) calculated using language. Then the initial query is translated into English. frequency data. The Vector Space Method and the Step 2 – Query Expansion: The output of the parsing step is Probabilistic Model of Information Retrieval (which is the a set of initial translated query terms which become the input model employed by the IR system used in the experiments to the query expansion step. The query expansion process discussed in subsequent chapters) provide well-founded involves expanding the initial query using lexicons and ways of doing this. Generally, a list of the N most closely ontologies. It also includes adding appropriate personal matching documents is returned to the user. This list of information as well as contextual information relevant to the returned documents is often called the retrieved document query. Since ontologies contain domain specific concepts, list. The aim is to retrieve as many relevant documents as 11 © 2011 ACEEE DOI: 01.IJIT.01.02.114
  • 3. ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011 appropriate hypernym(s) and hyponym(s) are added as i) The information gain of keywords is calculated using mandatory terms to improve precision. statistical methods. Information gain is defined as the Step 3 – Query Formulation: The expanded set becomes the number of times the word (meaning) occurs in the input to the query formulation step. In this step, the query is document. formulated according to the syntax of the search engine. For ii) For words with maximum information gain, the pair wise each query term, the synonym, Hypernym and hyponym and relationship of each keyword is also observed. the negative knowledge is added with an OR, AND & NOT iii) The mutual correlation between each word and all the operator. other candidate words are found. This is continued till the features with maximum correlations are identified Step 4 – Search Knowledge Sources: This step submits the and sorted. The cluster relationship is the measure of query to one or more search engines (in their required syntax) distribution of the word in the document. for processing. The query construction heuristics can work with most search engines. B) Categorization: Post processing stage The words, which satisfy the threshold limits for each of the three parameters, are selected as candidate words. These In this stage, the results from the search engine (URLs three measures help establish relationships between the and ‘snippets’ provided from the web pages) are retrieved, words and give a measure of the relationships in the and translated to the target language. Available lexicons and documents. The document map (intermediate structure) is an ontologies are also used in the translation. Further, a heuristic intermediate representation displaying the key candidate result processing mechanism described below is used to words at the place of occurrence. The document maps help identify the relevance of results retrieved with respect to the in characterization, differentiating and grouping content. source language. Finally, these are aggregated and the resultant summarized content is presented to the user. The C) Aggregation: approach used is from the information retrieval [4] perspective, The same method (steps a and b) is applied for all the which also integrates the insights gained from data mining. documents and the results are aggregated. Based on this, Our approach visualizes the problem of categorizing the the documents “Fig. 4” are re-ordered using the parametric results akin to the results merging approach [5]. The objective results and the categorization stage. This stage eliminates of the post processing is to aggregate the results for the irrelevant documents, aggregates different versions of the given query, rank and reorder the results, and present the same document and organizes the overall display to be of results in a new sorted order based not only on the output of documents that are relevant and closest to the user query. the search engine but also on the content of the retrieved documents. Thus, these downloaded documents are indexed D) Display processing: and comparable document scores for the downloaded At this stage, along with the document map, a word-by- documents are calculated. The metrics utilized are word word translated map in user’s language is also shown. Thus relevance, word to document relationship and clustering the users can select documents that they feel are relevant strategies. Finally, all the returned documents are sorted into from the map in their own language. a single ranked list along with display mechanisms for helping E) Learning: the user view and decide on the results. These measures have different connotations in traditional data mining . Based on the results of the above stages, the learning a) Candidate word: words that have high information gain algorithm assigns weights to the parameters and learns (using in a given document. machine learning methods) from the selection options made b) Information gain: the parametric importance of a word in by the users. The feedback and suggestions from the users the retrieved document. are collected for document mapping. Simultaneously, the c) Pair-wise measure: the correlations between the candidate above method is applied for documents in the native language word and the input keywords are found using the pair where parallel corpora are available. wise measure. They also give the relationship between words and help disambiguate similar words. d) Cluster relationship: this gives a measure of the degree of clustering of candidate words in a given document. The post processing stage of the proposed information retrieval method has the following five distinct steps: A) Parameter estimation: This step scans the document and identifies at a high Figure 4:- Results Transformation Method level the various keywords that are present. Two methods The key aspects in this system are the mapping mechanism have been applied in the pre-processing stage 1) removal of between words in different languages, aggregation method stop words (stop word list) and 2) stemming algorithm: at document/repository level, structure of document maps porter’s stemming algorithm. After the pre-processing, and the learning system. For each candidate word the pair wise measure gives a measure of correlation. However, these 12 © 2011 ACEEE DOI: 01.IJIT.01.02.114
  • 4. ACEEE Int. J. on Information Technology, Vol. 01, No. 02, Sep 2011 correlations are not available in dictionary representations 2. CLIA Research Report, “Development of Cross Lingual and must be generated by use of appropriate ontological Information Access (CLIA) System” funded by Government of systems. Thus, a search and traversal method that navigates India, Ministry of Communications & Information Technology, the ontology for distance measure is crucial for finding the Department of Information Technology (No. 14(5)/2006 – HCC pair wise and relationship measures. (TDIL) Dated 29-08-2006) retrieved from www.mt- archive.info/ IJCNLP-2008-CLIA.pdf on Feb 21, 2009. 3. Maeda, A., Sadat, F., Yoshikawa, M., and Uemura, S. (2000) CONCLUSION Query term disambiguation for Web cross- language information This paper has outlined an approach for Cross Lingual retrieval using a search engine, Proceedings of the fifth international Information Retrieval which emphasizes pre and post workshop on Information retrieval with Asian languages, pp.25- 32, September 30-October 01, Hong Kong, China. processing strategies for the queries entered in a source 4. R. Shriram, Vijayan Sugumaran, AMCIS 2009 Proceedings language. Mechanisms for displaying the results have been Americas Conference on Information Systems (AMCIS) Cross outlined which give the users a better idea of what the Lingual Information Retrieval Using Data Mining Methods. documents contain. The approach works on top of existing 5. Si, L., Callan, J., Cetintas, S., Yuan, H. (2008) An effective and search engines and helps refine the search process further. efficient results merging strategy for multilingual information retrieval in federated search environments, Information Retrieval, REFERENCES Vol. 11 , No. 1, February, pp. 1 – 24. 6. Storey, V.C., Burton-Jones, A., Sugumaran, V., Purao, S. 1. Ballesteros, L. and Croft, W. B. (1997) Phrasal Translation and “CONQUER: A Methodology for Context-Aware Query Query Expansion Techniques for Cross-Language, Information Processing on the World Wide Web,” Information Systems Research, Retrieval, Proceedings of SIGIR’97. Vol. 19, No. 1, March 2008, pp. 3 – 25. 13 © 2011 ACEEE DOI: 01.IJIT.01.02.114