International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 1, January-February (2013), pp. 182-194
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2012): 3.9580 (Calculated by GISI), www.jifactor.com
A SURVEY ON VARIOUS ARCHITECTURES, MODELS AND
METHODOLOGIES FOR INFORMATION RETRIEVAL
Prakasha S (sprakashjpg@yahoo.co.in), Shashidhar HR (shashi_dhara@yahoo.com), Dr. G T Raju (gtraju1990@yahoo.com)
RNSIT, Bengaluru 560098
ABSTRACT
The typical Information Retrieval (IR) model of the search process consists of three
essentials: query, documents and search results. A user looking to fulfill an information need
has to formulate a query, usually consisting of a small set of keywords summarizing the
information need. The goal of an IR system is to retrieve documents containing information
that might be useful or relevant to the user. Throughout the search process there is a loss of
focus, because keyword queries entered by users often do not suitably summarize their
complex information needs, and IR systems do not sufficiently interpret the contents of
documents, leading to result lists that contain irrelevant and redundant information.
The short keyword query used as input to the retrieval system can be supplemented
with topic categories from structured Web resources. These topic categories can be used as
query context to retrieve documents that are not only relevant to the query but also belong to
a relevant topic category. Category information is especially useful for the task of entity
ranking, where the user is searching for a certain type of entity such as companies or persons.
Category information can improve the search results by promoting, in the ranking, pages
that belong to relevant topic categories or to categories similar to the relevant ones.
Users may issue various queries to describe the same information need. For example, to
search for the National Board of Accreditation, the queries “National Board of Accreditation (NBA)”
or “NB Accreditation” may be formulated. Directly using individual queries to describe
context cannot capture contexts concisely and accurately. Moreover, a query may be ambiguous:
“NBA” can be expanded as either “National Basketball Association” or “National Board of
Accreditation”. Hence it is extremely important to formulate context-based queries that draw on
the user's search history and the user's present requirements in that context.
In this paper, we present an extensive survey of the architectures, models and
methodologies that various researchers have used in IR, compare their reported results
against various performance metrics, and highlight the need for context-based queries.
Keywords: Query Model, Ranking Model, Feedback Model, Retrieval Model, Query Context
1. INTRODUCTION
Given the ever-increasing information overload of the digital age, IR has become critically
important. Web search is one of the most challenging problems on the Internet today, as it
strives to provide users with the search results most relevant to their
information needs. IR deals with the representation, storage, organization of, and access to
information items such as documents, Web pages, online catalogues, structured and semi-
structured records, and multimedia objects [Baeza-Yates and Ribeiro-Neto, 2011].
Web search engines are by far the most popular and heavily used IR applications. In the
search process, the user must translate the information need into a query that the search
engine can easily process. The primary goal of an IR system is to retrieve all the
documents that are relevant to a user query while retrieving as few non-relevant documents
as possible. To achieve this goal, IR systems must somehow 'interpret' the contents of the
documents in a collection and rank them according to their degree of relevance to the user
query. The 'interpretation' of a document involves extracting syntactic and semantic
information from the document and using this information to match the user's information
need.
The notion of relevance is at the centre of IR. While the search process is straightforward
for simple navigational information needs, more complex information needs call for focused
retrieval methods. 'Focused retrieval' can be defined as providing more direct access to
relevant information by locating the relevant information inside the retrieved documents
[Trotman et al., 2007].
The first element of the search process is the query. Ideally, this short keyword query
is a suitable summary of the information need, and the user only has to inspect the first
few search results to fulfill that need. To overcome the shallowness of the query, i.e.,
users entering only a few keywords that poorly summarize the information need, we add
context to the query to focus the search results on the relevant context. We define context
as all available information about the user's information need besides the query itself.
Different forms of context can be considered to gather more information, implicitly or
explicitly, about the user's search request. Potential forms of query context include
document relevance and category information.
The second element of search that we examine is the documents. Documents on the
Web are rich in structure: they can contain HTML structure, link structure, different
types of classification schemes, etc. However, most of these structural elements are not
used consistently throughout the Web. A key question is how to deal with all this
(semi-)structured information, that is, how IR systems can 'interpret' these documents to
reduce the shallowness of the document representation.
A problem in Web search is the large amount of redundant and duplicate information
on the Web: Web pages can have many duplicates or near-duplicates. Pages containing
redundant information can be hard for a search engine to recognize, whereas users recognize
redundant information easily, and it usually does not help them in their search. Most
structured Web resources organize their information in such a way that redundant
information is absent or significantly reduced [Anna Maria Kaptein 2011].
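Near-duplicate pages of the kind described above are commonly detected with word shingling and Jaccard similarity. The following Python sketch illustrates that standard technique; it is not the method of the cited work, and the shingle size `k=3` and the `threshold=0.8` cut-off are illustrative assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(docs, threshold=0.8, k=3):
    """Keep each document unless it is a near-duplicate of one already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept
```

Because documents are compared pairwise against everything kept so far, this sketch is quadratic; large collections usually approximate Jaccard similarity with hashing (e.g., MinHash) instead.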
Structured resources provide two interesting opportunities: 'documents categorized
into a category structure' and 'absence of redundant information'. Category information is of
vital importance to a special type of search, namely entity ranking. Entity ranking is the task
of finding documents representing entities of an appropriate entity type that are relevant to a
query. Entities can be almost anything, from broad categories such as persons, locations and
organizations to more specific types such as churches, science-fiction writers or CDs.
Searchers looking for entities are arguably better served by a ranked list of the entities
themselves than by a list of Web pages containing relevant, but potentially redundant,
information about these entities. Category information can be used to favor pages belonging
to appropriate entity types [Anna Maria Kaptein 2011].
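One simple way to let category information favor pages of the appropriate entity type is to interpolate a text relevance score with a category-match score. The Python sketch below assumes a hypothetical `Page` record, a hypothetical `weight` parameter, and a category score defined as the fraction of target categories a page belongs to; it illustrates the idea rather than reproducing the cited approach.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    text_score: float                          # relevance from a text retrieval model
    categories: set = field(default_factory=set)


def category_score(page_cats, target_cats):
    """Fraction of the target categories that the page belongs to."""
    if not target_cats:
        return 0.0
    return len(page_cats & target_cats) / len(target_cats)


def rerank_entities(pages, target_cats, weight=0.5):
    """Rank pages by (1 - weight) * text score + weight * category score.
    weight = 0 uses text relevance only; weight = 1 uses categories only."""
    def combined(p):
        return (1 - weight) * p.text_score + weight * category_score(p.categories, target_cats)
    return sorted(pages, key=combined, reverse=True)
```

For a query whose target type is "companies", a page about the fruit can then drop below company pages even if its text score is higher.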
Search intent and context are important criteria in serving a user's query.
Suppose a user issues the query “apple”. It is hard to determine the user's search intent, that is,
whether the user is interested in the history of Apple Inc. or in the fruit. Without looking at
the context of the search, existing methods often suggest many queries for the various possible
intents, which results in low accuracy in query suggestion. The query context, which
consists of the search intent expressed by the user's recent queries, can help to better
understand their search intent and make more meaningful suggestions.
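The role of recent queries as context can be illustrated with a small back-off model mined from session logs: suggestions are drawn from what followed the same sequence of recent queries, falling back to the last query alone when that sequence was never observed. This Python sketch is a simplification under stated assumptions (sessions given as lists of query strings, hypothetical function names); it is not the algorithm of any specific cited paper.

```python
from collections import Counter, defaultdict


def build_model(sessions, order=2):
    """Map each context (a tuple of up to `order` consecutive queries) to a
    Counter of the queries that immediately followed it in the logs."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for n in range(1, order + 1):
                if i - n < 0:
                    break
                context = tuple(session[i - n:i])
                model[context][session[i]] += 1
    return model


def suggest(model, recent_queries, k=3, order=2):
    """Suggest up to k queries, backing off from the longest matching
    context of recent queries to shorter ones."""
    for n in range(min(order, len(recent_queries)), 0, -1):
        context = tuple(recent_queries[-n:])
        if model.get(context):
            return [q for q, _ in model[context].most_common(k)]
    return []
```

With such a model, "apple" preceded by "fruit" and "apple" preceded by "iphone" can receive different suggestions.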
2. DIFFERENT MODELS USED IN IR
For IR strategies to retrieve relevant documents effectively, the documents are
typically transformed into a suitable representation, and each retrieval strategy incorporates
a specific model for this purpose. Keke Cai et al. describe a context-based retrieval process
that uses a KL-divergence retrieval model for the initial retrieval [9]. Similarly, Tangjian
Deng et al. present a brain-memory-inspired, context-based information re-finding framework
that enables users to re-find previously accessed results through relevant contexts [16].
Yunping Huang et al. propose a new query model refinement approach, a random walk
smoothing method that exploits the expanded terms and term relationships derived from the
feedback documents [13]. Xiaohui Yan et al. address the problem of context-aware query
recommendation. Unlike existing approaches, which leverage query sequence patterns in
query sessions, they use the click-through data of the given query as the major clue to users'
search intents when providing context-aware recommendations [22]. Chang Liu and
Nicholas J. Belkin propose a personalized IR model that implicitly acquires task type and
document preferences as search context by observing and analyzing user behavior, and then
uses implicit relevance feedback to re-rank or reformulate user queries, helping users search
effectively and efficiently [4].
Huanhuan Cao et al. propose modeling search context with Conditional Random Fields (CRF) [31].
Ji-Rong Wen et al. propose four models for contextual retrieval [20]. Protima Banerjee et al.
use the Aspect Model, which forms the foundation of the Probabilistic Latent Semantic Analysis
(PLSA) method; they also put forward a technique that estimates a relevance model from the
query alone, without the need for training data. Yan Qi et al. propose query-driven,
feedback-based conflict resolution, developing data structures and algorithms that enable
feedback-based conflict resolution during query processing on imperfectly aligned data [25].
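The KL-divergence retrieval model used for initial retrieval [9] can be sketched as follows, assuming a maximum-likelihood query model and a Dirichlet-smoothed document model; the prior `mu=2000` is a commonly used default and the function name is hypothetical. Dropping the query-entropy term, which is constant for a given query, leaves a score that ranks documents in the same order as the negative KL divergence.

```python
import math
from collections import Counter


def kl_score(query_terms, doc_terms, collection_counts, collection_len, mu=2000.0):
    """Score = sum_w p(w|Q) * log p(w|D).

    p(w|Q) is the maximum-likelihood query model and
    p(w|D) = (c(w, D) + mu * p(w|C)) / (|D| + mu) is Dirichlet-smoothed.
    Query terms absent from the whole collection are skipped.
    """
    q = Counter(query_terms)
    q_len = sum(q.values())
    d = Counter(doc_terms)
    d_len = len(doc_terms)
    score = 0.0
    for w, qc in q.items():
        p_c = collection_counts.get(w, 0) / collection_len
        if p_c == 0:
            continue
        p_d = (d[w] + mu * p_c) / (d_len + mu)
        score += (qc / q_len) * math.log(p_d)
    return score
```

Since every smoothed probability is below 1, scores are negative; a larger (less negative) score means a better match.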
The models listed above are used for query expansion with the help of various feedback
techniques; expanding a query adds context to it. The same models are also used for
ranking. A comparison of these models is presented
in Table 1.
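Feedback-based query expansion of the kind summarized above is often implemented by interpolating the original query model with a term distribution estimated from feedback documents. The Python sketch below shows that generic interpolation, assuming hypothetical parameters `lam` (weight on the original query) and `n_terms`; it is not the specific refinement method of any paper in Table 1.

```python
from collections import Counter


def feedback_query_model(query_terms, feedback_docs, lam=0.5, n_terms=10):
    """Expand a non-empty query with terms from (pseudo-)relevant documents.

    p'(w|Q) = lam * p(w|Q) + (1 - lam) * p(w|F), where p(w|Q) is the
    maximum-likelihood query model and p(w|F) is estimated from the
    n_terms most frequent terms in the feedback documents.
    """
    q = Counter(query_terms)
    q_total = sum(q.values())
    top = dict(Counter(w for doc in feedback_docs for w in doc).most_common(n_terms))
    f_total = sum(top.values())
    if f_total == 0:
        # No feedback available: fall back to the original query model.
        return {w: c / q_total for w, c in q.items()}
    return {
        w: lam * q[w] / q_total + (1 - lam) * top.get(w, 0) / f_total
        for w in set(q) | set(top)
    }
```

For example, feedback documents about accreditation pull terms such as "accreditation" into an "nba" query, which narrows it toward the intended sense.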
Model | Author | Approach | Parameters | Inference | Inputs
KL_Divergence Retrieval Model [9] | Keke Cai | Markov Random Field (MRF) based sentence retrieval; Bayesian network | Top-ranked document list; ranked list; average | MRR improved by 19.7%, 25.5% and 24.1%, respectively | Top-ranked documents
Query model refinement approach [13] | Yunping Huang | Random walk smoothing method | Score of each vertex | - | Query
Probabilistic model [22] | Xiaohui Yan | High-order method | λ controls the weight of the initial query model | Setting λ to 0.1 or 0.2 usually yields the best retrieval performance | Feedback documents
Modeling Search Context by CRF [20] | Huanhuan Cao | Intuitive Model; Query and Document-Context Model; Eliminate Noisy Elements Model | dmax | 51.1% of the query occurrences and 51.7% of the URL clicks remained | Query's documents
Aspect Model [25] | Protima Banerjee | PLSA method | Smoothing with parameter λ | Improvement in precision and recall (no % specified) | Probability p(d)
Query-driven Feedback-based Conflict Resolution [15] | Yan Qi et al. | FICSR pre-processing module; constraint analysis and feedback system | k-simple paths | Concept matching, Quest; the stabbed version was 60% faster | User query; user's feedback
Query Model and Ranking Model [33] | Liang Jeff Chen | vk-aggregate document | sc(Qk) | Mean precision 10.2 for 30 queries | Set of keywords
Table 1: Comparison of various models used by different authors for IR
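The Aspect Model behind PLSA, listed in Table 1, explains each document-word pair through a latent aspect z, so that P(w|d) = Σ_z P(z|d) P(w|z). A minimal NumPy sketch of fitting it with expectation-maximization is shown below; the random initialization, the iteration count and the `eps` guard are assumptions, and the code omits the smoothing parameter λ reported for the surveyed method.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0, eps=1e-12):
    """Fit the aspect model P(w|d) = sum_z P(z|d) P(w|z) with EM.

    counts: array of shape (n_docs, n_words) holding term frequencies n(d, w);
    every document is assumed to contain at least one term.
    Returns P(z|d) with shape (n_docs, n_topics) and P(w|z) with shape
    (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization, normalized into probability distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) P(w|z); shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts n(d,w) P(z|d,w).
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z
```

The dense (docs × topics × words) posterior keeps the sketch short but is only practical for small collections.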
3. THE VARIOUS ARCHITECTURES OF IR
Various architectures for query context have been defined, since existing systems do not
rank query patterns according to context. Some of these architectures are described
below. Giorgio Orsi et al. propose the SAFE architecture, which receives a sequence of
keywords as input and produces, as output, a ranking over a set of query patterns, possibly
with a suggested assignment for their parameters [19]. They also propose a Context Model,
which is an instantiation of the context vocabulary and defines the context model for the
given application. In particular, the context model specifies the (possibly hierarchical)
context dimensions for the specific application, along with their possible values.
A K Sharma et al. propose the Query Semantic Search System (QUESEM, /'Qu-sem/) to
improve search quality. QUESEM maintains a database of definitions (referred to as the
Definition Repository) as the core of the system to accomplish its task [26]. Haizhou Fu
et al. propose the CoSi system, whose architecture consists of three core components: an
indexer, a context-sensitive cost model and a query interpreter [23]. Christian Sengstock
and Michael Gertz propose the CONQUER system, whose architecture is composed of a model
generation component, a model index, and a suggestion service [37]. Reiner Kraft et al.
propose the overall Y!Q system design and architecture. The Y!Q back end comprises three
major system components for processing contextual search queries: Content Analysis (CA),
the Query Planning and Rewriting Framework (QPW), and Contextual Ranking (CR) [29].
Liang Jeff Chen et al. propose a Query Model and a Ranking Model. In the Query Model, a
document, denoted by d, is modeled as a tuple of fields, each consisting of a bag of
words [33].
The architectures mentioned above aim to improve the retrieval process by enriching
the context of the query. A comparison of these architectures is
presented in Table 2.
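The Query Model of Liang Jeff Chen et al. represents a document d as a tuple of fields, each a bag of words [33]. The Python sketch below mirrors only that representation; the per-field weighted term-frequency score and the `weights` values are hypothetical stand-ins, not the paper's Ranking Model.

```python
from collections import Counter
from typing import Dict

# A document d as a collection of named fields, each a bag of words.
Document = Dict[str, Counter]


def make_document(**fields: str) -> Document:
    """Build a fielded document, e.g. make_document(title=..., body=...)."""
    return {name: Counter(text.lower().split()) for name, text in fields.items()}


def fielded_score(doc: Document, query: str, weights: Dict[str, float]) -> float:
    """Weighted sum of query-term frequencies across fields.
    Fields without an explicit weight count with weight 1.0 (an assumption)."""
    terms = query.lower().split()
    return sum(weights.get(name, 1.0) * bag[t] for name, bag in doc.items() for t in terms)
```

Keeping fields separate lets a ranking function treat a match in a title differently from a match in the body, which a single flat bag of words cannot do.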
Architecture | Authors | Inputs | Models / Methods | Inference
SAFE architecture [19] | Giorgio Orsi | Keyword search | The Context Model | 65% of queries were found at the top of the ranked list; in 25% of cases, users found the query in the second position
CoSi system architecture [23] | Haizhou Fu | Keyword queries | Indexer; context-sensitive cost model; query interpreter | CoSi learns what the user is asking for and ranks the intended interpretation higher, so that end users can find it more easily
Architecture of the CONQUER System [37] | Christian Sengstock | Patterns and their synopses | Model Generator; Model Index; Suggestion Service | Space complexity of O(1) per node in the FP-tree and O(1) runtime-complexity overhead for each node update operation
Y!Q System Design and Architecture [29] | Reiner Kraft | - | CA component; QPW; CR | Y!Q is superior to Yahoo! WS for 32.3% of the context and query pairs, while Yahoo! WS is better for only 8.3% of them (with 59.4% tied)
Table 2: Comparison of various architectures proposed by different authors for IR
4. METHODOLOGIES PROPOSED BY DIFFERENT AUTHORS
A K Sharma et al. propose two algorithms, Local Site Search for Query and
Definition Generation & Annotation. As the response pages are retrieved from dictionary-based
sites, they are assumed to contain the direct thesaurus entries and synonyms of the
query terms [26]. Lidong Bing et al. propose a scoring algorithm and a Latent Topic
Analysis and Training Algorithm [32].
Wenwei Xue et al. propose algorithms for context attribute matching and context
schema matching [27]. Reiner Kraft et al. propose two algorithms for ranking and
filtering documents: rank averaging and MC4 [29]. Liang Jeff Chen et al. propose a
data-mining-based selection algorithm and a graph decomposition algorithm [33].
Huanhuan Cao et al. propose an algorithm for clustering queries, in which a cluster
C is a set of queries [36]. Ziming Zhuang and Silviu Cucerzan propose a re-ranking
algorithm, Q-Rank, based on a straightforward yet very effective rationale: the most
frequently seen query extensions of a target query (terms extracted from queries that
contain the target query as an affix) and adjacent queries (queries that immediately
precede or follow a query in a user search session) provide important hints about users'
search intents [35]. Zhen Liao et al. propose Query Stream Clustering with Iterative
Scanning (QSC-IS), Query Stream Clustering with a Master-Slave Model (QSC-MS), and a
query suggestion algorithm [1]. Mariam Daoud et al. propose a session-based personalized
search algorithm, which describes the overall process of their session-based
personalized search [30]. Minmin Chen et al. propose an adaptive self-training
algorithm [31]. Self-training is a very commonly used algorithm for wrapping complex
models for semi-supervised learning [30].
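Rank averaging, one of the ranking algorithms attributed to Reiner Kraft et al. [29], takes k ranked lists (the top results of k sub-queries, per Table 3) and scores documents by their positions. The following Python sketch averages positions across lists; giving a missing document the position just past the end of the longest list is an assumption, as is the tie-breaking by document identifier.

```python
def rank_average(ranked_lists):
    """Merge k ranked lists by average 1-based position (lower is better).

    A document missing from a list is assigned the position just past the
    end of the longest list; ties are broken by document identifier.
    """
    depth = max((len(lst) for lst in ranked_lists), default=0)
    docs = {d for lst in ranked_lists for d in lst}

    def avg_rank(doc):
        positions = [lst.index(doc) + 1 if doc in lst else depth + 1 for lst in ranked_lists]
        return sum(positions) / len(positions)

    return sorted(docs, key=lambda d: (avg_rank(d), d))
```

Documents that appear near the top of several sub-query lists rise in the merged ranking even if none of the lists ranks them first.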
The algorithms used in IR range from query clustering and query ranking to query
suggestion and query expansion. Query clustering usually groups similar queries that
lead users to view the same or similar documents. In query ranking algorithms, the
queries are ranked according to the frequency with which users submit them. The
algorithms based on query expansion use some kind of feedback or probabilistic
technique to expand the query. A comparison of these methodologies is presented
in Table 3.
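Clustering queries that lead users to similar documents can be sketched with click-through data: two queries are linked when the sets of URLs clicked for them overlap enough, and clusters are the connected components of the resulting graph. This Python sketch uses Jaccard similarity and union-find; the `threshold` value and the input format are assumptions, and it is not the diameter-bounded (Dmax) clustering of Huanhuan Cao et al. [36].

```python
from itertools import combinations


def cluster_queries(clicks, threshold=0.5):
    """Group queries whose sets of clicked URLs have Jaccard similarity >= threshold.

    clicks: dict mapping query -> set of clicked URLs.
    Returns a sorted list of sorted clusters (connected components).
    """
    parent = {q: q for q in clicks}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving keeps trees shallow
            q = parent[q]
        return q

    for a, b in combinations(clicks, 2):
        union = clicks[a] | clicks[b]
        if union and len(clicks[a] & clicks[b]) / len(union) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for q in clicks:
        groups.setdefault(find(q), []).append(q)
    return sorted(sorted(g) for g in groups.values())
```

Under this scheme, "nba" and "nba accreditation" end up in different clusters when users click on different sites for them, even though the query strings overlap.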
5. APPLICATIONS OF IR
The applications of IR are mainly classified into general applications and domain-specific
applications. General applications include digital libraries, search engines, etc.;
domain-specific applications include expert search finding, genomic IR, geographic IR,
etc.
5.1 General applications of IR
Digital libraries: A digital library is a library in which collections are stored in digital
formats (as opposed to print, microform, or other media) and accessible by computers.
Author | Technique / Methodology | Parameters considered | Outcome / Results (performance)
A K Sharma et al. [26] | Definition_Generator_Annotator (D); Local_Site_searching | Keywords; Query | From 0.6 lakh to 1.6 lakh relevant results achieved out of 2.5 lakh results
Lidong Bing et al. [32] | Scoring algorithm; Latent topic analysis and training algorithm | Query; Ranking | The differences between the performance of their method and CTA are significant at significance level 0.05
Wenwei Xue et al. [27] | Context attribute matching; Context schema matching | A pair of context attributes; schema matcher integrates a local schema into the current set of global schemas | CAMSUBSYN achieved as high as 100% precision and 64% recall on their dataset
Reiner Kraft et al. [29] | Rank averaging algorithm; MC4 algorithm | Assigning a score to every position in a rank list; the input is k ranked lists, which are the top few results of k sub-queries | (95% confidence interval [2.873, 2.972]), compared to an average of 2.54 ([2.45, 2.66]) based on ComScore (which includes MSN, Google, and Yahoo)
Liang Jeff Chen et al. [33] | Data-mining based selection algorithm; Graph decomposition algorithm | For two keyword combinations P1, P2; keyword combinations | The average number of MeSH terms in a citation after the inheritance is 44; better ranking in 21 out of 30 queries
Huanhuan Cao et al. [36] | Algorithm for clustering queries | Diameter parameter Dmax | The average overall precision of CRF-B, CRF-B-C and CRF-B-C-T is improved across different K by 50%, 52% and 57%, respectively
Ziming Zhuang et al. [35] | Re-ranking algorithm | Adjacent queries; interpolation parameter (γ) | When varying γ, on average, Q-Rank improved the rankings for 75.8% of the re-ranked queries
Zhen Liao et al. [1] | Query stream clustering with iterative scanning; Query stream clustering with master-slave model; Query suggestion | The M1-th query; x mod M = ω; preceding queries | Total response time is still small, about 0.3 milliseconds
Mariam Daoud [30] | Session personalized search algorithm | Query | The setting (r = 0.3) produces the best improvement in personalized search, since it yields a higher precision improvement at P@5 (11.63%)
Minmin Chen [31] | Adaptive self training with conditional random fields | Unlabeled queries | 51.38% precision with only 10% of the training data labeled
Table 3: Comparison of different methodologies for IR
The digital content may be stored locally, or accessed remotely via computer networks. A
digital library is a type of IR system.
Search engines:
- Desktop search: Desktop search is the field of search tools that search the contents of
a user's own computer files rather than the Internet. These tools are designed to find
information on the user's PC, including web browser histories, e-mail archives, text
documents, sound files, images and video.
- Enterprise search: Enterprise search is the practice of making content from multiple
enterprise-type sources, such as databases and intranets, searchable to a defined
audience.
- Federated search: Federated search is an IR technology that allows the simultaneous
search of multiple searchable resources. A user makes a single query request, which is
distributed to the search engines participating in the federation. The federated search
then aggregates the results received from the search engines for presentation to the
user.
- Mobile search: Mobile search is an evolving branch of IR services centered on the
convergence of mobile platforms, mobile phones and other mobile devices. Web search
engine capability in a mobile form allows users to find mobile content on websites that
are available to mobile devices on mobile networks.
- Social search: Social search, or a social search engine, is a type of web search that
takes into account the Social Graph of the person initiating the search query. When
applied to web search, this Social-Graph approach to relevance contrasts with
established algorithmic or machine-based approaches, where relevance is determined
by analyzing the text of each document or the link structure of the documents.
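The aggregation step of federated search must reconcile scores that different engines compute on different scales. One common approach, sketched below in Python, min-max normalizes each engine's scores and then sums them per document (CombSUM); the input format and the tie-breaking rule are assumptions, and real federated systems may use other merging strategies.

```python
def merge_federated(results_by_engine):
    """Merge result lists from several engines.

    results_by_engine: dict engine -> list of (doc_id, raw_score).
    Each engine's scores are min-max normalized to [0, 1], then summed per
    document (CombSUM). Ties are broken by doc_id.
    """
    fused = {}
    for results in results_by_engine.values():
        if not results:
            continue
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for doc, s in results:
            norm = (s - lo) / (hi - lo) if hi > lo else 1.0
            fused[doc] = fused.get(doc, 0.0) + norm
    return sorted(fused, key=lambda d: (-fused[d], d))
```

A document returned by several engines accumulates evidence from each, so it tends to outrank documents found by only one engine.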
Web search: Web search is designed to search for information on the World Wide Web. The search
results are generally presented in a list of results, often referred to as Search Engine Results
Pages (SERPs). The information may be a mix of web pages, images, and other types of files.
Some search engines also mine data available in databases or open directories.
5.2 Domain Specific applications of IR
In domain-specific IR, the information and its classification are based on a particular
domain, such as the legal system or geographic systems.
Expert search finding: Expert search is a task of growing importance in Enterprise settings.
An expert search system predicts and ranks the expertise of a set of candidate persons with
respect to the user’s query.
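A simple way to predict and rank expertise is to let each retrieved document vote for the candidates associated with it, summing the documents' relevance scores per candidate (a CombSUM-style voting model). The Python sketch below assumes hypothetical input dictionaries for document scores and candidate-document associations; it is one illustrative approach, not a specific system's method.

```python
def rank_experts(doc_scores, authorship):
    """Rank candidates by the summed relevance of their associated documents.

    doc_scores: dict doc_id -> relevance score of the document for the query.
    authorship: dict candidate -> set of associated doc_ids.
    Documents without a score contribute 0; ties are broken by name.
    """
    totals = {
        person: sum(doc_scores.get(d, 0.0) for d in docs)
        for person, docs in authorship.items()
    }
    return sorted(totals, key=lambda p: (-totals[p], p))
```

Summing rewards candidates with many relevant documents; averaging instead would favor candidates with a few highly relevant ones.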
Genomic IR: The in-silico revolution has changed how biologists characterise DNA and
protein sequences. As a first step to exploring the structure and function of an unknown
sequence, biologists search large genomic databases for similar sequences. This process of
Genomic IR has allowed significant advances in biology and led to advancements in critical
areas such as cancer research.
Geographic IR: Geographic IR (GIR) is the augmentation of IR with geographic metadata.
GIR involves extracting and resolving the meaning of locations in unstructured text, which is
known as geo-parsing. After identifying location references in text, a GIR system must index
this information for search and retrieval.
Legal IR: Legal IR is the science of IR applied to legal text, including legislation, case law,
and scholarly works. Accurate legal IR is important to provide access to the law to laymen
and legal professionals.
Vertical search: A vertical search engine, as distinct from a general web search engine,
focuses on a specific segment of online content. The vertical content area may be based on
topicality, media type, or genre of content. Common verticals include shopping, the
automotive industry, legal information, medical information, and travel.
5.3 Other Applications of IR
IR has also been applied in other fields, such as adversarial IR, automatic
summarization and question answering.
Adversarial IR: Adversarial IR is a topic in IR concerned with strategies for working with a data
source of which some portion has been manipulated maliciously. Tasks can include
gathering, indexing, filtering, retrieving and ranking information from such a data source.
Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
Automatic summarization: Automatic summarization is the creation of a shortened version
of a text by a computer program. Information overload has made access to coherent and
correctly developed summaries vital, and as access to data has increased, so has interest in
automatic summarization. Summarization technology is employed, for example, in the
Google search engine.
Multi-document summarization: Multi-document summarization is an automatic
procedure aimed at extracting information from multiple texts written about the same topic.
- Compound term processing: Compound term processing is the name used for a
category of techniques in IR applications that perform matching on the basis of
compound terms. Compound terms are built by combining two (or more) simple
terms; for example, "triple" is a single-word term, but "triple heart bypass" is a
compound term.
Cross-lingual retrieval: Cross-Language IR (CLIR) is a subfield of IR dealing with
retrieving information written in a language different from the language of the user's query.
- Document classification: The task of document classification is to assign a
document to one or more classes or categories. This may be done "manually" (or
"intellectually") or algorithmically. The intellectual classification of documents has
mostly been the province of library science, while the algorithmic classification of
documents is used mainly in information science and computer science.
Spam filtering: Bayesian spam filtering is a statistical technique of e-mail filtering that uses a
naive Bayes classifier to identify spam e-mail.
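The naive Bayes classifier mentioned above can be sketched as a multinomial model over word counts with add-one (Laplace) smoothing. The class below is a minimal illustration under assumed conventions (whitespace tokenization, labels "spam" and "ham", words unseen in training ignored), not a production filter.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts with add-one smoothing.
    Both classes, "spam" and "ham", must appear in the training data."""

    def fit(self, messages, labels):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log P(label) plus the sum of smoothed log P(word | label)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.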
Question answering: Question Answering (QA) is a computer science discipline within the
fields of IR and Natural Language Processing (NLP) that is concerned with building
systems that automatically answer questions posed by humans in a natural language. A QA
implementation, usually a computer program, may construct its answers by querying a
structured database of knowledge or information, usually a knowledge base.
6. OPEN ISSUES/CHALLENGES
Although the discussed models efficiently achieve their stated objectives, they still lack
an efficient retrieval process when context must be considered. When a user submits a
query for the first time, the search engine is unable to find the context of the query.
However, if some of the user's events on web pages (such as recent browsing) can be
captured, this problem can be resolved. Some of the open challenges in this area are:
- Reducing the volume of documents for effective retrieval, i.e., improving the quality
of the documents considered for retrieval by filtering out irrelevant and redundant
documents
- Ranking structured and unstructured documents for better retrieval accuracy
- Context awareness in both the modeling and the scaling up of query suggestion
- Visualization and presentation of search results with in-depth summarized analysis
To address the above challenges, we propose a novel retrieval technique that builds the
query from both context and concepts. It enhances the retrieval operation by exploiting
unstructured documents, and it can increase the focused retrieval of documents, especially
from the Web, by capturing the user's recent browsing sessions.
The snippets used in modern Web search are query-based and have been shown to be better
than static document summaries. For instance, word clouds can be examined with respect
to the following:
Depth on the query side: Adding depth on the query side is a bottleneck for delivering
more accurate retrieval results, since users provide only 2 to 3 keywords on average to
search the complete Web.
Depth in the document representation: Documents on the Web are rich in structure, but
most of the structural elements are not used consistently throughout the Web. A key
question is how to deal with semi-structured information.
Depth on the result side: While a query can have thousands of relevant results, only the
first 10 or 20 results get any attention in a Web search interface, and these first n
results often still contain redundant information.
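A query-based snippet, as opposed to a static summary, can be sketched by selecting the sentences that contain the most distinct query terms and presenting them in document order. The Python function below assumes a naive regular-expression sentence splitter and this simple scoring rule; production snippet generators use richer features.

```python
import re


def query_biased_snippet(document, query, max_sentences=2):
    """Return the sentences containing the most distinct query terms,
    joined in their original document order."""
    terms = set(query.lower().split())
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    def score(sentence):
        words = set(re.findall(r"\w+", sentence.lower()))
        return len(terms & words)

    # Highest score first; earlier sentences win ties.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because sentence selection depends on the query, the same document yields different snippets for different queries, which is the property that distinguishes query-based snippets from static summaries.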
Our main objective is to exploit query context and document structure to address the
following challenges:
- Ambiguity in the user's query
- Appropriate feedback from user search logs
- Effective use and exploitation of structured and unstructured documents for better
query formulation and search results
7. CONCLUSION
In this paper, we have discussed and analyzed the performance of various models, algorithms and
architectures that researchers have used in IR. The models discussed are used for query ranking
and query expansion, with the help of various feedback techniques that add context to the query.
The architectures discussed are either completely new or variations of existing architecture
models that improve the retrieval process by enriching the context of the query. The algorithms
used in IR range from query clustering and query ranking to query suggestion and query expansion.
Query clustering usually groups similar queries that lead users to view a similar set of
documents. In query ranking algorithms, queries are ranked according to the frequency with which
users submit them. Query expansion algorithms use some kind of feedback or probabilistic technique
to expand the query. Although the discussed models efficiently achieve their stated objectives,
they still fall short of efficient retrieval when context has to be considered. Hence, exploiting
structured and unstructured documents to increase the focused retrieval of documents from the Web
has become a challenging problem.
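The three query-log techniques summarized above (clustering, frequency ranking and feedback-based expansion) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any surveyed paper. The log is assumed to be a list of (query, clicked URL) pairs, clustering uses a hypothetical greedy rule based on the Jaccard overlap of clicked URLs, and expansion uses plain pseudo-relevance feedback (appending the most frequent new terms from the top-ranked documents). All function names and parameters are hypothetical.

```python
# Hypothetical sketch of three query-log techniques named in the conclusion.
# Assumptions (illustrative only, not taken from any surveyed system):
#   - the search log is a list of (query, clicked_url) pairs;
#   - queries are clustered greedily by the Jaccard overlap of clicked URLs;
#   - expansion is plain pseudo-relevance feedback over the top-k documents.
from collections import Counter, defaultdict


def rank_by_frequency(log):
    """Query ranking: distinct queries ordered by how often users submitted them."""
    counts = Counter(query for query, _ in log)
    return [query for query, _ in counts.most_common()]


def cluster_by_clicks(log, min_overlap=0.5):
    """Query clustering: group queries whose clicked-URL sets overlap enough."""
    clicks = defaultdict(set)
    for query, url in log:
        clicks[query].add(url)
    clusters = []  # list of (member queries, union of their clicked URLs)
    for query, urls in clicks.items():
        for members, cluster_urls in clusters:
            overlap = len(urls & cluster_urls) / len(urls | cluster_urls)
            if overlap >= min_overlap:
                members.add(query)
                cluster_urls |= urls  # in-place set update, so the cluster grows
                break
        else:  # no existing cluster was similar enough: start a new one
            clusters.append(({query}, set(urls)))
    return [members for members, _ in clusters]


def expand_query(query, top_docs, k=3, extra_terms=2):
    """Query expansion: append the most frequent new terms of the top-k documents."""
    original = query.lower().split()
    counts = Counter(
        term
        for doc in top_docs[:k]
        for term in doc.lower().split()
        if term not in original
    )
    return original + [term for term, _ in counts.most_common(extra_terms)]


if __name__ == "__main__":
    log = [
        ("cheap flights", "a.com"), ("cheap flights", "b.com"),
        ("low cost airfare", "a.com"), ("low cost airfare", "b.com"),
        ("python tutorial", "c.com"), ("cheap flights", "a.com"),
    ]
    print(rank_by_frequency(log))
    print(cluster_by_clicks(log))
    print(expand_query("jaguar speed", ["jaguar car engine speed", "jaguar car price"]))
```

The greedy clustering depends on the order in which queries are visited. Agglomerative or graph-based clustering over the query-URL click graph would avoid that sensitivity, so the sketch should be read only as an illustration of the idea that queries leading to the same documents belong together.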
REFERENCES
[1] Zhen Liao, Nankai University, Daxin Jiang, Microsoft Research Asia, Enhong Chen, University of
Science and Technology of China, Jian Pei, Simon Fraser University, Huanhuan Cao, University of
Science and Technology of China, Hang Li, Microsoft Research Asia, “Mining Concept Sequences from
Large-Scale Search Logs for Context-Aware Query Suggestion”, ACM Transactions, October 2011.
[2] Mario Cataldi, Università di Torino, Claudio Schifanella, Università di Torino, K. Selçuk Candan,
Arizona State University, Maria Luisa Sapino, Università di Torino, Luigi Di Caro, Università di
Torino, “CoSeNa: a Context-based Search and Navigation System”, ACM, October 2009.
[3] Michal Kajaba and Pavol Navrat, “Personalized Web Search Using Context Enhanced Query”,
International Conference on Computer Systems and Technologies (CompSysTech’09).
[4] Chang Liu and Nicholas J. Belkin, “Implicit Acquisition of Context for Personalization of
Information Retrieval Systems”, CaRR 2011, February 13, 2011, Stanford, CA, USA.
[5] Ziv Bar-Yossef, Google Inc., MATAM, Bldg 30, Israel, and Naama Kraus, Computer Science
Department, Technion, Israel, “Context-Sensitive Query Auto-Completion”, CIKM’10, October 26–30,
2010, Toronto, Ontario, Canada. Copyright 2010 ACM.
[6] Rianne Kaptein, University of Amsterdam, “Effective Focused Retrieval by Exploiting Query
Context and Document Structure”, ACM, October 6, 2011.
[7] Zheng Ye, Xiangji Huang and Hongfei Lin, Department of Computer Science and Engineering, Dalian
University of Technology, Dalian, China, and School of Information Technology, York University,
Toronto, Ontario, M3J 1P3, Canada, “A Bayesian Network Approach to Context Sensitive Query
Expansion”, SAC’11, March 21–25, 2011, TaiChung, Taiwan. Copyright 2011 ACM.
[8] Minmin Chen, Jian-Tao Sun, Xiaochuan Ni and Yixin Chen, Department of Computer Science and
Engineering, Washington University in Saint Louis, Saint Louis, MO, USA, and Microsoft Research
Asia, Beijing, P.R. China, “Improving Context-Aware Query Classification via Adaptive
Self-training”, October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[9] Keke Cai, Chun Chen, Jiajun Bu, Peng Huang and Zhiming Kang, College of Computer Science,
University Hangzhou, China, “Exploration of Query Context for Information Retrieval”, May 8–12,
2007, Banff, Alberta, Canada. ACM.
[10] Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman and
Eytan Ruppin, Zapper Technologies, Inc., “Placing Search in Context: The Concept Revisited”, ACM
Transactions on Information Systems, Vol. 20, No. 1, January 2002.
[11] Raymond Y.K. Lau, Centre for Information Technology Innovation, Queensland University of
Technology, and Peter D. Bruza and Dawei Song, Distributed Systems Technology Centre, The
University of Queensland, Australia, “Belief Revision for Adaptive Information Retrieval”,
July 25–29, 2004, Sheffield, South Yorkshire, UK. Copyright 2004 ACM.
[12] Jiang Bian, College of Computing, Georgia Institute of Technology, Tie-Yan Liu and Tao Qin,
Microsoft Research Asia, Hongyuan Zha, College of Computing, Georgia Institute of Technology,
“Ranking with Query-Dependent Loss for Web Search”, February 4–6, 2010, New York City, New York,
USA. Copyright 2010 ACM.
[13] Yunping Huang and Le Sun, Institute of Software, Chinese Academy of Sciences, Beijing, China,
and Jian-Yun Nie, Department of Computer Science and Operations Research, University of Montreal,
Canada, “Query Model Refinement Using Word Graphs”, October 26–30, 2010, Toronto, Ontario,
Canada. Copyright 2010 ACM.
[14] Jing Bai, Jian-Yun Nie, Hugues Bouchard and Guihong Cao, Department IRO, University of
Montreal, Canada, and Yahoo! Inc., Montreal, Quebec, Canada, “Using Query Contexts in Information
Retrieval”, July 23–27, 2007, Amsterdam, The Netherlands. Copyright 2007 ACM.
[15] Yan Qi, Arizona State University, Tempe, USA, K. Selçuk Candan, Arizona State University,
Tempe, AZ 85287, USA, and Maria Luisa Sapino, Università di Torino, Italy, “FICSR: Feedback-based
InConSistency Resolution and Query Processing on Misaligned Data Sources”, June 12–14, 2007,
Beijing, China. Copyright 2007 ACM.
[16] Tangjian Deng, Liang Zhao and Ling Feng, Tsinghua National Laboratory for Information Science
and Technology, Tsinghua University, Beijing, China, and Wenwei Xue, Nokia Research Center,
Beijing, China, “Information Re-finding by Context: A Brain Memory Inspired Approach”,
October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[17] Xing Wei, Fuchun Peng, Huihsin Tseng, Yumao Lu and Benoit Dumoulin, Yahoo! Labs, California,
USA, “Context Sensitive Synonym Discovery for Web Search Queries”, November 2–6, 2009, Hong Kong,
China. Copyright 2009 ACM.
[18] Ivan T. Bowman, School of Computer Science, University of Waterloo, and Kenneth Salem, School
of Computer Science, University of Waterloo, “Optimization of Query Streams Using Semantic
Prefetching”, June 13–18, 2004, Paris, France. Copyright 2004 ACM.
[19] Giorgio Orsi, Politecnico di Milano, Italy, Letizia Tanca, Politecnico di Milano, Italy,
Eugenio Zimeo, Università del Sannio, Italy, “Keyword-based, Context-aware Selection of Natural
Language Query Patterns”, March 22–24, 2011, Uppsala, Sweden. Copyright 2011 ACM.
[20] Huanhuan Cao, Daxin Jiang, Jian Pei, Enhong Chen and Hang Li, University of Science and
Technology of China, Microsoft Research Asia and Simon Fraser University, “Towards Context-Aware
Search by Learning A Very Large Variable Length Hidden Markov Model from Search Logs”,
April 20–24, 2009, Madrid, Spain. ACM.
[21] Carla Teixeira Lopes, Departamento de Engenharia Informática, Faculdade de Engenharia,
Universidade do Porto, Rua Dr. Roberto Frias, Portugal, Cristina Ribeiro, Departamento de
Engenharia Informática, Faculdade de Engenharia, Universidade do, “Context Effect on Query
Formulation and Subjective Relevance in Health Searches”, August 18–21, 2010, New Brunswick,
New Jersey, USA. Copyright 2010 ACM.
[22] Xiaohui Yan, Jiafeng Guo and Xueqi Cheng, Institute of Computing Technology, CAS, Beijing,
China, “Context-Aware Query Recommendation by Learning High-Order Relation in Query Logs”,
October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM.
[23] Haizhou Fu, North Carolina State University, Raleigh, NC, Sidan Gao, North Carolina State
University, Raleigh, NC, Kemafor Anyanwu, North Carolina State University, Raleigh, NC, “CoSi:
Context-Sensitive Keyword Query Interpretation on RDF Databases”, March 28–April 1, 2011,
Hyderabad, India. ACM.
[24] Ying-Hsang Liu and Nicholas J. Belkin, Rutgers University, USA, “Query Reformulation, Search
Performance, and Term Suggestion Devices in Question-Answering Tasks”, Information Interaction in
Context, 2008, London, UK. Copyright 2008 ACM.
[25] Protima Banerjee, College of Information Science and Technology, Drexel University,
Philadelphia, and Hyoil Han, College of Information Science and Technology, Drexel University,
Philadelphia, USA, “Incorporation of Corpus-Specific Semantic Information into Question Answering
Context”, October 30, 2008, Napa Valley, California, USA. Copyright 2008 ACM.
[26] A. K. Sharma, Computer Engg. Department, YMCA Univ. of Sc. & Technology, Faridabad, India,
Neelam Duhan, Computer Engg. Department, YMCA Univ. of Sc. & Technology, Faridabad, India, and
Bharti Sharma, Computer Engg. Department, MVN Instt. of Engg. & Technology, Palwal, India,
“A Semantic Search System using Query Definitions”, December 28–30, 2010, Allahabad, UP, India.
Copyright 2010 ACM.
[27] Wenwei Xue, Hungkeng Pung and Paulito P. Palmes, School of Computing, National University of
Singapore, Singapore 117543, and Tao Gu, Institute for Infocomm Research, Terrace, Singapore,
“Schema Matching for Context-Aware Computing”, September 21–24, 2008, Seoul, Korea.
Copyright 2008 ACM.
[28] Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen and Qiang Yang,
University of Science and Technology of China, Hong Kong University of Science and Technology,
Microsoft Corporation and Microsoft Research Asia, “Context-Aware Query Classification”,
July 19–23, 2009, Boston, Massachusetts, USA. Copyright 2009 ACM.
[29] Reiner Kraft, Chi Chao Chang, Farzin Maghoul and Ravi Kumar, Yahoo!, Inc., Sunnyvale, USA,
“Searching with Context”.
[30] Mariam Daoud, Lynda Tamine-Lechani and Mohand Boughanem, Institut de Recherche en
Informatique de Toulouse, France, “Learning user interests for a session-based personalized
search”, Information Interaction in Context, 2008, London, UK. Copyright 2008 ACM.
[31] Ji-Rong Wen, Microsoft Research Asia, Beijing, China, Ni Lao, Tsinghua University, Beijing,
China, and Wei-Ying Ma, Microsoft Research Asia, Beijing, China, “Probabilistic Model for
Contextual Retrieval”, July 25–29, 2004, Sheffield, South Yorkshire, UK. Copyright 2004 ACM.
[32] Lidong Bing and Wai Lam, Department of Systems Engineering and Engineering Management, The
Chinese University of Hong Kong, Shatin, Hong Kong, and Tak-Lam Wong, Department of Mathematics
and Information Technology, The Hong Kong Institute of Education, “Using Query Log and Social
Tagging to Refine Queries Based on Latent Topics”, October 24–28, 2011, Glasgow, Scotland, UK.
Copyright 2011 ACM.
[33] Liang Jeff Chen, UC San Diego, La Jolla, CA, US, and Yannis Papakonstantinou, UC San Diego,
“Context-sensitive Ranking for Document Retrieval”, June 12–16, 2011, Athens, Greece.
Copyright 2011 ACM.
[34] Reiner Kraft, Farzin Maghoul and Chi Chao Chang, Yahoo!, Inc., 701 First Avenue, Sunnyvale,
CA 94089, “Y!Q: Contextual Search at the Point of Inspiration”, October 31–November 5, 2005,
Bremen, Germany. Copyright 2005 ACM.
[35] Ziming Zhuang, The Pennsylvania State University, University Park, USA, and Silviu Cucerzan,
Microsoft Research, Redmond, USA, “Re-Ranking Search Results Using Query Logs”,
November 5–11, 2006, Arlington, Virginia, USA. ACM.
[36] Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen and Hang Li, University of
Science and Technology of China, Microsoft Research Asia, Simon Fraser University, Nanyang
Technological University and Nankai University, “Context-Aware Query Suggestion by Mining
Click-Through and Session Data”, August 24–27, 2008, Las Vegas, Nevada, USA. Copyright 2008 ACM.
[37] Christian Sengstock and Michael Gertz, Institute of Computer Science, University of
Heidelberg, Germany, “CONQUER: A System for Efficient Context-aware Query Suggestions”,
March 28–April 1, 2011, Hyderabad, India. ACM.