ELCA: Evaluation for Keyword Search on Probabilistic XML Data
ABSTRACT
As probabilistic data management becomes a major research focus and keyword search
becomes an increasingly popular way to query data, it is natural to ask how to support
keyword queries on probabilistic XML data. For keyword queries on deterministic XML
documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant
fragments, rooted at the ELCAs, to appear as results, and is more popular than other
keyword query result semantics such as SLCA (Smallest Lowest Common Ancestor). In this
paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic
XML documents. After defining probabilistic ELCA semantics in terms of possible world
semantics, we propose an approach that computes ELCA probabilities without generating
possible worlds. We then develop an efficient stack-based algorithm that finds all
probabilistic ELCA results and their ELCA probabilities for a given keyword query on a
probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA
algorithm and compare it with its SLCA counterpart in terms of result effectiveness, time
and space efficiency, and scalability.
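To make the possible-world definition concrete, the following minimal sketch computes ELCA probabilities by brute force, that is, by enumerating every possible world, which is exactly what the proposed approach avoids. It assumes a simplified model that is not taken from the paper: every non-root node exists independently with a given probability whenever its parent exists. The document `DOC` and all function names are hypothetical.

```python
from itertools import product

# Hypothetical toy document: node -> (parent, P(node exists | parent exists), keywords).
# Assumption: parents are listed before their children.
DOC = {
    "r": (None, 1.0, set()),
    "a": ("r", 0.5, {"xml"}),
    "b": ("r", 1.0, {"search"}),
    "c": ("r", 0.8, set()),
    "d": ("c", 1.0, {"xml"}),
    "e": ("c", 0.5, {"search"}),
}

def possible_worlds(doc):
    """Yield (present_nodes, probability) once for each distinct possible world."""
    non_root = [n for n, (parent, _, _) in doc.items() if parent is not None]
    for choices in product([True, False], repeat=len(non_root)):
        keep = dict(zip(non_root, choices))
        present, prob, valid = set(), 1.0, True
        for n, (parent, p, _) in doc.items():
            if parent is None:
                present.add(n)
            elif parent in present:
                prob *= p if keep[n] else 1.0 - p
                if keep[n]:
                    present.add(n)
            elif keep[n]:
                # A node cannot exist without its parent; skip this combination
                # so that each world is counted only once.
                valid = False
                break
        if valid and prob > 0:
            yield present, prob

def elcas(doc, present, query):
    """Return the ELCA nodes of one deterministic world for a keyword set."""
    children = {n: [] for n in present}
    for n in present:
        parent = doc[n][0]
        if parent is not None:
            children[parent].append(n)
    subtree_kw = {}

    def collect(n):
        kws = set(doc[n][2])
        for c in children[n]:
            kws |= collect(c)
        subtree_kw[n] = kws
        return kws

    collect(next(n for n in present if doc[n][0] is None))
    result = set()
    for n in present:
        # Keywords still visible at n after excluding child subtrees
        # that already contain every query keyword.
        free = set(doc[n][2])
        for c in children[n]:
            if not query <= subtree_kw[c]:
                free |= subtree_kw[c]
        if query <= free:
            result.add(n)
    return result

def elca_probabilities(doc, query):
    """Sum, for each node, the probability of the worlds in which it is an ELCA."""
    probs = {}
    for present, p in possible_worlds(doc):
        for n in elcas(doc, present, query):
            probs[n] = probs.get(n, 0.0) + p
    return probs
```

For the query {"xml", "search"}, `elca_probabilities(DOC, {"xml", "search"})` gives roughly 0.7 for node r and 0.4 for node c. The number of worlds grows exponentially with the number of uncertain nodes, which is why a method that avoids enumeration is needed.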
GLOBALSOFT TECHNOLOGIES

Modules:
Data storage and search:
We describe an approach based on Tree-based Association Rules (TARs): mined
rules that provide approximate, intensional information on both the
structure and the contents of XML documents, and that can themselves be
stored in XML format. There are two main approaches to XML document access:
keyword-based search and query answering. The idea of mining association
rules to provide summarized representations of XML documents has been
investigated in many proposals, some of which use languages such as XQuery.
File organization:
We do not store the data in a single file because, in the Hadoop MapReduce
framework, a file is the smallest unit of input to a MapReduce job and, in the
absence of caching, a file is always read from disk. If all the data were
in one file, the whole file would be input to the jobs for every query. Instead,
we divide the data into multiple smaller files.
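The idea of splitting the data so that a job reads only the files it needs can be sketched as follows. This is a minimal local-filesystem illustration, not Hadoop code: the function name `split_into_files`, the choice of partitioning key, and the one-record-per-line layout are all assumptions, since the text does not say how the data is divided.

```python
import os
from collections import defaultdict

def split_into_files(records, key_of, out_dir):
    """Group records by key_of(record) and write each group to its own file.

    Assumption: every key is a string that is safe to use as a file name.
    Returns a mapping from key to the path of the file holding that group.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[key_of(rec)].append(rec)

    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for key, recs in groups.items():
        path = os.path.join(out_dir, f"{key}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(recs) + "\n")
        paths[key] = path
    return paths
```

A query that touches only one key then needs only the file at `paths[key]` as input, instead of the whole data set.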
User index-based search:
We introduce indexes on TARs to further speed up access to mined trees and,
more generally, intensional query answering. In general, path indexes are
proposed to quickly answer queries that follow some frequent path template,
and are built by indexing only those paths that are queried very frequently.
We start from a different perspective: we want to provide quick, and often
approximate, answers to casual queries as well.
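As an illustration of what a path index is, the sketch below maps every root-to-node label path of an XML document to the elements reached by that path. It is a plain path index over the document itself, not the TAR index described above, and the function name `build_path_index` is hypothetical. It uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_path_index(root):
    """Map each label path such as '/univ/dept/student' to its elements."""
    index = defaultdict(list)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        index[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(index)
```

A query that follows a known path template, for example `/univ/dept/student`, becomes a single dictionary lookup instead of a traversal of the document.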
Query plan generation:
We define the query plan generation problem and show that generating the
best (i.e., least-cost) query plan, for the ideal model as well as for the
practical model, is computationally expensive. We then present a heuristic and
a greedy approach that generate an approximate solution to the best-plan problem.
Running example:
We will use the following query as a running example in this section.
SELECT ?v ?x ?y ?z WHERE {
  ?x xml:type ub:GraduateStudent .
  ?y xml:type ub:University .
  ?z ?v ub:Department .
  ?x ub:memberOf ?z .
  ?x ub:undergraduateDegreeFrom ?y }
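One way to picture a greedy plan for the running example is the sketch below. It is not the plan generator described in the text: it assumes, purely for illustration, that each join step merges every triple pattern sharing one variable, and that the greedy choice is the variable shared by the most patterns (ties broken alphabetically). The function names are hypothetical.

```python
def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in pattern if term.startswith("?")}

def greedy_join_order(patterns):
    """Return the join variables in the order a greedy strategy would pick them."""
    groups = [variables(p) for p in patterns]
    order = []
    while len(groups) > 1:
        # Count in how many remaining groups each variable occurs.
        counts = {}
        for group in groups:
            for var in group:
                counts[var] = counts.get(var, 0) + 1
        shared = sorted(v for v, c in counts.items() if c > 1)
        if not shared:
            break  # the remaining groups share no variable
        best = max(shared, key=lambda v: counts[v])
        # Join every group containing the chosen variable into one group.
        merged = set().union(*(g for g in groups if best in g))
        groups = [g for g in groups if best not in g] + [merged]
        order.append(best)
    return order

# Triple patterns of the running example, written as (subject, predicate, object).
RUNNING_EXAMPLE = [
    ("?x", "xml:type", "ub:GraduateStudent"),
    ("?y", "xml:type", "ub:University"),
    ("?z", "?v", "ub:Department"),
    ("?x", "ub:memberOf", "?z"),
    ("?x", "ub:undergraduateDegreeFrom", "?y"),
]
```

For the running example, `greedy_join_order(RUNNING_EXAMPLE)` returns `['?x', '?y', '?z']`: the join on `?x` comes first because three patterns share it. A greedy choice like this is cheap to compute but is not guaranteed to yield the least-cost plan.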
Time-based search:
We develop an efficient stack-based algorithm that finds all probabilistic ELCA
results and their ELCA probabilities for a given keyword query on a probabilistic XML
document. We then experimentally evaluate the proposed ELCA algorithm and compare
it with its SLCA counterpart in terms of result effectiveness, time and space efficiency,
and scalability.

Existing System:
Semantic web technologies are being developed to present data in a standardized way so
that it can be retrieved and understood by both humans and machines. Historically,
web pages have been published as plain HTML files, which are not suitable for reasoning.
1. No user data privacy.
2. Existing commercial tools and technologies do not scale well in cloud
computing settings.
Proposed System:
The proposed system integrates the functionalities of our approach. Given an XML
document, it enables users to extract intensional knowledge and to compose traditional
queries as well as queries over the intensional knowledge, receiving both extensional and
intensional answers. Users formulate XQuery queries over the original data, and the queries
are automatically translated and executed on the intensional knowledge.
We propose an approach to compute ELCA probabilities without generating possible worlds.
We then develop an efficient stack-based algorithm that finds all probabilistic ELCA
results and their ELCA probabilities for a given keyword query on a probabilistic XML
document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare
it with its SLCA counterpart in terms of result effectiveness, time and space efficiency,
and scalability.

Algorithm:
In this section, we introduce an algorithm, PrELCA, that turns the conceptual idea of the
previous section into procedural computation steps. We start with indexing
probabilistic XML data and then introduce the PrELCA algorithm. Finally, we discuss
why it is difficult to find effective upper bounds for ELCA probabilities, which suggests
that the PrELCA algorithm may be the only acceptable solution.
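The indexing step can be pictured with the minimal sketch below. It assumes Dewey-style node labels (each node's label is its parent's label plus its position among its siblings) and a keyword inverted index, a common setup for stack-based keyword search on XML. Whether PrELCA uses exactly this layout is an assumption, the probability information of a probabilistic document is left out, and the function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(labelled_nodes):
    """Map each keyword to the sorted Dewey labels of the nodes containing it.

    labelled_nodes: iterable of (dewey_label, text) pairs, where a Dewey
    label is a tuple of ints such as (1, 2, 3).
    """
    index = defaultdict(list)
    for dewey, text in labelled_nodes:
        for word in set(text.lower().split()):
            index[word].append(dewey)
    for postings in index.values():
        postings.sort()  # tuple order coincides with document order
    return dict(index)

def lca(a, b):
    """Lowest common ancestor of two nodes = longest common label prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)
```

With such labels, ancestor tests and lowest common ancestors reduce to prefix comparisons, so a stack holding the components of the current label can process the keyword lists in document order without touching the document tree.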

System Requirements:
Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15" VGA Colour.
• Mouse : Sony.
• RAM : 512 MB.
Software Requirements:
• Operating System : Windows 7.
• Coding Language : ASP.NET 4.0 with C#.
• Database : SQL Server 2008.
