Impulse Technologies
                                      Beacons U to World of technology
        044-42133143, 98401 03301,9841091117 ieeeprojects@yahoo.com www.impulse.net.in
      Efficient and Effective Duplicate Detection in Hierarchical Data
   Abstract
          Although there is a long line of work on identifying duplicates in relational
   data, only a few solutions focus on duplicate detection in more complex
   hierarchical structures, like XML data. In this paper, we present a novel method for
   XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to
   determine the probability of two XML elements being duplicates, considering not
   only the information within the elements, but also the way that information is
   structured. In addition, to improve the efficiency of the network evaluation, a novel
   pruning strategy, capable of significant gains over the unoptimized version of the
   algorithm, is presented. Through experiments, we show that our algorithm is able
   to achieve high precision and recall scores in several datasets. XMLDup is also
   able to outperform another state-of-the-art duplicate detection solution, both in
   terms of efficiency and of effectiveness. Finally, we also study how important the
   structure of elements is in the duplicate detection process. We observe not
   only that structure can clearly influence the outcome, but also that, by ensuring a
   structure that is adequate to the characteristics of the data, we can actually improve
   the quality of the results.
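
   The idea of scoring two XML elements from both their values and their
   structure, and of stopping early when a pair can no longer qualify, can be
   illustrated with a small sketch. This is not the XMLDup Bayesian network: the
   function names, the use of difflib string similarity in place of learned
   probabilities, and the plain averaging of factors are all assumptions made
   only for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for a probability."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(e1, e2, threshold=0.0):
    """Average the similarity of the elements' own text and of their children.

    Returns None as soon as the score can no longer reach `threshold`,
    even if every remaining factor turned out to be 1.0 (pruning).
    """
    tags = sorted({c.tag for c in e1} | {c.tag for c in e2})
    has_text = bool((e1.text or "").strip() or (e2.text or "").strip())
    total = len(tags) + (1 if has_text else 0)
    if total == 0:
        return 1.0  # two empty elements: nothing distinguishes them

    def factors():
        if has_text:
            yield text_similarity(e1.text, e2.text)
        for tag in tags:
            pairs = [(c1, c2) for c1 in e1.findall(tag) for c2 in e2.findall(tag)]
            # Best-matching pair of same-named children; 0.0 if one side lacks the tag.
            yield max((duplicate_score(c1, c2) for c1, c2 in pairs), default=0.0)

    acc = 0.0
    for done, factor in enumerate(factors(), start=1):
        acc += factor
        # Optimistic bound: assume all factors not yet evaluated equal 1.0.
        upper_bound = (acc + (total - done)) / total
        if upper_bound < threshold:
            return None  # pruned: remaining children are never compared
    return acc / total
```

   Calling duplicate_score(a, b, threshold=0.9) on two parsed elements returns
   a score when the pair can still reach the threshold and None when it is
   pruned. Because the bound is optimistic, pruning in this sketch never changes
   which pairs reach the threshold; it only skips comparisons that cannot matter.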




  Your own ideas or any project from any company can be implemented
at a better price (all projects can be done in Java or DotNet, whichever the student wants)
