Top-k Approach For Compact Storage Structure

Guided By,                   By,
Dr. Radha Senthilkumar,      S. Meenakshi,
Assistant Professor,         2011611009,
Department of IT             M.Tech I.T
Problem Definition
• Evaluating the tree edit distance for large XML trees is difficult.
• The best-known algorithms for this problem have cubic run time and
  quadratic space complexity, so they do not scale to large documents.
• A core problem is to prune subtrees efficiently.
Literature Survey cont…
• Nikolaus Augsten, Denilson Barbosa, Michael M. Böhlen, and Themis
  Palpanas, “Efficient Top-k Approximate Subtree Matching in Small
  Memory,” IEEE Transactions on Knowledge and Data Engineering,
  vol. 23, no. 8, August 2011.

• TASM finds the top-k approximate matches of a small query tree Q
  within a large document tree.
• It uses a prefix ring buffer that allows subtrees to be pruned efficiently.
• TASM is portable because it relies on the postorder queue structure,
  which can be implemented by any XML processing system that allows an
  efficient postorder traversal of trees.
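
The postorder queue mentioned above can be illustrated with a minimal sketch. It assumes the queue holds one (label, subtree size) pair per node, emitted children before parents; the names `Node` and `postorder_queue` are hypothetical and are not taken from the TASM implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    # Hypothetical in-memory tree node; a real system would stream from XML.
    label: str
    children: List["Node"] = field(default_factory=list)


def postorder_queue(root: Node) -> Iterator[Tuple[str, int]]:
    """Yield (label, subtree_size) pairs in postorder (children before parents)."""

    def visit(node: Node):
        size = 1  # count the node itself
        for child in node.children:
            # The inner generator returns the child's subtree size.
            size += yield from visit(child)
        yield (node.label, size)
        return size

    yield from visit(root)


# Toy document tree (assumed data, for illustration only).
doc = Node("dblp", [
    Node("article", [Node("author"), Node("title")]),
    Node("book", [Node("title")]),
])
queue = list(postorder_queue(doc))
print(queue)
```

Because each pair carries the subtree size, a consumer can tell from the queue alone how many preceding entries belong to a subtree, which is the kind of information a pruning step needs. The recursive traversal is only suitable for small trees; a streaming parser would be needed for large documents.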
Literature Survey cont…
• Jiaheng Lu, Pierre Senellart, Chunbin Lin, Xiaoyong Du, Shan
  Wang, and Xinxing Chen, “Optimal Top-k Generation of Attribute
  Combinations Based on Ranked Lists,” Proc. ACM SIGMOD Int’l
  Conf. on Management of Data, pp. 1-12, May 2012.

• The paper introduces a novel top-k query type, called the top-k,m query.
• Suppose we are given a set of groups, where each group contains a set of
  attributes, each of which is associated with a ranked list of tuples.
• All lists are ranked in decreasing order of tuple score. We want the
  top-k combinations of attributes, ranked according to their
  corresponding top-m tuples with matching IDs.
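
A brute-force sketch may help make the top-k,m definition concrete. It assumes that a combination picks one attribute per group, that a tuple's score within a combination is the sum of its scores across the chosen lists, and that a combination is scored by the sum of its top-m tuple scores. The function name `top_k_m` and the toy data are hypothetical, and this exhaustive enumeration is not the optimized algorithm of the cited paper.

```python
from itertools import product
from typing import Dict, List, Tuple

# attribute name -> {tuple ID: score}; toy data assumed for illustration
Lists = Dict[str, Dict[str, float]]


def top_k_m(groups: List[List[str]], lists: Lists, k: int, m: int
            ) -> List[Tuple[float, Tuple[str, ...]]]:
    """Return the k best attribute combinations, scored by their top-m tuples."""
    results = []
    for combo in product(*groups):  # one attribute from each group
        # Only tuple IDs that appear in every chosen list can match.
        shared = set.intersection(*(set(lists[a]) for a in combo))
        # Assumed aggregate: sum of the tuple's scores over the chosen lists.
        tuple_scores = sorted(
            (sum(lists[a][i] for a in combo) for i in shared), reverse=True)
        results.append((sum(tuple_scores[:m]), combo))
    # Highest score first; ties broken by attribute names.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:k]


groups = [["A1", "A2"], ["B1", "B2"]]
lists = {
    "A1": {"t1": 9, "t2": 8, "t3": 1},
    "A2": {"t1": 5, "t2": 4, "t3": 3},
    "B1": {"t1": 7, "t2": 6, "t3": 2},
    "B2": {"t1": 2, "t2": 1, "t3": 9},
}
best = top_k_m(groups, lists, k=2, m=2)
print(best)
```

The sketch ignores the ranked order of the lists and scans everything; the point of ranked-list algorithms is to exploit that order so that most tuples never need to be read.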
Literature Survey cont…
• K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26, no. 3,
  pp. 422-433, 1979.

• The string-to-string correction problem is to determine the
  distance between two strings, measured by the minimum-cost
  sequence of edit operations needed to transform one string into the
  other.
• For trees, three edit operations are considered: changing one node of
  a tree into another node, deleting one node from a tree, or inserting
  a node into a tree.
• For strings, the distance can be computed in time O(m·n), where m and
  n are the lengths of the two given strings.
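
The O(m·n) bound for strings can be seen in the standard dynamic-programming formulation. The sketch below assumes unit cost for each change, deletion, and insertion; the function name `edit_distance` is chosen for illustration. It covers only the string case, not the tree edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of unit-cost edits that turn string a into string b."""
    m, n = len(a), len(b)
    # prev[j] = distance between the current prefix of a and b[:j]
    prev = list(range(n + 1))  # turning "" into b[:j] takes j insertions
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # turning a[:i] into "" takes i deletions
        for j in range(1, n + 1):
            change = prev[j - 1] + (a[i - 1] != b[j - 1])  # rename or keep
            delete = prev[j] + 1                           # drop a[i-1]
            insert = curr[j - 1] + 1                       # add b[j-1]
            curr[j] = min(change, delete, insert)
        prev = curr
    return prev[n]


print(edit_distance("kitten", "sitting"))
```

The two nested loops fill an (m+1) by (n+1) table one row at a time, which gives the O(m·n) running time while keeping only two rows in memory.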
Objective
• To implement dominating queries using the Top-k Approximate
  Subtree Matching (TASM) approach.
• To evaluate the performance of dominating queries in the
  compact storage structure.
Dominating Queries
• The number of results is controllable.
• The result is scaling-invariant.
• No user-defined ranking function is required.
• Each point is assigned an intuitive score that determines
  its rank.
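
The slides do not define the score, so the following sketch assumes the common definition used for top-k dominating queries: a point's score is the number of other points it dominates, with smaller values treated as better in every dimension. The names `dominates` and `top_k_dominating` and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def top_k_dominating(points: Sequence[Point], k: int) -> List[Tuple[int, Point]]:
    """Return the k points with the highest domination counts (brute force)."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # Highest score first; ties broken by the point coordinates.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


points = [(1, 1), (2, 3), (3, 2), (4, 4), (1, 5)]
print(top_k_dominating(points, k=2))

# Rescaling one dimension by a positive factor leaves the scores unchanged.
scaled = [(x, 10 * y) for x, y in points]
print([s for s, _ in top_k_dominating(scaled, k=2)])
```

The parameter k controls the result size directly, no ranking function is supplied by the user, and rescaling a dimension does not change which points dominate which, matching the properties listed above. The pairwise comparison is quadratic in the number of points and is meant only as an illustration.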

TASM:
• TASM is the problem of ranking the k best approximate matches of
  a small query tree in a large document tree.
References
• N. Augsten, D. Barbosa, M.M. Böhlen, and T. Palpanas, “Efficient
  Top-k Approximate Subtree Matching in Small Memory,” IEEE Trans.
  Knowledge and Data Eng., vol. 23, no. 8, August 2011.
• J. Lu, P. Senellart, C. Lin, X. Du, S. Wang, and X. Chen, “Optimal
  Top-k Generation of Attribute Combinations Based on Ranked Lists,”
  Proc. ACM SIGMOD Int’l Conf. on Management of Data, pp. 1-12,
  May 2012.
• N. Augsten, M.H. Böhlen, C.E. Dyreson, and J. Gamper,
  “Approximate Joins for Data-Centric XML,” Proc. IEEE 24th
  Int’l Conf. Data Eng. (ICDE), pp. 814-823, 2008.
• K.-C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26,
  no. 3, pp. 422-433, 1979.
Timeline Chart
PHASE     REVIEW I             REVIEW II                  REVIEW III

PHASE I   Learning to work     Implement the concept      Evaluate the dominating
          with TASM            of dominating queries      queries in compact
          (July)               (August-September)         storage structure
                                                          (October and November)
Thank You
