Base Paper Title:

      Record Matching over Query Results from Multiple Web Databases

Modified Title:

      Mining sequential patterns matching over high utility data sets.




Abstract:

      Record matching, which identifies the records that represent the same
real-world entity, is an important step for data integration. Most
state-of-the-art record matching methods are supervised, which require the
user to provide training data. These methods are not applicable to the Web
database scenario, where the records to match are query results dynamically
generated on the fly. Such records are query-dependent, and a prelearned
method using training examples from previous query results may fail on the
results of a new query. To address the problem of record matching in the
Web database scenario, we present an unsupervised, online record matching
method, UDD, which, for a given query, can effectively identify duplicates
from the query result records of multiple Web databases. After removal of
the same-source duplicates, the "presumed" nonduplicate records from the
same source can be used as training examples, alleviating the burden of
users having to manually label training examples. Starting from the
nonduplicate set, we use two cooperating classifiers, a weighted component
similarity summing classifier and an SVM classifier, to iteratively
identify duplicates in the query results from multiple Web databases.
Experimental results show that UDD works well for the Web database scenario
where existing supervised methods do not apply.

                     Ambit lick Solutions
          Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com
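
The weighted component similarity summing classifier named in the abstract can be illustrated with a minimal sketch. It assumes each record is an array of field values, that field similarity is token-level Jaccard overlap, and that the weights and threshold are supplied by the caller. The class name WeightedSimilaritySketch, the field measure, and the threshold are all hypothetical choices made for illustration; they are not taken from the UDD paper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a weighted component similarity sum; not the UDD implementation. */
final class WeightedSimilaritySketch {

    /** Token-level Jaccard similarity between two field values (an assumed field measure). */
    static double fieldSimilarity(String a, String b) {
        Set<String> tokensA = new HashSet<String>(Arrays.asList(a.toLowerCase().trim().split("\\s+")));
        Set<String> tokensB = new HashSet<String>(Arrays.asList(b.toLowerCase().trim().split("\\s+")));
        Set<String> union = new HashSet<String>(tokensA);
        union.addAll(tokensB);
        tokensA.retainAll(tokensB); // tokensA now holds the intersection
        return (double) tokensA.size() / union.size();
    }

    /** Weighted sum of per-field similarities; the weights are expected to sum to 1. */
    static double recordSimilarity(String[] r1, String[] r2, double[] weights) {
        if (r1.length != r2.length || r1.length != weights.length) {
            throw new IllegalArgumentException("records and weights must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * fieldSimilarity(r1[i], r2[i]);
        }
        return sum;
    }

    /** Flags a pair as a duplicate when its weighted similarity reaches the threshold. */
    static boolean isDuplicate(String[] r1, String[] r2, double[] weights, double threshold) {
        return recordSimilarity(r1, r2, weights) >= threshold;
    }
}
```

With weights {0.75, 0.25}, the records {"Data Mining Concepts", "Han"} and {"Data Mining", "Kamber"} score 0.75 * 2/3 + 0.25 * 0 = 0.5, so a threshold of 0.8 would treat them as nonduplicates.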




Existing System:

  • Relational database systems

  • All Web databases (unknown users can easily damage the database)




Proposed System:



  • False data can be used to detect when unauthorized users attempt to
      access computer systems or authorized users attempt to misuse their
      privileges.

  • Association rule mining

  • An algorithm based on sequential pattern mining using the same data
      collected by the databases.


Our Proposed Work apart from Base paper:



Sequential pattern mining

        a. Apriori-like methods (GSP)

        b. Pattern-growth methods (FreeSpan, PrefixSpan)
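
To make the pattern-growth idea concrete, the following is a simplified PrefixSpan-style sketch. It assumes every sequence element is a single item rather than an item set, and that support is an absolute count of sequences. The class name PrefixSpanSketch and its method names are hypothetical; this is an illustration of prefix projection, not the project's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified PrefixSpan sketch: each sequence element is a single item, not an item set. */
final class PrefixSpanSketch {

    /** Returns every sequential pattern whose support count is at least minSupport. */
    static Map<List<String>, Integer> mine(List<List<String>> db, int minSupport) {
        Map<List<String>, Integer> result = new LinkedHashMap<List<String>, Integer>();
        grow(new ArrayList<String>(), db, minSupport, result);
        return result;
    }

    private static void grow(List<String> prefix, List<List<String>> projected,
                             int minSupport, Map<List<String>, Integer> result) {
        // Count, for each item, how many projected sequences contain it at least once.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (List<String> seq : projected) {
            for (String item : new LinkedHashSet<String>(seq)) {
                Integer c = counts.get(item);
                counts.put(item, c == null ? 1 : c + 1);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minSupport) {
                continue; // infrequent items cannot extend the prefix
            }
            List<String> extended = new ArrayList<String>(prefix);
            extended.add(e.getKey());
            result.put(extended, e.getValue());
            // Project: keep the suffix after the first occurrence of the item.
            List<List<String>> next = new ArrayList<List<String>>();
            for (List<String> seq : projected) {
                int pos = seq.indexOf(e.getKey());
                if (pos >= 0 && pos + 1 < seq.size()) {
                    next.add(new ArrayList<String>(seq.subList(pos + 1, seq.size())));
                }
            }
            grow(extended, next, minSupport, result);
        }
    }
}
```

For the four sequences (a, b, c), (a, c), (b, c), (a, b) and a minimum support of 2, the sketch reports six patterns: a, b, and c with support 3, and (a, b), (a, c), and (b, c) with support 2. An Apriori-like method such as GSP would instead generate candidate sequences level by level and rescan the database to count them.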




Hardware Specification



Processor Type           : Pentium III

Speed                    : 1.6 GHz

RAM                      : 128 MB

Hard Disk                : 8 GB



Software Specification



Operating System         : Linux / Windows

Programming Package      : Java

Tools                    : Eclipse, Weka data mining tools

Database                 : MySQL

SDK                      : JDK 1.5.0




Algorithm:

  • Association rule mining

        o Find large item sets for a given minsup, and

        o Compute rules for a given minconf based on the item sets
          obtained before.

  • Sequential pattern mining

  • UDD algorithm

  • Component weight assignment algorithm
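
The two association rule mining steps listed above can be sketched as follows. The sketch assumes transactions are sets of item names, that minsup and minconf are fractions between 0 and 1, and that the transaction list is non-empty. To stay short it only derives rules between single items. The class name AssociationRuleSketch and the method pairRules are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the two association-rule steps, limited to rules between single items. */
final class AssociationRuleSketch {

    /** Fraction of transactions that contain every item in the given item set. */
    static double support(List<Set<String>> transactions, Set<String> itemSet) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemSet)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    /** Returns rules "x -> y" with support({x, y}) >= minSup and confidence >= minConf. */
    static List<String> pairRules(List<Set<String>> transactions, double minSup, double minConf) {
        // Step 1: find the large 1-item sets.
        Set<String> items = new TreeSet<String>();
        for (Set<String> t : transactions) {
            items.addAll(t);
        }
        List<String> frequent = new ArrayList<String>();
        for (String item : items) {
            if (support(transactions, Collections.singleton(item)) >= minSup) {
                frequent.add(item);
            }
        }
        // Step 2: keep large 2-item sets and compute confidence = sup({x, y}) / sup({x}).
        List<String> rules = new ArrayList<String>();
        for (String x : frequent) {
            for (String y : frequent) {
                if (x.equals(y)) {
                    continue;
                }
                double pairSup = support(transactions, new HashSet<String>(Arrays.asList(x, y)));
                if (pairSup < minSup) {
                    continue;
                }
                double confidence = pairSup / support(transactions, Collections.singleton(x));
                if (confidence >= minConf) {
                    rules.add(x + " -> " + y);
                }
            }
        }
        return rules;
    }
}
```

For the transactions {bread, milk}, {bread, butter}, {bread, milk, butter}, {milk} with minsup 0.5 and minconf 0.7, the only rule that survives is "butter -> bread": both items appear together in half of the transactions, and every transaction with butter also contains bread.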




Modules:



  1. Analysis and design of data sets/items

  2. Data preprocessing

  3. Sequential pattern mining

  4. Record matching with Web databases

  5. Performance analysis



