Base Paper Title:

      Record Matching over Query Results from Multiple Web Databases




Modified Title:

      Mining sequential patterns matching over high utility data sets.




Abstract:

      Record matching, which identifies the records that represent the same
real-world entity, is an important step for data integration. Most
state-of-the-art record matching methods are supervised, which requires the
user to provide training data. These methods are not applicable to the Web
database scenario, where the records to match are query results dynamically
generated on the fly. Such records are query-dependent, and a prelearned
method using training examples from previous query results may fail on the
results of a new query. To address the problem of record matching in the Web
database scenario, we present an unsupervised, online record matching method,
UDD, which, for a given query, can effectively identify duplicates from the
query result records of multiple Web databases. After removal of the
same-source duplicates, the “presumed” non-duplicate records from the same
source can be used as training examples, alleviating the burden of users
having to manually label training examples. Starting from the non-duplicate
set, we use two cooperating classifiers, a weighted component similarity
summing classifier and an SVM classifier, to iteratively identify duplicates
in the query results from multiple Web databases. Experimental results show
that UDD works well for the Web database scenario where existing supervised
methods do not apply.

                     Ambit lick Solutions
          Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com




Existing System:




  • Relational database systems

  • All Web databases (unknown users can easily destroy the database)




Proposed System:



  • False data can reveal the actions of unauthorized users who attempt
      to access computer systems and of authorized users who attempt to
      misuse their privileges.

  • Association rule mining

  • An algorithm based on sequential pattern mining that uses the same
      data collected by the databases.
Our Proposed Work apart from Base paper:



Sequential pattern mining

        a. Apriori-like methods (GSP)

        b. Pattern-growth methods (FreeSpan, PrefixSpan)
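
As an illustration of the pattern-growth family, here is a minimal PrefixSpan-style sketch in Python. It is an assumption-laden simplification, not the project's implementation: each sequence is a plain list of single items (no itemset elements), support is counted as a number of sequences, and the function name `prefixspan` and the toy data in the usage note are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for lists of single items.

    Simplified sketch: every element of a sequence is one item, and
    min_support is an absolute number of sequences (both assumptions).
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start_position) pairs: the
        # suffix of each sequence that follows the current prefix.
        counts = defaultdict(set)
        for seq_idx, start in projected:
            for item in sequences[seq_idx][start:]:
                counts[item].add(seq_idx)

        for item, seq_ids in sorted(counts.items()):
            if len(seq_ids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)

            # Project each suffix past the first occurrence of item.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue
                new_projected.append((seq_idx, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

For example, calling `prefixspan([["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]], 2)` grows each frequent prefix only inside the suffixes that contain it, which is what distinguishes pattern growth from the candidate generation used by Apriori-like methods such as GSP.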




Hardware Specification



Processor Type            : Pentium III

Speed                     : 1.6 GHz

RAM                       : 128 MB

Hard disk                 : 8 GB



Software Specification



Operating System          : Linux / Windows

Programming Package       : Java

Tools                     : Eclipse, Weka data mining tools

Database                  : MySQL

SDK                       : JDK 1.5.0




Algorithm:

  • Association rule mining

        o Find large itemsets for a given minimum support (minsup), and

        o Compute rules for a given minimum confidence (minconf) based on
             the itemsets obtained before.

  • Sequential pattern mining

  • UDD algorithm

  • Component weight assignment algorithm
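
The two association rule mining steps listed above can be sketched as follows. This is a small, unoptimized illustration under stated assumptions: `minsup` and `minconf` are taken as fractions between 0 and 1, transactions are sets of item names, and the function names `frequent_itemsets` and `rules` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Step 1: level-wise (Apriori-style) search for large itemsets.

    minsup is assumed to be a fraction of the number of transactions.
    Returns {frozenset: support}.
    """
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    support = {}
    candidates = {frozenset([item]) for t in tx for item in t}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            s = sum(1 for t in tx if c <= t) / n
            if s >= minsup:
                level[c] = s
        support.update(level)
        k += 1
        # Join step: unions of frequent sets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return support


def rules(support, minconf):
    """Step 2: derive rules lhs -> rhs with confidence >= minconf."""
    out = []
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support was recorded at an earlier level.
                conf = s / support[lhs]
                if conf >= minconf:
                    out.append((lhs, itemset - lhs, conf))
    return out
```

With the toy transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}`, a minsup of 0.5 and a minconf of 0.9, the only rule that survives is butter -> bread, because every transaction containing butter also contains bread.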




Modules:



  1. Analysis and design of data sets/items

  2. Data preprocessing

  3. Sequential pattern mining

  4. Record matching with Web databases

  5. Performance analysis
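
For the record matching module, the abstract names a weighted component similarity summing classifier. A rough sketch of that idea is given below. Everything specific in it is an assumption made for illustration: the field names, the hand-picked weights (which are not the output of the component weight assignment algorithm), the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the per-field similarity measure.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Similarity of two field values in [0, 1] (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights):
    """Weighted sum of per-field similarities.

    weights maps field name -> weight; the weights are assumed to sum
    to 1 so that the result also lies in [0, 1].
    """
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())


def is_duplicate(r1, r2, weights, threshold=0.85):
    """Flag a pair as duplicate when the weighted sum reaches the threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Hypothetical records returned by two Web databases for the same query.
weights = {"title": 0.6, "author": 0.4}
rec_a = {"title": "Data Mining", "author": "Han"}
rec_b = {"title": "data mining", "author": "HAN"}
rec_c = {"title": "Operating Systems", "author": "Tanenbaum"}

print(is_duplicate(rec_a, rec_b, weights))
print(is_duplicate(rec_a, rec_c, weights))
```

In the UDD method described in the abstract, a classifier of this kind cooperates with an SVM classifier over several iterations; the sketch covers only the similarity sum for a single pair.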




