MAPPING & INTEGRATING MULTIPLE FORMS INTO A DATABASE
Yuan An, Ritu Khare, Il-Yeol Song, Xiaohua Hu

Background

  Form: Patient Information
    Date:
    Patient
      Name:
      Gender:  M  F
      DOB:
    HPI:
    Vital Sign
      Height:
      Weight:
      BP:

  Database:
    PatientInformation (piId, Date, Patient, HPI, VitalSign)
    Patient (pId, Name, Gender, DOB)
    Gender (gId, options): 001 Male, 002 Female
    Vital Signs (vId, Height, Weight, BP)

  Fig. 1. Using forms as a front-end interface mapped to a back-end database is a standard way to collect data. The figure shows a scenario in the healthcare domain.

Motivation and Focus

In the quest for database usability, several DIY and WYSIWYG approaches enable non-technical users to design forms. Such approaches (e.g. FormAssembly) automatically translate forms into databases while shielding the users from technical details. Such approaches, however, neither support database evolution due to changing user requirements nor support multiple users managing a common database.

While many techniques exist to forward engineer a single form to an individual back-end database, mapping multiple forms to an existing structured database remains unexplored. This work addresses the problem of automatically mapping multiple (possibly overlapping) forms to an existing structured database.

  Form: Healthy Living Program
    Date:
    Patient
      Name:
      DOB:
    Social Activities
      Smokes:
      Alcohol:
      Hours Watching TV:
      Hours Exercise:

  Fig. 2. A new form representing a new (or evolved) user requirement.

Challenges in Mapping Forms to Databases

  How to automatically understand a user-created form and extract semantic relationships among form elements?
  How to automatically map the semantic model extracted from a form to the existing database?
  How to automatically evolve the existing database with desired properties, and what are these properties?

The FormMapper System

  Input Form
    -> Tree Extraction Component: Layered Hidden Markov Models (HMMs); Parent Child Association Rules
    -> Semantic Form Tree
    -> Form Mapping and Integration Component: Initial Correspondence Generation and Validation; Database Birthing Algorithm; Merging Algorithm
    -> Database / New DB

  Fig. 3. The FormMapper System has two components: (1) Tree Extraction, (2) Form Integration.

Desirable Characteristics of Database (w.r.t. the input form)

  Completeness
  Correctness
  Compactness
  Normalization (3NF)
  Optimization (minimize potential NULL values and the number of database elements)

Key Techniques

  Hierarchical representation of forms as form trees
  Hidden Markov Models for form information extraction
  Sophisticated matching techniques for deriving mapping correspondences between tree and database
  Form tree patterns and DB design principles to translate a form tree into an equivalent database (see Fig. 4)
  Quantitative metric (quality tuning factor) to facilitate the decision of merging (or not merging) two mapped tables

  Fig. 4. Some Form Tree to Database Mapping Patterns. [Panels: a) Textbox Pattern, b) Radiobutton Pattern, c) Checkbox Pattern, d) Category-Subcategory Pattern.]

Empirical Study in Healthcare

  Datasets
    16 highly complex data-entry forms from 3 healthcare institutions.
    Average 57 form elements per form.

  Benchmarks
    16 Gold Standard Trees prepared using a DIY form design tool.
    Two sets of 3 Gold Standard Databases prepared by 2 database experts, each with at least 10 years of experience.

  Tree Extraction Component
    Expectation Maximization Algorithm on 52 clinical forms
    Viterbi Algorithm for decoding
    5 parent child association rules
    Accuracy: 96.93%
    Duration: 0.07 sec per form

  Form Integration Component
    Indexing using Lucene
    Quality tuning factor = 0.5
    Duration: 3 sec per form

  Fig. 5. Scale of the evolved Databases. [Bar charts for Database 1, Database 2, and Database 3, comparing FormMapper, Gold 1, and Gold 2 by number of Tables, Columns, Values, and Foreign Keys; vertical axis 0 to 200.]

  Fig. 6. Comparison of Tables. [Pie charts: FormMapper vs. Gold 1 (20%, 28%, 52%) and FormMapper vs. Gold 2 (6%, 40%, 54%); segments: Perfect Match, Positive Mismatch, Negative Mismatch.]

  FormMapper vs. Gold DB
    On average, 87% of the database tables are either identical or superior (positive mismatch) to the gold database tables based on the defined database characteristics.
    Inferior cases (negative mismatch) are mostly due to missing correspondences (caused by extraction inaccuracies) and imprecisely derived cardinalities among categories and subcategories in forms.

Implications

  High potential to replace the human experts.
  As more forms are mapped, the database grows automatically in a principled manner.
  It is challenging to automate the aspects of mapping that rely on human understanding of domain semantics.

Work in Progress

  Leverage ontology and controlled vocabularies to handle semantic heterogeneity.
  More sophisticated correspondence generation and validation techniques.
  Consider more complicated merging situations (e.g. a table corresponds to a column).
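
The Tree Extraction Component lists the Viterbi algorithm for decoding its HMMs. As a rough illustration only, the sketch below runs standard Viterbi decoding on a single, flat toy HMM. The state names (`category`, `field`, `option`), the observation symbols (`label`, `textbox`, `radio`), and all probabilities are hypothetical and are not taken from the poster; the layered HMMs and the EM training step are not reproduced here.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, back = max(
                (prev[p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), back)
        best.append(column)
    # Trace the back-pointers from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model (all values are assumptions for illustration).
STATES = ["category", "field", "option"]
START_P = {"category": 0.6, "field": 0.3, "option": 0.1}
TRANS_P = {
    "category": {"category": 0.2, "field": 0.7, "option": 0.1},
    "field":    {"category": 0.2, "field": 0.5, "option": 0.3},
    "option":   {"category": 0.2, "field": 0.3, "option": 0.5},
}
EMIT_P = {
    "category": {"label": 0.8, "textbox": 0.1, "radio": 0.1},
    "field":    {"label": 0.2, "textbox": 0.7, "radio": 0.1},
    "option":   {"label": 0.1, "textbox": 0.1, "radio": 0.8},
}

decoded = viterbi(["label", "textbox", "radio", "radio"],
                  STATES, START_P, TRANS_P, EMIT_P)
print(decoded)  # ['category', 'field', 'option', 'option']
```

Working in log space avoids numeric underflow on long element sequences; it requires every probability in the toy tables to be non-zero.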




 CVDI is a collaboration between the University of Louisiana at Lafayette & Drexel University
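
The form-tree-to-database patterns named in Fig. 4 can be illustrated with a small sketch based on the Fig. 1 example. The rules here are assumptions read off that example rather than the authors' algorithm: a textbox becomes a column, a radiobutton becomes a lookup table of options referenced by a foreign key, and a subcategory becomes its own table referenced by a foreign key. The checkbox pattern is left out. The `Node` class, the `to_ddl` function, and the `_id` column naming are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                                      # "category", "textbox", or "radiobutton"
    children: list = field(default_factory=list)   # child nodes of a category
    options: list = field(default_factory=list)    # choices of a radiobutton

def to_ddl(node, statements=None):
    """Translate a category node and its subtree into SQL statements."""
    if statements is None:
        statements = []
    columns = [f'"{node.name}_id" INTEGER PRIMARY KEY']
    foreign_keys = []
    for child in node.children:
        if child.kind == "textbox":
            # Assumed textbox pattern: one column in the parent table.
            columns.append(f'"{child.name}" TEXT')
            continue
        if child.kind == "radiobutton":
            # Assumed radiobutton pattern: a lookup table holding the options.
            statements.append(
                f'CREATE TABLE "{child.name}" '
                f'("{child.name}_id" INTEGER PRIMARY KEY, "options" TEXT)')
            for i, opt in enumerate(child.options, start=1):
                escaped = opt.replace("'", "''")
                statements.append(
                    f'INSERT INTO "{child.name}" VALUES ({i}, \'{escaped}\')')
        else:
            # Assumed category-subcategory pattern: a separate child table.
            to_ddl(child, statements)
        columns.append(f'"{child.name}_id" INTEGER')
        foreign_keys.append(
            f'FOREIGN KEY ("{child.name}_id") '
            f'REFERENCES "{child.name}" ("{child.name}_id")')
    statements.append(
        f'CREATE TABLE "{node.name}" ({", ".join(columns + foreign_keys)})')
    return statements

# The Patient Information form from Fig. 1 as a form tree.
form = Node("PatientInformation", "category", children=[
    Node("Date", "textbox"),
    Node("Patient", "category", children=[
        Node("Name", "textbox"),
        Node("Gender", "radiobutton", options=["Male", "Female"]),
        Node("DOB", "textbox"),
    ]),
    Node("HPI", "textbox"),
    Node("VitalSign", "category", children=[
        Node("Height", "textbox"),
        Node("Weight", "textbox"),
        Node("BP", "textbox"),
    ]),
])

conn = sqlite3.connect(":memory:")
for stmt in to_ddl(form):
    conn.execute(stmt)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)  # ['Gender', 'Patient', 'PatientInformation', 'VitalSign']
```

The four resulting tables correspond to the four tables shown in Fig. 1. Mapping a second, overlapping form such as the one in Fig. 2 would additionally need the correspondence and merging steps, which this sketch does not attempt.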

