Importing and Using Diverse Schemas and Data with the TopBraid Suite
The enterprise data integration problem

   Data sources: XML, RDB, Spreadsheet

   Questions that span these sources:
      How does government spending in certain sectors relate to my
      company’s earnings?
      How does the historic spending relate to the current figures?
      Give me a report about all of my customers across the whole
      organization
 © Copyright 2007-2009 TopQuadrant Inc.                       Slide 2
Merging data with RDF
“Rote” syntactic transformation of each source (XML, RDB, Spreadsheet) into RDF
(the mathematically simplest way to denote linked data)
Once in RDF:
   Merges happen as part of the infrastructure
   Concepts can be mapped to one another
        For example, to say that one notion of “Customer” is more
        general than another
   Without needing to reference the syntactic type of the source!
Mapping is also captured in RDF
Data transformations (on merged data) make no reference to the syntax
of the source – they can be written in a single language (SPARQL)
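A minimal sketch of the idea, assuming triples are modelled as plain Python tuples and a graph as a set of them. All URIs and data values below are hypothetical, and this is not TopBraid's implementation. It shows that a merge is a set union and that the "more general Customer" mapping is one more triple in the same graph.

```python
# Triples are (subject, predicate, object) tuples; a graph is a set of them.
# All URIs and values are hypothetical examples, not TopBraid output.
SALES = "http://example.org/sales#"
SUPPORT = "http://example.org/support#"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Rote imports of two different sources (say, a database and a spreadsheet).
sales_graph = {
    (SALES + "c1", RDF_TYPE, SALES + "Customer"),
    (SALES + "c1", SALES + "name", "Acme Corp"),
}
support_graph = {
    (SUPPORT + "row7", RDF_TYPE, SUPPORT + "PremiumCustomer"),
    (SUPPORT + "row7", SUPPORT + "contact", "Jo Smith"),
}

# The mapping is itself just more triples: one notion of "Customer"
# is declared more general than another.
mapping_graph = {
    (SUPPORT + "PremiumCustomer", SUBCLASS_OF, SALES + "Customer"),
}

# Merging is set union; no source-specific logic is involved.
merged = sales_graph | support_graph | mapping_graph


def instances_of(graph, cls):
    """Return subjects typed as cls or as a direct subclass of cls."""
    classes = {cls} | {s for (s, p, o) in graph if p == SUBCLASS_OF and o == cls}
    return {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}


all_customers = instances_of(merged, SALES + "Customer")
```

After the merge, asking for all customers returns instances from both sources, and the lookup never inspects where a triple came from.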
Semantic Mappings




Benefits of separating Syntactic details from Semantic mapping - 1
  Rote import provides a name (URI) for every
   entity in every data source, so that they can be
   referenced
         It is easier to discuss how "my" use of the word
             "Customer" relates to "your" use than to agree who
             gets to define "Customer"
  By translating into a simple, common language, all
       mappings and transforms can be of the same
       form (i.e., SPARQL).
         In contrast to several transforms for each pair of
             languages
Benefits of separating Syntactic details from Semantic mapping - 2
  Each new kind of source only needs one new
   importer
         In contrast to needing one for each old syntax

  Import modules don’t need to implement the
       merge functionality
         The underlying data representation supports merge as
          a primitive operation
         No need to worry about a number of information
          types and when they can be merged; there is just one.
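The scaling argument can be made concrete with a little arithmetic. This sketch assumes one importer per source kind in the RDF approach, and one directed converter per ordered pair of formats in the pairwise approach. The function names are illustrative only.

```python
def importers_needed(source_kinds):
    """With a common target (RDF), each kind of source needs one importer."""
    return source_kinds


def pairwise_converters_needed(source_kinds):
    """Without a common target: one directed converter per ordered pair."""
    return source_kinds * (source_kinds - 1)


def extra_work_for_new_source(existing_kinds):
    """New pieces to write when one more kind of source is added."""
    rdf_approach = importers_needed(existing_kinds + 1) - importers_needed(existing_kinds)
    pairwise_approach = (pairwise_converters_needed(existing_kinds + 1)
                         - pairwise_converters_needed(existing_kinds))
    return rdf_approach, pairwise_approach
```

Under these assumptions, adding a fourth source kind to XML, RDB and Spreadsheet costs one importer in the RDF approach and six new converters in the pairwise approach.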



TopBraid Suite’s Implementation - 1
  Built-in default converters transform information
       from a variety of sources into RDF (rote import):
         Arbitrary XML, XML Schema, Spreadsheets, Databases,
          etc.
         Depending on the complexity, conversion logic is
          either encoded in an ontology or in a Java module
         If round-tripping is supported, all information from the
          original is preserved (sometimes in annotations)




TopBraid Suite’s Implementation - 2
  Once in RDF, SPARQL is used to transform and
       map as needed
         Imported RDF “as-is” may not be what a particular
          application requires
         Transformation steps are represented using mapping
          ontologies and/or SPIN (http://www.topquadrant.com/spin/)
          rules/templates
         The entire transformation process is saved as a
          SPARQLMotion (http://www.topquadrant.com/sparqlmotion/)
          script for repeated execution

Semantic XML




Built-in Converter Example: Semantic XML

 • Select an XML file and open it in TopBraid
     Composer (you may need to right-click on a file
     and select Open With > TopBraid)
     Each element name becomes a class
     Each attribute becomes a datatype property
     Nesting is mapped into a dedicated object property
         (composite:child)

     (we are using a simple file describing people and jobs)
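The three conversion rules can be sketched with Python's standard library. This is a rough illustration only, not the Semantic XML importer itself: the sample document, the node identifiers, and the assumption that composite:child points from parent to child are all hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical sample in the spirit of a "people and jobs" file.
SAMPLE = """<people>
  <person name="Ann"><job title="Engineer"/></person>
  <person name="Bob"><job title="Analyst"/></person>
</people>"""


def xml_to_triples(xml_text):
    """Rote conversion: element -> typed instance, attribute -> literal
    property, nesting -> composite:child link (assumed parent -> child)."""
    triples = []
    ids = count(1)

    def visit(element, parent_id):
        node_id = f"_:{element.tag}_{next(ids)}"
        # The element name acts as the class of this instance.
        triples.append((node_id, "rdf:type", element.tag))
        # Each attribute becomes a datatype property with a literal value.
        for attr, value in element.attrib.items():
            triples.append((node_id, attr, value))
        # Nesting is recorded with a dedicated object property.
        if parent_id is not None:
            triples.append((parent_id, "composite:child", node_id))
        for child in element:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    return triples


triples = xml_to_triples(SAMPLE)
classes = {o for (s, p, o) in triples if p == "rdf:type"}
```

The sketch ignores element text content, namespaces and round-tripping details, which a real importer would need to handle.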


Built-in Converter Example: Semantic XML


                                         Converted
                                         to RDF




Built-in Converter Example: Semantic XML




  Each element becomes a class with instances for each occurrence of
  the element in the document
  composite:child property captures the hierarchical nesting in the XML
  document
Semantic Tables




Built-in Converter Example: Semantic Tables *
 • Select an Excel file and simply open it in TopBraid
     Composer
     Each sheet becomes a class
     Columns become datatype properties
     Rows become instances
     Cells will be converted into triples, where the subject is
         the row instance, the predicate is the column property,
         and the object is a literal with the value of the cell

*Assumes that the spreadsheet is structured as a table. Not all spreadsheets
are designed this way. To support different design patterns TopBraid Suite
offers more than one spreadsheet importer.
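The table-shaped case can be sketched as follows, assuming the sheet has been exported as CSV with column names in the first row. The sheet name, the data and the row-naming scheme are hypothetical, and skipping empty cells is an assumption of this sketch rather than documented importer behaviour.

```python
import csv
import io

# Hypothetical sheet "Company" exported as CSV; the first row holds column names.
SHEET_NAME = "Company"
SHEET_CSV = """name,sector,employees
Acme,Manufacturing,120
Globex,Energy,300
"""


def table_to_triples(sheet_name, csv_text):
    """Sheet -> class, column -> property, row -> instance, cell -> triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{sheet_name}_{row_number}"   # one instance per row
        triples.append((subject, "rdf:type", sheet_name))
        for column, cell in row.items():
            if cell:                             # assumption: empty cells yield no triple
                # subject = row instance, predicate = column, object = cell literal
                triples.append((subject, column, cell))
    return triples


table_triples = table_to_triples(SHEET_NAME, SHEET_CSV)
```

Each row contributes one type triple plus one triple per non-empty cell, so the two sample rows produce eight triples.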

Built-in Converter Example: Semantic Tables


                                              Converted
                                              to RDF




Other default importers
    Relational Databases
           Uses a simple mapping of tables to classes, and of columns and
            foreign keys to properties
    XML profiles
           Extends Semantic XML with pre-built profiles such as one for
            XHTML
    XML Schema
           Complex logic provided in a specialized Java module
    UML, RDFa, RSS, e-Mail, additional spreadsheet
     importers, …
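For the relational case, the table/column/foreign-key mapping might look like this sketch, which uses an in-memory SQLite database from the standard library. The schema, the data, the naming of instances, and the assumption that every table has a primary key column called id are hypothetical. This is not the TopBraid database importer.

```python
import sqlite3

# Hypothetical two-table schema with one foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         company_id INTEGER REFERENCES company(id));
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO person VALUES (10, 'Ann', 1);
""")


def rows_to_triples(conn, table):
    """Table -> class, plain column -> datatype property,
    foreign-key column -> property pointing at another instance."""
    # PRAGMA foreign_key_list rows: (id, seq, target_table, from_column, ...)
    fks = {row[3]: row[2]
           for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for record in cursor:
        values = dict(zip(columns, record))
        subject = f"{table}_{values['id']}"      # assumes a primary key named id
        triples.append((subject, "rdf:type", table))
        for column, value in values.items():
            if value is None or column == "id":
                continue
            if column in fks:
                # Foreign key: link to the instance of the referenced table.
                triples.append((subject, column, f"{fks[column]}_{value}"))
            else:
                triples.append((subject, column, value))
    return triples


db_triples = rows_to_triples(conn, "company") + rows_to_triples(conn, "person")
```

The foreign key becomes a link between two instances rather than a literal, which is what lets later queries traverse from a person to their company.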


Merging Data




Next Steps
   RDF converted from the XML file and RDF from the
    spreadsheet can now be merged:
       Open one, switch to Import tab, drag and drop the second one
        or
       Create a mapping/aggregation file and import both, XML and
        spreadsheet
   Creating connections
       Conceptually, the XML and Excel examples are linked:
         • XML lists different people including their jobs and organizations they
           work for
         • Excel has company information organized by industry sectors
       But there are no connections in the raw data
       SPARQL queries (CONSTRUCT) including query templates (to
        generalize query patterns) can be used to establish connections
             • Mappings are recorded in the mapping ontologies and scripts for repeat
               execution
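To illustrate what establishing a connection means, here is a plain-Python sketch of the effect a CONSTRUCT query could have on the merged data. The triples, the property names (organization, name, worksFor) and the matching rule (equal organization and company names) are hypothetical. A real mapping would be written in SPARQL against the actual imported vocabulary.

```python
# Hypothetical merged graph: people came from the XML file,
# companies from the spreadsheet. Nothing links them yet.
merged = {
    ("person_1", "rdf:type", "person"),
    ("person_1", "name", "Ann"),
    ("person_1", "organization", "Acme"),
    ("person_2", "rdf:type", "person"),
    ("person_2", "name", "Bob"),
    ("person_2", "organization", "Initech"),
    ("Company_1", "rdf:type", "Company"),
    ("Company_1", "name", "Acme"),
    ("Company_1", "sector", "Manufacturing"),
}


# Roughly the effect of this query (property names are hypothetical):
#   CONSTRUCT { ?person ex:worksFor ?company }
#   WHERE { ?person ex:organization ?orgName .
#           ?company a ex:Company ; ex:name ?orgName }
def construct_works_for(graph):
    companies = {s for (s, p, o) in graph if p == "rdf:type" and o == "Company"}
    companies_by_name = {o: s for (s, p, o) in graph
                         if p == "name" and s in companies}
    return {(s, "worksFor", companies_by_name[o])
            for (s, p, o) in graph
            if p == "organization" and o in companies_by_name}


links = construct_works_for(merged)
connected = merged | links   # the new triples are merged like any others
```

Only Ann is linked, because no company row matches "Initech". The constructed triples are ordinary data and can be merged back into the graph.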
Scripting Data Transformations




Step by Step Example

 • Extract and convert data from a real XML file
 • Publish the result as a web page
 • Combine SPARQLMotion, Web Service, Semantic
   XML, and XSD to accomplish the result
 • Step-by-step instructions are provided; they require
   TopBraid Composer Maestro Edition
 • Also requires some familiarity with SPARQL and
   SPARQLMotion
        Recommended first step is to go through the
            SPARQLMotion tutorial and examples at:
            http://www.topquadrant.com/sparqlmotion/
Open XML file
     We will use an XML file from the US Federal Government
      about the FEA (Federal Enterprise Architecture). Download from:
              http://www.whitehouse.gov/omb/assets/fea_docs/FEA_XML_Doc_Rev_2_3.xml

     Open it with
      Semantic XML




Explore converted RDF

    There are 42 BusinessLines in this XML file.
    Each one has a Name, Definition, and SubFunction detail.
    Click on one and explore in the graph view




Extract some information using SPARQL

    Looking at the graph, write a SPARQL query that will determine
     the name of the business line and the BusinessLineID

    Check that the
     business line
     in the graph
     appears in the
     solution




Extract correlated information with SPARQL
    Extend your query to find the corresponding
     BusinessLineDefinitionText.
    Display just the names and descriptions of the business lines.
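The shape of such a query can be illustrated in plain Python as a join on the business-line node. The triples below are hypothetical placeholders: BusinessLineID and BusinessLineDefinitionText are the names mentioned above, BusinessLineName is an assumed property name, and the real converted graph may nest these values under child nodes instead.

```python
# Hypothetical converted triples; real property names and structure
# depend on how the XML file was converted.
graph = [
    ("bl_1", "rdf:type", "BusinessLine"),
    ("bl_1", "BusinessLineID", "ID-A"),
    ("bl_1", "BusinessLineName", "Example Line A"),
    ("bl_1", "BusinessLineDefinitionText", "Definition of line A"),
    ("bl_2", "rdf:type", "BusinessLine"),
    ("bl_2", "BusinessLineID", "ID-B"),
    ("bl_2", "BusinessLineName", "Example Line B"),
    ("bl_2", "BusinessLineDefinitionText", "Definition of line B"),
    ("bl_3", "rdf:type", "BusinessLine"),
    ("bl_3", "BusinessLineName", "Example Line C"),  # no definition recorded
]


def select_names_and_definitions(graph):
    """Join on the business-line subject, like the graph pattern
    ?bl a BusinessLine ; BusinessLineName ?name ;
        BusinessLineDefinitionText ?definition
    and project only ?name and ?definition."""
    def values(predicate):
        return {s: o for (s, p, o) in graph if p == predicate}

    lines = [s for (s, p, o) in graph if p == "rdf:type" and o == "BusinessLine"]
    names = values("BusinessLineName")
    definitions = values("BusinessLineDefinitionText")
    # A line missing either value does not match the pattern.
    return [(names[s], definitions[s])
            for s in lines if s in names and s in definitions]


rows = select_names_and_definitions(graph)
```

The third business line drops out because it has no definition, which mirrors how a required triple pattern behaves in SPARQL. An OPTIONAL clause would keep it.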




Save this query in a
safe place – we’ll use
it later




Shortcut: SPARQL by EXAMPLE - 1
   Complete queries (or, for more complex queries, a starting point)
    can be generated directly from a graph
   We call this generation capability “SPARQL by Example” – it saves a
    lot of tedious work and helps to prevent mistakes
   To get started, display the graph pattern for a single business line
                                                   Click to “pin down”
                                                   all the classes in the
                                                   diagram; the rest
                                                   will be treated as
                                                   variables
                                                Click on the star icon to
                                                generate a query
                                                Run it in the usual way

                                                Looks good, but we are not
                                                getting the text fields
Shortcut: SPARQL by EXAMPLE - 2
Click to “pin down” text fields so that they are included in the query
                                                       We get one result
                                                       But we need names and
                                                       descriptions for all business
                                                       lines, not just the one we
                                                       pinned down!


                                                       Modify the query by hand to
                                                       turn the name and
                                                       description into variables
                                                       and to include only these
                                                       variables in the SELECT list




Encode the process in SPARQLMotion

    Create a new SPARQLMotion file.
           Click “Yes”, it will declare web services.
    Create a new SPARQLMotion script.
    Start with a CreateSpreadsheet;
     call it findLOB




Encode the process in SPARQLMotion
    Bring the XML file into the SPARQLMotion script by dragging it
     onto the canvas.
    This automatically makes an SXML import module.




Encode the process in SPARQLMotion

    Connect these two modules together




Encode the process in SPARQLMotion

    Add your query to findLOB (double-click to edit)




Encode the process in SPARQLMotion
    Add a “ModifyPrefixes” module to specify the namespace for the
     query you just pasted.
    Connect it with a “next” link to the findLOB module
    Copy and paste the base URI of the XML file with a space before
     and a # after




Test the script
    Run the whole script with the debug button
      select the last step

 • Results appear in the Console tab
 • Results are in tab-delimited form




Exposing Results with Web Services




Serve as a web page
 • Add a Return Text module to the script. Call it showLOB.
 • Make it the last module, right after findLOB




View as a web page
    Point your browser to:
http://localhost:8083/tbl/actions?action=sparqlmotion&id=showLOB
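The same address can also be called from a program instead of a browser. A minimal sketch, assuming the server is running locally on port 8083 as in the URL above and returns UTF-8 text. The helper names are illustrative, and fetch_script_output is defined but not called here because it needs the live server.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def script_url(script_id, host="localhost:8083"):
    """Build the URL that invokes a SPARQLMotion script by its id."""
    query = urlencode({"action": "sparqlmotion", "id": script_id})
    return urlunsplit(("http", host, "/tbl/actions", query, ""))


def fetch_script_output(script_id):
    """Call the script and return its text output (requires a running server;
    UTF-8 decoding is an assumption of this sketch)."""
    with urlopen(script_url(script_id)) as response:
        return response.read().decode("utf-8")


show_lob_url = script_url("showLOB")
```

Only the id parameter changes between scripts, so any other script exposed in the same way can be addressed by passing its name to script_url.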




Extend the script to create an HTML file
    These first two modules can be re-used from the initial script
    xhtml.owl can be found in your TBC folder, just drag and drop it
    ApplyConstruct: see the Copy and Paste file for details
    ConvertRDFtoXML: no configuration needed
    ReturnXML: Mimetype text/html
Viewing in a Web Browser
http://localhost:8083/tbl/actions?action=sparqlmotion&id=tabulateLOB




To Learn More

 • Attend one of TopQuadrant’s Semantic Web
   Technology Trainings:
     Semantic Web Technology & Introduction to TopBraid Suite
     TopBraid Suite Advanced Product Training Series

 • For scheduled dates, locations and other
   information, visit:
     http://www.topquadrant.com/training/training_overview.html

 • Private, on-site trainings are also available
     Call (703) 299-9330 or write to trainings@topquadrant.com.

Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Data Transformation using Semantic Web Standards

  • 1. Importing and Using diverse Schemas and Data with the TopBraid Suite
  • 2. The enterprise data integration problem. Sources: XML, RDB, Spreadsheet. Typical questions: “How does government spending in certain sectors relate to my company’s earnings?” “How does the historic spending relate to the current figures?” “Give me a report about all of my customers across the whole organization.” © Copyright 2007-2009 TopQuadrant Inc.
  • 3. Merging data with RDF. XML, RDB, and spreadsheet sources each go through a “rote” syntactic transformation into RDF (the mathematically simplest way to denote linked data). Once in RDF: merges happen as part of the infrastructure; concepts can be mapped to one another (for example, to say that one notion of “Customer” is more general than another) without needing to reference the syntactic type of the source. The mapping is also captured in RDF. Data transformations (on merged data) make no reference to the syntax of the source, so they can be written in a single language (SPARQL).
  • 4. Semantic Mappings
  • 5. Benefits of separating syntactic details from semantic mapping - 1. Rote import provides a name (URI) for every entity in every data source, so that each can be referenced. It is easier to discuss how “my” use of the word “Customer” relates to “your” use than to agree on who gets to define “Customer”. By translating into a simple, common language, all mappings and transforms can take the same form (i.e., SPARQL), in contrast to several transforms for each pair of languages.
  • 6. Benefits of separating syntactic details from semantic mapping - 2. Each new kind of source needs only one new importer, in contrast to needing one for each old syntax. Import modules don’t need to implement the merge functionality: the underlying data representation supports merge as a primitive operation. There is no need to worry about a number of information types and when they can be merged; there is just one.
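
To make the "merge as a primitive operation" point concrete, here is a minimal sketch in plain Python rather than in TopBraid. It assumes triples are modelled as (subject, predicate, object) tuples; every URI, the sample data, and the helper names `merge` and `instances_of` are hypothetical and not taken from the slides.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# All URIs and data below are hypothetical placeholders.
XML_SRC = "http://example.org/xml#"
XLS_SRC = "http://example.org/sheet#"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Rote import of an XML source.
xml_graph = {
    (XML_SRC + "c1", RDF_TYPE, XML_SRC + "Customer"),
    (XML_SRC + "c1", XML_SRC + "name", "Acme"),
}

# Rote import of a spreadsheet source.
sheet_graph = {
    (XLS_SRC + "row7", RDF_TYPE, XLS_SRC + "KeyCustomer"),
    (XLS_SRC + "row7", XLS_SRC + "name", "Globex"),
}

# The mapping is itself just more triples: one notion of "Customer"
# is declared more general than the other.
mapping_graph = {
    (XLS_SRC + "KeyCustomer", SUBCLASS_OF, XML_SRC + "Customer"),
}


def merge(*graphs):
    """Merging graphs is a set union; no source-specific logic is needed."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def instances_of(graph, cls):
    """Instances of cls, following one level of rdfs:subClassOf."""
    classes = {cls} | {s for (s, p, o) in graph if p == SUBCLASS_OF and o == cls}
    return {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}


merged = merge(xml_graph, sheet_graph, mapping_graph)
customers = instances_of(merged, XML_SRC + "Customer")
```

Neither importer knows about the other, and the query over the merged set never mentions XML or spreadsheets; only the single mapping triple relates the two vocabularies.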
  • 7. TopBraid Suite’s Implementation - 1. Built-in default converters transform information from a variety of sources into RDF (rote import): arbitrary XML, XML Schema, spreadsheets, databases, etc. Depending on the complexity, conversion logic is encoded either in an ontology or in a Java module. If round-tripping is supported, all information from the original is preserved (sometimes in annotations).
  • 8. TopBraid Suite’s Implementation - 2. Once in RDF, SPARQL is used to transform and map as needed; imported RDF “as-is” may not be what a particular application requires. Transformation steps are represented using mapping ontologies and/or SPIN (http://www.topquadrant.com/spin/) rules/templates. The entire transformation process is saved as a SPARQLMotion (http://www.topquadrant.com/sparqlmotion/) script for repeated execution.
  • 9. Semantic XML
  • 10. Built-in Converter Example: Semantic XML. Select an XML file and open it in TopBraid Composer (you may need to right-click the file and select Open With > TopBraid). Each element name becomes a class. Each attribute becomes a datatype property. Nesting is mapped into a dedicated object property (composite:child). (We are using a simple file describing people and jobs.)
  • 11. Built-in Converter Example: Semantic XML. Converted to RDF.
  • 12. Built-in Converter Example: Semantic XML. Each element becomes a class, with an instance for each occurrence of the element in the document. The composite:child property captures the hierarchical nesting in the XML document.
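
The conversion rules on these slides (element name to class, attribute to datatype property, nesting to composite:child) can be sketched with the Python standard library. This is not TopBraid's converter: the base URI, the sample document, the instance-naming scheme (`tag_N`), and the function `xml_to_triples` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/people#"  # hypothetical base URI
CHILD = "composite:child"            # property name as given on the slides

# Hypothetical stand-in for the "people and jobs" file.
SAMPLE = """<people>
  <person name="Ann">
    <job title="Engineer"/>
  </person>
  <person name="Bob"/>
</people>"""


def xml_to_triples(xml_text, base=BASE):
    """Rote conversion of an XML document into (s, p, o) triples."""
    triples = []
    counter = {}  # per-tag counter used to mint instance URIs (assumed scheme)

    def visit(elem, parent_uri=None):
        counter[elem.tag] = counter.get(elem.tag, 0) + 1
        uri = f"{base}{elem.tag}_{counter[elem.tag]}"
        # Each element name becomes a class; each occurrence an instance.
        triples.append((uri, "rdf:type", base + elem.tag))
        # Each attribute becomes a datatype property with a literal value.
        for attr, value in elem.attrib.items():
            triples.append((uri, base + attr, value))
        # Nesting is recorded with the dedicated child property.
        if parent_uri is not None:
            triples.append((parent_uri, CHILD, uri))
        for child in elem:
            visit(child, uri)

    visit(ET.fromstring(xml_text))
    return triples


triples = xml_to_triples(SAMPLE)
```

The sketch ignores text content, namespaces, and ordering, all of which a real importer would have to handle.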
  • 13. Semantic Tables
  • 14. Built-in Converter Example: Semantic Tables*. Select an Excel file and simply open it in TopBraid Composer. Each sheet becomes a class. Columns become datatype properties. Rows become instances. Cells are converted into triples, where the subject is the row instance, the predicate is the column property, and the object is a literal with the value of the cell. *Assumes that the spreadsheet is structured as a table. Not all spreadsheets are designed this way; to support different design patterns, TopBraid Suite offers more than one spreadsheet importer.
  • 15. Built-in Converter Example: Semantic Tables. Converted to RDF.
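
The sheet-to-class, column-to-property, row-to-instance rule can be sketched the same way. Because the Python standard library cannot read Excel files, the sketch below uses CSV text as a stand-in for one sheet; the base URI, sheet name, sample rows, row-naming scheme, and the function `table_to_triples` are hypothetical.

```python
import csv
import io

BASE = "http://example.org/companies#"  # hypothetical base URI
SHEET_NAME = "Companies"                # hypothetical sheet name

# CSV text standing in for one table-structured sheet (made-up data).
SAMPLE_CSV = "name,sector\nAcme,Manufacturing\nGlobex,Energy\n"


def table_to_triples(text, sheet_name, base=BASE):
    """Rote conversion of a table into (s, p, o) triples."""
    cls = base + sheet_name  # the sheet becomes a class
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        subject = f"{cls}_{index}"  # each row becomes an instance
        triples.append((subject, "rdf:type", cls))
        for column, cell in row.items():
            if cell != "":
                # Each cell becomes a triple: row instance, column
                # property, literal cell value.
                triples.append((subject, base + column, cell))
    return triples


triples = table_to_triples(SAMPLE_CSV, SHEET_NAME)
```

As the footnote on the slide says, this only works when the sheet really is laid out as a table with one header row.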
  • 16. Other default importers. Relational databases: uses a simple mapping of tables to classes, and of columns and foreign keys to properties. XML profiles: extends Semantic XML with pre-built profiles, such as one for XHTML. XML Schema: complex logic provided in a specialized Java module. UML, RDFa, RSS, e-mail, additional spreadsheet importers, …
  • 17. Merging Data
  • 18. Next Steps. RDF converted from the XML file and RDF from the spreadsheet can now be merged: open one, switch to the Import tab, and drag and drop the second one; or create a mapping/aggregation file and import both the XML and the spreadsheet. Creating connections: conceptually, the XML and Excel examples are linked (the XML lists different people, including their jobs and the organizations they work for; the Excel file has company information organized by industry sectors), but there are no connections in the raw data. SPARQL queries (CONSTRUCT), including query templates (to generalize query patterns), can be used to establish connections. Mappings are recorded in the mapping ontologies and scripts for repeated execution.
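
As a rough illustration of "establishing connections", the sketch below imitates in plain Python what a CONSTRUCT query does: match a pattern across the merged triples and emit new triples. It is an analogue, not SPARQL and not TopBraid code; the prefixes `xml:`, `xls:`, and `map:`, the property names, and the data are all invented for the example.

```python
# Hypothetical triples from the rote XML import (people and organizations).
xml_triples = {
    ("xml:person_1", "xml:name", "Ann"),
    ("xml:person_1", "xml:organization", "Acme"),
    ("xml:person_2", "xml:name", "Bob"),
    ("xml:person_2", "xml:organization", "Initech"),
}

# Hypothetical triples from the rote spreadsheet import (companies by sector).
sheet_triples = {
    ("xls:row_1", "xls:company", "Acme"),
    ("xls:row_1", "xls:sector", "Manufacturing"),
    ("xls:row_2", "xls:company", "Globex"),
    ("xls:row_2", "xls:sector", "Energy"),
}


def construct_works_for(graph):
    """Plain-Python analogue of a query such as:

        CONSTRUCT { ?person map:worksFor ?row }
        WHERE     { ?person xml:organization ?org .
                    ?row    xls:company      ?org }
    """
    # Index spreadsheet rows by company name.
    rows_by_company = {}
    for s, p, o in graph:
        if p == "xls:company":
            rows_by_company.setdefault(o, set()).add(s)
    # Join on the shared literal and emit the new link triples.
    return {
        (s, "map:worksFor", row)
        for s, p, o in graph
        if p == "xml:organization"
        for row in rows_by_company.get(o, ())
    }


merged = xml_triples | sheet_triples
links = construct_works_for(merged)
merged |= links  # the constructed links become part of the merged data
```

Only Ann is linked, because "Initech" has no matching row; joining on a name string is the simplest possible rule and would need normalization on real data.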
  • 19. Scripting Data Transformations
  • 20. Step by Step Example. Extract and convert data from a real XML file. Publish the result as a web page. Combine SPARQLMotion, Web Service, Semantic XML, and XSD to accomplish the result. Step-by-step instructions are provided; they require TopBraid Composer Maestro Edition and some familiarity with SPARQL and SPARQLMotion. The recommended first step is to go through the SPARQLMotion tutorial and examples at: http://www.topquadrant.com/sparqlmotion/
  • 21. Open XML file. We will use an XML file from the US Federal Government about the FEA. Download it from: http://www.whitehouse.gov/omb/assets/fea_docs/FEA_XML_Doc_Rev_2_3.xml Open it with Semantic XML.
  • 22. Explore converted RDF. There are 42 BusinessLines in this XML file. Each one has a Name, Definition, and SubFunction detail. Click on one and explore it in the graph view.
  • 23. Extract some information using SPARQL. Looking at the graph, write a SPARQL query that will determine the name of the business line and the BusinessLineID. Check that the business line in the graph appears in the solution.
  • 24. Extract correlated information with SPARQL. Extend your query to find the corresponding BusinessLineDefinitionText. Display just the names and descriptions of the business lines. Save this query in a safe place; we’ll use it later.
  • 25. Shortcut: SPARQL by EXAMPLE - 1. Complete queries (or, for more complex queries, a starting point) can be generated directly from a graph. We call this generation capability “SPARQL by Example”; it saves a lot of tedious work and helps to prevent mistakes. To get started, display the graph pattern for a single business line. Click to “pin down” all the classes in the diagram; the rest will be treated as variables. Click on the star icon to generate a query. Run it in the usual way. Looks good, but we are not getting the text fields.
  • 26. Shortcut: SPARQL by EXAMPLE - 2. Click to “pin down” the text fields so that they are included in the query. We get one result, but we need names and descriptions for all business lines, not just the one we pinned down! Modify the query by hand to turn the name and description into variables and to include only these variables in the SELECT list.
  • 27. Encode the process in SPARQLMotion. Create a new SPARQLMotion file. Click “Yes”: it will declare web services. Create a new SPARQLMotion script. Start with a CreateSpreadsheet; call it findLOB.
  • 28. Encode the process in SPARQLMotion. Bring the XML file into the SPARQLMotion script by dragging it onto the canvas. This automatically creates an SXML import module.
  • 29. Encode the process in SPARQLMotion. Connect these two modules together.
  • 30. Encode the process in SPARQLMotion. Add your query to findLOB (double-click to edit).
  • 31. Encode the process in SPARQLMotion. Add a “ModifyPrefixes” module to specify the namespace for the query you just pasted. Connect it with “next” to the findLOB module. Copy and paste the base URI of the XML file, with a space before and a # after.
  • 32. Test the script. Run the whole script with the debug button and select the last step. Results appear in the Console tab, in tab-delimited form.
  • 33. Exposing Results with Web Services
  • 34. Serve as a web page. Add a Return Text module to the script. Call it showLOB. Make it the last module, right after findLOB.
  • 35. View as a web page. Point your browser to: http://localhost:8083/tbl/actions?action=sparqlmotion&id=showLOB
  • 36. Extend the script to create an HTML file. These first two modules can be re-used from the initial script. xhtml.owl can be found in your TBC folder; just drag and drop it. ApplyConstruct: see the Copy and Paste file for details. ConvertRDFtoXML: no configuration needed. ReturnXML: Mimetype text/html.
  • 37. Viewing in a Web Browser: http://localhost:8083/tbl/actions?action=sparqlmotion&id=tabulateLOB
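
The two browser URLs above follow the same pattern, so a script can call the services instead of a browser. The sketch below builds the URLs with the Python standard library; the host, port, path, and parameter names are copied from the slides, while the helper names `script_url` and `fetch` are made up here. `fetch` needs a running TopBraid instance with the script deployed, so it is defined but not called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host, port, and path as shown on the slides.
BASE_URL = "http://localhost:8083/tbl/actions"


def script_url(script_id, base_url=BASE_URL):
    """Build the URL that invokes a SPARQLMotion script by its id."""
    query = urlencode({"action": "sparqlmotion", "id": script_id})
    return f"{base_url}?{query}"


def fetch(script_id, timeout=10):
    """Call the service and return its body as text.

    Only works against a running server that exposes the script;
    it is not invoked in this sketch.
    """
    with urlopen(script_url(script_id), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


show_url = script_url("showLOB")       # plain-text results
table_url = script_url("tabulateLOB")  # HTML results
```

Using `urlencode` rather than string concatenation keeps the query string valid if a script id ever contains characters that need escaping.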
  • 38. To Learn More. Attend one of TopQuadrant’s Semantic Web Technology Trainings: Semantic Web Technology & Introduction to TopBraid Suite; TopBraid Suite Advanced Product Training Series. For scheduled dates, locations and other information, visit: http://www.topquadrant.com/training/training_overview.html Private, on-site trainings are also available. Call (703) 299-9330 or write to trainings@topquadrant.com.