Linked Census Data
                                    Rinke Hoekstra
                               CEDAR Kickoff, 26 January 2012




donderdag 26 januari 12
Overview
             “Can Linked Data make a difference for historical analysis?”


              Problem

              Procedure (as I understand it)

              Step-by-step

              Vocabularies, tools

              Conclusion


Problem
              ~519 Excel spreadsheets (more?... I heard 1200)

              Want to do analysis over time and space, but...

              Structure

                    Excel sheets cannot be readily imported into a database

              Contents

                    Excel sheets are not normalised (age) nor harmonised (occupations/places)

                    Excel sheets contain errors (both original and data-entry)

              Want to preserve all stages of data cleansing/harmonisation


Procedure

              Stage                       What happens
              Archiving                   Verbatim import of sheets to database/triple store
              Correcting/Interpreting     Add missing information (headers)
                                          Add corrected information (data)
              Normalising                 Interpret and correct objective information
              Harmonising                 Link information across sheets
                                          Link information to other datasets (e.g. locations)
              Visualising                 Build (generic) visualisations of results

              Documenting                 Shown alongside all of the stages above



... a bit about Linked Data

              “Just another Data Model”
              RDF ≠ Ontology (OWL)
              RDF ≠ Taxonomy (RDFS/SKOS)


              Globally Unique Identifiers (URI) for all entities

              Dereferenceable on the Web (URI = URL)

              HTTP-accessible databases (triple stores, SPARQL)

              Triples all the way: <subject, predicate, object>
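
To make "triples all the way" concrete, here is a minimal sketch that stores statements as plain (subject, predicate, object) tuples and answers a pattern query, which is roughly what a SPARQL endpoint does at a much larger scale. The example.org URIs and the `match` helper are assumptions made for illustration; a real setup would use a triple store.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace, for illustration only.
EX = "http://example.org/census/"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

triples = [
    (EX + "I/E", SKOS_BROADER, EX + "I"),
    (EX + "I/E/pannenbakkers", SKOS_BROADER, EX + "I/E"),
]


def match(store: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for triple in store:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# Every entity whose direct broader concept is class I/E.
narrower = [subj for subj, _, _ in match(triples, p=SKOS_BROADER, o=EX + "I/E")]
```

Because every entity is a global URI, the same query works unchanged once triples from other sheets or other datasets are added to the store.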



Spreadsheet ≠ Database

                          In a database:

                          Primary keys are entities

                          Column names are attributes

                          Cell values are attribute values

                          Foreign keys are relations to
                          other entities




Spreadsheet ≠ Database

                          In a spreadsheet:

                          No primary keys!

                          Anything can be an entity

                          Column headers are “types”

                          Row headers are “types”

                          Hierarchies!

                          Cell values are entity “values”

                          No relations to other entities


Anatomy of a Spreadsheet

                          Workbook
                               Sheet
                                       Cell   Cell   Cell
                                       Cell   Cell   Cell
                                       Cell   Cell   Cell
                               Sheet
                                       Cell   Cell   Cell
                                       Cell   Cell   Cell
                                       Cell   Cell   Cell




Anatomy of a Spreadsheet

                          Workbook1.xls
                               Sheet1
                                          Sheet1:A1   Sheet1:B1   Sheet1:C1
                                          Sheet1:A2   Sheet1:B2   Sheet1:C2
                                             ...         ...         ...
                               Sheet2
                                          Sheet2:A1   Sheet2:B1   Sheet2:C1
                                          Sheet2:A2   Sheet2:B2   Sheet2:C2
                                             ...         ...         ...




Anatomy of a Spreadsheet

                          Workbook1.xls
                               Sheet1
                                          workers            agriculture   12
                                                             industry      6
                                                             ...           ...
                               Sheet2
                                          diamond cutters    A             34
                                                             B             67
                                          ...                ...           ...

                          NB: all URIs scoped to sheet!
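
One way to read "all URIs scoped to sheet" is that the workbook and sheet names become part of every identifier. The sketch below assumes a base URI of http://example.com/ and a hypothetical helper `scoped_uri`; neither the URI layout nor the function name comes from the slides.

```python
from urllib.parse import quote

BASE = "http://example.com/"  # assumed base URI, not taken from the slides


def scoped_uri(workbook: str, sheet: str, name: str) -> str:
    """Mint a URI for a cell or header label, scoped to its workbook and sheet."""
    parts = (workbook, sheet, name)
    # Percent-encode each part so spaces and slashes cannot break the path.
    return BASE + "/".join(quote(part, safe="") for part in parts)


# The same label "A" gets a different URI in every sheet it occurs in.
a_in_sheet1 = scoped_uri("Workbook1.xls", "Sheet1", "A")
a_in_sheet2 = scoped_uri("Workbook1.xls", "Sheet2", "A")
cutters = scoped_uri("Workbook1.xls", "Sheet2", "diamond cutters")
```

Scoping keeps the import verbatim: nothing claims that "A" means the same thing in two sheets until a later step links the two URIs explicitly.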



Data Cube

              How to best represent numeric data, in a flexible way?

              SDMX (Eurostat, World Bank, CBS, etc.)



              Every data item is an observation

              Every observation has a value

              Every observation has one or more dimensions
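
The three statements above map directly onto a small data structure. This is a minimal sketch, not the SDMX or RDF Data Cube vocabulary itself: the values 12 and 6 are the ones from the Workbook1.xls example, while the dimension names `group` and `sector` and the `select` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    value: int                   # every observation has a value
    dimensions: Dict[str, str]   # ... and one or more dimensions


def select(observations: List[Observation], **wanted: str) -> List[Observation]:
    """Return the observations whose dimensions match all keyword filters."""
    return [o for o in observations
            if all(o.dimensions.get(k) == v for k, v in wanted.items())]


observations = [
    Observation(12, {"group": "workers", "sector": "agriculture"}),
    Observation(6, {"group": "workers", "sector": "industry"}),
]

# Slice the cube along one dimension and aggregate the values.
total_workers = sum(o.value for o in select(observations, group="workers"))
```

The flexibility comes from the dimensions being open-ended: a sheet with extra headers simply yields observations with more keys, without a schema change.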


Data Cube

              [Figure: one observation from the census tables with its dimensions, as far as the layout can be recovered]

                    Value of the observation: 1

                    Dimension                                               Value
                    leeftijd (age)                                          12
                    geboortejaar (year of birth)                            1878
                    geslacht (sex)                                          M
                    huwelijkse staat (marital status)                       O
                    nummer der beroepsklasse (occupational class number)    I
                    letter der beroepsklasse (occupational class letter)    E
                    beroep (occupation)                                     pannenbakkers
                    positie (position in the occupation)                    D

              In the final build of the figure, leeftijd, geboortejaar, geslacht and
              huwelijkse staat are each marked with a "?".




Anatomy of a Spreadsheet

                          +------------+---------+
                          | Properties | Headers |
                          +------------+---------+
                          | RowHeaders | Data    |
                          +------------+---------+

                          http://github.com/Data2Semantics/TabLinker
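
The four zones suggest how a data cell can be turned into an observation: look up to find its column headers and look left to find its row headers. The sketch below illustrates that idea only; it is not TabLinker's code, and the grid contents, the `header0` key and the function name are made up for the example.

```python
from typing import Dict, List, Tuple

Grid = List[List[str]]


def to_observations(grid: Grid, header_rows: int = 1,
                    row_header_cols: int = 1) -> List[Tuple[int, Dict[str, str]]]:
    """Pair each non-empty data cell with the headers above and to its left."""
    observations = []
    for r in range(header_rows, len(grid)):
        for c in range(row_header_cols, len(grid[r])):
            cell = grid[r][c]
            if cell == "":
                continue  # empty data cells carry no observation
            dims: Dict[str, str] = {}
            # Property cells (top-left zone) name the row-header dimensions.
            for k in range(row_header_cols):
                dims[grid[header_rows - 1][k]] = grid[r][k]
            # Each header row contributes one dimension for this column.
            for h in range(header_rows):
                dims[f"header{h}"] = grid[h][c]
            observations.append((int(cell), dims))
    return observations


# Made-up sheet: one property cell, two headers, two row headers.
grid = [
    ["beroep",        "M", "V"],
    ["pannenbakkers", "1", ""],
    ["steenbakkers",  "3", "2"],
]
obs = to_observations(grid)
```

Real census sheets have hierarchical row headers and several header rows, so the lookups become walks up and to the left rather than single index operations.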
[Figure: RDF graph for a single observation. Recoverable content:]

              Observation node _:x
                    d2s:populationSize    "1"^^xsd:int
                    d2s:dimension         three edges, with targets including
                                          :14--15_1875--1874, :M and :O

              Occupation hierarchy, linked by skos:broader
                    :I
                    :I/E
                    Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers

              Other nodes in the figure
                    :Nummer_der_beroepsklasse
                    :Letter__Onderdeel_beroepsklasse_
                    :BENAMING_van_de_onderdeelen_der_onderscheidene_beroepsklassen__met_de_daartoe_behoorende_beroepen
                    :Positie_in_het_beroep__aangeduid_met_A__B__C_of_D
                    :D
                    Sheet1:D15




[Figure: the observation graph extended with the spreadsheet cells it was derived from. Recoverable content:]

              Cell types (rdf:type)
                    d2s:HierarchicalRowHeader    Sheet1:E15, Sheet1:C14, Sheet1:B8
                    d2s:RowHeader                Sheet1:F15
                    d2s:Header                   Sheet1:L3, Sheet1:L4, Sheet1:L5
                    d2s:Metadata                 Sheet1:D15, Sheet1:L6
                    d2s:DataCell                 Sheet1:L15

              Links from cells into the graph
                    d2s:isObservation    from the data cell to the observation _:x
                    d2s:isDimension      from each of the nine other cells to the value it denotes
                                         (for example Sheet1:F15 to :D)

              Observation node _:x
                    d2s:populationSize    "1"^^xsd:int
                    d2s:dimension         four edges, with targets including
                                          :14--15_1875--1874, :M and :O

              Occupation hierarchy, linked by skos:broader
                    :I
                    :I/E
                    Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers

              Other nodes in the figure
                    :Nummer_der_beroepsklasse
                    :Letter__Onderdeel_beroepsklasse_
                    :BENAMING_van_de_onderdeelen_der_onderscheidene_beroepsklassen__met_de_daartoe_behoorende_beroepen
                    :Positie_in_het_beroep__aangeduid_met_A__B__C_of_D
                    :Regelnummer
                    :D, :5, :10




What TabLinker can’t do
              Annotations (“footnote”-style, on a separate sheet)

              Interpret functions (e.g. automatic sums)

              Integrate/harmonise across sheets/files

              Additional useful functionality:

                    “checksum” functionality

                    Export to database tables

Normalising & Correcting

              Verbatim, as imported:
                    _:x   d2s:populationSize   "1"^^xsd:int
                    _:x   d2s:dimension        :14--15_1875--1874

              Normalised and corrected:
                    _:x   d2s:populationSize   "11"^^xsd:int
                    _:x   d2s:censusYear       "1889"^^xsd:int
                    _:x   d2s:birthYears       :1875--1874
                    _:x   d2s:gemeente         :Assendelft
                    _:x   d2s:ageGroup         :14-15
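
Normalising the combined header `14--15_1875--1874` amounts to splitting it into the separate properties shown on the slide. The sketch below assumes the label always has the shape "ages, underscore, birth years"; the regular expression and the function name are illustrative, and deriving the census year by adding the youngest age to the latest birth year is an assumption that happens to reproduce the 1889 on the slide.

```python
import re
from typing import Dict, Union

# ages "14--15", then "_", then birth years "1875--1874"
LABEL = re.compile(r"^(\d+)--(\d+)_(\d{4})--(\d{4})$")


def normalise_age_label(label: str) -> Dict[str, Union[str, int]]:
    """Split a combined age/birth-year header into separate properties."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised header: {label!r}")
    age_lo, age_hi, born_first, born_second = map(int, m.groups())
    return {
        "ageGroup": f"{age_lo}-{age_hi}",
        "birthYears": f"{born_first}--{born_second}",
        # Assumption: youngest age + latest birth year gives the census year.
        "censusYear": age_lo + born_first,
    }


result = normalise_age_label("14--15_1875--1874")
```

A header that does not fit the pattern raises an error instead of being guessed at, so that it can go to manual correction.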




Documenting

              Named graph <http://example.com/workbook1/sheet1> (verbatim)
                    _:x   d2s:populationSize   "1"^^xsd:int
                    _:x   d2s:dimension        :14--15_1875--1874

              Named graph <http://example.com/workbook1/sheet1/corrected>
                    _:x   d2s:populationSize   "11"^^xsd:int
                    _:x   d2s:censusYear       "1889"^^xsd:int
                    _:x   d2s:birthYears       :1875--1874
                    _:x   d2s:gemeente         :Assendelft
                    _:x   d2s:ageGroup         :14-15

              Provenance of the corrected graph
                    <http://example.com/workbook1/sheet1/corrected>   provo:wasGeneratedBy   :curation20120126
                    :curation20120126   rdf:type             provo:Activity
                    :curation20120126   provo:hadAgent       :RinkeHoekstra
                    :curation20120126   provo:startedAt      _:a
                    :curation20120126   provo:endedAt        _:b
                    _:a                 time:inXSDDateTime   "20120126T08:30:00"
                    _:b                 time:inXSDDateTime   "20120126T09:00:00"

              http://www.w3.org/TR/prov-o/
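
The pattern on this slide, keeping the verbatim graph untouched and writing corrections to a second graph that points at the activity which produced it, can be sketched with plain quads. The property names are the ones shown on the slide; the `PROVENANCE` graph name and the `record_correction` helper are assumptions for illustration.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)

ORIGINAL = "http://example.com/workbook1/sheet1"
CORRECTED = ORIGINAL + "/corrected"
PROVENANCE = "http://example.com/provenance"  # hypothetical graph name


def record_correction(quads: List[Quad], subject: str, predicate: str,
                      new_value: str, activity: str, agent: str) -> List[Quad]:
    """Add a corrected value in its own graph and document where it came from.

    The quads of the original graph are returned unchanged.
    """
    return quads + [
        (CORRECTED, subject, predicate, new_value),
        (PROVENANCE, CORRECTED, "provo:wasGeneratedBy", activity),
        (PROVENANCE, activity, "rdf:type", "provo:Activity"),
        (PROVENANCE, activity, "provo:hadAgent", agent),
    ]


store: List[Quad] = [(ORIGINAL, "_:x", "d2s:populationSize", '"1"^^xsd:int')]
store = record_correction(store, "_:x", "d2s:populationSize", '"11"^^xsd:int',
                          ":curation20120126", ":RinkeHoekstra")


def population_sizes(graph: str) -> List[str]:
    """Population sizes asserted in one named graph."""
    return [o for g, _, p, o in store
            if g == graph and p == "d2s:populationSize"]
```

Querying `population_sizes(ORIGINAL)` and `population_sizes(CORRECTED)` side by side shows both the value as entered and the value after curation, which is what preserving all stages of cleansing requires.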


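Harmonising

              [Figure: occupation hierarchy from the census tables, linked by skos:broader]

                    Top class:    I
                    Subclasses:   D, E, A

                    Occupation groups at the lowest level, each skos:broader to one of the subclasses:
                          Fabricage van kalk [manufacture of lime]
                          Fabricage van steen (molensteen, steenbakkers, tegelbakkers) [stone: millstones, brick makers, tile makers]
                          Fabricage van dakpannen (pannenbakkers) [roof tiles: roof-tile makers]
                          Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.) [earthenware, incl. porcelain, terracotta, stove makers, potters, etc.]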




Harmonising

              [Figure: the same skos:broader hierarchy under I, with every node below I mapped to a HISCO code.
              The pairings below follow the left-to-right layout of the figure.]

                    Node                                                            Relation           HISCO
                    D                                                               skos:exactMatch    HISCO:23810
                    E                                                               skos:exactMatch    HISCO:25281
                    A                                                               skos:exactMatch    HISCO:26340
                    Fabricage van kalk                                              skos:exactMatch    HISCO:23811
                    Fabricage van steen (molensteen, steenbakkers, tegelbakkers)    skos:broadMatch    HISCO:25281
                    Fabricage van dakpannen (pannenbakkers)                         skos:broadMatch    HISCO:25281
                    Fabricage van aardewerk (incl. porcelein, terracotta,           skos:closeMatch    HISCO:26345
                      kachelbakkers, pottenbakkers, enz.)
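
Once census concepts are mapped to HISCO, the strength of the SKOS relation decides which counts may be pooled. The sketch below uses the HISCO codes and relations visible on the slide, but the pairing of each census concept with a code is read off the figure layout and should be treated as illustrative; `concepts_for` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

# (census concept, SKOS mapping relation, HISCO code); pairings are illustrative
MAPPINGS: List[Tuple[str, str, str]] = [
    ("Fabricage van kalk", "skos:exactMatch", "HISCO:23811"),
    ("Fabricage van steen", "skos:broadMatch", "HISCO:25281"),
    ("Fabricage van dakpannen", "skos:broadMatch", "HISCO:25281"),
    ("Fabricage van aardewerk", "skos:closeMatch", "HISCO:26345"),
]


def concepts_for(code: str,
                 relations: Sequence[str] = ("skos:exactMatch",)) -> List[str]:
    """Census concepts mapped to a HISCO code through one of the given relations."""
    return [concept for concept, relation, target in MAPPINGS
            if target == code and relation in relations]


# Strict harmonisation accepts exact matches only.
strict = concepts_for("HISCO:25281")
# A looser analysis may also accept broader HISCO categories.
loose = concepts_for("HISCO:25281", ("skos:exactMatch", "skos:broadMatch"))
```

Keeping the relation explicit lets each analysis choose its own tolerance instead of baking one interpretation into the data.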




Harmonising

              [Figure: the hierarchy shown twice, linked by skos:broader in both copies]

                    Unscoped names
                          I
                          D, E, A
                          Fabricage van kalk
                          Fabricage van steen (molensteen, steenbakkers, tegelbakkers)
                          Fabricage van dakpannen (pannenbakkers)
                          Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)

                    Sheet-scoped names
                          Sheet1:I
                          Sheet1:D, Sheet1:E, Sheet1:A
                          Sheet1:Fabricage van kalk
                          Sheet1:Fabricage van steen (molensteen, steenbakkers, tegelbakkers)
                          Sheet1:Fabricage van dakpannen (pannenbakkers)
                          Sheet1:Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)

[Figure: the 1889 and 1899 occupation hierarchies side by side, each linked internally by skos:broader]

              1889
                    I
                    D, E, A
                    Fabricage van kalk
                    Fabricage van steen (molensteen, steenbakkers, tegelbakkers)
                    Fabricage van dakpannen (pannenbakkers)
                    Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)

              1899
                    I
                    D, E, A
                    Fabricage van kalk
                    Fabricage van steen (steenbakkers, tegelbakkers)
                    Fabricage van dakpannen (pannenbakkers)
                    Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)

              Mapping relations between the two years: skos:exactMatch, skos:narrowMatch (twice), skos:closeMatch




Is SKOS sufficient?

              [Diagram: two occupation hierarchies, one from the 1889 census and one from the 1899 census, with mappings between them.]

              1889: top concept I; classes D, E and A (skos:broader I); leaf concepts (skos:broader to a class):
                    Fabricage van kalk [manufacture of lime]
                    Fabricage van steen (molensteen, steenbakkers, tegelbakkers) [brick and stone: millstones, brick makers, tile makers]
                    Fabricage van dakpannen (pannenbakkers) [roof tiles]
                    Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.) [earthenware, incl. porcelain, terracotta, stove makers, potters, etc.]

              1899: top concept I; classes D, E and A (skos:broader I); leaf concepts (skos:broader to a class):
                    Fabricage van kalk
                    Fabricage van steen (steenbakkers, tegelbakkers)
                    Fabricage van dakpannen (pannenbakkers)
                    Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)

              Mappings from 1889 leaf concepts to 1899 leaf concepts: skos:exactMatch, skos:narrowMatch (twice), skos:closeMatch

              NB: These are not strings, but globally unique URIs, scoped within their spreadsheet (graph!) of origin.
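
As a rough illustration of the note above, the following sketch mints a separate URI for the same label in each census year and links the two with SKOS mapping properties. The base URI, the slug scheme, the shortened labels, and the pairing of each mapping property with a particular concept are all assumptions made for this example; the diagram does not state them unambiguously.

```python
# Minimal sketch: concepts are URIs scoped to their sheet of origin, not strings.
# BASE and the slug scheme are hypothetical, chosen only for illustration.
BASE = "http://example.org/census/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_uri(sheet: str, label: str) -> str:
    """Mint a URI for a label, scoped to the sheet (graph) it came from."""
    slug = "-".join(label.lower().split())
    return f"{BASE}{sheet}/{slug}"


# Shortened labels; the full labels differ slightly between the two years.
labels = [
    "Fabricage van kalk",
    "Fabricage van steen",
    "Fabricage van dakpannen",
    "Fabricage van aardewerk",
]

c1889 = {label: concept_uri("1889", label) for label in labels}
c1899 = {label: concept_uri("1899", label) for label in labels}

# Assumed pairing: the 1889 "steen" and "aardewerk" concepts list extra
# trades (molensteen, terracotta), so the 1899 ones are treated as narrower.
mappings = [
    (c1889["Fabricage van kalk"], SKOS + "exactMatch", c1899["Fabricage van kalk"]),
    (c1889["Fabricage van steen"], SKOS + "narrowMatch", c1899["Fabricage van steen"]),
    (c1889["Fabricage van dakpannen"], SKOS + "closeMatch", c1899["Fabricage van dakpannen"]),
    (c1889["Fabricage van aardewerk"], SKOS + "narrowMatch", c1899["Fabricage van aardewerk"]),
]


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(mappings))
```

The same label yields two different URIs, one per year, so a mapping between them is an explicit, queryable statement rather than an accident of string equality.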

Vocabularies, Tools
              Vocabularies
              Data Cube, SKOS, W3C Time, PROV-O

              Excel + TabLinker
              Semi-automatic conversion of Excel sheets to RDF

              ProvTracer
              Create PROV-O provenance trail for shell/python scripts

              Visualization Prototype
              Sgvizler (SPARQL + Google Visualization API)
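
To make the idea of a provenance trail for scripts more concrete, here is a minimal sketch that runs a command and records a few PROV-O statements about the run. It is not ProvTracer's implementation: the function name `run_with_provenance`, the `BASE` namespace, and the way files are turned into URIs are hypothetical. Only the PROV-O terms themselves (`prov:Activity`, `prov:used`, `prov:wasGeneratedBy`, `prov:startedAtTime`, `prov:endedAtTime`) come from the vocabulary.

```python
import subprocess
import sys
import uuid
from datetime import datetime, timezone
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
BASE = "http://example.org/provenance/"  # hypothetical namespace


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def timestamp(moment):
    """Format a datetime as a typed xsd:dateTime literal."""
    return f'"{moment.isoformat()}"^^<{XSD_DATETIME}>'


def run_with_provenance(command, inputs, outputs):
    """Run a command and return (stdout, provenance triples) for the run.

    The caller declares which files the command reads and writes;
    nothing is detected automatically in this sketch.
    """
    activity = uri(f"{BASE}activity/{uuid.uuid4()}")
    started = datetime.now(timezone.utc)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    ended = datetime.now(timezone.utc)

    triples = [
        (activity, uri(RDF_TYPE), uri(PROV + "Activity")),
        (activity, uri(PROV + "startedAtTime"), timestamp(started)),
        (activity, uri(PROV + "endedAtTime"), timestamp(ended)),
    ]
    for path in inputs:
        entity = uri(BASE + "file/" + quote(path))
        triples.append((activity, uri(PROV + "used"), entity))
    for path in outputs:
        entity = uri(BASE + "file/" + quote(path))
        triples.append((entity, uri(PROV + "wasGeneratedBy"), activity))
    return result.stdout, triples


if __name__ == "__main__":
    # Stand-in command; a real conversion script would go here.
    _, trail = run_with_provenance(
        [sys.executable, "-c", "print('converted')"], ["sheet.xls"], ["sheet.ttl"]
    )
    for s, p, o in trail:
        print(s, p, o, ".")
```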


Discussion
              Advantages of Linked Data approach

                    Straightforward transformation from spreadsheets

                    Seamless integration of original, corrected and harmonised data

                    Ingestion of external (linked) data

                    Powerful documentation (provenance)

                    Everything is transparently queryable (SPARQL)

                    ... on the Web


Discussion


              Disadvantages of Linked Data approach (subject to research)

                    Size? (~300k triples per sheet × 519 sheets ≈ 156M triples)

                    Only rudimentary support for arithmetic operations in queries

                    No dynamic/conditional ‘view’-like graphs
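
The size estimate can be checked with a few lines of arithmetic. The sketch below assumes the 300k figure means triples per sheet, and also plugs in the 1200-sheet count mentioned as hearsay on the Problem slide; both readings are assumptions, not measurements.

```python
# Back-of-the-envelope triple-store size, assuming ~300k triples per sheet.
TRIPLES_PER_SHEET = 300_000


def estimated_triples(sheets, per_sheet=TRIPLES_PER_SHEET):
    """Total triples if every sheet yields roughly `per_sheet` triples."""
    return sheets * per_sheet


if __name__ == "__main__":
    for sheets in (519, 1200):
        total = estimated_triples(sheets)
        print(f"{sheets} sheets -> {total / 1e6:.1f}M triples")
```

With 519 sheets the product is 155.7M, which rounds to the 156M on the slide; 1200 sheets would give 360M.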




SPARQL vs. SQL?


              Middle ground?

              Expose database through D2RQ
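
The middle ground can be pictured as a triple view over a relational table: the primary key becomes the subject, each column name a predicate, each cell value an object. The sketch below shows only that idea, using Python's built-in `sqlite3`. It does not use D2RQ or its mapping language, and the table, column names, values, and `BASE` URI are made up for illustration.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical namespace


def rows_to_triples(conn, table, key):
    """Expose every row of `table` as (subject, predicate, object) triples.

    `table` and `key` are trusted identifiers here; they are interpolated
    into the SQL text, so do not pass user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is already in the subject; skip NULLs
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occupation (id INTEGER PRIMARY KEY, label TEXT, workers INTEGER)"
)
conn.executemany(
    "INSERT INTO occupation VALUES (?, ?, ?)",
    [(1, "agriculture", 12), (2, "industry", 6)],
)
triples = rows_to_triples(conn, "occupation", "id")

if __name__ == "__main__":
    for triple in triples:
        print(triple)
```

The data stays in SQL, where arithmetic and views are cheap, while clients that expect triples still get a graph-shaped answer.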




Fin



Linked Census Data to RDF

  • 1. Linked Census Data Rinke Hoekstra CEDAR Kickoff, 26 January 2012 donderdag 26 januari 12
  • 2. Overview “Can Linked Data make a difference for historical analysis?” Problem Procedure (as I understand it) Step-by-step Vocabularies, tools Conclusion donderdag 26 januari 12
  • 3. Problem ~519 Excel spreadsheets (more?... I heard 1200) Want to do analysis over time and space, but... Structure Excel sheets cannot be readily imported in a database Contents Excel sheets are not normalised (age) nor harmonised (occupations/places) Excel sheets contain errors (both original and data-entry) Want to preserve all stages of data cleansing/harmonisation donderdag 26 januari 12
  • 4. Procedure: Archiving (verbatim import of sheets to database/triple store); Correcting/Interpreting (add missing information (headers); add corrected information (data)); Normalising (interpret and correct objective information); Harmonising (link information across sheets; link information to other datasets, e.g. locations); Visualising (build (generic) visualisations of results). Documenting is shown alongside these steps.
  • 5. ... a bit about Linked Data: “Just another Data Model”. RDF ≠ Ontology (OWL); RDF ≠ Taxonomy (RDFS/SKOS). Globally unique identifiers (URIs) for all entities; dereferenceable on the Web (URI = URL); HTTP-accessible databases (triple stores, SPARQL). Triples all the way: <subject, predicate, object>.
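To make the "triples all the way" point concrete, here is a minimal sketch in plain Python (no RDF library). The `http://example.com/` namespace, the `OccupationClass` type and the three statements are illustrative assumptions, not data from the census sheets; only the `rdf:type` and `skos:broader` URIs are the real W3C ones.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
EX = "http://example.com/"  # placeholder namespace (assumption)


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A graph is simply a set of triples; these statements are illustrative only.
graph = {
    Triple(EX + "E", SKOS_BROADER, EX + "I"),
    Triple(EX + "D", SKOS_BROADER, EX + "I"),
    Triple(EX + "I", RDF_TYPE, EX + "OccupationClass"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# Everything that has :I as its broader concept.
narrower_than_I = [t.subject for t in match(graph, predicate=SKOS_BROADER, obj=EX + "I")]
```

The wildcard lookup in `match` is roughly what a single SPARQL triple pattern does against a triple store.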
  • 6. Spreadsheet ≠ Database. In a database: primary keys are entities; column names are attributes; cell values are attribute values; secondary keys are relations to other entities.
  • 9. Spreadsheet ≠ Database. In a spreadsheet: no primary keys! Anything can be an entity; column headers are “types”; row headers are “types”; hierarchies!; cell values are entity “values”; no relations to other entities.
  • 10. Anatomy of a Spreadsheet. [Diagram: a Workbook contains Sheets; each Sheet contains a grid of Cells.]
  • 11. Anatomy of a Spreadsheet. [Diagram: Workbook1.xls contains Sheet1 (cells Sheet1:A1, Sheet1:B1, Sheet1:C1, Sheet1:A2, Sheet1:B2, Sheet1:C2, ...) and Sheet2 (cells Sheet2:A1, Sheet2:B1, Sheet2:C1, Sheet2:A2, Sheet2:B2, Sheet2:C2, ...).]
  • 12. Anatomy of a Spreadsheet. [Diagram: Workbook1.xls; Sheet1 with cells “workers”, “agriculture” 12, “industry” 6, ...; Sheet2 with cells “diamond cutters”, “A” 34, “B” 67, ...]
  • 13. Anatomy of a Spreadsheet. [Diagram: Workbook1.xls; Sheet1 with cells “workers”, “agriculture” 12, “industry” 6, ...; Sheet2 with cells “diamond cutters”, “A” 34, “B” 67, ...] NB: all URIs are scoped to the sheet!
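A minimal sketch of what sheet-scoped URIs could look like. The base URL, the path layout and the helper names `sheet_uri`, `cell_uri` and `label_uri` are assumptions for illustration; they are not TabLinker's actual URI scheme.

```python
from urllib.parse import quote

BASE = "http://example.com/"  # placeholder base URL (assumption)


def sheet_uri(workbook: str, sheet: str) -> str:
    """URI for one sheet inside one workbook."""
    return f"{BASE}{quote(workbook)}/{quote(sheet)}"


def cell_uri(workbook: str, sheet: str, cell: str) -> str:
    """URI for a cell such as A1, scoped to its sheet."""
    return f"{sheet_uri(workbook, sheet)}/{quote(cell)}"


def label_uri(workbook: str, sheet: str, label: str) -> str:
    """URI for a header label, scoped to the sheet it appears on."""
    return f"{sheet_uri(workbook, sheet)}/{quote(label.replace(' ', '_'))}"


a1 = cell_uri("Workbook1.xls", "Sheet1", "A1")
workers_1 = label_uri("Workbook1.xls", "Sheet1", "workers")
workers_2 = label_uri("Workbook1.xls", "Sheet2", "workers")
```

Because the sheet is part of the path, the same label on two sheets yields two distinct URIs; saying that they mean the same thing is left to an explicit later step.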
  • 14. Data Cube: how best to represent numeric data in a flexible way? SDMX (Eurostat, World Bank, CBS, etc.). Every data item is an observation; every observation has a value; every observation has one or more dimensions.
  • 16. Data Cube: how best to represent numeric data in a flexible way? SDMX (Eurostat, World Bank, CBS, etc.). Every data item is an observation; every observation has a value; every observation has one or more dimensions. [Diagram labels, as extracted: values 12, 1878, M, O, I, E, pannenbakkers, D, 1; dimension names leeftijd (age), geboortejaar (year of birth), geslacht (sex), huwelijkse staat (marital status), nummer der beroepsklasse (occupational class number), letter der beroepsklasse (occupational class letter), beroep (occupation), positie (position).]
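  • 17. Data Cube: how best to represent numeric data in a flexible way? SDMX (Eurostat, World Bank, CBS, etc.). Every data item is an observation; every observation has a value; every observation has one or more dimensions. [Diagram labels, as extracted: values 12, 1878, M, O, I, E, pannenbakkers, D, 1; dimension names leeftijd (age), geboortejaar (year of birth), geslacht (sex), huwelijkse staat (marital status), nummer der beroepsklasse (occupational class number), letter der beroepsklasse (occupational class letter), beroep (occupation), positie (position). Question marks are placed next to leeftijd, nummer der beroepsklasse, geboortejaar, geslacht and huwelijkse staat.]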
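The observation/value/dimension idea can be sketched as a small data structure. The pairing of values with dimension names in the first observation is my reading of the flattened diagram and may not match the slide exactly; the second observation is entirely made up to make the aggregation visible.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    value: int                  # the number in the data cell
    dimensions: Dict[str, str]  # dimension name -> dimension value


observations: List[Observation] = [
    # Assumed pairing of the labels shown on the slide.
    Observation(1, {
        "leeftijd": "12", "geboortejaar": "1878", "geslacht": "M",
        "huwelijkse staat": "O", "nummer der beroepsklasse": "I",
        "letter der beroepsklasse": "E", "beroep": "pannenbakkers",
        "positie": "D",
    }),
    # Hypothetical second observation, not from the source.
    Observation(3, {
        "leeftijd": "12", "geboortejaar": "1878", "geslacht": "V",
        "huwelijkse staat": "O", "nummer der beroepsklasse": "I",
        "letter der beroepsklasse": "E", "beroep": "pannenbakkers",
        "positie": "D",
    }),
]


def total(observations: List[Observation], wanted: Dict[str, str]) -> int:
    """Sum the values of all observations that match every wanted dimension."""
    return sum(
        o.value for o in observations
        if all(o.dimensions.get(k) == v for k, v in wanted.items())
    )


tile_makers = total(observations, {"beroep": "pannenbakkers"})
male_tile_makers = total(observations, {"beroep": "pannenbakkers", "geslacht": "M"})
```

Fixing more dimensions slices the cube more finely; leaving one out aggregates over it.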
  • 18. Anatomy of a Spreadsheet. [Diagram: sheet regions labelled Properties, Headers, RowHeaders and Data.]
  • 20. Anatomy of a Spreadsheet. [Diagram: sheet regions labelled Properties, Headers, RowHeaders and Data.] TabLinker: http://github.com/Data2Semantics/TabLinker
  • 21. [RDF graph diagram. Nodes shown: _:x, "1"^^xsd:int, :I, :I/E, :M, :O, :D, :14--15_1875--1874, :Nummer_der_beroepsklasse, :Letter__Onderdeel_beroepsklasse_, :BENAMING_van_de_onderdeelen_der_onderscheidene_beroepsklassen__met_de_daartoe_behoorende_beroepen, :Positie_in_het_beroep__aangeduid_met_A__B__C_of_D, Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers, Sheet1:D15. Edge labels shown: d2s:populationSize, d2s:dimension, skos:broader.]
  • 22. [RDF graph diagram, extending slide 21 with cell typing. Cell types shown: d2s:HierarchicalRowHeader, d2s:RowHeader, d2s:Header, d2s:DataCell, d2s:Metadata. Cells shown: Sheet1:B8, Sheet1:C14, Sheet1:D15, Sheet1:E15, Sheet1:F15, Sheet1:L3, Sheet1:L4, Sheet1:L5, Sheet1:L6, Sheet1:L15. Other nodes shown: _:x, "1"^^xsd:int, :I, :I/E, :M, :O, :D, :5, :10, :14--15_1875--1874, :Regelnummer, :Nummer_der_beroepsklasse, :Letter__Onderdeel_beroepsklasse_, :BENAMING_van_de_onderdeelen_der_onderscheidene_beroepsklassen__met_de_daartoe_behoorende_beroepen, :Positie_in_het_beroep__aangeduid_met_A__B__C_of_D, Sheet1:I/E/Fabricage_van_dakpannen__pannenbakkers. Edge labels shown: rdf:type, d2s:isDimension, d2s:isObservation, d2s:populationSize, d2s:dimension, skos:broader.]
  • 23. What TabLinker can’t do: annotations (“footnote”-style, on a separate sheet); interpret functions (e.g. automatic sums); integrate/harmonise across sheets/files. Additional useful functionality: “checksum” functionality; export to database tables.
  • 24. Normalising & Correcting. [RDF graph diagram: node _:x with d2s:populationSize "1"^^xsd:int and d2s:dimension :14--15_1875--1874.]
  • 25. Normalising & Correcting. [RDF graph diagram, before and after. Terms shown: _:x (twice); d2s:populationSize (three times) with values "1"^^xsd:int (twice) and "11"^^xsd:int; d2s:dimension :14--15_1875--1874 (twice); d2s:censusYear "1889"^^xsd:int; d2s:birthYears :1875--1874; d2s:ageGroup :14-15; d2s:gemeente (municipality) :Assendelft.]
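The normalising step on this slide splits the combined header `14--15_1875--1874` into an age group and a birth-year range. A minimal sketch, assuming every such header follows the pattern `age--age_year--year`; the function names and the census-year consistency check are my own additions, not part of TabLinker.

```python
import re

# Assumed header pattern: "<age>--<age>_<year>--<year>".
_PATTERN = re.compile(r"^(\d+)--(\d+)_(\d{4})--(\d{4})$")


def _parse(label: str):
    m = _PATTERN.match(label)
    if m is None:
        raise ValueError(f"unexpected header format: {label!r}")
    return tuple(int(g) for g in m.groups())


def split_dimension(label: str) -> dict:
    """Split a combined header into separate ageGroup and birthYears values."""
    age_lo, age_hi, year_a, year_b = _parse(label)
    return {"ageGroup": f"{age_lo}-{age_hi}", "birthYears": f"{year_a}--{year_b}"}


def consistent_with_census(label: str, census_year: int) -> bool:
    """Check that each birth year equals the census year minus the matching age."""
    age_lo, age_hi, year_a, year_b = _parse(label)
    return census_year - age_lo == year_a and census_year - age_hi == year_b


parts = split_dimension("14--15_1875--1874")
plausible = consistent_with_census("14--15_1875--1874", 1889)
```

A check of this kind is one way to flag data-entry errors before a corrected value is recorded next to the original.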
  • 26. Documenting. [RDF graph diagram with two named graphs, <http://example.com/workbook1/sheet1> and <http://example.com/workbook1/sheet1/corrected>. Provenance terms shown: :curation20120126 rdf:type provo:Activity; provo:wasGeneratedBy; provo:hadAgent :RinkeHoekstra; provo:startedAt and provo:endedAt pointing to _:a and _:b, each with time:inXSDDateTime ("20120126T08:30:00" and "20120126T09:00:00"). Observation terms shown: _:x; d2s:populationSize "1"^^xsd:int and "11"^^xsd:int; d2s:dimension :14--15_1875--1874; d2s:censusYear "1889"^^xsd:int; d2s:birthYears :1875--1874; d2s:ageGroup :14-15; d2s:gemeente (municipality) :Assendelft.] See http://www.w3.org/TR/prov-o/
  • 27. Harmonising. [Concept hierarchy diagram: class I at the top; letters D, E and A linked to it by skos:broader; below them, linked by skos:broader, four occupation groups: “Fabricage van dakpannen (pannenbakkers)” (roof tiles), “Fabricage van steen (molensteen, steenbakkers, tegelbakkers)” (stone), “Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)” (earthenware), “Fabricage van kalk” (lime).]
  • 28. Harmonising. [Concept hierarchy diagram: class I at the top; letters D, E and A linked to it by skos:broader; below them, linked by skos:broader, four occupation groups: “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van steen (molensteen, steenbakkers, tegelbakkers)”, “Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”. The occupation groups are mapped to HISCO codes. Mapping properties shown: skos:exactMatch (four times), skos:broadMatch (twice), skos:closeMatch (once). HISCO codes shown: HISCO:23810, HISCO:23811, HISCO:25281 (three times), HISCO:26340, HISCO:26345.]
  • 29. Harmonising. [The concept hierarchy of slide 27 shown twice: once with bare labels (I; D, E, A; the four “Fabricage van ...” groups) and once with every node prefixed by its sheet: Sheet1:I; Sheet1:D, Sheet1:E, Sheet1:A; “Sheet1:Fabricage van dakpannen (pannenbakkers)”, “Sheet1:Fabricage van steen (molensteen, steenbakkers, tegelbakkers)”, “Sheet1:Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)”, “Sheet1:Fabricage van kalk”. All links are skos:broader.]
  • 30. [Two concept hierarchies, one for 1889 and one for 1899, each with class I, letters D, E and A, and four occupation groups linked by skos:broader. 1889 groups: “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van steen (molensteen, steenbakkers, tegelbakkers)”, “Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”. 1899 groups: “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van steen (steenbakkers, tegelbakkers)”, “Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”. Mapping properties shown between the two years: skos:narrowMatch (twice), skos:closeMatch, skos:exactMatch.]
  • 31. Is SKOS sufficient? [Two concept hierarchies, one for 1889 and one for 1899, each with class I, letters D, E and A, and four occupation groups linked by skos:broader. 1889 groups: “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van steen (molensteen, steenbakkers, tegelbakkers)”, “Fabricage van aardewerk (incl. porcelein, terracotta, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”. 1899 groups: “Fabricage van dakpannen (pannenbakkers)”, “Fabricage van steen (steenbakkers, tegelbakkers)”, “Fabricage van aardewerk (incl. porcelein, kachelbakkers, pottenbakkers, enz.)”, “Fabricage van kalk”. Mapping properties shown between the two years: skos:narrowMatch (twice), skos:closeMatch, skos:exactMatch.] NB: These are not strings, but globally unique URIs, scoped within their spreadsheet (graph!) of origin.
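The NB on the slide above implies that identical labels in the 1889 and 1899 sheets are different URIs until someone links them. A minimal sketch of that situation; the `http://example.com/` URI layout, the `census1889`/`census1899` path segments and the single `skos:exactMatch` statement are illustrative assumptions, since the flattened diagram does not show which concepts each mapping connects.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.com/"  # placeholder base URL (assumption)


def concept(census: str, sheet: str, label: str) -> str:
    """URI for a label, scoped to the census file and sheet it came from."""
    return f"{BASE}{census}/{sheet}/{label.replace(' ', '_')}"


kalk_1889 = concept("census1889", "Sheet1", "Fabricage van kalk")
kalk_1899 = concept("census1899", "Sheet1", "Fabricage van kalk")

# Harmonisation is stored as explicit mapping triples (illustrative only).
mappings = {(kalk_1889, SKOS + "exactMatch", kalk_1899)}


def counterparts(uri: str, mappings, predicate: str = SKOS + "exactMatch"):
    """Return the concepts that `uri` is mapped to via `predicate`, as stored."""
    return sorted(o for s, p, o in mappings if s == uri and p == predicate)


linked = counterparts(kalk_1889, mappings)
```

The lookup follows the stored direction only. SKOS defines `skos:exactMatch` as symmetric, so a real triple store with inference (or a query that checks both directions) would also find the link starting from the 1899 concept.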
  • 34. Vocabularies, Tools. Vocabularies: Data Cube, SKOS, W3C Time, PROV-O. Excel + TabLinker: semi-automatic conversion of Excel sheets to RDF. ProvTracer: create a PROV-O provenance trail for shell/Python scripts. Visualization prototype: SGVizler (SPARQL + Google Graph API).
  • 35. Discussion. Advantages of the Linked Data approach: straightforward transformation from spreadsheets; seamless integration of original, corrected and harmonised data; ingestion of external (linked) data; powerful documentation (provenance); everything is transparently queryable (SPARQL) ... on the Web.
  • 36. Discussion. Disadvantages of the Linked Data approach (subject to research): size? (300k * 519 sheets ≈ 156M triples); only rudimentary support for arithmetical operations in queries; no dynamic/conditional ‘view’-like graphs.
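The size estimate is simple arithmetic and can be rerun for other sheet counts. This sketch reads the slide's "300k" as roughly 300,000 triples per sheet, which is an assumption; 1200 is the higher sheet count mentioned on slide 3.

```python
TRIPLES_PER_SHEET = 300_000  # "300k" from the slide, read as triples per sheet (assumption)


def estimated_triples(sheets: int, per_sheet: int = TRIPLES_PER_SHEET) -> int:
    """Rough total number of triples for a given number of sheets."""
    return sheets * per_sheet


for sheets in (519, 1200):
    millions = estimated_triples(sheets) / 1_000_000
    print(f"{sheets} sheets: about {millions:.1f}M triples")
```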
  • 37. SPARQL vs. SQL? Middle ground? Expose the database through D2RQ.