World Sense-Making using
       Linked Data
           Tope Omitola
(joint work with Prof. Nigel Shadbolt)
    Faculty Research Seminar Talk,
   Birmingham City University, UK
          Thurs 8 Dec. 2011



Thank You
   Thank you for inviting me.
Talk Outline

   EnAKTing: Its story

   From the Web to Semantic Web to Linked Data

   Public Sector Datasets: Publication and Consumption

   Findability of Appropriate Data Sources – Service Descriptions

   Provenance and Trust in Linked Data
What is EnAKTing?

   EPSRC-funded project.

   Addressing 3 key research problems: (1) how to build
    ontologies quickly that are capable of exploiting the
    potential of large-scale user participation, (2) how to
    query an unbounded web of linked data, (3) how to
    visualise, explore, browse and navigate this mass of
    data.

   Project Leaders: Prof. Sir Tim Berners-Lee, Prof. Dame
    Wendy Hall, and Prof. Nigel Shadbolt.
From the Web to Semantic Web to Linked Data




 The Web of Data


 Problems with the Web of Documents


 RDF


 Linked Data
The Web of Data
             (a.k.a Semantic Web/Linked Data)

   Traditional Web of Documents

   Internet, Documents, Links

   Documents in HTML

   Links using URLs

   HTTP for document access and transfer
Data Silos on the Current Web




[Figure: separate data silos on the Web, exposed as HTML pages, XML, and an API]
Some more problems with Web of Documents


   Difficult to Integrate Data
       Example Use Case: Making a Travel Plan


   Data Integration by looking and typing

   Slow Unproductive Workflow

   Difficult for apps to make “sense” of HTML text
Solutions


   Use RDF to give some structure to the data

   RDF <-> subject predicate object

   RDF links things, not just documents, and the
    links are typed
RDF is a language (for data)
Words                         URIs and literal text
Nouns and Verbs               Classes and Properties
Sentence structure            RDF Statements (triples)
Paragraphs                    RDF Graphs
Footnotes                     URIs [Domain Name Service]
Dictionaries                  RDF Schemas

 • Generic grammar for languages of description
 • Functions as native language, second language, or pidgin.
RDF and Ontology

   The AAA Slogan: “Anyone can say Anything
    about Any topic.”
       s p o . (subject predicate object .)
       <http://en.wikipedia.org/wiki/Tony_Benn>
        <http://purl.org/dc/elements/1.1/title> "Tony Benn" .

   RDF is used to build ontologies: formal
    representations of shared knowledge as a set
    of concepts within a domain and the
    relationships between them.
   Examples: Finance ontology; MusicBrainz,
    music ontology; GO, gene ontology, etc.
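
The subject-predicate-object pattern can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `Triple` class and the `to_ntriples` helper are hypothetical names, not part of any RDF toolkit, and the escaping is simplified (backslashes and double quotes only).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise a triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Simplified escaping: only backslashes and double quotes are handled.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# The Tony Benn statement from the slide.
tony = Triple(
    "http://en.wikipedia.org/wiki/Tony_Benn",
    "http://purl.org/dc/elements/1.1/title",
    "Tony Benn",
    obj_is_literal=True,
)
line = to_ntriples(tony)
```

Because every statement has the same three-part shape, statements from different sources can be appended to one list and treated as a single graph.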
What is Linked Data?


   Data, data, everywhere: We are surrounded
    by data: School performance, car fuel
    efficiency, etc

   Data help us to make better decisions

   You can discern the shape and structure of an
    entity by looking at the data it generates

   Data shapes conversations and markets
What is Linked Data?


   Linked Data: Framework where data is a first class
    citizen on the Web

   Evolving the current Web into a Global Data
    Space

   TimBL: 4 principles of Linked Data

       1. Use URIs as names for things.
       2. Use HTTP URIs.
       3. When someone looks up a URI, provide useful information,
           using the standards (RDF, etc).
       4. Include links to other URIs, so that they can discover
           more things.
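
The "look up a URI and get useful information" principle is usually realised through HTTP content negotiation. The sketch below only builds the request and does not send it; the helper name `build_rdf_request` and the particular `Accept` value are illustrative assumptions, and whether a given server honours them depends on that server.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request that asks for RDF rather than HTML."""
    # Prefer Turtle, fall back to RDF/XML.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return urllib.request.Request(uri, headers={"Accept": accept})


req = build_rdf_request("http://dbpedia.org/resource/Cardiff")
# To actually dereference the URI (needs network access):
#     data = urllib.request.urlopen(req).read()
```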
The Web of Linked Data


   Link everything. No silos.


[Figure: a graph of Things linked to one another]
The Web of Linked Data


   Linked Data (Semantic Web) is a graph
    database.
Linked Data

   Advantage comes from linking the RDF(s)
    together.




Some Linked Datastores


   BBC

   NY Times

   Guardian

   DBpedia

   Geonames

   …



Linking (Linked) Open Data cloud
                      linkeddata.org


   Many of the datastores are being linked
    together to form a network/graph.




Linked Data

   In summary, Linked Data provides:

   A standard data model, RDF

   A standardized data access mechanism, HTTP

   Hyperlink-based data discovery, using URIs

   Self-descriptive data, through using shared
    vocabularies
Government Linked Data




 Explosion of Government (Linked) Open
 Data efforts and projects.

 Examples: data.gov, data.gov.uk, data.gov.au
Public Sector Datasets




 Inherent value in opening up public government
 data

 Systems and Services can be tailored to citizens’
 priorities.

 Likely questions citizens may need answers to
 are:
  – “Where can I find a good school, a good investment
    advisor, a good employer?”


Public Sector Datasets
                           (contd.)


   Integration of datasets enables more complex
    questions to be asked and answered

   Some examples:
    – http://www.planningalerts.com/
    – http://ishortman.com/projects/expendituremap/

   Governments freeing up their data.

   Holy grail is information integration: Meshing.
Issues we focus on



   Findability of appropriate data sources

   SEARCH: Look at the data sources

   EXTRACT: Slicing of data sources

   INTEGRATE: Unifying the views

   EXPLORE: Answering the questions.
Examples of Government Public Data (csv)
Examples of Government Linked Data (rdf)
Workflow
   1. Identify Dataset
   2. Design/Select Vocabularies
   3. Extract and convert data into RDF
   4. Publish as Linked Data
   5. Consume Linked Data (Application)




Publishing your data as Linked Data: Some Things to
                          Consider


   How do you choose a good URI to name things? There are
    guidelines for this. Examples:
    – http://dbpedia.org/resource/Wildlife_photography
    – Tope Omitola @ Univ of Southampton: http://id.ecs.soton.ac.uk/person/24123
   Describing a Data Set using: voiD (the Vocabulary of Interlinked
    Datasets)
   Choosing and Using Vocabularies to Describe Data (SKOS, RDFS,
    OWL, scovo)
   Sourcing datasets: Where do you get the datasets from (e.g. Semantic
    Web search engines, manual search, etc)
   Choice of join points: When you have different datasets, where do you
    join them together
   Data normalization: using RDF makes things easier.
   Alignment of datasets
Architecture




[Figure: Data Sources → Gatherers and Extractors → RDF → RDF Triplestore (4store) → SPARQL → Services; a Data Integration step infers new concepts and relationships]
Data Publication – Challenges and Solutions

 Research Questions:
  – In our case, dealing with data that are centred around
    the United Kingdom’s democratic system,
  – Using geography data from the UK’s Ordnance Survey
    as the “join-point” with data for criminal statistics,
    Members of Parliament, mortality rates, etc.
 Sourcing the datasets
  – Many government data sets are in pdf, html, or xls
    files, so automatic discovery methods are not possible
    (yet),
  – Went through manual discovery process, searching for
    them,
  – We found some in pdf, html, and in xls,
  – We decided against pdf and html

Data Publication – Challenges and Solutions (contd.)


– We went for data in xls format. Why?
    • Ability to source from a wider range of public sector
      domains.


Data Source              Format         Dataset
Publicwhip.org.uk        HTML           MP votes records, etc
Theyworkforyou.com       XML dump       Parliament, Parliament expenses
Homeoffice.gov.uk        Excel          Recorded crime (England, 2008/09)
Statistics.gov.uk        Excel          Hospital Waiting List (England 2008/09)
Performance.doh.gov.uk   Excel          Mortality rates (England 2008/09)
Ordnancesurvey.co.uk     Linked Data    UK’s mapping agency
Data Publication – Challenges and Solutions (contd.)


 Data normalisation.


 RDF as our standard model.


 Data conversion to RDF: Python + Java.

 Modelling the datasets: Multi-dimensional,
  used SCOVO.

Data Publication – Challenges and Solutions (contd.)

 Crime dataset:
Table 7.03 Recorded crime by offence group by police force area, English region and
Wales, 2008/09 (numbers of recorded crimes)

Police force area,   Total    Violence  Sexual    Robbery  Burglary  Offences  Other     Fraud    Criminal  Drug      Other
English region                against   offences                     against   theft     and      damage    offences  offences
and Wales                     the                                    vehicles  offences  forgery
                              person

Cleveland            55,094   10,662    566       404      6,175     5,224     13,697    905      13,746    2,636     1,079
Durham               45,074   7,435     476       170      6,226     4,940     9,674     835      13,027    1,327     964
Northumbria          105,234  19,147    989       732      11,418    11,620    24,042    2,909    27,178    5,166     2,033
North East Region    205,402  37,244    2,031     1,306    23,819    21,784    47,413    4,649    53,951    9,129     4,076




 :TimePeriod rdf:type owl:Class ; rdfs:subClassOf scovo:Dimension .
 :TP2008_09 rdf:type :TimePeriod .
 :GeographicalRegion rdfs:subClassOf scovo:Dimension ;
     dc:title "Police force area, English region and Wales" .
 :CriminalOffenceType rdf:type owl:Class ; rdfs:subClassOf scovo:Dimension .
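
As a rough sketch of the conversion step, each cell of the crime table can become one `scovo:Item` carrying a value and three dimensions (area, offence type, time period). The code below is not the project's converter: the `slug` and `rows_to_scovo` helpers, the item naming scheme, and the use of `:Cleveland` or `:Robbery` as dimension instances are all assumptions made for illustration. The sample data is two rows and two value columns copied from Table 7.03.

```python
import csv
import io

# Two rows and two value columns copied from Table 7.03.
SAMPLE = """area,Total,Robbery
Cleveland,"55,094",404
Durham,"45,074",170
"""


def slug(text: str) -> str:
    """Turn a label into a local name usable after the ':' prefix."""
    return "".join(ch if ch.isalnum() else "_" for ch in text.strip())


def rows_to_scovo(csv_text: str, period: str = "TP2008_09") -> list[str]:
    """Emit one scovo:Item (as a Turtle snippet) per table cell."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        area = slug(row["area"])
        for offence, raw in row.items():
            if offence == "area":
                continue
            value = int(raw.replace(",", ""))  # "55,094" -> 55094
            items.append(
                f":{area}_{slug(offence)}_{period} a scovo:Item ;\n"
                f"    rdf:value {value} ;\n"
                f"    scovo:dimension :{area}, :{slug(offence)}, :{period} ."
            )
    return items


items = rows_to_scovo(SAMPLE)
```

Modelling each cell as its own item is what lets a multi-dimensional spreadsheet be queried by any combination of region, offence, and period.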

Some Issues in Linked Data

 Co-referencing, i.e. different sources referring to
  the same entities by different names.

 Cardiff in DBpedia:
  http://dbpedia.org/resource/Cardiff or
  http://dbpedia.org/resource/Cardiff_City

 Cardiff in GeoNames:
  http://sws.geonames.org/2172349/

 Which Cardiff shall we use?


 Solution: sameas service from Southampton
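
One way a co-reference service could group equivalent names is to treat each "same entity" assertion as an edge and collect the connected URIs into bundles. The union-find sketch below illustrates that idea only; it is not a description of how the Southampton service is implemented, and the two equivalence pairs are assumed for the example rather than asserted facts about these resources.

```python
def coref_bundles(pairs):
    """Group URIs into equivalence classes from (a, b) 'same entity' pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes

    bundles = {}
    for uri in list(parent):
        bundles.setdefault(find(uri), set()).add(uri)
    return list(bundles.values())


# Hypothetical equivalence assertions between the three URIs on the slide.
pairs = [
    ("http://dbpedia.org/resource/Cardiff", "http://dbpedia.org/resource/Cardiff_City"),
    ("http://dbpedia.org/resource/Cardiff", "http://sws.geonames.org/2172349/"),
]
bundles = coref_bundles(pairs)
```

Given any one URI, an application can then look up its bundle and use whichever member suits the dataset it is joining against.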
Alignment of datasets




Alignment of Datasets (contd.)

   Asserted owl:sameAs relations between dataset geo
    and O.S. (using string matching)

   For example, the English county of Cumbria was
    aligned as the following:
    <http://enakting.ecs.soton.ac.uk/statistics/data/Cumbria>
    <http://www.w3.org/2002/07/owl#sameAs>
    <http://data.ordnancesurvey.co.uk/id/7000000000024876> .

   A few special cases. “Yorkshire and the Humber
    Region” vs “Yorkshire & the Humber”

   NHS Trusts were labelled differently: e.g. South Tyneside NHS
    Trust had no equivalent in the OS data, so we used Google Maps.
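
The string-matching alignment, including the "Yorkshire and the Humber Region" special case, can be sketched as label normalisation followed by an exact match. The normalisation rules here ("&" treated as "and", a trailing "Region" dropped) are assumptions chosen to cover the example on the slide, the `normalise` and `align` helpers are hypothetical, and the two Yorkshire URIs are placeholders.

```python
import re


def normalise(label: str) -> str:
    """Lower-case, treat '&' as 'and', drop the word 'region', collapse spaces."""
    text = label.lower().replace("&", " and ")
    text = re.sub(r"\bregion\b", " ", text)
    return " ".join(text.split())


def align(local: dict, reference: dict) -> list:
    """Return (local_uri, reference_uri) pairs whose normalised labels match."""
    index = {normalise(label): uri for label, uri in reference.items()}
    return [
        (uri, index[normalise(label)])
        for label, uri in local.items()
        if normalise(label) in index
    ]


local = {
    "Cumbria": "http://enakting.ecs.soton.ac.uk/statistics/data/Cumbria",
    "Yorkshire and the Humber Region": "http://example.org/local/Yorkshire",  # hypothetical URI
}
reference = {
    "Cumbria": "http://data.ordnancesurvey.co.uk/id/7000000000024876",
    "Yorkshire & the Humber": "http://example.org/os/Yorkshire",  # hypothetical URI
}
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
triples = [f"<{s}> <{SAME_AS}> <{o}> ." for s, o in align(local, reference)]
```

Labels that still fail to match after normalisation, such as the NHS Trusts, need a different route, for example a geocoding lookup.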
Recap: Data Publication

 Sourcing: Many datasets are not in RDF yet. Some are in html,
  pdf, and xls. We chose xls.
 Selection of RDF as the normal form.
 Used scovo to model multidimensional data.
 We used owl:sameAs to assert equivalences
  between geo regions.
 We used string matching. Some did not work,
  e.g. Yorkshire and the Humber. Some have no
  equivalent OS entities, so we had to go via
  Google Maps API
Consuming Linked Data


 How do you visualize linked data sets?


 Linked Data browsers, e.g. Disco, Tabulator.


 Linked Data Search Engines, e.g. Sig.ma,
  Falcons, Sindice.

 Domain-specific Applications and Mashups,
  e.g. dayta.me (from Southampton), US Global
  Foreign Aid Mashup.
Data Consumption


 Application acts as an aggregator of
  information based on user’s postal (zip) code.

 Generates data views based on geographical
  region of postal code.

 Shows political representatives (MPs) for
  constituencies, their voting records, and their
  expenses.

Data Consumption (contd.)




Data Consumption (contd.)


   Challenges:
    – The lack of UIs to quickly browse, search or visualise
      views on a wide range of differently modelled data,
    – Lack of suitable tools which allow efficient
      aggregation and presentation of data to the UI from
      multiple datasets,
    – Data consumers having partial knowledge of the domain
      and finding it difficult to understand the domain and
      the data being modelled. This points to the need for a
      toolset that helps developers give a better description of
      the domain being modelled.




Recap: Publish and Consume

 Information Integration; one of the holy grails
 Problems with data sources. Different formats, etc,
 RDF can act as a standard model.
 Publication to RDF. Challenges. Solutions.
    – scovo for multi-dimensional data
    – string matching and its complexities
   Consuming the data. Challenges. Solutions.
    – Aggregating data based on zip code
    – Complexities of geo boundaries
   We have re-published the data we generated into the
    linked data cloud: EnAKTing datasets
    www.enakting.org/enakting/datasets

Some of our Outputs


   http://geoservice.psi.enakting.org: service to discover
    geographical resources,
   http://map.psi.enakting.org/: integrate different PSI Linked
    Data sources by querying Backlinking service,
   http://backlinks.psi.enakting.org: service to discover back-
    links in PSI,
   http://void.rkbexplorer.com/: describes the contents of
    data sets, enabling discovery and reuse of resources,
   http://bagatelles.ecs.soton.ac.uk/psi/: platform for
    integrating several PSI catalogues from the Web
   http://4sreasoner.ecs.soton.ac.uk/: Scalable Reasoning in
    4store; 4sr is a branch of 4store where backward-chained
    reasoning is implemented,
   http://apps.seme4.com/see-uk/ : Visualization tool for
    some UK data
Our solutions/apps
Findability of Appropriate Data Sources – Service
                   Descriptions


 How do you tell the world about your new
  linked data sets?

 Provide good service descriptions of your data
  sets

 Use the Vocabulary of Interlinked Datasets (voiD)
Vocabulary of Interlinked Datasets (VoID)

 Allows description of datasets and their
  interlinking, e.g. "there are 200k links of type
  gr: predicates between dataset X and dataset Y;
  and dataset Y mainly offers data about homes
  and X about mortgages".
 A dataset: a set of RDF triples published,
  maintained or aggregated by a single provider,
  and accessible on the Web, e.g.
:DBpedia a void:Dataset .
    allows the description of RDF links between
    datasets (using void:Linkset).
Three Areas of voiD




 General Metadata


 Access Metadata


 Structural Metadata
voiD (contd.)


    General metadata: the dataset's title,
    description, date of creation, the creator,
    publisher, licence, subject(s), etc;
:DBpedia a void:Dataset ;
    dcterms:title "DBPedia" ;
    dcterms:description "RDF data extracted from Wikipedia" ;
    dcterms:contributor :FU_Berlin ;
    dcterms:contributor :OpenLink_Software ;
    dcterms:modified "2008-11-17"^^xsd:date .
Access metadata: describes how the RDF data(set) can be
                       accessed

   using SPARQL, e.g.
:DBpedia a void:Dataset ;
    void:sparqlEndpoint <http://dbpedia.org/sparql> .
   using URI lookup,
:Sindice a void:Dataset ;
    void:uriLookupEndpoint <http://api.sindice.com/v2/search?qt=term&q=> .
   using RDF dumps,
:NYTimes a void:Dataset ;
    void:dataDump <http://data.nytimes.com/people.rdf> .
Structural metadata describes the structure and schema of
                        datasets

 Naming some representative example entities for
 a dataset
 Stating if datasets' entities share common URIs
:DBpedia a void:Dataset ;
    void:uriSpace "http://dbpedia.org/resource/" .
 Stating the vocabularies used in a dataset
:LiveJournal a void:Dataset ;
    void:vocabulary <http://xmlns.com/foaf/0.1/> .
 Providing statistics about datasets, e.g.
 expressing the number of RDF triples or the
 number of entities of a dataset.
:DBpedia a void:Dataset ;
    void:triples 1000000000 ; void:entities 3400000 .
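
A publisher's tooling could generate such a description from a few known facts about the dataset. The sketch below renders a minimal voiD description as Turtle text; the `void_description` function and its parameters are hypothetical, prefix declarations are left out, and no escaping of the title is attempted.

```python
def void_description(name, title, sparql_endpoint=None, triples=None):
    """Render a minimal voiD description of one dataset as Turtle (prefixes omitted)."""
    lines = [f":{name} a void:Dataset", f'    dcterms:title "{title}"']
    if sparql_endpoint:
        lines.append(f"    void:sparqlEndpoint <{sparql_endpoint}>")
    if triples is not None:
        lines.append(f"    void:triples {triples}")
    # Predicate-object pairs are separated by ';' and the statement ends with '.'.
    return " ;\n".join(lines) + " ."


ttl = void_description("DBpedia", "DBPedia", "http://dbpedia.org/sparql", 1000000000)
```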
Publishing voiD files


 as void.ttl in the root directory of the site, with a
  local “hash URI” for the dataset, e.g.
  http://example.com/void.ttl#MyDataset.


 Using the root URI of the site, such as
  http://example.com/,
                    as the dataset URI, and serving
  both HTML and an RDF format via content
  negotiation from that URI.

 Embedding the VoID description as HTML+RDFa
  into homepage of dataset, with a local “hash URI”
  for the dataset, yielding URI such as
  http://example.com/#MyDataset.
Why is voiD useful -- voiD Discovery

 voiD enables the discovery and usage of linked
  datasets.
 Through a sitemap: http://www.yoursite.com/sitemap.xml
  references void.ttl, and sitemap.xml is listed in robots.txt.
  A search engine crawls the website, indexing
  void.ttl plus a cache of the RDF triples referenced in
  this voiD file.
 Through backlinks:
  <document.rdf> void:inDataset <void.ttl#MyDataset> .
 Through a well-known URI: void.ttl can be placed
  in /.well-known/void on any Web server, e.g.
  http://www.example.com/.well-known/void .
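
For the well-known URI route, a client only needs the host of any resource it has already seen. A minimal sketch, using the standard library's URL helpers; the function name `well_known_void` is an assumption made for this example.

```python
from urllib.parse import urljoin, urlsplit


def well_known_void(any_url: str) -> str:
    """Return the /.well-known/void URL for the host serving any_url."""
    parts = urlsplit(any_url)
    root = f"{parts.scheme}://{parts.netloc}/"
    return urljoin(root, ".well-known/void")


url = well_known_void("http://www.example.com/data/people.rdf")
```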
@prefix void: <http://rdfs.org/ns/void#> .
@prefix scovo: <http://purl.org/NET/scovo#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
# The coref: prefix is assumed to be declared elsewhere.

<http://crime.psi.enakting.org/id/void>
    a void:Dataset ;
    foaf:homepage <http://crime.psi.enakting.org/> ;
    rdfs:label "crime.psi.enakting.org Linked Data Repository" ;
    dcterms:date "2010-09-13T11:30:29"^^xsd:dateTime ;
    dcterms:title "crime.psi.enakting.org Linked Data Repository" ;
    foaf:nick "crime" ;
    dcterms:description "United Kingdom's crime statistics per region for the year 2008/09, provided by the United Kingdom Home Office. Dataset provenance: http://www.homeoffice.gov.uk/rds/pdfs09/hosb1109chap7.xls" ;
    dcterms:publisher <http://crime.psi.enakting.org> ;
    void:statItem [
        scovo:dimension void:numberOfTriples ; rdf:value 4988 ; rdfs:label "4,988 triples"
    ] ;
    void:subset [
        a void:Linkset ; rdfs:label "crime.psi.enakting.org CRS -> http://data.ordnancesurvey.co.uk/" ;
        void:subjectsTarget <http://crime.psi.enakting.org/id/void> ;
        void:objectsTarget <http://void.rkbexplorer.com/id/dataset/d1d473f29a9091069644824242e9ae07> ;
        void:linkPredicate coref:duplicate ;
        void:statItem [
            rdfs:label "133 URI equivalences" ; rdf:value 133 ; scovo:dimension void:numberOfTriples
        ]
    ] .
Provenance and Trust in Linked Data



 Whom do you trust on the Web?
Provenance and Trust

 Mash-ups, aggregation, integration, data re-use.


 How do you elicit Reliability and Accuracy?


 Generate trust by revealing as much information about
  yourself as possible.

 Enables consumers to decide the quality and
  trustworthiness of your data.

 Useful for Data Discovery/Mining + Query
  Planning.
Different kinds of Provenance

 When was x derived (when-provenance).


 How was x derived (how-provenance).


 What data was used to derive x (what-
 provenance).

 Who carried out the transformation(s) from
 whence x came (who-provenance).
Provenance Models for Linked Datasets


   Provenance Vocabulary Ontology
Provenance Models for Linked Datasets (contd)

• Open Provenance Model
Provenance Models for Linked Datasets (contd)


   Provenance for Datasets (voidp)
   http://www.enakting.org/provenance/voidp/
voiD Provenance Extension voidp

 Designed to be simple and lightweight.
 Mainly for (RDF) data publishers.
 Includes necessary information of the
 process, its inputs, and outputs.
 Basis is simple: An agent runs a process on
 data (or a dataset) to get other data (or another
 dataset).
 Agent → Process → Data → Data’ .
   @prefix voidp:
    <http://purl.org/void/provenance/ns/> .
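
The Agent → Process → Data → Data’ chain maps naturally onto a small record type. The sketch below is an illustration only: the `ProvenanceEvent` fields and the `record_event` helper are hypothetical names rather than voidp terms, all URIs are placeholders, and SHA-256 is an assumed choice of fingerprint for the source data.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ProvenanceEvent:
    """Agent -> Process -> Data -> Data': who ran what on which source."""
    agent: str
    process: str
    source: str
    result: str
    source_hash: str


def record_event(agent, process, source_uri, source_bytes, result_uri):
    """Describe one transformation and fingerprint its input with SHA-256."""
    digest = hashlib.sha256(source_bytes).hexdigest()
    return ProvenanceEvent(agent, process, source_uri, result_uri, digest)


event = record_event(
    agent="http://example.org/people/converter",   # hypothetical agent URI
    process="xls-to-rdf",                          # hypothetical process label
    source_uri="http://example.org/source.xls",    # hypothetical source dataset
    source_bytes=b"Cleveland,55094\n",
    result_uri="http://example.org/id/dataset",    # hypothetical resulting dataset
)
```

Storing a hash of the input lets a consumer check later that the source they fetch is the same one the publisher converted.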
voidp Classes and Predicates

   voidp:ProvenanceEvent: items under provenance
    control.
   voidp:actor: actor, person, group, software or physical
    artifact, involved in this provenance event.
   voidp:certification: used to contain a dataset’s signature
    elements.
   voidp:contact: contact details of whom to contact should
    people have queries about this dataset.
   voidp:item: the provenance characteristics of a data item
    under provenance control.
   voidp:processType: the type of transformation or conversion
    procedure carried out on the item’s source.
   voidp:resultingDataset: dataset that is the result of this
    provenance event.
   voidp:sourceDataset: source dataset for the data item under
    provenance control.
voidp: A Concrete Example
@prefix voidp: <http://purl.org/void/provenance/ns/> .
<http://crime.psi.enakting.org/id/void>
    a void:Dataset ;
    voidp:activity [
        a voidp:Provenance ;
        voidp:item [
            foaf:name <http://crime.psi.enakting.org/stats/1898/2002/ds1> ;
            rdf:type scovo:Dataset ; rdfs:label "RECORDED CRIME STATISTICS 1898 - 2001/02"@en ;
            prv:createdBy [
                rdf:type prv:Actor ; prv:performedBy <http://tomitola>
            ] ;
            voidp:originatingSource <http://rds.homeoffice.gov.uk/rds/pdfs07/recorded-crime-1898-2002.xls> ;
            voidp:hashValue "12335353535"^^xsd:string ;
            voidp:processType <http://void.rkbexplorer.com/id/dataset/123456789> ;
            to:hasBeginning "2010-10-24T21:32:52"^^xsd:dateTime ;
            to:hasEnd "2010-10-25T09:32:00"^^xsd:dateTime
        ]
    ] .
voidp in the Wild




    The Datalift project
    http://data.lirmm.fr/ontologies/vdpp

 data.southampton.ac.uk
    http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fid.southampton.ac.uk%2Fdataset%2Fjargon%2Flatest.rdf
    http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fdata.southampton.ac.uk%2Fdumps%2Fjargon%2F2011-11-10%2Fjargon.rdf
Conclusion

 The Web of Data is real


 The Web of Data is here


 It’s time to get on board
http://www.enakting.org/enakting/

          t.omitola@ecs.soton.ac.uk

 Slides at:
  http://www.slideshare.net/TopeOmitola/omitola-birmingham-cityuniv
 Questions?



Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Dernier (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 

Omitola birmingham cityuniv

  • 1. World Sense-Making using Linked Data. Tope Omitola (joint work with Prof. Nigel Shadbolt). Faculty Research Seminar Talk, Birmingham City University, UK, Thurs 8 Dec. 2011
  • 2. Thank You: Thank you for inviting me.
  • 3. World Sense-Making using Linked Data. Tope Omitola
  • 4. Talk Outline
      - EnAKTing: Its story
      - From the Web to Semantic Web to Linked Data
      - Public Sector Datasets: Publication and Consumption
      - Findability of Appropriate Data Sources – Service Descriptions
      - Provenance and Trust in Linked Data
  • 5. What is EnAKTing?
      - EPSRC-funded project.
      - Addressing 3 key research problems: (1) how to build ontologies quickly that are capable of exploiting the potential of large-scale user participation, (2) how to query an unbounded web of linked data, (3) how to visualise, explore, browse and navigate this mass of data.
      - Project Leaders: Prof. Sir Tim Berners-Lee, Prof. Dame Wendy Hall, and Prof. Nigel Shadbolt.
  • 6. From the Web to Semantic Web to Linked Data
      - The Web of Data
      - Problems with the Web of Documents
      - RDF
      - Linked Data
  • 7. The Web of Data (a.k.a. Semantic Web/Linked Data)
      - Traditional Web of Documents
      - Internet, Documents, Links
      - Documents in HTML
      - Links using URLs
      - HTTP for document access and transfer
  • 8. Data Silos on the Current Web [diagram: separate silos labelled API, HTML, HTML, XML]
  • 9. Some more problems with the Web of Documents
      - Difficult to Integrate Data
          - Example Use Case: Making a Travel Plan
      - Data Integration by looking and typing
      - Slow, Unproductive Workflow
      - Difficult for apps to make “sense” of HTML text
  • 10. Solutions
      - Use RDF to give some structure to the data
      - RDF <-> subject predicate object
      - RDF links things, not just documents, and the links are typed
  • 11. RDF is a language (for data)
      - Words: URIs and literal text
      - Nouns and Verbs: Classes and Properties
      - Sentence structure: RDF Statements (triples)
      - Paragraphs: RDF Graphs
      - Footnotes: URIs [Domain Name Service]
      - Dictionaries: RDF Schemas
      - Generic grammar for languages of description
      - Functions as native language, second language, or pidgin.
  • 12. RDF and Ontology
      - The AAA Slogan: “Anyone can say Anything about Any topic.”
      - s p o . (subject predicate object .)
      - <http://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/title> "Tony Benn" .
      - RDF is used to build ontologies; an ontology is a formal representation of shared knowledge as a set of concepts within a domain and the relationships between them
      - Examples: Finance ontology; MusicBrainz, music ontology; GO, gene ontology, etc.
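The subject/predicate/object pattern on slide 12 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the tooling used in the talk; the `Triple` class and `to_ntriples` helper are names assumed here, and only plain string literals are handled as objects.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; subject and predicate are URIs, the object is a plain literal."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialise a triple with a literal object as a single N-Triples line."""
    # Escape backslashes first, then double quotes, inside the literal.
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


# The Tony Benn statement from the slide.
tony = Triple(
    "http://en.wikipedia.org/wiki/Tony_Benn",
    "http://purl.org/dc/elements/1.1/title",
    "Tony Benn",
)
line = to_ntriples(tony)
print(line)
```

A real application would more likely use an RDF library; the tuple form is only meant to show that a statement is three parts and nothing more.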
  • 13. What is Linked Data?
      - Data, data, everywhere: we are surrounded by data (school performance, car fuel efficiency, etc.)
      - Data help us to make better decisions
      - You can discern the shape and structure of an entity by looking at the data it generates
      - Data shape conversations and markets
  • 14. What is Linked Data?
      - Linked Data: Framework where data is a first class citizen on the Web
      - Evolving the current Web into a Global Data Space
      - TimBL: 4 principles of Linked Data
          - Use URIs as names for things
          - Use HTTP URIs
          - When someone looks up a URI, provide useful information, using the standards (RDF, etc.)
          - Include links to other URIs, so that they can discover more things
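The third principle ("when someone looks up a URI, provide useful information") is usually met through HTTP content negotiation: the client states that it prefers RDF over HTML. The sketch below only builds such a request and does not send it; the helper name `rdf_lookup_request` and the choice of media types are assumptions made for illustration, and whether a given server honours them is not guaranteed.

```python
from urllib.request import Request


def rdf_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF rather than HTML."""
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


# Example URI taken from the talk's DBpedia examples.
req = rdf_lookup_request("http://dbpedia.org/resource/Cardiff")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the lookup; that step is left out so the example has no network dependency.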
  • 15. The Web of Linked Data: Link everything. No silos. [diagram: "Thing" nodes linked to one another]
  • 16. The Web of Linked Data: Linked Data (Semantic Web) is a graph database. [graph diagram]
  • 17. Linked Data: Advantage comes from linking the RDF(s) together.
  • 18. Some Linked Datastores: BBC, NY Times, Guardian, DBpedia, Geonames, …
  • 19. Linking (Linked) Open Data cloud (linkeddata.org): Many of the datastores are being linked together to form a network/graph.
  • 20. Linked Data, in summary. Linked Data provides:
      - RDF as the data model
      - A standardized data access mechanism, HTTP
      - Hyperlink-based data discovery, using URIs
      - Self-descriptive data, through using shared vocabularies
  • 21. Government Linked Data
      - Explosion of Government (Linked) Open Data efforts and projects: data.gov, data.gov.uk, data.gov.au
      - Examples:
  • 22.
  • 23. Public Sector Datasets
      - Inherent value in opening up public government data
      - Systems and Services can be tailored to citizens’ priorities.
      - Likely questions citizens may need answers to are:
          - “Where can I find a good school, a good investment advisor, a good employer?”
  • 24. Public Sector Datasets (contd.)
      - Integration of datasets enables more complex questions to be asked and answered
      - Some examples:
          - http://www.planningalerts.com/
          - http://ishortman.com/projects/expendituremap/
      - Governments freeing up their data.
      - Holy grail is information integration: Meshing.
  • 25. Issues we focus on
      - Findability of appropriate data sources
      - SEARCH: Look at the data sources
      - EXTRACT: Slicing of data sources
      - INTEGRATE: Unifying the views
      - EXPLORE: Answering the questions.
  • 26. Examples of Government Public Data (csv)
  • 27. Examples of Government Linked Data (rdf)
  • 28. Workflow: Identify Dataset → Design/Select Vocabularies → Extract and convert data into RDF → Publish as Linked Data → Consume Linked Data (Application)
  • 29. Publishing your data as Linked Data: Some Things to Consider
      - How do you choose a good URI to name things? There are guidelines for this. Examples:
          - http://dbpedia.org/resource/Wildlife_photography
          - Tope Omitola @ Univ of Southampton: http://id.ecs.soton.ac.uk/person/24123
      - Describing a Data Set using voiD (the Vocabulary of Interlinked Datasets)
      - Choosing and Using Vocabularies to Describe Data (SKOS, RDFS, OWL, SCOVO)
      - Sourcing datasets: Where do you get the datasets from (e.g. Semantic Web search engines, manual search, etc.)
      - Choice of join points: When you have different datasets, where do you join them together?
      - Data normalization: using RDF makes things easier.
      - Alignment of datasets
  • 30. Architecture [diagram; labels: Data Sources, Gatherers and Extractors, RDF, RDF Triplestore (4store), SPARQL, Data Integration, Infer new concepts and relationships, Services]
  • 31. Data Publication – Challenges and Solutions
      - Research Questions:
          - In our case, dealing with data that are centred around the United Kingdom’s democratic system,
          - Using geography data from the UK’s Ordnance Survey as the “join-point” with data for criminal statistics, Members of Parliament, mortality rates, etc.
      - Sourcing the datasets
          - Many government data sets are in pdf, html, or xls files, so automatic discovery methods are not possible (yet),
          - Went through manual discovery process, searching for them,
          - We found some in pdf, html, and in xls,
          - We decided against pdf and html
  • 32. Data Publication – Challenges and Solutions (contd.)
      - We went for data in xls format. Why?
          - Ability to source from a wider range of public sector domains.
      Data Source | Format | Dataset
      Publicwhip.org.uk | HTML | MP votes records, etc
      Theyworkforyou.com | XML dump | Parliament, Parliament expenses
      Homeoffice.gov.uk | Excel | Recorded crime (England, 2008/09)
      Statistics.gov.uk | Excel | Hospital Waiting List (England 2008/09)
      Performance.doh.gov.uk | Excel | Mortality rates (England 2008/09)
      Ordnancesurvey.co.uk | Linked Data | UK’s mapping agency
  • 33. Data Publication – Challenges and Solutions (contd.)
      - Data normalisation.
      - RDF as our standard model.
      - Data conversion to RDF. Python + Java.
      - Modelling the datasets: Multi-dimensional, used SCOVO.
  • 34. Data Publication – Challenges and Solutions (contd.)
      Crime dataset: Table 7.03 Recorded crime by offence group by police force area, English region and Wales, 2008/09 (numbers of recorded crimes)
      Police force area, English region and Wales | Total | Violence against the person | Sexual offences | Robbery | Burglary | Offences against vehicles | Other theft offences | Fraud and forgery | Criminal damage | Drug offences | Other offences
      Cleveland | 55,094 | 10,662 | 566 | 404 | 6,175 | 5,224 | 13,697 | 905 | 13,746 | 2,636 | 1,079
      Durham | 45,074 | 7,435 | 476 | 170 | 6,226 | 4,940 | 9,674 | 835 | 13,027 | 1,327 | 964
      Northumbria | 105,234 | 19,147 | 989 | 732 | 11,418 | 11,620 | 24,042 | 2,909 | 27,178 | 5,166 | 2,033
      North East Region | 205,402 | 37,244 | 2,031 | 1,306 | 23,819 | 21,784 | 47,413 | 4,649 | 53,951 | 9,129 | 4,076
      Modelled as:
        :TimePeriod rdf:type owl:Class ; rdfs:subClassOf scovo:Dimension .
        :TP2008_09 rdf:type :TimePeriod .
        :GeographicalRegion rdfs:subClassOf scovo:Dimension ; dc:title "Police force area, English region and Wales" .
        :CriminalOffenceType rdf:type owl:Class ; rdfs:subClassOf scovo:Dimension .
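As a rough illustration of the conversion step, each cell of the crime table on slide 34 can become one `scovo:Item` carrying a value and three dimensions (time period, region, offence type). The sketch below is not the project's actual Python/Java converter: the helper names, the `:item_…` identifiers and the way dimension instances are named from labels are all assumptions.

```python
from typing import Dict, List


def slug(label: str) -> str:
    """Turn a human-readable label into a CamelCase local name (assumed naming scheme)."""
    return "".join(word.capitalize() for word in label.replace("/", " ").split())


def row_to_items(region: str, values: Dict[str, int], period: str = "TP2008_09") -> List[str]:
    """Emit one Turtle snippet per table cell, modelled as a scovo:Item."""
    items = []
    for offence, count in values.items():
        item_id = f":item_{slug(region)}_{slug(offence)}"
        items.append(
            f"{item_id} a scovo:Item ;\n"
            f"    rdf:value {count} ;\n"
            f"    scovo:dimension :{period} , :{slug(region)} , :{slug(offence)} ."
        )
    return items


# Two cells from the Cleveland row of the table.
items = row_to_items("Cleveland", {"Total": 55094, "Robbery": 404})
print("\n\n".join(items))
```

Prefix declarations are omitted here, so the output is a fragment to be appended to a Turtle file that already declares `scovo:`, `rdf:` and the default prefix.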
  • 35. Some Issues in Linked Data
      - Co-referencing, i.e. different sources referring to the same entities by different names.
      - Cardiff in DBpedia: http://dbpedia.org/resource/Cardiff or http://dbpedia.org/resource/Cardiff_City
      - Cardiff in Geonames: http://sws.geonames.org/2172349/
      - Which Cardiff shall we use?
      - Solution: sameAs service from Southampton
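One way to picture what a co-reference service does is to group URIs into bundles of equivalents from pairwise "same as" assertions. The sketch below uses a small union-find structure for this; it is an assumed illustration, not the Southampton service's implementation, and the example.org / example.net URIs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bundles(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group URIs into equivalence bundles from pairwise equivalence assertions."""
    parent: Dict[str, str] = {}

    def find(u: str) -> str:
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps lookups short
            u = parent[u]
        return u

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two bundles

    groups = defaultdict(list)
    for uri in list(parent):
        groups[find(uri)].append(uri)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical equivalence assertions, for illustration only.
pairs = [
    ("http://dbpedia.org/resource/Cardiff", "http://example.org/id/cardiff"),
    ("http://example.org/id/cardiff", "http://example.net/city/CDF"),
    ("http://example.org/id/durham", "http://example.net/city/DUR"),
]
result = bundles(pairs)
for bundle in result:
    print(bundle)
```

Because equivalence is transitive, the first two assertions end up in one bundle of three URIs even though no pair links the first and third directly.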
  • 36.
  • 38. Alignment of Datasets (contd.)
      - Asserted owl:sameAs relations between the geographic concepts of the datasets and the corresponding entities in the Ordnance Survey (O.S.) data, using string matching
      - For example, the English county of Cumbria was aligned as follows: <http://enakting.ecs.soton.ac.uk/statistics/data/Cumbria> <http://www.w3.org/2002/07/owl#sameAs> <http://data.ordnancesurvey.co.uk/id/7000000000024876> .
      - A few special cases, e.g. “Yorkshire and the Humber Region” vs “Yorkshire & the Humber”
      - NHS Trusts were labelled differently: e.g. South Tyneside NHS Trust had no equivalent in the OS, so we used Google Maps.
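The special cases on slide 38 suggest why plain string equality is not enough. A minimal sketch of label normalisation before matching is shown below; the normalisation rules, the `DROPPED_WORDS` set and the Yorkshire URI are assumptions for illustration (only the Cumbria URI comes from the slide), and labels with no counterpart still need a fallback such as a geocoding lookup.

```python
import re
from typing import Dict, Iterable, Optional

# Words ignored when comparing labels (an assumed, deliberately small list).
DROPPED_WORDS = {"region"}


def normalise(label: str) -> str:
    """Lower-case, expand '&', strip punctuation and drop ignorable words."""
    text = label.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(w for w in text.split() if w not in DROPPED_WORDS)


def align(labels: Iterable[str], os_index: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each source label to an O.S. URI, or None when nothing matches."""
    by_key = {normalise(label): uri for label, uri in os_index.items()}
    return {label: by_key.get(normalise(label)) for label in labels}


os_index = {
    "Cumbria": "http://data.ordnancesurvey.co.uk/id/7000000000024876",
    "Yorkshire & the Humber": "http://example.org/os/yorkshire-and-the-humber",  # hypothetical URI
}
matches = align(
    ["Cumbria", "Yorkshire and the Humber Region", "South Tyneside NHS Trust"],
    os_index,
)
for label, uri in matches.items():
    print(label, "->", uri)
```

Each non-None match corresponds to one owl:sameAs triple of the kind shown for Cumbria.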
  • 39. Examples of Government Public Data (csv)
  • 40. Examples of Government Linked Data (rdf)
  • 41. Recap: Data Publication
      - Sourcing: Many not in RDF yet. Some in html, pdf, and xls. We chose xls.
      - Selection of RDF as the normal form.
      - Used SCOVO to model multidimensional data.
      - We used owl:sameAs to assert equivalences between geo regions.
      - We used string matching. Some did not work, e.g. Yorkshire and the Humber. Some have no equivalent OS entities, so we had to go via the Google Maps API.
  • 42. Consuming Linked Data
      - How do you visualize linked data sets?
      - Linked Data browsers, e.g. Disco, Tabulator.
      - Linked Data Search Engines, e.g. Sig.ma, Falcons, Sindice.
      - Domain-specific Applications and Mashups, e.g. dayta.me (from Southampton), US Global Foreign Aid Mashup.
  • 43. Data Consumption
      - Application acts as an aggregator of information based on user’s postal (zip) code.
      - Generates data views based on geographical region of postal code.
      - Shows political representatives (MPs) for constituencies, their voting records, and their expenses.
  • 45. Data Consumption (contd.)
      - Challenges:
          - The lack of UIs to quickly browse, search or visualise views on a wide range of differently modelled data,
          - Lack of suitable tools which allow efficient aggregation and presentation of data to the UI from multiple datasets,
          - Data consumers having partial knowledge of the domain and finding it difficult to understand the domain and the data being modelled. This points to the need for a toolset that helps developers give a better description of the domain being modelled.
  • 46. Recap: Publish and Consume
      - Information Integration: one of the holy grails
      - Problems with data sources: different formats, etc.
      - RDF can act as a standard model.
      - Publication to RDF. Challenges. Solutions.
          - SCOVO for multi-dimensional data
          - string matching and its complexities
      - Consuming the data. Challenges. Solutions.
          - Aggregating data based on zip code
          - Complexities of geo boundaries
      - We have re-published the data we generated into the linked data cloud: EnAKTing datasets www.enakting.org/enakting/datasets
  • 47. Some of our Outputs
      - http://geoservice.psi.enakting.org: service to discover geographical resources,
      - http://map.psi.enakting.org/: integrate different PSI Linked Data sources by querying Backlinking service,
      - http://backlinks.psi.enakting.org: service to discover back-links in PSI,
      - http://void.rkbexplorer.com/: describes the contents of data sets, enabling discovery and reuse of resources,
      - http://bagatelles.ecs.soton.ac.uk/psi/: platform for integrating several PSI catalogues from the Web
      - http://4sreasoner.ecs.soton.ac.uk/: Scalable Reasoning in 4store; 4sr is a branch of 4store where backward chained reasoning is implemented
      - http://apps.seme4.com/see-uk/: Visualization tool for some UK data
  • 50. Findability of Appropriate Data Sources – Service Descriptions
      - How do you tell the world about your new linked data sets?
      - Provide good service descriptions of your data sets
      - Use the Vocabulary of Interlinked Datasets (voiD)
  • 51. Vocabulary of Interlinked Datasets (VoID)
      - Allows description of datasets and their interlinking, e.g. "there are 200k links of type gr: predicates between dataset X and dataset Y; and dataset Y mainly offers data about homes and X about mortgages".
      - A dataset: a set of RDF triples published, maintained or aggregated by a single provider, and accessible on the Web, e.g. :DBpedia a void:Dataset .
      - Allows the description of RDF links between datasets (using void:Linkset).
  • 52. Three Areas of voiD
      - General Metadata
      - Access Metadata
      - Structural Metadata
  • 53. voiD (contd.)
      - General metadata: the dataset's title, description, date of creation, the creator, publisher, licence, subject(s), etc.
        :DBpedia a void:Dataset ;
            dcterms:title "DBPedia" ;
            dcterms:description "RDF data extracted from Wikipedia" ;
            dcterms:contributor :FU_Berlin ;
            dcterms:modified "2008-11-17"^^xsd:date ;
            dcterms:contributor :OpenLink_Software .
  • 54. Access metadata: describes how the RDF data(set) can be accessed
      - using SPARQL, e.g. :DBpedia a void:Dataset ; void:sparqlEndpoint <http://dbpedia.org/sparql> .
      - using URI lookup, e.g. :Sindice a void:Dataset ; void:uriLookupEndpoint <http://api.sindice.com/v2/search?qt=term&q=> .
      - using RDF dumps, e.g. :NYTimes a void:Dataset ; void:dataDump <http://data.nytimes.com/people.rdf> .
  • 55. Structural metadata: describes the structure and schema of datasets
      - Naming some representative example entities for a dataset
      - Stating if datasets' entities share common URIs, e.g. :DBpedia a void:Dataset ; void:uriSpace "http://dbpedia.org/resource/" .
      - Stating the vocabularies used in a dataset, e.g. :LiveJournal a void:Dataset ; void:vocabulary <http://xmlns.com/foaf/0.1/> .
      - Providing statistics about datasets, e.g. expressing the number of RDF triples or the number of entities of a dataset: :DBpedia a void:Dataset ; void:triples 1000000000 ; void:entities 3400000 .
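The three areas of voiD metadata can be combined into a single description. As a minimal sketch, the function below assembles general (dcterms:title), access (void:sparqlEndpoint) and structural (void:uriSpace, void:triples) metadata into Turtle. The function name, the default `http://example.org/datasets#` prefix and the assumption that the title contains no quote characters are all illustrative choices, not part of the talk.

```python
def void_description(name: str, title: str, sparql_endpoint: str,
                     uri_space: str, triples: int) -> str:
    """Assemble a small voiD description in Turtle from a few metadata fields."""
    lines = [
        "@prefix : <http://example.org/datasets#> .",  # hypothetical base namespace
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f":{name} a void:Dataset ;",
        f'    dcterms:title "{title}" ;',                  # general metadata
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",  # access metadata
        f'    void:uriSpace "{uri_space}" ;',              # structural metadata
        f"    void:triples {triples} .",
    ]
    return "\n".join(lines)


# Values taken from the DBpedia examples on the slides.
ttl = void_description(
    "DBpedia",
    "DBPedia",
    "http://dbpedia.org/sparql",
    "http://dbpedia.org/resource/",
    1000000000,
)
print(ttl)
```

The output could be saved as a void.ttl file; a production publisher would more likely generate it with an RDF library that handles escaping and validation.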
  • 56. Publishing voiD files
      - As void.ttl in the root directory of the site, with a local “hash URI” for the dataset, e.g. http://example.com/void.ttl#MyDataset.
      - Using the root URI of the site, such as http://example.com/, as the dataset URI, and serving both HTML and an RDF format via content negotiation from that URI.
      - Embedding the VoID description as HTML+RDFa into the homepage of the dataset, with a local “hash URI” for the dataset, yielding a URI such as http://example.com/#MyDataset.
  • 57. Why is voiD useful -- voiD Discovery
      - By enabling the discovery and usage of linked datasets.
      - A sitemap such as http://www.yoursite.com/sitemap.xml references void.ttl, and sitemap.xml is added to robots.txt. A search engine crawls the website, indexing void.ttl plus a cache of the RDF triples referenced in this voiD file.
      - Through backlinks: <document.rdf> void:inDataset <void.ttl#MyDataset> .
      - Through a well-known URI: void.ttl can be placed in /.well-known/void on any Web server, e.g. http://www.example.com/.well-known/void .
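The well-known URI route means a client holding any URL on a site can work out where to look for that site's voiD description. A minimal sketch using the standard library follows; the function name is an assumption, and the example URL is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_void(url: str) -> str:
    """Derive the /.well-known/void location for the server hosting `url`."""
    parts = urlsplit(url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


location = well_known_void("http://www.example.com/data/people.rdf#me")
print(location)
```

Fetching that location and parsing the result is a separate step; a server that publishes no voiD file will simply not answer there.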
  • 58.
        @prefix void: <http://rdfs.org/ns/void#> .
        @prefix scovo: <http://purl.org/NET/scovo#> .

        <http://crime.psi.enakting.org/id/void> a void:Dataset ;
            foaf:homepage <http://crime.psi.enakting.org/> ;
            rdfs:label "crime.psi.enakting.org Linked Data Repository" ;
            dcterms:date "2010-09-13T11:30:29"^^xsd:dateTime ;
            dcterms:title "crime.psi.enakting.org Linked Data Repository" ;
            foaf:nick "crime" ;
            dcterms:description "United Kingdom's crime statistics per region for the year 2008/09, provided by the United Kingdom Home Office. Dataset provenance: http://www.homeoffice.gov.uk/rds/pdfs09/hosb1109chap7.xls" ;
            dcterms:publisher <http://crime.psi.enakting.org> ;
            void:statItem [
                scovo:dimension void:numberOfTriples ;
                rdf:value 4988 ;
                rdfs:label "4,988 triples"
            ] ;
            void:subset [
                a void:Linkset ;
                rdfs:label "crime.psi.enakting.org CRS -> http://data.ordnancesurvey.co.uk/" ;
                void:subjectsTarget <http://crime.psi.enakting.org/id/void> ;
                void:objectsTarget <http://void.rkbexplorer.com/id/dataset/d1d473f29a9091069644824242e9ae07> ;
                void:linkPredicate coref:duplicate ;
                void:statItem [
                    rdfs:label "133 URI equivalences" ;
                    rdf:value 133 ;
                    scovo:dimension void:numberOfTriples
                ]
            ] .
  • 59. Provenance and Trust in Linked Data: Whom do you trust on the Web?
  • 60. Provenance and Trust
      - Mash-ups, aggregation, integration, data re-use.
      - How do you elicit Reliability and Accuracy?
      - Generate trust by revealing as much information about yourself as possible.
      - Enables consumers to decide the quality and trustworthiness of your data.
      - Useful for Data Discovery/Mining + Query Planning.
  • 61. Different kinds of Provenance
      - When was x derived (when-provenance).
      - How was x derived (how-provenance).
      - What data was used to derive x (what-provenance).
      - Who carried out the transformation(s) from whence x came (who-provenance).
  • 62. Provenance Models for Linked Datasets: Provenance Vocabulary Ontology
  • 63. Provenance Models for Linked Datasets (contd): Open Provenance Model
  • 64. Provenance Models for Linked Datasets (contd): Provenance for Datasets (voidp), http://www.enakting.org/provenance/voidp/
  • 65. voiD Provenance Extension voidp
      - Designed to be simple and lightweight.
      - Mainly for (RDF) data publishers.
      - Includes necessary information about the process, its inputs, and outputs.
      - Basis is simple: an agent runs a process on data (or a dataset) to get other data (or another dataset).
      - Agent → Process → Data → Data’ .
      - @prefix voidp: <http://purl.org/void/provenance/ns/> .
  • 66. voidp Classes and Predicates
      - voidp:ProvenanceEvent: items under provenance control.
      - voidp:actor: actor, person, group, software or physical artifact, involved in this provenance event.
      - voidp:certification: used to contain the dataset’s signature elements.
      - voidp:contact: contact details of whom to contact should people have queries about this dataset.
      - voidp:item: the provenance characteristics of a data item under provenance control.
      - voidp:processType: the type of transformation or conversion procedure carried out on the item’s source.
      - voidp:resultingDataset: dataset that is the result of this provenance event.
      - voidp:sourceDataset: source dataset for the data item under provenance control.
  • 67. voidp: A Concrete Example
        @prefix voidp: <http://purl.org/void/provenance/ns/> .

        <http://crime.psi.enakting.org/id/void> a void:Dataset ;
            voidp:activity [
                a voidp:Provenance ;
                voidp:item [
                    foaf:name <http://crime.psi.enakting.org/stats/1898/2002/ds1> ;
                    rdf:type scovo:Dataset ;
                    rdfs:label "RECORDED CRIME STATISTICS 1898 - 2001/02"@en ;
                    prv:createdBy [ rdf:type prv:Actor ; prv:performedBy <http://tomitola> ] ;
                    voidp:originatingSource <http://rds.homeoffice.gov.uk/rds/pdfs07/recorded-crime-1898-2002.xls> ;
                    voidp:hashValue "12335353535"^^xsd:string ;
                    voidp:processType <http://void.rkbexplorer.com/id/dataset/123456789> ;
                    to:hasBeginning "2010-10-24T21:32:52"^^xsd:dateTime ;
                    to:hasEnd "2010-10-25T09:32:00"^^xsd:dateTime
                ]
            ] .
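The Agent → Process → Data → Data’ idea behind voidp can be sketched as a small record capturing who ran which process on what, plus a digest of the output. This is an assumed illustration rather than the voidp tooling: the class and function names are hypothetical, the field comments point to the voidp terms they loosely mirror, SHA-256 is chosen here because the slides do not say which hash voidp:hashValue uses, and the example.org identifiers are made up.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """One 'agent runs a process on data to get new data' record."""
    actor: str              # cf. voidp:actor
    process_type: str       # cf. voidp:processType
    source_dataset: str     # cf. voidp:sourceDataset
    resulting_dataset: str  # cf. voidp:resultingDataset
    hash_value: str         # cf. voidp:hashValue (SHA-256 is an assumption)


def record_event(actor: str, process_type: str, source_uri: str,
                 result_uri: str, result_bytes: bytes) -> ProvenanceEvent:
    """Describe a conversion run, fingerprinting the bytes it produced."""
    digest = hashlib.sha256(result_bytes).hexdigest()
    return ProvenanceEvent(actor, process_type, source_uri, result_uri, digest)


event = record_event(
    actor="http://example.org/people/converter-operator",   # hypothetical
    process_type="http://example.org/process/xls-to-rdf",   # hypothetical
    source_uri="http://rds.homeoffice.gov.uk/rds/pdfs07/recorded-crime-1898-2002.xls",
    result_uri="http://crime.psi.enakting.org/stats/1898/2002/ds1",
    result_bytes=b"<s> <p> <o> .\n",
)
print(event)
```

A consumer who later downloads the resulting dataset can recompute the digest and compare it with the published value to check that the data has not changed since the event was recorded.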
  • 68. voidp in the Wild
      - The Datalift project: http://data.lirmm.fr/ontologies/vdpp
      - data.southampton.ac.uk:
          - http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fid.southampton.ac.uk%2Fdataset%2Fjargon%2Flatest.rdf
          - http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fdata.southampton.ac.uk%2Fdumps%2Fjargon%2F2011-11-10%2Fjargon.rdf
  • 69. Conclusion
      - The Web of Data is real
      - The Web of Data is here
      - It’s time to get on board
  • 70. http://www.enakting.org/enakting/ t.omitola@ecs.soton.ac.uk
      - Slides at: http://www.slideshare.net/TopeOmitola/omitola-birmingham-cityuniv
      - Questions?

Editor's notes

  1. http://www.planningalerts.com/: Email alerts of planning applications near a location. Data from screen scraping some local UK councils’ websites. http://ishortman.com/projects/expendituremap/: Map of public expenditure data by UK region, covering services such as defence, public order, science and technology, agriculture, and transport. Data based on normalised spreadsheet data from the UK’s Office for National Statistics Annual Abstract of Statistics.
  2. Common tasks involved in publishing linked data; the following slides give a brief overview of each stage.
  3. Linked data work mainly consists of Publication, i.e. making your linked data available to the public, and Consumption, i.e. others consuming and using it.
  4. Uses standard Semantic Web technologies (RDF, OWL, SPARQL). Uses the Garlik JXT triplestore.
  5. Be clear about the questions you are asking:
  6. Data normalisation: data sources come in different formats. We chose RDF/Turtle for its compactness and clarity. Data conversion to RDF: we used Python scripts and Java (Jena) to convert the files to RDF. Modelling the datasets: much of the data was multi-dimensional, so we used SCOVO to model it.
  7. Modelling the Home Office datasets: each row consists of Police Force data. The columns of each row contain crime values for offences such as “Violence against the person”, “Robbery”, and “Offences against vehicles”. We modelled the time period (2008/09), the geo regions, and the different crime types as “scovo:Dimension”.
  8. Very difficult to integrate data from disparate sources.
  9. Asserted owl:sameAs relations between the geographic concepts of the datasets and the corresponding relevant entities in the O.S. Admin Geography (using string matching).
  10. Do the PSI use case demo here.
  11. Demo of http://map.psi.enakting.org/
  12. DBpedia is an example of a dataset,