The document discusses the use of linked open data for academia. It introduces linked data and its key principles: using URIs to identify objects and including links between data from different sources. This allows data to be interconnected in a web of data rather than kept in separate silos. Examples of applying these principles include Bio2RDF, which links life-science data, and LODAC, which links academic data about species, museums and locations. Benefits include decentralized data sharing and integration across domains. Research data are required to be accessible, reusable and sustainable.
LOD for Academia: Connecting Research Data
1. Linked Open Data for ACademia
Introduction to Linked Data for Science
Hideaki Takeda
takeda@nii.ac.jp / ORCID:0000-0002-2909-7163
Professor, National Institute of Informatics
2013 International Conference on Open Data in Biodiversity and Ecological Research, 20 November, 2013
Researchers in 1983
Survey, research, and writing
[Diagram: survey of printed articles, data gathered from real-world objects, and article writing]
Researchers in 2013
• Digital distribution of articles: more articles than ever!
• Sharing and re-use of data
• Real and digital objects as targets
[Diagram: survey of digital and printed articles; acquiring and publishing data; data drawn from both real-world objects and digital information; article writing]
Trends of Research and Data
• Rapid Growth
– Increase of article publications
– Big data and many (small) databases
• Open and Share
– Open access
– Data sharing
• Integration
– Among different types of data
– Across domains
Key Requirements
• Accessibility
– Research results must be shared
• Reusability
– Research results are expected to be re-used by
other research
• Sustainability
– Research results must be preserved
Open Data
• Open Data is not just “data which is
open”, rather …
• “A piece of data or content is open if anyone is
free to use, reuse, and redistribute it —
subject only, at most, to the requirement to
attribute and/or share-alike.” http://opendefinition.org/
• Use, re-use, redistribute
• Open license
5 ★ Open Data
- ★★★★★ link your data to other data to provide context
- ★★★★ use URIs to denote things, so that people can point at your stuff
- ★★★ use non-proprietary formats (e.g., CSV instead of Excel)
- ★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
- ★ make your stuff available on the Web (whatever format) under an open license
http://5stardata.info/
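The five levels build on one another. A dataset's rating is therefore the highest level up to which every requirement is met. The following is a minimal Python sketch of that reading. The `DatasetProfile` class and its field names are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical yes/no description of how a dataset is published."""
    on_web_open_license: bool    # ★ on the Web under an open license
    structured: bool             # ★★ structured data
    non_proprietary: bool        # ★★★ non-proprietary format
    uses_uris: bool              # ★★★★ URIs denote things
    linked: bool                 # ★★★★★ linked to other data


def star_rating(p: DatasetProfile) -> int:
    """Count stars cumulatively: a level only counts if all lower levels are met."""
    levels = [p.on_web_open_license, p.structured, p.non_proprietary,
              p.uses_uris, p.linked]
    stars = 0
    for met in levels:
        if not met:
            break
        stars += 1
    return stars


csv_on_web = DatasetProfile(True, True, True, False, False)
print(star_rating(csv_on_web))  # 3
```

Under this reading, a dataset that uses URIs but has no open license still scores zero, because the first level is unmet.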
Linked Data/Linked Open Data (LOD)
- link your data to other data to provide context
- use URIs to denote things, so that people can point at your stuff
Web of Data
[Diagram: one data item linked to other data about the same observation, to identical data in another source, and to data explaining what it means]
Inter-connection between data in different data sources is enabled
Linked Data Principles
• The four rules for Linked Data
– Use URIs as names for things
• Give a URI to every object in the world!
– Use HTTP URIs so that people can look up those names.
• Don’t use URNs
– When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
• Provide machine-readable data for each URI
– Include links to other URIs, so that they can discover more things
• Link data together just like the Web
Linked Data, Tim Berners-Lee (TBL), http://www.w3.org/DesignIssues/LinkedData.html
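Rules 2 and 3 can be illustrated with Python's standard library. This sketch only builds the HTTP request; it does not send it. `is_http_uri` and `lookup_request` are hypothetical helper names. The Accept header shows one common way to ask for RDF through content negotiation; the rules themselves do not require it.

```python
from urllib.parse import urlparse
from urllib.request import Request


def is_http_uri(name: str) -> bool:
    """Rule 2: a name should be an HTTP(S) URI so that it can be looked up."""
    return urlparse(name).scheme in ("http", "https")


def lookup_request(uri: str) -> Request:
    """Build (but do not send) a request for machine-readable data about `uri`.

    The Accept header asks the server, via content negotiation, for Turtle
    or RDF/XML instead of an HTML page (rule 3).
    """
    if not is_http_uri(uri):
        raise ValueError(f"cannot be looked up over HTTP: {uri}")
    return Request(uri, headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"})


req = lookup_request("http://www-kasm.nii.ac.jp/~takeda#me")
# The fragment (#me) stays on the client: only the document path is requested.
print(req.host, req.selector)              # www-kasm.nii.ac.jp /~takeda
print(is_http_uri("urn:isbn:0451450523"))  # False
```

The example URI is a hash URI, so the `#me` fragment is never sent to the server. A lookup fetches the document at `/~takeda`, which is expected to describe the thing named `...#me`.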
How to express data in Linked Data
• Use RDF (+ RDFS, OWL)
– Very simple: <subject> <predicate> <object> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www-kasm.nii.ac.jp/~takeda#me> rdf:type foaf:Person .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:name "Hideaki Takeda" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:gender "male" .
<http://www-kasm.nii.ac.jp/~takeda#me> foaf:knows <http://southampton.rkbexplorer.com/id/person07113> .
[Graph: the four triples drawn as nodes and arcs. The central node http://www-kasm.nii.ac.jp/~takeda#me is typed as foaf:Person and linked to "Hideaki Takeda" (foaf:name), "male" (foaf:gender) and http://southampton.rkbexplorer.com/id/person07113 (foaf:knows)]
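To make the triple model concrete, here is a toy in-memory store holding the four triples above. It is queried by pattern matching, with `None` acting as a wildcard. This is an illustration only, not an RDF library; a real application would use one (for example rdflib in Python) together with SPARQL.

```python
from typing import Optional

Triple = tuple[str, str, str]

ME = "http://www-kasm.nii.ac.jp/~takeda#me"

# The four triples from the slide; literals keep their quotes to tell them apart from URIs.
GRAPH: set[Triple] = {
    (ME, "rdf:type", "foaf:Person"),
    (ME, "foaf:name", '"Hideaki Takeda"'),
    (ME, "foaf:gender", '"male"'),
    (ME, "foaf:knows", "http://southampton.rkbexplorer.com/id/person07113"),
}


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None is a wildcard (like a SPARQL variable)."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


print(match(GRAPH, p="foaf:name"))  # who has a name, and what is it?
print(len(match(GRAPH, s=ME)))      # 4
```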
Describing Linked Data
[Graph: the previous example extended across sources. http://www-kasm.nii.ac.jp/~takeda#me (rdf:type foaf:Person, foaf:name "Hideaki Takeda", foaf:gender "male") is linked by foaf:knows to http://southampton.rkbexplorer.com/id/person-07113. That node is linked by owl:sameAs to <http://dbpedia.org/resource/Tim_Berners-Lee>, for which DBpedia gives dbpprop:occupation dbpedia:Computer_scientist, dbpprop:name "Sir Tim Berners-Lee", dbpprop:birthPlace "London, England" and dbpprop:birthDate "1955-06-08"]
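The payoff of `owl:sameAs` is that facts published by different sources can be pooled. The following is a minimal sketch, assuming the link runs from the RKBExplorer URI to the DBpedia URI as drawn. It groups URIs with a union-find structure and collects every fact stated about any member of a group. This simplifies OWL reasoning considerably, and the function names are hypothetical.

```python
TAKEDA = "http://www-kasm.nii.ac.jp/~takeda#me"
RKB = "http://southampton.rkbexplorer.com/id/person-07113"
DBP = "http://dbpedia.org/resource/Tim_Berners-Lee"

# Triples from two different sources, joined by one owl:sameAs link.
TRIPLES = [
    (TAKEDA, "foaf:knows", RKB),
    (RKB, "owl:sameAs", DBP),
    (DBP, "dbpprop:name", '"Sir Tim Berners-Lee"'),
    (DBP, "dbpprop:birthPlace", '"London, England"'),
    (DBP, "dbpprop:birthDate", '"1955-06-08"'),
]


def same_as_resolver(triples):
    """Union-find over owl:sameAs links; returns a function mapping a URI to its group root."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)
    return find


def describe(uri, triples):
    """All (predicate, object) facts stated about `uri` or any URI declared owl:sameAs it."""
    find = same_as_resolver(triples)
    root = find(uri)
    return sorted((p, o) for s, p, o in triples
                  if p != "owl:sameAs" and find(s) == root)


for fact in describe(RKB, TRIPLES):
    print(fact)
```

Starting from the RKBExplorer URI alone, the three DBpedia facts become reachable. That is the "context" the fifth star refers to.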
Linking Open Data (LOD)
• The project to collect published Linked Data
• Major Linked Data, translated from the original resources:
– DBpedia (Wikipedia): 270 million triples
– Geonames: geographic names with their latitudes and longitudes, 93 million triples
– MusicBrainz: music
– WordNet: dictionary
– DBLP bibliography: bibliography of technical papers, 28 million triples
– US Census Data: 1 billion triples
• Major Linked Data, collected by crawling:
– FOAF (Friend Of A Friend)
• Major Linked Data, provided through wrappers:
– Flickr Wrapper
Benefits of LOD for Science
• Truly decentralized database
– No need for a central database
– Anyone can create one and join the cloud!
• Truly open and sharable data and schemata
– Easy to re-use and mash up
– Easy to use and connect across domains and disciplines
• A single format for all kinds of data
– Easy data processing
Bio2RDF
At the heart of Linked Data for the Life Sciences
• Bio2RDF is an open source framework to produce
and provide biological linked data that uses
simple conventions on the emerging semantic
web
• Bio2RDF reduces the time and
effort involved in data
integration so that you can get
to doing science
• 19 datasets;
1,010,758,291 triples
http://bio2rdf.org/
Alison Callahan, José Cruz-Toledo, Peter Ansell, Michel Dumontier: Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data. In: The Semantic Web: Semantics and Big Data, Lecture Notes in Computer Science, Volume 7882, 2013, pp. 200-212
LODAC Project: connecting academic data
[Diagram: overview of the LODAC project components]
• LODAC SPECIES: connecting species data by name (specimen DB, species info DB, research DB, GBIF, taxon name DB, BioSci. DB; 113,118 names; 14,532,449 triples), with an application for query expansion
• LODAC Museum: LOD of data in museums, integrating raw data about works, creators and museums from several sources (linked by dc:references, dc:creator and crm:P55_has_current_location)
• LODAC Location: integration of location information
• Related resources: DBpedia Japanese; CKAN Japanese (catalog for open data)
Catalog for Open Data
LODAC SPECIES: Linking Species Information with Names
[Diagram: the museum specimen DB, species info DB, research DB, GBIF and BioSci. DB linked through the taxon name LOD]
• No. of species names: 113,118
• No. of triples: 14,532,449
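Linking species records "by name" amounts to computing a shared key for each record's taxon name and grouping records on that key. The sketch below is hypothetical throughout:

- The key function lower-cases the name, collapses whitespace, and keeps only the genus and species epithet.
- The one-entry common-name table merely stands in for the taxon-name LOD.

In practice, the matching would also have to handle synonyms and spelling variants.

```python
import re
from collections import defaultdict

# Hypothetical lookup table standing in for hasCommonName links in a taxon-name LOD.
COMMON_TO_SCIENTIFIC = {"asian swallowtail": "papilio xuthus"}


def name_key(name: str) -> str:
    """Hypothetical matching key: lower-case, collapse spaces, keep genus + species epithet."""
    cleaned = re.sub(r"\s+", " ", name.strip()).lower()
    if cleaned in COMMON_TO_SCIENTIFIC:
        return COMMON_TO_SCIENTIFIC[cleaned]
    return " ".join(cleaned.split(" ")[:2])  # drops author names such as "Linnaeus"


def link_by_name(records):
    """Group (source, record_id, name) records from different databases by taxon name."""
    groups = defaultdict(list)
    for source, record_id, name in records:
        groups[name_key(name)].append((source, record_id))
    return dict(groups)


# Illustrative records; the database and record identifiers are made up.
RECORDS = [
    ("specimen_db", "spec-1", "Papilio xuthus Linnaeus"),
    ("research_db", "paper-9", "papilio  xuthus"),
    ("museum_db", "m-7", "Asian Swallowtail"),
    ("species_info_db", "sp-42", "Papilio machaon"),
]

for key, members in link_by_name(RECORDS).items():
    print(key, members)
```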
Data model for integration
[Diagram: classes (owl:Class) and named graphs of the LODAC SPECIES data model]
• TaxonName, with subclasses (rdfs:subClassOf) CommonName and ScientificName, attached via hasCommonName and hasScientificName
• TaxonRank (e.g., species), attached via hasTaxonRank; taxa are chained upward by hasSuperTaxon (example taxon: Butterfly)
• Specimen, described by collectedDate, collectionLocality, institutionName, dcterms:source, dcterms:publisher and crm:has_current_location
• Source datasets, e.g., BDLS and Bryophytes, kept as named graphs
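The taxon part of this model can be mirrored in Python to show how `hasSuperTaxon` links form a chain that can be walked upward. This is a minimal sketch. The class and field names are hypothetical stand-ins for the RDF classes and properties, and `Specimen` and the named graphs are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonName:
    """Hypothetical Python mirror of the slide's TaxonName class."""
    scientific_name: str                        # hasScientificName
    rank: str                                   # hasTaxonRank
    super_taxon: Optional["TaxonName"] = None   # hasSuperTaxon
    common_names: tuple[str, ...] = ()          # hasCommonName


def lineage(taxon: TaxonName) -> list[tuple[str, str]]:
    """Follow hasSuperTaxon links upward and return (rank, name) pairs."""
    chain = []
    node: Optional[TaxonName] = taxon
    while node is not None:
        chain.append((node.rank, node.scientific_name))
        node = node.super_taxon
    return chain


lepidoptera = TaxonName("Lepidoptera", "order")
papilionidae = TaxonName("Papilionidae", "family", lepidoptera)
papilio = TaxonName("Papilio", "genus", papilionidae)
xuthus = TaxonName("Papilio xuthus", "species", papilio, ("Asian swallowtail",))

print(lineage(xuthus))
```

In the RDF version, the same walk would be a SPARQL property path over `hasSuperTaxon`. The Python loop only shows the shape of the data.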
Search application
with LODAC SPECIES
http://lod.ac/apps/lsdcs
LODAC Museum
• Integrated database for information on museums in Japan
– No. of museums: 114
– No. of triples: 40,059,131
• Data by type of information (RDF type): no. of items
– Collections, total (lodac:Specimen + lodac:Work): ca. 1,770,000
– Collections, specimens (lodac:Specimen): ca. 1,690,000
– Collections, creative and historical works (lodac:Work): ca. 130,000
– Creators (foaf:Person): ca. 8,800
– Institutes (foaf:Organization): ca. 200,000
• Integration by creator, work and institute
• Data publication by RDF
• Some applications using the data
Integrated data processing by RDF
Collect → Refine → Integrate → Publish → Use (all processed as RDF)
• Collect: produce RDF by converting relational databases (RDB) or by scraping the Web
• Refine: define a schema and convert the data according to it
• Integrate: schema mapping, ID mapping
• Publish: data dumps / SPARQL endpoint
• Use: mash-up applications
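For the Collect step, converting a relational database to RDF can be sketched with Python's built-in `sqlite3`. Each row becomes a subject, and each non-key column becomes a predicate. This is loosely in the spirit of the W3C Direct Mapping of relational data to RDF, but it uses a simplified, hypothetical URI scheme under `http://example.org/`. The table and its contents are made up.

```python
import json
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for the generated resources


def literal(value) -> str:
    """Quote a value as an N-Triples string literal (json.dumps escapes quotes and backslashes)."""
    return json.dumps(str(value), ensure_ascii=False)


def table_to_triples(conn: sqlite3.Connection, table: str, key: str) -> list[str]:
    """Map each row to a subject URI built from its key, and each other column to a predicate."""
    # Table and key names are trusted here; never interpolate untrusted input into SQL.
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is already in the subject; NULLs produce no triple
            lines.append(f"{subject} <{BASE}{table}#{column}> {literal(value)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (id INTEGER PRIMARY KEY, title TEXT, created TEXT)")
conn.execute("INSERT INTO work VALUES (1, 'Sample Work', '1900'), (2, 'Untitled', NULL)")
for line in table_to_triples(conn, "work", "id"):
    print(line)
```

The output is still "raw" RDF: the predicates mirror column names. Replacing them with shared vocabulary terms is the job of the Refine step.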
Collect
Extracting collection data from museum websites
[Figure: a museum web page whose collection data are extracted into property/value pairs]
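Scraping, the other Collect route, depends entirely on each site's markup. Here is a minimal sketch with the standard-library `HTMLParser`. It assumes, hypothetically, that a collection page lists its data as two-cell `<th>`/`<td>` table rows.

```python
from html.parser import HTMLParser


class PropertyValueExtractor(HTMLParser):
    """Collect (property, value) pairs from two-cell table rows: <tr><th>..</th><td>..</td></tr>."""

    def __init__(self):
        super().__init__()
        self.pairs: list[tuple[str, str]] = []
        self._in_cell = False
        self._text: list[str] = []
        self._row: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._in_cell, self._text = True, []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._in_cell:
            self._row.append(" ".join("".join(self._text).split()))  # normalise whitespace
            self._in_cell = False
        elif tag == "tr" and len(self._row) == 2:
            self.pairs.append((self._row[0], self._row[1]))


# A made-up fragment of a museum collection page.
PAGE = """
<table class="collection">
  <tr><th>Title</th><td>Sample Work</td></tr>
  <tr><th>Material</th><td>Ink and color
      on silk</td></tr>
</table>
"""

extractor = PropertyValueExtractor()
extractor.feed(PAGE)
print(extractor.pairs)
```

Each real site would need its own extraction rules. The resulting pairs still use the site's own labels until the Refine step maps them to a common schema.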
Dataset
Collect
Type (RDF type): no. of items; data sources
– Art works (lodac:Work): ca. 80,000; catalogs of the collections of 3 national art museums (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database for National Treasures & Important Cultural Properties designated by the nation (915); The Japanese Art Thesaurus (266)
– Specimens (lodac:Specimen): ca. 1,690,000; 100+ museum collections; Science Net (National Science Museum)
– Persons (foaf:Person): ca. 8,800; The Japanese Art Thesaurus
– Facilities (incl. museums): ca. 200,000; The Japanese Art Thesaurus; Cultural Heritage Online; GIS data from the National and Regional Planning Bureau
Refine
Standardization of data: raw data re-organized into common metadata
[Figure: raw data fields mapped to re-organized metadata such as dc:title, crm:P45_consists_of, skos:prefLabel, …, lodac:era]
Current organizing policies:
• Use existing metadata
• Define our own metadata
Refine
Metadata schema for works (lodac:Work), item: property
– Genre: lodac:genre
– Type of cultural assets: lodac:culturalAssets
– Creator: dc:creator / dc11:creator
– Nationality: crm:P7_took_place_at
– Title: dc:title / skos:prefLabel
– Title pronunciation (yomi): dc:title @ja-hrkt / skos:altLabel
– Title in English: dc:title @en / skos:altLabel
– Inscription: crm:P62I_is_depicted_by
– Seal: crm:P65_shows_visual_item
– No. of parts: crm:P57_has_number_of_parts
– Collection: dc:isPartOf
– Created year: dc:created
– Estimated starting year: lodac:estimatedStartYear
– Material: dc:medium / crm:P45_consists_of
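The Refine step then maps each source's own labels onto this schema. This sketch works under several stated assumptions:

- `SOURCE_LABELS` is a hypothetical per-museum table.
- Only some schema items are included.
- Where the schema lists two properties, the first one is used.

```python
# Item-to-property choices taken from the lodac:Work schema (first option where two are listed).
SCHEMA = {
    "Genre": "lodac:genre",
    "Creator": "dc:creator",
    "Title": "dc:title",
    "No. of parts": "crm:P57_has_number_of_parts",
    "Collection": "dc:isPartOf",
    "Created year": "dc:created",
    "Material": "dc:medium",
}

# Hypothetical per-museum labels that mean the same thing as a schema item.
SOURCE_LABELS = {"Name of work": "Title", "Artist": "Creator", "Medium": "Material"}


def refine(pairs, subject):
    """Turn scraped (label, value) pairs into schema triples; report labels with no mapping."""
    triples, unmapped = [], []
    for label, value in pairs:
        item = SOURCE_LABELS.get(label, label)
        prop = SCHEMA.get(item)
        if prop is None:
            unmapped.append(label)  # candidate for a new lodac: property, or for dropping
        else:
            triples.append((subject, prop, value))
    return triples, unmapped


scraped = [("Name of work", "Sample Work"), ("Medium", "Ink on silk"), ("Frame", "Gilt wood")]
triples, unmapped = refine(scraped, "http://example.org/id/work-1")
print(triples)
print(unmapped)
```

Reporting the unmapped labels reflects the two policies: reuse existing metadata where it fits, and define our own metadata for the leftovers.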
Integrating Data
Integrate
[Diagram: raw data about the same entity from Source A and Source B are each linked by dc:references to an integrated record holding the minimum data needed to identify the entity; the integrated Work, Creator and Museum records are connected by dc:creator and crm:P55_has_current_location]
Integrate
Integrate Item
Integrating Data
Source
A.Japanese Art Thesaurus
Amount
of Data
648
Facilities
77
B.Cultural Heritage Online
Title of important
cultural properties
Creator information
and Work Title
Integration
Data
A.Japanese Art Thesaurus (Art work)
915
3,800
74
B.DB for National Treasure (Art work)
10,115
A.Japanese Art Thesaurus (Creator)
1,332
15,020
B.All of art work (Work title string)
61,861
A.Japanese Art Thesaurus (Creator)
1,332
Creator name
615
B.All of art work title(using creator name)
61,861
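The integration pattern in the diagram above can be sketched in three steps:

1. Group source records whose names match.
2. Mint one ID resource per group.
3. Point that ID resource at each source record with `dc:references`, as `http://lod.ac/id/359` does for `http://lod.ac/ref/359` in the RDF example on the Publish slide.

Several parts of the sketch are hypothetical: the matching key (Unicode NFKC normalization plus removing spaces), the `example.org` URIs, and the records. Real ID mapping also needs disambiguation for different people who share a name.

```python
import unicodedata


def match_key(name: str) -> str:
    """Hypothetical key: NFKC-normalise (full-width forms, ideographic space) and drop spaces."""
    return "".join(unicodedata.normalize("NFKC", name).split())


def integrate(records, id_prefix="http://example.org/id/"):
    """Give records whose names match one ID resource that dc:references each source record.

    `records` is a list of (ref_uri, name) pairs from any number of sources.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    for ref_uri, name in records:
        groups.setdefault(match_key(name), []).append((ref_uri, name))
    triples = []
    for n, members in enumerate(groups.values(), start=1):
        id_uri = f"{id_prefix}{n}"
        triples.append((id_uri, "rdfs:label", members[0][1]))  # first spelling seen
        for ref_uri, _ in members:
            triples.append((id_uri, "dc:references", ref_uri))
    return triples


# Made-up source records: Source A writes the name without a space, Source B with a full-width space.
RECORDS = [
    ("http://example.org/ref/a/12", "下村観山"),
    ("http://example.org/ref/b/7", "下村\u3000観山"),
    ("http://example.org/ref/b/9", "横山大観"),
]

for t in integrate(RECORDS):
    print(t)
```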
Publish
Publishing data as RDF
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:lodac="http://lod.ac/ns/lodac#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <!-- ID-resource URI (own address): http://lod.ac/id/359 -->
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <!-- Links to her/his work URIs -->
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    <!-- …… -->
    <!-- External link: DBpedia Japanese -->
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <!-- Ref-resource URI: http://lod.ac/ref/359 -->
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <!-- 下村観山 = Shimomura Kanzan -->
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
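Besides RDF/XML, a data dump is often published as N-Triples, with one triple per line. Below is a minimal serializer sketch that covers only what this example needs: IRIs, plain literals and language-tagged literals. As a simplification, any string starting with `http://` or `https://` is treated as an IRI. The triples reuse URIs from the RDF/XML above.

```python
import json

PERSON = "http://lod.ac/id/359"

# A few of the triples from the RDF/XML example, written with full IRIs.
TRIPLES = [
    (PERSON, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://xmlns.com/foaf/0.1/Person"),
    (PERSON, "http://lod.ac/ns/lodac#creates", "http://lod.ac/id/20029"),
    (PERSON, "http://purl.org/dc/terms/references", "http://lod.ac/ref/359"),
    (PERSON, "http://xmlns.com/foaf/0.1/name", ("下村観山", "ja")),
]


def term(value) -> str:
    """Serialise one RDF term for N-Triples."""
    if isinstance(value, tuple):  # (text, language tag) -> language-tagged literal
        text, lang = value
        return f"{json.dumps(text, ensure_ascii=False)}@{lang}"
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    return json.dumps(value, ensure_ascii=False)  # plain literal; quotes and backslashes escaped


def to_ntriples(triples) -> str:
    """One line per triple, ending in ' .', as in an N-Triples dump file."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```

Line-based dumps like this are easy to split, concatenate and load in bulk. That is one reason to offer them alongside a SPARQL endpoint.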
Use
Yokohama Art Spot
LODAC Museum × Yokohama Art LOD × PinQA
– Application using museum and local data
– Data related to art in Yokohama
• Collections
• Events
• Q&A
http://lod.ac/apps/yas/
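System Architecture
Use
‣ Python + SPARQLWrapper
‣ Geolocation
[Diagram: Yokohama Art Spot serves the user by combining LODAC Museum (works, artists, institutions) and Yokohama Art LOD (events, artists, institutions), both queried via SPARQL, with questions and answers from PinQA exchanged as JSON]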
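The application queries its data sources with SPARQL through SPARQLWrapper. To stay self-contained, the sketch below performs the two underlying steps with the standard library instead:

1. It builds a GET URL following the SPARQL 1.1 Protocol.
2. It flattens a response in the SPARQL 1.1 JSON results format.

The endpoint URL is a placeholder. The query assumes the `foaf:Person` / `lodac:creates` modelling shown in the RDF/XML example, and the sample response is built from that example rather than fetched.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical; substitute the real endpoint URL

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lodac: <http://lod.ac/ns/lodac#>
SELECT ?artist ?name ?work WHERE {
  ?artist a foaf:Person ;
          foaf:name ?name ;
          lodac:creates ?work .
}
LIMIT 10"""


def query_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol, query via GET: URL-encode the query as the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten the SPARQL 1.1 Query Results JSON format to {variable: value} dicts."""
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in data["results"]["bindings"]]


# A response in the JSON results format, built from the RDF/XML example (not fetched).
SAMPLE = json.dumps({
    "head": {"vars": ["artist", "name", "work"]},
    "results": {"bindings": [{
        "artist": {"type": "uri", "value": "http://lod.ac/id/359"},
        "name": {"type": "literal", "xml:lang": "ja", "value": "下村観山"},
        "work": {"type": "uri", "value": "http://lod.ac/id/20029"},
    }]},
})

print(query_url(ENDPOINT, QUERY))
print(rows(SAMPLE))
```

Flattening drops the `type` and `xml:lang` fields. An application that needs to tell URIs from literals, or to pick labels by language, should keep them.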
Conclusion
• Data and the Web
– Great potential!
• Linked Data: exploit the power of the Web
– Simple structure: URIs and RDF
– Truly distributed data management
– Easy to link data to each other
– Suitable for interdisciplinary areas
• Remaining issues
– Scalability
– Sustainability
• DOI: DataCite
• ORCID