The document describes a case study of developing a Linked Open Data application called LODAC Museum. It discusses how data from over 100 museums in Japan was gathered, standardized, and integrated to create linked data, and it describes the challenges of data licensing and entity matching. The resulting LOD is published as RDF and consumed by applications such as a photo app, an art-spot guide for Yokohama, and a museum-finding app (Go2Museum).
1. 2012 INTERNATIONAL ASIAN SUMMER SCHOOL IN LINKED DATA
IASLOD 2012, August 13-17, 2012, KAIST, Daejeon, Korea
LOD Application Exemplar
- A case study: LODAC Museum
Hideaki Takeda
Fumi Kato
National Institute of Informatics
takeda@nii.ac.jp
Hideaki Takeda, Fumi Kato / National Institute of Informatics
2. Aim of this talk
• How do we plan, design, and implement LOD?
• Learning from a case study
3. LODAC Project http://lod.ac/
• Open Social Semantic Web Platform for Academic Resources
– Providing platforms for Linked Open Data
– Practicing data accumulation and publishing
• Areas of interest
– Museum information
– Geographical information, especially geographical names
– Local information
– Taxonomic information on species
– …
4. Linked Open Data Initiative
• Non-profit organization
– (Application for approval pending)
• Academia + IT people + local people
• Aim: facilitate LOD activities among local people
http://linkedopendata.jp/
5. Museum data as LOD
• The current state of museum information in Japan (nearly 6,000 museums)
– Distributed
• Self-maintained
• Isolated
– Opaque
• Self-designed
• Messy
• Aggregating and associating museum information
– LODAC-Museum
6. LODAC Museum – Main work
• Gathering of data
– Thesaurus, museum collections, etc.
• Standardization of data
– Representing data from different sources in a single common form
• Integration of data
– Identifying data
– Associating identical data
• Consuming data
7. LODAC Museum Architecture
[Figure: architecture in four layers (Gathering, Standardization, Integration, and Consuming of data). Components shown: Museum Websites and Semantic MediaWiki (sources); Crawl / Scrape; extracted data (JSON); ID Management (MySQL); Map to RDF; RDF Import into OWLIM; SPARQL; nginx; unicorn (ruby); thttpd (python).]
8. Gathering data
• No museums publish their data as LOD!
• We use data published as Web pages
– Scrape the pages and transform the data
– Licenses are not clear
• This is a serious problem
• In principle, we need permission from every site
• We obtained permission from some data publishers, but not all
10. Dataset
Type | No. | Data source
Art work (lodac:Work) | ca. 80,000 | Catalogs of the collections of 3 national art museums (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database of National Treasures & Important Cultural Properties (nationally designated) (915); The Japanese Art Thesaurus (266)
Specimen (lodac:Speciment) | ca. 1,690,000 | Science Net (National Science Museum), collections of 100+ museums
Person (foaf:Person) | ca. 8,800 | The Japanese Art Thesaurus
Facilities (incl. museums) | ca. 200,000 | The Japanese Art Thesaurus; Cultural Heritage Online; GIS data from the National and Regional Planning Bureau
11. Extracting collection data from museum websites
[Figure: collection data is extracted from a museum web page.]
12. Extracting collection data from museum websites
[Figure: the extracted data is represented as property/value pairs.]
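The extraction step could look roughly like the sketch below, assuming the collection page lays out each record as two-cell table rows (a property cell followed by a value cell). The class and function names are hypothetical, and real museum pages vary far more than this; the record is serialized as JSON because the architecture stores extracted data in that form.

```python
import json
from html.parser import HTMLParser


class PropertyValueParser(HTMLParser):
    """Hypothetical scraper: collects two-cell table rows as (property, value) pairs."""

    def __init__(self):
        super().__init__()
        self.rows = []      # collected (property, value) pairs
        self._cells = None  # cells of the current <tr>, or None outside a row
        self._buf = None    # text fragments of the current cell, or None outside a cell

    def _flush_cell(self):
        # Close the current cell, collapsing runs of whitespace in its text
        if self._buf is not None and self._cells is not None:
            self._cells.append(" ".join("".join(self._buf).split()))
        self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td") and self._cells is not None:
            self._flush_cell()  # tolerate an unclosed previous cell
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._flush_cell()
        elif tag == "tr" and self._cells is not None:
            self._flush_cell()
            if len(self._cells) == 2:  # keep only property/value rows
                self.rows.append((self._cells[0], self._cells[1]))
            self._cells = None


def extract_record(html: str) -> dict:
    """Return the property/value pairs found in one collection page."""
    parser = PropertyValueParser()
    parser.feed(html)
    parser.close()
    return dict(parser.rows)


def to_json(record: dict) -> str:
    # Keep Japanese text readable instead of \u escapes
    return json.dumps(record, ensure_ascii=False)
```

Rows with any other number of cells are skipped, so layout tables around the record do not pollute the output.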
13. Standardization of data
[Figure: raw data is re-organized into common metadata using properties such as dc:title, crm:P45_consists_of, skos:prefLabel, and lodac:era.]
Current policies for organizing metadata:
• Use existing metadata vocabularies
• Define our own metadata
14. Namespaces
Prefix | Vocabulary
crm | CIDOC-CRM
dc11 | Dublin Core 1.1
dc | DCMI Terms
skos | Simple Knowledge Organization System
rdfs | Resource Description Framework Schema
foaf | Friend of a Friend
rda2 | Resource Description and Access
lodac | LODAC Project
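When writing SPARQL against these vocabularies, a reusable prefix table helps. This is a minimal Python sketch: the URIs for rdf, rdfs, foaf, skos, dc and lodac are taken from the RDF/XML example on slide 20 and dc11 is the standard Dublin Core 1.1 element set, but the crm URI is an assumption (the slides do not give LODAC's CIDOC-CRM namespace) and rda2 is left out because its URI is not given. The function names are hypothetical.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/terms/",           # DCMI Terms
    "dc11": "http://purl.org/dc/elements/1.1/",  # Dublin Core 1.1 element set
    "lodac": "http://lod.ac/ns/lodac#",
    # ASSUMPTION: official CIDOC-CRM namespace; LODAC's actual crm URI is not stated
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    # rda2 omitted: its namespace URI is not given
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


def sparql_prologue(*prefixes: str) -> str:
    """Build the PREFIX declarations for the given prefixes, one per line."""
    return "\n".join(f"PREFIX {p}: <{PREFIXES[p]}>" for p in prefixes)
```

For example, expand("lodac:Work") gives http://lod.ac/ns/lodac#Work.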
15. Metadata schema for works
Attribute of lodac:Work | Property
Genre | lodac:genre
Type of cultural assets | lodac:culturalAssets
Creator | dc:creator / dc11:creator
Nationality | crm:P7_took_place_at
Title | dc:title / skos:prefLabel
Title pronunciation (yomi) | dc:title (@ja-hrkt) / skos:altLabel
Title in English | dc:title (@en) / skos:altLabel
Inscription | crm:P62I_is_depicted_by
Seal | crm:P65_shows_visual_item
No. of parts | crm:P57_has_number_of_parts
Collection | dc:isPartOf
Created year | dc:created
Estimated starting year | lodac:estimatedStartYear
Material | dc:medium / crm:P45_consists_of
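To show how a scraped record might be mapped onto this schema, here is a hedged sketch that emits N-Triples for a lodac:Work. The raw field names on the left of FIELD_MAP are hypothetical, only a subset of the table is covered, and every value is written as a plain or language-tagged literal even where the real data may use resources (for genre, say).

```python
LODAC = "http://lod.ac/ns/lodac#"
DCT = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical raw field names -> (property, language tag or None), following the schema table
FIELD_MAP = {
    "title": [(DCT + "title", "ja"), (SKOS + "prefLabel", "ja")],
    "title_yomi": [(DCT + "title", "ja-hrkt"), (SKOS + "altLabel", "ja-hrkt")],
    "title_en": [(DCT + "title", "en"), (SKOS + "altLabel", "en")],
    "genre": [(LODAC + "genre", None)],
    "created": [(DCT + "created", None)],
    "material": [(DCT + "medium", None)],
}


def nt_literal(value: str, lang=None) -> str:
    # Escape the characters N-Triples does not allow raw inside a quoted literal
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def work_to_ntriples(work_uri: str, record: dict) -> list:
    """Return N-Triples lines describing one work; unmapped fields are skipped."""
    lines = [f"<{work_uri}> <{RDF_TYPE}> <{LODAC}Work> ."]
    for field, value in record.items():
        for prop, lang in FIELD_MAP.get(field, []):
            lines.append(f"<{work_uri}> <{prop}> {nt_literal(value, lang)} .")
    return lines
```

Mapping one field to several properties (dc:title and skos:prefLabel) mirrors the "/" entries in the table, so consumers that only know one of the vocabularies still find the title.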
16. Integrating Data
• How to integrate data from different sources
– Sharing of responsibility
• Each source is responsible for its own data
– It identifies its data with its own IDs and manages the data under those IDs
• LODAC is responsible only for integration
– Assigning its own IDs and associating the other sources' IDs with them
[Figure: a central ID-resource (creator's information) links via dc:references to a Ref-resource (creator's reference) on each side.]
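The division of responsibility above amounts to a registry that assigns LODAC IDs and records which source IDs point at them. LODAC keeps this in MySQL; the sketch below swaps in Python's built-in sqlite3 so it runs standalone, and the table layout, source labels, and method names are assumptions for illustration.

```python
import sqlite3


class IdRegistry:
    """Hypothetical ID manager: LODAC IDs plus the source IDs associated with them."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS entity (id INTEGER PRIMARY KEY AUTOINCREMENT)")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ref ("
            " source TEXT NOT NULL, local_id TEXT NOT NULL,"
            " entity_id INTEGER NOT NULL REFERENCES entity(id),"
            " PRIMARY KEY (source, local_id))"
        )

    def new_entity(self) -> int:
        """Assign a fresh LODAC ID."""
        entity_id = self.db.execute("INSERT INTO entity DEFAULT VALUES").lastrowid
        self.db.commit()
        return entity_id

    def link(self, entity_id: int, source: str, local_id: str) -> None:
        """Associate a source's own ID with a LODAC ID; an existing association is kept."""
        self.db.execute("INSERT OR IGNORE INTO ref VALUES (?, ?, ?)", (source, local_id, entity_id))
        self.db.commit()

    def lookup(self, source: str, local_id: str):
        """Return the LODAC ID for a source ID, or None if it is not registered."""
        row = self.db.execute(
            "SELECT entity_id FROM ref WHERE source = ? AND local_id = ?", (source, local_id)
        ).fetchone()
        return row[0] if row else None

    def references(self, entity_id: int) -> list:
        """All (source, local_id) pairs linked to one LODAC ID, in a stable order."""
        return self.db.execute(
            "SELECT source, local_id FROM ref WHERE entity_id = ? ORDER BY source, local_id",
            (entity_id,),
        ).fetchall()

    @staticmethod
    def id_uri(entity_id: int) -> str:
        # URI pattern taken from the ID-resource example on slide 20 (http://lod.ac/id/359)
        return f"http://lod.ac/id/{entity_id}"
```

Each pair returned by references() corresponds to one dc:references link from the ID-resource to a Ref-resource.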
17. Integrating Data
[Figure: integrated data in the middle holds the minimum data needed to identify entities (Work, Museum, Creator), linked by dc:creator and crm:P55_has_current_location. Each integrated entity points via dc:references to the raw data for that entity from Source A and from Source B, which use the same properties.]
18. Integration of Person Data
• Matching of creators
– Base: list of artists from The Japanese Art Thesaurus
– Target: creators of works in museum collections, plus DBpedia
– Method: string matching of names
– Result: links from artist nodes to work nodes are added
[Figure: LODAC data holding basic information for creators, with links to works and to DBpedia.]
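String matching of creator names can be sketched as an exact match after normalization. The normalization (Unicode NFKC plus removal of all whitespace, so that "下村 観山" and "下村観山" compare equal) is an assumption; the slides only say names are string-matched. Exact matching cannot distinguish two artists who share a name, and this sketch links a work to every artist with that name.

```python
import unicodedata


def normalize_name(name: str) -> str:
    # NFKC folds full-width forms (including the ideographic space U+3000) to standard ones;
    # then all whitespace is dropped
    return "".join(unicodedata.normalize("NFKC", name).split())


def match_creators(base_artists: dict, work_creators: dict) -> dict:
    """base_artists: {artist_id: name}; work_creators: {work_id: creator name}.
    Returns {artist_id: sorted list of matching work_ids}."""
    index = {}
    for artist_id, name in base_artists.items():
        index.setdefault(normalize_name(name), []).append(artist_id)
    links = {}
    for work_id, creator in work_creators.items():
        for artist_id in index.get(normalize_name(creator), []):
            links.setdefault(artist_id, []).append(work_id)
    return {artist_id: sorted(works) for artist_id, works in links.items()}
```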
19. Integrating Data
Integrated item | Source A (amount of data) | Source B (amount of data) | Integrated
Facilities | Japanese Art Thesaurus (648) | Cultural Heritage Online (915) | 77
Title of important cultural properties | Japanese Art Thesaurus, art works (3,800) | DB for National Treasures, art works (10,115) | 74
Creator information and work title | Japanese Art Thesaurus, creators (1,332) | All art works, work title strings (61,861) | 15,020
Creator name | Japanese Art Thesaurus, creators (1,332) | All art work titles, using creator names (61,861) | 615
20. Publishing data as RDF
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:lodac="http://lod.ac/ns/lodac#"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <!-- ID-resource URI (own address): http://lod.ac/id/359 -->
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <!-- Links to the URIs of her/his works -->
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    <!-- …… -->
    <!-- External link: DBpedia Japanese -->
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <!-- Ref-resource URI: http://lod.ac/ref/359 -->
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
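A consumer that fetches one of these RDF/XML documents can read the links with the Python standard library alone, as long as the document uses the simple layout shown above (typed node elements with rdf:resource attributes). This is a sketch, not a general RDF/XML parser: it ignores rdf:Description nodes, nested resources, and other RDF/XML abbreviations, for which an RDF library would be the safer choice. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
LODAC = "http://lod.ac/ns/lodac#"
DCT = "http://purl.org/dc/terms/"


def q(ns: str, local: str) -> str:
    # ElementTree addresses namespaced names as "{namespace}local"
    return "{" + ns + "}" + local


def read_people(rdf_xml: str) -> list:
    """Collect URI, works, references and name for each top-level foaf:Person node."""
    root = ET.fromstring(rdf_xml)
    people = []
    for person in root.findall(q(FOAF, "Person")):
        people.append({
            "uri": person.get(q(RDF, "about")),
            "works": [e.get(q(RDF, "resource")) for e in person.findall(q(LODAC, "creates"))],
            "references": [e.get(q(RDF, "resource")) for e in person.findall(q(DCT, "references"))],
            "name": person.findtext(q(FOAF, "name")),
        })
    return people
```

XML comments, such as the annotations in the example, are discarded by ElementTree's default parser, so they do not affect the result.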
22. LODAC Applications
• Photo BURARI Pro
• Yokohama Art Spot
• Go2Museum
• http://lod.ac/apps
23. Photo BURARI Pro
(C) ATR-Promotions, Inc.
A photo app using SPARQL
24. Photo BURARI Pro
(C) ATR-Promotions, Inc.
• SPARQL endpoints
– DBpedia
– LinkedGeoData
– LODAC
• Other data sources
– Sinsai.info
• Uses SPARQL results in JSON
– Parsed with JSON Framework for Objective-C
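Photo BURARI Pro parses the JSON results in Objective-C; the same step in Python is sketched below. It relies only on the W3C SPARQL 1.1 Query Results JSON Format (head.vars and results.bindings); the function name is hypothetical.

```python
import json


def bindings_to_rows(results_json: str) -> list:
    """Flatten a SPARQL 1.1 JSON result into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are absent from the binding; map them to None
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows
```

Only the "value" field is kept; the "type", "xml:lang" and "datatype" fields are also available in each binding if an application needs to tell URIs from literals.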
26. Yokohama Art Spot
LODAC Museum × Yokohama Art LOD × PinQA
– An application using museum data and local data
– Data related to art in Yokohama
• Collections
• Events
• Q&A
http://lod.ac/apps/yas/
27. System Architecture
‣ Python + SPARQLWrapper
‣ Geolocation
[Figure: Yokohama Art Spot queries three datasets via SPARQL and receives JSON: PinQA (Question, Answer, User), Yokohama Art LOD (Event, Artist, Institution), and LODAC Museum (Work, Artist, Institution).]
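Yokohama Art Spot uses SPARQLWrapper; the sketch below swaps in the standard library's urllib and sends a plain SPARQL protocol GET request instead, so it has no third-party dependency. It asks for the works of a creator by name using foaf:name, lodac:creates and dc:title from the slides. The endpoint URL is a placeholder, not LODAC's actual endpoint, and the query shape is an assumption about how the data is laid out.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholder; substitute the real SPARQL endpoint
ENDPOINT = "https://example.org/sparql"

QUERY_TEMPLATE = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lodac: <http://lod.ac/ns/lodac#>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?work ?title WHERE {{
  ?person a foaf:Person ;
          foaf:name {name} ;
          lodac:creates ?work .
  OPTIONAL {{ ?work dc:title ?title }}
}}
LIMIT {limit}"""


def works_by_creator_query(name: str, lang: str = "ja", limit: int = 20) -> str:
    # Escape the name so it is a valid SPARQL string literal
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.format(name=f'"{escaped}"@{lang}', limit=int(limit))


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str, timeout: float = 30) -> dict:
    """Send the request and parse the JSON result (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)
```

Once ENDPOINT points at a real endpoint, run_query(ENDPOINT, works_by_creator_query("下村観山")) would return the parsed JSON results. Note that the language tag must match the data exactly; the slide 20 example tags names with xml:lang="ja".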
31. Go2Museum
http://160.193.95.58/~ueda/go2museum/
32. iPhone Android
33. Museum data from various websites
[Figure: museum data and location from LODAC, linked to search (NDL, CiNii), Google (Web/Map/Route), and Yahoo! (location).]
34. Twitter: @go2museum
• “Today’s museum”
• Recommendations based on the latitude and longitude of tweets
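A location-based recommendation like this can be reduced to ranking museums by great-circle distance from the tweet's coordinates. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the museum list format and function names are hypothetical, and the actual recommendation logic of @go2museum is not described in the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_museums(lat: float, lon: float, museums, k: int = 1) -> list:
    """museums: iterable of (name, lat, lon). Returns the k closest as (distance_km, name)."""
    ranked = sorted((haversine_km(lat, lon, mlat, mlon), name) for name, mlat, mlon in museums)
    return ranked[:k]
```

A linear scan is enough for a few thousand museums; a spatial index would only matter at much larger scale.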
35. Summary
• We described a life cycle of data
– Scraping, standardizing, integrating, and publishing
• Important issues
– Recognizing the data
– Designing the schema
• Suitable for the data
• Suitable for the RDF store and SPARQL
– Developing applications
• More people can be involved
• Next cycle of data