Building DBpedia Japanese and Linked Data Cloud in Japanese
1. 2013 Linked Data in Practice Workshop (LDPW2013), 30 November 2013
Fumihiro Kato, Hideaki Takeda, Seiji Koide, Ikki Ohmukai
{fumi, takeda, koide, i2k}@nii.ac.jp
National Institute of Informatics (NII)
Research Organization of Information and Systems (ROIS)
Graduate University for Advanced Studies (Sokendai)
2. Two Driving Forces to push LOD in Japan
• LOD for ACademia (LODAC) Project since 2010
– A research project in ROIS and NII
– Research on Linked Data for research
• Linked Open Data Initiative, Inc. (LODI) since 2012
– Non-profit organization
– Promotion of LOD in Japan
– Collaboration with various stakeholders
• Government, public sector, companies
• The members of the two groups largely overlap
3. LODAC Project: connecting academic data
• LODAC Location: Integration of location information
• LODAC SPECIES: Connecting species data by name
– Linked sources: Specimen DB, Species Info. DB, Research DB, GBIF, BioSci. DB, Taxon Name DB
– App. for query expansion
– No. of Names: 113,118
– No. of Triples: 14,532,449
• LODAC Museum: LOD of data in museums
– Raw data from Source A and Source B are combined into integrated data
– Layers: raw data for entities, minimum data to identify entities, data for entities
– Entities: Work, Creator, Museum
– Properties: dc:references, dc:creator, crm:P55_has_current_location
• DBpedia Japanese
• CKAN Japanese: Catalog for Open Data
4. LODAC Museum
• Integrated database for information on museums in Japan
– Data
• No. of museums: 114
• No. of triples: 40,059,131
• Integration by creator, work and institute
• Data publication by RDF
• Some applications using the data

Type of Information | RDF type | No. of items
Collections (total) | lodac:Specimen + lodac:Work | ca. 1,770,000
Collections (specimen) | lodac:Specimen | ca. 1,690,000
Collections (creative and historical work) | lodac:Work | ca. 130,000
Creators | foaf:Person | ca. 8,800
Institutes | foaf:Organization | ca. 200,000
5. Use
Yokohama Art Spot
LODAC Museum × Yokohama Art LOD × PinQA
– Application using museum and local data
– Data related to art in Yokohama
• Collections
• Events
• Q&A
http://lod.ac/apps/yas/
6. LODAC SPECIES: Linking Species Information with names
[Figure: Taxon Name LOD at the center, linking Museum Specimen DB, Species Info. DB, Research DB, GBIF and BioSci. DB]
No. of Species Names: 113,118
No. of Triples: 14,532,449
9. Prospects
• LOD is becoming an infrastructure of our society
– Similar to the impact of the Web on our society
– LOD helps our society mature and diversify
• We want to spread LOD more widely in Japan!
– For governments (central and local)
– For companies
– For citizens
• How?
– By researchers, engineers and citizens working together
10. Projects
• Platforms
– CKAN Japanese
– DBpedia Japanese
• Collaborative Projects
– with Ministry of Economy, Trade and Industry (METI)
• Open Data METI
– with National Statistics Center
• Scheme Design for Area Code
– Collaboration with Sabae City
• e.g., “Sabae Burari”
• Promotional Events
12. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
14. Motivation
• Data hub for Japanese resources
– To promote LOD in Japan
– To connect datasets in Japanese
• Two linguistic datasets
– DBpedia Japanese
– RDFized Japanese WordNet
15. DBpedia Japanese
• DBpedia i18n project
– 14 chapters
• generated from Japanese
Wikipedia dump files
– DIEF (DBpedia Information
Extraction Framework)
– ~80m triples
• Linking to
– Japanese WordNet
– Japanese Wikipedia Ontology
– other DBpedia chapters
• http://ja.dbpedia.org
16. i18n/l10n efforts
• IRI, IRI, IRI, ...
• Configurations for Extractors and Parsers
• DBpedia Mappings for each chapter
17. Extraction process
ref: D. Kontokostas et al. "Internationalization of Linked Data. The case of the Greek DBpedia edition."
Journal of Web Semantics: Science, Services and Agents on the World Wide Web, vol. 15, No.3, Sep. 2012, pp.51-61
18. DBpedia Information Extraction Framework
• Software to extract data from Wikipedia dump
– including custom extractors/parsers to apply
language specific configurations
• Extractors / Parsers
– DisambiguationExtractor
– HomepageExtractor
– ImageExtractor
– PersondataExtractor
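To make the idea of language-specific extractor configuration concrete, here is a minimal Python sketch. It is not DIEF code (DIEF is written in Scala) and does not use DIEF's API. The configuration table `DISAMBIGUATION_TEMPLATES`, the function `extract_disambiguations`, the template names and the property name used in the output are all assumptions made for illustration.

```python
import re

# Hypothetical per-language configuration: template names that mark a
# disambiguation page. These values are illustrative assumptions only.
DISAMBIGUATION_TEMPLATES = {
    "en": {"disambiguation", "disambig"},
    "ja": {"aimai"},
}

# Matches {{Name}} or {{Name|args}} and captures the template name.
TEMPLATE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}")
# Matches the target of [[Target]] or [[Target|label]].
LINK = re.compile(r"\[\[([^\]|#]+)")


def extract_disambiguations(title, wikitext, lang):
    """Return (page, property, target) tuples for a disambiguation page.

    Returns an empty list when the page carries none of the templates
    configured for the given language.
    """
    names = {m.group(1).strip().lower() for m in TEMPLATE.finditer(wikitext)}
    if not names & DISAMBIGUATION_TEMPLATES.get(lang, set()):
        return []
    return [
        (title, "wikiPageDisambiguates", target.strip())
        for target in LINK.findall(wikitext)
    ]


SAMPLE = "'''東京'''\n* [[東京都]]\n* [[東京駅|駅]]\n{{aimai}}"
for triple in extract_disambiguations("東京", SAMPLE, "ja"):
    print(triple)
```

The same page text yields nothing when processed with the "en" configuration, which is why each chapter needs its own settings.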
29. Statistics for DBpedia Mappings
Metric | DBpedia Japanese | DBpedia (English)
Share of all Wikipedia templates that are mapped | 4.67% (81 of 1,733) | 6.33% (369 of 5,826)
Share of all Wikipedia properties that are mapped | 2.47% (1,581 of 62,679) | 3.47% (6,169 of 177,599)
Share of all template occurrences in Wikipedia that are mapped | 47.99% (286,858 of 597,696) | 82.24% (2,435,773 of 2,728,357)
Share of all property occurrences in Wikipedia that are mapped | 38.75% (3,128,208 of 8,071,982) | 54.95% (27,283,343 of 49,654,072)
30. "Mapping Party"
• The mapping task is not easy
– Wikipedia Template
– DBpedia Ontology
– Well known vocabularies
• We held hands-on sessions
– Aug. 2012: 10 people
– Mar. 2013: 25 people
34. IRI issues
• Issue 1: IRIs have to be used properly in queries
• Issue 2: Input URIs must be decoded to IRIs
• Issue 3: Some serializations cannot use IRIs (IRI to URI)
• Issue 4: Don't decode IRI
• Issue 5: Use the latest version
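The conversion between IRIs and percent-encoded URIs mentioned on this slide can be sketched in Python with the standard library. This is an illustrative sketch, not the code used by DBpedia Japanese. The helper names `iri_to_uri` and `uri_to_iri` are assumptions, and `uri_to_iri` is simplified: it decodes every percent escape, whereas a stricter converter would leave escapes of reserved ASCII characters untouched.

```python
from urllib.parse import quote, unquote

# Characters left as-is when encoding: URI reserved characters plus "%",
# so that existing percent escapes are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes."""
    return quote(iri, safe=SAFE)


def uri_to_iri(uri: str) -> str:
    """Decode percent escapes back to Unicode characters (simplified)."""
    return unquote(uri)


iri = "http://ja.dbpedia.org/resource/宮城県"
uri = iri_to_uri(iri)
print(uri)
print(uri_to_iri(uri))
```

A round trip through both helpers returns the original IRI, which is the property needed when URIs arriving from browsers or other tools are matched against IRIs stored in the dataset.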
35. Query: Notable comics written by comics creators who have received the Tezuka Osamu Cultural Prize
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbp: <http://ja.dbpedia.org/resource/>
PREFIX dbp-owl: <http://dbpedia.org/ontology/>
SELECT ?creatorName ?comicName
WHERE {
  # Comics creators who received the Tezuka Osamu Cultural Prize (手塚治虫文化賞)
  ?creator a dbp-owl:ComicsCreator ;
           dbp-owl:award dbp:手塚治虫文化賞 ;
           dbp-owl:notableWork ?comic ;
           rdfs:label ?creatorName .
  # Their notable works that are typed as comics
  ?comic a dbp-owl:Comics ;
         rdfs:label ?comicName .
}
[Figure: example graph around dbp:石ノ森章太郎 (Shotaro Ishinomori), as far as it can be recovered]
dbp:石ノ森章太郎 rdf:type dbp-owl:ComicsCreator, foaf:Person ; rdfs:label "石ノ森章太郎" ; dbp-prop:生年 1938 ; dbp-owl:award dbp:手塚治虫文化賞 ; dbp-owl:notableWork dbp:サイボーグ009 ; dbp-owl:birthPlace dbp:宮城県
dbp:サイボーグ009 rdf:type dbp-owl:Comics ; rdfs:label "サイボーグ009"
dbp:宮城県 rdf:type dbp-owl:AdministrativeRegion ; rdfs:label "宮城県" ; dbp-owl:leaderName dbp:村井嘉浩
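How the query matches the example graph on this slide can be traced with a small in-memory sketch in Python. The triples below are one reading of the slide's diagram, not data fetched from DBpedia Japanese, and the helper names (`objects`, `subjects`, `notable_comics_by_award_winners`) are hypothetical. IRIs and literals are both kept as plain strings for brevity.

```python
# Triples transcribed from the slide's example graph (an interpretation
# of the diagram, not an export from the live dataset).
TRIPLES = {
    ("dbp:石ノ森章太郎", "rdf:type", "dbp-owl:ComicsCreator"),
    ("dbp:石ノ森章太郎", "rdf:type", "foaf:Person"),
    ("dbp:石ノ森章太郎", "rdfs:label", "石ノ森章太郎"),
    ("dbp:石ノ森章太郎", "dbp-owl:award", "dbp:手塚治虫文化賞"),
    ("dbp:石ノ森章太郎", "dbp-owl:notableWork", "dbp:サイボーグ009"),
    ("dbp:石ノ森章太郎", "dbp-owl:birthPlace", "dbp:宮城県"),
    ("dbp:サイボーグ009", "rdf:type", "dbp-owl:Comics"),
    ("dbp:サイボーグ009", "rdfs:label", "サイボーグ009"),
    ("dbp:宮城県", "rdf:type", "dbp-owl:AdministrativeRegion"),
    ("dbp:宮城県", "rdfs:label", "宮城県"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def notable_comics_by_award_winners(award):
    """Mirror the query's two groups of triple patterns in plain Python."""
    rows = []
    for creator in subjects("rdf:type", "dbp-owl:ComicsCreator"):
        if award not in objects(creator, "dbp-owl:award"):
            continue
        for comic in objects(creator, "dbp-owl:notableWork"):
            if "dbp-owl:Comics" not in objects(comic, "rdf:type"):
                continue
            for creator_name in objects(creator, "rdfs:label"):
                for comic_name in objects(comic, "rdfs:label"):
                    rows.append((creator_name, comic_name))
    return sorted(rows)


print(notable_comics_by_award_winners("dbp:手塚治虫文化賞"))
```

Each nested loop corresponds to one variable binding in the query, so the single result row pairs the creator's label with the label of the notable work.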
36. Japanese Linked Data Cloud
• 21 datasets
• Criteria
– providing more than 1,000 triples
– providing at least one of dereferenceable URIs, a data dump or a SPARQL endpoint
– including Japanese labels
– linking to other datasets in the LOD cloud or JLDC
• An open license is not mandatory
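The inclusion criteria above can be written as a simple predicate. The following Python sketch is only an illustration of the rules as listed on the slide. The `Dataset` class, its field names and the sample datasets are hypothetical and do not describe any real JLDC entry.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a candidate dataset."""
    name: str
    triples: int
    dereferenceable: bool = False
    has_dump: bool = False
    has_sparql_endpoint: bool = False
    has_japanese_labels: bool = False
    links_to_lod_or_jldc: bool = False
    open_license: bool = False  # recorded, but not required by the criteria


def qualifies_for_jldc(ds: Dataset) -> bool:
    """Apply the four criteria from the slide; the license is ignored."""
    has_access = ds.dereferenceable or ds.has_dump or ds.has_sparql_endpoint
    return (
        ds.triples > 1000
        and has_access
        and ds.has_japanese_labels
        and ds.links_to_lod_or_jldc
    )


# Made-up examples for illustration.
candidates = [
    Dataset("example-a", 50_000, has_sparql_endpoint=True,
            has_japanese_labels=True, links_to_lod_or_jldc=True),
    Dataset("example-b", 800, has_dump=True,
            has_japanese_labels=True, links_to_lod_or_jldc=True),
    Dataset("example-c", 20_000, has_japanese_labels=True,
            links_to_lod_or_jldc=True, open_license=True),
]
for ds in candidates:
    print(ds.name, qualifies_for_jldc(ds))
```

In this sketch "example-b" fails on size and "example-c" fails because it offers no access method, even though it carries an open license.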
38. Links to/from Japanese WordNet
 | links | WN nouns | DBpedia IRIs | WN to DBpedia | DBpedia to WN
resources | 33,017 | 65,788 | 1,456,158 | 50.1% | 2.3%
properties | 1,245 | 65,788 | 16,020 | 1.9% | 7.8%
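One plausible reading of the two percentage columns is that "WN to DBpedia" divides the number of links by the number of WordNet nouns, and "DBpedia to WN" divides it by the number of DBpedia IRIs. The short Python sketch below checks that reading against the slide's figures. The interpretation and the names `coverage` and `ROWS` are assumptions, and the results agree with the slide only to within rounding.

```python
def coverage(links: int, total: int) -> float:
    """Share of `total` items that have a link, in percent."""
    return 100.0 * links / total


# (links, WordNet nouns, DBpedia IRIs), copied from the slide.
ROWS = {
    "resources": (33_017, 65_788, 1_456_158),
    "properties": (1_245, 65_788, 16_020),
}

for kind, (links, wn_nouns, dbpedia_iris) in ROWS.items():
    wn_to_dbpedia = coverage(links, wn_nouns)
    dbpedia_to_wn = coverage(links, dbpedia_iris)
    print(f"{kind}: WN to DBpedia {wn_to_dbpedia:.2f}%, "
          f"DBpedia to WN {dbpedia_to_wn:.2f}%")
```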
39. Ongoing Work
• More Wikipedia entries and infoboxes
– Wikipedia Town
• More DBpedia mappings
– Mapping Party
• Parsers for Japanese
– Japanese Calendar: 慶応3年1月2日 =>
"1868-01-02"^^xsd:date
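A parser for Japanese era dates could look roughly like the Python sketch below. It is not the parser being developed for DBpedia Japanese. The function name `parse_japanese_date` and the `ERA_START` table are assumptions, and the sketch deliberately handles only dates from 1873 onward, when Japan was already using the Gregorian calendar, so that month and day carry over unchanged. Earlier dates, including those in the 慶応 era, follow a lunisolar calendar and would need a real calendar conversion. The sketch also does not check that a date falls inside its era.

```python
import re
from datetime import date

# Gregorian year corresponding to year 1 of each supported era.
ERA_START = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989}

# Era name, era year (元 means year 1), month, day.
PATTERN = re.compile(r"(明治|大正|昭和|平成)(元|\d+)年(\d+)月(\d+)日")


def parse_japanese_date(text: str) -> str:
    """Convert an era date such as 昭和13年1月25日 to an ISO 8601 string."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported date format: {text!r}")
    era, year_text, month, day = match.groups()
    era_year = 1 if year_text == "元" else int(year_text)
    year = ERA_START[era] + era_year - 1
    if year < 1873:
        # Before the Gregorian calendar was adopted; out of scope here.
        raise ValueError(f"lunisolar date not supported: {text!r}")
    # date() also validates the month and day.
    return date(year, int(month), int(day)).isoformat()


print(parse_japanese_date("昭和13年1月25日"))
print(parse_japanese_date("平成元年1月8日"))
```

The returned string can then be attached to a literal with the xsd:date datatype.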
40. Summary
• Linked Data in Japan is steadily expanding
– Started as a research project
– Now extended to various areas
• Creating a local chapter of DBpedia is key to promoting Linked Data in the local language
– A hub in the local language
– People in any field can find connections between their data and DBpedia
• Promotion of open licenses is still in progress