The linked data paradigm makes it possible for any data to link to, or be linked from, structured information, both internally and externally. To improve the current cultural service of the Union Catalog of Digital Archives Taiwan (UCdaT, catalog.digitalarchives.tw), we developed a linked data prototype that extends the Art & Architecture Thesaurus (AAT) to provide a machine-understandable catalog service.
However, knowledge engineering is time- and labor-consuming, especially for an archive that is non-Western in culture and multidisciplinary in nature. This makes it extremely challenging to map the data semantics of the UCdaT to international standards and vocabularies.
At this stage, the triple store is an experimental addition to the existing Union Catalog of Digital Archives Taiwan architecture; it provides semantic links to target collections for related suggestions. This will guide us in creating a future technical architecture that is scalable to the whole archive, follows learning-by-doing guidelines, and preserves even data that is difficult to understand fully at present, so that it can at least be linked by others who may contribute third-party understandings for their own reuse.
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
1. A Linked Data Prototype for The Union Catalog of Digital Archives Taiwan
Museum Computing: An Approach to Bridging Cultures, Communities and Science
The 21st PNC Annual Conference and Joint Meetings, October 21-23, 2014
National Palace Museum, Taipei, Taiwan
Keh-Jiann Chen, Tyng-Ruey Chuang,
Andrea Wei-Ching Huang, Chung-Hsi Hung, and Wan-Jung Shu
Institute of Information Science, Academia Sinica, Taipei, Taiwan
The corresponding author is Andrea Wei‐Ching Huang at {andreahg}@iis.sinica.edu.tw
2. Outline
1. Introduction & Motivation
2. Digital Archives Thesaurus (dat)
3. A Chinese Bottle in the Prototype
4. The dat Ontology & Prototype System
5. Conclusion & Future Works
6. Reference
3. Introduction / Motivation
• Union Catalog of Digital Archives Taiwan
• Why linked data
• Datasets we use for the experiment
Digital Archives Thesaurus (dat)
• Overview
• AAT hierarchy adaption
• Disambiguation skill
Speaker (1): Wan-Jung Shu
4. Introduction & Motivation
1. Union Catalog of Digital Archives Taiwan:
Collections from more than 12 Institutions
1. Anthropology
2. Archaeology
3. Archeology
4. Archives
5. Biology
6. Chinese Artifacts
7. Chinese Paintings & Calligraphy
8. Full Text of Rare Chinese Books
9. Geology
10. Language
11. Map & Remote Sensing
12. Multimedia
13. News
14. Rare & Manuscript Collections
15. Research Reusing
16. Resource Integration for Applications
17. Stone Rubbing
17 Topic Subjects
Metadata (DC 15 elements): 5,214,602
Image: 4,032,112
Audio & Video Media: 48,591
• Academia Sinica
• Academia Historica
• National Museum of Natural Science
• National Central Library
• National Taiwan University
• National Palace Museum
• Taiwan Historica
• National Museum of History
• Chinese Taipei Film Archive
• Hakka Affairs Council
• National Archives Administration
• Council of Indigenous Peoples
• Open Requests for Proposals Projects
• …
• …
5. Introduction & Motivation: Why Linked (Open) Data?
Reason 1: International Trends
Catalogs in Web Context
• Need to be open.
• Need to be linkable.
• Need to provide links.
• Must be part of the network.
• Cannot be an end in itself.
• Allow for hackability.
Commonsense Cataloging
• 2014 survey indicates: over 36.6% of keywords in Google search results include Schema snippets.
• Pages using schema.org markup have higher Google rankings.
• Library users daily visit sites such as Google, Wikipedia and social networks.
Modernize Catalogs
• Improve Visibility, Discoverability and Findability.
• Linking Outside the Catalog.
• Sharing of metadata.
• Move from Document-based Model to Data-Centric Description Model (e.g. MARC-based to BIBFRAME).
[Diagram labels: MARC; MARC 21 - BIBFRAME; For Linking; For Sharing; For Finding]
6. Introduction & Motivation: Why Linked (Open) Data?
Reason 2: Semantic value added for data
Data Semantics
Thesaurus
Vocabulary
Ontology
What to wear depends on what applications need.
Old Data, New Meaning, New Value
7. Introduction & Motivation: Datasets we use for this prototype experiment
1) For Digital Archives Thesaurus:
Chinese Artifacts: 32,044
Concepts: 1,667
2) For Linked Data Prototype:
5 subcategories of the Chinese Artifacts / 25 examples
No. of Concepts: 167
No. of Triples: 225
…
Chinese Artifacts
• bamboo/wood lacquerware
• ceramic artifacts
• enamelware and glass artifacts
• jade/stone artifacts
• metal artifacts
8. Digital Archives Thesaurus: overview
Chinese Art and Artifact Subsets: [concepts and guide terms: 3,088] / [terms: 4,538]
[Diagram labels: Digital Archives Thesaurus; Concept 1, Concept 2, ..., Concept N; Term 1, Term 2, ..., Term n; Union Catalog Keyword dictionary; AAT hierarchy adaption; Related terms of Chinese Artifacts]
Union Catalog Keyword Dictionary
• Over 100,000 keywords
Source of related terms:
• Art dictionaries
• Textbooks
• Journal papers
9. Digital Archives Thesaurus: AAT hierarchy adaption
Chinese Art and Artifact Subsets: [concepts and guide terms: 3,088] / [terms: 4,538]
Contribution to AAT
Equivalence relation
AAT
dat
10. Digital Archives Thesaurus: knowledge extraction from Chinese text
Digital Archives Thesaurus
CKIP segmentation process
銀鍍金 (gilt silver)
纍絲 (filigree)
點翠 (kingfisher-feather inlay)
珠寶 (jewels)
花蝶 (flowers and butterflies)
簪 (hair pin)
term extraction
tagged term
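The segmentation and term-extraction step above can be illustrated with a minimal sketch. It is not the CKIP system: it swaps in a simple greedy forward maximum-matching segmenter over a tiny, hypothetical dictionary that holds only the six terms shown on this slide. The names `DICTIONARY` and `segment` are assumptions made for illustration.

```python
# Hypothetical mini-dictionary: only the six terms from the slide.
# The real CKIP lexicon and the dat thesaurus are far larger.
DICTIONARY = {"銀鍍金", "纍絲", "點翠", "珠寶", "花蝶", "簪"}


def segment(text, dictionary):
    """Greedy forward maximum matching (a stand-in for CKIP, not CKIP itself).

    At each position, take the longest dictionary entry that matches;
    if none matches, emit the single character as an unknown token.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


title = "銀鍍金纍絲點翠珠寶花蝶簪"
terms = segment(title, DICTIONARY)
print(terms)  # ['銀鍍金', '纍絲', '點翠', '珠寶', '花蝶', '簪']
```

Each extracted term can then be looked up in the thesaurus and attached to the object as a tag.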
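11. Digital Archives Thesaurus: concept-terms-object
Concept N
洋彩 (yangcai), 瓷胎洋彩 (yangcai on porcelain body)
tag n
瓶 (bottle), 紙槌瓶 (mallet-shaped vase), 蕉葉紋 (banana-leaf pattern), 番蓮紋 (Indian lotus pattern), 開光 (reserved panel)
內填琺瑯 (champlevé), 如意雲紋 (ruyi cloud pattern), 磁胎 (porcelain body), 銅胎 (copper body), 錦地 (brocade ground)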
12. Digital Archives Thesaurus: disambiguation
Disambiguation skills
• Homograph distinguished by prefix
• Subject restriction
• DC elements restriction
Example of ambiguity
金 in Chinese may represent
• Metal (material)
• Gold (material)
• Golden (color)
• Jin Dynasty (styles and periods)
13. Digital Archives Thesaurus: Disambiguation - Part I
Homograph distinguished by prefix
青花 (blue-and-white porcelain) as a type of object
青花 (ching-hwa glaze) as glazing material
青花 with a prefix character 以, 用 or 由 (meaning 'use') → use ching-hwa glaze
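A minimal sketch of the prefix rule, assuming it is applied to raw description text one occurrence at a time. The function name `classify_homograph`, the sense labels, and the two sample strings are hypothetical; only the three prefix characters come from the slide.

```python
USE_PREFIXES = ("以", "用", "由")  # characters meaning 'use', per the slide


def classify_homograph(text, term="青花"):
    """Return one sense label per occurrence of `term` in `text`.

    An occurrence directly preceded by a 'use' character is read as the
    glazing material; any other occurrence is read as the object type.
    """
    senses = []
    start = text.find(term)
    while start != -1:
        preceded_by_use = start > 0 and text[start - 1] in USE_PREFIXES
        senses.append("glazing material" if preceded_by_use else "object type")
        start = text.find(term, start + len(term))
    return senses


# Hypothetical sample strings, not taken from the catalog.
print(classify_homograph("青花蓮紋瓶"))    # ['object type']
print(classify_homograph("以青花繪蓮紋"))  # ['glazing material']
```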
14. Digital Archives Thesaurus: Disambiguation - Part II
Subject restriction
琉璃 is tagged as a kind of glazed pottery in the 'Pottery' category
琉璃 is tagged as glass material in the 'Enamel and Glassware' category
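Subject restriction can be sketched as a lookup keyed on the term together with the subject category of the record. The table below contains only the two cases from the slide; the names `SENSES_BY_CATEGORY` and `resolve_by_subject` are assumptions for illustration.

```python
# (term, subject category) -> sense; only the two cases given on the slide.
SENSES_BY_CATEGORY = {
    ("琉璃", "Pottery"): "glazed pottery",
    ("琉璃", "Enamel and Glassware"): "glass material",
}


def resolve_by_subject(term, category):
    """Pick the sense of `term` allowed in `category`, or None if unknown."""
    return SENSES_BY_CATEGORY.get((term, category))


print(resolve_by_subject("琉璃", "Pottery"))               # glazed pottery
print(resolve_by_subject("琉璃", "Enamel and Glassware"))  # glass material
```

Returning None for an unlisted pair leaves the term untagged instead of guessing a sense.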
15. Digital Archives Thesaurus: Disambiguation - Part III
DC element restriction
1. DC elements used: title, type, date, subject, description
2. Example in title
An object type term (簪 = hair pin) must be the last word of the title
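The title rule can be sketched as a check on the last token of an already segmented title. The set `OBJECT_TYPE_TERMS` and the function `object_type_from_title` are hypothetical names, and the second sample token list is invented for illustration.

```python
# Hypothetical sample of object type terms; 簪 = hair pin, 瓶 = bottle.
OBJECT_TYPE_TERMS = {"簪", "瓶"}


def object_type_from_title(title_tokens):
    """Accept an object type term only when it is the last word of the title."""
    if title_tokens and title_tokens[-1] in OBJECT_TYPE_TERMS:
        return title_tokens[-1]
    return None


print(object_type_from_title(["銀鍍金", "纍絲", "點翠", "珠寶", "花蝶", "簪"]))  # 簪
print(object_type_from_title(["簪", "花蝶"]))  # None: 簪 is not in final position
```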
16. A Prototype System
• Framework overview
• How is a Chinese Bottle semantically represented in RDF triples?
A beta dat ontology
• domain knowledge representation of the Chinese Artifacts
• descriptions for curation and publication of the Artifacts
• about data reusing and the use of the R4R Ontology
Speaker (2): Andrea Wei-Ching Huang
20. How is this Chinese Bottle semantically represented in RDF triples through our prototype?
21. The Prototype – I
Overview of a Linked Data Prototype System using dat (Digital Archives Thesaurus) & dat Ontology
Describing and representing for publishing the concept relations between the Chinese Artifacts of the Digital Archives Taiwan and the Digital Archives Thesaurus.
[Diagram labels: Union Catalog Metadata; Digital Archives Thesaurus; The dat beta ontology; Chinese Artifacts; Relational Database; Semantic Browsing; The dat concept making process; Chinese Knowledge and Information Processing (CKIP) Chinese Word Segmentation System; Segmented Keyword List; Keyword Extraction; Tag Extensions; Binary Relation]
22. Before The Prototype
[Diagram labels: Union Catalog Metadata; Chinese Knowledge and Information Processing (CKIP) Chinese Word Segmentation System; Segmented Keyword List; Keyword Extraction; landscape; landscape; Digital Archives Thesaurus; Tag Extensions; Binary Relation]
23. The Prototype – II
Overview of a Linked Data Prototype System using dat (Digital Archives Thesaurus)
[Diagram labels: Union Catalog Metadata; Chinese Knowledge and Information Processing (CKIP) Chinese Word Segmentation System; Segmented Keyword List; Keyword Extraction; landscape; landscape; Digital Archives Thesaurus; Tag Extensions; Binary Relation; landscape shanshui; Shanshui]
24. The Prototype – III
Overview of a Linked Data Prototype System using the dat Ontology
Describing and representing for publishing the concept relations between the Chinese Artifacts of the Digital Archives Taiwan and the Digital Archives Thesaurus.
[Diagram labels: Union Catalog Metadata; Digital Archives Thesaurus; The dat beta ontology; Chinese Artifacts; Relational Database; Semantic Browsing; Binary Relation]
Every artifact item has been assigned a dat URI.
25. The dat ontology – I
The Core Ontology: intellectual semantics of the Chinese Artifacts
[Diagram labels: Artifact; Concept (dat); Concept (aat); Tag]
dat:artifactType
dat:componentForm
dat:decorationSubject
dat:describedSubject
dat:designElement
dct:created
dct:instructionalMethod
dct:medium
schema:color
dat:hasTag
skos:narrower
black ovals are the main modeling resources
white ovals are resources defined by local class definitions
grey ovals are resources defined by external class definitions
dashed lines indicate mapping relation tasks not completed
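As a rough illustration of how one artifact might be described with the properties listed on this slide, here is a minimal sketch in plain Python that builds a few triples and prints them in N-Triples form. Several things are assumptions, not taken from the slides: the `http://example.org/dat/` base and every item, concept and tag identifier are hypothetical; using the ontology URL from the reference list as the `dat:` namespace is a guess; and the direction of each property (Artifact to Concept or Tag) is inferred from the property names.

```python
PREFIXES = {
    # Ontology URL from the reference list; its use as the namespace is an assumption.
    "dat": "http://dat.digitalarchives.tw/ontology/",
    "dct": "http://purl.org/dc/terms/",
}
BASE = "http://example.org/dat/"  # hypothetical base for item, concept and tag URIs


def expand(curie):
    """Expand a prefixed name such as 'dat:hasTag' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def triple(subject, predicate, obj):
    """Build one (subject, predicate, object) triple of full IRIs."""
    return (BASE + subject, expand(predicate), BASE + obj)


bottle = "artifact/bottle-001"  # hypothetical identifier for the Chinese bottle
triples = [
    triple(bottle, "dat:artifactType", "concept/bottle"),
    triple(bottle, "dat:designElement", "concept/banana-leaf-pattern"),
    triple(bottle, "dct:medium", "concept/porcelain-body"),
    triple(bottle, "dat:hasTag", "tag/champleve"),
]


def to_ntriples(rows):
    """Serialize IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


print(to_ntriples(triples))
```

A production setup would more likely generate such triples from the relational database with a mapping tool and load them into the triple store; the sketch only shows the shape of the data.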
26. The dat ontology – II
Curation & Publication: descriptions of the modelling objects
[Diagram labels: dcat:Dataset; Artifact; ObjectName; Source; UnionCatalog; rdfs:subClassOf; dct:title; schema:url; dat:Provenance]
We use popular vocabularies such as DC terms and schema.org to relate an Artifact to its preservation and technical descriptions.
27. The dat ontology – III
Reusing: descriptions of the modelling object to associated publications and policy used
[Diagram labels: dcat:Dataset; Artifact; r4r:isPartOf; r4r:hasProvenance; r4r:RRObject; rdfs:subClassOf; dat:Provenance]
We do not use common vocabularies to describe the relations between Artifact and Dataset, because we wish to publish the dataset so that it can be reused by others.
In particular, we wish to publish a URI for this resource that can support dynamic contexts:
(1) Metadata: ready or not ready?
(2) Publish only, or can be reused?
(3) Joint publications such as article, data and code.
28. The dat ontology
Beta: An Ontology for Publishing Chinese Artifacts as Linked Data Using the Digital Archives Thesaurus (dat)
[Diagram labels: dcat:Dataset; Artifact; Concept (dat); ObjectName; Source; Concept (aat); Tag; UnionCatalog; skos:Concept; r4r:RRObject; dat:Provenance; preservation & technical descriptions of modelling objects]
dat:artifactType
dat:componentForm
dat:decorationSubject
dat:describedSubject
dat:designElement
dct:created
dct:instructionalMethod
dct:medium
schema:color
r4r:isPartOf
r4r:hasProvenance
rdfs:subClassOf
dat:hasTag
dct:title
schema:url
skos:narrower
skos:broader
skos:related
black ovals are the main modeling resources
white ovals are resources defined by local class definitions
grey ovals are resources defined by external class definitions
dashed lines indicate mapping relation tasks not completed
35. Future Works - III
[Diagram labels: dcat:Dataset; 16 other catalogs; Concept (domain local); Source; ObjectName; Concept (domain external); Tag; UnionCatalog; skos:Concept; r4r:RRObject; dat:Provenance; Wikipedia; cross-domain dat thesaurus]
r4r:isPartOf
r4r:hasProvenance
rdfs:subClassOf
dat:hasTag
dct:title
schema:url
skos:narrower
skos:broader
skos:related
36. Reference
Article:
Bizer, Christian, and Richard Cyganiak. "D2r server-publishing relational databases on the semantic web." Poster at the 5th International Semantic Web Conference. 2006.
Bizer, Chris, Richard Cyganiak, and Tom Heath. "How to publish linked data on the web." (2007).
Huang, Andrea Wei-Ching, and Tyng-Ruey Chuang. "Relations for Reusing (R4R) in a Shared Context: An Exploration on Research Publications and Cultural Objects." Proc. of the 4th International Workshop on Semantic Digital Archives (SDA), in conjunction with International Digital Libraries Conference (DL2014), London, 8th-12th September 2014.
Malmsten, Martin. "Making a library catalogue part of the semantic web." Universitätsverlag Göttingen (2008): 146.
OCLC Linked Data, http://oclc.org/developer/develop/linked-data.en.html
LC Linked Data Service: Authorities and Vocabularies, http://id.loc.gov/
Code:
D2R, Database to RDF mapping engine and SPARQL server, http://d2rq.org/, https://github.com/d2rq/d2rq
Huang, Andrea Wei-Ching, and Tyng-Ruey Chuang, Relations for Reusing (R4R) Ontology, http://guava.iis.sinica.edu.tw/r4r
Huang, Andrea Wei-Ching, Chung-Hsi Hung, Wan-Jung Shu, Keh-Jiann Chen, and Tyng-Ruey Chuang, Beta: An Ontology for Publishing Chinese Artifacts as Linked Data Using the Digital Archives Thesaurus (dat), http://dat.digitalarchives.tw/ontology/