Union Catalog and Knowledge
Engineering for TELDAP
Keh-Jiann Chen
Principal Investigator
Core Platforms for Digital Contents Project, TELDAP
Research Fellow
Research Center for Information Technology Innovation &
Institute of Information Science, Academia Sinica
Outline
• Introduction
• Union catalog
• Databases and metadata for digital contents and websites
• Knowledge engineering
• Future perspective
Introduction
The integration and management of digital content have become important issues as the amount of digital content produced by different projects and institutions increases rapidly.
The goal of our project is to achieve optimized preservation, retrieval, and presentation of digital collections.
Union catalog
What is the union catalog?
• It is a catalog and portal for all digital collections of TELDAP.
• It is an integrated platform for browsing and searching the entire digital content of TELDAP.
• Its metadata provides the core description and licensing information for each digital collection.
Home page of the Union Catalog
• Browse by topic
• Search by keyword
Databases and metadata for digital contents and websites
Metadata models for different types of objects
Archived digital items
• Union catalog metadata model: Dublin Core+
Web sites
• DCCAP (Dublin Core Collections Application Profile)
• Fields for internal use only
– Unique Identifier, Format, Evaluation, Cataloging History
Documents
• Document metadata: Dublin Core
Metadata for digital items
Over 3 million digital items and still increasing
Element: Definition
Title: A name given to the resource
Creator: An entity primarily responsible for making the content of the resource
Subject and Keywords: The topic of the content of the resource
Description: An account of the content of the resource
Publisher: An entity responsible for making the resource available
Contributor: An entity responsible for making contributions to the content of the resource
Date: A date associated with an event in the life cycle of the resource
Resource Type: The nature or genre of the content of the resource
Format: The physical or digital manifestation of the resource
Resource Identifier: An unambiguous reference to the resource within a given context
Source: A reference to a resource from which the present resource is derived
Language: A language of the intellectual content of the resource
Relation: A reference to a related resource
Coverage: The extent or scope of the content of the resource
Rights Management: Information about rights held in and over the resource
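As an illustration of how a record using these elements might be serialized, here is a minimal Python sketch. It assumes the standard Dublin Core element namespace; the `record` wrapper element, the function name, and all sample values are hypothetical and are not taken from an actual TELDAP record.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_dublin_core_xml(record):
    """Serialize a {element_name: [values]} mapping as a simple XML record."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in record.items():
        for value in values:
            # Repeatable elements simply appear once per value.
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
sample = {
    "title": ["Sample painting"],
    "creator": ["Unknown artist"],
    "subject": ["painting", "landscape"],
    "type": ["Image"],
    "rights": ["Example licensing statement"],
}

xml_text = to_dublin_core_xml(sample)
print(xml_text)
```

Keeping every element as a list of values reflects that Dublin Core elements are optional and repeatable, which is why `subject` can carry two keywords here.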
Metadata for websites
Over 500 websites and still increasing
Metadata
• DCCAP (Dublin Core Collections Application
Profile)
• Total of 19 data fields
Metadata for websites
Fields shown in the figure (http://digitalarchives.tw):
• The website homepage picture
• URL, Project Information
• Type, Name, Author, Subject, Description, Language, Item Type, Target
• Archived Information: URL, time, authorization
• Copyright, Purpose, Other Information
Dynamic categorization
• User-oriented categorization
– General, elementary school students, high school students, researchers, etc.
• Topic-based categorization
– Archaeology, painting, animal, plant, document, etc.
• Function-based categorization
– Research, education, business, technology, etc.
• Categorization based on institutions
– Academia Sinica, Taiwan U., Palace Museum, etc.
Digitalarchives.tw
Category labels shown in the figure (http://digitalarchives.tw):
• Purpose: Education
• Target: Elementary school student, Junior high school student, Teacher…
• Purpose: Creative applications
• Purpose: Academic research
• Subject: Animal, Archaeology, Anthropology…
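Dynamic categorization of this kind can be sketched as grouping website records by one metadata facet at a time. The following Python example is only an illustration: the record layout, the site names, and the `categorize` helper are hypothetical, and the field names simply mirror the Purpose, Target, and Subject labels above.

```python
from collections import defaultdict

# Hypothetical website records with multi-valued facet fields.
websites = [
    {"name": "Site A", "purpose": ["Education"],
     "target": ["Elementary school student", "Teacher"], "subject": ["Animal"]},
    {"name": "Site B", "purpose": ["Academic research"],
     "target": ["Researcher"], "subject": ["Archaeology", "Anthropology"]},
    {"name": "Site C", "purpose": ["Education", "Creative applications"],
     "target": ["Teacher"], "subject": ["Archaeology"]},
]


def categorize(records, facet):
    """Group record names by every value of the chosen facet field."""
    groups = defaultdict(list)
    for record in records:
        for value in record.get(facet, []):
            groups[value].append(record["name"])
    return dict(groups)


by_purpose = categorize(websites, "purpose")
by_subject = categorize(websites, "subject")
print(by_purpose)
print(by_subject)
```

Because the grouping is computed from the metadata on request, the same set of records can be presented by user group, topic, function, or institution without maintaining separate category lists.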
Metadata for project documents
Over 14,000 documents and still increasing
Metadata: Dublin Core
Construct Teldapwiki, a Wikipedia for TELDAP: http://wiki.teldap.tw/
Knowledge engineering
Plans for building knowledge structures for TELDAP
• Construct metadata models for different objects.
• Establish hyperlinks between contents and objects.
– Develop keyword extraction tools.
– Design automatic hyperlink tagging tools.
• Construct the TELDAP ontology and thesaurus.
– Art & Architecture Thesaurus by Getty
– Chinese WordNet
(1) Metadata models for different objects
• Digital collections
– Union catalog metadata model: Dublin Core+
• Web sites
– DCCAP (Dublin Core Collections Application Profile)
– Public fields
– Private fields: Unique Identifier, Format, Evaluation, Cataloging History
• Documents
– Document metadata: Dublin Core
(2) Establish hyperlinks between contents
and objects
• Identify keywords in contents.
• Tag keywords with related object hyperlinks.
Develop hyperlink tagging tools
• Word segmentation tools
– Resolve word segmentation ambiguities and identify
keywords.
– CKIP word segmentation system:
http://ckipsvr.iis.sinica.edu.tw/
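To make the segmentation step concrete, here is a toy Python sketch of dictionary-based forward maximum matching. This is a deliberately simple stand-in, not the method used by the CKIP system, and it does not resolve real ambiguities; the function name and the toy dictionary are assumptions made for illustration.

```python
def segment(text, dictionary):
    """Split text greedily, always taking the longest dictionary word available."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary for illustration only.
toy_dictionary = {"故宮", "博物院", "故宮博物院", "典藏", "數位", "數位典藏"}
tokens = segment("故宮博物院數位典藏", toy_dictionary)
print(tokens)
```

Greedy longest matching prefers compound keywords over their parts, which is usually what a keyword dictionary wants, but it can pick the wrong split where words overlap; that is the ambiguity a full segmentation system has to resolve.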
Develop hyperlink tagging tools
• TELDAP keyword dictionary
– Extract keywords from metadata and establish object-keyword relations.
▪ Extract text from the XML data for each object.
▪ Classify the text by topic, title, description, author, location, era, etc.
▪ From each class of text file, extract keywords by automatic word segmentation, keyword extraction, and manual post-editing.
– The current dictionary contains more than 50,000 keywords.
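The dictionary-building steps above can be sketched roughly as follows. The XML layout, object identifiers, and function names are hypothetical, since the slides do not give the actual schema, and plain substring matching against a candidate list stands in for the segmentation, keyword extraction, and manual post-editing stages.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout with illustrative content.
SAMPLE_XML = """
<objects>
  <object id="obj-001">
    <title>Bronze vessel</title>
    <description>A ritual bronze vessel from an archaeological excavation</description>
  </object>
  <object id="obj-002">
    <title>Landscape painting</title>
    <description>Ink painting of a mountain landscape</description>
  </object>
</objects>
"""


def extract_field_text(xml_text):
    """Return {object_id: {field_name: text}}, keeping text classified by field."""
    root = ET.fromstring(xml_text)
    return {
        obj.get("id"): {child.tag: (child.text or "").strip() for child in obj}
        for obj in root.iter("object")
    }


def build_keyword_index(objects, candidate_keywords):
    """Map each keyword to the set of (object_id, field) pairs where it occurs."""
    index = defaultdict(set)
    for object_id, fields in objects.items():
        for field, text in fields.items():
            lowered = text.lower()
            for keyword in candidate_keywords:
                if keyword.lower() in lowered:
                    index[keyword].add((object_id, field))
    return dict(index)


objects = extract_field_text(SAMPLE_XML)
index = build_keyword_index(objects, ["bronze", "painting", "landscape"])
print(index)
```

Recording the field alongside the object identifier keeps the classification by title, description, and so on, so a later step could weight a title match more heavily than a description match.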
Prototype system for hyperlink tagger
• Identify and select keywords from the input text.
• Produce text with keywords and hyperlinks.
• Hyperlinks point to the related digital collections.
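The tagging step can be illustrated with a short Python sketch that wraps known keywords in HTML links. It is an assumption-laden example rather than the prototype itself: the `tag_hyperlinks` function, the keyword table, and the example.org URLs are all hypothetical placeholders for real collection pages.

```python
import html
import re


def tag_hyperlinks(text, keyword_urls):
    """Wrap each known keyword in the text with a link to its collection URL."""
    if not keyword_urls:
        return html.escape(text)
    # Longest keywords first so that longer terms win over their substrings.
    ordered = sorted(keyword_urls, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(keyword) for keyword in ordered))
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        parts.append(html.escape(text[last:match.start()]))
        keyword = match.group(0)
        url = html.escape(keyword_urls[keyword], quote=True)
        parts.append('<a href="%s">%s</a>' % (url, html.escape(keyword)))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


# Hypothetical keyword-to-URL table.
links = {
    "bronze": "https://example.org/collections/bronze",
    "bronze vessel": "https://example.org/collections/bronze-vessel",
}
tagged = tag_hyperlinks("A bronze vessel & a bronze mirror", links)
print(tagged)
```

Building one combined pattern means the text is scanned once and no keyword is linked inside an already generated link, which repeated search-and-replace passes would risk.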
(3) Construct TELDAP ontology and thesaurus
• Establish association links between Chinese keywords and the Getty AAT.
• Merge TELDAP keywords with the Chinese AAT.
Future perspective
• Technology development
– Construct multilingual thesauri (extend the Getty AAT).
– Maintain the TELDAP keyword-and-object relation database.
– Construct name authority files, gazetteers, and universal calendars.
– Design hyperlink taggers and keyword extension tools.
– Design an authoring tool that automatically provides hyperlinks to keyword-related digital content.
– Design a knowledge-based content retrieval system.
Future perspective (continued)
• Content enrichment
– Within TELDAP:
▪ Standardize the object metadata model and data format.
▪ Provide object metadata in a controlled vocabulary.
▪ Write scripts and stories for different topics with a Wiki-like knowledge structure.
▪ Enrich the digital collections.
▪ Establish hyperlinks between textbooks and TELDAP collections.
– Extend the knowledge sources, e.g. Wikipedia.