Sinica BOW and Chinese Wordnet: Methods and Practices

                         謝舒凱 (Shu-Kai Hsieh)
Lab of Ontologies, Language Processing and e-Humanities,
                         NTNU
  CWN group, Institute of Linguistics, Academia Sinica
                    shukai@gmail.com



                     April 24, 2010
Background



Sinica BOW and Chinese Wordnet (CWN)



On-going Efforts and Future Perspectives
Who are We?




       We are a group of people with plenty of "sense" who also know how to work "relations"
What Have We Been Working On?


  Language Resources Construction, Evaluation and Knowledge
  Modelling:
      Corpus (語料庫): ASBC, LDC-Gigaword, twWaC (balanced,
      domain and social media)
      Lexicon (詞彙知識庫): Core Vocabulary, Domain lexicon
      knowledge base
      Ontology (知識本體): Sinica BOW (SUMO),
      KYOTO-DOLCE, Hanzi/radical Ontology, Domain
      ontologies
Corpus and Query Tools
Ontology and Cross-languages Validation
   SUMO Chinese example (figure; node labels: 時間 "time", 時點 "time point", 球面度 "steradian", 國際單位 "SI unit")
Lexicon




      Corpus distribution-based approach
      Simulation-based computational approach
      (Psycho-) linguistic approach
Latent Semantics in the Mental Lexicon
Random Walk in the Mental Lexicon
WordNet
WordNet Browser (e.g., Dubey)
Bootstrapping Bilingual Wordnet (I): Sinica BOW
Bootstrapping Bilingual Wordnet (II): GoogleCWN
Chinese-anchored Bilingual Wordnet from Scratch
Methodologies, Issues and Solutions




    1. Word segmentation and selection (frequency and lexical
       semantic theory-based)
    2. Word sense distinction: synset (同義詞集), sense (詞義),
       meaning facet (義面), variant word forms (異體詞)
    3. Word sense relations: LSR algebra (transitivity in the
       network), paronymy, troponymy, morpho-semantic relations,
       etc.
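
The "transitivity in the network" point in item 3 can be illustrated with a minimal sketch. It assumes lexical semantic relations are stored as directed (source, target) pairs and that the relation in question (here hypernymy) is transitive. The function name `transitive_closure` and the sample word pairs are hypothetical; they are not taken from CWN.

```python
from collections import defaultdict

def transitive_closure(edges):
    """Return every (a, c) pair reachable by following edges one or more times."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    closure = set()
    for start in list(graph):
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating new keys in the defaultdict while iterating
            stack.extend(graph.get(node, ()))
    return closure

# Hypothetical hyponym -> hypernym links, for illustration only.
hypernym_edges = [("poodle", "dog"), ("dog", "animal"), ("animal", "organism")]

# Links implied by transitivity but not stated explicitly.
inferred = transitive_closure(hypernym_edges) - set(hypernym_edges)
```

Only relations that are actually transitive should be closed this way; applying the same step to a non-transitive relation would generate false links.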
Implementation




    1. From MS Access to MySQL database.
    2. Python-NLTK modules for CWN (and other resources)
    3. Convert to LMF-compatible markup
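
Step 3 might look roughly like the sketch below. It assumes a lexical entry is available as a lemma, a part of speech, and a list of (sense id, gloss) pairs. The element names (`LexicalEntry`, `Lemma`, `Sense`, `feat`) follow common LMF serializations; the function `entry_to_lmf`, the record layout, and the sample values are hypothetical and do not reflect the actual CWN schema.

```python
import xml.etree.ElementTree as ET

def entry_to_lmf(lemma, pos, senses):
    """Build an LMF-style LexicalEntry element from a simple record."""
    entry = ET.Element("LexicalEntry")
    ET.SubElement(entry, "feat", att="partOfSpeech", val=pos)

    lemma_el = ET.SubElement(entry, "Lemma")
    ET.SubElement(lemma_el, "feat", att="writtenForm", val=lemma)

    for sense_id, gloss in senses:
        sense_el = ET.SubElement(entry, "Sense", id=sense_id)
        ET.SubElement(sense_el, "feat", att="gloss", val=gloss)
    return entry

# Hypothetical record, for illustration only.
entry = entry_to_lmf(
    "詞網",
    "noun",
    [("s1", "a lexical database organized by sense relations"),
     ("s2", "a network of words")],
)
xml_text = ET.tostring(entry, encoding="unicode")
```

In a full pipeline, rows read from the MySQL database in step 1 would feed a function of this shape, and the resulting entries would be wrapped in the enclosing `Lexicon` and `LexicalResource` elements.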
Lexicon Standard and Markup Languages




      LMF (Lexical Markup Framework)
      GLML(Generative Lexicon Markup Language)
      KAF (KYOTO-Annotation Format)
KAF Example
Current status
Toward a Global Wordnet Grid




      HanziGrid among CJKV (partly done with Chinese Hanzi and
      Japanese Kanji mapping)
      Chinese-Italian WordNet Web Service (RDF/OWL
      representation as a data model for Semantic Web)
      Global Wordnets Sense Tagging (Environmental domain for
      SemEval 2010)
Toward a Mashup Approach to a Dynamic LKB: Wordnik




  Test online
Toward a Better Understanding of Lexical and Social
Networks
KYOTO-CWN WORKSHOP




    Around mid September
    Release of tools, resources, technical reports, browsing system
      You are warmly invited to take part; your criticism, advice, and collaboration are all welcome. Thank you!
