1. Sinica BOW and Chinese Wordnet: Methods and Practice
Shu-Kai Hsieh (謝舒凱)
Lab of Ontologies, Language Processing and e-Humanities,
NTNU
CWN group, Institute of Linguistics, Academia Sinica
shukai@gmail.com
April 24, 2010
5. What Have We Been Working On?
Language resource construction, evaluation, and knowledge modelling:
Corpora: ASBC, LDC-Gigaword, twWaC (balanced, domain-specific, and social media)
Lexicons (lexical knowledge bases): core vocabulary, domain lexicon knowledge base
Ontologies: Sinica BOW (SUMO), KYOTO-DOLCE, Hanzi/radical ontology, domain ontologies
17. Methodologies, Issues and Solutions
1. Word segmentation and selection (frequency-based and lexical-semantic-theory-based)
2. Word sense distinction: synset, sense, meaning facet, and variant words
3. Word sense relations: LSR algebra (transitivity in the network), paronymy, troponymy, morpho-semantic relations, etc.
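The transitivity noted under LSR algebra can be illustrated with a small sketch: assuming hypernymy is transitive, every ancestor of a synset can be inferred from its direct links alone. The synset IDs and links below are hypothetical toy data, not CWN entries, and the sketch does not model paronymy, troponymy, or morpho-semantic relations.

```python
from collections import deque

# Hypothetical toy network: synset ID -> set of direct hypernym IDs.
# These IDs and links are illustrative only, not actual CWN data.
HYPERNYMS: dict[str, set[str]] = {
    "dog": {"canine"},
    "canine": {"carnivore"},
    "carnivore": {"animal"},
    "animal": set(),
}


def transitive_hypernyms(synset: str, graph: dict[str, set[str]]) -> set[str]:
    """Return every synset reachable by following hypernym links (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(synset, set()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guard against cycles in hand-built data
        seen.add(current)
        queue.extend(graph.get(current, set()))
    return seen


def is_hypernym(upper: str, lower: str, graph: dict[str, set[str]]) -> bool:
    """Infer that `upper` is an ancestor of `lower` by transitivity, even without a direct link."""
    return upper in transitive_hypernyms(lower, graph)


print(sorted(transitive_hypernyms("dog", HYPERNYMS)))
print(is_hypernym("animal", "dog", HYPERNYMS))
```

Only direct links are stored; the inferred relations are computed on demand, which keeps the stored network small at the cost of a traversal per query.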
18. Implementation
1. Migration from MS Access to a MySQL database
2. Python-NLTK modules for CWN (and other resources)
3. Conversion to LMF-compatible markup
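As a rough sketch of the relational side of this work, the code below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server). The `sense` table, its columns, the sample rows, and the helper `senses_of` are all hypothetical; they do not reflect the actual CWN schema or the CWN Python-NLTK modules.

```python
import sqlite3

# Hypothetical schema: the real CWN tables are not described in the slides.
SCHEMA = """
CREATE TABLE sense (
    sense_id   TEXT PRIMARY KEY,
    lemma      TEXT NOT NULL,
    pos        TEXT NOT NULL,
    definition TEXT NOT NULL
);
"""

# Illustrative rows only (glosses written in English for readability).
ROWS = [
    ("s001", "打", "V", "to strike with the hand or a tool"),
    ("s002", "打", "V", "to make a phone call"),
    ("s003", "書", "N", "a bound written work"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database (SQLite standing in for MySQL) and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sense VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def senses_of(conn: sqlite3.Connection, lemma: str) -> list[tuple[str, str, str]]:
    """Return (sense_id, pos, definition) for every sense of a lemma, ordered by ID."""
    cur = conn.execute(
        "SELECT sense_id, pos, definition FROM sense WHERE lemma = ? ORDER BY sense_id",
        (lemma,),
    )
    return cur.fetchall()


conn = build_db()
for sense_id, pos, definition in senses_of(conn, "打"):
    print(sense_id, pos, definition)
```

Parameterised queries keep lemma lookups safe from injection; with a MySQL driver the placeholder style may differ (for example `%s` instead of `?`).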
19. Lexicon Standard and Markup Languages
LMF (Lexical Markup Framework)
GLML (Generative Lexicon Markup Language)
KAF (KYOTO Annotation Format)
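A minimal sketch of LMF-style serialization with Python's standard xml.etree.ElementTree, assuming entries are already available as simple tuples. The element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`) and the `feat att=... val=...` pattern follow common LMF usage, but this output is not validated against the ISO 24613 DTD, and the entry and synset IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical entries: (entry_id, lemma, pos, [(sense_id, synset_id), ...]).
ENTRIES: list[tuple[str, str, str, list[tuple[str, str]]]] = [
    ("e1", "書", "N", [("e1_s1", "syn_book")]),
]


def feat(parent: ET.Element, att: str, val: str) -> ET.Element:
    """Attach an LMF-style <feat att="..." val="..."/> child to `parent`."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def to_lmf(entries, language: str = "zh") -> ET.Element:
    """Build an LMF-style element tree (illustrative; not schema-validated)."""
    root = ET.Element("LexicalResource")
    lexicon = ET.SubElement(root, "Lexicon")
    feat(lexicon, "language", language)
    for entry_id, lemma, pos, senses in entries:
        entry = ET.SubElement(lexicon, "LexicalEntry", id=entry_id)
        feat(entry, "partOfSpeech", pos)
        lemma_el = ET.SubElement(entry, "Lemma")
        feat(lemma_el, "writtenForm", lemma)
        for sense_id, synset_id in senses:
            # Each sense points to its synset by ID rather than nesting it.
            ET.SubElement(entry, "Sense", id=sense_id, synset=synset_id)
    return root


print(ET.tostring(to_lmf(ENTRIES), encoding="unicode"))
```

Linking senses to synsets by ID keeps synsets as separate, shareable objects, which is what makes later cross-resource mapping possible.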
22. Toward a Global Wordnet Grid
HanziGrid across CJKV (partly done: Chinese Hanzi to Japanese Kanji mapping)
Chinese-Italian WordNet Web Service (RDF/OWL representation as a data model for the Semantic Web)
Global wordnet sense tagging (environmental domain, SemEval-2010)
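A minimal sketch of grid-style linking, assuming each wordnet tags its synsets with a shared concept ID that serves as the pivot between languages. The `ili:` IDs, the two toy wordnets, and the helper names are hypothetical. The character table is a tiny sample of traditional Hanzi to Japanese shinjitai Kanji pairs, far from a complete HanziGrid mapping.

```python
# Tiny sample of traditional Chinese Hanzi -> Japanese (shinjitai) Kanji forms.
HANZI_TO_KANJI = {"學": "学", "國": "国", "體": "体"}

# Hypothetical wordnets: lemma -> shared concept ID used as the grid's pivot.
CHINESE = {"學生": "ili:student", "國家": "ili:country"}
JAPANESE = {"学生": "ili:student", "国家": "ili:country"}


def to_kanji(word: str) -> str:
    """Map each Hanzi to its Kanji counterpart where the sample table has one."""
    return "".join(HANZI_TO_KANJI.get(ch, ch) for ch in word)


def link_by_concept(src: dict[str, str], tgt: dict[str, str]) -> dict[str, list[str]]:
    """Pair each source lemma with the target lemmas that share its concept ID."""
    by_concept: dict[str, list[str]] = {}
    for lemma, concept in tgt.items():
        by_concept.setdefault(concept, []).append(lemma)
    return {lemma: by_concept.get(concept, []) for lemma, concept in src.items()}


links = link_by_concept(CHINESE, JAPANESE)
for zh, ja in links.items():
    # Also report whether each pair matches orthographically after character mapping.
    print(zh, ja, [to_kanji(zh) == j for j in ja])
```

The concept ID does the cross-lingual linking; the character mapping is an independent check that is useful for CJKV languages sharing a script.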