Cross-lingual ontology lexicalisation, translation and information extraction. Net2 workshop, University of South Africa. Tobias Wunner, DERI, National University of Ireland, Galway. Example vocabulary terms: ifrs:Revenue, us-gaap:GainLossOnSaleOfOilAndGasProperty, de-gaap:BilanzsummeSummeAktiva, be-gaap:MinderwaardenBijDeRealisatieVanVasteActiva
Context and Motivation Monnet use case in financial domain query financial information Cross-vocabulary Cross-lingual Get result in your own language Research challenges localization & translation of vocabularies cross-lingual ontology-based information extraction
Finance terminology is complex! “minimum finance lease payments receivable, at present value, end of period not later than one year”: a representative term of the financial domain
complex structure (conceptually & linguistically)
Break down complexity: 3-faceted lexical enrichment (semantic, linguistic, terminological). Term decomposition of “available-for-sale financial asset”. Terminological: [asset] [available-for-sale] [financial] [financial asset] [non-financial asset] [available-for-sale financial asset]. Linguistic: Noun_Sing: asset; Noun_Plural: assets; ?P: available-for-sale; Adjective: financial; NP: available-for-sale fin. asset; VP: to sell financial assets. Semantic: financial asset is-a asset; non-financial asset is-a asset; available-for-sale financial asset is-a financial asset.
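The terminological facet of the decomposition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KNOWN` set is a hand-made stand-in for a real term base, and `decompose` is a hypothetical helper, not part of any Monnet tool.

```python
def decompose(term, known_subterms):
    """Return the known subterms found as contiguous word sequences in `term`,
    longest first."""
    words = term.split()
    found = []
    # Try every window size from the full term down to single words.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            candidate = " ".join(words[start:start + length])
            if candidate in known_subterms and candidate not in found:
                found.append(candidate)
    return found


# Hypothetical term base, copied from the bracketed subterms on the slide.
KNOWN = {
    "asset",
    "available-for-sale",
    "financial",
    "financial asset",
    "non-financial asset",
    "available-for-sale financial asset",
}

subterms = decompose("available-for-sale financial asset", KNOWN)
print(subterms)
```

"non-financial asset" is in the term base but is not returned, because it does not occur inside the analysed term.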
XBRL – Semantic Analysis “Enhance semantics to facilitate translation and information extraction.”
XBRL – Terminological Analysis. Concepts: ifrs:MinimumFinanceLeasePaymentsReceivableAtPresentValue, ifrs:MinimumFinanceLeasePaymentsReceivable. Term: “Minimum finance lease payments receivable, at present value”. Linked terminology resources: sapTerm:payments, googleDefine:leasePayments, sapTerm:financeLease, googleDefine:Finance_lease. Subterm labels (in slide order): Domain Independent, Domain Related, Domain Specific, Domain Related, Domain Independent, Domain Independent, Domain Specific
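As a first step of such a terminological analysis, the words of a term can be recovered from the XBRL concept name itself. The sketch below is an assumption about how one might do this, not the method used in Monnet; `split_xbrl_name` is a hypothetical helper and it does not handle acronyms written in consecutive capitals.

```python
import re


def split_xbrl_name(qname):
    """Split a prefixed CamelCase concept name into (prefix, lower-cased words)."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No prefix present: treat the whole string as the local name.
        prefix, local = "", qname
    # A word is a capital followed by lower-case letters, a lower-case run, or digits.
    words = re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", local)
    return prefix, [w.lower() for w in words]


print(split_xbrl_name("ifrs:MinimumFinanceLeasePaymentsReceivableAtPresentValue"))
```

The resulting word list can then be matched against domain term bases to label each subterm as domain independent, domain related or domain specific.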
XBRL – Linguistic Analysis. XBRL term: minimum finance lease payments receivable. Variants in financial text: “… received minimum finance lease payments …” (verb); “… lease payment …” (singular); “… lease payments …” (plural). Further labels on the slide: complex, simple, adverb
Outline 1. Research challenge and motivation 2. Ontology Translation 3. Lexicalization (lemon) 4. CLOBIE (CL Ontology-based Inf. Extraction)
Translation using STL. Models developed in Monnet: English / German / Spanish / Dutch. … Net2: Afrikaans? Zulu? Xhosa? … Example terms: ifrs:MinimumFinanceLeasePaymentsPayable, ifrs:ProfitLossBeforeTax, ifrs:Revenue
Application in Machine Translation in Dutch: available-for-sale financial assets. 1. term analysis using IFRS, SAPTerm, GoogleDefine: [available-for-sale] [financial] [assets]. 2. translate subterms using domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate): [voor verkoop beschikbare] [financiële] [activa]. 3. term synthesis using grammars (rules, statistical models): voor verkoop beschikbare financiële activa
Application in Machine Translation in Afrikaans: available-for-sale financial assets. 1. term analysis using IFRS, SAPTerm, GoogleDefine: [available-for-sale] [financial] [assets]. 2. translate subterms using domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate): [beskikbaar vir verkoop] [finansiële] [bates]. 3. term synthesis using grammars (rules, statistical models): finansiële bates beskikbaar vir verkoop
Application in Machine Translation in Spanish: available-for-sale financial assets. 1. term analysis using IFRS, SAPTerm, GoogleDefine: [available-for-sale] [financial] [assets]. 2. translate subterms using domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate): [disponibles para la venta] [financieros] [activos]. 3. term synthesis using grammars (rules, statistical models): activos financieros disponibles para la venta
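The three steps on the translation slides (analysis, subterm translation, synthesis) can be sketched as a toy pipeline. Everything here is an illustrative assumption: the lookup table stands in for the translation memories and services named on the slides, the role labels and the per-language `ORDER` rules are hand-written simplifications of the "grammars" step, and the subterm translations are taken from the synthesised outputs on the slides.

```python
# Step 1 (analysis) is assumed to have produced role-tagged English subterms.
ANALYSED = [
    ("modifier", "available-for-sale"),
    ("adjective", "financial"),
    ("noun", "assets"),
]

# Step 2: hypothetical subterm translation table (stand-in for TM / LOD / MT services).
SUBTERM_TABLE = {
    "nl": {"available-for-sale": "voor verkoop beschikbare",
           "financial": "financiële", "assets": "activa"},
    "af": {"available-for-sale": "beskikbaar vir verkoop",
           "financial": "finansiële", "assets": "bates"},
    "es": {"available-for-sale": "disponibles para la venta",
           "financial": "financieros", "assets": "activos"},
}

# Step 3: hand-written word-order rule per target language (a stand-in for a grammar).
ORDER = {
    "nl": ("modifier", "adjective", "noun"),
    "af": ("adjective", "noun", "modifier"),
    "es": ("noun", "adjective", "modifier"),
}


def translate_term(subterms, lang):
    """Translate each subterm, then reorder the pieces for the target language."""
    translated = {role: SUBTERM_TABLE[lang][text] for role, text in subterms}
    return " ".join(translated[role] for role in ORDER[lang])


for lang in ("nl", "af", "es"):
    print(lang, translate_term(ANALYSED, lang))
```

The point of the sketch is that the subterm translations stay the same while only the synthesis rule differs per language.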
Outline 1. Research challenge and motivation 2. Ontology Translation 3. Lexicalization (lemon) 4. CLOBIE (CL Ontology-based Inf. Extraction)
Why do we need a lexicon? “Loads of unlinked domain-specific terminology on the web!” (e.g. http://en.wikipedia.org/wiki/Finance_lease, http://www.investopedia.com/terms/l/lease-payments.asp). An interoperable web for … ? Re-use; enabling multilinguality; cross-lingual search; cross-lingual fact extraction
Lexicon standards overview: ISO (XML), TEI (Text Encoding Initiative), LMF (Lexical Markup Framework); W3C & Semantic Web (RDF / OWL): built-in rdfs:label, lightweight linguistic representations (SKOS, SKOS-XL), rich linguistic representations (GOLD, LexInfo)
SKOS – Multilingual Information. SKOS concepts with… term relations, multilingual labels, resource references. Example: ifrs:MinimumFinanceLeasePayments with skos:related, skos:narrower and skos:broader links to dbpedia:Finance_lease and dbpedia:Lease_payments
SKOS – Multilingual Information: not much uptake yet? (example from http://data.nytimes.com/)
Ontology-Text Mismatch ‘Edificio-historico’ vs. ‘…edificio, declarado Monumento Histórico…’ >> goes beyond SKOS (monolingual & multilingual term variants) >> requires representation of lexical information to compute linguistic variants, e.g.   ‘edificio historico[apposVP[NP[Adj]]]’
A Lexicon Model for Ontologies. Requirements for an ‘ontology-lexicon’ model: (1) Represent linguistic information relative to the ontology: avoid unnecessary ambiguities by representing only lexical features relevant to the semantics of the underlying application. (2) Keep semantics separate from linguistic information: clearly separate ‘world’ knowledge (properties of objects referred to by words) from ‘word’ knowledge (properties of words). (3) Modular, minimal design: provide a simple core model that can easily be extended when needed.
Was there a solution already? SKOS (Simple Knowledge Organization System): a general model for formalizing thesauri, terminologies and related semantic and knowledge resources. Focus: formalization of terminology (terminology, classification and Semantic Web communities). It does not address the linguistic aspects of terminology, and therefore not the lexicon-ontology interface. http://www.w3.org/2004/02/skos/
Was there a solution already? GOLD (General Ontology for Linguistic Description): a community-based ontology of linguistics. Focus: linguistic study (linguistics community). A formal model of linguistics as an ontology, but not about connecting lexical features to ontological semantics. Other issues: very big, modularity? http://linguistics-ontology.org/gold/2010
Was there a solution already? OWN (OntoWordNet): a formal specification of WordNet through extension and axiomatization of its conceptual relations. Focus: formal knowledge representation (logic, knowledge representation and Semantic Web communities). It turns WordNet into an ontology, but is not about connecting lexical features to ontological semantics. http://wiki.loa-cnr.it/index.php/LoaWiki:OWN
Was there a solution already? LMF (Lexical Markup Framework): a general model for formalizing and sharing machine-readable dictionaries. Focus: lexical knowledge representation (lexicography and NLP communities). Very close to the ontology-lexicon requirements, but with no view on how lexical features link to ontological semantics: semantics is limited to a notion of sense based on synsets. Other issues: incomplete formal model; focus on classes, less on properties/relations. http://www.lexicalmarkupframework.org/
lemon (lexicon model for ontologies): a general model for formalizing lexical features relative to independently defined ontological semantics. http://www.monnet-project.eu/lemon Two-level modelling: abstract level (meta-model): lemon; instantiation level (lexicon model): e.g. ‘LexInfo2’, http://lexinfo.net/
Many solutions… …with an a priori amount of linguistics or semantics!
lemon: Overview
lemon: Lexicon. A Lexicon (here: wild animals) contains entries, e.g. LE: Kudu and LE: shaped like a Kudu. A LexicalEntry can be a Word, a Phrase, or a Part (such as an Affix).
lemon: Form. In the wild animals lexicon, each LexicalEntry (LE) links to Forms (F) via canonicalForm, otherForm and abstractForm; example forms: “kudu”, “greater”, “great”
lemon: Structure. LE: shaped like a Kudu is made up of LE: shaped, LE: like, LE: a, LE: Kudu. A LexicalEntry can be decomposed into one or more Components, and its compositional structure can be represented.
lemon: Structure - Example. “shaped like a kudu” has a decomposition into four Components, each linked by element to a LexicalEntry: shaped (lemma=“shape”), like (lemma=“like”), a, Kudu. Its phrase structure is a tree of nodes connected by edge links, with leaf links to the Components and constituent labels VP, VBN, PP, IN, NP, DT, NNP.
lemon: Meaning & Reference. LE: kudu (lexeme) → sense → LS (sememe) → reference
lemon: Meaning & Reference. LE: kudu and LE: greater kudu each have a sense (LS) with a reference; the two senses are related by narrower. (preSem)
lemon: Meaning & Reference. LE: greater kudu and LE: lesser kudu: lexical incompatibility. Their senses (LS) are marked incompatible, while the reference is dbpedia:Kudu.
lemon: Meaning & Reference. LE: kudu and LE: goat: ontological incompatibility. The references of their senses (LS) are related by owl:disjointWith.
lemon: Lexical Projection LexicalEntry can introduce a syntactic frame with arguments that are mapped to LexicalSense and indirectly to ontological semantic objects/properties
Lexical projection (Verb Frame). Syntactic frame: S(NP VP(VB NP)) …with semantic sugar! Example: SAP AG sold long-term fixed rate conventional mortgage loans
…more frames with LexInfo2. http://lexinfo.net/ontology/2.0/lexinfo#DitransitiveFrame ditransitive : Frame extends verb : Frame; synarg links to subject : Argument, direct object : Argument, indirect object : Argument. Example: SAP AG sold Company X mortgage loans
…more frames with LexInfo2. http://lexinfo.net/ontology/2.0/lexinfo#DitransitiveFrame_To ditransitive_to : Frame extends ditransitive : Frame; synarg links to subject : Argument, direct object : Argument, indirect object : Argument. Example: SAP AG sold mortgage loans to Company X
Or Zulu morphology… LE:angoma and LE:tolo each have a sense and a pattern link (class = lemon:MorphologicalPattern) to:
:zuluNC7_8 a lemon:MorphPattern ;
  lemon:transform [
      lemon:rule "isi(?=[^aeiou])~" ;
      lemon:rule "is(?=[aeiou])~" ;
      lemon:generates [ lexinfo:number lexinfo:singular ]
  ] , [
      lemon:rule "izi(?=[^aeiou])~" ;
      lemon:rule "iz(?=[aeiou])~" ;
      lemon:generates [ lexinfo:number lexinfo:plural ]
  ] .
Generated forms: isitolo (shop), izangoma (witch doctors)
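To make the effect of the morphological pattern concrete, here is a small Python sketch of how such prefix rules could be applied to a stem. The reading of the rule syntax is an assumption (the `~` is taken to stand for the stem, and the lookahead for its first letter); `inflect` and `NC7_8` are hypothetical names, not part of lemon.

```python
import re

# Assumed reading of the slide's rules for noun class 7/8:
# full prefix before a consonant-initial stem, short prefix before a vowel-initial stem.
NC7_8 = {
    "singular": ("isi", "is"),
    "plural": ("izi", "iz"),
}

VOWEL_INITIAL = re.compile(r"^[aeiou]")


def inflect(stem, number):
    """Attach the class 7/8 prefix for `number` to `stem`."""
    full, short = NC7_8[number]
    prefix = short if VOWEL_INITIAL.match(stem) else full
    return prefix + stem


print(inflect("tolo", "singular"))   # form shown on the slide: isitolo (shop)
print(inflect("angoma", "plural"))   # form shown on the slide: izangoma (witch doctors)
```

Storing the pattern once and attaching it to many entries is what keeps the lexicon small: only the stem is stored per entry, and the inflected forms are generated.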
Lemon Editor and Generator: http://monnetproject.deri.ie/Lemon-Editor
Example: “asset-backed-debts” (Finance Ontology → lemon lexicon)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix pennbank: <http://www.monnet-project.eu/pennbank#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
…
<file:test#assetbackeddebt> lemon:phraseRoot [
    lemon:edge [
        lemon:edge [
            lemon:edge [ lemon:leaf _:n6 ] ;
            lemon:constituent pennbank:NNP ] ;
        lemon:constituent pennbank:NP ] ,
      [ lemon:edge [
            lemon:edge [ lemon:leaf _:n88 ] ;
            lemon:constituent pennbank:VBD ] ,
          [ lemon:edge [
                lemon:edge [ lemon:leaf _:n69 ] ;
                lemon:constituent pennbank:NN ] ;
            lemon:constituent pennbank:NP ] ;
        lemon:constituent pennbank:VP ] ;
    lemon:constituent pennbank:S ] ;
  lemon:decomposition ( _:n6 _:n88 _:n69 ) ;
  lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] ;
  lemon:canonicalForm [ lemon:writtenRep "Asset backed debt"@en ] .
…
lemon Lexical Entries:
<file:test#back> lexinfo:partOfSpeech lexinfo:verb ;
  lemon:canonicalForm [
      lexinfo:tense lexinfo:past ;
      lexinfo:verbFormMood lexinfo:indicative ;
      lemon:writtenRep "backed"@en ;
      lexinfo:aspect lexinfo:perfective ] .
_:n88 rdf:type lemon:Component ;
  lexinfo:tense lexinfo:past ;
  lemon:element <file:test#back> ;
  lexinfo:verbFormMood lexinfo:indicative ;
  lexinfo:aspect lexinfo:perfective .
Outline 1. Research challenge and motivation 2. Ontology Translation & Inform. Extraction 3. Lexicalization (lemon) 4. CLOBIE (Cross-lingual Ontology-based Information Extraction)
What is CLOBIE? Information Extraction: monolingual, no semantics. Cross-lingual Information Extraction: multilingual. Ontology-based Information Extraction: semantics in the background.
What is CLOBIE? Example sentence: “SAP sold risk securities at a value of 12b EUR.”
Information extraction (monolingual): PATTERN: .*SAP.*(sells|sold|issues).*(risk securities).*[0-9]+b (EUR|USD).*
Information extraction (multilingual): PATTERN_DE: .*SAP.*verkaufte.*(RisikoWertpapiere).*[0-9]+b (EUR|USD).*
Information extraction with semantics: .*[COMPANY] sell [ASSETS] .* PATTERN: .*$COMPANY.*(sells|sold|issues).*$ASSETS.*$MONETARY_VALUE.*
The ontology supplies the fillers for $ASSETS: financial assets (e.g. risk securities) and non-financial assets (e.g. Property, Plant & Equipment)
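The step from hard-coded strings to ontology-backed slots can be sketched with named groups in a regular expression. This is a minimal illustration, assuming the slot fillers come from short hand-written alternations; in an ontology-based setup they would be generated from the labels of the relevant classes. The names `COMPANY`, `ASSETS`, `VALUE` and `extract` are hypothetical.

```python
import re

# Hypothetical slot fillers; stand-ins for labels looked up in an ontology.
COMPANY = r"(?P<company>SAP|Tesco)"
ASSETS = r"(?P<assets>risk securities|mortgage loans)"
VALUE = r"(?P<amount>[0-9]+)b (?P<currency>EUR|USD)"

# Note the parentheses: (sells|sold|issues) is an alternation of words,
# whereas [sells|sold|issues] would be a character class matching one letter.
PATTERN = re.compile(
    rf"{COMPANY}.*?\b(?:sells|sold|issues)\b.*?{ASSETS}.*?{VALUE}"
)


def extract(sentence):
    """Return the matched slots as a dict, or None if the pattern does not apply."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None


print(extract("SAP sold risk securities at a value of 12b EUR."))
```

Because the slots are named, the same extraction result can be mapped to ontology properties regardless of which surface pattern or language produced it.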
Application in Information Extraction (IE). Semantically lifted term: :MinimumFinanceLeasePaymentsReceivable rdfs:subClassOf xbrli:monetaryItemType ; rdfs:label "Minimum finance lease payments receivable"@en . Term analysis: Minimum | finance lease | payments | receivable. Linguistic analysis: receivables; payments received. Matches in text: Tesco’s Annual Report 2009: “…The fair value of the Group’s finance lease receivables at 23 February 2008 was £5m…”; SAP Annual Report 2008: “…As at December 31, 2008, the future minimum lease payments expected to be received was €16 million…”
CLOBIE is interdisciplinary. Machine Translation: statistical MT, rule-based MT, localization. Information Extraction: term extraction, relation extraction, extraction grammars. NLP: corpus query, term analysis, POS tagging, morphological analysis. Information Retrieval: TF-IDF, web query ranking algorithms, CLIR (ESA, MT-based). Semantic Web: ontologies, SKOS, lemon, SPARQL queries.
Why CLOBIE? There are many unstructured resources (news, financial reports). Knowledge in the Semantic Web is often not dynamic (no regular updates, only manual ones), and knowledge across languages/countries is not integrated.
CLOBIE blackboard architecture. CLOBIE Search reads from the blackboard. Blackboard layers: token_id / POS, token_id / token_id, sent_id / term, sent_id / concept, … Annotators read from and contribute to the blackboard: Basic NLP (Tok. / POS), Linguistic Analyzer (Dependency Parser), Term Analyzer, Semantic Analyzer. Together they form the semantic / terminological / linguistic enrichment process.
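The blackboard idea can be sketched as a shared object that every annotator reads from and writes to. The code below is only a toy under stated assumptions: the layer names, the two-word term list and the term-to-concept mapping are invented for illustration and do not reflect the actual CLOBIE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Shared store: each annotator reads existing layers and contributes a new one."""
    text: str
    layers: dict = field(default_factory=dict)


# Hypothetical resources standing in for a term base and an ontology lexicon.
TERMS = {("lease", "payments"): "lease payments"}
CONCEPTS = {"lease payments": "ifrs:MinimumFinanceLeasePaymentsReceivable"}


def tokenize(board):
    # Basic NLP: a naive whitespace tokenizer.
    board.layers["tokens"] = board.text.lower().replace(".", "").split()


def annotate_terms(board):
    # Term analyzer: reads the token layer, contributes (token index, term) pairs.
    tokens = board.layers["tokens"]
    board.layers["terms"] = [
        (i, TERMS[(tokens[i], tokens[i + 1])])
        for i in range(len(tokens) - 1)
        if (tokens[i], tokens[i + 1]) in TERMS
    ]


def annotate_concepts(board):
    # Semantic analyzer: reads the term layer, contributes (token index, concept) pairs.
    board.layers["concepts"] = [
        (i, CONCEPTS[term]) for i, term in board.layers["terms"] if term in CONCEPTS
    ]


def run(text, annotators):
    board = Blackboard(text)
    for annotate in annotators:
        annotate(board)
    return board


board = run("The future minimum lease payments were 16 million.",
            [tokenize, annotate_terms, annotate_concepts])
print(board.layers["concepts"])
```

The design choice illustrated here is loose coupling: annotators never call each other, so a new analyzer (for example for another language) can be added without touching the existing ones.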
CLOBIE data set (Wind Energy): 10 companies in the wind energy domain; financial reports in German / Spanish / English / Dutch; IFRS / DE-GAAP; semantics defined by the IFRS vocabulary and the xEBR vocabulary

Cross-lingual ontology lexicalisation, translation and information extraction Net2 workshop, University of South Africa (UNISA)

  • 1.
  • 2. Context and Motivation Monnet use case in financial domain query financial information Cross-vocabulary Cross-lingual Get result in your own language Research challenges localization & translation of vocabularies cross-lingual ontology-based information extraction
  • 3.
  • 4.
  • 5. Break down complexity 3-faceted lexical enrichment Semantic Linguistic Terminological asset [asset] [available-for-sale] [financial] [financial asset] [non-financial asset] [available-for-sale financial asset] Noun_Sing: asset Noun_Plural: assets ?P: available-for-sale Adjective: financial NP: available-for-sale fin. asset VP: to sell financial assets is-a is-a financial asset non-financial asset is-a Term decomposition available-for-sale financial asset
  • 6. XBRL – Semantic Analysis
  • 7. XBRL – Semantic Analysis
  • 8. XBRL – Semantic Analysis “Enhance semantics to facilitate translation and information extraction.”
  • 9. XBRL – Terminological Analysis ifrs:MinimumFinanceLeasePaymentsReceivableAtPresentValue ifrs:MinimumFinanceLeasePaymentsReceivable Minimum finance lease payments receivable, at present value sapTerm:payments googleDefine:leasePayments sapTerm:financeLease googleDefine:Finance_lease Domain Independent Domain Related Domain Specific Domain Related Domain Independent Domain Independent Domain Specific
  • 10. XBRL – Linguistic Analysis Financial text “… received minimum finance lease payments …” verb “… lease payment …” complex singular simple minimum finance lease payments receivable XBRL term adverb … lease payments … plural
  • 11. Outline 1. Research challenge and motivation 2. Ontology Translation 3. Lexicalization (lemon) 4. CLOBIE (CL Ontology-based Inf. Extraction)
  • 12. Translation using STL Models developed in Monnet English / German / Spanish / Dutch …Net2 Afrikaans? Zulu? Xhosa? … ifrs:MinimumFinanceLeasePaymentsPayable ifrs:ProfitLossBeforeTax ifrs:Revenue
  • 13. Application in Machine Translation in Dutch: available-for-sale financial assets. 1. Term analysis using IFRS, SAPTerm, GoogleDefine: [available-for-sale] [financial] [assets]. 2. Translate subterms using domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate): [voor verkoop beschikbare] [financiële] [activa]. 3. Term synthesis using grammars (rules, statistical models): voor verkoop beschikbare financiële activa
  • 14. Application in Machine Translation in Afrikaans: available-for-sale financial assets. 1. Term analysis using IFRS, SAPTerm, GoogleDefine: [available-for-sale] [financial] [assets]. 2. Translate subterms using domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate): [beskikbaar vir verkoop] [finansiële] [bates]. 3. Term synthesis using grammars (rules, statistical models): finansiële bates beskikbaar vir verkoop
  • 15. Application in Machine Translation in Spanish: available-for-sale financial assets. 1. Term analysis using IFRS, SAPTerm, GoogleDefine: [available-for-sale] [financial] [assets]. 2. Translate subterms using domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate): [disponibles para la venta] [financia] [activos]. 3. Term synthesis using grammars (rules, statistical models): activos financieros disponibles para la venta
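The three-step procedure on the machine translation slides (term analysis, subterm translation, term synthesis) can be illustrated with a small sketch. This is not the Monnet implementation: the dictionary is a toy stand-in for the translation resources named on the slides, its entries are copied from the slides' final translations, and the word-order table `SYNTHESIS_ORDER` is a hypothetical rule set covering only the three-part pattern modifier + adjective + head noun.

```python
# Toy subterm dictionary (assumption: stands in for a domain TM or translation service).
SUBTERM_TRANSLATIONS = {
    "nl": {"available-for-sale": "voor verkoop beschikbare",
           "financial": "financiële", "assets": "activa"},
    "af": {"available-for-sale": "beskikbaar vir verkoop",
           "financial": "finansiële", "assets": "bates"},
    "es": {"available-for-sale": "disponibles para la venta",
           "financial": "financieros", "assets": "activos"},
}

# Hypothetical synthesis rules: target-language order of the source subterm positions.
SYNTHESIS_ORDER = {
    "nl": (0, 1, 2),  # modifier, adjective, head
    "af": (1, 2, 0),  # adjective, head, modifier
    "es": (2, 1, 0),  # head, adjective, modifier
}


def analyse(term, vocabulary):
    """Step 1: split a term into known subterms by greedy longest match."""
    tokens = term.split()
    subterms, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in vocabulary:
                subterms.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no known subterm starts at {tokens[i]!r}")
    return subterms


def translate_term(term, lang):
    dictionary = SUBTERM_TRANSLATIONS[lang]
    subterms = analyse(term, dictionary)             # step 1: term analysis
    translated = [dictionary[s] for s in subterms]   # step 2: subterm translation
    order = SYNTHESIS_ORDER[lang]
    if len(translated) != len(order):
        raise ValueError("synthesis rule only covers three-part terms")
    return " ".join(translated[k] for k in order)    # step 3: term synthesis


print(translate_term("available-for-sale financial assets", "af"))
```

Keeping the reordering in a separate table mirrors the slides' split between subterm translation and grammar-driven synthesis: the dictionary can change without touching the word-order rules.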
  • 16. Outline 1. Research challenge and motivation 2. Ontology Translation 3. Lexicalization (lemon) 4. CLOBIE (CL Ontology-based Inf. Extraction)
  • 17. Why do we need a lexicon? http://en.wikipedia.org/wiki/Finance_lease “loads of unlinked domain-specific terminology on the web !” An interoperable web for … ? re-use enable multilinguality cross-lingual search cross-lingual fact extraction http://www.investopedia.com/terms/l/lease-payments.asp
  • 18. Lexicon standards overview ISO (XML) TEI (Text Encoding Initiative) LMF (Lexical Markup Framework) W3C & Semantic Web (RDF / OWL) built-in rdfs:label lightweight linguistic representations (SKOS, SKOS-XL) rich linguistic representations (GOLD, LexInfo)
  • 19. SKOS – Multilingual Information SKOS concepts with… term relations multilingual labels resource references skos:related ifrs:MinimumFinanceLeasePayments dbpedia:Finance_lease dbpedia:Lease_payments skos:narrower skos:broader skos:related
  • 20. SKOS – Multilingual Information Not much uptake yet? from http://data.nytimes.com/
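To make the SKOS idea of multilingual labels and resource references concrete, here is a minimal sketch that stores SKOS-style statements as plain tuples and looks up a preferred label with a fallback language. The concept name `ex:AvailableForSaleFinancialAssets` is hypothetical, its labels are taken from the translation slides, and linking both DBpedia resources with `skos:related` is an assumption, since the slide does not make clear which relation applies to which resource.

```python
# (subject, predicate, object, language tag or None) tuples; a stand-in for an RDF store.
TRIPLES = [
    ("ex:AvailableForSaleFinancialAssets", "skos:prefLabel",
     "available-for-sale financial assets", "en"),
    ("ex:AvailableForSaleFinancialAssets", "skos:prefLabel",
     "voor verkoop beschikbare financiële activa", "nl"),
    ("ex:AvailableForSaleFinancialAssets", "skos:prefLabel",
     "finansiële bates beskikbaar vir verkoop", "af"),
    ("ex:AvailableForSaleFinancialAssets", "skos:prefLabel",
     "activos financieros disponibles para la venta", "es"),
    # Assumed relation type for both links.
    ("ifrs:MinimumFinanceLeasePayments", "skos:related", "dbpedia:Finance_lease", None),
    ("ifrs:MinimumFinanceLeasePayments", "skos:related", "dbpedia:Lease_payments", None),
]


def pref_label(concept, lang, fallback="en"):
    """Return the preferred label in `lang`, else in `fallback`, else None."""
    labels = {l: o for s, p, o, l in TRIPLES
              if s == concept and p == "skos:prefLabel"}
    return labels.get(lang, labels.get(fallback))


def related(concept):
    """Return the resources linked to `concept` via skos:related."""
    return [o for s, p, o, _ in TRIPLES if s == concept and p == "skos:related"]


print(pref_label("ex:AvailableForSaleFinancialAssets", "af"))
print(related("ifrs:MinimumFinanceLeasePayments"))
```

A label lookup like this returns whole strings only, which is the limitation the following slides raise: SKOS carries no information about how a label inflects or decomposes.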
  • 21. Ontology-Text Mismatch ‘Edificio-historico’ vs. ‘…edificio, declarado Monumento Histórico…’ >> goes beyond SKOS (monolingual & multilingual term variants) >> requires representation of lexical information to compute linguistic variants, e.g. ‘edificio historico[apposVP[NP[Adj]]]’
  • 22. A Lexicon Model for Ontologies Requirements for ‘ontology-lexicon’ model Represent linguistic information relative to ontology Avoid unnecessary ambiguities by representing only lexical features relevant to semantics of underlying application Keep semantics separate from linguistic info Separate clearly ‘world’ (properties of objects referred to by words) from ‘word’ (properties of words) knowledge Modular, minimal design Provide simple core model that can be easily extended upon need
  • 23. Was there a solution already? - SKOS Simple Knowledge Organization System – SKOS General model for formalizing thesauri, terminologies and related semantic and knowledge resources Formalization of terminology in focus - terminology, classification, Semantic Web communities Does not address linguistic aspects of terminology, or therefore, the lexicon-ontology interface http://www.w3.org/2004/02/skos/
  • 24. Was there a solution already? - GOLD General Ontology for Linguistic Description – GOLD Community-based ontology of linguistics Linguistic study in focus - linguistics community Formal model of linguistics as an ontology, but not about connecting lexical features to ontological semantics Other issues: very big, modularity? http://linguistics-ontology.org/gold/2010
  • 25. Was there a solution already? - OWN OntoWordNet – OWN Formal specification of WordNet through extension and axiomatization of its conceptual relations Formal knowledge representation in focus - logic, knowledge representation, Semantic Web communities Turns WordNet into an ontology but not about connecting lexical features to ontological semantics http://wiki.loa-cnr.it/index.php/LoaWiki:OWN
  • 26. Was there a solution already? - LMF Lexical Markup Framework – LMF General model for formalizing and sharing of machine-readable dictionaries Lexical knowledge representation in focus - lexicography, NLP communities Very close to ontology-lexicon requirements, but no view on how lexical features link to ontological semantics – semantics is limited to a notion of sense based on synsets Other issues: incomplete formal model, focus on classes, less on properties/relations http://www.lexicalmarkupframework.org/
  • 27. lemon lexicon model for ontologies: ‘lemon’ General model for formalizing lexical features relative to independently defined ontological semantics http://www.monnet-project.eu/lemon Two-level modelling Abstract level (meta-model): lemon Instantiation level (lexicon model): e.g. ‘LexInfo2’ http://lexinfo.net/
  • 28. Many solutions… …with an a priori amount of linguistics or semantics!
  • 30. lemon: Lexicon Lexicon: wild animals entry entry entry LE: Kudu LE: shaped like a Kudu LexicalEntry can be a Word, Phrase, or Part - such as an Affix
  • 31. lemon: Form wild animals otherForm abstractForm canonicalForm LE F LE F LE F “kudu” “greater” “great”
  • 32. lemon: Structure ? LE: shaped like a Kudu LE: shaped LE: like LE: a LE: Kudu LexicalEntry can be decomposed into one or more Components and compositional structure can be represented
  • 33. lemon: Structure - Example :Component :Component :Component :Component lexeme edge edge decomposition :LexicalEntry :node :LexicalEntry :node :node :LexicalEntry :node :node :LexicalEntry :node :LexicalEntry :node shaped like a kudu constituent:PP shaped, lemma=“shape” constituent:VP constituent:VBN like, lemma=“like” constituent:NP constituent:IN a constituent:DT Kudu constituent:NNP element leaf edge edge element leaf edge element leaf edge element leaf
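The decomposition and phrase structure of “shaped like a Kudu” can be sketched with a small tree type. This is an illustrative data structure, not the lemon RDF model itself: the class name `Node` is hypothetical, and the bracketing VP(VBN PP(IN NP(DT NNP))) is an assumed reading of the constituent labels listed on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A phrase-structure node: either has child edges or points to a leaf component."""
    constituent: str
    edges: List["Node"] = field(default_factory=list)
    leaf: Optional[str] = None


def leaves(node):
    """Collect the leaf components of a tree in left-to-right order."""
    if node.leaf is not None:
        return [node.leaf]
    result = []
    for child in node.edges:
        result.extend(leaves(child))
    return result


# Assumed bracketing for "shaped like a Kudu".
phrase_root = Node("VP", [
    Node("VBN", leaf="shaped"),
    Node("PP", [
        Node("IN", leaf="like"),
        Node("NP", [Node("DT", leaf="a"), Node("NNP", leaf="Kudu")]),
    ]),
])

# The flat decomposition lists the same components in order.
decomposition = ["shaped", "like", "a", "Kudu"]
print(leaves(phrase_root) == decomposition)
```

The check at the end reflects the relationship between the two views on the slide: the decomposition is an ordered list of components, and the tree adds constituent structure over the same components.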
  • 34. lemon: Meaning & Reference LE: kudu lexeme sense LS sememe reference
  • 35. lemon: Meaning & Reference LE: kudu sense sense LE: greater kudu narrower LS LS reference reference preSem
  • 36. lemon: Meaning & Reference LE:greater kudu LE:lesser kudu sense sense lexical incompatibility LS LS incompatible reference reference dbpedia:Kudu
  • 37. lemon: Meaning & Reference LE: kudu LE: goat sense sense ontological incompatibility LS LS reference reference owl:disjointWith
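The difference between lexical and ontological incompatibility can be sketched as follows: ontological incompatibility is read off the ontology through the references of the senses, not stated in the lexicon. The class names are simplified stand-ins for the lemon classes, and the identifiers `dbpedia:Goat` and `ex:GreaterKudu`, as well as the disjointness axiom itself, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LexicalSense:
    reference: str  # identifier of the ontology entity the sense points to


@dataclass
class LexicalEntry:
    written_rep: str
    senses: List[LexicalSense]


# Assumed ontology axiom: the two classes are declared owl:disjointWith each other.
DISJOINT_CLASSES = {frozenset({"dbpedia:Kudu", "dbpedia:Goat"})}


def ontologically_incompatible(a, b):
    """True if any pair of senses refers to classes declared disjoint in the ontology."""
    return any(
        frozenset({sa.reference, sb.reference}) in DISJOINT_CLASSES
        for sa in a.senses
        for sb in b.senses
    )


kudu = LexicalEntry("kudu", [LexicalSense("dbpedia:Kudu")])
goat = LexicalEntry("goat", [LexicalSense("dbpedia:Goat")])
greater_kudu = LexicalEntry("greater kudu", [LexicalSense("ex:GreaterKudu")])

print(ontologically_incompatible(kudu, goat))
print(ontologically_incompatible(kudu, greater_kudu))
```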
  • 38. lemon: Lexical Projection LexicalEntry can introduce a syntactic frame with arguments that are mapped to LexicalSense and indirectly to ontological semantic objects/properties
  • 39. Lexical projection (Verb Frame) syntactic frame S ( NP VP( VB NP ) ) …with semantic sugar! SAP AG sold long-term fixed rate conventional mortgage loans
  • 40. …more frames with LexInfo2 http://lexinfo.net/ontology/2.0/lexinfo#DitransitiveFrame ditransitive : Frame subject : Argument direct object : Argument indirect object : Argument verb: Frame extends synarg synarg synarg SAP AG sold Company X mortgage loans
  • 41. …more frames with LexInfo2 http://lexinfo.net/ontology/2.0/lexinfo#DitransitiveFrame_To ditransitive_to : Frame subject : Argument direct object : Argument indirect object : Argument ditransitive: Frame extends synarg synarg synarg SAP AG sold mortgage loans to Company X
  • 42. Or Zulu morphology… LE:angoma sense sense class = lemon:MorphologicalPattern LE:tolo :zuluNC7_8 a lemon:MorphPattern ; lemon:transform [ lemon:rule "isi(?=[^aeiou])~" ; lemon:rule "is(?=[aeiou])~" ; lemon:generates [ lexinfo:number lexinfo:singular ] ] , [ lemon:rule "izi(?=[^aeiou])~" ; lemon:rule "iz(?=[aeiou])~" ; lemon:generates [ lexinfo:number lexinfo:plural ] ] . pattern pattern isitolo (shop) izangoma (witch doctors)
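The effect of the Zulu noun class 7/8 pattern can be sketched in a few lines. The sketch assumes a particular reading of the rule strings: `~` stands for the stem, and the lookahead selects the short prefix (`is`, `iz`) before a vowel-initial stem and the full prefix (`isi`, `izi`) otherwise. It does not parse the lemon rule syntax, and the function name is hypothetical.

```python
VOWELS = set("aeiou")

# (prefix before a consonant-initial stem, prefix before a vowel-initial stem)
CLASS_7_8_PREFIXES = {
    "singular": ("isi", "is"),
    "plural": ("izi", "iz"),
}


def inflect_class_7_8(stem: str, number: str) -> str:
    """Attach the class 7 (singular) or class 8 (plural) prefix to a noun stem."""
    before_consonant, before_vowel = CLASS_7_8_PREFIXES[number]
    prefix = before_vowel if stem[0] in VOWELS else before_consonant
    return prefix + stem


print(inflect_class_7_8("tolo", "singular"))   # the slide's "isitolo" (shop)
print(inflect_class_7_8("angoma", "plural"))   # the slide's "izangoma" (witch doctors)
```

Because the pattern is attached to the noun class and not to individual entries, the lexicon only has to store the stems `tolo` and `angoma`, and both number forms follow from the shared pattern.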
  • 43. Lemon Editor and Generator http://monnetproject.deri.ie/Lemon-Editor “asset-backed-debts” Finance Ontology lemon lexicon @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix lemon: <http://www.monnet-project.eu/lemon#> . @prefix financeV4: <http://fadyart.com/financeV4#> . @prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> . @prefix pennbank: <http://www.monnet-project.eu/pennbank#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . … <file:test#assetbackeddebt> lemon:phraseRoot [ lemon:edge [ lemon:edge [ lemon:edge [ lemon:leaf _:n6 ] ; lemon:constituent pennbank:NNP ] ; lemon:constituent pennbank:NP ] , [ lemon:edge [ lemon:edge [ lemon:leaf _:n88 ] ; lemon:constituent pennbank:VBD ] , [ lemon:edge [ lemon:edge [ lemon:leaf _:n69 ] ; lemon:constituent pennbank:NN ] ; lemon:constituent pennbank:NP ] ; lemon:constituent pennbank:VP ] ; lemon:constituent pennbank:S ] ; lemon:decomposition ( _:n6 _:n88 _:n69 ) ; lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] ; lemon:canonicalForm [ lemon:writtenRep "Asset backed debt"@en ] . … lemon Lexical Entries <file:test#back> lexinfo:partOfSpeech lexinfo:verb ; lemon:canonicalForm [ lexinfo:tense lexinfo:past ; lexinfo:verbFormMood lexinfo:indicative ; lemon:writtenRep "backed"@en ; lexinfo:aspect lexinfo:perfective ] . _:n88 rdf:type lemon:Component ; lexinfo:tense lexinfo:past ; lemon:element <file:test#back> ; lexinfo:verbFormMood lexinfo:indicative ; lexinfo:aspect lexinfo:perfective .
  • 44. Outline 1. Research challenge and motivation 2. Ontology Translation & Inform. Extraction 3. Lexicalization (lemon) 4. CLOBIE (Cross-lingual Ontology-based Information Extraction)
  • 45. What is CLOBIE Information Extraction Monolingual No semantics Cross-lingual Information Extraction Multilingual Ontology-based Information Extraction Semantics in the background
  • 46. What is CLOBIE Example sentence: “SAP sold risk securities at a value of 12b EUR.” Information extraction (monolingual): PATTERN: .*SAP.*(sells|sold|issues).*(risk securities).*[0-9]+b (EUR|USD).* Information extraction (multilingual): PATTERN_DE: .*SAP.*verkaufte*.*(RisikoWertpapiere).*[0-9]+b (EUR|USD).* Information extraction with semantics: .*[COMPANY] sell [ASSETS] .* PATTERN: .*$COMPANY.*(sells|sold|issues).*$ASSETS.*$MONETARY_VALUE.* Asset classes: financial assets, non-financial assets, risk securities, Property, Plant & Equipment
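The step from hand-written patterns to patterns with semantic placeholders can be sketched with Python's `re` module. The placeholder names follow the slide, but the term lists in `CLASS_LABELS`, the template string, and the function names are assumptions for illustration. In the approach outlined in the deck the labels would come from an ontology-lexicon, not from a hard-coded dictionary.

```python
import re

# Assumed class-to-label mapping; a stand-in for labels drawn from an ontology-lexicon.
CLASS_LABELS = {
    "COMPANY": ["SAP", "Tesco"],
    "ASSETS": ["risk securities", "financial assets", "non-financial assets"],
}

# Template with semantic placeholders; the monetary value is matched directly.
TEMPLATE = (r"$COMPANY\b.*?\b(?:sells|sold|issues)\b.*?$ASSETS"
            r".*?(?P<value>[0-9]+b (?:EUR|USD))")


def compile_pattern(template, class_labels):
    """Expand each $CLASS placeholder into a named group over that class's labels."""
    pattern = template
    for name, labels in class_labels.items():
        # Longest labels first, so a longer label is tried before its substrings.
        alternatives = "|".join(
            re.escape(label) for label in sorted(labels, key=len, reverse=True)
        )
        pattern = pattern.replace("$" + name, f"(?P<{name.lower()}>{alternatives})")
    return re.compile(pattern)


PATTERN = compile_pattern(TEMPLATE, CLASS_LABELS)


def extract(sentence):
    """Return the matched slots as a dict, or None if the pattern does not apply."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None


print(extract("SAP sold risk securities at a value of 12b EUR."))
```

Adding a company or an asset class then only requires a new entry in the label mapping. The pattern itself stays unchanged, which is the point of keeping the semantics in the background.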
  • 47. Application in Information Extraction (IE) :MinimumFinanceLeasePaymentsReceivable rdfs:subClassOf xbrli:monetaryItemType ; rdfs:label “Minimum finance lease payments receivable”@en . semantically lifted Minimum finance lease payments receivable term analysis receivables payments received linguistic analysis Tesco’s Annual Report 2009: “…The fair value of the Group’s finance lease receivables at 23 February 2008 was £5m…” SAP Annual Report 2008: “…As at December 31, 2008, the future minimum lease payments expected to be received was €16 million…”
  • 48. CLOBIE Interdisciplinary Statistical MT Rule-based MT Localization Term extraction Relation extraction Extract. grammars Machine Translation Information Extraction NLP Corpus query Term analysis POS tagging Morph analysis Information Retrieval CLOBIE Semantic Web TF-IDF Web query ranking algorithms CLIR (ESA, MT-based) Ontologies SKOS, lemon SPARQL queries
  • 49. Why CLOBIE? Many unstructured resources (News, FinReps) Knowledge in SW is often: Not dynamic (no regular, only manual updates) Knowledge across languages/countries not integrated
  • 53. CLOBIE Data set (Wind Energy) 10 companies in Wind Energy domain Financial reports in German / Spanish / English / Dutch IFRS / DE-GAAP Semantics defined by IFRS vocabulary xEBR vocabulary
  • 54. Next steps… Benchmark development and evaluation on the basis of a data set in finance domain financial reports and news from different companies in wind energy domain multilingual (German, Dutch, Spanish, English) multi-vocabulary (IFRS, European local GAAPs, DBPedia) Cross-lingual ontology-based information retrieval system Generate ontology-based information extraction grammars from lemon ontology-lexicons

Editor's notes

  1. Frame: VerbNet, …; LinguisticOntology: GOLD, LexInfo2; Form: SKOS; LexicalSense-Ontology: SKOS-XL; Node/Edge: ParseStructures, rare formats such as NEGRA Corpus / TIGER tag set by IMS Stuttgart, or StanfordParser (proprietary)
  2. Also phrasal lexicon
  3. Lemon distinguishes among different types of lexical forms
  4. LexicalSense: underspecified sense that points to a language-external reference. Reference: unique ontological semantic object (depending on conditions and context). A LexicalSense can have subsense and senseRelation with other LexicalSenses. Sememe: the relation between a LexicalSense and an ontological semantic object can be either pref / alt / hiddenSem
  5. Syntactic agreement: NP( NP_COMPANY VP( VB_sell NP_ASSETS ) ). Semantic agreement: COMPANY, ASSETS
  6. Syntactic agreement: NP( NP_COMPANY VP( VB_sell NP_ASSETS ) ). Semantic agreement: COMPANY, ASSETS
  7. Syntactic agreement: NP( NP_COMPANY VP( VB_sell NP_ASSETS ) ). Semantic agreement: COMPANY, ASSETS
  8. asset-backed-debt: “debts are backed by assets”. Corresponds to a noun phrase but is analyzed by the lemon generator as a sentence: ‘asset backed debt’