Introduction
Techniques
Overview and Summary
Multilingual Text Classification
Gerard de Melo, Stefan Siersdorfer
Max Planck Institute for Computer Science
Saarbrücken, Germany
2007-04-04
G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
Text Classification
task: automatically assign text documents to classes (e.g. thematically, geographically)
machine learning algorithms, e.g. SVM, can learn from pre-classified training documents
multilingual case: documents in multiple languages
applications: news wire filtering, library management, e-mail, etc.
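As a rough illustration of learning from pre-classified training documents, the sketch below trains a linear SVM on bag-of-words counts using plain stochastic subgradient descent on the regularized hinge loss. The toy documents, the labels (+1 for finance, -1 for sports), and the hyperparameters are assumptions made for this example; the slides do not say which SVM implementation or features were used.

```python
from collections import Counter

def features(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def train_linear_svm(docs, labels, epochs=20, eta=0.1, lam=0.01):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    w = Counter()
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            x = features(text)
            margin = y * sum(w[t] * c for t, c in x.items())
            # The regularizer shrinks every weight slightly on each step.
            for t in list(w):
                w[t] *= 1.0 - eta * lam
            # The hinge loss contributes only when the margin is below 1.
            if margin < 1.0:
                for t, c in x.items():
                    w[t] += eta * y * c
    return w

def predict(w, text):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(w[t] * c for t, c in features(text).items())
    return 1 if score >= 0 else -1

# Hypothetical pre-classified training documents.
DOCS = [
    "stocks fell as the market closed",
    "the market rallied on bank earnings",
    "the team won the match",
    "a late goal decided the match",
]
LABELS = [1, 1, -1, -1]

weights = train_linear_svm(DOCS, LABELS)
print(predict(weights, "stocks market"), predict(weights, "goal match"))
```

Because the features are surface terms, a classifier like this one has no way to relate a training document in one language to a test document in another, which is the problem the multilingual case raises.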
Machine Translation
Mapping to Semantic Concept
Weight Propagation
Machine Translation for Multilingual TC
idea: simply translate all documents into a single language L_I (prior work by Jalam 2002, Rigutini et al. 2005)
shortcomings of this approach
  lexical variety in L_I (English: huge vocabulary, many synonyms)
  variety of expression in source languages
  lexical ambiguity in L_I (unnecessary introduction of additional ambiguity)
Spanish coche → car
French voiture → automobile
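The lexical-variety problem can be made concrete with a small sketch. Here a hypothetical word-by-word lookup table (`TOY_MT`, invented for this example) stands in for a machine translation system; it renders the Spanish and French words for the same object as different English terms, so the resulting term vectors overlap only on the remaining word.

```python
from collections import Counter

# Hypothetical word-by-word table standing in for a real MT system.
TOY_MT = {
    "es": {"coche": "car", "rojo": "red"},
    "fr": {"voiture": "automobile", "rouge": "red"},
}

def translate(text, lang):
    """Translate each token via the toy table; unknown tokens pass through."""
    return [TOY_MT[lang].get(tok, tok) for tok in text.lower().split()]

def term_vector(tokens):
    """Term-count feature vector over the translated tokens."""
    return Counter(tokens)

def shared_terms(a, b):
    """Terms that occur in both vectors."""
    return sorted(set(a) & set(b))

es_vec = term_vector(translate("coche rojo", "es"))
fr_vec = term_vector(translate("voiture rouge", "fr"))
print(shared_terms(es_vec, fr_vec))  # ['red']
```

Although both inputs describe a red car, the term features `car` and `automobile` never coincide, so a term-based classifier treats them as unrelated evidence.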
Semantic Concepts
Idea
map all words to semantic concepts (e.g. WordNet synsets), thus distinguishing different senses of a word while identifying synonyms
disambiguate using context information
construct feature vectors by counting occurrences of concepts rather than terms
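A minimal sketch of the idea, assuming a tiny hand-made lexicon: the concept identifiers (`c:automobile` and so on), the `LEXICON` and `CUES` tables, and the overlap-based disambiguation rule are all invented for illustration and are not WordNet data or the authors' disambiguation method.

```python
from collections import Counter

# Hypothetical lexicon: (language, word) -> candidate concept ids.
LEXICON = {
    ("en", "car"): ["c:automobile"],
    ("en", "automobile"): ["c:automobile"],
    ("es", "coche"): ["c:automobile"],
    ("fr", "voiture"): ["c:automobile"],
    ("en", "bank"): ["c:financial_institution", "c:river_bank"],
    ("en", "money"): ["c:money"],
    ("en", "river"): ["c:river"],
}

# Hypothetical context cues per concept, used for crude disambiguation.
CUES = {
    "c:financial_institution": {"money", "loan"},
    "c:river_bank": {"river", "water"},
}

def disambiguate(candidates, context):
    """Pick the candidate whose cues overlap most with the context words.

    On a tie, max() keeps the first candidate in the list.
    """
    return max(candidates, key=lambda c: len(CUES.get(c, set()) & context))

def concept_vector(text, lang):
    """Count concept occurrences instead of term occurrences."""
    tokens = text.lower().split()
    context = set(tokens)
    counts = Counter()
    for tok in tokens:
        candidates = LEXICON.get((lang, tok))
        if candidates:
            counts[disambiguate(candidates, context)] += 1
    return counts

print(concept_vector("coche", "es") == concept_vector("automobile", "en"))
print(concept_vector("river bank", "en"))
```

Synonyms in different languages now land on the same feature, while the two senses of `bank` are kept apart when the context provides a cue.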
Semantic Concepts
Problems
understemming
polysemy: highly related senses are treated as distinct
incongruent concepts between languages
variety of expression
lexical lacunae
English: I have a headache (literally: I have a headache)
Spanish: Me duele la cabeza (literally: *It hurts the head to me)
French: J'ai mal à la tête (literally: *I have pain at the head)
Weight Propagation
propagate weight from original concepts to related concepts
choose path to c maximizing its weight
Dijkstra-like algorithm in order to assign maximal possible weight to a concept
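One way to read the Dijkstra-like step is sketched below. It assumes that the weight reaching a concept along a path is the starting weight multiplied by a per-relation factor in (0, 1] for every edge, and that propagation stops below a cutoff; the slides state neither, and the graph, factors, and `min_weight` value are hypothetical. Because factors never exceed 1, weights cannot grow along a path, so settling concepts in order of decreasing weight yields the maximum for each.

```python
import heapq

def propagate(initial, edges, min_weight=0.1):
    """Assign each concept the largest weight reachable over any path.

    initial: {concept: weight} for concepts found in the document
    edges:   {concept: [(neighbor, factor), ...]} with 0 < factor <= 1
    """
    best = {}
    # heapq is a min-heap, so store negated weights to pop the largest first.
    heap = [(-w, c) for c, w in initial.items()]
    heapq.heapify(heap)
    while heap:
        neg_w, concept = heapq.heappop(heap)
        weight = -neg_w
        if concept in best:
            continue  # already settled with a weight at least this large
        best[concept] = weight
        for neighbor, factor in edges.get(concept, []):
            new_weight = weight * factor
            if neighbor not in best and new_weight >= min_weight:
                heapq.heappush(heap, (-new_weight, neighbor))
    return best

# Hypothetical concept graph with relation factors.
EDGES = {
    "car": [("motor_vehicle", 0.5), ("wheel", 0.25)],
    "motor_vehicle": [("wheel", 0.75), ("vehicle", 0.5)],
    "vehicle": [("artifact", 0.25)],
}

result = propagate({"car": 1.0}, EDGES)
print(result)
```

In this toy graph `wheel` receives 0.375 via `motor_vehicle` rather than 0.25 via the direct edge, and `artifact` is dropped because its weight of 0.0625 falls below the cutoff.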
Overview and Summary
Ontology Region Mapping
1. optionally translate the documents, or use a multilingual lexical resource (aligned wordnets)
2. map terms to concepts
3. search for highly related concepts
entire regions of concepts are relevant, so propagate a part of the concept's weight to related concepts
G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification

Contenu connexe

Tendances

The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
fridolin.wild
 
slides
slidesslides
slides
butest
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
Bhaskar Mitra
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval
Bhaskar Mitra
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
butest
 

Tendances (20)

Word Embedding to Document distances
Word Embedding to Document distancesWord Embedding to Document distances
Word Embedding to Document distances
 
Methods for Amharic Part-of-Speech Tagging
Methods for Amharic Part-of-Speech TaggingMethods for Amharic Part-of-Speech Tagging
Methods for Amharic Part-of-Speech Tagging
 
DDH 2021-03-03: Text Processing and Searching in the Medical Domain
DDH 2021-03-03: Text Processing and Searching in the Medical DomainDDH 2021-03-03: Text Processing and Searching in the Medical Domain
DDH 2021-03-03: Text Processing and Searching in the Medical Domain
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
 
slides
slidesslides
slides
 
Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
An Intuitive Natural Language Understanding System
An Intuitive Natural Language Understanding SystemAn Intuitive Natural Language Understanding System
An Intuitive Natural Language Understanding System
 
Skip gram and cbow
Skip gram and cbowSkip gram and cbow
Skip gram and cbow
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval
 
Bt0077 multimedia systems
Bt0077   multimedia systemsBt0077   multimedia systems
Bt0077 multimedia systems
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
 
Convolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classificationConvolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classification
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
 
The Duet model
The Duet modelThe Duet model
The Duet model
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 

Similaire à Multilingual Text Classification using Ontologies

Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...
csandit
 
Doc format.
Doc format.Doc format.
Doc format.
butest
 
Scales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduates
Scales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduatesScales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduates
Scales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduates
Hans Ecke
 
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...
Christophe Tricot
 
The Valladolid Presentation - Nov, 16, 2011
The Valladolid Presentation - Nov, 16, 2011The Valladolid Presentation - Nov, 16, 2011
The Valladolid Presentation - Nov, 16, 2011
sdemetri
 

Similaire à Multilingual Text Classification using Ontologies (20)

Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
Doc format.
Doc format.Doc format.
Doc format.
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
French machine reading for question answering
French machine reading for question answeringFrench machine reading for question answering
French machine reading for question answering
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
 
Challenges in transfer learning in nlp
Challenges in transfer learning in nlpChallenges in transfer learning in nlp
Challenges in transfer learning in nlp
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
 
Scales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduates
Scales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduatesScales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduates
Scales02WhatProgrammingLanguagesShouldWeTeachOurUndergraduates
 
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext...
 
The Valladolid Presentation - Nov, 16, 2011
The Valladolid Presentation - Nov, 16, 2011The Valladolid Presentation - Nov, 16, 2011
The Valladolid Presentation - Nov, 16, 2011
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
 
IJNLC 2013 - Ambiguity-Aware Document Similarity
IJNLC  2013 - Ambiguity-Aware Document SimilarityIJNLC  2013 - Ambiguity-Aware Document Similarity
IJNLC 2013 - Ambiguity-Aware Document Similarity
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 
Vsm lsi
Vsm lsiVsm lsi
Vsm lsi
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 

Plus de Gerard de Melo

From Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataFrom Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated Data
Gerard de Melo
 

Plus de Gerard de Melo (15)

SEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link PredictionSEMAC Graph Node Embeddings for Link Prediction
SEMAC Graph Node Embeddings for Link Prediction
 
How to Manage your Research
How to Manage your ResearchHow to Manage your Research
How to Manage your Research
 
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood NarrativesKnowlywood: Mining Activity Knowledge from Hollywood Narratives
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
 
Learning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the WebLearning Multilingual Semantics from Big Data on the Web
Learning Multilingual Semantics from Big Data on the Web
 
From Big Data to Valuable Knowledge
From Big Data to Valuable KnowledgeFrom Big Data to Valuable Knowledge
From Big Data to Valuable Knowledge
 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data Mining
 
Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)Searching the Web of Data (Tutorial)
Searching the Web of Data (Tutorial)
 
From Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated DataFrom Linked Data to Tightly Integrated Data
From Linked Data to Tightly Integrated Data
 
Information Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram DataInformation Extraction from Web-Scale N-Gram Data
Information Extraction from Web-Scale N-Gram Data
 
UWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge BaseUWN: A Large Multilingual Lexical Knowledge Base
UWN: A Large Multilingual Lexical Knowledge Base
 
Extracting Sense-Disambiguated Example Sentences From Parallel Corpora
Extracting Sense-Disambiguated Example Sentences From Parallel CorporaExtracting Sense-Disambiguated Example Sentences From Parallel Corpora
Extracting Sense-Disambiguated Example Sentences From Parallel Corpora
 
Towards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined EvidenceTowards a Universal Wordnet by Learning from Combined Evidence
Towards a Universal Wordnet by Learning from Combined Evidence
 
Not Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked DataNot Quite the Same: Identity Constraints for the Web of Linked Data
Not Quite the Same: Identity Constraints for the Web of Linked Data
 
Good, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic IntensitiesGood, Great, Excellent: Global Inference of Semantic Intensities
Good, Great, Excellent: Global Inference of Semantic Intensities
 
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged OntologyYAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
YAGO-SUMO: Integrating YAGO into the Suggested Upper Merged Ontology
 

Dernier

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Dernier (20)

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 

Multilingual Text Classification using Ontologies

  • 1. Introduction Techniques Overview and Summary Multilingual Text Classification Gerard de Melo, Stefan Siersdorfer Max Planck Institute for Computer Science Saarbr¨ucken, Germany 2007-04-04 G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  • 2. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  • 3. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  • 4. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  • 5. Introduction Techniques Overview and Summary Text Classification Text Classification task: automatically assign text documents to classes (e.g. thematically, geographically) machine learning algorithms, e.g. SVM, can learn from pre-classified training documents multilingual case: documents in multiple languages applications: news wire filtering, library management, e-mail, etc. G. de Melo, S. Siersdorfer, Max-Planck-Institut Informatik Multilingual Text Classification
  • 8. Machine Translation for Multilingual TC. Idea: simply translate all documents into a single language L_I (prior work by Jalam 2002, Rigutini et al. 2005). Shortcomings of this approach: lexical variety in L_I (English: huge vocabulary, many synonyms); variety of expression in source languages; lexical ambiguity in L_I (unnecessary introduction of additional ambiguity). Example: Spanish coche → car, but French voiture → automobile.
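The coche/voiture example can be made concrete with a small sketch. It assumes two hypothetical machine-translated documents reduced to plain term counts; the token lists and the `cosine` helper are illustrative and not taken from the slides. The point is only that the two translations of the same concept end up as unrelated features.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical MT output for a Spanish and a French document on the same topic.
from_spanish = Counter(["car", "engine"])         # coche -> car
from_french = Counter(["automobile", "engine"])   # voiture -> automobile

# "car" and "automobile" are separate dimensions, so only "engine" overlaps.
similarity = cosine(from_spanish, from_french)
```

With term features alone, a classifier has to learn the link between "car" and "automobile" from the training data.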
  • 11. Semantic Concepts: Idea. Map all words to semantic concepts (e.g. WordNet synsets), thus distinguishing different senses of a word while identifying synonyms. Disambiguate using context information. Construct feature vectors by counting occurrences of concepts rather than terms.
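A minimal sketch of the concept-mapping idea, under stated assumptions: `LEXICON` is a hypothetical toy resource standing in for WordNet or aligned wordnets, the concept identifiers are invented, and the overlap-based `disambiguate` function is one simple way to use context. The slides do not say which disambiguation method the authors use.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> candidate concepts, each with context cue words.
LEXICON = {
    "car": {
        "c:automobile": {"road", "engine", "driver"},
        "c:railcar": {"train", "rail", "track"},
    },
    "automobile": {"c:automobile": {"road", "engine", "driver"}},
    "coche": {"c:automobile": {"carretera", "motor", "conductor"}},
    "voiture": {"c:automobile": {"route", "moteur", "conducteur"}},
}


def disambiguate(word, context):
    """Pick the sense whose cue words overlap most with the document context."""
    senses = LEXICON.get(word)
    if not senses:
        return None  # word is not covered by the lexicon
    # Ties are resolved in favour of the first listed sense.
    return max(senses, key=lambda concept: len(senses[concept] & context))


def concept_vector(tokens):
    """Count concept occurrences instead of term occurrences."""
    context = set(tokens)
    counts = Counter()
    for token in tokens:
        concept = disambiguate(token, context)
        if concept is not None:
            counts[concept] += 1
    return counts
```

In this sketch, `concept_vector(["coche", "motor"])` and `concept_vector(["automobile", "engine"])` both count the same concept, while "car" next to "train" and "track" maps to a different one.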
  • 12. Semantic Concepts: Problems. understemming polysemy: highly related senses are treated as distinct; incongruent concepts between languages; variety of expression; lexical lacunae. Example (with literal glosses): English "I have a headache"; Spanish "Me duele la cabeza" (*It hurts the head to me); French "J'ai mal à la tête" (*I have pain at the head).
  • 15. Weight Propagation. Propagate weight from original concepts to related concepts. Choose the path to c maximizing its weight: a Dijkstra-like algorithm assigns the maximal possible weight to a concept.
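The Dijkstra-like propagation can be sketched as a best-first search over a concept graph. This is an illustration under assumptions, not the authors' implementation: the edge factors are hypothetical relation weights in (0, 1], a path's weight is the starting weight multiplied by the factors along it, and each concept keeps the maximum over all starting concepts. The slides do not state how contributions from several starting concepts are combined.

```python
import heapq


def propagate(initial, edges, threshold=0.0):
    """Assign each reachable concept the maximal weight over all paths to it.

    initial:   {concept: weight} for concepts found in the document
    edges:     {concept: [(related_concept, factor), ...]}, factor in (0, 1]
    threshold: propagated weights at or below this value are dropped
    """
    best = {}
    # heapq is a min-heap, so weights are negated to pop the largest first.
    heap = [(-weight, concept) for concept, weight in initial.items()]
    heapq.heapify(heap)
    while heap:
        neg_weight, concept = heapq.heappop(heap)
        if concept in best:
            continue  # already settled with a weight at least as large
        weight = -neg_weight
        best[concept] = weight
        for related, factor in edges.get(concept, ()):
            new_weight = weight * factor
            if related not in best and new_weight > threshold:
                heapq.heappush(heap, (-new_weight, related))
    return best


# Hypothetical concept graph and relation factors.
edges = {
    "car": [("vehicle", 0.8), ("wheel", 0.5)],
    "vehicle": [("wheel", 0.9), ("truck", 0.5)],
}
weights = propagate({"car": 1.0}, edges)
```

Because every factor is at most 1, weights never increase along a path, which is what lets the first pop of a concept be final, just as non-negative edge costs do in Dijkstra's algorithm. Here "wheel" receives about 0.72 via "vehicle" rather than 0.5 via the direct edge.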
  • 18. Overview and Summary: Ontology Region Mapping. (1) Optionally translate the documents, or use a multilingual lexical resource (aligned wordnets). (2) Map terms to concepts. (3) Search for highly related concepts: entire regions of concepts are relevant, so propagate a part of the concept's weight to related concepts.