1. Building a Spanish MMTx by
using Automatic Translation and
Biomedical Ontologies
Francisco Carrero 1,2 ; José Carlos Cortizo 1,2 ; José Mª Gómez 3
1 Wipley, Social Gaming Platform
http://www.wipley.com
2 Universidad Europea de Madrid
http://www.esp.uem.es/gsi
3 Optenet
http://www.esp.uem.es/gsi
2. Outline
The MIRCAT project
The challenge
English MetaMap, a big effort
Approaching a Spanish MetaMap
Experiments
Discussion of the Results and Future Work
Francisco Carrero Garcia
6. The Challenge
The problem
We can extract UMLS concepts from English texts using MetaMap...
...but there is no Spanish version of MetaMap
Is it difficult to construct a tool like MetaMap?
10. Experimental Design
Text Collections
MedlinePlus medical news
http://www.nlm.nih.gov/medlineplus/newsbydate.html
Excellent online resource
2,000 news items, some in English, some in Spanish
600 of them available in both languages
11. Experiments
Experimental Design
MetaMap extracts concepts, allowing multiple representations
A => compound concepts
B => simple concepts
1 => resolves ambiguity by adding all candidate concepts
2 => ignores ambiguity by keeping only the first candidate
4 representations: A1, A2, B1, B2
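A minimal Python sketch of how the four representations could be built from MetaMap output. It assumes a simplified, hypothetical input format (each phrase is a ranked list of candidate mappings, each mapping a tuple of UMLS CUIs) and assumes that "compound concept" means keeping a whole mapping as a single feature. The function represent, the REPRESENTATIONS table and the placeholder CUIs are illustrative only, not part of MetaMap's API.

```python
from collections import Counter

# Hypothetical, simplified MetaMap output for one phrase: its candidate
# mappings in MetaMap's ranked order; each mapping is a tuple of CUIs.
Phrase = list[tuple[str, ...]]


def represent(phrases: list[Phrase], compound: bool, all_candidates: bool) -> Counter:
    """Build a bag-of-concepts for one document (illustrative sketch).

    compound=True        -> 'A': each mapping is kept as one compound feature
    compound=False       -> 'B': each mapping is split into its simple CUIs
    all_candidates=True  -> '1': ambiguous phrases contribute every candidate
    all_candidates=False -> '2': only the first (top-ranked) candidate is kept
    """
    bag = Counter()
    for candidates in phrases:
        if not candidates:
            continue  # phrase with no mapping contributes nothing
        chosen = candidates if all_candidates else candidates[:1]
        for mapping in chosen:
            if compound:
                bag["+".join(mapping)] += 1  # whole mapping as one feature
            else:
                bag.update(mapping)  # one feature per CUI
    return bag


# (compound, all_candidates) for each of the four representations
REPRESENTATIONS = {
    "A1": (True, True),
    "A2": (True, False),
    "B1": (False, True),
    "B2": (False, False),
}
```

With doc = [[("C1",)], [("C2", "C3"), ("C2",)]], represent(doc, *REPRESENTATIONS["B1"]) counts C2 twice (once per candidate mapping), while B2 keeps only the first candidate and counts it once.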
12. Experiments
Filtering
Data representations with many features do not usually perform well in text tasks
Many classifiers lose prediction accuracy when faced with many irrelevant or redundant/correlated features (the “curse of dimensionality”)
We apply Zipf’s Law to filter the attributes
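The slides do not give the exact cut-offs used, so this is only a sketch of one common Zipf-motivated filter (Luhn-style upper and lower frequency cut-offs), assuming each document is a bag of concepts stored as a Counter. The names zipf_filter, min_df and max_df_ratio and their default values are hypothetical.

```python
from collections import Counter


def zipf_filter(docs: list[Counter], min_df: int = 2,
                max_df_ratio: float = 0.5) -> list[Counter]:
    """Keep only attributes whose document frequency lies between two cut-offs.

    Hypothetical thresholds: attributes appearing in fewer than `min_df`
    documents (the long Zipf tail) or in more than `max_df_ratio` of all
    documents (the few very frequent head terms) are dropped.
    """
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # count each attribute once per document
    max_df = max_df_ratio * len(docs)
    kept = {term for term, n in df.items() if min_df <= n <= max_df}
    return [Counter({t: c for t, c in doc.items() if t in kept}) for doc in docs]
```

For example, over four documents with max_df_ratio=0.8, an attribute present in all four is removed as too common, and one present in a single document is removed as too rare.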
16. Discussion of the Results
Translation
The worst results (in terms of similarity) are obtained with the most complex, most human-like representation: A1
B1 is less complex and produces the best results
=> Our model seems better suited to a plain bag-of-concepts representation
Similar to the bag-of-words representation widely used in text processing tasks
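The slides do not state which similarity measure was used. As an illustration only, here is cosine similarity between two bags of concepts, for instance the concepts extracted from an English news item and those extracted from the translation of its Spanish version. The function name and the choice of cosine are assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of concepts (0.0 if either is empty)."""
    # Counter returns 0 for missing keys, so concepts absent from b add nothing.
    dot = sum(count * b[concept] for concept, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For Counter(C1=2, C2=1) and Counter(C1=2, C3=1) the result is about 0.8: the shared concept C1 dominates, while C2 and C3 each appear on one side only.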
17. Discussion of the Results
Classification
All results are comparable to classification on the original English texts
In some cases, they are even better
Best results with A2+Zipf: +7.8% in AUC
UNMKD representations never achieve worse classification results than English
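For reference, a small sketch of how AUC (area under the ROC curve) can be computed from classifier scores, using the pairwise Mann-Whitney formulation. The slides do not say which tool produced their AUC values; the function below is only illustrative.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(P*N), fine for small illustrations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]) returns 0.75: three of the four positive/negative pairs are ranked correctly.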
18. Conclusions and Future Work
The “easy way” to construct a Spanish MetaMap is promising
Google Translate seems a good tool for adapting English resources to other languages (such as Spanish)
We should try other translation tools
We are working on applying this approach to other text tasks (such as Information Retrieval and Filtering)
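The “easy way” amounts to a two-step pipeline: translate the Spanish text into English, then run the existing English MetaMap on the result. A sketch under stated assumptions: the translator and the MetaMap wrapper are passed in as hypothetical callables (no real Google Translate or MetaMap API is called here), which also makes it easy to swap in the other translation tools mentioned above.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical interfaces, not real library APIs:
# a translator (text, source_lang, target_lang) -> translated text, and
# a MetaMap wrapper mapping English text to the UMLS CUIs it finds.
Translator = Callable[[str, str, str], str]
ConceptExtractor = Callable[[str], Iterable[str]]


def spanish_bag_of_concepts(text_es: str, translate: Translator,
                            extract: ConceptExtractor) -> Counter:
    """'Easy way' Spanish MetaMap: translate to English, then reuse English MetaMap."""
    text_en = translate(text_es, "es", "en")
    return Counter(extract(text_en))
```

Keeping the translator as a parameter isolates the one component the approach depends on, so comparing translation tools only means passing a different function.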
19. Ending...
Thank you very much for your attention