TechMiner is a new approach that combines natural language processing, machine learning, and semantic technologies to extract information about technologies (such as applications, systems, languages, and formats) from research publications. It generates an ontology describing technologies and their relationships to other research entities. The approach was evaluated on a gold standard of manually annotated publications and found to improve precision and recall over alternative natural language processing approaches. Future work includes enriching the approach to identify additional scientific objects and applying it to other research fields.
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
1. Francesco Osborne, Helene de Ribaupierre, Enrico
Motta
KMi, The Open University, United Kingdom
2. Osborne, F., Motta, E. and Mulholland, P.
Exploring scholarly data with Rexplore.
International Semantic Web Conference 2013
technologies.kmi.open.ac.uk/rexplore/
3. Semantic Enhanced Scholarly Data
Most scholarly datasets capture ‘standard’ scholarly entities and
their connections, such as authors, affiliations, venues,
publications, citations, and others.
We still lack comprehensive information about the content of
research papers, often simply represented as a collection of
keywords or categories from a taxonomy.
Hence, researchers are working on extracting other kinds of
entities, including:
– Genes
– Chemical components
– Epistemological concepts (e.g., hypothesis, motivation, experiments)
4. What about technologies?
• Technologies such as applications, systems, languages and
formats are an essential part of the Computer Science
ecosystem.
• Current knowledge bases cover only a small part of the
technologies presented in the literature.
• Identifying semantic relationships between technologies and
other research entities allows:
– Richer semantic search;
– Monitoring the emergence and impact of new technologies, both within
and across scientific fields;
– Studying the scholarly dynamics associated with the emergence of new
technologies;
– Supporting companies in the field of innovation brokering and
initiatives for encouraging software citations across disciplines, e.g.
FORCE11 Software Citation Working Group.
5. TechMiner
TechMiner (TM) is a new approach that combines NLP,
machine learning and semantic technologies to mine
technologies (applications, systems, languages and formats) from
research publications.
It generates an OWL ontology describing technologies and their
relationships with other research entities.
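As a rough illustration of what such an output might look like, the sketch below serialises extracted technologies as OWL individuals in Turtle syntax. The `ex:` namespace, the `ex:Technology` class, the `ex:mentionedIn` property and the helper names are all assumptions made for this example; they are not TechMiner's actual vocabulary or code.

```python
# Hypothetical vocabulary under an example namespace; NOT TechMiner's actual ontology.
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/techminer-sketch#> .\n"
)


def slug(text):
    """Map a label to a simple local name (non-alphanumerics become '_')."""
    return "".join(ch if ch.isalnum() else "_" for ch in text)


def technology_to_turtle(label, tech_type, paper_ids):
    """Describe one technology as an OWL individual in Turtle syntax."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    statements = [
        f"a owl:NamedIndividual , ex:{tech_type}",
        f'rdfs:label "{escaped}"',
    ]
    # ex:mentionedIn is an assumed property linking a technology to a paper.
    statements += [f"ex:mentionedIn ex:{slug(p)}" for p in paper_ids]
    return f"ex:{slug(label)} " + " ;\n    ".join(statements) + " ."


def build_ontology(technologies):
    """Assemble class declarations plus one individual per technology."""
    types = sorted({t["type"] for t in technologies})
    parts = [
        PREFIXES,
        "ex:Technology a owl:Class .",
        "ex:mentionedIn a owl:ObjectProperty .",
    ]
    parts += [f"ex:{t} a owl:Class ; rdfs:subClassOf ex:Technology ." for t in types]
    parts += [
        technology_to_turtle(t["label"], t["type"], t["papers"])
        for t in technologies
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    # Illustrative input only; the type assigned here is an assumption.
    print(build_ontology([
        {"label": "Rexplore", "type": "System", "papers": ["paper_1"]},
    ]))
```

Writing Turtle by hand keeps the sketch dependency-free; a real pipeline would more likely use an RDF library to handle escaping and serialisation.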
We evaluated TM on a manually annotated gold standard and
found that it significantly improves both precision and recall over
alternative NLP approaches.
– The proposed semantic features significantly improve both recall and
precision.
9. Evaluation – Gold Standard
We tested our approach on a gold standard (GS) of manually
annotated publications in the field of the Semantic Web.
We selected a number of publications tagged with keywords
related to this field (e.g., ‘semantic web’, ‘linked data’, ‘RDF’) and
asked a group of 8 Semantic Web experts to annotate these
papers with their technologies.
The resulting GS includes 548 publications, each of them
annotated by at least two experts, and 539 technologies.
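For readers unfamiliar with the metrics, a minimal sketch of how precision and recall can be computed against such a gold standard follows. It assumes exact-match comparison between the set of extracted technology names and the set of gold-standard names; the slides do not state the matching or averaging scheme actually used, and the function name and toy data are illustrative.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted items against a gold standard.

    Assumes exact string matching; a real evaluation might normalise names
    or score each publication separately before averaging.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, not taken from the actual gold standard.
    gold = {"RDF", "OWL", "SPARQL", "Rexplore"}
    extracted = {"RDF", "OWL", "Protege"}
    p, r = precision_recall(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In the toy data, two of the three extracted names are in the gold set (precision 2/3) and two of the four gold names were found (recall 1/2).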
12. Future work
• Enriching the approach to identify other
categories of scientific objects, such as datasets
and algorithms.
• Applying the approach to other research fields.
• Building a pipeline that allows human experts to
correct and manage the information extracted by
TechMiner.
13. Helene de Ribaupierre, Enrico Motta, Francesco Osborne
Osborne, F., Ribaupierre, H., and Motta, E. (2016) TechMiner:
Extracting Technologies from Academic Publications.
EKAW 2016, Bologna, Italy
Email: francesco.osborne@open.ac.uk
Twitter: FraOsborne
Site: people.kmi.open.ac.uk/francesco
http://oro.open.ac.uk/47332/1/EKAW2016_TM.pdf
Editor's notes
That is where we enter the picture.
Why? Other solutions are black boxes, whereas Semantic Web technologies can give explanations.