Towards the automatic identification of the nature of citations
1. Towards the automatic identification of the nature of citations
(1) Department of Computer Science and Engineering, University of Bologna, Italy
(2) STLab-ISTC, National Research Council, Italy
30 May 2013
Montpellier, France
ESWC 2013
2. Motivation
• Bibliographic citations can be seen as tools for:
– linking research: making pointers to related works, to sources of
experimental data, to methods used, etc.
– disseminating research: conference proceedings, journals, Web
platforms (e.g. blogs, wikis), Semantic Publishing platforms and
projects (e.g. OpenCitation, OpenBibliography, Lucero)
– exploring research: new ways of browsing articles through networks
of citations (e.g. CiteWiz, Citation Sensitive In-browser Summariser)
– evaluating research: measuring the importance of journals (e.g.
impact factor) or the scientific productivity of authors (e.g. h-index)
• Assumption: all these activities can be radically improved by
exploiting the actual function of citations, i.e. the author’s
reason for citing a given paper
3. Goal
• To design a method able to automatically infer the
author’s reason for citing a scientific article
• To implement a tool that is comparable to humans in the
task of identifying the nature of citations
4. Available online at http://wit.istc.cnr.it:8080/tools/citalo
• Input: a sentence containing a reference to a bibliographic entity indicated by an “X”, e.g. “It extends the research outlined in earlier work X.”
• Ontology learning: derive a logical representation (i.e. an OWL ontology) of the sentence through FRED
• Citation type extraction: extract candidate types for the citation by looking for patterns in FRED output via SPARQL
• Word-sense disambiguation: gather the sense of the candidate types through IMS with respect to OntoWordNet
• Sentiment analysis: capture the sentiment polarity emerging from the text through AlchemyAPI
• Alignment to CiTO: assign CiTO types to the citation through SPARQL CONSTRUCT
• Output: cito:extends
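The flow from input sentence to CiTO property can be sketched as a chain of small functions. This is an illustrative sketch only: every function, word list and mapping table below is hypothetical, and none of the real services (FRED, IMS, AlchemyAPI, SPARQL) is called. A naive word lookup stands in for ontology learning and type extraction, word-sense disambiguation is skipped, and the polarity is recorded but not used in the alignment.

```python
from dataclasses import dataclass

# Hypothetical word-form -> candidate type table (stands in for FRED + SPARQL patterns).
CANDIDATE_TYPES = {
    "extend": "Extend", "extends": "Extend",
    "use": "Use", "uses": "Use",
    "criticize": "Criticize", "criticizes": "Criticize",
}

# Hypothetical candidate type -> CiTO property table (stands in for SPARQL CONSTRUCT).
ALIGNMENT = {
    "Extend": "cito:extends",
    "Use": "cito:usesMethodIn",
    "Criticize": "cito:critiques",
}
DEFAULT_PROPERTY = "cito:citesForInformation"  # the most neutral CiTO property

# Toy sentiment lexicon (stands in for the sentiment analysis service).
POSITIVE = {"novel", "elegant", "improves"}
NEGATIVE = {"fails", "flawed", "poor"}


@dataclass
class Analysis:
    candidate_types: list
    polarity: str
    cito_properties: list


def tokenize(sentence):
    """Lower-case the sentence and strip trailing punctuation from each token."""
    return [t.strip(".,;:!?\"'").lower() for t in sentence.split()]


def extract_candidate_types(tokens):
    """Return candidate citation types found in the tokens, without duplicates."""
    found = []
    for token in tokens:
        ctype = CANDIDATE_TYPES.get(token)
        if ctype and ctype not in found:
            found.append(ctype)
    return found


def sentiment_polarity(tokens):
    """Return 'positive', 'negative' or 'neutral' from the toy lexicon."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def classify(sentence):
    """Run the sketched pipeline on one citation sentence."""
    tokens = tokenize(sentence)
    types = extract_candidate_types(tokens)
    properties = [ALIGNMENT[t] for t in types] or [DEFAULT_PROPERTY]
    return Analysis(types, sentiment_polarity(tokens), properties)


result = classify("It extends the research outlined in earlier work X.")
print(result.cito_properties)
```

Falling back to cito:citesForInformation when no candidate type is found is a design choice of this sketch, not a documented behaviour of the tool.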
5. Result
Similarly to Teufel et al. [19], the most neutral CiTO property,
citesForInformation, was the most prevalent function in our dataset,
while the second most used property was usesMethodIn
We ran CiTalO on the same sample under 8 different configurations and
compared the results with the human annotations
No configuration emerges as the absolute best from these data
The worst configurations were those that took into account all the
proximal synsets
We asked humans to manually annotate 106 citation sentences, contained in scientific ar
according to CiTO properties
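A comparison between tool output and human annotations is commonly expressed as precision and recall. The sketch below assumes, which the slides do not state, that each citation sentence carries one set of human-assigned CiTO properties and one set of properties returned by the tool. The function name and the sample data are made up for illustration and are not taken from the experiment.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over parallel lists of property sets."""
    # A true positive is a property assigned by both the tool and the annotators.
    true_positives = sum(len(p & g) for p, g in zip(predicted, gold))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall


# Invented sample: three citation sentences, not real experimental data.
gold = [
    {"cito:extends"},
    {"cito:citesForInformation"},
    {"cito:usesMethodIn"},
]
predicted = [
    {"cito:extends"},
    {"cito:citesForInformation", "cito:extends"},
    {"cito:citesForInformation"},
]

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same function once per configuration would give one precision/recall pair for each of the 8 configurations, which makes them directly comparable.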