Citation studies in the humanities
1. Citation studies
in the humanities
Chris Alen Sula
School of Information & Library Science
Pratt Institute
#DH2013 / Citation studies in the humanities / @chrisalensula @thisismmiller
Matt Miller
NYPL Labs
New York Public Library
2. Background
‣ scholarly communication — the processes by which scholars
share their findings, both formally (e.g., articles) and informally
(e.g., tweets, letters, blogs)
‣ bibliometrics — methods for analyzing citation behaviors
‣ Bibliometrics is largely based on studies of scientific and
technical corpora (Hérubel and Buchanan, 1994; Lamont,
2000), with relatively few studies in the humanities (cf.
Ardanuy, 2013).
(Yan & Ding, 2012)
3. Citation networks
Rosvall & Bergstrom (2007)
4. Bibliometrics & humanities: Why so little?
‣ lack of data (Linmans, 2010), especially for
‣ monographs (Hammarfelt, 2011), which still form the backbone of
humanities work (Larivière et al., 2006)
‣ older sources, which humanists cite with greater frequency than scientists
(Heinzkill, 1980)
‣ lack of citations, comparatively speaking
‣ Humanists cite each other less frequently than scientists (Heinzkill, 1980;
Swales, 1990; Hellqvist, 2010).
‣ Multi-authored articles are rare (Price, 1966; Pao, 1981, 1982; Sievert and
Sievert, 1989; Wiberley, 1989), averaging about 1.06 authors per article from
1980–2007 (Linmans, 2010).
‣ Humanists do cite and co-author (Leydesdorff, Hammarfelt & Salah,
2011), and DHers have done citation studies (Smith, 2009).
5. Bibliometrics & humanities: Why so little?
‣ Humanities discourse differs from scientific discourse.
‣ more integral references, in which authors associate their own views with those
they reference (Swales, 1990; Hyland, 1999; Harwood, 2008)
‣ more negative references, which object to other authors’ claims (Meadows,
1974; Brooks, 1985; Cano, 1989)
‣ The mere fact that one humanist cites another says nothing
about the type or significance of their relationship.
‣ Understanding and tracking these relationships would
give us a richer, more nuanced view of the humanities. Part of
that data can come from reference contexts, part from extra-citational
information (mentions, likes, real-world relationships, etc.).
6. Reference context
‣ two example schemas (Chubin & Moitra, 1975; Frost, 1979)
7. Code at http://github.com/thisismattmiller/dh2013-humanities-citation
8. Our tool: extraction
‣ Layout recognition used to extract citations and
surrounding context (usually 1–2 sentences)
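The extraction step can be illustrated on plain text. The following is a minimal sketch, not the authors' layout-recognition code: it assumes the article has already been converted to text, that citations are parenthetical author-year references, and that sentences end with ".", "!" or "?" followed by whitespace. The names CITATION, SENTENCE_END and citation_contexts are hypothetical.

```python
import re

# Assumed citation shape: "(Author, 1990)" or "(Author, 1990; Other, 2001)".
CITATION = re.compile(r"\(([^()]*?\b\d{4}[a-z]?)\)")

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def citation_contexts(text, window=0):
    """Yield (citation, context) pairs.

    The context is the citing sentence plus up to `window`
    sentences on either side of it.
    """
    sentences = SENTENCE_END.split(text.strip())
    for i, sentence in enumerate(sentences):
        for match in CITATION.finditer(sentence):
            start = max(0, i - window)
            context = " ".join(sentences[start:i + window + 1])
            yield match.group(1), context
```

A splitter this simple will break on abbreviations such as "et al.", so real articles would need a more careful sentence tokenizer. Setting window=1 widens the context to the neighboring sentences.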
9. Our tool: classification
‣ Naïve Bayes classifier using NLTK
sample from positive training set
‣ extensively discussed by
‣ useful discussion
‣ indebted to
‣ groundbreaking work
‣ result confirms the hypothesis
sample from negative training set
‣ contra
‣ appears to overlook
‣ fail to account for
‣ problematic
‣ is unable to
10. Data & results
‣ articles sampled for this study
‣ results of citation tool applied to sample set
12. Broader patterns?
‣ citation frequency x polarity
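One way to cross citation frequency with polarity is to tally, for each cited work, how often it is cited and what share of those citations is positive. The sketch below assumes the classifier output is a list of (cited work, polarity) pairs; the records, the work names and the function frequency_by_polarity are hypothetical placeholders, not data from the study.

```python
from collections import Counter, defaultdict

# Hypothetical classifier output: one (cited work, polarity) pair per reference.
references = [
    ("Work A", "positive"), ("Work A", "positive"), ("Work A", "negative"),
    ("Work B", "negative"),
    ("Work C", "positive"), ("Work C", "positive"),
]


def frequency_by_polarity(records):
    """Map each cited work to (times cited, share of positive citations)."""
    tallies = defaultdict(Counter)
    for work, polarity in records:
        tallies[work][polarity] += 1
    summary = {}
    for work, counts in tallies.items():
        total = sum(counts.values())
        summary[work] = (total, counts["positive"] / total)
    return summary
```

Plotting the two values against each other for every cited work would show whether frequently cited works attract mostly positive or mostly negative references.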
13. Future directions
‣ further manual inspection of articles to determine the
reliability of extraction and classification
‣ further training of the sentiment classifier on larger
corpora
‣ measures of inter-rater reliability for classification
‣ support for more document layouts
‣ crowdsourced PDF analysis & classifier training
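For the inter-rater reliability item above, Cohen's kappa is one common measure when two raters label the same references. The slides do not say which measure the authors intend, so this is only an illustrative sketch; the function name cohens_kappa and the label values are assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label proportions, summed.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. With more than two raters, a measure such as Fleiss' kappa would be needed instead.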
14. References
‣ All references are available in the conference proceedings at
http://dh2013.unl.edu/abstracts/ab-353.html
‣ Additional references:
‣ Jordi Ardanuy (2013). “Sixty Years of Citation Analysis Studies in the Humanities
(1951–2010)” Journal of the American Society for Information Science and Technology
64(8): 1751–1755.
‣ Erjia Yan and Ying Ding (2012). “Scholarly Network Similarities: How Bibliographic
Coupling Networks, Citation Networks, Cocitation Networks, Topical Networks,
Coauthorship Networks, and Coword Networks Relate to Each Other” Journal of the
American Society for Information Science and Technology 63(7): 1313–1326.
‣ Code at http://github.com/thisismattmiller/dh2013-humanities-citation