SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
Giovanni Colavizza
Matteo Romanello (@mr56k)
Frédéric Kaplan (@frederickaplan)
The References of References:
Enriching Library Catalogs via Domain-Specific
Reference Mining
1
Goal
2
Empowering scholars in
the Humanities with better
IR systems
Motivation - the Scholar
Issues: lack of data [Sula and Miller, 2014] leads to absence of
services: estimated coverage of Web of Science for Humanities
circa 13% [Mingers and Leydesdorff, 2015].
3
Sciences:
Google Scholar
English
mainly papers
Lower-cost information
gathering
Humanities:
no Google Scholar-like system
multiple languages
mainly monographs
Higher-cost information
gathering
Motivation - the Footnote
How humanists cite? Footnotes [see e.g. Hellqvist, 2009]
4
Motivation - the Archive
Approximately half citations to primary sources [Wiberley Jr., 2009]
5
Motivation - the Scholar reloaded
6
Proposal: Enriching library catalogs
7
Use reference monographs, the “canon” of the
domain, to extract references to the rest of the
literature and enrich library catalogs.
Project: Linked Books
Focused on a case study/domain:
the history of Venice.
Partners so far:
• Ca’ Foscari University Library System
• Biblioteca Marciana
• Istituto Veneto di Scienze, Lettere ed Arti
• Archivio di Stato di Venezia
• EPFL
8
The Pipeline
9
Corpus selection
10
Result: 1904 monographs, 701 with
a structured list of references.
Use the means of the library:
1- Consultation shelves
2- Dewey and subject classification
3- Scholarly bibliographies
4- Keyword search
The Pipeline - Digitization
11
Digitization
12
1,904 monographs + ~1,000 journal issues
The Pipeline - Annotation/Extraction/Parsing
13
Annotation
14
• annotated 27% of 701 monographs (with reference list)
• 3.8% of all digitized pages (with references)
• annotators identified 33 citation styles, divided into 6 families
• Yes, humanities scholars love customized reference styles!
Reference Extraction/Parsing
15
[Klinkhammer b-i-secondary-full] [Lutz, i-secondary-full] [L’occupazione i-
secondary-full] [tedesca i-secondary-full] [in i-secondary-full] [Italia i-secondary-full]
[1943-1945, i-secondary-full] [Torino, i-secondary-full] [Bollati i-secondary-full]
[Boringhieri i-secondary-full] [1993 i-secondary-full].
Klinkhammer Lutz, L’occupazione tedesca in Italia 1943-1945,
Torino, Bollati Boringhieri 1993.
[Klinkhammer author] [Lutz, author] [L’occupazione title] [tedesca title]
[in title] [Italia title] [1943-1945, title] [Torino, publicationplace] [Bollati
publisher] [Boringhieri publisher] [1993 publicationyear].
Extraction/Parsing - Evaluation
16
Extraction/Parsing - Confusion Matrix
17
null
author
title
abbrev. (E)
monograph (E)
Task 1
F1 score
(avg) 0.806
class=“null” 0.609
Task 2
F1 score
(avg) 0.842
class=“end abbreviated” 0.242
The Pipeline - Lookup
18
Lookup
19
1. Against OPAC SBN (via API)
Steps:
1. search candidates by title
2. match reference metadata
3. assign each candidate a
confidence score
4. return set of candidates
Evaluation:
• 2k references (out of 181k)
• 41.7% no candidates
• 58.3% with candidates:
• 72.3% -> first candidate
correct
Goal: disambiguation of references
Issues:
• OCR errors -> impact on search by title (low recall)
• API as a “black box” + bottleneck of search by title
Lookup
20
2. Against metadata of digitized books
Lookup
Goal: verify cohesiveness of digitized corpus
Method:
• based on SBN lookup
• but lookup against digitization
metadata
• tuned to maximize precision
• returns 1 or no matches
Evaluation*:
• 500 references (out of 181k)
• precision ~ 1.00
• recall > 0.95
Result:
• only 7% of references extracted from 701 monographs point
inwards (i.e. towards the 1904 monographs)
21
Core of the discipline
co-citation network from
extracted references*
giant component = 59%
of selected corpus
books in the giant
component -> core of
reference works on
history of Venice
giant component ->
32.5% with only works in
consultation
Conclusions and Outlook
22
data- and citation-driven approach to assess and
exploit, from an IR point of view, domain-specific
library holdings on the history of Venice
next big challenge: extraction, consolidation and
disambiguation of references contained within
footnotes (journals)
Giovanni Colavizza
Matteo Romanello (@mr56k)
Frédéric Kaplan (@frederickaplan)
Thank you!
go.epfl.ch/linkedbooks
23

Contenu connexe

Similaire à The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining

Exploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiencesExploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiencesLuigi Spagnolo
 
INF 100 Tutorial
INF 100 TutorialINF 100 Tutorial
INF 100 TutorialChris Chan
 
Dissertations 5 ref, plagiarism, own crit-analysis
Dissertations 5   ref, plagiarism, own crit-analysisDissertations 5   ref, plagiarism, own crit-analysis
Dissertations 5 ref, plagiarism, own crit-analysisStudy Hub
 
Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Giovanni Colavizza
 
Bibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptxBibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptxJamshi8
 
Dissertations 5 ref, plagiarism, own crit-analysis [handout]
Dissertations 5   ref, plagiarism, own crit-analysis [handout]Dissertations 5   ref, plagiarism, own crit-analysis [handout]
Dissertations 5 ref, plagiarism, own crit-analysis [handout]Study Hub
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...jodischneider
 
02 Literature search and reviewing_1.pptx
02  Literature search and reviewing_1.pptx02  Literature search and reviewing_1.pptx
02 Literature search and reviewing_1.pptxssusere05ec21
 
Portland Place School June 2018
Portland Place School June 2018 Portland Place School June 2018
Portland Place School June 2018 EISLibrarian
 
Order #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of servOrder #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of servJUST36
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Print source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxPrint source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxsanjaychavan62
 
Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries ALISS
 
Writing Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a DisciplineWriting Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a DisciplineRobert Domanski
 
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewArc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewGalala University
 
McGill Library and your thesis
McGill Library and your thesisMcGill Library and your thesis
McGill Library and your thesisPamela Harrison
 
Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...EDINA, University of Edinburgh
 

Similaire à The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining (20)

Exploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiencesExploratory computing: designing discovery-driven user experiences
Exploratory computing: designing discovery-driven user experiences
 
INF 100 Tutorial
INF 100 TutorialINF 100 Tutorial
INF 100 Tutorial
 
Dissertations 5 ref, plagiarism, own crit-analysis
Dissertations 5   ref, plagiarism, own crit-analysisDissertations 5   ref, plagiarism, own crit-analysis
Dissertations 5 ref, plagiarism, own crit-analysis
 
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
 
Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014
 
Romanello tokyo
Romanello tokyoRomanello tokyo
Romanello tokyo
 
Bibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptxBibiliography Footnotes Oral Presentation PPT 17.pptx
Bibiliography Footnotes Oral Presentation PPT 17.pptx
 
Dissertations 5 ref, plagiarism, own crit-analysis [handout]
Dissertations 5   ref, plagiarism, own crit-analysis [handout]Dissertations 5   ref, plagiarism, own crit-analysis [handout]
Dissertations 5 ref, plagiarism, own crit-analysis [handout]
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...
 
02 Literature search and reviewing_1.pptx
02  Literature search and reviewing_1.pptx02  Literature search and reviewing_1.pptx
02 Literature search and reviewing_1.pptx
 
Portland Place School June 2018
Portland Place School June 2018 Portland Place School June 2018
Portland Place School June 2018
 
Order #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of servOrder #185993101 writers choice (5 pages, 4 slides)type of serv
Order #185993101 writers choice (5 pages, 4 slides)type of serv
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Print source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptxPrint source literature 24 March 2023.pptx
Print source literature 24 March 2023.pptx
 
Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries Collection evaluation techniques for academic libraries
Collection evaluation techniques for academic libraries
 
Writing Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a DisciplineWriting Right: Teaching Writing Conventions Specific to a Discipline
Writing Right: Teaching Writing Conventions Specific to a Discipline
 
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature reviewArc 323 human studies in architecture fall 2018 lecture 3-literature review
Arc 323 human studies in architecture fall 2018 lecture 3-literature review
 
McGill Library and your thesis
McGill Library and your thesisMcGill Library and your thesis
McGill Library and your thesis
 
Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...Where data and journal content collide: what does it mean to ‘publish your da...
Where data and journal content collide: what does it mean to ‘publish your da...
 

Plus de Giovanni Colavizza

Sul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital HumanitiesSul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital HumanitiesGiovanni Colavizza
 
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...Giovanni Colavizza
 
A Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni databaseA Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni databaseGiovanni Colavizza
 
Notes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliensNotes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliensGiovanni Colavizza
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time MachineGiovanni Colavizza
 
Mapping Early Modern News Networks
Mapping Early Modern News NetworksMapping Early Modern News Networks
Mapping Early Modern News NetworksGiovanni Colavizza
 
Report on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTMReport on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTMGiovanni Colavizza
 
Mapping the News Networks in XVII Italy
Mapping the News Networks in XVII ItalyMapping the News Networks in XVII Italy
Mapping the News Networks in XVII ItalyGiovanni Colavizza
 
Garzoni conference 11 October 2014
Garzoni conference 11 October 2014Garzoni conference 11 October 2014
Garzoni conference 11 October 2014Giovanni Colavizza
 
Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Giovanni Colavizza
 
Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Giovanni Colavizza
 
Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Giovanni Colavizza
 
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013Giovanni Colavizza
 

Plus de Giovanni Colavizza (14)

Sul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital HumanitiesSul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital Humanities
 
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
 
A Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni databaseA Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni database
 
Venice 1740 Reconstruction
Venice 1740 ReconstructionVenice 1740 Reconstruction
Venice 1740 Reconstruction
 
Notes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliensNotes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliens
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time Machine
 
Mapping Early Modern News Networks
Mapping Early Modern News NetworksMapping Early Modern News Networks
Mapping Early Modern News Networks
 
Report on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTMReport on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTM
 
Mapping the News Networks in XVII Italy
Mapping the News Networks in XVII ItalyMapping the News Networks in XVII Italy
Mapping the News Networks in XVII Italy
 
Garzoni conference 11 October 2014
Garzoni conference 11 October 2014Garzoni conference 11 October 2014
Garzoni conference 11 October 2014
 
Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013
 
Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013
 
Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013
 
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
Mainz Expert Workshop on Controlled Vocabularies 10/10/2013
 

Dernier

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detailhaiderbaloch3
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxuniversity
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermicultureTakeleZike1
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 

Dernier (20)

REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermiculture
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 

The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining

  • 1. Giovanni Colavizza Matteo Romanello (@mr56k) Frédéric Kaplan (@frederickaplan) The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining 1
  • 2. Goal 2 Empowering scholars in the Humanities with better IR systems
  • 3. Motivation - the Scholar Issues: lack of data [Sula and Miller, 2014] leads to absence of services: estimated coverage of Web of Science for Humanities circa 13% [Mingers and Leydesdorff, 2015]. 3 Sciences: Google Scholar English mainly papers Lower-cost information gathering Humanities: no Google Scholar-like system multiple languages mainly monographs Higher-cost information gathering
  • 4. Motivation - the Footnote How humanists cite? Footnotes [see e.g. Hellqvist, 2009] 4
  • 5. Motivation - the Archive Approximately half citations to primary sources [Wiberley Jr., 2009] 5
  • 6. Motivation - the Scholar reloaded 6
  • 7. Proposal: Enriching library catalogs 7 Use reference monographs, the “canon” of the domain, to extract references to the rest of the literature and enrich library catalogs.
  • 8. Project: Linked Books Focused on a case study/domain: the history of Venice. Partners so far: • Ca’ Foscari University Library System • Biblioteca Marciana • Istituto Veneto di Scienze, Lettere ed Arti • Archivio di Stato di Venezia • EPFL 8
  • 10. Corpus selection 10 Result: 1904 monographs, 701 with a structured list of references. Use the means of the library: 1- Consultation shelves 2- Dewey and subject classification 3- Scholarly bibliographies 4- Keyword search
  • 11. The Pipeline - Digitization 11
  • 12. Digitization 12 1,904 monographs + ~1,000 journal issues
  • 13. The Pipeline - Annotation/Extraction/Parsing 13
  • 14. Annotation 14 • annotated 27% of 701 monographs (with reference list) • 3.8% of all digitized pages (with references) • annotators identified 33 citation styles, divided into 6 families • Yes, humanities scholars love customized reference styles!
  • 15. Reference Extraction/Parsing 15 [Klinkhammer b-i-secondary-full] [Lutz, i-secondary-full] [L’occupazione i- secondary-full] [tedesca i-secondary-full] [in i-secondary-full] [Italia i-secondary-full] [1943-1945, i-secondary-full] [Torino, i-secondary-full] [Bollati i-secondary-full] [Boringhieri i-secondary-full] [1993 i-secondary-full]. Klinkhammer Lutz, L’occupazione tedesca in Italia 1943-1945, Torino, Bollati Boringhieri 1993. [Klinkhammer author] [Lutz, author] [L’occupazione title] [tedesca title] [in title] [Italia title] [1943-1945, title] [Torino, publicationplace] [Bollati publisher] [Boringhieri publisher] [1993 publicationyear].
  • 17. Extraction/Parsing - Confusion Matrix 17 null author title abbrev. (E) monograph (E) Task 1 F1 score (avg) 0.806 class=“null” 0.609 Task 2 F1 score (avg) 0.842 class=“end abbreviated” 0.242
  • 18. The Pipeline - Lookup 18
  • 19. Lookup 19 1. Against OPAC SBN (via API) Steps: 1. search candidates by title 2. match reference metadata 3. assign each candidate a confidence score 4. return set of candidates Evaluation: • 2k references (out of 181k) • 41.7% no candidates • 58.3% with candidates: • 72.3% -> first candidate correct Goal: disambiguation of references Issues: • OCR errors -> impact on search by title (low recall) • API as a “black box” + bottleneck of search by title
  • 20. Lookup 20 2. Against metadata of digitized books Lookup Goal: verify cohesiveness of digitized corpus Method: • based on SBN lookup • but lookup against digitization metadata • tuned to maximize precision • returns 1 or no matches Evaluation*: • 500 references (out of 181k) • precision ~ 1.00 • recall > 0.95 Result: • only 7% of references extracted from 701 monographs point inwards (i.e. towards the 1904 monographs)
  • 21. 21 Core of the discipline co-citation network from extracted references* giant component = 59% of selected corpus books in the giant component -> core of reference works on history of Venice giant component -> 32.5% with only works in consultation
  • 22. Conclusions and Outlook 22 data- and citation-driven approach to assess and exploit, from an IR point of view, domain-specific library holdings on the history of Venice next big challenge: extraction, consolidation and disambiguation of references contained within footnotes (journals)
  • 23. Giovanni Colavizza Matteo Romanello (@mr56k) Frédéric Kaplan (@frederickaplan) Thank you! go.epfl.ch/linkedbooks 23