The Past, Present and Future of
Digital Scholarship with
Newspaper Collections
DH2019, Utrecht, July 2019
The Past, Present and Future of Digital
Scholarship with Newspaper Collections
• Short Project Presentations:
• Living with Machines
• impresso - Media Monitoring of the Past
• Building the digitisation of periodical collections together with users
(NewsEye)
• Overview Papers
• Digital Editions of Serials and media historians: an overview
• Towards a Critical Framework for Digital Newspaper Scholarship
• Q&A
Our Partners Our Funders
Living with Machines
Dr Mia Ridge, British Library, Co-Investigator
Paper authors/project team: Mia Ridge, Giovanni Colavizza, with Ruth Ahnert, Claire
Austin, David Beavan, Kaspar Beelens, Mariona Coll Ardanuy, Adam Farquhar, Emma
Griffin, James Hetherington, Jon Lawrence, Katie McDonough, Barbara McGillivray,
André Piza, Daniel van Strien, Giorgia Tolfo, Alan Wilson, Daniel Wilson.
Project vision
• Facilitating new historical findings about the impact of
technology on the lives of ordinary people during the Industrial
Revolution / long nineteenth century (c. 1780–1918)
Or
• Applying new methods to questions about the past to explore the
future of collaboration between data science, history and digital
humanities
Or
• Challenging library professionals, data scientists and historians to
‘radically collaborate’ and learn from and with each other
Why newspapers?
• Large digitised corpus available on request
• Opportunity to tackle the challenges of working at scale:
operational, methodological, organisational
• Suitable for developing innovative computational models, tools,
code, data and infrastructure reusable by other scholars and
research projects
The British Newspaper Archive
• Nearly 33 million newspaper pages
• Site run by Findmypast Limited in commercial partnership with the
British Library
• BL Labs previously facilitated researcher access to JISC-funded
digitised newspapers
British Library newspapers and periodicals
• British Library has 60m issues (450 million pages, 34,000 titles)
from 17thC to today
• Majority UK/Irish (Legal Deposit from 1869), but also overseas
esp. USA, India, Africa
• New digitisation through ‘Heritage Made Digital’ and Living with
Machines projects
• 6.8% digitised (July 2019)
But what’s actually
available digitally?
Courtesy Yann Ryan @lievesofgrass and @BL_MadeDigital
Copyright ‘safe date’
discussions are ongoing
and... complicated
Our early work with newspapers
Research questions tackled across various Labs include:
• How bad is the OCR, really? And what effect does that have on
computational linguistic and nominal linkage methods?
• Can digitising newspaper directories help us understand the
difference in political and religious affiliations (etc.) between the
overall potential corpus and what has been digitised so far?
• Can we use crowdsourcing tasks to reliably gather information
about industrial accidents? Can we then use the results to train
machine learning tools to find accidents at scale?
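One common way to put a number on "how bad is the OCR, really?" is to compare OCR output with a hand-corrected transcription of the same text and compute the character error rate (CER): the Levenshtein edit distance between the two, divided by the length of the corrected reference. The sketch below is a generic illustration under that assumption, not the project's own method; the function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr: str, reference: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ocr, reference) / len(reference)


# Hypothetical example: a corrected line and a noisy OCR reading of it.
reference = "Fatal accident at the colliery"
ocr = "Fatal acci1ent at tbe colliory"
print(f"CER: {character_error_rate(ocr, reference):.3f}")
```

CER is only one lens: word-level error rates, or the downstream effect on linguistic and linkage methods, may matter more for a given research question.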
Ongoing questions
• To what extent does ‘convenience’ in digitisation and the quest for
geographical coverage affect scholarship?
• Copyright dates, short vs long runs, microfilm vs hard copy
• How do we show the impact of OCR quality on both keyword
searches and data processing at scale?
• What kinds of derived datasets would be useful to researchers?
• Planning for legacy: how do we integrate entity recognition etc.
results into discovery systems? How do we ensure interoperability?
• We can share public domain but not potentially copyrighted pages
– what effect does that have on user experience?
• How do we reconcile different ideas about ‘outputs’?
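One simple way to show the impact of OCR quality on keyword search is to run the same exact-match query against OCR text and against a corrected transcription of the same articles, then report how many true occurrences the OCR version still finds (recall). The sketch below assumes such parallel (OCR, corrected) pairs already exist; the function names, query and sample articles are hypothetical.

```python
import re


def keyword_hits(text: str, keyword: str) -> int:
    """Count case-insensitive whole-word matches of keyword in text."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def keyword_recall(pairs: list[tuple[str, str]], keyword: str) -> float:
    """Share of the keyword's occurrences in the corrected texts that an
    exact-match search also finds in the corresponding OCR texts.

    Each pair is (ocr_text, corrected_text) for the same article. Counts
    are capped per article, so spurious OCR matches in one article cannot
    offset misses in another.
    """
    found = 0
    expected = 0
    for ocr_text, corrected_text in pairs:
        true_count = keyword_hits(corrected_text, keyword)
        found += min(keyword_hits(ocr_text, keyword), true_count)
        expected += true_count
    return found / expected if expected else float("nan")


# Hypothetical parallel articles: (OCR output, hand-corrected transcription).
articles = [
    ("An explosion at the mill injured three.",
     "An explosion at the mill injured three."),
    ("The rnill was closed after the explosion.",
     "The mill was closed after the explosion."),
    ("Work at the miil resumed on Monday.",
     "Work at the mill resumed on Monday."),
]
print(f"Recall for 'mill': {keyword_recall(articles, 'mill'):.2f}")
```

Typical OCR confusions such as "m" read as "rn" are invisible to exact keyword search, which is one reason results from keyword searches and from data processing at scale can diverge.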
Thank you!
Living with Machines @LivingWMachines
Sneak preview and newsletter signup:
http://livingwithmachines.ac.uk/
Dividing the work into ‘Labs’
• Sources - showing the biases in the collection and processing of sources
• Language - applying approaches from computational linguistics to corpora
including newspapers and novels
• Space and time - combining census data and event-based records to
understand urban change with spatial and temporal analyses
• Communities - a meta lab, amplifying results and engaging the public in
meaningful crowdsourcing that contributes to the project's research
• 3I (Integration, infrastructure and interfaces) - connects the IT infrastructure
with work done in the other labs and vice versa, thinking about computational
processes and the integration of data science.
• Data acquisition and wrangling - managing practical aspects of data ingest,
including rights and data management

Speaker notes

  1. Three half-hour sections
  2. There are a few different ways to think about the goals of the project.
  3. Conveniently, a lot was already digitised, which allowed us to tackle questions of scale and truly break new ground (‘new’ allowing for all the other projects!)
  4. Many of the researchers' names will be familiar to DH audiences
  5. Our dates differ from those used by FMP (Findmypast), which has different relationships with newspaper publishers and can work to a later date
  6. Will we be able to link people, places etc. to identifiers at scale?