A lecture given at the Moore Institute at the National University of Ireland Galway. It lays out the case for archiving the web as a source for future scholarly enquiry; examines the state of play of web archiving in Ireland; outlines the broad use cases for the archived web; and presents results from research into creationism on the web in the UK and in Ireland.
Prospects and pitfalls in using web archives for research
1. A new class of primary source?
Prospects and pitfalls in using web archives for research
Dr Peter Webster
Webster Research and Consulting
@pj_webster
6. The web its own archive?
Open UK Web Archive 2004-13 comparison.
@anjacks0n http://britishlibrary.typepad.co.uk/webarchive/2014/10/what-is-still-on-the-web-after-10-years-of-archiving-.html
10. Reasons to care about web archiving
• education and research
• enforcement of the law
• public accountability
11. Three archives for the UK
Archive                      Temporal scope   Content scope             Access
Open UKWA                    2004-present     Selective (14.7k)         Online
Legal Deposit UKWA           2013-present     Comprehensive (for UK)    Onsite
JISC UK Web Domain Dataset   1996-2013        Comprehensive (for .uk)   Index only
12. JISC UK Web Domain Dataset (1996-2013)
• copy of Internet Archive holdings for .uk
• bought by JISC, held by British Library
• 60TB of data
• no direct access to content
• prototype search at webarchive.org.uk/shine
• derived datasets in public domain
13. Web archives for NI and RoI
Archive              Temporal scope   Content scope             Access
NLI Web Archive      2011-present     Selective (542)           Online
PRONI Web Archive    2010-present     Selective (115)           Online
Legal Deposit UKWA   2013-present     Comprehensive (for UK!)   Onsite (TCD)
14. Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
18. Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
• Direct access to WARC
• Derived datasets
• API access
19. Derived datasets from the BL
From JISC UK Web Domain Dataset (1996-2010)
• File format profile
• Geo-index
• Crawled URL Index (CDX)
• Host Link Graph
Public domain at data.webarchive.org.uk
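A crawled URL index of this kind is usually a plain-text file with one line per capture. The sketch below reads such a line, assuming the common space-separated CDX layout in which the first five fields are URL key, 14-digit timestamp, original URL, MIME type and HTTP status. That field order, the `Capture` type and the sample line are illustrative assumptions, not taken from the dataset's own documentation, so check the file's header line before relying on it.

```python
from typing import NamedTuple, Optional


class Capture(NamedTuple):
    """One archived capture (hypothetical type, first five CDX fields only)."""
    url_key: str
    timestamp: str      # assumed form: YYYYMMDDhhmmss
    original_url: str
    mime_type: str
    status: str


def parse_cdx_line(line: str) -> Optional[Capture]:
    """Return a Capture, or None for header lines and short or blank lines."""
    fields = line.split()
    # A CDX header line starts with the literal token "CDX".
    if len(fields) < 5 or fields[0] == "CDX":
        return None
    return Capture(*fields[:5])


# Invented sample line, for illustration only.
sample = ("uk,co,example)/ 20070615120000 http://www.example.co.uk/ "
          "text/html 200 ABCDEFG - - 1234 5678 example.warc.gz")
capture = parse_cdx_line(sample)
capture_year = capture.timestamp[:4]
```

Taking the first four characters of the timestamp gives the capture year, which is enough to count captures of a URL per year.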
20. Creationism?
• non-evolutionary account of human origins
• modern
• a long history
• a feature of some parts of evangelicalism
• (anti-evolutionism, Intelligent Design)
21. The creationist web: three questions
A justified conspiracy theory about marginalisation of creationist voices?
A real danger or a moral panic (Truth in Science)?
The web as friend of the marginalised opinion?
http://peterwebster.me/2014/11/18/reading-creationism-in-the-web-archive/
23. Approach
• selection of key UK creationist sites
• extraction of all unique inbound referring hosts for 1996-2010
• inspection and classification
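The extraction step can be sketched roughly as follows. This is a minimal illustration, not the code used in the research: it assumes a host link graph stored as text records of the form `year|source_host|target_host<TAB>link_count`, and the function name `inbound_hosts` and all host names in the sample are hypothetical.

```python
from collections import defaultdict


def inbound_hosts(lines, targets, first_year=1996, last_year=2010):
    """Map each year to the set of distinct hosts linking to any target host."""
    by_year = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed record layout: "year|source_host|target_host<TAB>link_count"
        key, _, _count = line.partition("\t")
        parts = key.split("|")
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip malformed records
        year, source, target = int(parts[0]), parts[1], parts[2]
        # Keep only in-range years, links into the chosen sites, and
        # ignore a site's links to itself.
        if first_year <= year <= last_year and target in targets and source != target:
            by_year[year].add(source)
    return dict(by_year)


# Invented sample records, for illustration only.
sample = [
    "2007|example-church.org.uk|creationist.example\t3",
    "2007|example-school.sch.uk|creationist.example\t1",
    "2010|example-church.org.uk|creationist.example\t1",
    "2010|creationist.example|creationist.example\t40",
    "2011|late.example|creationist.example\t1",
]
result = inbound_hosts(sample, {"creationist.example"})
all_hosts = set().union(*result.values())
```

The link count is read but deliberately discarded: each referring host counts once per year regardless of how many of its pages link in. The resulting sets of hosts are what would then be inspected and classified by hand.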
24. Caveats on method
• partial nature of the dataset
• benchmarking of absolute numbers
• selective sample
• what does a link mean, anyway?
• not looking at number of linking resources per host
25. Truth in Science: how significant?
• only 46 unique inbound hosts
• … of which many were other creationists or secularist sites
• two churches, one school
• fewer in 2010 than 2007
27. Conclusions
• a utopian dream unfulfilled
• a genuine moral panic
• a justified conspiracy theory
28. Next steps (1)
1. NI the 'creationism capital of Europe'?
(Analysis of:
• links from GB organisations to NI creationists
• links from NI to RoW)
2. What about creationism in .ie?
29. Next steps (2)
Project: EU National Web Spheres
• part of resaw.eu
• investigating the nature of a national web domain
• … including the interlinking between them
• case study I: Anglican & Presbyterian churches in Ireland, north and south
30. Web Archives for Historians
@HistWebArchives, http://webarchivehistorians.org/