Ancient History of the UK Web
With support by and thanks to Ning Wang and Adham Tamer
Josh Cowls, Scott A. Hale, Helen Margetts,
Eric T. Meyer, Ralph Schroeder, Taha Yasseri
Past Web Archive Activities at OII
• 2008-2009. JISC/NEH Transatlantic Digitisation Collaboration: World Wide Web of
Humanities (Jisc & NEH funded)
– OII, Internet Archive, Hanzo Archives
– Meyer, E.T., Carpenter, K., Middleton, M. (2009). World Wide Web of Humanities: Final
Report to JISC. Online:
http://www.jisc.ac.uk/media/documents/programmes/digitisation/humanitiesfinalreport.pdf
• 2010. Researcher Engagement with Web Archives (Jisc funded)
– OII, VKS
– Dougherty, M., Meyer, E.T., Madsen, C., van den Heuvel, C., Thomas, A., Wyatt, S. (2010).
Researcher Engagement with Web Archives: State of the Art. London: JISC. Online:
http://ssrn.com/abstract=1714997 and http://ie-repository.jisc.ac.uk/544/
– Thomas, A., Meyer, E.T., Dougherty, M., van den Heuvel, C., Madsen, C., Wyatt, S. (2010).
Researcher Engagement with Web Archives: Challenges and Opportunities for
Investment. London: JISC. Online: http://ssrn.com/abstract=1715000 and
http://ie-repository.jisc.ac.uk/543/
– Dougherty, M., Meyer, E.T. (2014). Community, Tools, and Practices in Web Archiving:
The state of the art in relation to social science and humanities research needs. Journal
of the American Society of Information Science & Technology.
http://onlinelibrary.wiley.com/doi/10.1002/asi.23099/abstract
• 2011. Using Web Archives: A Futures Perspective (IIPC funded)
– OII
– Meyer, E.T., Thomas, A.J., Schroeder, R. (2011). Web Archives: The Future(s). London:
IIPC. Online: http://ssrn.com/abstract=1830025
Recent Web Archive Activities at OII
• 2013-2015: Jisc Big Data project (Jisc funded)
– OII, British Library
– Prepare and release hyperlink corpus
• 2014-2015: Big UK Domain Data for the Arts and Humanities (AHRC
funded)
– IHR, OII, British Library
– Supporting researchers in Arts & Humanities to use web archive data
– Producing edited book of empirical studies concerning the history of
the UK web
• First paper from these combined projects
– Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., Margetts, H.
(2014, July). Mapping the UK webspace: Fifteen years of British
universities on the web. ACM WebSci’14, Bloomington, Indiana.
http://papers.ssrn.com/abstract=2435481 or
http://arxiv.org/abs/1405.2856
Big Data:
Demonstrating the Value of the UK Web Domain Dataset
for Social Science Research
This project aims to enhance JISC's UK Web
Domain archive, a 30 TB archive of the .uk
country-code top level domain collected from
1996 to 2010. It will extract link graphs from the
data and disseminate social science research
using the collection.
February 2012 - February 2014
Taming a mammoth:
Web Archive Dataset Preparation
30 TB compressed data
6.2TB metadata and links
2.5 TB temporal links
30 TB compressed data in (w)arc format
– Approx. 4.5 million files
– Mix of binary and plain text payloads along
with header data
– Two formats: old arc and newer warc
Housed at the BL, access restrictions
WARC/1.0
WARC-Type: response
WARC-Target-URI: http://hits.guardian.co.uk/b/ss/guardiangu-blogs,guardiangu-news,guardiangu-network/1/H.22.2/56938?ns=guardian&pageName=Prisoner+of+war+camps+in+the+UK+mapped+and+listed.+Download+the+data%3AGraphic%3A1476560&ch=News&c3=GU.co.uk&c4=History+%28Books+genre%29%2CBooks%2CSecond+world+war+%28News%29%2CGermany%2CUK+news%2CTechnology&c5=Not+commercially+useful%2CCorporate+IT&c6=Simon+Rogers&c7=10-Nov-08&c8=1476560&c9=Graphic&c10=Blogpost&c11=News&c13=&c25=Datablog&c30=content&h2=GU%2FNews%2Fblog%2FDatablog&c2=GUID:(none)
WARC-Date: 2010-12-05T02:58:00Z
WARC-Payload-Digest: sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
WARC-IP-Address: 66.235.138.18
WARC-Record-ID: <urn:uuid:7d5ce147-9b4b-46cb-8975-ee93b4d0dda8>
Content-Type: application/http; msgtype=response
Content-Length: 740
HTTP/1.1 302 Found
Date: Sun, 05 Dec 2010 02:58:00 GMT
Server: Omniture DC/2.0.0
X-C: ms-4.3.1
Expires: Sat, 04 Dec 2010 02:58:00 GMT
Last-Modified: Mon, 06 Dec 2010 02:58:00 GMT
Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform, private
Pragma: no-cache
ETag: "4CFAFFB8-0E4C-7443902F"
Vary: *
P3P: policyref="/w3c/p3p.xml", CP="NOI DSP COR NID PSA OUR IND COM NAV STA"
Location: http://b.scorecardresearch.com/r?c2=6035250&d.c=gif&d.o=guardiangu-network&d.x=243551159&d.t=page&d.u=http%3A%2F%2Fwww.guardian.co.uk%2Fnews%2Fdatablog%2F2010%2Fnov%2F08%2Fprisoner-of-war-camps-uk
xserver: www422
Content-Length: 0
Keep-Alive: timeout=15
Connection: close
Content-Type: text/plain
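The WARC header fields in a record like the one above can be read with a few lines of Python. This is a minimal sketch, not the project's tooling: it assumes the record is available as text and that a blank line separates the WARC header block from the HTTP response, as the WARC format prescribes. The function name `parse_warc_headers` and the shortened `sample` record are illustrative only.

```python
def parse_warc_headers(record_text):
    """Split the WARC header block of one record into (version, dict).

    Only the named-field lines before the first blank line are read;
    the HTTP response that follows is left untouched.
    """
    lines = record_text.strip().splitlines()
    version = lines[0].strip()            # e.g. "WARC/1.0"
    headers = {}
    for line in lines[1:]:
        if not line.strip():              # a blank line ends the WARC header block
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, headers


# Shortened copy of the record above (assumption: blank line restored).
sample = """WARC/1.0
WARC-Type: response
WARC-Date: 2010-12-05T02:58:00Z
WARC-IP-Address: 66.235.138.18
Content-Type: application/http; msgtype=response
Content-Length: 740

HTTP/1.1 302 Found
"""

version, headers = parse_warc_headers(sample)
print(version, headers["WARC-Type"], headers["WARC-Date"])
```

For the full 30 TB collection a dedicated WARC reader would be the safer choice; the sketch only shows how the header block is laid out.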
Extract metadata and links (WAT format)
– Approx. 4.5 million files
– 6.2TB on disk compressed
– Housed at OII
– Structured JSON
– Different formats for arc/warcs
{
"Container": {
"Filename": "DOTUK-HISTORICAL-1996-2010-GROUP-AA-XAAAAA-20110428000000-00000.arc.gz",
"Offset": "88937",
"Compressed": true,
"Gzip-Metadata": {
"Header-Length": "10",
"Inflated-CRC": "-1223265901",
"Inflated-Length": "26073",
"Deflate-Length": "4463",
"Footer-Length": "8"
}
},
"Envelope": {
"ARC-Header-Length": "102",
"ARC-Header-Metadata": {
"Date": "20080509081524",
"Target-URI": "http://www.ukhomeinteriors.co.uk/content/ext_corbels.php",
"Content-Length": "25970",
"Content-Type": "text/html",
"IP-Address": "83.223.106.10"
},
"Payload-Metadata": {
"Actual-Content-Type": "application/http; msgtype=response",
"Block-Digest": "sha1:MCCZNOKBJHTZ5MMMCUJGBPE25C2TVUWF",
"HTTP-Response-Metadata": {
"Headers-Length": "591",
"HTML-Metadata": {
"Head": {
"Title": "Exterior Corbels",
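A sketch of pulling the page address and crawl time out of a metadata record like the one shown (truncated) above. It uses only fields visible in that sample and assumes the 14-digit ARC `Date` is in UTC; the helper names `arc_date_to_epoch` and `page_info` are hypothetical, and the WARC-derived records (which use a different layout) are not handled.

```python
import calendar
import json
import time


def arc_date_to_epoch(arc_date):
    """Convert a 14-digit ARC timestamp (YYYYMMDDhhmmss, taken as UTC) to Unix seconds."""
    return calendar.timegm(time.strptime(arc_date, "%Y%m%d%H%M%S"))


def page_info(wat_record):
    """Return (target URI, Unix time) from the ARC-style envelope."""
    meta = wat_record["Envelope"]["ARC-Header-Metadata"]
    return meta["Target-URI"], arc_date_to_epoch(meta["Date"])


# A cut-down record containing only fields visible in the sample above.
record = json.loads("""
{
  "Envelope": {
    "ARC-Header-Metadata": {
      "Date": "20080509081524",
      "Target-URI": "http://www.ukhomeinteriors.co.uk/content/ext_corbels.php"
    }
  }
}
""")

uri, epoch = page_info(record)
print(uri, epoch)
```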
Plain text lists
Build own ad hoc Hadoop cluster, fix
incompatibilities, divide into smaller batches
– Build plain text lists of pages and hyperlinks
– Remove error pages (e.g., 404 Not Found)
– Remove pages not in .uk
– Standardize dates (many formats)
– Standardize hyperlinks (trailing /, etc.)
– Fix/remove tons of invalid hyperlinks (whitespace,
invalid characters, etc.)
Load results into Apache Hive (2.5 TB)
Source | Destination | Time | LinkText
http://octopus.well.ox.ac.uk:80/ | http://octopus.well.ox.ac.uk:80/links.html | 1032758438 | Links
http://octopus.well.ox.ac.uk:80/ | http://octopus.well.ox.ac.uk:80/projects.html | 1001793436 | Projects
http://octopus.well.ox.ac.uk:80/computing.shtml | http://debian.org/ | 1075794060 | Debian/GNU
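Some of the cleaning steps listed above (dropping invalid hyperlinks, standardizing the trailing slash, keeping only pages in .uk) could look roughly like this for rows in the Source / Destination / Time / LinkText layout. It is a sketch under stated assumptions, not the pipeline that was run on the Hadoop cluster: the function names are hypothetical, only http(s) links are kept, and error-page removal and date parsing are left out.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Return a cleaned URL, or None if it is not a usable http(s) link."""
    url = "".join(url.split())                # drop stray whitespace, also inside the URL
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:                        # e.g. malformed bracketed host
        return None
    if parts.scheme not in ("http", "https") or not host:
        return None
    path = parts.path or "/"                  # "http://a.uk" and "http://a.uk/" become the same
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def is_uk(url):
    """True if the URL's host is under the .uk country-code domain."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".uk")


def clean_links(rows):
    """Yield (source, destination, time, text) rows whose source page is in .uk."""
    for source, destination, timestamp, text in rows:
        src, dst = normalize_url(source), normalize_url(destination)
        if src and dst and is_uk(src):
            yield src, dst, int(timestamp), text.strip()


rows = [
    ("http://octopus.well.ox.ac.uk:80", " http://Debian.org/ ", "1075794060", " Debian/GNU "),
    ("http://example.com/", "http://a.co.uk/", "1", "not a .uk source"),
    ("mailto:someone@ox.ac.uk", "http://a.co.uk/", "1", "not a web link"),
]
cleaned = list(clean_links(rows))
print(cleaned)
```

Only the source page is filtered on .uk, so links that point outside the domain (such as the debian.org row above) are kept.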
Overall Statistics
Third-level domains, e.g. ox.ac.uk
Relative size of second-level-domains
Number of links within SLD per node
Cross-domain links (2010): absolute, and normalized to target size
Case of ac.uk
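The second-level and third-level domains behind these statistics can be derived from a URL's host name, and cross-domain links counted from (source, destination) pairs. The sketch below is illustrative only: `domain_levels` and `cross_domain_counts` are hypothetical names, hosts with fewer than three labels are skipped, and a host registered directly under .uk (for example www.bl.uk) would need separate handling. Normalization by target size is not attempted.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_levels(url):
    """Return (second-level, third-level) domains, e.g. ("ac.uk", "ox.ac.uk")."""
    labels = (urlsplit(url).hostname or "").split(".")
    if len(labels) < 3 or labels[-1] != "uk":
        return None
    return ".".join(labels[-2:]), ".".join(labels[-3:])


def cross_domain_counts(links):
    """Count links whose source and destination lie in different second-level domains."""
    counts = Counter()
    for source, destination in links:
        src, dst = domain_levels(source), domain_levels(destination)
        if src and dst and src[0] != dst[0]:
            counts[(src[0], dst[0])] += 1
    return counts


links = [
    ("http://www.ox.ac.uk/", "http://www.bbc.co.uk/"),
    ("http://www.ox.ac.uk/", "http://www.cam.ac.uk/"),   # same second-level domain, not counted
]
counts = cross_domain_counts(links)
print(domain_levels("http://octopus.well.ox.ac.uk:80/"), counts)
```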
Mapping the UK Webspace:
Fifteen Years of British Universities on the Web
Hale et al., WebSci'14, available: http://arxiv.org/abs/1405.2856
121 UK universities
websites and links
1) League table ranking
2) Group affiliation
3) Geographical location
Group Affiliations
League table ranking
Geography
Colour ~ intensity
Gravity Law
\[
\sigma_{ij} = \frac{s_{ij}}{s_i^{\mathrm{out}}\, s_j^{\mathrm{in}}},
\qquad
s_{ij} = \frac{s_i^{\mathrm{out}}\, s_j^{\mathrm{in}}}{r^{0.28}}
\]
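A small numerical sketch of the gravity law, reading the slide as σ_ij = s_ij / (s_i^out · s_j^in) and s_ij = s_i^out · s_j^in / r^0.28. The interpretation of the symbols is an assumption here: s_ij as the link strength from university i to university j, s_i^out and s_j^in as total outgoing and incoming strength, and r as the distance between the two. The function names are hypothetical.

```python
def normalized_strength(s_ij, s_out_i, s_in_j):
    """sigma_ij: link strength from i to j relative to i's out-strength and j's in-strength."""
    return s_ij / (s_out_i * s_in_j)


def gravity_estimate(s_out_i, s_in_j, r, exponent=0.28):
    """Gravity-law strength for two sites a distance r apart (exponent taken from the slide)."""
    return s_out_i * s_in_j / r ** exponent


# Made-up numbers, only to show how the two expressions relate.
s_out, s_in, distance = 200.0, 50.0, 100.0
estimate = gravity_estimate(s_out, s_in, distance)
print(estimate, normalized_strength(estimate, s_out, s_in))
```

Under this reading, substituting the gravity estimate into σ_ij leaves only the distance term, r^(-0.28).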
Big UK Domain Data for the Arts and
Humanities
Primary aim: developing a methodological and
theoretical framework within which to study over 15
years of UK domain data – with lessons for the
future study of web archives more generally
Big UK Domain Data for the Arts and
Humanities
The dataset:
– Crawled from 1996 – 2013
– Approximately 65 TB, billions of words
– Building interface to allow search by retrieval
date, target domain of links, sentiment
– Allow qualitative and quantitative analysis – and
iteration between multiple research techniques
Big UK Domain Data for the Arts and
Humanities
Key outputs:
– Ten bursary projects using web archive data to
investigate a broad range of topics, for example…
• Armed services recruitment online
• The accessibility of the web for disabled users
• Online discussions of ‘Beat’ poetry
– An edited book of empirical studies concerning the
history of the UK web, featuring chapters on, for
example…
• Constitutional and institutional change in UK government
• The BBC’s online presence
• The ‘web of faith’ online
Next
● Studies underway at OII, BL, IHR
● Book and articles
– Study overall growth of .uk
– Case study of .gov.uk
– Study of media and select committee
visibility
● Releasing data open source

More Related Content

What's hot

Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Robert H. McDonald
 
Open Access in Architectural Research
Open Access in Architectural ResearchOpen Access in Architectural Research
Open Access in Architectural ResearchAlastair Dunning
 
Digital Cultural Heritage and Open Education
Digital Cultural Heritage and Open EducationDigital Cultural Heritage and Open Education
Digital Cultural Heritage and Open EducationLorna Campbell
 
Reports from the UKMHL and Historical Texts live lab
Reports from the UKMHL and Historical Texts live lab Reports from the UKMHL and Historical Texts live lab
Reports from the UKMHL and Historical Texts live lab Jisc
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinAnja Jentzsch
 
Linked Open Data in Libraries Archives & Museums
Linked Open Data in Libraries Archives & MuseumsLinked Open Data in Libraries Archives & Museums
Linked Open Data in Libraries Archives & MuseumsJon Voss
 
Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldaelang
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Bernhard Haslhofer
 
Corpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella Wisdom
Corpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella WisdomCorpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella Wisdom
Corpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella WisdomStella Wisdom
 
Developing Open Access Content into Academic English Resources for Data-Drive...
Developing Open Access Content into Academic English Resources for Data-Drive...Developing Open Access Content into Academic English Resources for Data-Drive...
Developing Open Access Content into Academic English Resources for Data-Drive...Alannah Fitzgerald
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...Robert H. McDonald
 
Disrupting the transactional library model: the challenges and opportunities ...
Disrupting the transactional library model: the challenges and opportunities ...Disrupting the transactional library model: the challenges and opportunities ...
Disrupting the transactional library model: the challenges and opportunities ...Jisc
 
Digital Libraries: Local and Global
Digital Libraries: Local and GlobalDigital Libraries: Local and Global
Digital Libraries: Local and GlobalAlastair Dunning
 
Launch of Welsh Newspapers Online
Launch of Welsh Newspapers OnlineLaunch of Welsh Newspapers Online
Launch of Welsh Newspapers OnlineAlastair Dunning
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesRobert H. McDonald
 
資訊素養工作坊PowerPoint
資訊素養工作坊PowerPoint資訊素養工作坊PowerPoint
資訊素養工作坊PowerPointkaikwong
 
Future Directions of the European Library
Future Directions of the European LibraryFuture Directions of the European Library
Future Directions of the European LibraryAlastair Dunning
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryNora McGregor
 

What's hot (20)

Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
 
Open Access in Architectural Research
Open Access in Architectural ResearchOpen Access in Architectural Research
Open Access in Architectural Research
 
Digital Cultural Heritage and Open Education
Digital Cultural Heritage and Open EducationDigital Cultural Heritage and Open Education
Digital Cultural Heritage and Open Education
 
Reports from the UKMHL and Historical Texts live lab
Reports from the UKMHL and Historical Texts live lab Reports from the UKMHL and Historical Texts live lab
Reports from the UKMHL and Historical Texts live lab
 
3e Studiedag Webarchivering - Promise
3e Studiedag Webarchivering - Promise3e Studiedag Webarchivering - Promise
3e Studiedag Webarchivering - Promise
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
 
Open.Ed
Open.EdOpen.Ed
Open.Ed
 
Linked Open Data in Libraries Archives & Museums
Linked Open Data in Libraries Archives & MuseumsLinked Open Data in Libraries Archives & Museums
Linked Open Data in Libraries Archives & Museums
 
Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the field
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
 
Corpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella Wisdom
Corpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella WisdomCorpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella Wisdom
Corpus Protocols IFLA Geneva August 2014 by Neil Smyth and Stella Wisdom
 
Developing Open Access Content into Academic English Resources for Data-Drive...
Developing Open Access Content into Academic English Resources for Data-Drive...Developing Open Access Content into Academic English Resources for Data-Drive...
Developing Open Access Content into Academic English Resources for Data-Drive...
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
Disrupting the transactional library model: the challenges and opportunities ...
Disrupting the transactional library model: the challenges and opportunities ...Disrupting the transactional library model: the challenges and opportunities ...
Disrupting the transactional library model: the challenges and opportunities ...
 
Digital Libraries: Local and Global
Digital Libraries: Local and GlobalDigital Libraries: Local and Global
Digital Libraries: Local and Global
 
Launch of Welsh Newspapers Online
Launch of Welsh Newspapers OnlineLaunch of Welsh Newspapers Online
Launch of Welsh Newspapers Online
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
資訊素養工作坊PowerPoint
資訊素養工作坊PowerPoint資訊素養工作坊PowerPoint
資訊素養工作坊PowerPoint
 
Future Directions of the European Library
Future Directions of the European LibraryFuture Directions of the European Library
Future Directions of the European Library
 
Digital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British LibraryDigital Cultural Heritage: Experiences from British Library
Digital Cultural Heritage: Experiences from British Library
 

Viewers also liked

The uk today
The uk todayThe uk today
The uk todaynehria01
 
History of uk music press
History of uk music pressHistory of uk music press
History of uk music pressbethandilley
 
Culture in United Kingdom & Ireland
Culture in United Kingdom & IrelandCulture in United Kingdom & Ireland
Culture in United Kingdom & IrelandMuhammad Shahzaib
 
The royal family of great britain
The royal family of great britainThe royal family of great britain
The royal family of great britainsimbirinka
 
Education In The Uk
Education In The UkEducation In The Uk
Education In The Ukenglishbites
 
The uk education system
The uk education systemThe uk education system
The uk education systemsigugi
 
Educational System in UK
Educational System in UKEducational System in UK
Educational System in UKKadelle Pidor
 

Viewers also liked (8)

The uk today
The uk todayThe uk today
The uk today
 
History of uk music press
History of uk music pressHistory of uk music press
History of uk music press
 
Culture in United Kingdom & Ireland
Culture in United Kingdom & IrelandCulture in United Kingdom & Ireland
Culture in United Kingdom & Ireland
 
The royal family of great britain
The royal family of great britainThe royal family of great britain
The royal family of great britain
 
Education in the uk
Education in the ukEducation in the uk
Education in the uk
 
Education In The Uk
Education In The UkEducation In The Uk
Education In The Uk
 
The uk education system
The uk education systemThe uk education system
The uk education system
 
Educational System in UK
Educational System in UKEducational System in UK
Educational System in UK
 

Similar to Ancient History of the UK Web

A Service Perspective: Unlocking metadata to enhance discoverability and conn...
A Service Perspective: Unlocking metadata to enhance discoverability and conn...A Service Perspective: Unlocking metadata to enhance discoverability and conn...
A Service Perspective: Unlocking metadata to enhance discoverability and conn...EDINA, University of Edinburgh
 
World Archaeology Congress paper
World Archaeology Congress paperWorld Archaeology Congress paper
World Archaeology Congress paperdejp3
 
Data Science at the ATI and BL Web Archiving
Data Science at the ATI and BL Web ArchivingData Science at the ATI and BL Web Archiving
Data Science at the ATI and BL Web Archivinglabsbl
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three dri_ireland
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataBoris Villazón-Terrazas
 
eResearch-Oz-Bellamy
eResearch-Oz-BellamyeResearch-Oz-Bellamy
eResearch-Oz-BellamyCraig Bellamy
 
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...Javier Pereda
 
EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...
EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...
EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...Keith.May
 
Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...TimelessFuture
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyPRELIDA Project
 
Introduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesIntroduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesEDINA, University of Edinburgh
 
Ensuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published HeritageEnsuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published HeritageEDINA, University of Edinburgh
 
Rethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryRethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryMia
 
Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesXiaogang (Marshall) Ma
 

Similar to Ancient History of the UK Web (20)

A Service Perspective: Unlocking metadata to enhance discoverability and conn...
A Service Perspective: Unlocking metadata to enhance discoverability and conn...A Service Perspective: Unlocking metadata to enhance discoverability and conn...
A Service Perspective: Unlocking metadata to enhance discoverability and conn...
 
World Archaeology Congress paper
World Archaeology Congress paperWorld Archaeology Congress paper
World Archaeology Congress paper
 
Data Science at the ATI and BL Web Archiving
Data Science at the ATI and BL Web ArchivingData Science at the ATI and BL Web Archiving
Data Science at the ATI and BL Web Archiving
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked Data
 
eResearch-Oz-Bellamy
eResearch-Oz-BellamyeResearch-Oz-Bellamy
eResearch-Oz-Bellamy
 
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
Where Do I Stand? Deconstructing Digital Collections [Research] Infrastructur...
 
EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...
EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...
EAA 2017 Re-engineering the process: How best to share, connect, re-use & pro...
 
Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...Chaos&Order: Using visualization as a means to
 explore large heritage collec...
Chaos&Order: Using visualization as a means to
 explore large heritage collec...
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Digital Scholarship at the British Library
Digital Scholarship at the British LibraryDigital Scholarship at the British Library
Digital Scholarship at the British Library
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
 
Introduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesIntroduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data services
 
NECTAR_VRE1
NECTAR_VRE1NECTAR_VRE1
NECTAR_VRE1
 
Reference Rot and Linked Data: Threat and Remedy
Reference Rot and Linked Data: Threat and RemedyReference Rot and Linked Data: Threat and Remedy
Reference Rot and Linked Data: Threat and Remedy
 
Ensuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published HeritageEnsuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published Heritage
 
Rethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryRethink research, illuminate history with the British Library
Rethink research, illuminate history with the British Library
 
Exploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental SciencesExploring the Web of Data for Earth and Environmental Sciences
Exploring the Web of Data for Earth and Environmental Sciences
 
bellamy_budapest
bellamy_budapestbellamy_budapest
bellamy_budapest
 
British Library Datasets Programme Feb 2011
British Library Datasets Programme Feb 2011British Library Datasets Programme Feb 2011
British Library Datasets Programme Feb 2011
 

More from Scott A. Hale

Researching Misinformation
Researching MisinformationResearching Misinformation
Researching MisinformationScott A. Hale
 
Big Tech & Disinformation: What are the main threats and how can journalists ...
Big Tech & Disinformation: What are the main threats and how can journalists ...Big Tech & Disinformation: What are the main threats and how can journalists ...
Big Tech & Disinformation: What are the main threats and how can journalists ...Scott A. Hale
 
No Master Algorithm: Human-machine intelligence and the real-world needs of f...
No Master Algorithm: Human-machine intelligence and the real-world needs of f...No Master Algorithm: Human-machine intelligence and the real-world needs of f...
No Master Algorithm: Human-machine intelligence and the real-world needs of f...Scott A. Hale
 
Foreign-language Reviews: Help or Hindrance? (Slides)
Foreign-language Reviews: Help or Hindrance? (Slides)Foreign-language Reviews: Help or Hindrance? (Slides)
Foreign-language Reviews: Help or Hindrance? (Slides)Scott A. Hale
 
How much is said in a microblog? A multilingual inquiry based on Weibo and Tw...
How much is said in a microblog? A multilingual inquiry based on Weibo and Tw...How much is said in a microblog? A multilingual inquiry based on Weibo and Tw...
How much is said in a microblog? A multilingual inquiry based on Weibo and Tw...Scott A. Hale
 
Interactive Visualizations for teaching, research, and dissemination
Interactive Visualizations for teaching, research, and disseminationInteractive Visualizations for teaching, research, and dissemination
Interactive Visualizations for teaching, research, and disseminationScott A. Hale
 
Oxford Digital Humanities Summer School
Oxford Digital Humanities Summer SchoolOxford Digital Humanities Summer School
Oxford Digital Humanities Summer SchoolScott A. Hale
 
Multilinguals and Wikipedia Editing
Multilinguals and Wikipedia EditingMultilinguals and Wikipedia Editing
Multilinguals and Wikipedia EditingScott A. Hale
 
Mapping the UK Webspace: Fifteen Years of British Universities on the Web
Mapping the UK Webspace: Fifteen Years of British Universities on the WebMapping the UK Webspace: Fifteen Years of British Universities on the Web
Mapping the UK Webspace: Fifteen Years of British Universities on the WebScott A. Hale
 
Design and Multilingual Users on Twitter and Wikipedia
Design and Multilingual Users on Twitter and WikipediaDesign and Multilingual Users on Twitter and Wikipedia
Design and Multilingual Users on Twitter and WikipediaScott A. Hale
 
Global connectivity and multilinguals in the Twitter network (slides)
Global connectivity and multilinguals in the Twitter network (slides)Global connectivity and multilinguals in the Twitter network (slides)
Global connectivity and multilinguals in the Twitter network (slides)Scott A. Hale
 
ECPR 2011 Leaders and Followers Experiment
ECPR 2011 Leaders and Followers ExperimentECPR 2011 Leaders and Followers Experiment
ECPR 2011 Leaders and Followers ExperimentScott A. Hale
 

More from Scott A. Hale (12)

Researching Misinformation
Researching MisinformationResearching Misinformation
Researching Misinformation
 
Big Tech & Disinformation: What are the main threats and how can journalists ...
Ancient History of the UK Web

  • 1. Ancient History of the UK Web
      Josh Cowls, Scott A. Hale, Helen Margetts, Eric T. Meyer, Ralph Schroeder, Taha Yasseri
      With support by and thanks to Ning Wang and Adham Tamer
  • 2. Past Web Archive Activities at OII
      • 2008-2009. JISC/NEH Transatlantic Digitisation Collaboration: World Wide Web of Humanities (Jisc & NEH funded)
          – OII, Internet Archive, Hanzo Archives
          – Meyer, E.T., Carpenter, K., Middleton, M. (2009). World Wide Web of Humanities: Final Report to JISC. Online: http://www.jisc.ac.uk/media/documents/programmes/digitisation/humanitiesfinalreport.pdf
      • 2010. Researcher Engagement with Web Archives (Jisc funded)
          – OII, VKS
          – Dougherty, M., Meyer, E.T., Madsen, C., van den Heuvel, C., Thomas, A., Wyatt, S. (2010). Researcher Engagement with Web Archives: State of the Art. London: JISC. Online: http://ssrn.com/abstract=1714997 and http://ie-repository.jisc.ac.uk/544/
          – Thomas, A., Meyer, E.T., Dougherty, M., van den Heuvel, C., Madsen, C., Wyatt, S. (2010). Researcher Engagement with Web Archives: Challenges and Opportunities for Investment. London: JISC. Online: http://ssrn.com/abstract=1715000 and http://ie-repository.jisc.ac.uk/543/
          – Dougherty, M., Meyer, E.T. (2014). Community, Tools, and Practices in Web Archiving: The state of the art in relation to social science and humanities research needs. Journal of the American Society of Information Science & Technology. http://onlinelibrary.wiley.com/doi/10.1002/asi.23099/abstract
      • 2011. Using Web Archives: A Futures Perspective (IIPC funded)
          – OII
          – Meyer, E.T., Thomas, A.J., Schroeder, R. (2011). Web Archives: The Future(s). London: IIPC. Online: http://ssrn.com/abstract=1830025
  • 3. Recent Web Archive Activities at OII
      • 2013-2015: Jisc Big Data project (Jisc funded)
          – OII, British Library
          – Prepare and release hyperlink corpus
      • 2014-2015: Big UK Domain Data for the Arts and Humanities (AHRC funded)
          – IHR, OII, British Library
          – Supporting researchers in Arts & Humanities to use web archive data
          – Producing edited book of empirical studies concerning the history of the UK web
      • First paper from these combined projects
          – Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., Margetts, H. (2014, July). Mapping the UK webspace: Fifteen years of British universities on the web. ACM WebSci'14, Bloomington, Indiana. http://papers.ssrn.com/abstract=2435481 or http://arxiv.org/abs/1405.2856
  • 4. Big Data: Demonstrating the Value of the UK Web Domain Dataset for Social Science Research
      This project aims to enhance JISC's UK Web Domain archive, a 30 TB archive of the .uk country-code top level domain collected from 1996 to 2010. It will extract link graphs from the data and disseminate social science research using the collection.
      February 2012 - February 2014
  • 5. Taming a mammoth: Web Archive Dataset Preparation
      – 30 TB compressed data
      – 6.2 TB metadata and links
      – 2.5 TB temporal links
  • 6. 30 TB compressed data in (w)arc format
      – Approx. 4.5 million files
      – Mix of binary and plain text payloads along with header data
      – Two formats: old arc and newer warc
      Housed at the BL, access restrictions
  • 7. WARC/1.0
      WARC-Type: response
      WARC-Target-URI: http://hits.guardian.co.uk/b/ss/guardiangu-blogs,guardiangu-news,guardiangu-network/1/H.22.2/56938?ns=guardian&pageName=Prisoner+of+war+camps+in+the+UK+mapped+and+listed.+Download+the+data%3AGraphic%3A1476560&ch=News&c3=GU.co.uk&c4=History+%28Books+genre%29%2CBooks%2CSecond+world+war+%28News%29%2CGermany%2CUK+news%2CTechnology&c5=Not+commercially+useful%2CCorporate+IT&c6=Simon+Rogers&c7=10-Nov-08&c8=1476560&c9=Graphic&c10=Blogpost&c11=News&c13=&c25=Datablog&c30=content&h2=GU%2FNews%2Fblog%2FDatablog&c2=GUID:(none)
      WARC-Date: 2010-12-05T02:58:00Z
      WARC-Payload-Digest: sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
      WARC-IP-Address: 66.235.138.18
      WARC-Record-ID: <urn:uuid:7d5ce147-9b4b-46cb-8975-ee93b4d0dda8>
      Content-Type: application/http; msgtype=response
      Content-Length: 740

      HTTP/1.1 302 Found
      Date: Sun, 05 Dec 2010 02:58:00 GMT
      Server: Omniture DC/2.0.0
      X-C: ms-4.3.1
      Expires: Sat, 04 Dec 2010 02:58:00 GMT
      Last-Modified: Mon, 06 Dec 2010 02:58:00 GMT
      Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform, private
      Pragma: no-cache
      ETag: "4CFAFFB8-0E4C-7443902F"
      Vary: *
      P3P: policyref="/w3c/p3p.xml", CP="NOI DSP COR NID PSA OUR IND COM NAV STA"
      Location: http://b.scorecardresearch.com/r?c2=6035250&d.c=gif&d.o=guardiangu-network&d.x=243551159&d.t=page&d.u=http%3A%2F%2Fwww.guardian.co.uk%2Fnews%2Fdatablog%2F2010%2Fnov%2F08%2Fprisoner-of-war-camps-uk
      xserver: www422
      Content-Length: 0
      Keep-Alive: timeout=15
      Connection: close
      Content-Type: text/plain
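As a rough illustration of how the header part of such a record can be read, here is a minimal Python sketch using only the standard library. It is not the tooling used in the project: the function name `parse_warc_headers` and the shortened sample record (with a placeholder URL) are assumptions made for this example, and a real pipeline would more likely rely on a dedicated WARC library.

```python
def parse_warc_headers(block: str) -> dict:
    """Split a WARC header block into its version line and named fields."""
    lines = block.strip().splitlines()
    fields = {"version": lines[0].strip()}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the WARC header block
        # Split on the first colon only, so URLs and timestamps stay intact.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


# Hypothetical, shortened record for illustration (not taken from the archive).
sample = """WARC/1.0
WARC-Type: response
WARC-Target-URI: http://example.co.uk/
WARC-Date: 2010-12-05T02:58:00Z
Content-Length: 740
"""

record = parse_warc_headers(sample)
print(record["WARC-Type"], record["WARC-Target-URI"])
```

Everything after the blank line in a full record (the HTTP status line, HTTP headers, and body) is the payload, which this sketch deliberately ignores.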
  • 8. Extract meta-data and links (wat format)
      – Approx. 4.5 million files
      – 6.2 TB on disk compressed
      – Housed at OII
      – Structured JSON
      – Different formats for arc/warcs
  • 9. {
        "Container": {
          "Filename": "DOTUK-HISTORICAL-1996-2010-GROUP-AA-XAAAAA-20110428000000-00000.arc.gz",
          "Offset": "88937",
          "Compressed": true,
          "Gzip-Metadata": {
            "Header-Length": "10",
            "Inflated-CRC": "-1223265901",
            "Inflated-Length": "26073",
            "Deflate-Length": "4463",
            "Footer-Length": "8"
          }
        },
        "Envelope": {
          "ARC-Header-Length": "102",
          "ARC-Header-Metadata": {
            "Date": "20080509081524",
            "Target-URI": "http://www.ukhomeinteriors.co.uk/content/ext_corbels.php",
            "Content-Length": "25970",
            "Content-Type": "text/html",
            "IP-Address": "83.223.106.10"
          },
          "Payload-Metadata": {
            "Actual-Content-Type": "application/http; msgtype=response",
            "Block-Digest": "sha1:MCCZNOKBJHTZ5MMMCUJGBPE25C2TVUWF",
            "HTTP-Response-Metadata": {
              "Headers-Length": "591",
              "HTML-Metadata": {
                "Head": {
                  "Title": "Exterior Corbels",
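A minimal Python sketch of pulling a few fields out of such a metadata record is shown below. The key names come from the excerpt on the slide; the helper name `arc_summary` and the cut-down record are assumptions for illustration, and records derived from warc files may use different keys (the slides note that the formats differ).

```python
import json

# Cut-down record using only keys visible in the slide excerpt.
wat_record = json.loads("""
{
  "Envelope": {
    "ARC-Header-Metadata": {
      "Date": "20080509081524",
      "Target-URI": "http://www.ukhomeinteriors.co.uk/content/ext_corbels.php",
      "Content-Type": "text/html"
    }
  }
}
""")


def arc_summary(record: dict) -> tuple:
    """Return (target URI, crawl date, content type); None for missing keys."""
    meta = record.get("Envelope", {}).get("ARC-Header-Metadata", {})
    return meta.get("Target-URI"), meta.get("Date"), meta.get("Content-Type")


print(arc_summary(wat_record))
```

Using `.get` with defaults keeps the helper from failing on records that lack the ARC-specific envelope.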
  • 10. Plain text lists
      Build own ad hoc Hadoop cluster, fix incompatibilities, divide into smaller batches
      – Build plain text lists of pages and hyperlinks
      – Remove error pages (e.g., 404 Not Found)
      – Remove pages not in .uk
      – Standardize dates (many formats)
      – Standardize hyperlinks (trailing /, etc.)
      – Fix/remove tons of invalid hyperlinks (whitespace, invalid characters, etc.)
      Load results into Apache Hive (2.5 TB)
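The cleaning steps listed on this slide could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the project's Hadoop code: the function names, the two date formats handled, the choice to drop ports and fragments, and the rule that any status of 400 or above counts as an error page are all hypothetical.

```python
from datetime import datetime
from urllib.parse import urlsplit, urlunsplit

# Assumed formats: the ARC-style and WARC-style timestamps shown on the slides.
DATE_FORMATS = ("%Y%m%d%H%M%S", "%Y-%m-%dT%H:%M:%SZ")


def normalise_url(url: str):
    """Return a cleaned URL, or None if it is invalid or outside .uk."""
    url = url.strip()
    if not url or any(ch.isspace() for ch in url):
        return None  # embedded whitespace marks an invalid hyperlink
    try:
        parts = urlsplit(url)
    except ValueError:
        return None
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host.endswith(".uk"):
        return None
    path = parts.path.rstrip("/") or "/"  # treat /a/ and /a as the same page
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def normalise_date(raw: str):
    """Convert a known timestamp format to YYYY-MM-DD, else return None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def is_error_status(status: int) -> bool:
    """Treat HTTP 4xx and 5xx responses as error pages to be dropped."""
    return status >= 400


print(normalise_url(" http://WWW.BL.UK/collections/ "))
print(normalise_date("20080509081524"))
```

Returning None for anything that cannot be cleaned lets a batch job filter out bad rows in a single pass before loading the results.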
  • 13. Relative size of second-level domains
  • 14. Number of links within SLD per node
  • 15. Cross-domain links (2010). Panels: absolute; normalized to target size
  • 16. Case of ac.uk
      Mapping the UK Webspace: Fifteen Years of British Universities on the Web. Hale et al., WebSci'14, available: http://arxiv.org/abs/1405.2856
      121 UK universities' websites and links
      1) League table ranking
      2) Group affiliation
      3) Geographical location
  • 20. Gravity Law
      $\sigma_{ij} = \dfrac{s_{ij}}{s_i^{out}\, s_j^{in}}$
      $s_{ij} = \dfrac{s_i^{out}\, s_j^{in}}{r^{0.28}}$
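A small numerical sketch of the two relations on this slide follows, read as fractions: the normalized strength is s_ij divided by the product of s_i^out and s_j^in, and the gravity law predicts s_ij as that product divided by r^0.28. The interpretation of the symbols is an assumption, since the slide does not define them: s_ij is taken as the link weight from site i to site j, s_i^out and s_j^in as the total outgoing and incoming link weights, and r as the distance between the two sites. The function names are hypothetical.

```python
import math


def normalised_strength(s_ij: float, s_i_out: float, s_j_in: float) -> float:
    """sigma_ij: link weight divided by the product of the two site totals."""
    return s_ij / (s_i_out * s_j_in)


def gravity_prediction(s_i_out: float, s_j_in: float, r: float,
                       exponent: float = 0.28) -> float:
    """Link weight predicted by the gravity law for distance r (r > 0)."""
    return s_i_out * s_j_in / r ** exponent


# Substituting the gravity law into sigma_ij leaves only the distance term.
predicted = gravity_prediction(s_i_out=10.0, s_j_in=20.0, r=100.0)
sigma = normalised_strength(predicted, 10.0, 20.0)
print(math.isclose(sigma, 100.0 ** -0.28))
```

Under this reading, the normalized strength depends on distance alone and decays slowly, as r to the power of -0.28.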
  • 21. Big UK Domain Data for the Arts and Humanities
      Primary aim: developing a methodological and theoretical framework within which to study over 15 years of UK domain data, with lessons for the future study of web archives more generally
  • 22. Big UK Domain Data for the Arts and Humanities
      The dataset:
      – Crawled from 1996 to 2013
      – Approximately 65 TB, billions of words
      – Building interface to allow search by retrieval date, target domain of links, sentiment
      – Allow qualitative and quantitative analysis, and iteration between multiple research techniques
  • 23. Big UK Domain Data for the Arts and Humanities
      Key outputs:
      – Ten bursary projects using web archive data to investigate a broad range of topics, for example…
          • Armed services recruitment online
          • The accessibility of the web for disabled users
          • Online discussions of ‘Beat’ poetry
      – An edited book of empirical studies concerning the history of the UK web, featuring chapters on, for example…
          • Constitutional and institutional change in UK government
          • The BBC’s online presence
          • The ‘web of faith’ online
  • 24. Next
      ● Studies underway at OII, BL, IHR
      ● Book and articles
          – Study overall growth of .uk
          – Case study of .gov.uk
          – Study of media and select committee visibility
      ● Releasing data open source