Welcome
• Opening Session
• Internet Archives & Research Potential
• Building Community: Research Highlights
– Oxford Internet Institute
– Centre for Internet Studies & NetLab
– LS3 & the ALEXANDRIA Project
– WebScience @ University of Southampton
• Discussion and Challenges
ArchiveHub and Internet Archive Research
1. Large Scale Data
2. Developing New Tools
3. Testing and Building Theory
{AGENDA}
Large Scale Data | Developing New Tools | Testing and Building Theory
Opportunity: The Internet Archive contains the largest
single record of the history of the World Wide Web from
1995 to the present—a wealth of untapped research data.
Challenge: There is a significant lack of research-ready
databases and tools available to the scholarly community
A sense of scale
• The Library of Congress contains approximately 3 PB of data [a]
• The Internet Archive contains in excess of 10 PB of archived cultural material
• The Wayback Machine contains more than 410 billion available web pages (as of 2014)
[a] http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/
[Figure: Library of Congress and Internet Archive compared]
Opportunity: The ArchiveHub project aims to support the
creation and dissemination of general guidelines & tools for
conducting theoretically and methodologically rigorous
longitudinal research using archival Web data
HistoryTracker Tool
Version 2.0
20th Century Collection @ RU
PIG Scripts in
Hadoop Environment
RU High-Speed
Computing Cluster
Link Lists & Text Data
Curated Data Sets
Dataset | Research Potential | Dates | Captures | Unique URLs
Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
Superstorm Sandy | (as for Hurricane Katrina) | 2003 – 2012 | 41,703,112 | 20,013,455
US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
US House | (as for US Senate) | (as for US Senate) | 51,840,777 | 12,410,014
Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 113,259,655
US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
What’s in the data?
Source | Destination | Date | Frequency | Content Type | Bytes | Descriptive Text
Link Data:
http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football
Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag
http://gawker.com
2012-10-22
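The seven columns listed above could be read into a typed record roughly as follows. This is a minimal sketch, not the project's actual loader: the tab-separated layout, the `LinkRecord` and `parse_link_record` names, and which URL is the source are all assumptions, and the frequency, content type and byte values in the sample row are placeholders that do not come from the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LinkRecord:
    # Field order mirrors the column list on the slide.
    source: str
    destination: str
    capture_date: date
    frequency: int
    content_type: str
    num_bytes: int
    descriptive_text: str


def parse_link_record(line: str) -> LinkRecord:
    """Parse one tab-separated line (assumed layout) into a LinkRecord."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    src, dst, day, freq, ctype, size, text = parts
    return LinkRecord(src, dst, date.fromisoformat(day), int(freq), ctype, int(size), text)


# Sample row built from the slide's example. Treating gawker.com as the
# source is an assumption; "1", "text/html" and "0" are placeholder values.
sample_line = "\t".join([
    "http://gawker.com",
    "http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football",
    "2012-10-22",
    "1",
    "text/html",
    "0",
    "Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag",
])
record = parse_link_record(sample_line)
```

Keeping the date as a `date` object rather than a string makes it straightforward to bucket links by month for longitudinal analysis.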
http://archivehub.rutgers.edu
PUTTING BIG THEORY INTO BIG DATA
[or]
moving from observing the Web to observing
new phenomena on the Web
Tracing the Emergence of Organizational Forms
Environment:
Organizations compete for scarce resources; during periods of rapid
disruption, new entrants seek “protected” niches (Weber & Monge, 2014)
Population:
In digital spaces, online connections provide communicative representations of
information flows (Weber & Monge, 2012)
Formation of ties (e.g. hyperlinks) can positively impact long-term likelihood of
organization survival (Weber, 2012)
Organization:
Organizations adapt internally, reconfiguring team structures and
developing new routines for knowledge sharing
(Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review)
Big Data… Big Theory?
• Networks are central to social movements in that links between
nodes can be influential in collective action
• Examples of nodes include participants, organizations, media and
communications technologies
• Social networks and social movements (Diani, 2003)
• The interaction between actors, and between actors and hashtags,
collectively represent a networked form of organization
• Network form of organization (Powell, 1990)
Data
• Triangulation of data insulates against false readings from large-scale data
(see Lazer, Kennedy, King and Vespignani, 2014)
• Internet Archive:
– 335 OWS related websites; ~330 million edges over a 2-year period
• Lexis Nexis:
– Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the
movement in September 2011 through Sept. 2012
– Search OWS keywords, e.g. “Occupy Wall Street,” “Occupy Oakland”
• Twitter
– Gnip PowerTrack
• Search by keywords; captures a larger volume of Twitter data than other options
– Sample includes October 17, 2011, through January 5, 2012. Initial study focused on the
critical two-month period from November 1 through December 31, 2011.
– 750,816 tweets across the two-month period.
OWS News Coverage
OWS on the Web
• 335 seed organizations based on records from #OccupyResearch
• Data extracted for 2011 & 2012, based on “both matching”
[Figure: Captures per Month (millions)]
Maximal Cores (k Coreness)
Aug. 2011
Jan. 2012
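For readers unfamiliar with k-coreness, the maximal core of a network can be found by repeatedly peeling off the lowest-degree node. The sketch below is an illustration under stated assumptions, not the analysis code behind the slide: it treats hyperlinks as undirected, ignores self-links, and uses hypothetical names (`core_numbers`, `maximal_core`) and a toy edge list.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-links are ignored in this sketch
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def maximal_core(edges):
    """Return (k_max, nodes in the k_max-core); needs at least one edge."""
    core = core_numbers(edges)
    k_max = max(core.values())
    return k_max, {node for node, c in core.items() if c == k_max}


# Toy network: a triangle of sites plus one site linked to a single member.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
k_max, members = maximal_core(toy_edges)
```

In the toy network the triangle forms the maximal core; comparing the membership of that core at two dates is one way to read a before-and-after figure such as the one on this slide.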
[Figure: Edges and Vertices]
[Figure: Density]
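The slides do not say which density definition was used. One common choice for a hyperlink network is the share of possible directed ties that are present, E / (V(V − 1)), counting each distinct source-destination pair once. A minimal sketch under that assumption (the `directed_density` name and the toy edge list are hypothetical):

```python
def directed_density(edges):
    """Density of a directed graph built from distinct (source, destination) pairs."""
    # Repeated links and self-links are collapsed or dropped in this sketch.
    pairs = {(u, v) for u, v in edges if u != v}
    # Only nodes that appear in at least one link are counted.
    nodes = {node for pair in pairs for node in pair}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(pairs) / (n * (n - 1))


# Toy example: 3 sites, 3 distinct directed links out of 6 possible.
toy_links = [("a", "b"), ("b", "a"), ("a", "c"), ("a", "b")]
toy_density = directed_density(toy_links)
```

If links are instead weighted by how often they occur, edge counts can exceed V(V − 1), so the counting rule needs to be fixed before densities are compared across months.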
[Figure: Clusters]
Challenges:
• Access Challenges:
– Scaling access to the data
• Data Challenges:
– Moving from access to researchable data
• Research Challenges:
– Bridging “big data” to “big theory”
– Potential for use as a historical research tool
• Want data?
– Email me! matthew.weber@rutgers.edu
– ArchiveHub: http://archivehub.rutgers.edu
• The Team
– Kris Carpenter, Vinay Goel, Internet Archive
– David Lazer, Katherine Ognyanova, Northeastern University
– Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University
– Peter Monge, Ayushman Datta, Kristen Guth, USC
Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

Wire Workshop: Overview slides for ArchiveHub Project

  • 2. Welcome • Opening Session • Internet Archives & Research Potential • Building Community: Research Highlights – Oxford Internet Institute – Centre for Internet Studies & NetLab – LS3 & the ALEXANDRIA Project – WebScience @ University of Southampton • Discussion and Challenges
  • 3. ArchiveHub and Internet Archive Research
  • 4. 1. Large Scale Data 2. Developing New Tools 3. Testing and Building Theory {AGENDA} Large Scale Data | Developing New Tools | Testing and Building Theory
  • 5. Opportunity: The Internet Archive contains the largest single record of the history of the World Wide Web from 1995 to the present, a wealth of untapped research data. Challenge: There is a significant lack of research-ready databases and tools available to the scholarly community.
  • 6. A sense of scale: The Library of Congress contains approximately 3 PB of data (source: http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/). The Wayback Machine contains more than 410 billion available web pages (as of 2014). The Internet Archive contains in excess of 10 PB of archived cultural material. [Chart: Library of Congress vs. Internet Archive]
  • 8. Opportunity: The ArchiveHub project aims to support the creation and dissemination of general guidelines & tools for conducting theoretically and methodologically rigorous longitudinal research using archival Web data.
  • 9. HistoryTracker Tool, Version 2.0 [Diagram labels: 20th Century Collection @ RU; Pig Scripts in Hadoop Environment; RU High-Speed Computing Cluster; Link Lists & Text Data; Curated Data Sets]
  • 10. Dataset | Research Potential | Dates | Captures | Unique URLs
      Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
      Superstorm Sandy | (as above) | 2003 – 2012 | 41,703,112 | 20,013,455
      US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
      US House | (as above) | (as above) | 51,840,777 | 12,410,014
      Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 113,259,655
      US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
  • 11. What’s in the data? Link Data fields: Source | Destination | Date | Frequency | Content Type | Bytes | Descriptive Text. Example values shown: http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football; Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag; http://gawker.com; 2012-10-22
  • 15. PUTTING BIG THEORY INTO BIG DATA [or] moving from observing the Web to observing new phenomena on the Web
  • 16. Tracing the Emergence of Organizational Forms. Environment: Organizations compete for scarce resources; during rapid periods of disruption, new entrants seek “protected” niches (Weber & Monge 2014). Population: In digital spaces, online connections provide communicative representations of information flows (Weber & Monge, 2012). Formation of ties (e.g. hyperlinks) can positively impact long-term likelihood of organization survival (Weber, 2012). Organization: Organizations adapt internally, reconfiguring team structures and developing new routines for knowledge sharing (Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review).
  • 23. Big Data… Big Theory? • Networks are central to social movements in that links between nodes can be influential in collective action • Examples of nodes include participants, organizations, media and communications technologies • Social networks and social movements (Diani, 2003) • The interactions between actors, and between actors and hashtags, collectively represent a networked form of organization • Network form of organization (Powell, 1990)
  • 25. Data • Triangulation of data insulates against false readings from large-scale data (see Lazer, Kennedy, King and Vespignani, 2014) • Internet Archive: – 335 OWS-related websites; ~330 million edges over a 2-year period • LexisNexis: – Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the movement in September 2011 through Sept. 2012 – Search on OWS keywords, e.g. “Occupy Wall Street,” “Occupy Oakland” • Twitter – Gnip PowerTrack: – Search by keywords; captures a larger volume of Twitter data than other options – Sample covers October 17, 2011, through January 5, 2012. Initial study focused on the critical two-month period from November 1 through December 31, 2011 – 750,816 tweets across the two-month period.
  • 27. OWS News Coverage
  • 28. OWS on the Web • 335 seed organizations based on records from #OccupyResearch • Data extracted for 2011 & 2012, based on “both matching” [Chart: Captures per Month, in millions]
  • 29. Maximal Cores (k-Coreness) [Figure panels: Aug. 2011; Jan. 2012]
  • 31. [Chart: Density]
  • 32. [Chart: Clusters]
  • 34. Challenges: • Access Challenges: – Scaling access to the data • Data Challenges: – Moving from access to researchable data • Research Challenges: – Bridging “big data” to “big theory” – Potential for use as a historical research tool
  • 35. • Want data? – Email me! matthew.weber@rutgers.edu – ArchiveHub: http://archivehub.rutgers.edu • The Team – Kris Carpenter, Vinay Goel, Internet Archive – David Lazer, Katherine Ognyanova, Northeastern University – Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University – Peter Monge, Ayushman Datta, Kristen Guth, USC • Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers
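
Slide 11 lists the fields of a link record (Source, Destination, Date, Frequency, Content Type, Bytes, Descriptive Text), and slide 28 charts captures per month. The slides do not say how the records are stored, so the following is only a minimal Python sketch under stated assumptions: one record per line, pipe-separated, fields in the slide's order, dates in ISO format. The names `LinkRecord`, `parse_link_record` and `records_per_month` are hypothetical and are not part of the HistoryTracker tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LinkRecord:
    """One link record; field order follows slide 11 (storage format is assumed)."""
    source: str
    destination: str
    captured: date
    frequency: int
    content_type: str
    size_bytes: int
    text: str


def parse_link_record(line: str, sep: str = "|") -> LinkRecord:
    """Parse one pipe-separated line into a LinkRecord."""
    # maxsplit=6 leaves any further separators inside the descriptive text.
    parts = [part.strip() for part in line.split(sep, 6)]
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    src, dst, day, freq, ctype, size, text = parts
    return LinkRecord(src, dst, date.fromisoformat(day), int(freq),
                      ctype, int(size), text)


def records_per_month(records) -> Counter:
    """Count records by capture month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for record in records:
        counts[record.captured.strftime("%Y-%m")] += 1
    return counts


# Hypothetical sample line, not taken from the data set.
sample = ("http://a.example/story | http://b.example | 2012-10-22 | 3 "
          "| text/html | 2048 | Some headline | with a pipe in it")
record = parse_link_record(sample)
monthly = records_per_month([record])
```

Counting records per month is one possible reading of "captures per month"; if the Frequency field holds the number of captures, summing it instead would be the natural variant.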

Editor's Notes

  1. 8.5 PB of data.
  2. 20th Century Collection = 9 TB of metadata; Media Seed List = 4,891
  3. 20th Century Collection = 9 TB of metadata; Media Seed List = 4,891
  4. 9/25/11
  5. Diani – ANT – actants exist through relationships with other nodes; technology nodes as actants; hashtags. Network form – repeated, enduring exchange…that lack a legitimate organizational authority to arbitrate
  6. Over time, dyadic communication will become prevalent in an emerging networked organization. As a social movement develops as an emerging network form of organization, the organizational structure will be increasingly clustered.
  7. Trend chart illustrating the relationship between OWS and the media
  8. News sources: 105 major U.S. newspapers via LexisNexis. Search terms: Occupy Wall Street, Occupy Los Angeles, Occupy Chicago, Zuccotti Park. Initial analysis: sample set drawn from Oct. 17, 2011 – Jan. 1, 2012. Nov. 10 & Nov. 17 Occupy Los Angeles
  9. Aug. 2011 -> 20,000 ties; Jan. 2012 -> 65,000 ties – denser core
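
Slides 29 and 31 report maximal cores (k-coreness) and density, and note 9 compares tie counts between two snapshots. The slides do not say how these measures were computed, so the sketch below is only an illustration in plain Python of the textbook definitions: density as observed ties over possible ties, and core numbers by repeatedly removing the lowest-degree node. Treating ties as directed for density and as undirected for coreness is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def density(n_nodes: int, n_ties: int, directed: bool = True) -> float:
    """Share of possible ties that are present (no self-loops)."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2
    return n_ties / possible


def core_numbers(edges):
    """Core number of every node, treating the edges as undirected."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def maximal_core(edges):
    """Return (k_max, nodes whose core number equals k_max)."""
    core = core_numbers(edges)
    if not core:
        return 0, set()
    k_max = max(core.values())
    return k_max, {node for node, k in core.items() if k == k_max}


# Hypothetical toy network: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
k_max, members = maximal_core(toy_edges)
```

The linear scan for the minimum-degree node keeps the sketch short; for networks with millions of edges a bucket-based implementation or a graph library would be the practical choice.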