Dr. Michele Weigle gave a presentation on telling stories using web archives. She discussed defining a story's timeline and key events, identifying relevant archived web pages, and visualizing the assembled story. Her research group is exploring how to help others reconstruct personal and historical narratives from archived web content, as pages on the live web often disappear over time.
Telling Stories with Web Archives
1. Telling Stories with Web Archives
Dr. Michele C. Weigle
Web Sciences and Digital Libraries (WS-DL) Lab
Department of Computer Science
Old Dominion University
Norfolk, VA
Includes joint work with Dr. Michael L. Nelson and our PhD students, Scott Ainsworth, Yasmin
AlNoamany, Ahmed AlSum, Justin Brunelle, Mat Kelly, Hany SalahEldeen
Southeast Women in Computing Conference
November 16, 2013
2. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
Southeast Women in Computing Conference - Nov 16, 2013
#SEWIC2013
3. What is a web archive?
4. What are some web archives?
5. How can I access the archives?
MementoFox
Memento for Chrome
http://www.mementoweb.org/
http://ws-dl.blogspot.com/2010/03/2010-03-19-mementofox-add-on-released.html
http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
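The Memento tools listed above work by asking for a URI "as of" a given datetime. As a minimal sketch, assuming the Memento protocol's Accept-Datetime request header and the Wayback Machine's timestamped replay URL pattern, the two values a client needs can be built as follows. The function names are hypothetical and no network request is made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(when):
    """Format a timezone-aware datetime as an HTTP-date (GMT),
    the form used in a Memento Accept-Datetime header."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def wayback_url(uri, when):
    """Build a Wayback Machine replay URL for `uri` near `when`.
    The archive redirects to the closest capture it holds."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return "http://web.archive.org/web/" + stamp + "/" + uri


# Example: the speaker's home page as of noon UTC on Jan 1, 2006.
when = datetime(2006, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
headers = {"Accept-Datetime": accept_datetime(when)}
replay = wayback_url("http://www.cs.odu.edu/~mweigle/", when)
```

The header dictionary would be sent to a Memento TimeGate; which TimeGate to use is left open here.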
6. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
7. The Web holds our stories
8. But webpages can disappear
• Average lifespan of a webpage - 50-100 days
• A year after publication, about 11% of content
shared on social media will be gone.
SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
9. But maybe it's archived
Ainsworth, AlSum, SalahEldeen, Weigle, and Nelson, "How Much of the Web is Archived?", JCDL 2011
http://ws-dl.blogspot.com/2011/06/2011-06-23-how-much-of-web-is-archived.html
10. But social media is hard to archive
11. Our Research Group Goals
• We believe that web archives are valuable
cultural resources, and we want everyone to
know about them.
• We want to make it easy for people to bridge
the gap between the live web and the archives.
• We believe that replaying the past is more
compelling than reading a summary.
26. Replaying the past can be
more compelling than just a
summary
27. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
28. What's My Story?
• As another illustration, I'll tell you a little bit
more about myself ...
• ... using the Internet Archive
39. Proof I was there - 2006
40. Faculty Position at ODU - 2006
41. Vehicular Networks - 2006
42. 1st PhD Student Graduated - 2010
43. InfoVis, Work with WS-DL - 2011
44. Telling My Story
• Going through the archive was a lot of fun.
• But, it wasn't always easy.
• Today, I might want to incorporate Facebook
and Twitter posts in my story. Not saved at
Internet Archive. =(
• Let's make this easy to do for everyone.
45. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
46. Project Overview
• Project forms the PhD work of Yasmin
AlNoamany, ideas in early stages
• Joins my interests in measurement, web
science, information visualization.
– measurement - how do people use web archives?
– web science - how can we analyze web archives to
find pages related to live web pages?
– info vis - how can we present the stories that we
have harvested from the archive?
47. How do people use web archives?
• We obtained a year's worth (2012) of requests
to the Internet Archive's Wayback Machine
– client IPs anonymized
48. How do people use web archives?
• First, there are a lot of robots (aka bots) who
access the archive
– 10 bot sessions for every 1 human session
– maybe people don't know about the archive?
• Typical human sessions are pretty short
– people aren't spending lots of time in the archive
– it took me over an hour of walking through the archive
to build my story
– maybe people who do know about the archive aren't
using it to build stories?
AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
49. How do people use web archives?
• 65% of the requested archived pages no longer
exist on the live web
• People use the archive because the pages they
are interested in no longer exist
– like most of my examples from my story
AlNoamany, AlSum, Weigle, and Nelson, "Who and What Links to the Internet Archive", IJDL, to appear, 2013
50. Helping Others Tell Stories
• How can we use this information to help
people tell stories?
• How do people tell stories?
• What tools do they use today?
52. Bookmarking is not preserving
53. How do people tell stories?
• There are three levels of information:
– overview
– recent events
– story definition and replay
60. Research Questions
How do we
• define the time frame of a story?
• define the individual events that make up
a story?
• identify, evaluate, and select candidate
archived web pages to support the events
of the story?
• visualize the resulting story?
61. Define the Time Frame of a Story
• People remember the name of the story, but not
the date
– Hurricane Katrina - Aug 29, 2005
– 2011 Egyptian Revolution - Jan 25, 2011
– Boston Marathon Bombing - April 15, 2013
• Some stories have no definitive beginning/ending
– BP Gulf Oil Spill - April 20 - September? 2010; effects and court cases still ongoing
– Egyptian Revolution - which one? (1952, 2011, 2013)
62. Define the Time Frame of a Story
• Propose candidate times based on user query
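One way to picture "proposing candidate times" is a lookup from story names to known dates, narrowed by any year the user types. The sketch below is only an illustration using the examples from the previous slide. The table, the function name, and the matching rule are assumptions, not the project's method. Where the slide gives only a year, only the year is stored.

```python
import re

# Story names and dates taken from the slide's examples.
KNOWN_STORIES = {
    "hurricane katrina": ["2005-08-29"],
    "egyptian revolution": ["1952", "2011-01-25", "2013"],
    "boston marathon bombing": ["2013-04-15"],
    "bp gulf oil spill": ["2010-04-20"],
}


def propose_times(query):
    """Return candidate dates for a free-text story query.
    A four-digit year in the query narrows ambiguous stories."""
    q = query.lower()
    years = re.findall(r"\b(?:19|20)\d{2}\b", q)
    proposals = []
    for name, dates in KNOWN_STORIES.items():
        if name in q:
            for d in dates:
                if not years or d[:4] in years:
                    proposals.append(d)
    return proposals


ambiguous = propose_times("Egyptian Revolution")
narrowed = propose_times("2011 Egyptian Revolution")
```

An ambiguous query returns every candidate so the user can choose, which matches the "which one?" problem raised for the Egyptian Revolution.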
63. Define a Story's Events
• Consult hand-crafted
timelines
• User-provided timelines
• Detect themes in relevant
archived web pages
64. Identify Relevant Archived Web Pages
• Identify "seed URIs" and query the archive for
their existence during the appropriate time
– also query for URIs linked from the seed URIs
• How to identify seed URIs?
– Wikipedia
– news sites
– social media (tweets, Facebook shares)
– Storify
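Checking whether seed URIs exist in the archive "during the appropriate time" can be sketched as a filter over capture records. The example below assumes each capture is a (URI, 14-digit Wayback-style timestamp) pair. The capture list is made-up sample data and the function names are hypothetical.

```python
from datetime import datetime


def parse_wayback_timestamp(stamp):
    """Parse a 14-digit YYYYMMDDhhmmss timestamp."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def captures_in_window(captures, start, end):
    """Keep only the captures whose timestamp falls inside [start, end]."""
    selected = []
    for uri, stamp in captures:
        when = parse_wayback_timestamp(stamp)
        if start <= when <= end:
            selected.append((uri, stamp))
    return selected


# Hypothetical capture records for two seed URIs.
captures = [
    ("http://example.com/news/a", "20110126093000"),
    ("http://example.com/news/a", "20120301000000"),
    ("http://example.org/blog/b", "20110211180000"),
]

# Story window: Jan 25, 2011 through Feb 28, 2011 (end date chosen for illustration).
in_window = captures_in_window(
    captures, datetime(2011, 1, 25), datetime(2011, 2, 28)
)
```

The same filter could be applied to URIs linked from the seed pages once their captures are listed.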
65. Different sources will provide
different seed URIs
66. What about social media pages?
67. Create your own Facebook archive
• May need to allow for user-contributed content
Kelly, Nelson, and Weigle, "WARCreate and WAIL: WARC, Wayback, and Heritrix Made Easy," Demo at Digital Preservation 2013.
http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
68. Suppose we found 100 relevant pages
for each event in the story
I’ll add here many copies from bbc, nytimes,
foxnews
69. Evaluate Relevant Archived Web Pages
• Are there duplicate accounts?
• What is the reputation, bias, or point of view
of the source?
• How well was the page archived?
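For the first question, one generic way to flag duplicate accounts is to compare page text with Jaccard similarity over word shingles. This is a swapped-in, textbook technique shown only as a sketch. The talk does not say which method is used, and the shingle size and sample sentences are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in `text` (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Made-up headlines standing in for archived page text.
original = "mubarak resigns after thirty year rule"
reprint = "mubarak resigns after thirty year rule in egypt"
unrelated = "boston marathon route map published today"

same_score = jaccard(original, original)
near_score = jaccard(original, reprint)
far_score = jaccard(original, unrelated)
```

Pairs scoring above a chosen threshold could be grouped so that only one copy per group is offered to the user.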
72. Quality of Archived Page
73. Select Relevant Archived Web Pages
• User will select pages to use in the final story
• But user needs to be presented with some
choices
75. Visualize the Story
• Provide different interactive visualizations that
enable exploring the story easily
• Provide the user with the ability to modify the
story and specify the start and end dates
79. Research Questions
How do we
• define the time frame of a story?
• define the individual events that make up
a story?
• identify, evaluate, and select candidate
archived web pages to support the events
of the story?
• visualize the resulting story?
80. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
81. User Access Patterns
AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
82. Everybody Dips, Humans Dive, Robots Skim
Robots (34,203 sessions)
Humans (3,431 sessions)
AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
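The session counts on this slide can be checked against the "10 bot sessions for every 1 human session" figure quoted earlier. The arithmetic below uses only the two numbers shown. That they describe the same sample as the earlier claim is an assumption.

```python
robot_sessions = 34203
human_sessions = 3431

# Robot sessions per human session.
ratio = robot_sessions / human_sessions

# Fraction of all sessions that come from humans.
human_share = human_sessions / (robot_sessions + human_sessions)

print(f"{ratio:.2f} robot sessions per human session")
print(f"{human_share:.1%} of sessions are human")
```

The ratio comes out just under 10, so roughly one session in eleven is human.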
83. What domains does each archive hold?
AlSum, Weigle, Nelson and Van de Sompel, "Profiling Web Archive Coverage for Top-Level Domain and Content Language," TPDL 2013.
84. What domains does each archive hold?
AlSum, Weigle, Nelson and Van de Sompel, "Profiling Web Archive Coverage for Top-Level Domain and Content Language," TPDL 2013.
85. Sometimes the live web "leaks" into
the archive
Sept 3, 2008
2012
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
87. ODU's WS-DL Group
• Our recent work has been featured in the popular press
• We're always looking for more great students!
Dr. Michele C. Weigle
Old Dominion University
Norfolk, VA
mweigle@cs.odu.edu
@weiglemc
http://www.cs.odu.edu/~mweigle/
http://ws-dl.blogspot.com/
Editor's notes
We have seen machine-readable lists of URIs. Can we automatically create this list?
Storify is a social network service that lets the user create stories or timelines using social media such as Twitter, Facebook, and Instagram. Storify was launched in September 2010, and has been open to the public since April 2011. http://storify.com/nzherald/mu
The problem is that Storify operates as bookmarking; it doesn't preserve the links. You have no clue what the person is saying about the link.
Which brings up an overview from Wikipedia as the first result
But what about replaying the story as it was captured on the news web sites? Three information needs. This one is unserved.
Three information needs. This one is unserved. Now let me tell you a story of the Egyptian revolution, using a couple of screenshots that appeared at the time of the revolution.
Can we satisfy the information need of rewinding/replaying the events as they appeared in the past? How do we integrate web archives into the live web to support storytelling? How do we integrate web archives into the live web for replaying news stories as they were captured? The research aims to integrate the past with the present by automatically creating, identifying, and linking stories culled from the past web that are related to the content of a live web page or a specific event. This raises some questions: Can we leverage the content of social media services to discover stories? Can we extract stories based on user access patterns of the Wayback Machine? Can we associate the names that people give particular events with their datetimes in order to find them in web archives?
If we look in different places, we will get different URIs that express different perspectives on the story. Searching two places gives us two results.
I doubt that anyone here knows the importance of this page. Trust me, this page is very important. I know that the Egyptian revolution started in this group; I was one of the first to join this page, which was created on June 10, 2010. This is one of the most important pages. I know it because I have the background, trust me! This is an important page for the story, even though its current state does not represent the story.
If we have a time frame specified for the event/story, we will deduplicate the news collections
http://www.bartamaha.com/egypts-mubarak-resigns-after-30-year-rule-42593/
Handle the duplicates of the news
Archive-It creates slides for the seed URIs, which is not what web archive users normally do, as we discovered from the data. Humans exhibit Dip and Dive, while robots exhibit Dip and Skim. Combination that humans exhibit (slides and dives).