Storytelling for Summarizing Collections in Web Archives
Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Group
@WebSciDL
This work is supported in part by IMLS LG-71-15-0077
CNI Spring 2016
2016-04-05

IMLS-Funded Research
1. Use small “stories” to summarize much larger collections of archived web pages
– big → small
2. Generate web archive collections by mining user-generated stories for seed URIs
– small → big
http://ws-dl.blogspot.com/2015/10/2015-10-07-imls-and-nsf-fund-web.html

Archive-It, a subscription-based service, hosts curated web collections:
• > 3,000 collections
• > 400 partners
• > 10B archived pages

[Screenshot of an Archive-It collection page, annotated with: collection title; collection categorization according to the curator; seed URI; metadata about the collection; text search box; the group that the resource belongs to; list of the seed URIs; timespan of the resource and the number of times it has been captured]

Problem: collection understanding and collection summarization are not currently supported.
It is not easy to answer “what’s in that collection?”

There is more than one collection about the Egyptian Revolution:
• “2010-2011 Arab Spring” https://archive-it.org/collections/3101
• “North Africa & the Middle East 2011-2013” https://archive-it.org/collections/2349
• “Egypt Revolution and Politics” https://archive-it.org/collections/2358

(1000s of seeds × 1000s of mementos) + the dimension of time == conventional visualization methods are not applicable
Visualizing collections with timelines, treemaps, etc. (MS thesis): http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html

Idea: Storytelling

Stories in Literature
Story elements: setting, characters, sequence, exposition, conflict, climax, resolution
Once upon a time…
http://www.learner.org/interactives/story/

Stories in social media
“It's hard to define a story, but I know it when I see it” (Alexander, 2008)
A story is a sampling and arrangement of web resources for summarization.

Collection == thematic sample from the Web
Story == arranged sample from the collection
[Figure: stories built as arranged samples (S1, S2, S3, …) from Collections X, Y, and Z]
We sample k mementos from the N pages of a collection to create a summary story.

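The slide states the idea of sampling k mementos from a collection but not an algorithm. As a minimal sketch, assuming a collection is held in memory as a dict mapping each seed URI (URI-R) to the capture datetimes of its mementos, a uniform random sample arranged chronologically could look like this. The names `Collection`, `sample_story`, and the choice of uniform sampling are illustrative assumptions, not the authors' method.

```python
import random
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of a collection: URI-R -> memento datetimes.
Collection = Dict[str, List[datetime]]
Memento = Tuple[str, datetime]

def sample_story(collection: Collection, k: int,
                 seed: Optional[int] = None) -> List[Memento]:
    """Uniformly sample k mementos and arrange them chronologically."""
    # Flatten both dimensions (URI and time) into one pool of mementos.
    mementos = [(uri, dt) for uri, dts in collection.items() for dt in dts]
    if k > len(mementos):
        raise ValueError("k exceeds the number of mementos in the collection")
    rng = random.Random(seed)
    chosen = rng.sample(mementos, k)
    # A story is an *arranged* sample, so order it by capture time.
    return sorted(chosen, key=lambda m: m[1])
```

Uniform sampling ignores the structure of the collection; the strategies that follow constrain the sample along the URI dimension, the time dimension, or both.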
Collections have two dimensions: time and URI

Fixed Page, Fixed Time
[Diagram: the same resource R1 shown several times at a single point on a timeline t1 … t6]

Fixed Page, Fixed Time
[Screenshots: the same URI, http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2, requested with a desktop Chrome user-agent and with an Android Chrome user-agent]
First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites, JCDL 2013: https://www.harding.edu/fmccown/pubs/jcdlsp182-schneider.pdf
A Method for Identifying Personalized Representations in Web Archives, D-Lib Magazine, 2013: http://www.dlib.org/dlib/november13/kelly/11kelly.html

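The desktop and Android screenshots differ only in the User-Agent request header. A minimal sketch of requesting one URI with two user-agents, using Python's standard `urllib.request`; the two user-agent strings are illustrative assumptions (not the ones used for the screenshots), and fetching needs network access and may return different content today.

```python
import urllib.request
from typing import Dict

# Illustrative user-agent strings (assumptions, not taken from the talk).
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/49.0 Safari/537.36",
    "android": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/49.0 Mobile Safari/537.36",
}

def build_requests(uri: str) -> Dict[str, urllib.request.Request]:
    """Build one request per user-agent for the same URI."""
    return {name: urllib.request.Request(uri, headers={"User-Agent": ua})
            for name, ua in USER_AGENTS.items()}

def fetch_representations(uri: str, timeout: float = 10) -> Dict[str, bytes]:
    """Fetch each representation of the URI (requires network access)."""
    bodies = {}
    for name, req in build_requests(uri).items():
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            bodies[name] = resp.read()
    return bodies
```

A crawler sends only one such header per capture, which is why an archive may hold only one of the representations a live user could see.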
Fixed Page, Sliding Time
[Diagram: the same resource R captured at six times t1 … t6]

[Screenshots of one page captured on Feb 1, Feb 1, Feb 2, Feb 4, Feb 5, Feb 7, Feb 9, Feb 11, and Feb 11]

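For the fixed-page, sliding-time view, one plausible selection rule is to pick mementos of a single page spread evenly across its timespan and link them as Archive-It Wayback URIs of the form `https://wayback.archive-it.org/<collection>/<YYYYmmddHHMMSS>/<URI-R>`, the pattern seen in this deck's speaker notes. This is a sketch under that assumption; the nearest-to-evenly-spaced-targets rule and the function names are hypothetical.

```python
from datetime import datetime
from typing import List

def wayback_uri_m(collection_id: int, dt: datetime, uri_r: str) -> str:
    """Format an Archive-It Wayback URI-M with a 14-digit timestamp."""
    return f"https://wayback.archive-it.org/{collection_id}/{dt:%Y%m%d%H%M%S}/{uri_r}"

def fixed_page_sliding_time(datetimes: List[datetime], k: int) -> List[datetime]:
    """Pick up to k mementos of one page, spread evenly across its timespan."""
    dts = sorted(set(datetimes))
    if k <= 0 or not dts:
        return []
    if k == 1 or len(dts) == 1:
        return [dts[0]]
    start, end = dts[0], dts[-1]
    step = (end - start) / (k - 1)  # spacing between target times
    picked: List[datetime] = []
    for i in range(k):
        target = start + step * i
        # Nearest capture to the target; ties go to the earlier capture.
        nearest = min(dts, key=lambda d: abs(d - target))
        if nearest not in picked:  # sparse captures can map two targets to one memento
            picked.append(nearest)
    return picked
```

Because sparse captures can map two targets to the same memento, the result may hold fewer than k items.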
Sliding Page, Fixed Time
[Diagram: different resources R1, R2, R3, R4 at a single point on a timeline t1 … t6]

Feb. 11, 2011: Mubarak resigns

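For the sliding-page, fixed-time view (many pages around one moment, such as Feb. 11, 2011), a simple sketch is to take, for each seed URI, its capture nearest to the target time, keeping only captures inside a tolerance window. The one-day default window, the ordering by distance, and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

def sliding_page_fixed_time(collection: Dict[str, List[datetime]],
                            target: datetime,
                            tolerance: timedelta = timedelta(days=1),
                            k: Optional[int] = None) -> List[Tuple[str, datetime]]:
    """Take, for each page, its memento nearest to `target` within `tolerance`.

    Pages with no memento inside the window are skipped. Results are ordered
    by distance from `target` and truncated to k items when k is given.
    """
    picks = []
    for uri, dts in collection.items():
        if not dts:
            continue
        nearest = min(dts, key=lambda d: abs(d - target))
        if abs(nearest - target) <= tolerance:
            picks.append((uri, nearest))
    picks.sort(key=lambda m: abs(m[1] - target))
    return picks if k is None else picks[:k]
```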
Sliding Page, Sliding Time
[Diagram: different resources (R1, R2, R1, R3, R4, R2) at different times t1 … t6]

[Screenshots of different pages captured on Jan 25, Jan 27, Jan 31, Feb 2, Feb 4, Feb 7, Feb 10, Feb 11, and Feb 11]

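The sliding-page, sliding-time view aims for the broadest coverage across both pages and time. One hedged way to sketch this is to split the collection's timespan into k equal slots and, in each slot, prefer a page not used yet. The slotting rule, the "earliest fresh page" tie-break, and the function name are hypothetical, not the selection method from the talk.

```python
from datetime import datetime
from typing import Dict, List, Tuple

def sliding_page_sliding_time(collection: Dict[str, List[datetime]],
                              k: int) -> List[Tuple[str, datetime]]:
    """Pick up to k mementos spread across both pages and time.

    The collection's timespan is split into k equal slots. In each slot the
    earliest memento of a not-yet-used page is chosen; if every page in the
    slot was already used, the earliest memento in the slot is taken instead.
    """
    mementos = sorted((dt, uri) for uri, dts in collection.items() for dt in dts)
    if k <= 0 or not mementos:
        return []
    start, end = mementos[0][0], mementos[-1][0]
    width = (end - start) / k
    used, story = set(), []
    for i in range(k):
        last = i == k - 1
        lo = start + width * i
        hi = end if last else start + width * (i + 1)
        # Half-open slots [lo, hi); the last slot also includes `end`.
        in_slot = [(dt, uri) for dt, uri in mementos
                   if lo <= dt < hi or (last and dt == hi)]
        if not in_slot:
            continue
        fresh = [m for m in in_slot if m[1] not in used]
        dt, uri = (fresh or in_slot)[0]
        used.add(uri)
        story.append((uri, dt))
    return story
```

Empty slots are skipped, so the story can be shorter than k when captures cluster in time.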
What do stories in Storify look like?
“Characteristics of Social Media Stories”, TPDL 2015
http://www.cs.odu.edu/~mln/pubs/tpdl-2015/tpdl-2015-stories.pdf

What is the length of a story (the number of resources per story)?
• This story has 31 resources

What are the types of resources that compose a story?
• This story has:
– 19 quotes
– 8 images
– 4 videos

What are the most frequently used domains?
• This story uses:
– 90% twitter.com
– 7% instagram.com
– 3% facebook.com

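The three measurements above (story length, resource types, domain shares) can be computed from a list of a story's resources. A minimal sketch, assuming each resource is given as a (URL, type) pair with hypothetical type labels such as "quote", "image", "video"; the TPDL 2015 paper's exact measurement procedure may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

def story_profile(resources: List[Tuple[str, str]]) -> Dict[str, object]:
    """Profile a story given (url, resource_type) pairs.

    Returns the story length, the count of each resource type, and each
    domain's share of the resources as a rounded percentage ("www." dropped).
    """
    types = Counter(rtype for _, rtype in resources)
    domains: Counter = Counter()
    for url, _ in resources:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
        if host.startswith("www."):
            host = host[len("www."):]
        domains[host] += 1
    n = len(resources)
    shares = {d: round(100 * c / n) for d, c in domains.most_common()}
    return {"length": n, "types": dict(types), "domains": shares}
```

Rounded percentages need not sum to exactly 100.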
What differentiates a popular story?
[Two example stories: one with 19,795 views, one with 64 views]

(Skipping many details; see the TPDL 2015 paper.)

We should create stories with:
• ~28 pages
• moar images!
• where possible, pages selected from social media, news, and blogs
• additional dimensions of quality:
– pages that are well archived (e.g., not missing images or stylesheets)
– pages that generate nice summaries in the Storify interface

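These guidelines could be turned into a simple selection rule. The sketch below is purely hypothetical: the `Candidate` fields, the 0.9 "well archived" threshold, and the ranking order (images first, then preferred sources, then archival completeness) are assumptions; only the ~28-page target and the preferred source types come from the slide.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    """Hypothetical per-memento features; none of these fields come from the talk."""
    uri_m: str
    category: str            # e.g. "social media", "news", "blog", "other"
    has_image: bool
    missing_resources: int   # embedded resources (images, stylesheets) not archived
    total_resources: int

PREFERRED = {"social media", "news", "blog"}

def archived_ratio(c: Candidate) -> float:
    """Fraction of embedded resources that were archived (1.0 if there are none)."""
    if c.total_resources == 0:
        return 1.0
    return 1 - c.missing_resources / c.total_resources

def select_pages(candidates: List[Candidate], target_len: int = 28,
                 min_archived: float = 0.9) -> List[Candidate]:
    """Keep well-archived candidates, then prefer images and preferred sources."""
    ok = [c for c in candidates if archived_ratio(c) >= min_archived]
    # Stable sort: candidates with equal keys keep their input order.
    ok.sort(key=lambda c: (c.has_image, c.category in PREFERRED, archived_ratio(c)),
            reverse=True)
    return ok[:target_len]
```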
Stories from collections about the Egyptian Revolution
https://storify.com/yasmina85/auto-stories-from-archived-collections-56fbc3d1b8d27c6f6571c647
https://storify.com/yasmina85/auto-stories-from-archived-collections-5702ff8f228eede273d49c21
https://storify.com/yasmina85/auto-stories-from-archived-collections-5702c7f1228eede273d48ddf

Evaluation: can humans tell human-generated stories from machine-generated ones?
• Manually generated: https://storify.com/yasmina85/this-is-manually-generated-story-from-archive-it-c-56b25ae72c0664474ee34f13
• Automatically generated: https://storify.com/yasmina85/auto-stories-from-archived-collections-56f1cfd36bc660f47f1b9f5e

Use an interface people already know how to use to summarize collections
[Diagram: archived collections + storytelling services → archived, enriched stories]
More info:
https://github.com/yasmina85/OffTopic-Detection
http://ws-dl.blogspot.com/2015/09/2015-09-28-tpdl-2015-in-poznan-poland.html
http://ws-dl.blogspot.com/2015/08/2015-08-20-odu-l3s-stanford-and.html

Speaker notes

  1. By storytelling we mean using visualizations to put a set of web pages from web archives into a narrative structure, ordered by time.
  2. First deployed in 2006, Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of digital content. 
  3. Lori created the collections and entered metadata about them (description, title, etc.). This collection-level metadata does not help much. Archive-It provides faceted browsing and search services on the resulting collection.
  4. There are about three or four collections about the Egyptian Revolution in Archive-It. If I want to learn about the Egyptian Revolution, which collection should I browse? A collection has two dimensions: URIs, and copies (mementos) of these URIs. A historian faced with more than one collection will not know where to start.
  5. Every story is made up of a set of events. We use “story” in its current, loose social media sense, which sometimes lacks elements of the more formal literary tradition: dramatic structure, morality, humor, improvisation, etc.
  6. The definition of a story in social media is much looser and more relaxed. Storytelling may be seen as the set of cultural practices for representing events chronologically.
  7. If this is the Web, then the archived collections are subsets of the Web, and we sample from these collections to create a story.
  8. http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2 http://america.aljazeera.com/ Personalized web resources offer different representations based on the user-agent string and other values in the HTTP request headers, GeoIP, and other environmental factors. Currently, web archives do not support browsing different representations. This means web crawlers capturing content for archives may receive representations, based on the crawl environment, that differ from the representations returned to interactive users.
  9. http://wayback.archive-it.org/2358/20110211191423/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/ http://wayback.archive-it.org/2358/*/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/
  10. Here
  11. Here is Feb. 11 from different news sites: https://wayback.archive-it.org/2358/20110211074248/http://www.globalpost.com/dispatch/egypt/110210/mubarak-resign-obama-egypt https://wayback.archive-it.org/2358/20110211191445/http://www.cnn.com/ https://wayback.archive-it.org/2358/20110211192204/http://www.bbc.co.uk/news/world-middle-east-12433045 https://wayback.archive-it.org/2358/20110211192142/http://www.modernegypt.info/ https://wayback.archive-it.org/2358/20110211191423/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/ https://wayback.archive-it.org/2358/20110211191423/http://www.arabist.net/ https://wayback.archive-it.org/2358/20110211194239/http://www.globalpost.com/dispatch/egypt/110211/mubarak-quits-resigns-egypt-cairo
  12. Here I want the broadest possible coverage of the Egyptian Revolution.
  13. Our research question is: what are the structural characteristics of popular (i.e., receiving the most views) human-generated stories? We answer the following questions:
  14. Popular stories are the top 25% by views, normalized by the time the story has been available on the Web.
  15. What we want to do is create persistent stories and then visualize them using a storytelling tool that users already know, such as Storify. We will integrate the storytelling services and the archived collections to generate archived, enriched stories.
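Note 14 defines popular stories as the top 25% by views normalized by time on the Web. A minimal sketch of that normalization, assuming each story is known by its view count and creation date and that "time available" is measured in whole days up to an observation date; the quantile cutoff rule and the function names are illustrative assumptions, not the paper's exact procedure.

```python
from datetime import date
from typing import Dict, Set, Tuple

def views_per_day(views: int, created: date, observed: date) -> float:
    """Views normalized by the number of days the story has been online (min 1)."""
    return views / max((observed - created).days, 1)

def popular_stories(stories: Dict[str, Tuple[int, date]], observed: date,
                    quantile: float = 0.75) -> Set[str]:
    """Return ids of stories whose views per day reach the given quantile.

    `stories` maps a story id to (views, creation date). The cutoff is the
    value at index floor(quantile * n) of the sorted rates, one simple choice
    among several quantile definitions.
    """
    rates = {sid: views_per_day(v, created, observed)
             for sid, (v, created) in stories.items()}
    ranked = sorted(rates.values())
    if not ranked:
        return set()
    cutoff = ranked[min(int(quantile * len(ranked)), len(ranked) - 1)]
    return {sid for sid, r in rates.items() if r >= cutoff}
```

Normalizing by days online keeps older stories from looking popular merely because they have accumulated views for longer.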