Collecting in the Moment
Gretchen Gueguen
University of Virginia
RBMS Pre-conference
June 24, 2013
June 10, 2012
Teresa A. Sullivan, President of the University of
Virginia, announces resignation…
…Gretchen M. Gueguen, Digital Archivist at UVA, prepares to attend
Rare Book School the next day
June 11-16, 2012
June 18, 2012
June 18, 2012
• Decision is made to form a
cross-departmental group within the library to
discuss saving the historic record related
to these events
At 9:00 a.m. on July 19th
…
me
What’s the Big Deal?
• Digital is THE publishing platform
• Event was important both for the historic
nature of the events (message) and for
HOW it was communicated (medium)
Springing into action
• Twitter
• Blogs and Web
• Facebook
• News
• Video
Twitter API
• Allows you to download tweets as data
for a given hashtag, user, or keyword
search (#woo-hoo!)
• Has many tools available for doing all
kinds of neat stuff (#woo-hoo!)
• Limits you to just the last 1500 tweets for
any given search (#d’oh!)
Info at:
http://mashe.hawksey.info/2011/11/twitter-how-to-archive-event-hashtags-and-visualize-conversation/
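One way to work around the 1,500-tweet ceiling is to run the same search repeatedly during the event and merge the results. The sketch below is illustrative only and is not the workflow used for this collection. It assumes each harvested tweet has already been parsed into a dict with a numeric `id` field (a hypothetical field name), and `merge_harvests` is a made-up helper.

```python
from typing import Iterable


def merge_harvests(batches: Iterable[list[dict]]) -> list[dict]:
    """Combine repeated search harvests, keeping one copy of each tweet.

    Assumes every tweet dict carries a numeric "id" (hypothetical field name).
    """
    seen: dict[str, dict] = {}
    for batch in batches:
        for tweet in batch:
            # The first copy wins; later harvests of the same tweet are skipped.
            seen.setdefault(str(tweet["id"]), tweet)
    # Sort by numeric ID so the merged set has a stable order.
    return sorted(seen.values(), key=lambda t: int(t["id"]))


morning = [{"id": "3", "text": "c"}, {"id": "2", "text": "b"}]
evening = [{"id": "2", "text": "b"}, {"id": "1", "text": "a"}]
merged = merge_harvests([morning, evening])
print(len(merged))
```

Overlapping harvests are harmless here: a tweet that appears in both runs is stored once.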
Final Collection
• 47 XML files
– #BOV, #UVA, #rally4honor, #dragasmustgo
– @cavalierdaily, @LarrySabato, @Rector Drago,
@strategydynamo
• 47 spreadsheets
– Hashtags only (#UVA, #sullivan, #BOV,
#fillthelawn, #strine, #united4honor)
Twitter API update
Re-harvest has returned ~53,000 tweets
– Data issues
– Deleted accounts
Posted content
• Links, pictures, video related to the story
• Could not find a tool to just extract these
to look through later
• Many shortened links that had to be
clicked on to find out what they held
• Many links were retweeted
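The link problem described above can be approached with a short script. This is a minimal sketch, not a tool that was available or used at the time: it assumes the tweet text is already in a list of strings, uses a deliberately simple URL pattern, and counts each link once per appearance so that retweeted links collapse into a single entry. `extract_links` and `expand` are hypothetical helper names. `expand` needs network access, and some servers reject HEAD requests, so treat it as a starting point.

```python
import re
import urllib.request
from collections import Counter

# Simple pattern: "http://" or "https://" up to the next space or quote.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(tweets):
    """Return a Counter mapping each distinct URL to how often it appears."""
    counts = Counter()
    for text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that trails the link in running text.
            counts[url.rstrip(".,;:!?)")] += 1
    return counts


def expand(url, timeout=10):
    """Follow redirects from a shortened link and return the final URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


sample = [
    "Rally today http://t.co/abc123 #UVA",
    "RT @someone: Rally today http://t.co/abc123 #UVA",
    "Statement posted: https://example.org/news.",
]
links = extract_links(sample)
print(links.most_common())
```

Each distinct link then only needs to be expanded and reviewed once, however many times it was retweeted.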
Blogs and other web content
• How to capture everything else
• Tools for web capture
– Difficult to implement
– Don’t do exactly what is needed
– I’m running out of time!
• Solution:
– I have to look at it anyway to select, so
• Firefox “Save As”
• Screengrab plugin for screenshot
Web sites
• No way to create web-archive standard (WARC)
files at the time
– ~1,000 HTML +archive
– Screenshot
Investigation of WAIL (Web Archive Integrated
Layer) to create WARC files
– Will require a re-harvest of URLs to ensure proper
header metadata
– But has automated way of doing this
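For context on the header metadata point: a WARC file is a sequence of records, each with a block of named header fields followed by the captured bytes. The sketch below is an assumption-laden illustration and not how WAIL works. It wraps an already-saved HTML page in a WARC `resource` record, which stores the payload without the original HTTP response headers. That missing HTTP information is why a re-harvest is needed to produce full `response` records. `warc_resource_record` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri, payload, content_type="text/html",
                         captured=None):
    """Build one WARC/1.0 'resource' record as bytes.

    `captured` should be the time the page was saved; it defaults to now.
    """
    captured = captured or datetime.now(timezone.utc)
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {captured.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # A blank line separates the headers from the block;
    # two CRLFs close the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


record = warc_resource_record(
    "http://example.org/",
    b"<html></html>",
    captured=datetime(2012, 6, 18, 9, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```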
Facebook & “Privacy”
• Rallies on grounds were organized
through Facebook “groups.”
• Some posts are visible
only to members of the group.
All others are only visible to
those with a Facebook account.
Facebook & Privacy
• Facebook accounts are free
• But this still means the content wasn’t
“public” as per the TOS
News
• Relatively easy to capture
• Overwhelming in volume
• Why capture the online version?
– Some things only appear online, some only in
print
– Online version, for many sources, allows
commenting
• Why capture this when it will be saved
elsewhere?
– Reference collection
– Databases may capture content but not
commentary
Subscriptions
Audio/Video
• YouTube
• News
• WINA podcast
• WUVA streaming
• Streaming Board
Meetings
• Public Affairs
User Contributions
• Capture what the
public thought was
important
• Possible violations
of privacy or
intellectual property
Final Tally
• Tweets: 80,000 ?
• News articles: 572
• Blog posts: 147
• Other web content: 196
• Twitter pictures: 243
• Video: 69
• Documents: 21
• User-Contributed Items: 118
What’s Been Done
• Preliminary collection finding aid
• Working with small group
on Twitter and web data
issues
• Twitter and web
re-harvest
• Access provided in
a few cases
What Needs to Be Done?
• Access
– Searching
– Use
• Metadata
• Further appraisal
decisions/de-accessioning
What About Next Time?
• Need to establish a web/social media
collection plan
– If we are routinely
capturing certain things
we won’t have to worry
about them during a
crisis
– Tools change rapidly;
collecting routinely
will better position
us to adapt
Collecting in the Moment

  • 1. Collecting in the MomentCollecting in the Moment Gretchen Gueguen University of Virginia RBMS Pre-conference June 24, 2013
  • 2. June 10, 2012 Teresa A. Sullivan, President of the University of Virginia announces resignation… …Gretchen M. Gueguen, Digital Archivist at Uva, prepares to attend Rare Book School the next day
  • 5. June 18, 2012 • Decision is made to form a cross- departmental group within the library to discuss saving the historic record related to these events
  • 6. At 9:00 a.m. on July 19th … me
  • 7.
  • 8. What’s the Big Deal? • Digital is THE publishing platform • Event was important for both the historic nature of the events (message) but also HOW it was communicated (medium)
  • 9. Springing into action • Twitter • Blogs and Web • Facebook • News • Video
  • 10. Twitter API • Allows you to download tweets as data for a given hashtag, user, or keyword search (#woo-hoo!) • Has many tools available for doing all kinds of neat stuff (#woo-hoo!) • Limits you to just the last 1500 tweets for any given search (#d’oh!)
  • 13. Final Collection • 47 XML files – #BOV, #UVA, #rally4honor, #dragasmustgo – @cavalierdaily, @LarrySabato, @Rector Drago, @strategydynamo • 47 spreadsheets – Hashtags only (#UVA, #sullivan, #BOV, #fillthelawn, #strine, #united4honor)
  • 14. Twitter API update Re-harvest has returned ~53,000 tweets – Data issues – Deleted accounts
  • 15. Posted content • Links, pictures, video related to the story • Could not find a tool to just extract these to look through later • Many shortened links that had to be clicked on to find out what they held • Many links were retweeted
  • 19. Blogs and other web content • How to capture everything else • Tools for web capture – Difficult to implement – Don’t do exactly what is needed – I’m running out of time! • Solution: – I have to look at it anyway to select, so • Firefox “Save As” • Screengrab plugin for screenshot
  • 25. Web sites • No way to create web-archive standard (WARC) files at the time – ~1,000 HTML +archive – Screenshot • Investigation of WAIL (Web Archive Integrated Layer) to create WARC files – Will require a re-harvest of URLs to ensure proper header metadata – But has automated way of doing this
  • 26. Facebook & “Privacy” • Rallies on grounds were organized through Facebook “groups.” • Some posts are visible only to members of the group. All others are only visible to those with a Facebook account.
  • 29. Facebook & Privacy • Facebook accounts are free • But this still means the content wasn’t “public” as per the TOS
  • 31. News • Relatively easy to capture • Overwhelming in volume • Why capture the online version? – Some things only appear online, some only in print – Online version, for many sources, allows commenting • Why capture this when it will be saved elsewhere? – Reference collection – Databases may capture content but not commentary
  • 33. Audio/Video • YouTube • News • WINA podcast • WUVA streaming • Streaming Board Meetings • Public Affairs
  • 34. User Contributions • Capture what the public thought was important • Possible violations of privacy or intellectual property
  • 36. Final Tally • Tweets: 80,000 ? • News articles: 572 • Blog posts: 147 • Other web content: 196 • Twitter pictures: 243 • Video: 69 • Documents: 21 • User-Contributed Items: 118
  • 38. What’s Been Done • Preliminary collection finding aid • Working with small group on twitter and web data issues • Twitter and web re-harvest • Access provided in a few cases
  • 39. What Needs to Be Done? • Access – Searching – Use • Metadata • Further appraisal decisions/de-accessioning
  • 40. What About Next Time? • Need to establish a web/social media collection plan – If we are routinely capturing certain things we won’t have to worry about them during a crisis – Tools change rapidly, working on collecting routinely will better position ourselves to adapt

Editor's notes

  1. June 10, 2012 was a Sunday. The sun was shining and UVa was just getting into the swing of summer courses when Teresa A. Sullivan, the President of the University, suddenly and unexpectedly announced her resignation… … meanwhile, I, the Digital Archivist at UVa, prepared to attend a Rare Book School class on the NINETEENTH CENTURY BOOK TRADE for the coming week
  2. Reactions grow increasingly vocal around Grounds as both town & gown become suspicious of the motives and actions of the Board of Visitors, and especially its Rector, Helen Dragas… … meanwhile, I had turned OFF my computer for the week in order to fully pay attention to my class…needless to say, I was not the most informed person on campus about what was going on
  3. On June 18th, the first public demonstration in support of Sullivan occurs on UVa’s historic Lawn during a specially convened Board of Visitors meeting to discuss the resignation.
  4. At this time activities around the library began to coalesce. Several groups across the libraries began to discuss how to work together to save the historic record related to events. University Archives had already decided to attend the first rally in order to collect signs, and had been in touch with the faculty senate. A cross-departmental group in the library, including Archives, our Digital Humanities unit, and a part of our IT department, formed to discuss coordinating activities and outreach. In addition, the University Records Manager consulted with the group about her activities while buried in FOIA requests.
  5. So it was that around 9 a.m. on June 19th I finally thought… Wait a minute, did the president of the university get fired or something? I’m over-emphasizing here to be funny, but the events really did catch me a bit by surprise. In addition, we hadn’t really gotten to the point of discussing things like capture of current social media or web-based information in relation to my newly-formed position at UVa. It was something that I knew would have to be addressed, but it hadn’t been yet.
  6. So I’m going to spend the rest of this morning talking about the work that I did over the weeks of the UVa leadership crisis last summer to try and capture some of the online content created by the university community in response. I think that the issues I had to figure out are really emblematic of the kinds of work that libraries, and particularly those that gather archives of unique materials, are going to have to face. Hopefully they won’t face them in the midst of a crisis, but they will have to face them eventually.
  7. And they will have to face them because the internet is no longer “ephemera.” It is a publishing platform, it is a space for interaction and creation, it is a storage medium, it is all these things. The public events here at UVa, the emergency faculty senate meeting, the protests, even the content gathered for articles in the newspapers were all based in one way or another on web-based mediums. Because of the use of these emerging technologies, I felt that capturing material related to campus events would be important not just because they documented undoubtedly historic events, but also because of the novel use of these technologies to communicate. In some sense, the medium was at least partly the message here.
  8. So, on the 19th I set to work trying to document these various online sources. I’m going to spend the rest of my time this morning talking with you about attempts to harvest content from these sources: twitter, blogs and other web objects, facebook, news sources, and video.
  9. Twitter was the first and most important source to figure out. It was also really difficult. Twitter has an application programming interface (API), which is basically a set of open protocols that allow people to build tools that work with twitter’s data. This means that third parties can build tools that allow you to download tweets into different data forms like xml or spreadsheets. The api limits you to no more than 1500 tweets at a time. 1500 tweets is a lot, but when a topic is really popular, 1500 tweets can go by in no time at all. So time was of the essence… My goals then for creating a twitter archive were first to find a good tool for finding and saving tweets. Secondly, I needed to figure out how to get the oldest tweets related to the subject that I could. Third, I wanted to use the tweets to figure out what people were talking about, posting links to, and organizing.
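The time pressure described in this note comes down to simple arithmetic: with a cap of 1500 tweets per search, the busier the hashtag, the shorter the gap you can leave between captures. The sketch below is only an illustration; the function name and the tweet rates are hypothetical and do not come from the talk.

```python
import math

SEARCH_LIMIT = 1500  # most recent tweets returned per search, as stated in the talk


def max_minutes_between_captures(tweets_per_minute, limit=SEARCH_LIMIT):
    """Longest gap between captures before tweets scroll past the search limit."""
    if tweets_per_minute <= 0:
        return math.inf  # a silent hashtag never exceeds the limit
    return limit / tweets_per_minute


# Hypothetical posting rates, for illustration only.
for rate in (1, 5, 25):
    print(rate, "tweets/min ->", max_minutes_between_captures(rate), "minutes")
```

At the hypothetical rate of 25 tweets per minute, the window closes after an hour, which is in line with the hourly searches described in the notes.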
  10. I ended up using two different tools. The first, shown here, is called The Archivist. It does a search, just as you would on twitter. This can be for a hashtag, keyword, or user profile (which will get both that user’s tweets and those that reference them). It can save the output as XML or as tab-delimited text that can be imported into excel. The biggest drawback was that the Archivist wasn’t set up to run simultaneous searches and had to be actually opened and running to capture them. This meant that, sometimes as often as once an hour on a busy day, I had to open the tool and do a dozen or so searches to get what had been posted in the last hour. This was obviously time-consuming and I did want to occasionally eat, sleep, or leave my desk. If it was really busy though and I waited too long between captures, I would exceed the api’s limit of 1500 tweets and be unable to capture some of them. I continued to use the Archivist, but also continued searching for another option.
  11. The other tool I used was a script created for Google Spreadsheet. You just open this customized google spreadsheet, tweak a couple of lines in the associated script and let it do its thing. This ended up being the best option. It would capture tweets even when I didn’t have the spreadsheet open. It was saved in a spreadsheet, so it was exportable data, and it captured the most complete data of all the tools. It would crash if it became too full, which it did about once a day early on. But I quickly figured out that I could export out the data that was actually saved, then delete it from the live version and start again. So now instead of having to go through a series of procedures every hour, I only had to do it once a day, if that. I did still continue to use the Archivist as a back up. The Archivist was actually also better at searching individual accounts. But once the initial uproar died down, I didn’t need to back up using the Archivist as frequently. And I obviously felt much better having a back-up of the data. Overall, I estimate that we probably collected around 80,000 unique tweets. I have no analysis right now though of how much of that was retweeted content or irrelevant.
  12. All files contained: retweets; overlapping capture times at beginning and end; tweets from accounts that have since been deleted.
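Overlapping capture windows and retweets are the kind of data issue that can be measured once the captures are loaded as records. The following is a minimal sketch, assuming each tweet has been read into a dictionary with hypothetical `id` and `text` keys; the sample tweets are invented, and the "RT @" check is a heuristic for old-style retweets rather than anything the collection is known to have used.

```python
def merge_captures(captures):
    """Combine several capture files, keeping the first copy of each tweet ID."""
    seen = {}
    for capture in captures:
        for tweet in capture:
            seen.setdefault(tweet["id"], tweet)
    return sorted(seen.values(), key=lambda t: int(t["id"]))


def is_retweet(tweet):
    """Heuristic: old-style retweets begin with 'RT @'."""
    return tweet["text"].startswith("RT @")


# Invented sample data: two captures that overlap on tweet 102.
morning = [
    {"id": "101", "text": "Rally on the Lawn at noon #UVA"},
    {"id": "102", "text": "RT @example_user: Rally on the Lawn at noon #UVA"},
]
afternoon = [
    {"id": "102", "text": "RT @example_user: Rally on the Lawn at noon #UVA"},
    {"id": "103", "text": "Board meets today #BOV"},
]

merged = merge_captures([morning, afternoon])
retweet_count = sum(is_retweet(t) for t in merged)
print(len(merged), "unique tweets,", retweet_count, "retweet(s)")
```

Keying on the tweet ID rather than the text means a retweet and its original are both kept as separate records, while the same tweet captured twice is counted once.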
  13. Twitter has since updated their API to provide access to older tweets. This means that we could re-harvest these tweets by using their twitter id number and then have a set with a uniform data format. So far we’ve re-harvested about 53,000 of them based on the twitter ids that were taken from the XML files. Another attempt to retrieve them from the spreadsheets hit a snag as the spreadsheets were formatted differently, but that should be easy to solve. Twitter has also released a basic tool for exploring an archive that could be used, called Grailbird. One problem is that if an account has been deleted since it was originally gathered, those tweets are no longer available. Once an account is deleted all of its content is removed from the stream. So parody accounts (of which there were several) that were taken down would be effectively lost. In our initial re-harvest this number was under 5%. We are considering how to map the initial capture of those tweets to match the data format of the re-harvest.
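One way to see which tweets a re-harvest failed to bring back is to compare the IDs from the original capture with the IDs that were returned. This is a sketch under stated assumptions: the function name and the IDs are hypothetical, and it does not call the Twitter API at all; it only compares two lists that are assumed to have been extracted already.

```python
def reharvest_report(original_ids, reharvested_ids):
    """Return the IDs missing from the re-harvest and their share of the original set."""
    original = set(original_ids)
    missing = original - set(reharvested_ids)
    share = len(missing) / len(original) if original else 0.0
    return sorted(missing), share


# Invented IDs: tweet 103 stands in for a tweet from a deleted account.
original_ids = ["101", "102", "103", "104"]
reharvested_ids = ["101", "102", "104"]

missing, share = reharvest_report(original_ids, reharvested_ids)
print("missing:", missing, "share:", share)
```

The missing IDs are the records that would need to be mapped from the initial capture into the re-harvest format.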
  14. I realized that the twitter content could be a great lead into other web-based content related to the events. These were the websites, videos, pictures and articles that people were really talking about and which formed their conception of events. The issue was that I didn’t readily have on hand a tool that would extract these links for me for referral. The api would make it possible to have a tool that would do that and I have used a tool like that in the past, but it was open source and is no longer available because it was bought by a third party who discontinued that service. I didn’t have a lot of time to exhaustively look for tools that performed this activity, so I spent a lot of time clicking on links. There were two main issues with this. First, many people shorten their URLs, so there was no way to tell what they led to without clicking on them. Second, many people retweeted links, so I saw the same sources again and again when I clicked on them.
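The link-extraction step described in this note can be sketched with a regular expression and a counter, so that a link retweeted many times shows up once with a count. This is an illustration only, not the tool mentioned in the talk: the pattern is deliberately simple, the sample tweets and `example.com` links are invented, and shortened links would still need to be resolved by a separate network request that is not shown here.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(texts):
    """Count each distinct URL across tweet texts, trimming trailing punctuation."""
    counts = Counter()
    for text in texts:
        for url in URL_PATTERN.findall(text):
            counts[url.rstrip(".,;:!?)")] += 1
    return counts


# Invented sample tweets; the second is a retweet of the first.
tweets = [
    "Statement from the faculty senate http://example.com/a1",
    "RT @example_user: Statement from the faculty senate http://example.com/a1",
    "Photos from the rally: http://example.com/b2.",
]

links = extract_links(tweets)
for url, count in links.most_common():
    print(count, url)
```

Sorting by count also gives a rough sense of which sources people were sharing most, which is the selection question raised in the notes.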
  15. This is an example (this is just from my current twitter feed, by the way) that shows what it looks like when someone posts a link. The one item in the square there is a link to a picture. Twitter allows you to post pictures natively, so instead of posting them somewhere like instagram or facebook and linking to them, they really only exist within your twitter account. The only way to ensure these weren’t lost was to try and grab them when I saw them.
  16. This is an example of a twitter picture. As you can see the picture just shows up within your twitter wallpaper, so that adds somewhat to the context of how these were presented.
  17. Some of these pictures were really great and showed a view of the events that didn’t really make it into official sources and captured some things, like the slogans on the beta bridge, that didn’t really last very long
  18. So, I began collecting links to sources, but they needed to be captured very soon in order to ensure that they were grabbed before they possibly disappeared. I knew that tools existed to set up web crawls, and that these tools were the basis of efforts at other institutions to capture their online presence. However, these tools are somewhat difficult to implement, requiring some sophisticated configuration. The output they produce is somewhat limited as well. The type of capture that I wanted to do (one post on a blog, for example, not the entire thing) was also not exactly the same as web-crawling. In the end, I realized that since I was already looking at each of these sources to decide if they should be included, I could just use the “Save-as” command in firefox to save them as an HTML document with a folder of associated content. In addition, I used a firefox plugin called Screengrab to make a screenshot of the entire page. After twitter and facebook (which I’ll discuss in a minute), this content is some of the best that was captured. It was completely unmediated and therefore was sometimes interesting, sometimes completely uninformed and biased, occasionally hilarious, and really captures the essence of reactions to the situation.
  19. Overall, they add a really human element and they give the face of the everyday public embroiled in the event. This kind of narrative of the average person is something that archivists and historians really prize. The public statement of great figures tends to be valued and kept, while that of the common person can slip through the cracks. It also highlights how much of the message of these events was about personalizing the event: “I AM UVA” and how much people self-identify with the university
  20. On the other hand, there were some really serious and well-read analyses of events on blogs and other unvetted sources
  21. Some, like this particular blog post by a UVA alum really galvanized people to discuss what was going on (and share conspiracy theories) in the comments
  22. Other things were interesting in how they took advantage of the media
  23. And how they tried to use social media as a tool to effect change in a grass roots way (this is a petition on Change.org)
  24. At the time I had no easy way to create web-archive files (WARC) for individual pages. Tools exist to create these files, most notably Heritrix, which is the tool used for the internet archive. But these do web crawls…you supply a seed page and then it follows links to a depth that you determine and it continues to gather content as it goes along. I didn’t want to crawl, but in some cases I did want to capture page two of an article or something like that. I ended up using the “Save As” command in firefox which saves an HTML file and a folder of associated files like stylesheets, scripts, and images. I also used a firefox plugin called “Screengrab” to create a screenshot. Since then I’ve learned of a tool called WAIL, which stands for Web Archive Integrated Layer, as well as another called WARCreate which is a chrome extension. Both will do exactly what I needed, which is create a WARC file of a single page. However, as both are side projects by a computer science grad student with a lot of competing demands, as yet they have not proven viable (i.e. I’ve downloaded them and had mixed results). The plan, however, is to try and re-harvest all of this content (I saved a spreadsheet with the URL, source, and date of every web page captured). An interesting question will be how much has been removed since then…
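The capture spreadsheet mentioned in this note (URL, source, and date for every saved page) is what makes a later re-harvest possible. A minimal sketch of such a log as CSV follows; the column names, function names, URLs, and dates are all hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io
from datetime import date

FIELDS = ["url", "source", "captured_on"]  # hypothetical column names


def append_capture(log, url, source, captured_on):
    """Write one captured page to the log as a CSV row."""
    csv.writer(log).writerow([url, source, captured_on.isoformat()])


def read_captures(log_text):
    """Read the log back as a list of dictionaries keyed by FIELDS."""
    return list(csv.DictReader(io.StringIO(log_text), fieldnames=FIELDS))


# Invented entries, for illustration only.
log = io.StringIO()
append_capture(log, "http://example.com/blog/post-1", "blog", date(2012, 6, 19))
append_capture(log, "http://example.com/news/story-2", "news", date(2012, 6, 20))

rows = read_captures(log.getvalue())
urls_to_reharvest = [row["url"] for row in rows]
print(urls_to_reharvest)
```

The list of URLs read back from the log is the input a single-page WARC tool would need for the planned re-harvest.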
  25. A subset of this kind of content with some particularly unique characteristics is facebook. The main rallies that took place on grounds were largely organized through facebook “Groups”…anyone on facebook can start a group and it’s just another wall where the members can post content as a way of discussing it with each other. Group administration allows the administrators and members to make some of their content visible to their members only. Collecting this would seem to be a violation of privacy since membership had to be requested and granted by an administrator. However, none of the content at all was visible to someone without a facebook account.
  26. So this is what the group for Students, Families, & Friends United to Reinstate Teresa Sullivan looks like if you aren’t logged in to facebook.
  27. But if I just sign in as a facebook-user, but not as a member of this group, I can see all of these posts as well as who is a member
  28. So there are a lot of questions here, facebook accounts are free and anyone can get one, so the default of having groups invisible to the public but visible to facebook users seems somewhat contradictory (and a way for facebook to get more people to join). We were advised by our University Legal Counsel that we should go ahead and capture the content and keep a preservation copy, but not to make it accessible without first discussing the issue with facebook.
  29. This group evolved over time, and I tried to capture it as it changed its look and message. By this point things were far more organized and focused on planning events rather than just joining together to share outrage. I also changed my facebook profile picture. An interesting issue I didn’t think about until it was too late was that I needed to log in as myself in order to see these at all since we didn’t have a departmental account, and even if we did I doubt we would have set it up for this reason (although we may do so in the future). This is kind of embarrassing for me as, at the time, my facebook profile picture was me and my dad in 1979 after I got a bath… (removing this from these pages is one of the next steps I want to work on…) In retrospect, setting up a facebook account for the department in order to capture pages could be a better solution. The profile could be public or private depending on what else we wanted to do with it.
  30. The other big source of online content was from news sources: papers, radio, and tv. In general this content isn’t much different from the other web content, but it did get to be really overwhelming in volume. The nice thing about it though, was that since these were more established sources with significant resources, I was less worried about the content disappearing. So much of the material I’ve collected from these sources has been gathered after the fact. The question of why to capture this content is an intriguing one. We have also collected the paper versions of the local paper and some of the local weeklies and a lot of the content is redundant between those two sources. In addition, the content of many of the papers is aggregated into databases like LexisNexis. Some things do appear only in one source or the other, so gathering both web and paper for things that are not preserved elsewhere makes sense. The paper also captured a lot of intangible factors. For example, seeing the huge bold REINSTATED headline on the top of the Washington post there (this is a scan of the front page grabbed from the Newseum) carried a different message than the online version. The other question of why save it locally when it is saved elsewhere, has two answers. First we are creating an easier access point for researchers. So in this case the capture really has to do more with access than preservation. But, the databases only grab content, not comments. The commentary is really interesting and this is one of those places where the medium is really shaping the message. That sort of content would not have typically been captured in the past unless someone wrote a letter to the editor and it was published. Even then it would only be one side of the story, not the ongoing dialogue that is found in some articles (and, to be honest, a lot of trolling, spam and other nonsense as well). We decided that capturing this was most important for the local papers, and so have tried to be exhaustive with those. We also are trying to capture them at least twice in case details are updated over time.
  31. The only issue we’ve found is with newspapers that require a subscription to view their “premium” content. Even when the article is downloaded during a free trial, the script which triggers this authentication is still saved and so it pops up and requires authentication before you can read the content in the HTML view. The HTML content is still saved however, and could be read in the code directly. In the end I decided that capturing this content wasn’t worth this particular barrier to access and we collected only a limited number of articles from this source (the local newspaper).
  32. A number of online sources involved some type of audio and video and these presented some particular difficulties for capture. The most prevalent one was YouTube, which seems pretty obvious. While these are publicly posted and there isn’t any privacy violation to capture them, there is not an easy way to download them. YouTube’s basic license states that the owner is placing them on YouTube for access and basically says that they are not there to be downloaded. Users can opt to use a Creative Commons license instead which doesn’t have this restriction, but then the issue of how to download is still present. I found another Firefox plugin called “download helper” that enabled me to download the ones that we felt we could. We again consulted our legal counsel who advised us to do the same as for facebook: download for preservation, talk with YouTube before doing anything else. News sources also do not encourage downloading, so we kept a list of these videos and could therefore ask for them at a later time. Several events were actually streamed online, which by default, means there is no download. There are tools that will allow you to hack a stream and download a copy in real time, but I did not feel that was worth it. We’ve established a relationship with the Public Affairs office on campus who want to deposit their original video with us as well.
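Since the article text is still present in the saved HTML even when a saved script blocks the browser view, one option is to pull the visible text out of the file and skip the scripts entirely. The sketch below uses Python's standard `html.parser` module; the class name, function name, and sample page are hypothetical, and a real saved page would be read from disk rather than defined inline.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def article_text(html):
    """Return the readable text of a saved page, one text node per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Invented stand-in for a saved page with a login script in the head.
saved_page = (
    "<html><head><script>requireLogin();</script></head>"
    "<body><h1>Sample headline</h1><p>Sample article text.</p></body></html>"
)
print(article_text(saved_page))
```

This only recovers the text for reading; whether to provide access to it is a separate question from the technical barrier.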
  33. Finally, although I didn’t mention this at the top, I want to also note the creation of the online contribution site created by Scholar’s Lab. Everyone involved agreed that this would be a great way to capture what the public thought was important, especially since we could have eyes to see everything. We realized that there would be issues though if we didn’t protect ourselves from the possibility of people posting content that infringed on someone else’s copyright. To protect against this we actually had a member of the University Legal Counsel approve a disclaimer regarding rights and created an option in the contribution form to allow people to indicate whether or not they wanted an item to be public. We also required that one of us on the staff approve the item before it was public. So far, we’ve had over 100 contributions. There have been pictures, video, copies of emails, and links to online sources.
  34. A lot of this content consists of images that I had not collected from other sources, and in some cases it provides a better source than anything else I’ve collected. This picture, for example, documents one of the signs that we really liked, but haven’t yet gotten as a donation.
  35. So the final tally of what we have in a digital format so far: Tweets: 80,000; News articles: 571; Blog posts: 147; Other web content: 196; Twitter pictures: 243; Video: 69; Documents: 21; User-Contributed Items: 118. These numbers will continue to grow, I’m sure. For example, this does not take into account pictures and video from public affairs. And this is in addition to the 100 or so rally signs and a couple dozen newspapers. Some of the rally signs are simply too large or fragile for us to properly store, so we will probably scan them for access and dispose of them, thereby growing the collection more.
  36. As for what happened with the governance crisis, the President was reinstated by the Board of Visitors 18 days after resigning, however the Rector of the University was also re-appointed by the Governor and still holds her office. Relations between the Board and particularly the faculty continue to remain very strained and information is still coming out about the events of last summer. We have decided that our archive will only pertain to the events of those two weeks and we are no longer actively collecting social media.
  37. As for the work that has been done so far, a preliminary finding aid has been created. A small group of technical folks have helped to think through the data issues and we’ve begun to plan for the twitter and web re-harvest. I’ve provided access to the twitter data set in a couple of cases for students and faculty.
  38. But that still leaves a lot to actually figure out. When we are ready to provide more routine use in the reading room, questions of how users will navigate and search the collection will need to be figured out. The objects also need to be prepared for ingest into our repository, which primarily means figuring out their metadata needs: technical, descriptive and preservation. And finally, we may decide to do further appraisal and de-accessioning before doing that ingest.
  39. So, with that, I would be happy to take any questions and I thank you for being here today.