The Need for Long Term Preservation of Weblogs: the BlogForever Project


                                          Ilias Trochidis
                               Aristotle University of Thessaloniki
                                              Greece




Workshop 7c, 18 October 2012               eChallenges e-2012   Copyright 2012 BlogForever
State of the Blogosphere

    • Blogs have become fairly established as an online
      communication and web publishing tool.
    • Hundreds of millions of blogs are published on every
      conceivable subject.




  [Figure: Number of new blogs created on a weekly basis in WordPress.com]




The problem of Blog Preservation

     • Despite the fast growth of the blogosphere, there is still no
       effective solution for ubiquitous semantic weblog
       archiving, digital preservation, management and
       dissemination:
             – Current web preservation initiatives are geared towards aggregating
               and preserving HTML pages rather than information entities (posts,
               comments, authors, metadata, dates, pingbacks, etc.)
             – Current web archiving efforts disregard the preservation of social
               networks and of the interrelations between archived content (the
               meme effect)
             – Current web archives are monolithic and cannot identify topics,
               subjects or events. There is no generic web archiving solution
               capable of implementing arbitrary subjects and topic hierarchies.
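To make the contrast between archiving HTML pages and archiving information entities concrete, the sketch below models a post with its author, dates, tags, comments and pingbacks as structured records, then uses the pingback links to count how many distinct blogs link to each blog, a crude stand-in for spotting the "central actors" of a blog network. The class names, fields and example hostnames are hypothetical illustrations, not BlogForever's actual data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Comment:
    author: str
    published: datetime
    text: str


@dataclass
class Post:
    blog: str            # hostname of the blog that published the post
    url: str
    title: str
    author: str
    published: datetime
    tags: list[str] = field(default_factory=list)
    comments: list[Comment] = field(default_factory=list)
    # Hostnames of blogs that linked back to this post (pingbacks).
    pingbacks: list[str] = field(default_factory=list)


def inbound_link_counts(posts: list[Post]) -> Counter:
    """Count, per blog, how many distinct other blogs link to it via pingbacks."""
    sources: dict[str, set[str]] = {}
    for post in posts:
        for source in post.pingbacks:
            if source != post.blog:  # ignore a blog linking to itself
                sources.setdefault(post.blog, set()).add(source)
    return Counter({blog: len(srcs) for blog, srcs in sources.items()})


posts = [
    Post("a.example", "https://a.example/1", "Hello", "alice", datetime(2012, 6, 1),
         comments=[Comment("carol", datetime(2012, 6, 2), "Nice post")],
         pingbacks=["b.example", "c.example"]),
    Post("b.example", "https://b.example/1", "Re: Hello", "bob", datetime(2012, 6, 2),
         pingbacks=["a.example"]),
    Post("a.example", "https://a.example/2", "Again", "alice", datetime(2012, 6, 3),
         pingbacks=["b.example", "a.example"]),
]
print(inbound_link_counts(posts).most_common())  # [('a.example', 2), ('b.example', 1)]
```

Because each comment and link is a separate field rather than a fragment of markup, questions such as "which blogs are most cited?" become simple queries instead of HTML scraping.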




The disappearing web




http://gigaom.com/2012/09/19/the-disappearing-web-information-decay-is-eating-away-our-history/


Blog archiving evaluation

     • Example: the paper “Blogs of War: Weblogs as News” documented
       29 blogs on the Iraq war. Of those 29 blogs:
        – 13 (45%) no longer existed on the Internet as of June 2012
        – only 9 (31%) still contained information on the Iraq war
        – 12 of the 20 blogs (60%) that no longer carry their Iraq war
           content were preserved by the Internet Archive (with problems
           such as missing photos and unarchived comments)

     • Blogs on major events have already been lost.
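The percentages on this slide can be checked with a few lines of arithmetic. This sketch assumes that the "20" in the last sub-bullet refers to the 29 - 9 = 20 blogs that no longer contain Iraq war content; the variable names are illustrative only.

```python
def share(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


TOTAL = 29           # Iraq war blogs documented in "Blogs of War"
GONE = 13            # no longer online as of June 2012
STILL_ON_TOPIC = 9   # still contain Iraq war content
ARCHIVED = 12        # preserved by the Internet Archive

# Assumption: the "20" on the slide is the blogs that lost their Iraq war content.
no_longer_on_topic = TOTAL - STILL_ON_TOPIC

print(share(GONE, TOTAL))                   # 45
print(share(STILL_ON_TOPIC, TOTAL))         # 31
print(share(ARCHIVED, no_longer_on_topic))  # 60
```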



BlogForever objectives




The BlogForever architecture




Impact

      • Output: a simple weblog archiving solution that any user,
        user group or institution could use to preserve their
        collections of weblogs, ensuring:
             – authenticity, integrity, completeness, usability and long-term
               accessibility
      • Parties that will benefit: Bloggers, Universities, Libraries &
        Information Centres, Museums, Education, Research,
        Business
      • Examples:
             – CERN will create a repository with all physics blogs
             – National Documentation Centre of Greece will create a repository
               with academic blogs
             – a National Library of Medicine would like to preserve a collection of
               health and medicine blogs

Business Model

     • BlogForever as a service (a single installation that users and
       institutions can use as a service)
     • BlogForever as software (open-source distribution)

     • Universities, Research Institutes, Archives, Governments and
       Blog Communities will be able to easily preserve their
       collections of weblogs
     • BlogForever will ensure the preservation, aggregation,
       management and dissemination of these collections

     • Do you need to preserve some blogs? We can set up a
       BlogForever archive for you.

Future Work

     • Analyse blog archives to gain a better understanding of
       their content and provide new services:
             – Use Linked Open Data to link archived blog content with other web
               content
             – Apply Semantic Extension of Tags to understand tags better and
               reuse them for multiple purposes
     • In all cases, use ontologies to interpret and reason with the
       information.
     • Use data mining to extract information from the
       archives and transform it into an understandable structure
       for further use.
     • Brand reputation management and market-sector reputation
       analysis

Thank you!




                                      Any Questions?

         Visit: http://blogforever.eu to learn more.
               http://twitter.com/blogforever
             http://facebook.com/BlogForever
     The research leading to these results has received funding from the European Commission's Seventh Framework Programme
                                     (FP7), BlogForever project, grant agreement No. 269963.



Speaker notes

  1. Blogger is the largest of these sites, with more than 46 million unique U.S. visitors during October 2011, making it second only to Facebook in the social networking category. tumblr.com counted 38,884,272 blogs with 53,399,798 posts on 29 December, whereas in July 2009 the number of posts per day was 650,000. Facebook and microblogging sites such as Twitter have supported the growth of blogs by delivering traffic to content that originated in blogs.
  2. 1. Current web preservation initiatives are geared towards aggregating and preserving files rather than information entities. For instance, the Internet Archive aggregates web pages and stores them in WARC files (ISO 28500:2009), compressed files similar to zip archives that are assigned a unique identification number and stored in a distributed file system. WARC also supports some metadata, such as provenance and HTTP protocol metadata. However, implicit page elements such as:
        · page title, headers, content and author information,
        · metadata such as Dublin Core elements,
        · RSS feeds and other Semantic Web technologies such as Microformats (Khare R.) and Microdata (Ronallo J.)
     are completely ignored. This greatly affects the way stored information is managed, reducing the utility of the archive and hindering the creation of value-added services.
     2. Current web archiving efforts disregard the preservation of social networks and of the interrelations between archived content. However, weblog interdependencies, demonstrated by the identification of central actors and peripheral weblogs and by the meme effect that applies to them, need to be preserved to provide meaningful features in the weblog repository.
     3. The scope of current web archives is limited to monolithic regions, subjects or events. There is no generic web archiving solution capable of implementing arbitrary subjects and topic hierarchies. For instance, the National Library of Catalonia has initiated a web crawling and access project aiming to collect, process and provide permanent access to the entire cultural, scientific and general output of Catalonia in digital format (PADICAT). Likewise, the Library of Congress has developed online collections for isolated historical events such as September 11, 2001 (Library of Congress).
     There is an ongoing debate about the benefits and disadvantages of the various long-term preservation methodologies. Many papers have been written and many conferences dedicated to this issue have appeared. It is surprising, however, how little has been done at a practical level.
  3. Mention the advantages of the archives
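Note 2 says the Internet Archive stores pages in WARC files (ISO 28500:2009). As a rough illustration of why such a container preserves pages rather than information entities, the sketch below serialises one captured HTML page as a minimal, uncompressed WARC/1.0 `resource` record using only the Python standard library. The target URL and page are hypothetical; production crawlers also handle compression, payload digests and request/response record pairs, which this sketch omits.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "text/html") -> bytes:
    """Build a minimal, uncompressed WARC/1.0 'resource' record."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",  # length of the block in bytes
    ]
    # Header lines end in CRLF, a blank line separates the headers from the
    # content block, and every record is terminated by two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


page = b"<html><head><title>Post</title></head><body>Hello</body></html>"
record = warc_resource_record("http://blog.example.org/2012/10/post.html", page)
print(record.decode("utf-8"))
```

The record captures the URL, capture date and MIME type, but the post title, author and comments remain buried inside the opaque HTML payload, which is exactly the limitation described in note 2.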