This document discusses reconstructing past versions of pages on MediaWiki by programmatically accessing historical data stored in the system. It describes how the Memento MediaWiki extension currently handles retrieving old revisions of articles and templates. Prototypes have been created for embedding past images as well. While the data exists, there is currently no way for extensions to access or render prior embedded JavaScript and CSS. The goal is to provide a uniform solution for accessing historical versions of all page resources according to the Memento protocol standard.
Reconstructing Past Wiki Pages Programmatically
1. Reconstructing the Past with MediaWiki:
Programmatic Issues and Solutions
Shawn M. Jones
sjone@cs.odu.edu
Old Dominion University
2. Reconstructing the Past with the
Internet Archive
HTML
Images
JavaScript
CSS
Our goal: Temporal Coherence
Make the page look as it looked at the time it was archived.
3. Some Results from the Internet
Archive Are Lacking
Images change between the time
the Archive crawls the main page
and the time it gets to the images
Sometimes embedded images
are missing when the Archive
gets to them
Sometimes the page is designed
with a specific browser in mind
Image from “A Framework for Evaluation of Composite Memento Temporal Coherence”
by S. Ainsworth, M. L. Nelson, H. Van de Sompel. http://arxiv.org/abs/1402.0928
10. Accessing Old Article Text
The oldid argument references a revision of a page
within MediaWiki's database
Merely visiting the URI with the oldid will give you the
text content of the page as it existed at that revision
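As a minimal sketch of the oldid mechanism described above, the snippet below builds such a URI. The host name, page title, and revision number are made-up placeholders, and the sketch assumes the wiki exposes the usual index.php entry point with title and oldid query arguments.

```python
from urllib.parse import urlencode


def oldid_uri(script_url: str, title: str, oldid: int) -> str:
    """Build a URI that asks MediaWiki for one specific revision of a page."""
    query = urlencode({"title": title, "oldid": oldid})
    return f"{script_url}?{query}"


# Hypothetical wiki, page, and revision ID, used only for illustration.
uri = oldid_uri("https://wiki.example.org/index.php", "Main_Page", 12345)
print(uri)
```

Note that this only pins the article text to a revision; templates, images, JavaScript, and CSS are resolved separately.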
12. Including the Right Template
This gives us:
$title - the Title object for the given page
$parser - the Parser object for the given page
$id - the revision ID (oldid) for the Template page
Using $parser, and $title, we can change the $id and
fetch an old revision of the Template
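The selection step behind "change the $id" can be sketched as follows: given the save time of the article revision being viewed, pick the newest template revision that already existed at that time. This is a simplified standalone model, not MediaWiki or Memento extension code; the history list, revision IDs, and the function name revision_at are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def revision_at(history, when):
    """Return the ID of the newest revision saved at or before `when`.

    `history` is a list of (timestamp, revision_id) pairs sorted oldest first.
    Returns None if no revision existed yet at `when`.
    """
    timestamps = [ts for ts, _ in history]
    index = bisect_right(timestamps, when)
    return history[index - 1][1] if index else None


# Hypothetical template history, invented for this example.
template_history = [
    (datetime(2013, 1, 10, tzinfo=timezone.utc), 101),
    (datetime(2013, 5, 2, tzinfo=timezone.utc), 245),
    (datetime(2013, 9, 30, tzinfo=timezone.utc), 388),
]
article_saved = datetime(2013, 6, 6, tzinfo=timezone.utc)
print(revision_at(template_history, article_saved))
```

Here the June 6, 2013 article revision would be paired with template revision 245, the last one saved before that date, rather than the current revision 388.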
13. Reconstructing the Past
Articles
Handled by
Memento MediaWiki Extension
Templates
Handled by
Memento MediaWiki Extension
Embedded Images
Embedded JavaScript
Embedded CSS
14. But What About Images?
This map is important to
understanding the
content of this article
This image is changed
as the article is
changed, to reflect its
content
15. It’s the same map if we look at the
June 6, 2013 revision now
Users can't view this
embedded resource as
it looked in June 2013
while reading the article
from that time period
16. What should have happened
This is the map from
June, 2013 that should
have been displayed
This is the current map
The content of the article won't match the data in this visual aid, possibly
confusing a user who wanted historical information on this topic
17. We Tried To Solve This
Upon further inspection of the code in MediaWiki, the $time argument
to this function is never used, as detailed here
18. We Just Solved This
Upon further inspection of the code in MediaWiki, the $file argument’s
getHistory() function can be used to acquire previous revisions of images
19. Reconstructing the Past
Articles
Handled by
Memento MediaWiki Extension
Templates
Handled by
Memento MediaWiki Extension
Embedded Images
Prototyped for future version of
Memento MediaWiki Extension
Embedded JavaScript
Embedded CSS
21. We Couldn’t Solve This
The data is present, but we could not find any way for an
extension to access or render it.
22. Recap on Reconstructing the Past
Articles
Handled by
Memento MediaWiki Extension
Templates
Handled by
Memento MediaWiki Extension
Embedded Images
Prototyped for future version of
Memento MediaWiki Extension
Embedded JavaScript
Requires changes to MediaWiki
Embedded CSS
Requires changes to MediaWiki
23. Uniform solution
• RFC 7089, Memento, was designed to provide
uniform access to past versions of all resources
on the Web
• Memento provides a web standard to access
these resources
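To make the protocol concrete, here is a small sketch of the two HTTP headers RFC 7089 uses for datetime negotiation: the client sends Accept-Datetime to a TimeGate, and the returned memento carries Memento-Datetime. The helper names and the sample response headers are assumptions made for illustration; no network request is performed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(when):
    """Build the request header a Memento client sends to a TimeGate."""
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def memento_datetime(response_headers):
    """Read the archival datetime of a memento, or None if it is absent."""
    value = response_headers.get("Memento-Datetime")
    return parsedate_to_datetime(value) if value else None


wanted = datetime(2013, 6, 6, tzinfo=timezone.utc)
request_headers = accept_datetime_header(wanted)
print(request_headers)

# Hypothetical response headers, standing in for a real server reply.
sample_response = {"Memento-Datetime": "Thu, 06 Jun 2013 00:00:00 GMT"}
print(memento_datetime(sample_response))
```

Because the same two headers apply to any resource, a client could in principle request the article, its images, and its stylesheets for the same datetime without resource-specific logic.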