SWORD2 and Bittorrent
1. SWORD2 & BITTORRENT
A Network Admin’s Worst Nightmare
Tim Brody, Damian Steer,
Sander van der Waal, Steve Welburn
2. WHAT IS SWORD2?
SWORD2 is a protocol for depositing stuff and its metadata
with a repository. It's implemented as a profile of the Atom
Publishing Protocol, which is roughly:
(1) Client GETs the service document from the server
(2) Client POSTs stuff for deposit and metadata to the url
mentioned in the service document
(3) Server responds with 'created this at url'
3. Client can GET url
Client can edit url content with PUT
Client can DELETE url
Atom originated in blogs, and SWORD2 essentially just
expands the metadata used.
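The first step above (finding where to POST) can be sketched in Python. This is a minimal illustration, not any particular client's code: the service document, repository URL, and collection title are made up, and only the AtomPub and Atom namespaces are taken from the published specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces used by AtomPub service documents.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
}

# Hypothetical service document, roughly as a repository might return it.
SAMPLE_SERVICE_DOC = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example Repository</atom:title>
    <collection href="http://repo.example.org/sword/collection/data">
      <atom:title>Datasets</atom:title>
    </collection>
  </workspace>
</service>
"""


def collection_urls(service_doc_xml):
    """Return {collection title: deposit url} from a service document."""
    root = ET.fromstring(service_doc_xml)
    found = {}
    for coll in root.findall("app:workspace/app:collection", NS):
        title = coll.findtext("atom:title", default="", namespaces=NS)
        found[title] = coll.get("href")
    return found


if __name__ == "__main__":
    # The client would POST its deposit to one of these urls.
    print(collection_urls(SAMPLE_SERVICE_DOC))
```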
4. THE PROBLEM WITH BIG DEPOSITS
Big deposits take ages to transfer, which makes them
susceptible to interruptions due to error, or simply boredom
('Oops, I closed my laptop...'). In itself that ought to be
recoverable since HTTP supports partial uploads using the
range header. However if you look at steps 2 and 3 above
you may see a problem:
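Client POSTs stuff for deposit and metadata
Server responds with 'created this at url'
The deposit only gets a url once the POST has completed, so an
interrupted upload has nothing to resume against.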
5. THE IDEA
Send a reference to content via SWORD, rather than the
content itself.
We could use any number of schemes then, such as ftp,
rsync or http. (HTTP will work fine this way around because
the content has an identity and could be resumed)
(Aside: it's also interesting that a repository could choose not
to download at all, for example where the data is already stored
in a national subject repository)
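To illustrate why HTTP works 'this way around': once the content has its own url, the repository can ask for just the bytes it is missing. A rough Python sketch follows; the function name and the example url are hypothetical, and it only builds the request rather than performing the download.

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a GET request that asks only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    req = urllib.request.Request(url, method="GET")
    if have:
        # Ask the server for everything from byte offset `have` onwards.
        req.add_header("Range", "bytes=%d-" % have)
    return req


if __name__ == "__main__":
    req = resume_request("http://data.example.org/big.zip", "big.zip.part")
    print(req.get_header("Range"))
```

A server that supports range requests would answer with 206 Partial Content, and the repository could append the body to the partial file.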
6. OR BITTORRENT
Unlike rsync, ftp, or http, bittorrent has many 'server'
implementations with nice GUIs, for a variety of platforms
and in a number of languages. ('server' and 'client' labels
aren't especially helpful with bittorrent)
Handles partial downloading with ease.
No packaging required: moving directories is as easy as
individual files.
7. WHAT DO YOU NEED?
A bittorrent client at the depositor's end. This is where the
files start.
A bittorrent client at the repository end. This is where the
files will appear.
A bittorrent tracker.
8. WHAT WE NOW KNOW
Bittorrent is a peer-to-peer network. The clients are all peers;
it just happens that some have all the data ('seeders'), and
some are seeking data ('leechers'). Data is identified (very
roughly) using a hash of the content.
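The 'very roughly' can be made concrete. In the standard BitTorrent metadata format the identifier is the SHA-1 of the bencoded 'info' dictionary, which itself holds the name, length, and a SHA-1 per piece of content. The Python sketch below shows this for a single file held in memory; the helper names and the tiny piece size in the usage line are illustrative only.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes/str, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def make_info(name, data, piece_length=262144):
    """Build the 'info' dictionary for a single-file torrent."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}


def info_hash(info):
    """SHA-1 of the bencoded info dictionary: the torrent's identity."""
    return hashlib.sha1(bencode(info)).digest()


if __name__ == "__main__":
    info = make_info("data.csv", b"hello", piece_length=2)
    print(info_hash(info).hex())
```

Because the name and piece size are part of the hashed dictionary, two torrents of identical bytes can still have different identifiers.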
Clients need to find each other, and to do this they use
servers called 'trackers', the URLs of which are included in
torrent files. Trackers are pretty simple: you can contact
them to say 'I am interested in X', and find other clients
interested in X.
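Saying 'I am interested in X' to an HTTP tracker is just a GET with a few query parameters. The sketch below builds such an announce url in Python; the parameter names follow the standard tracker protocol, while the tracker address, peer id, and port are placeholders.

```python
from urllib.parse import urlencode


def announce_url(tracker, info_hash, peer_id, port, left):
    """Build the url a peer would GET to announce itself to a tracker."""
    params = {
        "info_hash": info_hash,  # 20 raw bytes identifying the torrent ('X')
        "peer_id": peer_id,      # 20 bytes identifying this client
        "port": port,            # where this client listens for peers
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # 0 for a seeder, bytes still needed for a leecher
        "compact": 1,
    }
    # urlencode percent-escapes the raw bytes for us.
    return tracker + "?" + urlencode(params)


if __name__ == "__main__":
    print(announce_url("http://repo.example.org/announce",
                       b"\x12" * 20, b"-PY0001-123456789012", 6881, 0))
```

The tracker's reply (itself bencoded) lists other peers that announced the same info_hash.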
9. USING SWORD2 AND BITTORRENT
Uploader opens bittorrent client, and creates a torrent file
for a file or directory.
The tracker used may be the repository itself.
SWORD deposit is made as usual, but the content is a
torrent file.
Content will be deposited.
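A deposit whose content is a torrent file might look like the Python sketch below. The header names come from the SWORD2 profile, but the collection url, the filename, and the choice of the binary packaging identifier and the application/x-bittorrent content type are assumptions for illustration, not a description of what any repository requires. The request is built but not sent.

```python
import urllib.request


def torrent_deposit_request(collection_url, torrent_bytes, filename):
    """Build a SWORD2-style POST whose body is a small torrent file."""
    req = urllib.request.Request(collection_url, data=torrent_bytes,
                                 method="POST")
    # Assumed content type for torrent files.
    req.add_header("Content-Type", "application/x-bittorrent")
    req.add_header("Content-Disposition",
                   "attachment; filename=%s" % filename)
    # Assumed packaging: treat the torrent as an opaque binary file.
    req.add_header("Packaging", "http://purl.org/net/sword/package/Binary")
    req.add_header("In-Progress", "false")
    return req


if __name__ == "__main__":
    req = torrent_deposit_request(
        "http://repo.example.org/sword/collection/data",
        b"d4:infod4:name8:data.csvee", "data.torrent")
    print(req.get_method(), req.full_url)
```

The POST body is only as big as the torrent file, so an interruption costs little; the heavy transfer then happens over bittorrent.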
10. IMPLEMENTATION
Tim has made / is making EPrints into a bittorrent tracker.
It will spot torrents uploaded via SWORD.
Uses transmission-cli to download.
Steve is making a deposit client.
Makes a torrent file, opens it in a torrent client, and uploads
it via SWORD.
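As a rough picture of the repository side, the Python sketch below guesses whether a deposit is a torrent and builds a transmission-cli command to fetch the content. It is not the EPrints code: the detection heuristic, the function names, and the paths are hypothetical, and the only transmission-cli option relied on is -w (download directory).

```python
import subprocess


def looks_like_torrent(content_type, body):
    """Heuristic: torrent files are bencoded dicts with an 'info' key."""
    if content_type == "application/x-bittorrent":
        return True
    return body.startswith(b"d") and b"4:info" in body


def download_command(torrent_path, download_dir):
    """Command line asking transmission-cli to fetch into download_dir."""
    return ["transmission-cli", "-w", download_dir, torrent_path]


def fetch(torrent_path, download_dir):
    """Run the download; requires transmission-cli to be installed."""
    subprocess.run(download_command(torrent_path, download_dir), check=True)


if __name__ == "__main__":
    print(download_command("/tmp/deposit.torrent", "/srv/deposits"))
```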
11. INTERESTING STUFF
This really helps with the other issue of large datasets:
downloading. I hope people will typically want individual
files, but this would allow full downloads without killing the
server.
12. MORE INTERESTING STUFF
It's robust, and actually quite secure. You can't download
without the torrent file.
Can limit torrents to a particular tracker.
The tracker also provides basic usage information.