A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

Umit Cavus Buyuksahin, Maria Stylianou, Nicos Demetriou, Muhammad Adnan Khan
Universitat Politecnica de Catalunya (UPC)
E-mail: ucbuyuksahin@gmail.com, marsty5@gmail.com, nicosdem7@gmail.com, malikadnan78@gmail.com



Abstract—Over the last decades, distributed systems have been promoted for extended computations and presented as the ideal storage space for large amounts of data. Distributed storage systems have moved from a centralized architecture to a more decentralized approach. This change allows such systems to be used by volunteer computing systems, where the exploitation of any available storage and resources is essential and greatly needed. This survey explores the characteristics of scalable decentralized storage systems that can be used by volunteer computing systems and discusses the various existing systems in terms of the specified characteristics. For each surveyed system we give a brief description and state whether the required properties are ensured.

Index Terms—decentralized storage systems, volunteer computing systems

I. INTRODUCTION

Storage is one of the fundamental parts of computing [1]. Although it is slower than RAM, it offers great persistence at low cost. Thus, central storage systems were constructed with a focus on reliability, stability, and efficiency. Nowadays, however, computation is not limited to a central storage space but is executed in a global environment such as the Internet. As the Internet becomes part of this computation, it produces huge amounts of information that need to be gathered and stored. To address this challenge, distributed storage systems were introduced. In this design, the data stored by hosts becomes geographically distributed. Because of this distribution and the appearance of huge demands, new challenges arise, such as fault-tolerance, availability, security, robustness, survivability, scalability, and anonymity.

With the growth of the Internet, distributed storage systems are able to scale to larger numbers of users. This growth has raised the difficulty of having one central point for administrating the system. Therefore, it is observed in other surveys that these systems are moving from the centralized architecture to a more decentralized approach [1].

Meanwhile, supercomputers are situated among us executing big computations which require huge storage, power and computational resources, and lead to a rapid decrease of their capacity. Due to this demand, researchers turn to unused storage resources. Globally, there are many personal computers whose resources are not fully used by their owners. Volunteer computing systems aim to use this storage for enormous-sized computations by treating these machines as if they were parts of a huge supercomputer. This is a powerful way to utilize distributed resources in order to complete large-scale tasks.

Volunteer computing systems have two main bases [7]. The first one is the computational base, in which large computation tasks are split into smaller tasks that are assigned to volunteer participants' computers. The second base is called the participative base, and it deploys the large number of volunteer participants who offer their resources.

One of the well-known volunteer computing systems is SETI@home, which runs on the BOINC platform [8]. Nowadays, SETI@home works with about one million computers, which provide a processing rate of approximately 70 TeraFLOPS [8]. This resource usage could be increased further considering the potential resources available worldwide; in practice, the network is already growing rapidly on its own.

These volunteer computing systems produce huge amounts of computational data that should be stored. This data may be used for later processing or shared with other scientific organizations that may contribute to the field. However, today's volunteer computing systems use centralized storage systems [9] to distribute data to participants, and therefore suffer from the limitations of centralized storage, such as fault-tolerance, availability and scalability.

In order to overcome these limitations, new storage systems have been developed which are decentralized and can be used by volunteer computing systems efficiently. As previously mentioned, there are many kinds of decentralized storage systems. However, not all of them are suitable to be used in volunteer computing systems. In this survey we study several storage systems, discuss their characteristics and challenges, and propose the most suitable one to be used in volunteer computing systems.

The rest of the paper is organized as follows: In Section II, we present related work done by other researchers in the field. In Section III, design issues of decentralized storage systems that can be used in volunteer computing systems are examined by extracting characteristics. In Section IV we briefly overview some of the existing decentralized storage systems. Later on, in Section V we compare them regarding their characteristics and benefits and propose the most suitable one to be used in volunteer computing systems. Finally, we conclude the survey with our final remarks about the systems studied.
II. RELATED WORKS

In this section we present the different surveys related to the subject on which we focus. [3] discusses the different properties of peer-to-peer based distributed file systems. It shows the various benefits of using P2P systems, the design issues and properties. In addition, it presents the major distributed file systems, comparing the advantages and disadvantages of each one in detail. Likewise, [4] provides an insight into existing storage systems, giving a good overview of each, and describes the important characteristics they should have. In [1], a variety of distributed storage systems is covered in depth, presenting their functionalities and introducing the reader to the problems that these systems face and the solutions proposed to overcome them. A quite short but rich paper, [2], discusses the evolving area of distributed storage systems and gives a brief summary of some related systems in order to provide a broader view of the subject.

III. PRINCIPAL CHARACTERISTICS OF DECENTRALIZED STORAGE SYSTEMS

Several decentralized storage systems have been proposed over the last years. However, not all of them are suitable for volunteer computing. Specific characteristics should be examined, and we should ensure their existence in the intended storage systems, in order to meet the requirements of volunteer computing systems. Below, we analyze the most important ones, their specifications and effects.

1) Symmetry: Symmetry is a desired characteristic as much for decentralized storage systems as for volunteer computing systems. In the case of storage systems, and more precisely in pure peer-to-peer systems, symmetry exists when all peers are on the same level with equivalent functionality [3]. Similarly, in the case of volunteer computing systems, no volunteer participant has priority or receives special treatment compared to others. Also, volunteers do not need permission from an administrator to execute a task or to save data; by definition, this is done independently and automatically.

2) Availability: In volunteer computing systems, it is expected that participants cannot be forced to enter or leave the system at specific moments. Data should be reachable independently of the peers' status, their location, and the time of the request. Therefore, availability is an essential property for decentralized storage systems in order to be used in volunteer computing systems.

3) Scalability: Another important issue that has to be considered in both storage and volunteer computing systems is the system's scalability. In decentralized systems, it is mandatory that they can scale sufficiently with the number of nodes. Scalability is an essential property for these systems, in order to ensure that their functionality is preserved as the system's size increases.

4) Anonymity: In volunteer computing systems, volunteers highly desire to keep their identity secret while offering their resources. People are less willing to help when they are required to share personal information. Therefore, anonymity in volunteering can increase the number of participants, which is highly appreciated and encouraged. What is more, anonymity can be a way to prevent the denial of access for special groups of people, which is possible when personal information is shared.

5) Robustness: Both types of systems, storage and volunteer computing, are prone to failures, as machines may crash, reboot, or change location with different network characteristics and capabilities. In order to efficiently associate decentralized storage systems with volunteer ones, the former systems should be robust enough to handle these changes and repair themselves in the case of failures, in order to preserve this advantage in volunteer computing systems as well.

IV. DECENTRALIZED STORAGE SYSTEMS

In this section, we present a short summary of each storage system studied, referring to the previously explained properties.

A. FreeHaven

FreeHaven [10] first came with a solution for anonymity, whose implementation is not commonly handled by distributed storage systems. This means that it enables peers to distribute and share data anonymously by protecting the peers' identity. The other goals of FreeHaven are: (a) persistence, for determining the lifetime of documents, (b) flexibility, for changing system functions, and (c) accountability, for limiting damage to the system.

Since there is no hierarchy and all nodes are on the same level, it is a pure peer-to-peer system; it is symmetric and balanced. Although nodes do not have special capabilities, unlike in client-server systems, they do have special roles, such as the author who initially creates documents, the publisher who puts documents into the FreeHaven system, the reader who retrieves documents from the system, and the servers which provide storage. All these nodes have a pseudonym and know each other only by their pseudonyms. Thus, locating the peers is a difficult issue. In addition, tracing the routes is difficult as well, since FreeHaven uses onion routing for broadcasting the queries. The difficulty in both locating peers and tracing routes is intended to protect the user identity, that is, to supply anonymous communication. Server nodes periodically trade parts of documents, called shares, with each other. This trading gives flexibility to the system in the sense that servers can join and leave easily and without special treatment. For trading, nodes are chosen from a node list that is ordered by reputation. While a successful trade increases a node's reputation, malicious behavior decreases it [1]. In order to avoid malicious behavior and limit damage to the system, each node notifies its buddies about share movements. This buddy mechanism supplies accountability. Moreover, FreeHaven is also robust, since it can preserve a document even when a high fraction of its shares is lost.

Because of its pursuit of anonymity, persistence, flexibility and accountability, efficiency and convenience are ignored. In order to supply availability it uses a trading mechanism instead of a replication mechanism, so the system is not highly available [2]. Finally, inefficient broadcast-based communication makes FreeHaven less efficient.
B. FreeNet

FreeNet [11] is an adaptive pure peer-to-peer storage system aimed at publication, replication, and anonymity of authors and readers while data is retrieved. Like FreeHaven, the first goal of FreeNet is anonymity and privacy. However, the anonymity of FreeNet does not cover the whole network; it only covers file transactions, because FreeNet provides anonymity at the application layer instead of the transport layer. Thus, discovering source and destination is infeasible. The other goals of FreeNet are deniability, resistance, efficiency and decentralization.

The nodes in the peer-to-peer FreeNet network query a file that is represented by a location-independent key obtained from hash functions, for anonymity. Each node maintains a local store that is accessible by others for reading and writing, and a dynamic routing table that holds other peers' addresses together with their keys. Whenever a node receives a request, it first checks its local store. If the data exists, the node returns it; otherwise it forwards the request to the node with the nearest key in its routing table. Furthermore, if the request is successful, the intended data travels back along the request path. While data is retrieved, each node on the way also caches this data and inserts the new key into its own routing table. This mechanism provides transparent replication and increases connectivity in the system. In order to cope with limited storage capacity efficiently, node storage is managed with an LRU (Least Recently Used) policy, meaning that data items are sorted by the time of the most recent request, so the most recently requested data sits at the end of the queue. This mechanism does not ensure long-term survivability for less popular files.

The FreeNet protocol is packet-oriented and uses self-contained messages. Each message contains a hops-to-live limit, a depth counter and a randomly generated transaction ID, which makes the corresponding file traceable by nodes. Hops-to-live is set by the sender of the message and prevents indefinite message forwarding. The depth counter is used for setting a sufficient hops-to-live value to ensure that the request will reach its destination; it is incremented at each node. These three values are used in the insert, retrieve and request operations. In order to supply anonymity, FreeNet uses probabilistic routing that does not direct communication towards specific receivers.

Since probabilistic routing is used for providing anonymity, performance and reliability are not addressed. Like FreeHaven, FreeNet sacrifices performance in order to supply anonymous communication. However, because of its dynamic storage and routing, the FreeNet network is highly scalable [3]. Moreover, it is robust against big failures.
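As an illustration of this mechanism, the following minimal Python sketch (our own, with hypothetical names such as FreenetNode; real FreeNet differs in many details) shows hash-derived keys, nearest-key forwarding bounded by hops-to-live, LRU management of the local store, and caching on the return path:

    import hashlib
    from collections import OrderedDict

    def file_key(name: str) -> int:
        # Location-independent key obtained by hashing the file name.
        return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")

    class FreenetNode:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.store = OrderedDict()   # local datastore kept in LRU order
            self.routing = {}            # key -> neighbour believed to hold it

        def cache(self, key, data):
            # Insert or refresh recency; evict the least recently used entry.
            self.store[key] = data
            self.store.move_to_end(key)
            while len(self.store) > self.capacity:
                self.store.popitem(last=False)

        def request(self, key, hops_to_live):
            if key in self.store:                     # hit: refresh recency
                self.store.move_to_end(key)
                return self.store[key]
            if hops_to_live == 0 or not self.routing:
                return None
            # Forward to the neighbour whose advertised key is nearest.
            nearest = min(self.routing, key=lambda k: abs(k - key))
            neighbour = self.routing[nearest]
            data = neighbour.request(key, hops_to_live - 1)
            if data is not None:
                self.cache(key, data)                 # transparent replication
                self.routing[key] = neighbour         # learn a route for this key
            return data

    # Two nodes: b holds the file, a learns it through a request.
    a, b = FreenetNode(), FreenetNode()
    k = file_key("report.pdf")
    b.cache(k, b"...bytes...")
    a.routing[file_key("other")] = b
    assert a.request(k, hops_to_live=3) == b"...bytes..."
    assert k in a.store                               # cached on the way back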
C. Ivy

Ivy [12] is another peer-to-peer storage system, with a file-system-like interface. There is no centralized or dedicated component, so each user is on the same level. Although many other peer-to-peer storage systems support either read or write operations for a single owner, Ivy supports both read and write operations. However, the number of users that can use Ivy is limited; it is designed to be utilized by small groups of cooperative users.

All peers are identical and are able to work either as a client or as a server. Because of its symmetric architecture, it is called a pure peer-to-peer system. Each node has two main components: Chord/DHash for reliable P2P distributed storage, and the Ivy server for transferring data between peers. The architecture is log-based: each peer has its own log that includes user information and changes to the file system. For each NFS operation, a log record is created and stored in Chord/DHash. Since log records are immutable and kept indefinitely, peers can revert any changes. This flexibility is one of the best properties of Ivy. All users can read any log, subject to file permission attributes.

When a file system is created, a set of logs is created and a group of peers is set upon these logs. An entry pointing to each participant's log is put into a view array. This array is traversed by all peers in order to create a snapshot. The logs are ordered in the array and peers use them as records. Several users may therefore use the logs concurrently, which can cause conflicts, since Ivy permits concurrent write operations. For this purpose, Ivy uses close-to-open consistency within a group of peers. Under this consistency model, the Ivy server waits until DHash has received the new log records before committing a modify operation; then the modification is announced. For each NFS operation, peers fetch the latest view array from DHash. Then peers detect concurrent view vectors that affect the same file by traversing the logs. In any conflict condition, the differences are analyzed and merged. For file modification an optimistic approach is used, whereas for file creation a locking approach is used. Thus, when the number of users increases, performance decreases. Because of this limited scalability [1], Ivy is suited to a small group of users.

Every user stores a log of their modifications and, at a specified time interval, generates a snapshot, a process which requires retrieving the logs of all participating users. Although retrieving the logs of all peers causes a performance bottleneck, peers can freely change the file system regardless of other peers' state. The immutable and indefinitely stored logs can be used for reverting changes, but this operation is highly costly. As a result, Ivy distributes its storage, but it only supports a limited write-once/read-many interface [1].
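The log-based design can be illustrated with a small sketch. The following Python fragment is ours: it orders records by a simple (sequence, peer) rule, whereas Ivy compares view vectors, but it shows per-peer append-only logs and a snapshot built by scanning every participant's log:

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class LogRecord:
        # One immutable record per file-system operation, as in Ivy's per-peer logs.
        seq: int        # position in the owner's log
        peer: str       # owning peer (breaks ties deterministically)
        path: str
        data: str

    @dataclass
    class Peer:
        name: str
        log: list = field(default_factory=list)  # append-only

        def write(self, path, data):
            self.log.append(LogRecord(len(self.log), self.name, path, data))

    def snapshot(peers):
        # Build a file-system snapshot by scanning every participant's log,
        # which is why snapshot construction must touch all peers.
        records = sorted((r for p in peers for r in p.log),
                         key=lambda r: (r.seq, r.peer))
        fs = {}
        for r in records:           # later records win; real Ivy merges
            fs[r.path] = r.data     # concurrent view vectors instead
        return fs

    alice, bob = Peer("alice"), Peer("bob")
    alice.write("/readme", "v1")
    bob.write("/readme", "v2")
    print(snapshot([alice, bob]))   # {'/readme': 'v2'}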
D. Frangipani

Frangipani [13] is a high-performance distributed storage system that is utilized by a cooperative group of users. It is not a pure peer-to-peer system, since there is an administrator; however, the design aims to minimize the administrator's operations, which means Frangipani remains simple to manage while many nodes are joining [1]. Moreover, it is designed to be used within an institution that has a secure and private network, so it is not very scalable. Nevertheless, it provides good performance to users, since it stripes data between servers, increasing performance with the number of active servers. Frangipani can also be configured to replicate data [1]. Therefore, it offers redundancy and resilience to failures, which is a crucial property for volunteer computing systems.
Frangipani has three main components. The first one is the Petal server, which provides a virtual disk interface to distributed storage. It looks like local storage, so it presents a transparent interface to users, since the distributed storage is hidden. The second component is the distributed locking service. It supports consistency following a multiple-readers/single-writer locking philosophy, with two types of locks, read and write. When there are multiple changes to a file, this service serializes them to keep consistency by using these locks. Since Frangipani keeps every file in a consistent state through this locking mechanism, its performance degrades noticeably. The third component is the Frangipani file server module, which provides a file-system-like interface. It communicates with the other components to remain in a consistent state with a determined block capacity. Moreover, the Frangipani file server deploys write-ahead redo logging of metadata for recovery. When an error is detected in the file server, the logged data written to a special area in the Petal server is used for recovery. Together with the replication mechanism, this makes Frangipani more robust.

As a result, Frangipani is a distributed file system that can scale in terms of size and performance. However, network capacity is a barrier to its performance because of its design. One of the biggest design problems in Frangipani is that it assumes a secure interconnect in order to scale and operate within an institution [1]. Because of this, it suffers not only in performance but also in scalability. Besides, it assumes that all nodes in the system are trusted, and thus it cannot supply a secure system. Furthermore, the locking mechanism used to keep the system consistent can cause a dramatic performance drop.
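The locking discipline itself is simple to state. Below is a minimal single-process Python sketch of a multiple-readers/single-writer lock (ours; Frangipani's actual service hands out such locks as distributed leases over the network):

    import threading

    class ReadWriteLock:
        # Many concurrent readers, or exactly one writer, never both.
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0
            self._writer = False

        def acquire_read(self):
            with self._cond:
                while self._writer:          # readers wait for the writer
                    self._cond.wait()
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

        def acquire_write(self):
            with self._cond:
                while self._writer or self._readers:   # exclusive access
                    self._cond.wait()
                self._writer = True

        def release_write(self):
            with self._cond:
                self._writer = False
                self._cond.notify_all()

Serializing all updates to a file through acquire_write is what gives the consistency described above, and also why heavy write sharing degrades performance.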
E. Ceph

Ceph [16] is a distributed file system that provides excellent performance, reliability and scalability, and separates data and metadata to the maximum extent. It leverages the intelligence in Object Storage Devices (OSDs) to distribute the complexity surrounding data access, and utilizes a highly adaptive distributed metadata cluster architecture, improving scalability and reliability.

Ceph eliminates file allocation tables and lists and replaces them with generating functions. It comprises clients, clusters of OSDs (which store all data and metadata), and metadata server (MDS) clusters (which manage the namespace: files and directories). File data is striped onto predictably named objects using a special-purpose data distribution function, CRUSH (Controlled Replication Under Scalable Hashing), which assigns objects to storage devices. A novel metadata cluster architecture distributes responsibility for managing the file system directory hierarchy. Clients run on each host executing application code and expose a file system interface to applications. The client code runs entirely in user space and can be accessed either by linking to it directly or as a mounted file system. CRUSH maps data onto a sequence of objects. If one or more clients open a file for read access, an MDS grants them the capability to read and cache file content. The Ceph synchronization model retains its simplicity by providing correct read-write and shared-write semantics between clients via synchronous I/O, and by extending the application interface to relax consistency for performance-conscious distributed applications. File and directory metadata in Ceph is very small, consisting almost entirely of directory entries (file names) and inodes (80 bytes), since, in contrast with conventional file systems, no file allocation metadata is necessary. Object names are constructed using the inode number and distributed to OSDs using CRUSH. In order for Ceph to distribute large amounts of data, a strategy is adopted that distributes new data randomly, migrates a random subsample of existing data to new devices, and uniformly redistributes data from removed devices. To maintain system availability and ensure data safety in a scalable fashion, RADOS (Reliable Autonomic Distributed Object Store) manages its own replication of data using a variant of primary-copy replication. By acknowledging updates only once they are safe, RADOS allows Ceph to realize low-latency updates for efficient application synchronization together with well-defined data safety semantics. For certain failures, such as disk errors or corrupted data, OSDs can self-report. Failures that make an OSD unreachable on the network, however, require active monitoring, which RADOS distributes by having each OSD monitor those peers with which it shares Placement Groups. To facilitate fast recovery, OSDs maintain a version number for each object and a log of recent changes (names and versions of updated or deleted objects) for each Placement Group. A Ceph OSD manages its local object storage with EBOFS, an Extent and B-tree based Object File System.

By shedding design assumptions such as allocation lists, Ceph separates data completely from metadata management, allowing the two to scale independently. RADOS leverages intelligent OSDs to manage data replication, failure detection and recovery, low-level disk allocation, scheduling, and data migration without burdening any central server. Finally, Ceph's metadata management architecture provides a single uniform directory hierarchy that obeys POSIX semantics, with performance that scales as new metadata servers join the system.
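The effect of CRUSH, deterministic and table-free placement computed purely from the object name and the device list, can be imitated with rendezvous (highest-random-weight) hashing. The sketch below is ours, not the CRUSH algorithm, and the object name shown is hypothetical:

    import hashlib

    def _score(obj, osd):
        # Deterministic pseudo-random weight for an (object, device) pair.
        return int.from_bytes(hashlib.sha256(f"{obj}/{osd}".encode()).digest()[:8], "big")

    def place(obj, osds, replicas=3):
        # Rank devices by their score for this object; take the top R.
        # Every client computes the same answer with no allocation table.
        return sorted(osds, key=lambda o: _score(obj, o), reverse=True)[:replicas]

    osds = [f"osd{i}" for i in range(8)]
    name = "10000000001.00000002"   # hypothetical <inode>.<stripe> object name
    print(place(name, osds))                 # e.g. ['osd3', 'osd7', 'osd0']
    print(place(name, osds + ["osd8"]))      # adding a device moves few objects

Like CRUSH, this kind of function lets any party locate an object's replicas by computation alone, and adding or removing a device reshuffles only a small fraction of the objects.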
F. TFS

TFS [17] provides background tasks with large amounts of unreliable storage without impacting the performance of standard file access operations. It allows a peer-to-peer storage system to provide more storage and double its performance, and it has an impact on replication in peer-to-peer storage systems. The problem with contributory storage systems is that application performance degrades: as more storage is activated, file system operations quickly slow down. This is why TFS aims for transparency, meaning no burden on system performance while contributory processes are running. Another problem is that disks are often half empty, yet users are not keen to contribute their free space. TFS is a system that contributes all of the idle space while keeping a very low load on the performance of the local user's system. It stores files in the file system's free space and minimizes interference with the file system's block allocation policy. Normal files can overwrite the contributed files at any time. In addition, there is no impact on the bandwidth needed for replication. TFS is useful for replicated storage systems executing on stable machines with plenty of bandwidth (an environment similar to the one assumed by FARSITE). In a stable network, TFS can offer substantially more storage than dynamic approaches. A small contribution of storage has little impact on the file system's performance, and so TFS ensures the transparency of contributed data. In exchange, it sacrifices file persistence: it achieves good file system performance by minimizing the amount of work needed when writing ordinary files. It records which blocks have been overwritten by marking them as such. If an application tries to open an overwritten file, the system returns an error, the inode/directory entry for that file is deleted, and the space is marked as free. Every time a contributed file is lost this way, TFS detects it, returns an error to the peers, and the file is re-replicated elsewhere.

TFS leaves the allocation of local files intact, avoiding fragmentation issues; it stores files in such a way that they are completely transparent to local access. TFS consistently provides at least as much storage as alternatives without overloading local performance; it can provide about 40 percent more storage than the best user-space technique when the network is quite stable and enough bandwidth is available. This may raise questions concerning availability, but availability in TFS primarily depends on distributed system characteristics such as machine availability, bandwidth, and the amount of storage available.
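A toy block map conveys the transparency idea: contributed blocks are second-class, local writes may silently claim them, and opening a contributed file whose blocks were claimed fails and forgets the entry. The following Python sketch is ours and greatly simplified:

    FREE, LOCAL, CONTRIB = "free", "local", "contrib"

    class TfsDisk:
        def __init__(self, nblocks):
            self.state = [FREE] * nblocks
            self.files = {}            # contributed file -> its block numbers

        def write_contrib(self, name, blocks):
            for b in blocks:
                self.state[b] = CONTRIB
            self.files[name] = blocks

        def write_local(self, block):
            # The local allocation policy is untouched: any non-local block
            # is usable, so a contribution may be silently overwritten.
            self.state[block] = LOCAL

        def open_contrib(self, name):
            blocks = self.files[name]
            if any(self.state[b] != CONTRIB for b in blocks):
                # A local file overwrote part of it: drop the entry and
                # report an error so peers can re-replicate elsewhere.
                del self.files[name]
                raise IOError(f"{name}: contributed data was overwritten")
            return blocks

    disk = TfsDisk(8)
    disk.write_contrib("chunk42", [2, 3])
    disk.write_local(3)                      # local activity always wins
    # disk.open_contrib("chunk42") -> IOError, and chunk42 is forgotten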
G. OceanStore

OceanStore [6] is a global storage infrastructure which automatically recovers from server and network failures, easily incorporates new resources into the system, and adjusts to usage patterns. It combines erasure codes with a Byzantine agreement protocol for consistent update serialization, even when malicious servers are present.

OceanStore consists of individual servers, each cooperating to provide a service. Such a group of servers is called a pool. Data flows freely between these pools, creating replicas of a data object anywhere and thus increasing availability. Because OceanStore is composed of untrusted servers, it utilizes redundancy and client-side cryptographic techniques to protect data. OceanStore attacks the problem of storage-level maintenance with four mechanisms: a self-organizing routing infrastructure, m-of-n data coding with repair, Byzantine update commitment, and introspective replica management. Erasure coding transforms a block of input data into fragments, which are spread over many servers; only a fraction of the fragments is needed to reconstruct the original block. A replica of an object must be exactly the same as the original, despite any failures or corruption of fragments. OceanStore achieves this by naming each object and its associated fragments with the result of a secure hash function over the contents of the object, called a globally unique identifier (GUID). A node can act as a server that stores objects, as a client that initiates requests, as a router that forwards messages, or as all of these. A location- and semantics-independent unique identifier, the NodeID, is assigned to each node, and Tapestry (a self-organizing routing and object location subsystem) uses local neighbor maps to route messages to their destination NodeID, digit by digit. When an OceanStore server inserts a replica into the system, Tapestry publishes its location by putting a pointer to the replica's location at each hop between the new replica and the object's root node. In order to locate an object, a client routes a request towards the object's root until it encounters a replica pointer, which routes directly to that replica.

When a node wants to join, it chooses a random NodeID and a nearby existing node. By routing from this node towards its NodeID, it finds other existing nodes that share suffixes of increasing length, generates a full routing table, and notifies all its neighbors. When a node disappears, its neighbors detect the absence and use backpointers to inform the nodes relying on it. In addition, a server can be removed from OceanStore when it becomes obsolete, needs scheduled maintenance, or has component failures; a shutdown script is executed to inform the system of the server's removal. Even if this script is not used, OceanStore will detect and correct the server's absence. OceanStore's design provides scalability, fault tolerance, and self-maintaining distributed storage through adaptation.
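Two of these mechanisms are easy to make concrete: self-certifying GUIDs and m-of-n coding. The sketch below is ours; it uses a single XOR parity fragment, i.e. an m-of-(m+1) code, which is far weaker than OceanStore's actual codes but shows how a lost fragment is rebuilt and the result verified against the GUID:

    import hashlib
    from functools import reduce

    def guid(data):
        # Self-certifying name: a secure hash of the content, so any
        # reconstructed block can be verified against the name itself.
        return hashlib.sha256(data).hexdigest()

    def encode(block, m=4):
        # m data fragments plus one XOR parity fragment; any m of the
        # m+1 fragments suffice to reconstruct the block.
        block = block.ljust(-(-len(block) // m) * m, b"\0")   # pad
        size = len(block) // m
        frags = [block[i * size:(i + 1) * size] for i in range(m)]
        parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
        return frags + [parity]

    def decode(frags, missing, m=4):
        # Rebuild one lost fragment by XORing the surviving fragments.
        others = [f for i, f in enumerate(frags) if i != missing and f is not None]
        rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*others))
        data = list(frags[:m])
        if missing < m:
            data[missing] = rebuilt
        return b"".join(data)

    block = b"oceanstore example payload"
    name = guid(block)
    frags = encode(block)
    frags[2] = None                                  # lose one fragment
    restored = decode(frags, missing=2)
    assert guid(restored.rstrip(b"\0")) == name      # verified against the GUID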
H. Antiquity

Antiquity [14] provides storage services for file systems and backup applications. It is a wide-area distributed storage system whose design assumes that all servers will eventually fail, and it tries to preserve data integrity even under these failures. Antiquity was developed in the context of OceanStore.

In its model, the client can be an end-user machine, the server in a client-server system, or a replicated service. The system identifies the client and its append-only log by a cryptographic key pair. A log is stored in chunks, and when a new chunk needs to be allocated the administrator is consulted, who authenticates the client and selects a set of storage servers that can host the new chunk. In order to maintain data security, high availability and, most of all, stored data integrity, Antiquity uses a secure log which is replicated on multiple servers. This ensures durability in the sense that no data is lost and all logs can be read. In the case that some logs are not modifiable, due to the failure of some servers or a lack of replicas, a quorum repair protocol replaces lost replicas and eventually restores modifiability. In addition, Antiquity uses dynamic Byzantine fault-tolerant quorums (thresholds) to provide consistency among replicas. Since the data is replicated on multiple servers, it can be retrieved later even after server failures. What is more, Antiquity uses distributed hash tables to connect the storage servers and to monitor their liveness and availability; the DHT stores only pointers that identify the servers on which the actual data is stored.

Antiquity's design pursues integrity, incremental secure writes with random read access, durability, consistency, and efficiency with low overhead. The results from a simulation showed that in almost all checks performed, a quorum of servers was reachable and in a consistent state, thus providing a high degree of availability and consistency. The quorum repair process balances availability and consistency even further.
Concerning scalability, since each log uses a single administrator but multiple administrator instances are allowed, the administrator role scales well: different logs can simply use different administrators.
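The integrity role of the secure log can be sketched with a hash chain, where each append commits to the entire prefix. The Python fragment below is ours and omits the signatures, chunking and replication that Antiquity adds on top:

    import hashlib

    class SecureLog:
        # Hash-chained append-only log: tampering with any earlier entry
        # breaks verification of everything after it.
        def __init__(self):
            self.entries = []                  # (data, head_after_append)
            self.head = b"\0" * 32             # digest of the empty log

        def append(self, data):
            self.head = hashlib.sha256(self.head + data).digest()
            self.entries.append((data, self.head))
            return self.head                   # certified position of the log

        def verify(self):
            h = b"\0" * 32
            for data, head in self.entries:
                h = hashlib.sha256(h + data).digest()
                if h != head:
                    return False
            return h == self.head

    log = SecureLog()
    for rec in (b"create /a", b"write /a v1", b"write /a v2"):
        log.append(rec)
    assert log.verify()
    log.entries[1] = (b"write /a EVIL", log.entries[1][1])   # tamper
    assert not log.verify()                                  # detected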
I. BigTable

BigTable [18] is a large-scale distributed storage system for managing structured data. It is built on top of several existing Google technologies, such as the Google File System, Chubby, and Sawzall, and is used by many of Google's online services. The contributors have flexibility, high performance, and availability as their primary goals.

Essentially, BigTable is a "sparse, distributed, persistent multi-dimensional sorted map" that indexes each (row, column, timestamp) tuple to an array of bytes [19]. Data in BigTable is maintained in tables that are partitioned into row ranges called tablets. Tablets are the units of data distribution and load balancing in BigTable. BigTable consists of three major components: a library that is linked into every client, one master server, and many tablet servers, each managing some number of tablets. Different versions of data are ordered by timestamp. BigTable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key.

Overall, BigTable is tremendously scalable, offering data availability and high performance to its users. However, it does not deal with issues like security among the nodes, or fault-tolerance.
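The data model is compact enough to sketch directly. The following toy Python map is ours; it indexes (row, column, timestamp) tuples, returns the newest version at or before a requested timestamp, and mimics a single-row read-modify-write (the row and column names echo the well-known example from the BigTable paper):

    import bisect

    class TinyBigtable:
        # Sparse sorted map: (row, column) -> versions sorted newest first.
        def __init__(self):
            self.cells = {}

        def put(self, row, column, ts, value):
            bisect.insort(self.cells.setdefault((row, column), []), (-ts, value))

        def get(self, row, column, ts=None):
            # Newest version at or before ts (or the newest overall).
            for neg_ts, value in self.cells.get((row, column), []):
                if ts is None or -neg_ts <= ts:
                    return value
            return None

        def read_modify_write(self, row, column, ts, fn):
            # Atomic within a single row key, as BigTable guarantees; here
            # a plain call stands in for the tablet server's row lock.
            self.put(row, column, ts, fn(self.get(row, column)))

    t = TinyBigtable()
    t.put("com.cnn.www", "contents:", 5, b"<html>v1")
    t.put("com.cnn.www", "contents:", 9, b"<html>v2")
    print(t.get("com.cnn.www", "contents:"))        # b'<html>v2'
    print(t.get("com.cnn.www", "contents:", ts=6))  # b'<html>v1'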
J. Dynamo

Dynamo is a key-value storage system that provides a mapping from keys to values. It is developed and managed by Amazon, which makes it a proprietary database [21]. However, its design has been made available to the research community and has inspired systems such as Cassandra. High availability and scalability are the main design goals of Dynamo. It offers incremental scalability, meaning that the system can grow one node at a time. Moreover, there is no central administrator and all nodes are on the same level.

Dynamo is a combination of distributed hash tables (DHTs) and databases [20]. Keys created by hashing the data are stored in a circular ring structure: each key is assigned to the nearest node in the clockwise direction. Moreover, there are virtual nodes, each of which mimics a physical node but can be responsible for more than one position on the ring. This mechanism provides incremental scalability by solving the partitioning problem. Dynamo has an effective replication mechanism in order to increase the availability of data in the system: each data item is replicated to a specified number of successors, so each node holds replicated data of its predecessors. In addition, the system may hold more than one version of an item to increase availability. Since this can cause inconsistency, vector clocks are used to determine the causal relationship between different versions. These properties increase Dynamo's durability as well as its availability. Besides, the "always writable" property is targeted by Dynamo, which is the second reason for using vector clocks. When a user wants to perform a write operation, the coordinator responsible for the operation first sends the vector clock to the reachable nodes selected from a preference list; the write completes according to the number of responses received. In other words, this mechanism is based on quorums. Lastly, if a node does not respond, it is assumed to have failed; when it is removed from the ring, all surrounding nodes adjust to the new state.

Dynamo is targeted at solving the main problems of database management, such as scalability, availability, reliability and performance. While it offers a highly available and scalable system, it keeps performance high by handling failures. However, anonymity is not a goal of Dynamo.
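A minimal sketch makes the partitioning and replication scheme concrete. The Python fragment below is ours: it hashes virtual nodes onto a ring, assigns a key to its clockwise successor, and builds a preference list of distinct physical nodes:

    import bisect, hashlib

    def h(key):
        # Position on the ring: a hash truncated to 32 bits.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    class Ring:
        def __init__(self, nodes, vnodes=8, replicas=3):
            self.replicas = replicas
            # Each physical node owns several virtual positions on the ring.
            self.points = sorted((h(f"{n}#{v}"), n)
                                 for n in nodes for v in range(vnodes))

        def preference_list(self, key):
            # First clockwise successor coordinates the key; the next
            # distinct physical nodes hold the replicas.
            i = bisect.bisect(self.points, (h(key), chr(0x10FFFF)))
            owners, seen = [], set()
            for j in range(len(self.points)):
                node = self.points[(i + j) % len(self.points)][1]
                if node not in seen:
                    seen.add(node)
                    owners.append(node)
                if len(owners) == self.replicas:
                    break
            return owners

    ring = Ring(["A", "B", "C", "D"])
    print(ring.preference_list("user:1234"))   # e.g. ['C', 'A', 'D']

Because positions are deterministic, adding a node claims only the ring segments adjacent to its virtual positions, which is exactly the incremental scalability described above.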
K. MongoDB

MongoDB [22] is a scalable, high-performance, open-source, document-oriented structured storage system. It provides document-oriented storage with full index support, auto-sharding, sophisticated replication, and compatibility with the Map/Reduce paradigm.

Instead of storing data in tables and rows, as is done in relational databases, MongoDB stores data with dynamic schemas. The goal of MongoDB is to bridge the gap between key-value stores and relational databases. MongoDB has two separate constructs for multi-node topologies, which are often combined in the highest-performance systems: replica sets and sharded replica sets. Replica sets are an asynchronous cluster replication technology, and sharding is an automatic data distribution system. Increasing the number of instances in a replica set provides horizontal scalability for read performance and fault-tolerance. Increasing the number of shards (each one being a replica set) distributes distinct data, providing horizontal scalability for write performance.

MongoDB has features similar to relational databases, like indexes and dynamic queries. It accomplishes availability by supporting asynchronous replication of data between servers, and it also features a backup and repair mechanism using journaling, which increases durability and robustness. Changing the data model from relational to document-oriented provides greater agility through flexible schemas and easier horizontal scaling.

L. Riak

Riak [23] is a key-value storage system that is inspired by Dynamo. Like Dynamo, it is distributed, highly available and scalable. It uses a map-reduce mechanism to reduce the functional limitations of the key-value model and to increase the power of querying over data stored in Riak. Riak provides a fault-tolerant service to its users, and this property increases its robustness.

Since it is inspired by Amazon's Dynamo storage system, analyzed above, Riak has many similarities with it. It combines database storage and distributed hash tables (DHTs). Like Dynamo, it uses consistent hashing to map keys onto its ring, and all nodes on this ring are identical. Whenever a node joins the network, it is assigned a set of key-range partitions, which are then replicated to achieve a more available system. Like Dynamo, write and read operations are done based on quorums. Concurrent operation requests are not handled with locks, because of performance issues. Instead of a lock mechanism, vector clocks are used in order to make the system resilient to failures and keep it consistent. Another strong point of Riak is the use of map-reduce in querying: request messages are directed to a set of nodes instead of being propagated over all nodes.

Riak has a symmetric structure at the node level, since it does not have any super or master node. Moreover, it meets several design goals of the intended decentralized storage systems, such as high availability, scalability and robustness. However, anonymity is not handled in this design, and since it is a relatively new system it has many compatibility problems.
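The vector-clock bookkeeping shared by Dynamo and Riak reduces to a few lines. In the sketch below (ours), a version that dominates another element-wise supersedes it, while mutually non-dominating versions are reported as a conflict to be reconciled:

    def merge(a, b):
        # Element-wise maximum: the clock of a write superseding both.
        return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

    def descends(a, b):
        # True if version a causally descends from (or equals) version b.
        return all(a.get(k, 0) >= v for k, v in b.items())

    def compare(a, b):
        if descends(a, b) and descends(b, a):
            return "equal"
        if descends(a, b):
            return "a-newer"        # b is obsolete and can be discarded
        if descends(b, a):
            return "b-newer"
        return "conflict"           # siblings: keep both for reconciliation

    v1 = {"nodeA": 1}                     # written through coordinator A
    v2 = merge(v1, {}); v2["nodeA"] = 2   # a later write through A
    v3 = merge(v1, {}); v3["nodeB"] = 1   # a concurrent write through B
    print(compare(v2, v1))  # a-newer
    print(compare(v2, v3))  # conflict -> reconciliation needed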
M. Pastis

Pastis [24] is a completely decentralized P2P file system with multiple users performing read and write operations. It uses Past, a highly scalable P2P storage service which provides a distributed hash table abstraction, and combines Past with Pastry, a P2P key-based routing algorithm, to route messages between large numbers of Past nodes.

For every file, Pastis keeps an inode in which the file's metadata is stored. Each inode is stored in a User Certificate Block (UCB), and file contents are stored in Content Hash Blocks (CHBs). When a user writes to a file, the version counter is increased and saved to the corresponding inode along with the user's id. To avoid conflicts, if a second user appears and tries to write to the same file, a procedure is triggered to resolve the conflict by comparing the counters and users' ids from other replicas in the network.

The combination of Past and Pastry characterizes Pastis as a highly scalable system in terms of network size and number of concurrent clients. Good locality helps in acquiring optimized routes, while self-organization as well as fault tolerance are achieved thanks to the design. Data is replicated among the nodes, and the system is therefore characterized by high data availability. Write access control and data integrity are implemented, and Pastis is considered secure under the assumption that users trust each other.
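Under the stated scheme, conflict resolution amounts to comparing version stamps across replicas. The sketch below is ours and assumes a simple highest-(counter, id)-wins rule; the exact Pastis procedure may differ:

    from dataclasses import dataclass

    @dataclass
    class Inode:
        # Pastis-style version stamp: a counter plus the writer's id.
        version: int
        user_id: str

    def newest(replicas):
        # Highest counter wins; the user id is only a deterministic
        # tie-breaker for writes with equal counters.
        return max(replicas, key=lambda r: (r.version, r.user_id))

    replicas = [Inode(3, "carol"), Inode(4, "alice"), Inode(4, "bob")]
    print(newest(replicas))   # Inode(version=4, user_id='bob')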
N. TotalRecall

TotalRecall [25] is a P2P storage system that gives high consideration to an important property of storage systems: availability. The system administrator can specify an availability target and, by studying the previous behavior of the peers, the system can predict their future availability despite the dynamically changing nature of the environment. Depending on the condition of the system, TotalRecall may use replication, erasure coding, or hybrid techniques for preserving its redundancy, while it can dynamically repair itself using eager or lazy repair.

Apart from the peers, the TotalRecall system consists of the master host, the storage host and the client host. This description allows us to say that it follows a loose P2P scheme, since not everybody has the role of a peer. However, it is described as scalable despite the number of hosts that continually join and leave the system. Because of the replication technique, data is persistent and available, and high consistency is also observed in the system.

TotalRecall could be used for volunteer computing in the case where lazy repair is chosen together with erasure coding. With these options, TotalRecall performs better in dynamic environments with a high possibility of unavailability.

O. Farsite

The Farsite system [26] is a serverless, distributed storage system that runs on a set of machines and takes advantage of their unused storage and network resources. Although it provides the semantics of a central NTFS file server, it is able to scale and run on several machines using a portion of their storage.

Users have access to private and public files through a location-transparent environment. Data replicas are encrypted to provide security, since the nodes themselves are not secure. Moreover, these replicas are distributed among several nodes to provide a reliable system despite the unreliability and frequent unavailability of the nodes. The file structure is based on a hierarchy, maintained by a distributed directory service.

Atomicity and scalability are two important properties of the Farsite system. All tasks are designed as fully atomic actions in order to remain undivided while they are executed. Farsite could be used for volunteer computing, since the management operations can be distributed among the machines and security is provided by the encryption algorithm used. Though, it could only be used for small volunteer computing systems, since it can scale up to a certain number of nodes.

P. Storage@home

Storage@home [27] is a distributed storage infrastructure designed to store huge amounts of data across many machines which join the system as volunteers. It is based on Folding@home and made its appearance to face the problems of that previous system. More precisely, the contributors address the problems of backing up and distributing data efficiently among the nodes, keeping in mind the limited bandwidth and the small donation of storage from each node.

Storage@home consists of the volunteers, who have an agent installed on their machines, a registration server, a metadata server, an identity server and a policy engine. The Metadata Server is responsible for storing information about the location of the files stored in the system and for allowing queries for those files. The Identity Server is responsible for the security and identity functionality, as well as for effectively tracking the location of IP hosts, whether they are mobile or dynamic. The Registration Server is responsible for linking the users' profiles from the old system, Folding@home, with this new proposed system. This task is hard to implement, since a beneficial aspect of Storage@home is its anonymity and the intentional omission of user information. The Policy Engine behaves as the master of the system, coordinating all of its components.
A SURVEY ON LARGE-SCALE DECENTRALIZED STORAGE SYSTEMS TO BE USED BY VOLUNTEER COMPUTING SYSTEMS                                     8



to plan where to put replicas of data in order to minimize          in order to do a task. Thus, this system does not provide a
the chances of data loss, how data can be retrieved and how         symmetric node network and it is not proper for volunteer
to be transferred to reach the node that has sent a query. It       computing systems. Also, In MongoDB there are three kind of
also remains vigilant to perform repair operations when it is       nodes: Standard, Passive, and Arbiter. Similarly to MongoDB,
needed.                                                             BigTable has master nodes and many chunk servers. Also,
   Storage@home has vital requirements that help it preserve its nature as a storage system as well as a volunteer computing system. As a storage system, it should handle failure and recovery operations effectively, and as a volunteer computing system it should manage the relocation of data stored on hosts that disappear. While maintaining the above requirements, the authors had to face several challenges regarding volunteer recruiting and motivation, policy risk and host relocation. With respect to recruiting volunteers and keeping them motivated, the system adopted a reward scheme that offers points to volunteers, placing them in a friendly competition that keeps participation fun. Regarding the policy risk, it was quite common for Storage@home to get blocked by companies, ISPs and new policies; therefore, storing replicas in different nations, states and ISPs appeared to be a fair solution. Last but not least, host relocation was another great challenge that needed to be considered. The system had to deal with hundreds of students who were changing residence - most of the time decreasing their bandwidth - and becoming slower and less effective. Also, switching machines off for long periods, for traveling or maintenance purposes, cost the system, and consequently a penalization policy was introduced to make the volunteers more responsible about informing the system of any changes in their condition. In general, this system appears to be reliable, as it manages to prevent the loss of data. It is able to work with thousands of volunteers, showing great scalability, and its functionality is preserved in the presence of churn. Internet connections appear to be the bottleneck of system performance, showing that any other possible pitfalls of the system are not significant, as they cannot outweigh the bandwidth problem.
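The replica-placement answer to the policy risk can be sketched as a greedy diversity rule (an illustration under assumed labels, not the project's published algorithm): never let two replicas share a nation or an ISP, so that a single blocking policy cannot reach all copies.

```python
def place_replicas(hosts, k):
    """Pick k hosts so replicas spread across nations and ISPs.
    hosts: iterable of dicts like {"id": "h1", "nation": "US", "isp": "A"}."""
    chosen, nations, isps = [], set(), set()
    # First pass demands both a fresh nation and a fresh ISP; the
    # relaxed pass accepts either, and the tail fills up with anything.
    for strict in (True, False):
        for h in hosts:
            if len(chosen) == k:
                return chosen
            fresh = (h["nation"] not in nations, h["isp"] not in isps)
            if h not in chosen and (all(fresh) if strict else any(fresh)):
                chosen.append(h); nations.add(h["nation"]); isps.add(h["isp"])
    return chosen + [h for h in hosts if h not in chosen][: k - len(chosen)]

hosts = [{"id": "h1", "nation": "US", "isp": "A"},
         {"id": "h2", "nation": "US", "isp": "B"},
         {"id": "h3", "nation": "DE", "isp": "C"}]
print([h["id"] for h in place_replicas(hosts, 2)])   # ['h1', 'h3']
```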
                        V. DISCUSSION

   All systems described offer storage distribution following different approaches and architectures. In this section, we discuss to what extent these systems have the properties that are needed in volunteer computing systems. In Table 1 we gather all systems and characteristics together, giving a clear view of their state.
   1) Symmetry: As previously mentioned, in pure peer-to-peer systems all peers are on the same level with equivalent functionality. Since each volunteer participant has no priority over other participants - even though all are controlled by the system's central server - the intended distributed systems should be purely peer-to-peer.
   In the world of storage systems, scientists have trouble presenting systems with "independent" nodes that work without the guidance of an administrator. In the Frangipani file system, there is an administrator who arranges the states of nodes, and nodes need to obtain permission from the administrator in order to perform a task. Thus, this system does not provide a symmetric node network and is not proper for volunteer computing systems. Also, in MongoDB there are three kinds of nodes: Standard, Passive, and Arbiter. Similarly to MongoDB, BigTable has master nodes and many chunk servers, and Antiquity contains the role of an administrator among the peers, who is responsible for the new chunk allocation of file logs. Thus MongoDB, BigTable and Antiquity are not symmetric in terms of node roles. Moreover, Farsite is based on a centralized scheme: some nodes have - for a period of time - authority over some files, their content, directory and user permissions. Similarly, TotalRecall consists of different types of nodes, each type having different responsibilities regarding the files. Therefore, in both systems nodes cannot work freely without the permission of other "master" nodes. Last but not least, in OceanStore nodes can have different roles, such as a server, a client, a router, or all of them; thus it is not symmetric.
   The rest of the systems, as can be seen in Table 1, consist of equal nodes and are subsequently characterized as symmetric.
                                                  Characteristics
                  Systems          Symmetry   Availability   Scalability   Anonymity   Robustness
                  FreeHaven          Yes         Mid            Low          High        High
                  FreeNet            Yes         Mid            High         Mid         High
                  Ivy                Yes         High           Mid          No          High
                  Frangipani         No          High           High         No          High
                  Ceph               Yes         High           High         No          High
                  OceanStore         No          High           High         No          High
                  Antiquity          No          High           High         No          High
                  BigTable           No          High           High         No          High
                  Dynamo             Yes         High           High         No          High
                  MongoDB            No          High           High         No          High
                  Riak               Yes         High           High         No          High
                  Pastis             Yes         High           High         No          High
                  TotalRecall        No          High           High         No          High
                  Farsite            No          High           Mid          No          High
                  Storage@home       Yes         High           High         High        High
                                                     TABLE I
                                   COMPARISON OF DIFFERENT STORAGE SYSTEMS
   2) Availability: In volunteer computing systems, participants can enter and leave the system at random times. In order to retrieve data, the intended storage systems should be highly available, despite the unavailability of the participants.
   Most of the systems analyzed are highly available, as shown in Table 1. However, the FreeHaven system presents a limited level of availability, since there is no replication mechanism, only the periodical trading that keeps data available. Similarly, FreeNet has limited availability because of its lack of replication mechanisms and also because it suffers from poor long-term survivability, especially for non-popular files.
   The DHash component of Ivy makes it highly available, since DHash replicates and distributes the blocks of files; thus participants' logs can be available even if the participants themselves are not. Moreover, Frangipani has cluster member components that act as large abstract containers on a highly available block level; these cluster members make Frangipani highly available. Ceph accomplishes availability using RADOS, which manages data replication following a primary-copy replication scheme and also provides update synchronization of the data. In OceanStore, one of the main goals is to provide availability: data flows freely, and thus replicas of the data are created. Antiquity uses a secure log which is distributed among multiple servers, thus providing a high degree of availability and ensuring that all data can be accessed; if for any reason some data is lost, a repair service is available for recovery.
   Furthermore, the Farsite system replicates data in order to ensure availability even with the frequent unavailability of the nodes. Likewise, Pastis implements a lazy replication protocol to manage replicas on different nodes. TotalRecall has the provision of availability as a main goal, and it suggests different ways to ensure it, such as redundancy management with specified mechanisms, replication, and dynamic repairs in the case of nodes leaving the system permanently.
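All of these replication schemes rest on the same arithmetic: if each replica host is independently online with probability p and a block is kept on k hosts, the block is unreachable only when all k are offline at once. As an illustrative back-of-the-envelope calculation (the numbers are ours, not measurements of any surveyed system):

```latex
A(k) = 1 - (1 - p)^{k},
\qquad p = 0.5:\quad A(3) = 0.875,
\qquad A(10) = 1 - 2^{-10} \approx 0.999
```

Even mediocre volunteer uptime therefore yields high availability once the replication factor reaches double digits, which is why churn-heavy systems lean so heavily on replication and repair.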
   3) Scalability: Scalability is an additional required property. There are three main scaling techniques: replication for spreading copies of data, caching for reusing data already fetched, and distribution of divided computation [5]. Thus the intended decentralized storage systems should have replication or a similar mechanism.
   Of the systems studied, only three do not show high results on the scalability issue. FreeHaven and Ivy do not have scalability as a primary goal and therefore they are not highly scalable. Farsite is limited to scale up to about 10^5 nodes, which is quite restrictive.
   Unlike these systems, Frangipani is designed to be highly scalable: the Petal servers, the components that work cooperatively to supply virtual disks to users, are distributed in order to increase scalability. The rest of the storage systems are classified as large-scale storage systems, since they are specially designed to offer scalability.
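The "distribution" technique above is commonly realized with consistent hashing, which Dynamo [20] and Riak [23] are known to build on; the sketch below is a generic illustration rather than either system's implementation. Keys are spread over nodes, and a joining or leaving volunteer moves only a small fraction of them:

```python
import bisect
import hashlib

def _h(value):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []                      # sorted list of (point, node)
        for n in nodes:
            self.add(n)

    def add(self, node):
        # Each node owns many virtual points, smoothing the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_h(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        """The first ring point clockwise from the key's hash owns it."""
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["volunteer-a", "volunteer-b", "volunteer-c"])
owner_before = ring.lookup("block-1234")
ring.add("volunteer-d")       # a join moves only ~1/4 of all keys
print(owner_before, "->", ring.lookup("block-1234"))
```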
   4) Anonymity: Participants in volunteer computing systems want to keep their identities secret from others. Thus, the intended distributed systems should provide anonymity. From our research, we found out that most of the systems do not support anonymity, as it was not among their main concerns.
   Systems like FreeHaven and FreeNet offer anonymity, as they focus on their participants' needs. They aim to conceal users' identities, and thus they increase resistance against censorship; in fact, for this purpose they sacrifice efficiency. Like them, users in Ceph and Storage@home are anonymous. Moreover, in Ceph the code runs directly in user space, and the RADOS and CRUSH processes are executed without revealing any information about the identity of the client, even when data are distributed.
   Anonymity is not a design issue in Frangipani; thus each user in the Frangipani file system is noticeable and can be detected easily. Like Frangipani, large-scale decentralized storage systems such as Dynamo, Riak, BigTable and MongoDB do not handle anonymity as a design issue.
   5) Robustness: By definition, volunteers can come and go, may crash, or may change their network status. Therefore, volunteer computing systems - and by extension storage systems - should be robust enough to face these situations.
   All systems studied are highly robust, thanks to various reasons and mechanisms. In FreeHaven, while peers are trading, copies of data are kept for a while until the trade proves trustworthy; although this mechanism is not good for performance, it increases the robustness of FreeHaven. Moreover, the buddy system makes it robust, since the buddies of each node can regenerate lost data. Frangipani uses a write-ahead redo logging mechanism to recover from failures easily. In the Freenet protocol, a failure message is forwarded back to the owner of the request without propagating to any other nodes, so the original requester can make another request; this property of the Freenet protocol makes it robust against failures.
   MongoDB and Riak have replication mechanisms that make these systems large-scale and fault-tolerant; these characteristics provide highly robust systems. Like them, BigTable and Dynamo have great robustness, since they are highly scalable.
   Ceph has a very good mechanism for disk failure monitoring and detection, as well as fast recovery, using different structures for the file system and keeping a version number for each object. In addition, one of OceanStore's main goals is to provide a high level of failure recovery, offering fault tolerance and self-maintenance mechanisms with automatic repair. Antiquity's quorum repair recovers from failures and replaces lost replicas, which makes the system quite robust.
   Storage@home provides self-repair operations for each node involved. Pastis takes advantage of the fault-tolerance property of the storage layer it is based on, the Past DHT. In TotalRecall, things are even easier: since it deals primarily with availability, it addresses this issue using repair mechanisms, which also help preserve robustness. The Farsite system was designed in such a way that it handles Byzantine faults and is therefore more robust.
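The write-ahead redo logging that Frangipani relies on deserves a concrete sketch (a generic toy under simplified assumptions, not Frangipani's actual log format): an update is appended to the log before the state is modified, so after a crash a recovery process can simply replay the log.

```python
import json

class RedoLog:
    """Toy write-ahead redo log: append first, apply second."""
    def __init__(self, path):
        self.path = path

    def append(self, op):
        with open(self.path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()                 # log the op before applying it

    def replay(self, state):
        """Reapply every logged op; redo ops are idempotent overwrites."""
        try:
            with open(self.path) as f:
                for line in f:
                    op = json.loads(line)
                    state[op["key"]] = op["value"]
        except FileNotFoundError:
            pass
        return state

log = RedoLog("/tmp/redo.log")
log.append({"key": "block-7", "value": "v2"})   # logged before the write
state = log.replay({})                          # recovery after a crash
print(state)                                    # {'block-7': 'v2'}
```

In Frangipani the logs live on the shared Petal virtual disk, so a surviving server can run recovery on behalf of a crashed one.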
   TFS is mainly a file system that works underneath storage systems. Its availability and anonymity depend on the state of the nodes and on whether the nodes themselves can be available and anonymous. Thus, it is not included in our discussion nor in the comparison table.
   As shown in Table 1 and based on our previous discussion, Storage@home seems to be the most proper storage system, with a clear statement that it can be used in volunteer computing systems. It follows a model typical of volunteer computing projects, and participants act as volunteers with the ability to compete and gain points based on their contribution in storage and their recruitment process. All users have an agent installed on their machine, which takes action after the users' registration. Data availability is maintained because each machine stores almost half the size of a file - to be more precise, up to 40%.
                       VI. CONCLUSION

   In this survey, we initially presented the various properties and characteristics that a decentralized storage system must have in order to cooperate efficiently with a volunteer computing system. The challenges that these systems can face when combined are scalability, availability, symmetry, anonymity and robustness, all of which are explained in detail. We selected some systems that we found important and related to our field of study and briefly described each one, associating the aforementioned characteristics with them. A comparison follows that explains each characteristic in depth and covers how each one is important for this merge of decentralized storage and volunteer computing systems. As shown in our discussion, all systems have different capabilities and functionalities, which make each one more appropriate for specific operations. With all the properties put down and after further investigation, Storage@home is the most proper one, having all the properties that such a system requires.
                         REFERENCES
[1] M. Placek, R. Buyya, "A taxonomy of distributed storage systems", Technical Report, Grid Computing and Distributed Systems Laboratory, The University of Melbourne, Australia, July 2006.
[2] P. Yianilos, S. Sobti, "The evolving field of distributed storage", IEEE Internet Computing, v.5, pp.35-39, 2001.
[3] R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, R. Campbell, "A Survey of Peer-to-Peer Storage Techniques for Distributed File Systems", Proceedings of the International Conference on Information Technology: Coding and Computing, v.2, pp.205-213, Las Vegas, Nevada, April 2005.
[4] H. Ge, "Survey of Distributed Storage Systems", Course Survey for "Advanced Topics in Information Systems", Spring 2004.
[5] B. C. Neuman, "Scale in Distributed Systems", in Readings in Distributed Computing Systems, IEEE Computer Society Press, pp.463-489, 1994.
[6] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, September 2001.
[7] O. Nov, D. Anderson, O. Arazy, "Volunteer Computing: A Model of the Factors Determining Contribution to Community-based Scientific Research", in Proceedings of the 19th International Conference on World Wide Web, pp.741-750, NY, USA, 2010.
[8] D. P. Anderson, "BOINC: A System for Public-Resource Computing and Storage", 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA, 2004.
[9] A. Elwaer, I. Taylor, O. Rana, "Optimizing Data Distribution in Volunteer Computing Systems using Resources of Participants", Scalable Computing: Practice and Experience, Volume 12, Number 2, ISSN 1895-1767, pp.193-208, 2011.
[10] A. Oram (ed.), "Peer-to-Peer: Harnessing the Power of Disruptive Technologies", O'Reilly Media, March 15, 2001.
[11] I. Clarke, O. Sandberg, B. Wiley, T. W. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability, pp.46-66, July 2000.
[12] A. Muthitacharoen, R. Morris, T. M. Gil, B. Chen, "Ivy: A Read/Write Peer-to-Peer File System", SIGOPS Oper. Syst. Rev., Vol. 36, No. SI, pp.31-44, 2002, doi:10.1145/844128.844132.
[13] C. A. Thekkath, T. Mann, E. K. Lee, "Frangipani: A Scalable Distributed File System", SOSP '97: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, New York, NY, USA, 1997.
[14] H. Weatherspoon, P. Eaton, B. Chun, J. Kubiatowicz, "Antiquity: exploiting a secure log for wide-area distributed storage", ACM SIGOPS Operating Systems Review, v.41 n.3, June 2007.
[15] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, C. Maltzahn, "Ceph: a scalable, high-performance distributed file system", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, November 2006.
[16] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, pp.40-49, September 2001.
[17] J. Cipar, M. D. Corner, E. D. Berger, "Contributing Storage using the Transparent File System", ACM Transactions on Storage 3, 3, October 2007.
[18] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, et al., "Bigtable: A Distributed Storage System for Structured Data", Proceedings of the 7th USENIX Symposium on OSDI, v.7, Seattle, WA, November 2006.
[19] "CS262B Advanced Topics in Computer Systems, Spring 2009". Available: http://www.eecs.berkeley.edu/~culler/summary/bigtable.html
[20] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store", Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, Washington, October 2007.
[21] Amazon DynamoDB. Available:
    http://aws.amazon.com/dynamodb/
[22] MongoDB. Available: http://www.mongodb.org/
[23] Welcome to the Riak Wiki. Available: http://wiki.basho.com/Riak.html
[24] F. Picconi, J-M. Busca, P. Sens, "Pastis: a highly-scalable multi-user peer-to-peer file system", EuroPar, pp.1173-1182, 2005.
[25] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, G. M. Voelker, "Total Recall: System Support for Automated Availability Management", NSDI, San Francisco, CA, 2004.
[26] W. J. Bolosky, J. R. Douceur, J. Howell, "The Farsite project: a retrospective", Proceedings of SIGOPS, France, 2007.
[27] A. L. Beberg, V. S. Pande, "Storage@home: Petascale Distributed Storage", Proceedings of IPDPS, Long Beach, CA, March 2007.

Contenu connexe

Tendances

1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed SystemsDaminda Herath
 
Final Project IEEE format
Final Project IEEE formatFinal Project IEEE format
Final Project IEEE formatFaizan Ahmed
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systemsnaveedchak
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed SystemsRupsee
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel systemManish Singh
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit INANDINI SHARMA
 
Distributed dbms cs712 power point slides lecture 1
Distributed dbms   cs712 power point slides lecture 1Distributed dbms   cs712 power point slides lecture 1
Distributed dbms cs712 power point slides lecture 1Aimal Syeda
 
Chapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systemsChapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systemsFrancelyno Murela
 
Chapter 1-distribute Computing
Chapter 1-distribute ComputingChapter 1-distribute Computing
Chapter 1-distribute Computingnakomuri
 
Distributed OS - An Introduction
Distributed OS - An IntroductionDistributed OS - An Introduction
Distributed OS - An IntroductionSuhit Kulkarni
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating systemudaya khanal
 
Distributed systems1
Distributed systems1Distributed systems1
Distributed systems1Sumita Das
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)Sri Prasanna
 
Distributed System
Distributed SystemDistributed System
Distributed SystemIqra khalil
 
Chapter 1 introduction
Chapter 1 introductionChapter 1 introduction
Chapter 1 introductionTamrat Amare
 
Intro to Distributed Database Management System
Intro to Distributed Database Management SystemIntro to Distributed Database Management System
Intro to Distributed Database Management SystemAli Raza
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Fazli Amin
 
Fragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A SurveyFragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A SurveyEditor IJMTER
 

Tendances (20)

1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed Systems
 
Final Project IEEE format
Final Project IEEE formatFinal Project IEEE format
Final Project IEEE format
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit I
 
Distributed dbms cs712 power point slides lecture 1
Distributed dbms   cs712 power point slides lecture 1Distributed dbms   cs712 power point slides lecture 1
Distributed dbms cs712 power point slides lecture 1
 
Chapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systemsChapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systems
 
Chapter 1-distribute Computing
Chapter 1-distribute ComputingChapter 1-distribute Computing
Chapter 1-distribute Computing
 
istributed system
istributed systemistributed system
istributed system
 
Distributed OS - An Introduction
Distributed OS - An IntroductionDistributed OS - An Introduction
Distributed OS - An Introduction
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 
Distributed systems1
Distributed systems1Distributed systems1
Distributed systems1
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
 
Distributive operating system
Distributive operating systemDistributive operating system
Distributive operating system
 
Distributed System
Distributed SystemDistributed System
Distributed System
 
Chapter 1 introduction
Chapter 1 introductionChapter 1 introduction
Chapter 1 introduction
 
Intro to Distributed Database Management System
Intro to Distributed Database Management SystemIntro to Distributed Database Management System
Intro to Distributed Database Management System
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 
Fragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A SurveyFragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A Survey
 

Similaire à A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

DISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxDISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxvinaypandey170
 
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologiesA survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologiessharefish
 
- Introduction - Distributed - System -
- Introduction - Distributed - System  -- Introduction - Distributed - System  -
- Introduction - Distributed - System -ssuser7c150a
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.pptsirajmohammed35
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)Dinesh Modak
 
NoSql And The Semantic Web
NoSql And The Semantic WebNoSql And The Semantic Web
NoSql And The Semantic WebIrina Hutanu
 
Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes SAhammedShakil
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYcseij
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsEditor IJCATR
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsEditor IJCATR
 
Efficient Cloud Caching
Efficient Cloud CachingEfficient Cloud Caching
Efficient Cloud CachingIJERA Editor
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.pptbalewayalew
 
Chap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lectureChap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lectureMuhammad Arslan
 
Data Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information SystemsData Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information Systemsijceronline
 
Lect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptxLect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptxPardonSamson
 

Similaire à A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems (20)

DISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxDISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docx
 
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologiesA survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
 
Tr 85.4
Tr 85.4Tr 85.4
Tr 85.4
 
- Introduction - Distributed - System -
- Introduction - Distributed - System  -- Introduction - Distributed - System  -
- Introduction - Distributed - System -
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.ppt
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)
 
NoSql And The Semantic Web
NoSql And The Semantic WebNoSql And The Semantic Web
NoSql And The Semantic Web
 
Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes
 
Distributed Systems.pptx
Distributed Systems.pptxDistributed Systems.pptx
Distributed Systems.pptx
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Efficient Cloud Caching
Efficient Cloud CachingEfficient Cloud Caching
Efficient Cloud Caching
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.ppt
 
Chap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lectureChap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lecture
 
Chapter One.ppt
Chapter One.pptChapter One.ppt
Chapter One.ppt
 
Distributed clouds — micro clouds
Distributed clouds — micro cloudsDistributed clouds — micro clouds
Distributed clouds — micro clouds
 
Data Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information SystemsData Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information Systems
 
Lect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptxLect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptx
 

Plus de Maria Stylianou

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareSPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareMaria Stylianou
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksMaria Stylianou
 
Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Maria Stylianou
 
Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Maria Stylianou
 
Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Maria Stylianou
 
Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Maria Stylianou
 
Automatic Energy-based Scheduling
Automatic Energy-based SchedulingAutomatic Energy-based Scheduling
Automatic Energy-based SchedulingMaria Stylianou
 
Intelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesIntelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesMaria Stylianou
 
Instrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkInstrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkMaria Stylianou
 
Low-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersLow-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersMaria Stylianou
 
How Companies Learn Your Secrets
How Companies Learn Your SecretsHow Companies Learn Your Secrets
How Companies Learn Your SecretsMaria Stylianou
 
EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services Maria Stylianou
 
EEDC - Distributed Systems
EEDC - Distributed SystemsEEDC - Distributed Systems
EEDC - Distributed SystemsMaria Stylianou
 

Plus de Maria Stylianou (16)

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareSPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible Attacks
 
Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)
 
Erlang in 10 minutes
Erlang in 10 minutesErlang in 10 minutes
Erlang in 10 minutes
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
 
Google's Dremel
Google's DremelGoogle's Dremel
Google's Dremel
 
Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee
 
Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee
 
Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...
 
Automatic Energy-based Scheduling
Automatic Energy-based SchedulingAutomatic Energy-based Scheduling
Automatic Energy-based Scheduling
 
Intelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesIntelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet Services
 
Instrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkInstrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel Benchmark
 
Low-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersLow-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic Registers
 
How Companies Learn Your Secrets
How Companies Learn Your SecretsHow Companies Learn Your Secrets
How Companies Learn Your Secrets
 
EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services
 
EEDC - Distributed Systems
EEDC - Distributed SystemsEEDC - Distributed Systems
EEDC - Distributed Systems
 

Dernier

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Dernier (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

  • 1. A SURVEY ON LARGE-SCALE DECENTRALIZED STORAGE SYSTEMS TO BE USED BY VOLUNTEER COMPUTING SYSTEMS 1 A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems Umit Cavus Buyuksahin, Maria Stylianou, Nicos Demetriou, Muhammad Adnan Khan Abstract—Over the last decades, distributed systems are pro- their capacity. Due to this demand, researchers turn to unused moted for extended computations and are presented as the ideal storage resources. Globally, there are many personal computers storage space for large amounts of data. Distributed Storage whose resources are not fully used by their owners. Volunteer Systems have been moved from the centralized architecture to a more decentralized approach. This change allows such systems to computing systems aim to use these storage for enormous- be used by volunteer computing systems, where the exploitation sized computations by considering them as if they were parts of any available storage and resources is essential and greatly of a huge supercomputer. This is a powerful way to utilize needed. This survey explores the characteristics of scalable distributed resources, in order to complete large-scale tasks. decentralized storage systems that can be used by volunteer Volunteer computing systems have two main bases [7]. The computing systems and discusses the various existing systems in terms of the specified characteristics. For each surveyed system first one is the computational base, in which large computa- we give a brief description and whether the required properties tion tasks are split into smaller tasks which are assigned to are ensured. volunteer participants’ computers. The second base is called Index Terms—decentralized storage systems, volunteer com- participative base and it deploys large number of volunteer puting systems participants who offer their resources. One of the well known volunteer computing systems is SETI@home launched by BOINC projects [8]. Nowadays, I. INTRODUCTION SETI@home works with about one million computers which Storage is one of the fundamental parts of the computing provide approximately 70 TeraFLOPs processing rate [8]. [1]. Although it has lower speed than RAM, it has great Of course this resource usage can be increased when we persistence and low cost. Thus, central storage systems were look at the potential resource in the world. However this is constructed and focused on reliability, stability, and efficiency. unnecessary since the network is growing rapidly. However, nowadays computation is not limited on a central These volunteer computing systems produce huge amounts storage space, but it is executed in a global environment, of computational data that should be stored. This data may like Internet. As Internet becomes part of this computation, it be used for later processing or sharing with other scientific produces huge amounts of information that need to be gathered organizations that may contribute to science area. However, and stored. For addressing this challenge, distributed storages today’s volunteer computing systems use centralized stor- systems are introduced. In this design, data stored by hosts age systems [9] to distribute data to participants. It suffers become geographically distributed. Because of this distribu- from limitations of centralized storage systems such as fault- tion and the appearance of huge demands, new challenges tolerance, availability and scalability. 
arise, such as fault-tolerance, availability, security, robustness, In order to pass over these limitations, new storage systems survivability, scalability, anonymity. are developed which are decentralized and can be used by With the grow of Internet, distributed storage systems are volunteer computing systems efficiently. As previously men- able to scale using larger amounts of users. This growth tioned, there are many kind of decentralized storage systems. has emerge the difficulty of having one central point for However, not all of them are suitable to be used in volunteer administrating the system. Therefore, it is observed in other computing systems. In this survey we study several storage surveys that these systems are moving from the centralized systems, we discuss their characteristics and challenges and we architecture to a more decentralized approach [1]. propose the most proper one to be used in volunteer computing Meanwhile, supercomputers are situated among us exe- systems. cuting big computations which require huge storage, power The rest of the paper is organized as follows: In section 3, and computational resources, and lead to a rapid decrease of we present related work done by other researches in the field. In section 4, design issues of decentralized storage systems Umit Cavus Buyuksahin, Universitat Politecnica de Catalunya (UPC). E- that can be used in volunteer computing systems are examined mail: ucbuyuksahin@gmail.com Maria Stylianou, Universitat Politecnica de Catalunya (UPC). E-mail: by extracting characteristics. In section 5 we briefly overview marsty5@gmail.com some of the existing decentralized storage systems. Later on, Nicos Demetriou, Universitat Politecnica de Catalunya (UPC). E-mail: in section 6 we compare them regarding their characteristics nicosdem7@gmail.com Muhammad Adnan Khan, Universitat Politecnica de Catalunya (UPC). E- and benefits and propose the most suitable one to be used in mail:malikadnan78@gmail.com volunteer computing systems. Finally, in section 6 we conclude
  • 2. A SURVEY ON LARGE-SCALE DECENTRALIZED STORAGE SYSTEMS TO BE USED BY VOLUNTEER COMPUTING SYSTEMS 2 the survey with our final remarks about the systems studied. anonymity in volunteering can increase the number of par- ticipants which is highly appreciated and encouraged. What is II. R ELATED W ORKS more, anonymity can be a way to prevent the denial of access for special groups of people, which is possible when personal In this section we present the different surveys related information is shared. to the subject that we are focused on. [3] discusses the 5) Robustness: Both types of systems, storage and vol- different properties of the Peer-to-Peer based distributed file unteer computing are prone to failures, as machines may systems. It shows the various benefits of using P2P systems, crash, reboot, or change location with different network char- the design issues and properties. In addition it presents the acteristics and capabilities. In order to efficiently associate major distributed file systems comparing the advantages and decentralized storage systems with volunteer ones, the former disadvantages for each one in detail. As well, [4] provides an systems should be robust enough to handle these changes and insight into existing storage systems, giving a good overview repair themselves in the case of failures, in order to preserve of each and describes the important characteristics they should this advantage in volunteer computing systems as well. have. In [1], a variety of distributed storage systems is covered in depth, presenting their functionalities and putting the reader into the problems that these systems face and the solutions IV. D ECENTRALIZED S TORAGE S YSTEMS proposed to overcome them. A quite short but rich paper is In the following section, we present a short summary for the [2] discusses the evolving area of distributed storage systems storage systems studied, referring to the previously explained and gives a brief summary of some related systems in order properties. to provide a broader view for the subject. A. FreeHaven III. P RINCIPAL C HARACTERISTICS OF D ECENTRALIZED FreeHaven [10] firstly came with a solution about S TORAGE S YSTEMS anonymity whose implementation is not commonly handled by Several decentralized storage systems have been proposed distributed storage systems. This means that it provides peers over the last years. However, not all of them are suitable to distribute and share data anonymously by protecting peers’ for volunteer computing. Specific characteristics should be identity. The other goals of FreeHaven are: (a) Persistence for examined and we should ensure their existence in the intended determining lifetime of documents, (b) Flexibility for changing storage systems, in order to meet the requirements of volunteer systems functions, (c) Accountability for limiting damage to computing systems. Below, we analyze the most important system. ones, their specifications and effects. Since there is not a hierarchy and all nodes are on the 1) Symmetry: Symmetry is a desired characteristic as much same level, it is a pure peer-to-peer system, it is symmetric for decentralized storage systems as for volunteer computing and balanced. Despite of the fact that nodes do not have spe- systems. 
A. FreeHaven

FreeHaven [10] was among the first systems to offer a solution for anonymity, a property whose implementation is not commonly handled by distributed storage systems: it allows peers to distribute and share data anonymously by protecting the peers' identity. The other goals of FreeHaven are: (a) persistence, for determining the lifetime of documents; (b) flexibility, for changing system functions; and (c) accountability, for limiting damage to the system.

Since there is no hierarchy and all nodes are on the same level, it is a pure peer-to-peer system: symmetric and balanced. Although nodes do not have special capabilities, unlike in client-server systems, they do have special roles, such as the author who initially creates documents, the publisher who puts documents into the FreeHaven system, the reader who retrieves documents from the system, and the servers who provide storage. All these nodes have a pseudonym and know each other only by their pseudonyms; thus, locating a peer is difficult. Tracing routes is equally difficult, since FreeHaven uses onion routing to broadcast queries. The difficulty of both locating peers and tracing routes protects user identity, i.e., it supplies anonymous communication. Server nodes periodically trade parts of documents, called shares, with each other. This trading gives flexibility to the system, in the sense that servers can join and leave easily and without special treatment. For trading, nodes are chosen from a node list ordered by reputation: a successful trade increases a node's reputation, while malicious behavior decreases it [1]. In order to discourage malicious behavior and limit the damage to the system, each node notifies its buddies about share movements; this buddy mechanism supplies accountability. Moreover, FreeHaven is robust, since it can keep a document intact even when a high fraction of its shares is lost.

Because of its pursuit of anonymity, persistence, flexibility and accountability, efficiency and convenience are neglected. To supply availability it uses the trading mechanism instead of a replication mechanism, thus the system is not highly available [2]. Finally, the inefficient broadcasts used for communication make FreeHaven less efficient.
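To make the reputation-ordered trading concrete, the following minimal Python sketch is our own illustration rather than FreeHaven's code; the class, the scoring constants and the success check are all hypothetical.

```python
import random

class Node:
    """A FreeHaven-style server holding shares of documents (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.reputation = 0.0   # grows with successful trades
        self.shares = []        # document shares currently hosted

    def pick_trading_partner(self, peers):
        # Partners are chosen from a node list ordered by reputation.
        ranked = sorted(peers, key=lambda p: p.reputation, reverse=True)
        return ranked[0] if ranked else None

def trade(giver, taker, share):
    """Move one share between servers and adjust reputations accordingly."""
    succeeded = random.random() > 0.1   # stand-in for a real success check
    if succeeded:
        taker.shares.append(share)
        giver.shares.remove(share)
        giver.reputation += 1.0         # successful trades raise reputation
        taker.reputation += 1.0
        # the giver would now notify the share's buddies of the movement
    else:
        giver.reputation -= 2.0         # misbehavior is penalized more heavily
    return succeeded

n1, n2 = Node("srv1"), Node("srv2")
n1.shares.append("share-0")
trade(n1, n2, "share-0")
```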
B. FreeNet

FreeNet [11] is an adaptive, pure peer-to-peer storage system for publication and replication that preserves the anonymity of authors and readers while data is retrieved. Like FreeHaven, the first goal of FreeNet is anonymity and privacy. However, the anonymity of FreeNet does not cover the whole network; it applies only to file transactions, because FreeNet provides anonymity at the application layer instead of the transport layer. Thus, discovering the source and destination of a transfer is infeasible. The other goals of FreeNet are deniability, resistance, efficiency and decentralization.

The nodes in the peer-to-peer FreeNet network query a file that is represented by a location-independent key, obtained from hash functions for anonymity. Each node maintains a local store that is accessible for others to read and write, and a dynamic routing table that maps keys to the addresses of other peers. Whenever a node receives a request, it first checks its local store. If the data exists, it returns it; otherwise it forwards the request to the node that has the nearest key in its routing table. If the request succeeds, the intended data travels back along the same path as the request. While data is returned, each node on the way also caches this data and inserts the new key into its own routing table. This mechanism provides transparent replication and increases connectivity in the system. In order to cope efficiently with limited storage capacity, node storage is managed with an LRU (Least Recently Used) policy, meaning that data items are sorted based on the time of the most recent request: the most recently requested data sits at the end of the queue. This mechanism does not ensure long-term survivability for less popular files.

The FreeNet protocol is packet-oriented and uses self-contained messages. Each message contains a hops-to-live limit, a depth counter and a randomly generated transaction ID, which makes the corresponding file traceable by nodes. Hops-to-live is set by the sender of the message and prevents indefinite message forwarding. The depth counter, incremented at each node, is used by a replying node to set a hops-to-live value sufficient for the reply to reach its destination. These three values are used in the insert, retrieve and request operations. In order to supply anonymity, FreeNet uses probabilistic routing that does not direct communication towards specific receivers.

Since probabilistic routing is used to provide anonymity, performance and reliability are not addressed. Like FreeHaven, FreeNet sacrifices performance in order to supply anonymous communication. However, thanks to its dynamic storage and routing, the FreeNet network is highly scalable [3]. Moreover, it is robust against large failures.
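The key-closeness routing with a hops-to-live bound, and the caching along the return path, can be sketched as follows. This is our simplified illustration (integer keys, a recursive call in place of real messages), not FreeNet's protocol code, and it omits the depth counter and transaction IDs.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}     # local data store: key -> data
        self.routing = {}   # routing table: known key -> neighbouring Node

def request(node, key, hops_to_live):
    """Resolve a key FreeNet-style, caching data along the return path."""
    if key in node.store:
        return node.store[key]
    if hops_to_live == 0 or not node.routing:
        return None   # failure flows back to the original requester
    # Forward to the neighbour advertising the key nearest to the target.
    nearest = min(node.routing, key=lambda k: abs(k - key))
    data = request(node.routing[nearest], key, hops_to_live - 1)
    if data is not None:
        node.store[key] = data                     # transparent replication
        node.routing[key] = node.routing[nearest]  # learn a route for the key
    return data

a, b, c = Node("A"), Node("B"), Node("C")
a.routing = {20: b}
b.routing = {30: c}
c.store[30] = "document"
print(request(a, 30, hops_to_live=5))   # 'document'; A and B now cache key 30
```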
C. Ivy

Ivy [12] is another peer-to-peer storage system, with a file-system-like interface. There is no centralized or dedicated component, so every user is on the same level. Although many other peer-to-peer storage systems support either read or write operations for a single owner only, Ivy supports both read and write operations. However, the number of users that can use Ivy is limited; it is therefore designed to be utilized by small groups of cooperative users.

All peers are identical and are able to work either as a client or as a server. Because of this symmetric architecture, Ivy is called a pure peer-to-peer system. Each node has two main components: Chord/DHash, for reliable P2P distributed storage, and the Ivy server, for transferring data between peers. The architecture is log-based: each peer has its own log that records user information and changes to the file system, so for each NFS operation a log record is created and stored in Chord/DHash. Since log records are immutable and kept indefinitely, peers can withdraw any changes; this flexibility is one of the best properties of Ivy. All users can read any log, subject to file permission attributes.

When a file system is created, a set of logs is created and a group of peers is set up upon these logs. An entry pointing to each participant's log is placed in a view array. This array is traversed by all peers in order to create a snapshot. The logs are ordered in the array and peers use them for their records, so several users may use a log concurrently. This can cause conflicts, since Ivy permits concurrent write operations. For this purpose, Ivy uses close-to-open consistency within a group of peers: the Ivy server waits for DHash to receive the new log records before committing a modify operation, and only then is the modification announced. For each NFS operation, peers take the latest view array from DHash; they then check for concurrent view vectors that affect the same file by traversing the logs. Whenever a conflict is found, the differences are analyzed and merged. An optimistic approach is used for file modification, while a locking approach is used for file creation. Consequently, performance decreases as the number of users grows; because of this limited scalability [1], Ivy is suited to small groups of users.

Every user stores a log of their modifications and, at a specified time interval, generates a snapshot, a process which requires retrieving the logs of all participating users. Although retrieving all peers' logs causes a performance bottleneck, peers can freely change the file system regardless of other peers' state. The immutable, indefinitely stored logs can be used for withdrawing changes, but this operation is very costly. As a result, Ivy distributes its storage, but it only supports a limited write-once/read-many interface [1].
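A minimal sketch of the per-user, append-only logs that readers merge into a file-system view follows; this is our own toy model, in which plain sequence numbers stand in for Ivy's version vectors and the tie-breaking rule is hypothetical.

```python
import itertools

class Log:
    """An Ivy-style per-user log of immutable file-system records."""
    def __init__(self, owner):
        self.owner = owner
        self.records = []   # append-only; existing records never change

    def append(self, seq, path, data):
        self.records.append({"seq": seq, "owner": self.owner,
                             "path": path, "data": data})

def read_file(path, logs):
    """Scan every participant's log and return the newest version of a file.

    Concurrent writes (equal sequence numbers) are resolved deterministically
    by comparing owner names, a simplification of Ivy's conflict merging."""
    versions = [r for r in itertools.chain(*(log.records for log in logs))
                if r["path"] == path]
    if not versions:
        return None
    return max(versions, key=lambda r: (r["seq"], r["owner"]))["data"]

alice, bob = Log("alice"), Log("bob")
alice.append(1, "/notes", b"draft")
bob.append(2, "/notes", b"revised")
print(read_file("/notes", [alice, bob]))   # b'revised'
```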
D. Frangipani

Frangipani [13] is a high-performance distributed storage system utilized by a cooperative group of users. It is not a pure peer-to-peer system, since there is an administrator, but it aims to minimize the administrator's operations: Frangipani keeps administration simple even while many nodes are joining [1]. Moreover, it is designed to be used within an institution that has a secure and private network; thus, it is not very scalable. However, it provides good performance to its users, since it stripes data between servers, increasing performance with the number of active servers. Frangipani can also be configured to replicate data [1], thereby offering redundancy and resilience to failures, which is a crucial property for volunteer computing systems.

Frangipani has three main components. The first is the Petal server, which provides a virtual disk interface to distributed storage. The virtual disk looks like local storage, so it offers a transparent interface to users: the distributed storage is hidden. The second component is the distributed locking service. It enforces consistency following a multiple-readers/single-writer locking philosophy. There are two types of locks, read and write; when there are multiple changes to a file, this service serializes them using these locks to keep consistency. Since Frangipani keeps every file in a consistent state via this locking mechanism, its performance degrades considerably. The third component is the Frangipani file server module, which provides a file-system-like interface. It communicates with the other components in order to remain in a consistent state within the determined block capacity. Moreover, the Frangipani file server deploys write-ahead redo logging of metadata for recovery: when an error is detected in the file server, the logged data written in a special area of the Petal server is used for recovery. Together with the replication mechanism, this makes Frangipani more robust.

As a result, Frangipani is a distributed file system that can scale in terms of size and performance, although network capacity is a barrier to its performance because of a design decision: one of the biggest design problems of Frangipani is that it assumes a secure interconnect in order to scale, and it operates within a single institution [1]. Because of this, it suffers not only in performance but also in scalability. Besides, it assumes that all nodes in the system are trusted, and thus it cannot supply a secure system. Finally, the locking mechanism used to keep the system consistent can cause a dramatic performance drop.
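The multiple-readers/single-writer discipline enforced by the distributed locking service can be sketched in a few lines. The sketch below is a purely local, thread-based illustration of the semantics, assuming Python's standard threading primitives; it is not Frangipani's distributed lock protocol.

```python
import threading

class ReadWriteLock:
    """Multiple readers may hold the lock at once; a writer excludes all."""
    def __init__(self):
        self._readers = 0
        self._mutex = threading.Lock()
        self._no_readers = threading.Condition(self._mutex)

    def acquire_read(self):
        with self._mutex:
            self._readers += 1

    def release_read(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:
                self._no_readers.notify_all()

    def acquire_write(self):
        self._mutex.acquire()        # excludes other writers...
        while self._readers > 0:     # ...and waits out current readers
            self._no_readers.wait()

    def release_write(self):
        self._mutex.release()
```

Serializing every change to a file through such a lock is exactly what keeps the system consistent, and also what the paragraph above identifies as Frangipani's main performance cost.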
E. Ceph

Ceph [15] is a distributed file system that provides excellent performance, reliability and scalability, and that separates data from metadata to the maximum possible extent. It leverages the intelligence of Object Storage Devices (OSDs) to distribute the complexity surrounding data access, and it utilizes a highly adaptive distributed metadata cluster architecture, improving scalability and reliability.

Ceph eliminates file allocation tables and lists, replacing them with generating functions. It comprises clients, clusters of OSDs (which store all data and metadata) and metadata server (MDS) clusters (which manage the namespace: files and directories). File data is striped onto predictably named objects using a special-purpose data distribution function, CRUSH (Controlled Replication Under Scalable Hashing), which assigns objects to storage devices. A novel metadata cluster architecture distributes the responsibility for managing the file system directory hierarchy. Clients run on each host, executing application code and exposing a file system interface to applications. The client code runs entirely in user space and can be accessed either by linking to it directly or as a mounted file system. CRUSH maps data onto a sequence of objects. If one or more clients open a file for read access, an MDS grants them the capability to read and cache the file content. The Ceph synchronization model retains its simplicity by providing correct read-write and shared-write semantics between clients via synchronous I/O, while extending the application interface to relax consistency for performance-conscious distributed applications. File and directory metadata in Ceph is very small, consisting almost only of directory entries (file names) and inodes (80 bytes); in contrast with conventional file systems, no file allocation metadata is necessary. In Ceph, object names are constructed using the inode number and distributed to OSDs using CRUSH. In order to distribute large amounts of data, Ceph adopts a strategy that distributes new data randomly, migrates a random subsample of existing data to new devices, and uniformly redistributes data from removed devices. To maintain system availability and ensure data safety in a scalable fashion, RADOS (Reliable Autonomic Distributed Object Store) manages its own replication of data using a variant of primary-copy replication. By acknowledging updates only once data safety is provided, RADOS allows Ceph to realize low-latency updates for efficient application synchronization together with well-defined data safety semantics. For certain failures, such as disk errors or corrupted data, OSDs can self-report. Failures that make an OSD unreachable on the network, however, require active monitoring, which RADOS distributes by having each OSD monitor the peers with which it shares placement groups. To facilitate fast recovery, OSDs maintain a version number for each object and a log of recent changes (names and versions of updated or deleted objects) for each placement group. Each Ceph OSD manages its local object storage with EBOFS, an Extent and B-tree based Object File System.

By shedding traditional design assumptions such as allocation lists, Ceph separates data completely from metadata management, allowing the two to scale independently. RADOS leverages intelligent OSDs to manage data replication, failure detection and recovery, low-level disk allocation, scheduling, and data migration without burdening any central server. Finally, Ceph's metadata management architecture provides a single uniform directory hierarchy obeying POSIX semantics, whose performance scales as new metadata servers join the system.
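The table-free placement idea, whereby any party can compute where an object lives from its name alone, can be illustrated with plain hashing. Note that this is only a stand-in of ours: real CRUSH descends a weighted device hierarchy and handles rebalancing, which the sketch ignores.

```python
import hashlib

def place_object(inode_no, stripe_no, osds, replicas=3):
    """Deterministically map an object to OSDs without any allocation table.

    Object names are derived from the inode number and stripe index, as in
    Ceph; the hash-based placement below is a toy stand-in for CRUSH."""
    replicas = min(replicas, len(osds))
    name = f"{inode_no:x}.{stripe_no:08x}"
    picked = []
    attempt = 0
    while len(picked) < replicas:
        h = hashlib.sha256(f"{name}:{attempt}".encode()).digest()
        osd = osds[int.from_bytes(h[:4], "big") % len(osds)]
        if osd not in picked:          # replicas go to distinct devices
            picked.append(osd)
        attempt += 1
    return name, picked

# Any client computes the same placement, with no lookup table to consult.
print(place_object(0x1234, 0, ["osd0", "osd1", "osd2", "osd3", "osd4"]))
```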
F. TFS

TFS [17] provides background tasks with large amounts of unreliable storage without impacting the performance of standard file access operations; it can allow a peer-to-peer storage system to provide more storage and double its performance, and it has a notable impact on replication in peer-to-peer storage systems. The problem with contributory storage systems is that application performance degrades: as more storage is activated, file system operations quickly slow down. This is why TFS aims at transparency, that is, not burdening system performance while contributory processes are running. Another problem is that disks are often half empty, yet users are not keen to contribute their free space. TFS is a system that contributes all of the idle space while keeping a very low load on the performance of the local user's system. It stores files in the file system's free space and minimizes interference with the file system's block allocation policy. Ordinary files can overwrite the contributed files at any time. In addition, there is no impact on the bandwidth needed for replication. TFS is useful for replicated storage systems executing on stable machines with plenty of bandwidth (an environment similar to the one used in Farsite). In a stable network, TFS can offer substantially more storage than dynamic contribution schemes. A small contribution of storage has little impact on the file system's performance, and so TFS ensures the transparency of contributed data. In exchange, it sacrifices file persistence: it achieves good file system performance by minimizing the amount of work needed when writing ordinary files. It records which blocks have been overwritten by marking them as such. If an application tries to open an overwritten file, the system returns an error, the inode/directory entry for that file is deleted, and the space is marked as free. Every time a file is deleted, TFS detects this, returns an error to the peers, and the file is replicated elsewhere.

TFS leaves the allocation of local files intact, avoiding fragmentation issues; it stores files in such a way that they are completely transparent to local access. TFS consistently provides at least as much storage as alternative approaches without overloading local performance: it can provide about 40 percent more storage than the best user-space technique when the network is quite stable and enough bandwidth is available. This may raise questions concerning availability, but TFS primarily depends on the distributed system's characteristics, such as machine availability, bandwidth and the amount of storage available.

G. OceanStore

OceanStore [6] is a global storage infrastructure which automatically recovers from server and network failures, incorporates new resources easily, and adjusts to usage patterns. It combines erasure codes with a Byzantine agreement protocol for consistent update serialization, even when malicious servers are present.

OceanStore consists of individual servers, each cooperating to provide a service; such a group of servers is called a pool. Data flows freely between these pools, creating replicas of a data object anywhere and thus increasing availability. Because OceanStore is composed of untrusted servers, it utilizes redundancy and client-side cryptographic techniques to protect data. OceanStore attacks the problem of storage-level maintenance with four mechanisms: a self-organizing routing infrastructure, m-of-n data coding with repair, Byzantine update commitment, and introspective replica management. Erasure coding transforms a block of input data into fragments which are spread over many servers; only a fraction of the fragments is needed to reconstruct the original block. A replica of an object must be exactly the same as the original, despite any failures or corruption of fragments. OceanStore achieves this by naming each object and its associated fragments with the result of a secure hash function applied to the object's contents, called a globally unique identifier (GUID). A node can act as a server that stores objects, as a client that initiates requests, as a router that forwards messages, or as all of these. A unique identifier, the NodeID (independent of location and semantics), is assigned to each node, and Tapestry (a self-organizing routing and object location subsystem) uses local neighbor maps to route messages to their destination NodeID, digit by digit. When an OceanStore server inserts a replica into the system, Tapestry publishes its location by placing a pointer to the replica's location at each hop between the new replica and the object's root node. In order to locate an object, a client routes a request towards the object's root until it encounters a replica pointer, which routes the request directly to that replica.

When a node wants to join, it chooses a random NodeID and a node close to itself. By routing from this NodeID, it finds other existing nodes that share suffixes of increasing length, generates a full routing table, and all the neighbors are notified. When a node disappears, its neighbors detect the absence and use backpointers to inform the nodes relying on it. In addition, a server can be removed from OceanStore when it becomes obsolete, needs scheduled maintenance or suffers component failures; a shutdown script informing the system of the server's removal is executed, and even if this script is not used, OceanStore will detect and correct the server's absence. OceanStore's design thus provides scalable, fault-tolerant, self-maintaining distributed storage through adaptation.
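The self-verifying naming scheme is easy to illustrate: because an object's GUID is a secure hash of its contents, a client can check data fetched from an untrusted server against the name alone. The sketch below is ours and assumes SHA-256 as the hash; it says nothing about OceanStore's actual encoding.

```python
import hashlib

def guid(content: bytes) -> str:
    """Content-derived name: any party can verify data against its GUID."""
    return hashlib.sha256(content).hexdigest()

block = b"some replicated object"
name = guid(block)

# A client re-checks a block fetched from an untrusted server:
fetched = b"some replicated object"
assert guid(fetched) == name   # corruption or tampering would fail here
```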
H. Antiquity

Antiquity [14] provides storage services for file systems and backup applications. It is a wide-area distributed storage system whose design assumes that all servers will eventually fail, and it tries to preserve data integrity even under these failures. Antiquity was developed in the context of OceanStore. In its model, the client can be an end-user machine, the server in a client-server system, or a replicated service. The system identifies the client and its append-only log by a cryptographic key pair. A log is stored in chunks, and when a new chunk needs to be allocated, the administrator is consulted, who authenticates the client and selects a set of storage servers that can host the new chunk. In order to maintain data securely, with high availability and, above all, stored-data integrity, Antiquity uses a secure log which is replicated on multiple servers. Durability is thus ensured, in the sense that no data is lost and all logs can be read. In case some logs become unmodifiable due to server failures or a lack of replicas, a quorum repair protocol replaces the lost replicas and eventually restores modifiability. In addition, Antiquity uses a dynamic Byzantine fault-tolerant quorum (threshold) protocol to provide consistency among replicas. Since the data is replicated on multiple servers, it can be retrieved later even after server failures. What is more, Antiquity uses distributed hash tables to connect the storage servers and to monitor their liveness and availability; the tables store only pointers identifying the servers on which the actual data is stored.

Antiquity's design pursues integrity, incremental secure writes with random read access, durability, consistency and efficiency with low overhead. The results from a simulation showed that in almost all checks performed, a quorum of servers was reachable and in a consistent state, thus providing a high degree of availability and consistency; the quorum repair process balances availability and consistency even further.

Concerning scalability, each log uses a single administrator, but since multiple instances are allowed, the administrator role scales well and different logs can use different administrators.
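The append-only secure log at the heart of this design can be sketched as a hash chain: each record binds the hash of its predecessor, so any replica can verify that it holds an untampered prefix. Our sketch omits the signatures Antiquity derives from the client's key pair.

```python
import hashlib

class SecureLog:
    """Hash-chained, append-only log in the spirit of Antiquity (sketch)."""
    def __init__(self):
        self.records = []
        self.head = b"\x00" * 32   # hash of the latest record

    def append(self, data: bytes):
        record = self.head + data            # each record binds its predecessor
        self.head = hashlib.sha256(record).digest()
        self.records.append(record)
        return self.head

    def verify(self) -> bool:
        head = b"\x00" * 32
        for record in self.records:
            if record[:32] != head:
                return False                 # broken chain: tampering detected
            head = hashlib.sha256(record).digest()
        return head == self.head

log = SecureLog()
log.append(b"create /backup/monday")
log.append(b"write chunk 0")
assert log.verify()
```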
I. BigTable

BigTable [18] is a large-scale distributed storage system for managing structured data. It is built on top of several existing Google technologies, such as the Google File System, Chubby and Sawzall, and it is used by many of Google's online services. Its contributors have flexibility, high performance and availability as their primary goals.

Essentially, BigTable is a "sparse, distributed, persistent multi-dimensional sorted map" that indexes each (row, column, timestamp) tuple to an array of bytes [19]. Data in BigTable is maintained in tables that are partitioned into row ranges called tablets; tablets are the units of data distribution and load balancing in BigTable. BigTable consists of three major components: a library linked into every client, one master server, and many tablet servers, each managing some number of tablets. Different versions of the data are sorted by timestamp. BigTable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key. Overall, BigTable is tremendously scalable, offering data availability and high performance to its users. However, it does not deal with issues like security among the nodes, or fault-tolerance.
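The data model itself fits in a few lines of Python. The toy class below is ours (a dict plus a sorted key list) and glosses over tablets, SSTables and the servers entirely; it only shows the sparse, sorted (row, column, timestamp) map.

```python
import bisect

class TinyBigtable:
    """Toy model of BigTable's data model: a sparse, sorted map from
    (row, column, timestamp) to an uninterpreted byte string."""
    def __init__(self):
        self.cells = {}   # (row, column, timestamp) -> bytes
        self.keys = []    # kept sorted so rows could be range-scanned

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, timestamp)
        if key not in self.cells:
            bisect.insort(self.keys, key)
        self.cells[key] = value

    def read(self, row, column):
        """Return the newest version of a cell (versions sorted by timestamp)."""
        versions = [k for k in self.keys if k[0] == row and k[1] == column]
        return self.cells[versions[-1]] if versions else None

t = TinyBigtable()
t.put("com.example/index", "contents:", 1, b"<html>v1</html>")
t.put("com.example/index", "contents:", 2, b"<html>v2</html>")
print(t.read("com.example/index", "contents:"))   # newest timestamp wins
```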
J. Dynamo

Dynamo is a key-value storage system that provides a mapping from keys to values. It is developed and managed by Amazon, which makes it a proprietary database [21], although its design has influenced open systems such as Cassandra. High availability and scalability are the main design goals of Dynamo. It offers incremental scalability, meaning the system can be scaled one node at a time. Moreover, there is no central administrator, and all nodes are on the same level.

Dynamo is a combination of distributed hash tables (DHTs) and databases [20]. Keys created by hashing the data are stored in a circular ring structure: when a key is stored, the nearest node in the clockwise direction is selected to hold it. Moreover, there are virtual nodes, each of which mimics a node, so that one physical node can be responsible for more than one position on the ring. This mechanism provides incremental scalability by solving the partitioning problem. Dynamo has an effective replication mechanism that increases the availability of data in the system: each data item is replicated to a specified number of successors, so each node holds replicated data of its predecessors. In addition, the system may keep more than one version of an item to increase availability; since this can cause inconsistency, vector clocks are used to determine the causal relationship between the different versions. These properties increase Dynamo's durability as well as its availability. Dynamo also targets an "always writable" property, which is the second reason for using vector clocks. When a user wants to perform a write, the coordinator responsible for the operation first sends the vector clock to the reachable nodes selected from a preference list, and the write completes according to the number of responses received; in other words, the mechanism is based on quorums. Finally, if a node gives no response, it is assumed to have failed; when it is removed from the ring, all surrounding nodes adjust to the new state.

Dynamo thus aims to solve the main problems of database management, such as scalability, availability, reliability and performance. While offering a highly available and scalable system, it keeps performance high by handling failures. However, achieving an anonymous system is not a goal of Dynamo.
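A minimal sketch of the partitioning scheme follows: consistent hashing with virtual nodes and a preference list of N distinct successors. This is our own illustration, with MD5 standing in for whatever hash Dynamo uses internally.

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    """Dynamo-style consistent-hashing ring with virtual nodes.

    Each physical node owns several points on the ring; a key is stored on
    the first node clockwise from its hash, then replicated to the next
    distinct successors (the key's preference list)."""
    def __init__(self, nodes, vnodes=8):
        self.points = sorted((h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))

    def preference_list(self, key, n=3):
        start = bisect.bisect(self.points, (h(key), ""))
        owners = []
        for _, node in self.points[start:] + self.points[:start]:
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners

ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("shopping-cart:42"))   # e.g. ['C', 'A', 'D']
```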
K. MongoDB

MongoDB [22] is a scalable, high-performance, open-source, document-oriented structured storage system. It provides document-oriented storage with full index support, auto-sharding, sophisticated replication, and compatibility with the Map/Reduce paradigm.

Instead of storing data in tables and rows, as is regularly done in relational databases, MongoDB stores data with dynamic schemas. The goal of MongoDB is to bridge the gap between key-value stores and relational databases. MongoDB has two separate constructs for multi-node topologies, which are often combined in the highest-performance systems: replica sets and sharded replica sets. Replica sets are an asynchronous cluster replication technology, and sharding is an automatic data distribution system. Increasing the number of instances in a replica set provides horizontal scalability for read performance and fault-tolerance; increasing the number of shards (each one being a replica set) distributes distinct data, providing horizontal scalability for write performance.

MongoDB has features similar to those of relational databases, like indexes and dynamic queries. It achieves availability by supporting asynchronous replication of data between servers, and it also features a backup and repair mechanism using journaling, which increases durability and robustness. Changing the data model from relational to document-oriented provides greater agility through flexible schemas and easier horizontal scalability.

L. Riak

Riak [23] is a key-value storage system inspired by Dynamo. Like Dynamo, it is distributed, highly available and scalable. It uses a map-reduce mechanism to reduce the functional limitations of the key-value model and to increase the power of querying over the data stored in Riak. Riak provides a fault-tolerant service to its users, a property that raises its level of robustness.

Since it is inspired by Amazon's Dynamo storage system, analyzed above, Riak has many similarities with it. It combines database storage with distributed hash tables (DHTs): like Dynamo, it uses consistent hashing to map keys onto its ring, so all nodes on the ring are identical. Whenever a node joins the network, it is assigned key-range partitions, and data is then replicated to make the system more available. Like Dynamo, write and read operations are based on quorums. Concurrent requests are not handled with locks, for performance reasons; instead of a lock mechanism, vector clocks are used to make the system resilient to failures and to keep it consistent. Another strong point of Riak is the use of map-reduce in querying: request messages are directed to a set of nodes instead of being propagated to all nodes.

Riak has a symmetric structure at the node level, since it has no super or master node. Moreover, it meets several design requirements of the intended decentralized storage systems, such as high availability, scalability and robustness. However, anonymity is not handled in its design and, since it is a relatively new system, it has many compatibility problems.
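Both Dynamo and Riak rely on vector clocks to decide whether two versions of an object are causally ordered or concurrent. A minimal sketch, with clocks represented as dictionaries from node id to counter and hypothetical helper names of our own:

```python
def descends(a: dict, b: dict) -> bool:
    """True if the version with clock `a` has seen everything recorded in `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def reconcile(a: dict, b: dict):
    if descends(a, b):
        return "a wins"    # a supersedes b; b can be discarded
    if descends(b, a):
        return "b wins"
    return "conflict"      # concurrent writes: both versions are kept and
                           # handed back to the client (or merged)

v1 = {"A": 2, "B": 1}            # written through coordinators A, then B
v2 = {"A": 2, "B": 1, "C": 1}    # later write through C: supersedes v1
v3 = {"A": 3, "B": 1}            # concurrent with v2
print(reconcile(v2, v1))   # 'a wins'
print(reconcile(v2, v3))   # 'conflict'
```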
M. Pastis

Pastis [24] is a completely decentralized P2P file system in which multiple users perform read and write operations. It uses Past, a highly scalable P2P storage service which provides a distributed hash table abstraction, and combines Past with Pastry, a P2P key-based routing algorithm, to route messages between large numbers of Past nodes.

For every file, Pastis keeps an inode in which the file's metadata is stored. Each inode is stored in a User Certificate Block (UCB), while file contents are stored in Content Hash Blocks (CHBs). When a user writes to a file, the version counter is incremented and saved in the corresponding inode together with the user's id. To avoid conflicts, if a second user appears and tries to write to the same file, a procedure is triggered to resolve the conflict by comparing the counters and user ids from the other replicas in the network.

The combination of Past and Pastry characterizes Pastis as a highly scalable system in terms of network size and number of concurrent clients. Good locality helps in obtaining optimized routes, while self-organization as well as fault tolerance are achieved thanks to the design. Data is replicated among the nodes, and the system is therefore characterized by high data availability. Write access control and data integrity are implemented, and Pastis can hence be considered secure, under the assumption that users trust each other.

N. TotalRecall

TotalRecall [25] is a P2P storage system that takes into high consideration an important property of storage systems: availability. The system administrator can specify an availability target and, by studying the previous behavior of the peers, the system can predict their future availability, despite the dynamically changing nature of the environment. Depending on the condition of the system, TotalRecall may use replication, erasure coding or hybrid techniques to preserve its redundancy, and it can dynamically repair itself using eager or lazy repair.

Apart from the peers, the TotalRecall system consists of the master host, the storage host and the client host. This description allows us to say that it follows a loose P2P scheme, since not everybody plays the role of a peer. Nevertheless, it is described as scalable despite the number of hosts that join and leave the system continually. Because of the replication technique, data is persistent and available, and high consistency is also observed in the system.

TotalRecall could be used for volunteer computing if lazy repair is chosen together with erasure coding; with these options, TotalRecall performs better in dynamic environments with a high probability of unavailability.

O. Farsite

Farsite [26] is a serverless, distributed storage system that runs on a set of machines and takes advantage of their unused storage and network resources. Although it provides the semantics of a central NTFS file server, it is able to scale and run on several machines using a portion of their storage. Users have access to private and public files through a location-transparent environment. Data replicas are encrypted to provide security, since the nodes themselves are not secure. Moreover, these replicas are distributed among several nodes to provide a reliable system despite the unreliability and frequent unavailability of the nodes. The file structure is hierarchical, maintained by a distributed directory service.

Atomicity and scalability are two important properties of the Farsite system; all tasks are designed as fully atomic actions in order to remain indivisible while they execute. Farsite could be used for volunteer computing, since the management operations can be distributed among the machines and security is provided by the encryption algorithm used. However, it could serve only small volunteer computing systems, since it can scale only up to a certain number of nodes.
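The version-counter conflict rule in Pastis can be sketched as follows; this is our own reading of the scheme, with the tie-breaking on user ids as a hypothetical deterministic rule.

```python
def latest_version(replicas):
    """Pick the authoritative inode among replicas, Pastis-style: the highest
    version counter wins, and ties between concurrent writers are broken by
    comparing user ids (a deterministic, if arbitrary, rule)."""
    return max(replicas, key=lambda r: (r["version"], r["user_id"]))

replicas = [
    {"version": 7, "user_id": "alice", "data": b"old"},
    {"version": 8, "user_id": "bob",   "data": b"newer"},
    {"version": 8, "user_id": "carol", "data": b"also v8"},
]
print(latest_version(replicas)["user_id"])   # 'carol': equal counters, higher id
```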
P. Storage@home

Storage@home [27] is a distributed storage infrastructure designed to store huge amounts of data across many machines which join the system as volunteers. It is based on Folding@home and made its appearance to face the problems of that earlier system. More precisely, its contributors address the problems of backing up and distributing data efficiently among the nodes, keeping in mind the limited bandwidth and the small donation of storage from each node.

Storage@home consists of the volunteers, who have an agent installed on their machines, a registration server, a metadata server, an identity server and a policy engine. The metadata server is responsible for storing information about the location of the files stored in the system and for answering queries about those files. The identity server is responsible for security and identity functionality, as well as for effectively tracking the location of IP hosts, whether they are mobile or dynamic. The registration server is responsible for linking the users' profiles from the old system, Folding@home, with the newly proposed one; this task is hard to implement, since a beneficial aspect of Storage@home is anonymity and the intentional omission of user information. The policy engine behaves as the master of the system, coordinating all of its components. It is responsible for planning where to put replicas of data in order to minimize the chances of data loss, how data can be retrieved, and how it is to be transferred to reach the node that sent a query. It also remains vigilant, performing repair operations whenever needed.

Storage@home has vital requirements that help it preserve its nature both as a storage system and as a volunteer computing system. As a storage system, it should handle failure and recovery operations effectively; as a volunteer computing system, it should manage the relocation of data stored on hosts that have disappeared. While maintaining these requirements, the authors had to face several challenges regarding volunteer recruiting and motivation, policy risk, and host relocation. With respect to recruiting volunteers and keeping them motivated, the system adopted a reward scheme that offers points to volunteers in order to motivate them and place them in a friendly, enjoyable competition. Regarding policy risk, it was quite common for Storage@home to get blocked by companies, ISPs and new policies; storing replicas in different nations, states and ISPs appeared to be a fair solution. Last but not least, host relocation was another great challenge that needed to be considered: the system had to deal with hundreds of students who changed residence (most of the time decreasing their bandwidth) and became slower and less effective. Also, switching a machine off for a long time for travel or maintenance has a cost for the system, and consequently a penalization policy was introduced to make volunteers more responsible about informing the system of any changes in their condition.
In general, the system appears to be reliable, as it manages to prevent the loss of data. It is able to work with thousands of volunteers, showing great scalability, and its functionality is preserved in the presence of churn. Internet connections appear to be the bottleneck of system performance, showing that any other possible pitfalls of the system are not significant, as they cannot outweigh the bandwidth problem.

V. DISCUSSION

All the systems described offer storage distribution following different approaches and architectures. In this section, we discuss to what extent these systems have the properties needed in volunteer computing systems. In Table I we gather all systems and characteristics together, giving a clear view of their state.

1) Symmetry: As previously mentioned, in pure peer-to-peer systems all peers are on the same level with equivalent functionality. Since no volunteer participant has priority over other participants, even though they are controlled by the system's central server, the intended distributed systems should be purely peer-to-peer.

In the world of storage systems, designers have trouble presenting systems with "independent" nodes that work without the guidance of an administrator. In the Frangipani file system, there is an administrator who arranges the states of the nodes, and nodes need permission from the administrator in order to perform a task; thus, this system does not provide a symmetric node network and is not proper for volunteer computing systems. In MongoDB there are three kinds of nodes: standard, passive and arbiter. Similarly, BigTable has master nodes and many tablet servers, and Antiquity contains the role of an administrator among the peers, responsible for the allocation of new log chunks; thus MongoDB, BigTable and Antiquity are not symmetric at the node level. Moreover, Farsite is based on a centralized scheme: some nodes have, for a period of time, authority over some files, their content, directories and user permissions. Similarly, TotalRecall consists of different types of nodes, each type having different responsibilities regarding the files; therefore, in both systems nodes cannot work freely without the permission of "master" nodes. Last but not least, in OceanStore nodes can have different roles, such as server, client, router, or all of them, so it is not symmetric either.

The rest of the systems, as can be seen in Table I, consist of equal nodes and are consequently characterized as symmetric.

2) Availability: In volunteer computing systems, participants can enter and leave the system at random times. In order for data to remain retrievable, the intended storage systems should be highly available despite the unavailability of the participants.

Most of the systems analyzed are highly available, as shown in Table I. However, the FreeHaven system presents a limited level of availability, since there is no replication mechanism, only the periodic trading that keeps data reachable. Similarly, FreeNet has limited availability because of the lack of a replication mechanism, and also because it suffers from poor long-term survivability, especially for non-popular files. The DHash component makes Ivy highly available, since DHash replicates and distributes the blocks of files; participants' logs can thus be available even when the participants themselves are not. Moreover, Frangipani has cluster member components, which are large abstract containers on a highly available block level; these cluster members make Frangipani highly available. Ceph accomplishes availability using RADOS, which manages data replication following a primary-copy replication scheme and also provides update synchronization of the data. One of OceanStore's main goals is to provide availability, as data flows freely and replicas of the data are created. Antiquity uses a secure log distributed among multiple servers, providing a high degree of availability and ensuring that all data can be accessed; if for any reason some data is lost, a repair service is available for recovery. Furthermore, the Farsite system replicates data in order to ensure availability even with the frequent unavailability of nodes. Likewise, Pastis implements a lazy replication protocol to manage replicas on different nodes. TotalRecall has the provision of availability as a main goal and suggests different ways to ensure it, such as redundancy management with dedicated mechanisms, replication, and dynamic repairs in the case of nodes leaving the system permanently.
System          Symmetry   Availability   Scalability   Anonymity   Robustness
FreeHaven       Yes        Mid            Low           High        High
FreeNet         Yes        Mid            High          Mid         High
Ivy             Yes        High           Mid           No          High
Frangipani      No         High           High          No          High
Ceph            Yes        High           High          No          High
OceanStore      No         High           High          No          High
Antiquity       No         High           High          No          High
BigTable        No         High           High          No          High
Dynamo          Yes        High           High          No          High
MongoDB         No         High           High          No          High
Riak            Yes        High           High          No          High
Pastis          Yes        High           High          No          High
TotalRecall     No         High           High          No          High
Farsite         No         High           Mid           No          High
Storage@home    Yes        High           High          High        High

TABLE I: COMPARISON OF THE DIFFERENT STORAGE SYSTEMS

3) Scalability: Scalability is an additional required property. There are three main scaling techniques: replication, for spreading copies of data; caching, for reusing the cached data; and distribution of divided computation [5]. Thus the intended decentralized storage systems should have replication or a similar mechanism.

Of the systems studied, only three do not show high results on the scalability issue. FreeHaven and Ivy do not have scalability as a primary goal and are therefore not highly scalable, while Farsite is limited to scale up to roughly 10^5 nodes, which is quite restrictive.

Unlike these systems, Frangipani is designed to be highly scalable: the Petal servers, which work cooperatively to supply virtual disks to users, are distributed in order to increase scalability. The rest of the storage systems are classified as large-scale storage systems, since they are specifically designed to offer scalability.

4) Anonymity: Participants in volunteer computing systems want to keep their identities secret from others; thus, the intended distributed systems should provide anonymity. From our research, we found that most of the systems do not support anonymity, as it was not among their main concerns.
level of failure recovery providing fault tolerance and self- Systems like FreeHaven, FreeNet offer anonymity as they maintenance mechanisms with automatic repair. Antiquity’s focus in their participants needs. They propose to keep users quorum repair recovers failures and replaces lost replicas identity, thus they increase resistance against censorship. In which makes the system quite robust. fact, for this purpose they scarify efficiency. Like them, users Storage@home provides self-repair operations for each in Ceph and Storage@home are anonymous. Moreover, in node involved. Pastis takes advantage of the fault tolerance Ceph the code runs directly from the user space and the property of the storage layer that it is based on, the Past processes RADOS and CRUSH are executed without revealing DHT . In TotalRecall, things are even easier. Since it deals any information about the identity of the client, even when data primarily with availability, it addresses this issue using repair are distributed. mechanisms which help as well for preserving robustness. Anonymity is not a design issue in Frangipani. Thus each The Farsite system was designed in that way that it handles user in the Frangipani file system are noticeable and can be Byzantine faults and therefore be more robust. detected easily. Like Frangipani, large-scale decentralized TFS is mainly a file system that works underneath storage storage systems such as Dynamo, Riak, BigTable, and systems. Its availability and anonymity are dependent on the MongoDB do not handle anonymity as a design issue. nodes state and whether the nodes by themselves can be available and anonymous. Thus, it is not included in our
discussion, nor in the comparison table.

As shown in Table I, and based on the preceding discussion, Storage@home appears to be the most suitable storage system, with a clear statement that it can be used in volunteer computing systems. It follows a model typical of volunteer computing projects: participants act as volunteers, with the ability to compete and gain points based on their contribution of storage and on their recruitment of others. All users have an agent installed on their machine, which takes action after the user's registration. Data availability is maintained because each machine stores almost half of a file, more precisely up to 40% of it.

VI. CONCLUSION

In this survey, we initially presented the various properties and characteristics that a decentralized storage system must have in order to cooperate efficiently with a volunteer computing system. The challenges that these systems face when combined are scalability, availability, symmetry, anonymity and robustness, all of which were explained in detail. We then selected some systems that we found important and relevant to our study and briefly described each one, associating it with the aforementioned characteristics. A comparison follows that explains each characteristic in depth and covers how each one matters for this merging of decentralized storage and volunteer computing systems. As shown in our discussion, the systems have different capabilities and functionalities, which make each one more appropriate for specific operations. With all the properties laid out and after further investigation, Storage@home is the most suitable, having all the properties that such a system requires.

REFERENCES

[1] M. Placek, R. Buyya, "A Taxonomy of Distributed Storage Systems", Technical Report, Grid Computing and Distributed Systems Laboratory, The University of Melbourne, Australia, July 2006.
[2] P. Yianilos, S. Sobti, "The Evolving Field of Distributed Storage", IEEE Internet Computing, v.5, pp.35-39, 2001.
[3] R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, R. Campbell, "A Survey of Peer-to-Peer Storage Techniques for Distributed File Systems", Proceedings of the International Conference on Information Technology: Coding and Computing, v.2, pp.205-213, Las Vegas, Nevada, April 2005.
[4] H. Ge, "Survey of Distributed Storage Systems", Course Survey for "Advanced Topics in Information Systems", Spring 2004.
[5] B. C. Neuman, "Scale in Distributed Systems", in Readings in Distributed Computing Systems, IEEE Computer Society Press, pp.463-489, 1994.
[6] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, September 2001.
[7] O. Nov, D. Anderson, O. Arazy, "Volunteer Computing: A Model of the Factors Determining Contribution to Community-based Scientific Research", in Proceedings of the 19th International Conference on World Wide Web, pp.741-750, NY, USA, 2010.
[8] D. P. Anderson, "BOINC: A System for Public-Resource Computing and Storage", 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA, 2004.
[9] A. Elwaer, I. Taylor, O. Rana, "Optimizing Data Distribution in Volunteer Computing Systems using Resources of Participants", Scalable Computing: Practice and Experience, Volume 12, Number 2, ISSN 1895-1767, pp.193-208, 2011.
[10] A. Oram (ed.), "Peer-to-Peer: Harnessing the Power of Disruptive Technologies", O'Reilly Media, March 15, 2001.
[11] I. Clarke, O. Sandberg, B. Wiley, T. W. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability, pp.46-66, July 2000.
[12] A. Muthitacharoen, R. Morris, T. M. Gil, B. Chen, "Ivy: A Read/Write Peer-to-Peer File System", SIGOPS Oper. Syst. Rev., Vol. 36, No. SI, pp.31-44, 2002, doi:10.1145/844128.844132.
[13] C. A. Thekkath, T. Mann, E. K. Lee, "Frangipani: A Scalable Distributed File System", SOSP '97: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, New York, NY, USA, 1997.
[14] H. Weatherspoon, P. Eaton, B. Chun, J. Kubiatowicz, "Antiquity: Exploiting a Secure Log for Wide-Area Distributed Storage", ACM SIGOPS Operating Systems Review, v.41 n.3, June 2007.
[15] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, C. Maltzahn, "Ceph: A Scalable, High-Performance Distributed File System", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, November 2006.
[16] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, pp.40-49, September 2001.
[17] J. Cipar, M. D. Corner, E. D. Berger, "Contributing Storage using the Transparent File System", ACM Transactions on Storage, v.3 n.3, October 2007.
[18] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, "Bigtable: A Distributed Storage System for Structured Data", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, v.7, Seattle, WA, November 2006.
[19] "CS262B Advanced Topics in Computer Systems, Spring 2009", Available: http://www.eecs.berkeley.edu/~culler/summary/bigtable.html
[20] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store", Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, Washington, October 2007.
[21] Amazon DynamoDB. Available: http://aws.amazon.com/dynamodb/
[22] MongoDB. Available: http://www.mongodb.org/
[23] Welcome to the Riak Wiki. Available: http://wiki.basho.com/Riak.html
[24] F. Picconi, J.-M. Busca, P. Sens, "Pastis: A Highly-Scalable Multi-User Peer-to-Peer File System", Euro-Par, pp.1173-1182, 2005.
[25] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, G. M. Voelker, "Total Recall: System Support for Automated Availability Management", NSDI, San Francisco, CA, 2004.
[26] W. J. Bolosky, J. R. Douceur, J. Howell, "The Farsite Project: A Retrospective", Proceedings of SIGOPS, France, 2007.
[27] A. L. Beberg, V. S. Pande, "Storage@home: Petascale Distributed Storage", Proceedings of IPDPS, Long Beach, CA, March 2007.