A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

Umit Cavus Buyuksahin, Maria Stylianou, Nicos Demetriou, Muhammad Adnan Khan
Universitat Politecnica de Catalunya (UPC)
E-mail: ucbuyuksahin@gmail.com, marsty5@gmail.com, nicosdem7@gmail.com, malikadnan78@gmail.com



Abstract—Over the last decades, distributed systems have been promoted for extended computations and presented as the ideal storage space for large amounts of data. Distributed storage systems have moved from a centralized architecture to a more decentralized approach. This change allows such systems to be used by volunteer computing systems, where the exploitation of any available storage and resources is essential and greatly needed. This survey explores the characteristics of scalable decentralized storage systems that can be used by volunteer computing systems and discusses the various existing systems in terms of the specified characteristics. For each surveyed system we give a brief description and state whether the required properties are ensured.

Index Terms—decentralized storage systems, volunteer computing systems

I. INTRODUCTION

Storage is one of the fundamental parts of computing [1]. Although it is slower than RAM, it offers great persistence at low cost. Thus, central storage systems were constructed with a focus on reliability, stability, and efficiency. Nowadays, however, computation is not limited to a central storage space but is executed in a global environment such as the Internet. As the Internet becomes part of this computation, it produces huge amounts of information that need to be gathered and stored. To address this challenge, distributed storage systems were introduced. In this design, the data stored by hosts becomes geographically distributed. Because of this distribution and the appearance of huge demands, new challenges arise, such as fault-tolerance, availability, security, robustness, survivability, scalability, and anonymity.

With the growth of the Internet, distributed storage systems are able to scale to larger numbers of users. This growth has raised the difficulty of having one central point for administrating the system. Therefore, it is observed in other surveys that these systems are moving from the centralized architecture to a more decentralized approach [1].

Meanwhile, supercomputers are situated among us executing big computations which require huge storage, power and computational resources, and lead to a rapid decrease of their capacity. Due to this demand, researchers turn to unused storage resources. Globally, there are many personal computers whose resources are not fully used by their owners. Volunteer computing systems aim to use this storage for enormous-sized computations by treating these machines as if they were parts of a huge supercomputer. This is a powerful way to utilize distributed resources in order to complete large-scale tasks.

Volunteer computing systems have two main bases [7]. The first one is the computational base, in which large computation tasks are split into smaller tasks that are assigned to volunteer participants' computers. The second base is called the participative base, and it deploys the large number of volunteer participants who offer their resources.

One of the well-known volunteer computing systems is SETI@home, which runs on the BOINC platform [8]. Nowadays, SETI@home works with about one million computers, which provide a processing rate of approximately 70 TeraFLOPS [8]. This resource usage could be increased further considering the potential resources available worldwide; in practice, the network is already growing rapidly on its own.

These volunteer computing systems produce huge amounts of computational data that should be stored. This data may be used for later processing or shared with other scientific organizations that may contribute to the field. However, today's volunteer computing systems use centralized storage systems [9] to distribute data to participants, and therefore suffer from the limitations of centralized storage, such as fault-tolerance, availability and scalability.

In order to overcome these limitations, new storage systems have been developed which are decentralized and can be used by volunteer computing systems efficiently. As previously mentioned, there are many kinds of decentralized storage systems. However, not all of them are suitable to be used in volunteer computing systems. In this survey we study several storage systems, discuss their characteristics and challenges, and propose the most suitable one to be used in volunteer computing systems.

The rest of the paper is organized as follows: In Section II, we present related work done by other researchers in the field. In Section III, design issues of decentralized storage systems that can be used in volunteer computing systems are examined by extracting characteristics. In Section IV we briefly overview some of the existing decentralized storage systems. Later on, in Section V we compare them regarding their characteristics and benefits and propose the most suitable one to be used in volunteer computing systems. Finally, we conclude the survey with our final remarks about the systems studied.
II. RELATED WORKS

In this section we present the different surveys related to the subject on which we focus. [3] discusses the different properties of peer-to-peer based distributed file systems. It shows the various benefits of using P2P systems, the design issues and properties. In addition, it presents the major distributed file systems, comparing the advantages and disadvantages of each one in detail. Likewise, [4] provides an insight into existing storage systems, giving a good overview of each, and describes the important characteristics they should have. In [1], a variety of distributed storage systems is covered in depth, presenting their functionalities and introducing the reader to the problems that these systems face and the solutions proposed to overcome them. A quite short but rich paper, [2], discusses the evolving area of distributed storage systems and gives a brief summary of some related systems in order to provide a broader view of the subject.

III. PRINCIPAL CHARACTERISTICS OF DECENTRALIZED STORAGE SYSTEMS

Several decentralized storage systems have been proposed over the last years. However, not all of them are suitable for volunteer computing. Specific characteristics should be examined, and we should ensure their existence in the intended storage systems, in order to meet the requirements of volunteer computing systems. Below, we analyze the most important ones, their specifications and effects.

1) Symmetry: Symmetry is a desired characteristic as much for decentralized storage systems as for volunteer computing systems. In the case of storage systems, and more precisely in pure peer-to-peer systems, symmetry exists when all peers are on the same level with equivalent functionality [3]. Similarly, in the case of volunteer computing systems, no volunteer participant has priority or receives special treatment compared to others. Also, volunteers do not need permission from an administrator to execute a task or to save data; by definition, this is done independently and automatically.

2) Availability: In volunteer computing systems, it is expected that participants cannot be forced to enter or leave the system at specific moments. Data should be reachable independently of the peers' status, their location, and the time of the request. Therefore, availability is an essential property for decentralized storage systems in order to be used in volunteer computing systems.

3) Scalability: Another important issue that has to be considered in both storage and volunteer computing systems is the system's scalability. In decentralized systems, it is mandatory that they can scale sufficiently with the number of nodes. Scalability is an essential property for these systems, in order to ensure that their functionality is preserved as the system's size increases.

4) Anonymity: In volunteer computing systems, volunteers highly desire to keep their identity secret while offering their resources. People are less willing to help when they are required to share personal information. Therefore, anonymity in volunteering can increase the number of participants, which is highly appreciated and encouraged. What is more, anonymity can be a way to prevent the denial of access for special groups of people, which is possible when personal information is shared.

5) Robustness: Both types of systems, storage and volunteer computing, are prone to failures, as machines may crash, reboot, or change location with different network characteristics and capabilities. In order to efficiently associate decentralized storage systems with volunteer ones, the former systems should be robust enough to handle these changes and repair themselves in the case of failures, in order to preserve this advantage in volunteer computing systems as well.

IV. DECENTRALIZED STORAGE SYSTEMS

In this section, we present a short summary of each storage system studied, referring to the previously explained properties.

A. FreeHaven

FreeHaven [10] first came with a solution for anonymity, whose implementation is not commonly handled by distributed storage systems. This means that it enables peers to distribute and share data anonymously by protecting the peers' identity. The other goals of FreeHaven are: (a) persistence, for determining the lifetime of documents, (b) flexibility, for changing system functions, and (c) accountability, for limiting damage to the system.

Since there is no hierarchy and all nodes are on the same level, it is a pure peer-to-peer system; it is symmetric and balanced. Although nodes do not have special capabilities, unlike in client-server systems, they do have special roles, such as the author who initially creates documents, the publisher who puts documents into the FreeHaven system, the reader who retrieves documents from the system, and the servers which provide storage. All these nodes have a pseudonym and know each other only by their pseudonyms. Thus, locating the peers is a difficult issue. In addition, tracing the routes is difficult as well, since FreeHaven uses onion routing for broadcasting the queries. The difficulty in both locating peers and tracing routes is intended to protect the user identity, that is, to supply anonymous communication. Server nodes periodically trade parts of documents, called shares, with each other. This trading gives flexibility to the system in the sense that servers can join and leave easily and without special treatment. For trading, nodes are chosen from a node list that is ordered by reputation. While a successful trade increases a node's reputation, malicious behavior decreases it [1]. In order to avoid malicious behavior and limit damage to the system, each node notifies its buddies about share movements. This buddy mechanism supplies accountability. Moreover, FreeHaven is also robust, since it can preserve a document even when a high fraction of its shares is lost.

Because of its pursuit of anonymity, persistence, flexibility and accountability, efficiency and convenience are ignored. In order to supply availability it uses a trading mechanism instead of a replication mechanism, so the system is not highly available [2]. Finally, inefficient broadcast-based communication makes FreeHaven less efficient.
B. FreeNet

FreeNet [11] is an adaptive pure peer-to-peer storage system aimed at publication, replication, and anonymity of authors and readers while data is retrieved. Like FreeHaven, the first goal of FreeNet is anonymity and privacy. However, the anonymity of FreeNet does not cover the whole network; it only covers file transactions, because FreeNet provides anonymity at the application layer instead of the transport layer. Thus, discovering source and destination is infeasible. The other goals of FreeNet are deniability, resistance, efficiency and decentralization.

The nodes in the peer-to-peer FreeNet network query a file that is represented by a location-independent key obtained from hash functions, for anonymity. Each node maintains a local store that is accessible by others for reading and writing, and a dynamic routing table that holds other peers' addresses together with their keys. Whenever a node receives a request, it first checks its local store. If the data exists, the node returns it; otherwise it forwards the request to the node with the nearest key in its routing table. Furthermore, if the request is successful, the intended data travels back along the request path. While data is retrieved, each node on the way also caches this data and inserts the new key into its own routing table. This mechanism provides transparent replication and increases connectivity in the system. In order to cope with limited storage capacity efficiently, node storage is managed with an LRU (Least Recently Used) policy, meaning that data items are sorted by the time of the most recent request, so the most recently requested data sits at the end of the queue. This mechanism does not ensure long-term survivability for less popular files.

The FreeNet protocol is packet-oriented and uses self-contained messages. Each message contains a hops-to-live limit, a depth counter and a randomly generated transaction ID, which makes the corresponding file traceable by nodes. Hops-to-live is set by the sender of the message and prevents indefinite message forwarding. The depth counter is used for setting a sufficient hops-to-live value to ensure that the request will reach its destination; it is incremented at each node. These three values are used in the insert, retrieve and request operations. In order to supply anonymity, FreeNet uses probabilistic routing that does not direct communication towards specific receivers.

Since probabilistic routing is used for providing anonymity, performance and reliability are not addressed. Like FreeHaven, FreeNet sacrifices performance in order to supply anonymous communication. However, because of its dynamic storage and routing, the FreeNet network is highly scalable [3]. Moreover, it is robust against big failures.
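As an illustration of this mechanism, the following minimal Python sketch (our own, with hypothetical names such as FreenetNode; real FreeNet differs in many details) shows hash-derived keys, nearest-key forwarding bounded by hops-to-live, LRU management of the local store, and caching on the return path:

    import hashlib
    from collections import OrderedDict

    def file_key(name: str) -> int:
        # Location-independent key obtained by hashing the file name.
        return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")

    class FreenetNode:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.store = OrderedDict()   # local datastore kept in LRU order
            self.routing = {}            # key -> neighbour believed to hold it

        def cache(self, key, data):
            # Insert or refresh recency; evict the least recently used entry.
            self.store[key] = data
            self.store.move_to_end(key)
            while len(self.store) > self.capacity:
                self.store.popitem(last=False)

        def request(self, key, hops_to_live):
            if key in self.store:                     # hit: refresh recency
                self.store.move_to_end(key)
                return self.store[key]
            if hops_to_live == 0 or not self.routing:
                return None
            # Forward to the neighbour whose advertised key is nearest.
            nearest = min(self.routing, key=lambda k: abs(k - key))
            neighbour = self.routing[nearest]
            data = neighbour.request(key, hops_to_live - 1)
            if data is not None:
                self.cache(key, data)                 # transparent replication
                self.routing[key] = neighbour         # learn a route for this key
            return data

    # Two nodes: b holds the file, a learns it through a request.
    a, b = FreenetNode(), FreenetNode()
    k = file_key("report.pdf")
    b.cache(k, b"...bytes...")
    a.routing[file_key("other")] = b
    assert a.request(k, hops_to_live=3) == b"...bytes..."
    assert k in a.store                               # cached on the way back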
C. Ivy

Ivy [12] is another peer-to-peer storage system, with a file-system-like interface. There is no centralized or dedicated component, so each user is on the same level. Although many other peer-to-peer storage systems support either read or write operations for a single owner, Ivy supports both read and write operations. However, the number of users that can use Ivy is limited; it is designed to be utilized by small groups of cooperative users.

All peers are identical and are able to work either as a client or as a server. Because of its symmetric architecture, it is called a pure peer-to-peer system. Each node has two main components: Chord/DHash for reliable P2P distributed storage, and the Ivy server for transferring data between peers. The architecture is log-based: each peer has its own log that includes user information and changes to the file system. For each NFS operation, a log record is created and stored in Chord/DHash. Since log records are immutable and kept indefinitely, peers can revert any changes. This flexibility is one of the best properties of Ivy. All users can read any log, subject to file permission attributes.

When a file system is created, a set of logs is created and a group of peers is set upon these logs. An entry pointing to each participant's log is put into a view array. This array is traversed by all peers in order to create a snapshot. The logs are ordered in the array and peers use them as records. Several users may therefore use the logs concurrently, which can cause conflicts, since Ivy permits concurrent write operations. For this purpose, Ivy uses close-to-open consistency within a group of peers. Under this consistency model, the Ivy server waits until DHash has received the new log records before committing a modify operation; then the modification is announced. For each NFS operation, peers fetch the latest view array from DHash. Then peers detect concurrent view vectors that affect the same file by traversing the logs. In any conflict condition, the differences are analyzed and merged. For file modification an optimistic approach is used, whereas for file creation a locking approach is used. Thus, when the number of users increases, performance decreases. Because of this limited scalability [1], Ivy is suited to a small group of users.

Every user stores a log of their modifications and, at a specified time interval, generates a snapshot, a process which requires retrieving the logs of all participating users. Although retrieving the logs of all peers causes a performance bottleneck, peers can freely change the file system regardless of other peers' state. The immutable and indefinitely stored logs can be used for reverting changes, but this operation is highly costly. As a result, Ivy distributes its storage, but it only supports a limited write-once/read-many interface [1].
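The log-based design can be illustrated with a small sketch. The following Python fragment is ours: it orders records by a simple (sequence, peer) rule, whereas Ivy compares view vectors, but it shows per-peer append-only logs and a snapshot built by scanning every participant's log:

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class LogRecord:
        # One immutable record per file-system operation, as in Ivy's per-peer logs.
        seq: int        # position in the owner's log
        peer: str       # owning peer (breaks ties deterministically)
        path: str
        data: str

    @dataclass
    class Peer:
        name: str
        log: list = field(default_factory=list)  # append-only

        def write(self, path, data):
            self.log.append(LogRecord(len(self.log), self.name, path, data))

    def snapshot(peers):
        # Build a file-system snapshot by scanning every participant's log,
        # which is why snapshot construction must touch all peers.
        records = sorted((r for p in peers for r in p.log),
                         key=lambda r: (r.seq, r.peer))
        fs = {}
        for r in records:           # later records win; real Ivy merges
            fs[r.path] = r.data     # concurrent view vectors instead
        return fs

    alice, bob = Peer("alice"), Peer("bob")
    alice.write("/readme", "v1")
    bob.write("/readme", "v2")
    print(snapshot([alice, bob]))   # {'/readme': 'v2'}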
D. Frangipani

Frangipani [13] is a high-performance distributed storage system that is utilized by a cooperative group of users. It is not a pure peer-to-peer system, since there is an administrator; however, the design aims to minimize the administrator's operations, which means Frangipani remains simple to manage while many nodes are joining [1]. Moreover, it is designed to be used within an institution that has a secure and private network, so it is not very scalable. Nevertheless, it provides good performance to users, since it stripes data between servers, increasing performance with the number of active servers. Frangipani can also be configured to replicate data [1]. Therefore, it offers redundancy and resilience to failures, which is a crucial property for volunteer computing systems.
Frangipani has three main components. The first one is the Petal server, which provides a virtual disk interface to distributed storage. It looks like local storage, so it presents a transparent interface to users, since the distributed storage is hidden. The second component is the distributed locking service. It supports consistency following a multiple-readers/single-writer locking philosophy, with two types of locks, read and write. When there are multiple changes to a file, this service serializes them to keep consistency by using these locks. Since Frangipani keeps every file in a consistent state through this locking mechanism, its performance degrades noticeably. The third component is the Frangipani file server module, which provides a file-system-like interface. It communicates with the other components to remain in a consistent state with a determined block capacity. Moreover, the Frangipani file server deploys write-ahead redo logging of metadata for recovery. When an error is detected in the file server, the logged data written to a special area in the Petal server is used for recovery. Together with the replication mechanism, this makes Frangipani more robust.

As a result, Frangipani is a distributed file system that can scale in terms of size and performance. However, network capacity is a barrier to its performance because of its design. One of the biggest design problems in Frangipani is that it assumes a secure interconnect in order to scale and operate within an institution [1]. Because of this, it suffers not only in performance but also in scalability. Besides, it assumes that all nodes in the system are trusted, and thus it cannot supply a secure system. Furthermore, the locking mechanism used to keep the system consistent can cause a dramatic performance drop.
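The locking discipline itself is simple to state. Below is a minimal single-process Python sketch of a multiple-readers/single-writer lock (ours; Frangipani's actual service hands out such locks as distributed leases over the network):

    import threading

    class ReadWriteLock:
        # Many concurrent readers, or exactly one writer, never both.
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0
            self._writer = False

        def acquire_read(self):
            with self._cond:
                while self._writer:          # readers wait for the writer
                    self._cond.wait()
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

        def acquire_write(self):
            with self._cond:
                while self._writer or self._readers:   # exclusive access
                    self._cond.wait()
                self._writer = True

        def release_write(self):
            with self._cond:
                self._writer = False
                self._cond.notify_all()

Serializing all updates to a file through acquire_write is what gives the consistency described above, and also why heavy write sharing degrades performance.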
E. Ceph

Ceph [16] is a distributed file system that provides excellent performance, reliability and scalability, and separates data and metadata to the maximum extent. It leverages the intelligence in Object Storage Devices (OSDs) to distribute the complexity surrounding data access, and utilizes a highly adaptive distributed metadata cluster architecture, improving scalability and reliability.

Ceph eliminates file allocation tables and lists and replaces them with generating functions. It comprises clients, clusters of OSDs (which store all data and metadata), and metadata server (MDS) clusters (which manage the namespace: files and directories). File data is striped onto predictably named objects using a special-purpose data distribution function, CRUSH (Controlled Replication Under Scalable Hashing), which assigns objects to storage devices. A novel metadata cluster architecture distributes responsibility for managing the file system directory hierarchy. Clients run on each host executing application code and expose a file system interface to applications. The client code runs entirely in user space and can be accessed either by linking to it directly or as a mounted file system. CRUSH maps data onto a sequence of objects. If one or more clients open a file for read access, an MDS grants them the capability to read and cache file content. The Ceph synchronization model retains its simplicity by providing correct read-write and shared-write semantics between clients via synchronous I/O, and by extending the application interface to relax consistency for performance-conscious distributed applications. File and directory metadata in Ceph is very small, consisting almost entirely of directory entries (file names) and inodes (80 bytes), since, in contrast with conventional file systems, no file allocation metadata is necessary. Object names are constructed using the inode number and distributed to OSDs using CRUSH. In order for Ceph to distribute large amounts of data, a strategy is adopted that distributes new data randomly, migrates a random subsample of existing data to new devices, and uniformly redistributes data from removed devices. To maintain system availability and ensure data safety in a scalable fashion, RADOS (Reliable Autonomic Distributed Object Store) manages its own replication of data using a variant of primary-copy replication. By acknowledging updates only once they are safe, RADOS allows Ceph to realize low-latency updates for efficient application synchronization together with well-defined data safety semantics. For certain failures, such as disk errors or corrupted data, OSDs can self-report. Failures that make an OSD unreachable on the network, however, require active monitoring, which RADOS distributes by having each OSD monitor those peers with which it shares Placement Groups. To facilitate fast recovery, OSDs maintain a version number for each object and a log of recent changes (names and versions of updated or deleted objects) for each Placement Group. A Ceph OSD manages its local object storage with EBOFS, an Extent and B-tree based Object File System.

By shedding design assumptions such as allocation lists, Ceph separates data completely from metadata management, allowing the two to scale independently. RADOS leverages intelligent OSDs to manage data replication, failure detection and recovery, low-level disk allocation, scheduling, and data migration without burdening any central server. Finally, Ceph's metadata management architecture provides a single uniform directory hierarchy that obeys POSIX semantics, with performance that scales as new metadata servers join the system.
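The effect of CRUSH, deterministic and table-free placement computed purely from the object name and the device list, can be imitated with rendezvous (highest-random-weight) hashing. The sketch below is ours, not the CRUSH algorithm, and the object name shown is hypothetical:

    import hashlib

    def _score(obj, osd):
        # Deterministic pseudo-random weight for an (object, device) pair.
        return int.from_bytes(hashlib.sha256(f"{obj}/{osd}".encode()).digest()[:8], "big")

    def place(obj, osds, replicas=3):
        # Rank devices by their score for this object; take the top R.
        # Every client computes the same answer with no allocation table.
        return sorted(osds, key=lambda o: _score(obj, o), reverse=True)[:replicas]

    osds = [f"osd{i}" for i in range(8)]
    name = "10000000001.00000002"   # hypothetical <inode>.<stripe> object name
    print(place(name, osds))                 # e.g. ['osd3', 'osd7', 'osd0']
    print(place(name, osds + ["osd8"]))      # adding a device moves few objects

Like CRUSH, this kind of function lets any party locate an object's replicas by computation alone, and adding or removing a device reshuffles only a small fraction of the objects.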
F. TFS

TFS [17] provides background tasks with large amounts of unreliable storage without impacting the performance of standard file access operations. It allows a peer-to-peer storage system to provide more storage and double its performance, and it has an impact on replication in peer-to-peer storage systems. The problem with contributory storage systems is that application performance degrades: as more storage is activated, file system operations quickly slow down. This is why TFS aims for transparency, meaning no burden on system performance while contributory processes are running. Another problem is that disks are often half empty, yet users are not keen to contribute their free space. TFS is a system that contributes all of the idle space while keeping a very low load on the performance of the local user's system. It stores files in the file system's free space and minimizes interference with the file system's block allocation policy. Normal files can overwrite the contributed files at any time. In addition, there is no impact on the bandwidth needed for replication. TFS is useful for replicated storage systems executing on stable machines with plenty of bandwidth (an environment similar to the one assumed by FARSITE). In a stable network, TFS can offer substantially more storage than dynamic approaches. A small contribution of storage has little impact on the file system's performance, and so TFS ensures the transparency of contributed data. In exchange, it sacrifices file persistence: it achieves good file system performance by minimizing the amount of work needed when writing ordinary files. It records which blocks have been overwritten by marking them as such. If an application tries to open an overwritten file, the system returns an error, the inode/directory entry for that file is deleted, and the space is marked as free. Every time a contributed file is lost this way, TFS detects it, returns an error to the peers, and the file is re-replicated elsewhere.

TFS leaves the allocation of local files intact, avoiding fragmentation issues; it stores files in such a way that they are completely transparent to local access. TFS consistently provides at least as much storage as alternatives without overloading local performance; it can provide about 40 percent more storage than the best user-space technique when the network is quite stable and enough bandwidth is available. This may raise questions concerning availability, but availability in TFS primarily depends on distributed system characteristics such as machine availability, bandwidth, and the amount of storage available.
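A toy block map conveys the transparency idea: contributed blocks are second-class, local writes may silently claim them, and opening a contributed file whose blocks were claimed fails and forgets the entry. The following Python sketch is ours and greatly simplified:

    FREE, LOCAL, CONTRIB = "free", "local", "contrib"

    class TfsDisk:
        def __init__(self, nblocks):
            self.state = [FREE] * nblocks
            self.files = {}            # contributed file -> its block numbers

        def write_contrib(self, name, blocks):
            for b in blocks:
                self.state[b] = CONTRIB
            self.files[name] = blocks

        def write_local(self, block):
            # The local allocation policy is untouched: any non-local block
            # is usable, so a contribution may be silently overwritten.
            self.state[block] = LOCAL

        def open_contrib(self, name):
            blocks = self.files[name]
            if any(self.state[b] != CONTRIB for b in blocks):
                # A local file overwrote part of it: drop the entry and
                # report an error so peers can re-replicate elsewhere.
                del self.files[name]
                raise IOError(f"{name}: contributed data was overwritten")
            return blocks

    disk = TfsDisk(8)
    disk.write_contrib("chunk42", [2, 3])
    disk.write_local(3)                      # local activity always wins
    # disk.open_contrib("chunk42") -> IOError, and chunk42 is forgotten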
G. OceanStore

OceanStore [6] is a global storage infrastructure which automatically recovers from server and network failures, easily incorporates new resources into the system, and adjusts to usage patterns. It combines erasure codes with a Byzantine agreement protocol for consistent update serialization, even when malicious servers are present.

OceanStore consists of individual servers, each cooperating to provide a service. Such a group of servers is called a pool. Data flows freely between these pools, creating replicas of a data object anywhere and thus increasing availability. Because OceanStore is composed of untrusted servers, it utilizes redundancy and client-side cryptographic techniques to protect data. OceanStore attacks the problem of storage-level maintenance with four mechanisms: a self-organizing routing infrastructure, m-of-n data coding with repair, Byzantine update commitment, and introspective replica management. Erasure coding transforms a block of input data into fragments, which are spread over many servers; only a fraction of the fragments is needed to reconstruct the original block. A replica of an object must be exactly the same as the original, despite any failures or corruption of fragments. OceanStore achieves this by naming each object and its associated fragments with the result of a secure hash function over the contents of the object, called a globally unique identifier (GUID). A node can act as a server that stores objects, as a client that initiates requests, as a router that forwards messages, or as all of these. A location- and semantics-independent unique identifier, the NodeID, is assigned to each node, and Tapestry (a self-organizing routing and object location subsystem) uses local neighbor maps to route messages to their destination NodeID, digit by digit. When an OceanStore server inserts a replica into the system, Tapestry publishes its location by putting a pointer to the replica's location at each hop between the new replica and the object's root node. In order to locate an object, a client routes a request towards the object's root until it encounters a replica pointer, which routes directly to that replica.

When a node wants to join, it chooses a random NodeID and a nearby existing node. By routing from this node towards its NodeID, it finds other existing nodes that share suffixes of increasing length, generates a full routing table, and notifies all its neighbors. When a node disappears, its neighbors detect the absence and use backpointers to inform the nodes relying on it. In addition, a server can be removed from OceanStore when it becomes obsolete, needs scheduled maintenance, or has component failures; a shutdown script is executed to inform the system of the server's removal. Even if this script is not used, OceanStore will detect and correct the server's absence. OceanStore's design provides scalability, fault tolerance, and self-maintaining distributed storage through adaptation.
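Two of these mechanisms are easy to make concrete: self-certifying GUIDs and m-of-n coding. The sketch below is ours; it uses a single XOR parity fragment, i.e. an m-of-(m+1) code, which is far weaker than OceanStore's actual codes but shows how a lost fragment is rebuilt and the result verified against the GUID:

    import hashlib
    from functools import reduce

    def guid(data):
        # Self-certifying name: a secure hash of the content, so any
        # reconstructed block can be verified against the name itself.
        return hashlib.sha256(data).hexdigest()

    def encode(block, m=4):
        # m data fragments plus one XOR parity fragment; any m of the
        # m+1 fragments suffice to reconstruct the block.
        block = block.ljust(-(-len(block) // m) * m, b"\0")   # pad
        size = len(block) // m
        frags = [block[i * size:(i + 1) * size] for i in range(m)]
        parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
        return frags + [parity]

    def decode(frags, missing, m=4):
        # Rebuild one lost fragment by XORing the surviving fragments.
        others = [f for i, f in enumerate(frags) if i != missing and f is not None]
        rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*others))
        data = list(frags[:m])
        if missing < m:
            data[missing] = rebuilt
        return b"".join(data)

    block = b"oceanstore example payload"
    name = guid(block)
    frags = encode(block)
    frags[2] = None                                  # lose one fragment
    restored = decode(frags, missing=2)
    assert guid(restored.rstrip(b"\0")) == name      # verified against the GUID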
H. Antiquity

Antiquity [14] provides storage services for file systems and backup applications. It is a wide-area distributed storage system whose design assumes that all servers will eventually fail, and it tries to preserve data integrity even under these failures. Antiquity was developed in the context of OceanStore.

In its model, the client can be an end-user machine, the server in a client-server system, or a replicated service. The system identifies the client and its append-only log by a cryptographic key pair. A log is stored in chunks, and when a new chunk needs to be allocated the administrator is consulted, who authenticates the client and selects a set of storage servers that can host the new chunk. In order to maintain data security, high availability and, most of all, stored data integrity, Antiquity uses a secure log which is replicated on multiple servers. This ensures durability in the sense that no data is lost and all logs can be read. In the case that some logs are not modifiable, due to the failure of some servers or a lack of replicas, a quorum repair protocol replaces lost replicas and eventually restores modifiability. In addition, Antiquity uses dynamic Byzantine fault-tolerant quorums (thresholds) to provide consistency among replicas. Since the data is replicated on multiple servers, it can be retrieved later even after server failures. What is more, Antiquity uses distributed hash tables to connect the storage servers and to monitor their liveness and availability; the DHT stores only pointers that identify the servers on which the actual data is stored.

Antiquity's design pursues integrity, incremental secure writes with random read access, durability, consistency, and efficiency with low overhead. The results from a simulation showed that in almost all checks performed, a quorum of servers was reachable and in a consistent state, thus providing a high degree of availability and consistency. The quorum repair process balances availability and consistency even further.
Concerning scalability, since each log uses a single administrator but multiple administrator instances are allowed, the administrator role scales well: different logs can simply use different administrators.
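The integrity role of the secure log can be sketched with a hash chain, where each append commits to the entire prefix. The Python fragment below is ours and omits the signatures, chunking and replication that Antiquity adds on top:

    import hashlib

    class SecureLog:
        # Hash-chained append-only log: tampering with any earlier entry
        # breaks verification of everything after it.
        def __init__(self):
            self.entries = []                  # (data, head_after_append)
            self.head = b"\0" * 32             # digest of the empty log

        def append(self, data):
            self.head = hashlib.sha256(self.head + data).digest()
            self.entries.append((data, self.head))
            return self.head                   # certified position of the log

        def verify(self):
            h = b"\0" * 32
            for data, head in self.entries:
                h = hashlib.sha256(h + data).digest()
                if h != head:
                    return False
            return h == self.head

    log = SecureLog()
    for rec in (b"create /a", b"write /a v1", b"write /a v2"):
        log.append(rec)
    assert log.verify()
    log.entries[1] = (b"write /a EVIL", log.entries[1][1])   # tamper
    assert not log.verify()                                  # detected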
I. BigTable

BigTable [18] is a large-scale distributed storage system for managing structured data. It is built on top of several existing Google technologies, such as the Google File System, Chubby, and Sawzall, and is used by many of Google's online services. The contributors have flexibility, high performance, and availability as their primary goals.

Essentially, BigTable is a "sparse, distributed, persistent multi-dimensional sorted map" that indexes each (row, column, timestamp) tuple to an array of bytes [19]. Data in BigTable is maintained in tables that are partitioned into row ranges called tablets. Tablets are the units of data distribution and load balancing in BigTable. BigTable consists of three major components: a library that is linked into every client, one master server, and many tablet servers, each managing some number of tablets. Different versions of data are ordered by timestamp. BigTable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key.

Overall, BigTable is tremendously scalable, offering data availability and high performance to its users. However, it does not deal with issues like security among the nodes, or fault-tolerance.
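The data model is compact enough to sketch directly. The following toy Python map is ours; it indexes (row, column, timestamp) tuples, returns the newest version at or before a requested timestamp, and mimics a single-row read-modify-write (the row and column names echo the well-known example from the BigTable paper):

    import bisect

    class TinyBigtable:
        # Sparse sorted map: (row, column) -> versions sorted newest first.
        def __init__(self):
            self.cells = {}

        def put(self, row, column, ts, value):
            bisect.insort(self.cells.setdefault((row, column), []), (-ts, value))

        def get(self, row, column, ts=None):
            # Newest version at or before ts (or the newest overall).
            for neg_ts, value in self.cells.get((row, column), []):
                if ts is None or -neg_ts <= ts:
                    return value
            return None

        def read_modify_write(self, row, column, ts, fn):
            # Atomic within a single row key, as BigTable guarantees; here
            # a plain call stands in for the tablet server's row lock.
            self.put(row, column, ts, fn(self.get(row, column)))

    t = TinyBigtable()
    t.put("com.cnn.www", "contents:", 5, b"<html>v1")
    t.put("com.cnn.www", "contents:", 9, b"<html>v2")
    print(t.get("com.cnn.www", "contents:"))        # b'<html>v2'
    print(t.get("com.cnn.www", "contents:", ts=6))  # b'<html>v1'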
J. Dynamo

Dynamo is a key-value storage system that provides a mapping from keys to values. It is developed and managed by Amazon, which makes it a proprietary database [21]. However, its design has been made available to the research community and has inspired systems such as Cassandra. High availability and scalability are the main design goals of Dynamo. It offers incremental scalability, meaning that the system can grow one node at a time. Moreover, there is no central administrator and all nodes are on the same level.

Dynamo is a combination of distributed hash tables (DHTs) and databases [20]. Keys created by hashing the data are stored in a circular ring structure: each key is assigned to the nearest node in the clockwise direction. Moreover, there are virtual nodes, each of which mimics a physical node but can be responsible for more than one position on the ring. This mechanism provides incremental scalability by solving the partitioning problem. Dynamo has an effective replication mechanism in order to increase the availability of data in the system: each data item is replicated to a specified number of successors, so each node holds replicated data of its predecessors. In addition, the system may hold more than one version of an item to increase availability. Since this can cause inconsistency, vector clocks are used to determine the causal relationship between different versions. These properties increase Dynamo's durability as well as its availability. Besides, the "always writable" property is targeted by Dynamo, which is the second reason for using vector clocks. When a user wants to perform a write operation, the coordinator responsible for the operation first sends the vector clock to the reachable nodes selected from a preference list; the write completes according to the number of responses received. In other words, this mechanism is based on quorums. Lastly, if a node does not respond, it is assumed to have failed; when it is removed from the ring, all surrounding nodes adjust to the new state.

Dynamo is targeted at solving the main problems of database management, such as scalability, availability, reliability and performance. While it offers a highly available and scalable system, it keeps performance high by handling failures. However, anonymity is not a goal of Dynamo.
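A minimal sketch makes the partitioning and replication scheme concrete. The Python fragment below is ours: it hashes virtual nodes onto a ring, assigns a key to its clockwise successor, and builds a preference list of distinct physical nodes:

    import bisect, hashlib

    def h(key):
        # Position on the ring: a hash truncated to 32 bits.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    class Ring:
        def __init__(self, nodes, vnodes=8, replicas=3):
            self.replicas = replicas
            # Each physical node owns several virtual positions on the ring.
            self.points = sorted((h(f"{n}#{v}"), n)
                                 for n in nodes for v in range(vnodes))

        def preference_list(self, key):
            # First clockwise successor coordinates the key; the next
            # distinct physical nodes hold the replicas.
            i = bisect.bisect(self.points, (h(key), chr(0x10FFFF)))
            owners, seen = [], set()
            for j in range(len(self.points)):
                node = self.points[(i + j) % len(self.points)][1]
                if node not in seen:
                    seen.add(node)
                    owners.append(node)
                if len(owners) == self.replicas:
                    break
            return owners

    ring = Ring(["A", "B", "C", "D"])
    print(ring.preference_list("user:1234"))   # e.g. ['C', 'A', 'D']

Because positions are deterministic, adding a node claims only the ring segments adjacent to its virtual positions, which is exactly the incremental scalability described above.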
K. MongoDB

MongoDB [22] is a scalable, high-performance, open-source, document-oriented structured storage system. It provides document-oriented storage with full index support, auto-sharding, sophisticated replication, and compatibility with the Map/Reduce paradigm.

Instead of storing data in tables and rows, as is done in relational databases, MongoDB stores data with dynamic schemas. The goal of MongoDB is to bridge the gap between key-value stores and relational databases. MongoDB has two separate constructs for multi-node topologies, which are often combined in the highest-performance systems: replica sets and sharded replica sets. Replica sets are an asynchronous cluster replication technology, and sharding is an automatic data distribution system. Increasing the number of instances in a replica set provides horizontal scalability for read performance and fault-tolerance. Increasing the number of shards (each one being a replica set) distributes distinct data, providing horizontal scalability for write performance.

MongoDB has features similar to relational databases, like indexes and dynamic queries. It accomplishes availability by supporting asynchronous replication of data between servers, and it also features a backup and repair mechanism using journaling, which increases durability and robustness. Changing the data model from relational to document-oriented provides greater agility through flexible schemas and easier horizontal scaling.

L. Riak

Riak [23] is a key-value storage system that is inspired by Dynamo. Like Dynamo, it is distributed, highly available and scalable. It uses a map-reduce mechanism to reduce the functional limitations of the key-value model and to increase the power of querying over data stored in Riak. Riak provides a fault-tolerant service to its users, and this property increases its robustness.

Since it is inspired by Amazon's Dynamo storage system, analyzed above, Riak has many similarities with it. It combines database storage and distributed hash tables (DHTs). Like Dynamo, it uses consistent hashing to map keys onto its ring, and all nodes on this ring are identical. Whenever a node joins the network, it is assigned a set of key-range partitions, which are then replicated to achieve a more available system. Like Dynamo, write and read operations are done based on quorums. Concurrent operation requests are not handled with locks, because of performance issues. Instead of a lock mechanism, vector clocks are used in order to make the system resilient to failures and keep it consistent. Another strong point of Riak is the use of map-reduce in querying: request messages are directed to a set of nodes instead of being propagated over all nodes.

Riak has a symmetric structure at the node level, since it does not have any super or master node. Moreover, it meets several design goals of the intended decentralized storage systems, such as high availability, scalability and robustness. However, anonymity is not handled in this design, and since it is a relatively new system it has many compatibility problems.
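The vector-clock bookkeeping shared by Dynamo and Riak reduces to a few lines. In the sketch below (ours), a version that dominates another element-wise supersedes it, while mutually non-dominating versions are reported as a conflict to be reconciled:

    def merge(a, b):
        # Element-wise maximum: the clock of a write superseding both.
        return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

    def descends(a, b):
        # True if version a causally descends from (or equals) version b.
        return all(a.get(k, 0) >= v for k, v in b.items())

    def compare(a, b):
        if descends(a, b) and descends(b, a):
            return "equal"
        if descends(a, b):
            return "a-newer"        # b is obsolete and can be discarded
        if descends(b, a):
            return "b-newer"
        return "conflict"           # siblings: keep both for reconciliation

    v1 = {"nodeA": 1}                     # written through coordinator A
    v2 = merge(v1, {}); v2["nodeA"] = 2   # a later write through A
    v3 = merge(v1, {}); v3["nodeB"] = 1   # a concurrent write through B
    print(compare(v2, v1))  # a-newer
    print(compare(v2, v3))  # conflict -> reconciliation needed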
M. Pastis

Pastis [24] is a completely decentralized P2P file system with multiple users performing read and write operations. It uses Past, a highly scalable P2P storage service which provides a distributed hash table abstraction, and combines Past with Pastry, a P2P key-based routing algorithm, to route messages between large numbers of Past nodes.

For every file, Pastis keeps an inode in which the file's metadata is stored. Each inode is stored in a User Certificate Block (UCB), and file contents are stored in Content Hash Blocks (CHBs). When a user writes to a file, the version counter is increased and saved to the corresponding inode along with the user's id. To avoid conflicts, if a second user appears and tries to write to the same file, a procedure is triggered to resolve the conflict by comparing the counters and users' ids from other replicas in the network.

The combination of Past and Pastry characterizes Pastis as a highly scalable system in terms of network size and number of concurrent clients. Good locality helps in acquiring optimized routes, while self-organization as well as fault tolerance are achieved thanks to the design. Data is replicated among the nodes, and the system is therefore characterized by high data availability. Write access control and data integrity are implemented, and Pastis is considered secure under the assumption that users trust each other.
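Under the stated scheme, conflict resolution amounts to comparing version stamps across replicas. The sketch below is ours and assumes a simple highest-(counter, id)-wins rule; the exact Pastis procedure may differ:

    from dataclasses import dataclass

    @dataclass
    class Inode:
        # Pastis-style version stamp: a counter plus the writer's id.
        version: int
        user_id: str

    def newest(replicas):
        # Highest counter wins; the user id is only a deterministic
        # tie-breaker for writes with equal counters.
        return max(replicas, key=lambda r: (r.version, r.user_id))

    replicas = [Inode(3, "carol"), Inode(4, "alice"), Inode(4, "bob")]
    print(newest(replicas))   # Inode(version=4, user_id='bob')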
N. TotalRecall

TotalRecall [25] is a P2P storage system that gives high consideration to an important property of storage systems: availability. The system administrator can specify an availability target and, by studying the previous behavior of the peers, the system can predict their future availability despite the dynamically changing nature of the environment. Depending on the condition of the system, TotalRecall may use replication, erasure coding, or hybrid techniques for preserving its redundancy, while it can dynamically repair itself using eager or lazy repair.

Apart from the peers, the TotalRecall system consists of the master host, the storage host and the client host. This description allows us to say that it follows a loose P2P scheme, since not everybody has the role of a peer. However, it is described as scalable despite the number of hosts that continually join and leave the system. Because of the replication technique, data is persistent and available, and high consistency is also observed in the system.

TotalRecall could be used for volunteer computing in the case where lazy repair is chosen together with erasure coding. With these options, TotalRecall performs better in dynamic environments with a high possibility of unavailability.

O. Farsite

The Farsite system [26] is a serverless, distributed storage system that runs on a set of machines and takes advantage of their unused storage and network resources. Although it provides the semantics of a central NTFS file server, it is able to scale and run on several machines using a portion of their storage.

Users have access to private and public files through a location-transparent environment. Data replicas are encrypted to provide security, since the nodes themselves are not secure. Moreover, these replicas are distributed among several nodes to provide a reliable system despite the unreliability and frequent unavailability of the nodes. The file structure is based on a hierarchy, maintained by a distributed directory service.

Atomicity and scalability are two important properties of the Farsite system. All tasks are designed as fully atomic actions in order to remain undivided while they are executed. Farsite could be used for volunteer computing, since the management operations can be distributed among the machines and security is provided by the encryption algorithm used. Though, it could only be used for small volunteer computing systems, since it can scale up to a certain number of nodes.

P. Storage@home

Storage@home [27] is a distributed storage infrastructure designed to store huge amounts of data across many machines which join the system as volunteers. It is based on Folding@home and made its appearance to face the problems of that previous system. More precisely, the contributors address the problems of backing up and distributing data efficiently among the nodes, keeping in mind the limited bandwidth and the small donation of storage from each node.

Storage@home consists of the volunteers, who have an agent installed on their machines, a registration server, a metadata server, an identity server and a policy engine. The Metadata Server is responsible for storing information about the location of the files stored in the system and for allowing queries for those files. The Identity Server is responsible for the security and identity functionality, as well as for effectively tracking the location of IP hosts, whether they are mobile or dynamic. The Registration Server is responsible for linking the users' profiles from the old system, Folding@home, with this new proposed system. This task is hard to implement, since a beneficial aspect of Storage@home is its anonymity and the intentional omission of user information. The Policy Engine behaves as the master of the system, coordinating all of its components.
A SURVEY ON LARGE-SCALE DECENTRALIZED STORAGE SYSTEMS TO BE USED BY VOLUNTEER COMPUTING SYSTEMS                                     8



to plan where to put replicas of data in order to minimize          in order to do a task. Thus, this system does not provide a
the chances of data loss, how data can be retrieved and how         symmetric node network and it is not proper for volunteer
to be transferred to reach the node that has sent a query. It       computing systems. Also, In MongoDB there are three kind of
also remains vigilant to perform repair operations when it is       nodes: Standard, Passive, and Arbiter. Similarly to MongoDB,
needed.                                                             BigTable has master nodes and many chunk servers. Also,
   Storage@home has vital requirements that help it preserve its nature as a storage system as well as a volunteer computing system. As a storage system, it should handle failure and recovery operations effectively, and as a volunteer computing system it should manage the relocation of data stored on hosts that disappear. While maintaining the above requirements, the authors had to face several challenges regarding volunteer recruiting and motivation, policy risk and host relocation. With respect to recruiting volunteers and keeping them motivated, the system adopted a reward scheme that offers points to volunteers, placing them in a friendly competition that keeps participation fun. Regarding the policy risk, it was quite common for Storage@home to get blocked by companies, ISPs and new policies; therefore, storing replicas in different nations, states and ISPs appeared to be a fair solution. Last but not least, host relocation was another great challenge that needed to be considered. The system had to deal with hundreds of students who were changing residence - most of the time decreasing their bandwidth - and becoming slower and less effective. Also, switching machines off for long periods, for traveling or maintenance purposes, cost the system, and consequently a penalization policy was introduced to make the volunteers more responsible about informing the system of any changes in their condition. In general, this system appears to be reliable, as it manages to prevent the loss of data. It is able to work with thousands of volunteers, showing great scalability, and its functionality is preserved in the presence of churn. Internet connections appear to be the bottleneck of system performance, showing that any other possible pitfalls of the system are not significant, as they cannot outweigh the bandwidth problem.
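The replica-placement answer to the policy risk can be sketched as a greedy diversity rule (an illustration under assumed labels, not the project's published algorithm): never let two replicas share a nation or an ISP, so that a single blocking policy cannot reach all copies.

```python
def place_replicas(hosts, k):
    """Pick k hosts so replicas spread across nations and ISPs.
    hosts: iterable of dicts like {"id": "h1", "nation": "US", "isp": "A"}."""
    chosen, nations, isps = [], set(), set()
    # First pass demands both a fresh nation and a fresh ISP; the
    # relaxed pass accepts either, and the tail fills up with anything.
    for strict in (True, False):
        for h in hosts:
            if len(chosen) == k:
                return chosen
            fresh = (h["nation"] not in nations, h["isp"] not in isps)
            if h not in chosen and (all(fresh) if strict else any(fresh)):
                chosen.append(h); nations.add(h["nation"]); isps.add(h["isp"])
    return chosen + [h for h in hosts if h not in chosen][: k - len(chosen)]

hosts = [{"id": "h1", "nation": "US", "isp": "A"},
         {"id": "h2", "nation": "US", "isp": "B"},
         {"id": "h3", "nation": "DE", "isp": "C"}]
print([h["id"] for h in place_replicas(hosts, 2)])   # ['h1', 'h3']
```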
                        V. DISCUSSION

   All systems described offer storage distribution following different approaches and architectures. In this section, we discuss to what extent these systems have the properties that are needed in volunteer computing systems. In Table 1 we gather all systems and characteristics together, giving a clear view of their state.
   1) Symmetry: As previously mentioned, in pure peer-to-peer systems all peers are on the same level with equivalent functionality. Since each volunteer participant has no priority over other participants - even though all are controlled by the system's central server - the intended distributed systems should be purely peer-to-peer.
   In the world of storage systems, scientists have trouble presenting systems with "independent" nodes that work without the guidance of an administrator. In the Frangipani file system, there is an administrator who arranges the states of nodes, and nodes need to obtain permission from the administrator in order to perform a task. Thus, this system does not provide a symmetric node network and is not proper for volunteer computing systems. Also, in MongoDB there are three kinds of nodes: Standard, Passive, and Arbiter. Similarly to MongoDB, BigTable has master nodes and many chunk servers, and Antiquity contains the role of an administrator among the peers, who is responsible for the new chunk allocation of file logs. Thus MongoDB, BigTable and Antiquity are not symmetric in terms of node roles. Moreover, Farsite is based on a centralized scheme: some nodes have - for a period of time - authority over some files, their content, directory and user permissions. Similarly, TotalRecall consists of different types of nodes, each type having different responsibilities regarding the files. Therefore, in both systems nodes cannot work freely without the permission of other "master" nodes. Last but not least, in OceanStore nodes can have different roles, such as a server, a client, a router, or all of them; thus it is not symmetric.
   The rest of the systems, as can be seen in Table 1, consist of equal nodes and are subsequently characterized as symmetric.
                                                  Characteristics
                  Systems          Symmetry   Availability   Scalability   Anonymity   Robustness
                  FreeHaven          Yes         Mid            Low          High        High
                  FreeNet            Yes         Mid            High         Mid         High
                  Ivy                Yes         High           Mid          No          High
                  Frangipani         No          High           High         No          High
                  Ceph               Yes         High           High         No          High
                  OceanStore         No          High           High         No          High
                  Antiquity          No          High           High         No          High
                  BigTable           No          High           High         No          High
                  Dynamo             Yes         High           High         No          High
                  MongoDB            No          High           High         No          High
                  Riak               Yes         High           High         No          High
                  Pastis             Yes         High           High         No          High
                  TotalRecall        No          High           High         No          High
                  Farsite            No          High           Mid          No          High
                  Storage@home       Yes         High           High         High        High
                                                     TABLE I
                                   COMPARISON OF DIFFERENT STORAGE SYSTEMS
   2) Availability: In volunteer computing systems, participants can enter and leave the system at random times. In order to retrieve data, the intended storage systems should be highly available, despite the unavailability of the participants.
   Most of the systems analyzed are highly available, as shown in Table 1. However, the FreeHaven system presents a limited level of availability, since there is no replication mechanism, only the periodical trading that keeps data available. Similarly, FreeNet has limited availability because of its lack of replication mechanisms and also because it suffers from poor long-term survivability, especially for non-popular files.
   The DHash component of Ivy makes it highly available, since DHash replicates and distributes the blocks of files; thus participants' logs can be available even if the participants themselves are not. Moreover, Frangipani has cluster member components that act as large abstract containers on a highly available block level; these cluster members make Frangipani highly available. Ceph accomplishes availability using RADOS, which manages data replication following a primary-copy replication scheme and also provides update synchronization of the data. In OceanStore, one of the main goals is to provide availability: data flows freely, and thus replicas of the data are created. Antiquity uses a secure log which is distributed among multiple servers, thus providing a high degree of availability and ensuring that all data can be accessed; if for any reason some data is lost, a repair service is available for recovery.
   Furthermore, the Farsite system replicates data in order to ensure availability even with the frequent unavailability of the nodes. Likewise, Pastis implements a lazy replication protocol to manage replicas on different nodes. TotalRecall has the provision of availability as a main goal, and it suggests different ways to ensure it, such as redundancy management with specified mechanisms, replication, and dynamic repairs in the case of nodes leaving the system permanently.
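All of these replication schemes rest on the same arithmetic: if each replica host is independently online with probability p and a block is kept on k hosts, the block is unreachable only when all k are offline at once. As an illustrative back-of-the-envelope calculation (the numbers are ours, not measurements of any surveyed system):

```latex
A(k) = 1 - (1 - p)^{k},
\qquad p = 0.5:\quad A(3) = 0.875,
\qquad A(10) = 1 - 2^{-10} \approx 0.999
```

Even mediocre volunteer uptime therefore yields high availability once the replication factor reaches double digits, which is why churn-heavy systems lean so heavily on replication and repair.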
   3) Scalability: Scalability is an additional required property. There are three main scaling techniques: replication for spreading copies of data, caching for reusing data already fetched, and distribution of divided computation [5]. Thus the intended decentralized storage systems should have replication or a similar mechanism.
   Of the systems studied, only three do not show high results on the scalability issue. FreeHaven and Ivy do not have scalability as a primary goal and therefore they are not highly scalable. Farsite is limited to scale up to about 10^5 nodes, which is quite restrictive.
   Unlike these systems, Frangipani is designed to be highly scalable: the Petal servers, the components that work cooperatively to supply virtual disks to users, are distributed in order to increase scalability. The rest of the storage systems are classified as large-scale storage systems, since they are specially designed to offer scalability.
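The "distribution" technique above is commonly realized with consistent hashing, which Dynamo [20] and Riak [23] are known to build on; the sketch below is a generic illustration rather than either system's implementation. Keys are spread over nodes, and a joining or leaving volunteer moves only a small fraction of them:

```python
import bisect
import hashlib

def _h(value):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []                      # sorted list of (point, node)
        for n in nodes:
            self.add(n)

    def add(self, node):
        # Each node owns many virtual points, smoothing the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_h(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        """The first ring point clockwise from the key's hash owns it."""
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["volunteer-a", "volunteer-b", "volunteer-c"])
owner_before = ring.lookup("block-1234")
ring.add("volunteer-d")       # a join moves only ~1/4 of all keys
print(owner_before, "->", ring.lookup("block-1234"))
```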
   4) Anonymity: Participants in volunteer computing systems want to keep their identities secret from others. Thus, the intended distributed systems should provide anonymity. From our research, we found out that most of the systems do not support anonymity, as it was not among their main concerns.
   Systems like FreeHaven and FreeNet offer anonymity, as they focus on their participants' needs. They aim to conceal users' identities, and thus they increase resistance against censorship; in fact, for this purpose they sacrifice efficiency. Like them, users in Ceph and Storage@home are anonymous. Moreover, in Ceph the code runs directly in user space, and the RADOS and CRUSH processes are executed without revealing any information about the identity of the client, even when data are distributed.
   Anonymity is not a design issue in Frangipani; thus each user in the Frangipani file system is noticeable and can be detected easily. Like Frangipani, large-scale decentralized storage systems such as Dynamo, Riak, BigTable and MongoDB do not handle anonymity as a design issue.
   5) Robustness: By definition, volunteers can come and go, may crash, or may change their network status. Therefore, volunteer computing systems - and by extension storage systems - should be robust enough to face these situations.
   All systems studied are highly robust, thanks to various reasons and mechanisms. In FreeHaven, while peers are trading, copies of data are kept for a while until the trade proves trustworthy; although this mechanism is not good for performance, it increases the robustness of FreeHaven. Moreover, the buddy system makes it robust, since the buddies of each node can regenerate lost data. Frangipani uses a write-ahead redo logging mechanism to recover from failures easily. In the Freenet protocol, a failure message is forwarded back to the owner of the request without propagating to any other nodes, so the original requester can make another request; this property of the Freenet protocol makes it robust against failures.
   MongoDB and Riak have replication mechanisms that make these systems large-scale and fault-tolerant; these characteristics provide highly robust systems. Like them, BigTable and Dynamo have great robustness, since they are highly scalable.
   Ceph has a very good mechanism for disk failure monitoring and detection, as well as fast recovery, using different structures for the file system and keeping a version number for each object. In addition, one of OceanStore's main goals is to provide a high level of failure recovery, offering fault tolerance and self-maintenance mechanisms with automatic repair. Antiquity's quorum repair recovers from failures and replaces lost replicas, which makes the system quite robust.
   Storage@home provides self-repair operations for each node involved. Pastis takes advantage of the fault-tolerance property of the storage layer it is based on, the Past DHT. In TotalRecall, things are even easier: since it deals primarily with availability, it addresses this issue using repair mechanisms, which also help preserve robustness. The Farsite system was designed in such a way that it handles Byzantine faults and is therefore more robust.
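The write-ahead redo logging that Frangipani relies on deserves a concrete sketch (a generic toy under simplified assumptions, not Frangipani's actual log format): an update is appended to the log before the state is modified, so after a crash a recovery process can simply replay the log.

```python
import json

class RedoLog:
    """Toy write-ahead redo log: append first, apply second."""
    def __init__(self, path):
        self.path = path

    def append(self, op):
        with open(self.path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()                 # log the op before applying it

    def replay(self, state):
        """Reapply every logged op; redo ops are idempotent overwrites."""
        try:
            with open(self.path) as f:
                for line in f:
                    op = json.loads(line)
                    state[op["key"]] = op["value"]
        except FileNotFoundError:
            pass
        return state

log = RedoLog("/tmp/redo.log")
log.append({"key": "block-7", "value": "v2"})   # logged before the write
state = log.replay({})                          # recovery after a crash
print(state)                                    # {'block-7': 'v2'}
```

In Frangipani the logs live on the shared Petal virtual disk, so a surviving server can run recovery on behalf of a crashed one.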
   TFS is mainly a file system that works underneath storage systems. Its availability and anonymity depend on the state of the nodes and on whether the nodes themselves can be available and anonymous. Thus, it is not included in our discussion nor in the comparison table.
   As shown in Table 1 and based on our previous discussion, Storage@home seems to be the most proper storage system, with a clear statement that it can be used in volunteer computing systems. It follows a model typical of volunteer computing projects, and participants act as volunteers with the ability to compete and gain points based on their contribution in storage and their recruitment process. All users have an agent installed on their machine, which takes action after the users' registration. Data availability is maintained because each machine stores almost half the size of a file - to be more precise, up to 40%.
                       VI. CONCLUSION

   In this survey, we initially presented the various properties and characteristics that a decentralized storage system must have in order to cooperate efficiently with a volunteer computing system. The challenges that these systems can face when combined are scalability, availability, symmetry, anonymity and robustness, all of which are explained in detail. We selected some systems that we found important and related to our field of study and briefly described each one, associating the aforementioned characteristics with them. A comparison follows that explains each characteristic in depth and covers how each one is important for this merge of decentralized storage and volunteer computing systems. As shown in our discussion, all systems have different capabilities and functionalities, which make each one more appropriate for specific operations. With all the properties put down and after further investigation, Storage@home is the most proper one, having all the properties that such a system requires.
                         REFERENCES
[1] M. Placek, R. Buyya, "A taxonomy of distributed storage systems", Technical Report, Grid Computing and Distributed Systems Laboratory, The University of Melbourne, Australia, July 2006.
[2] P. Yianilos, S. Sobti, "The evolving field of distributed storage", IEEE Internet Computing, v.5, pp.35-39, 2001.
[3] R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, R. Campbell, "A Survey of Peer-to-Peer Storage Techniques for Distributed File Systems", Proceedings of the International Conference on Information Technology: Coding and Computing, v.2, pp.205-213, Las Vegas, Nevada, April 2005.
[4] H. Ge, "Survey of Distributed Storage Systems", Course Survey for "Advanced Topics in Information Systems", Spring 2004.
[5] B. C. Neuman, "Scale in Distributed Systems", in Readings in Distributed Computing Systems, IEEE Computer Society Press, pp.463-489, 1994.
[6] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, September 2001.
[7] O. Nov, D. Anderson, O. Arazy, "Volunteer Computing: A Model of the Factors Determining Contribution to Community-based Scientific Research", in Proceedings of the 19th International Conference on World Wide Web, pp.741-750, NY, USA, 2010.
[8] D. P. Anderson, "BOINC: A System for Public-Resource Computing and Storage", 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA, 2004.
[9] A. Elwaer, I. Taylor, O. Rana, "Optimizing Data Distribution in Volunteer Computing Systems using Resources of Participants", Scalable Computing: Practice and Experience, Volume 12, Number 2, ISSN 1895-1767, pp.193-208, 2011.
[10] A. Oram (ed.), "Peer-to-Peer: Harnessing the Power of Disruptive Technologies", O'Reilly Media, March 15, 2001.
[11] I. Clarke, O. Sandberg, B. Wiley, T. W. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability, pp.46-66, July 2000.
[12] A. Muthitacharoen, R. Morris, T. M. Gil, B. Chen, "Ivy: A Read/Write Peer-to-Peer File System", SIGOPS Oper. Syst. Rev., Vol. 36, No. SI, pp.31-44, 2002, doi:10.1145/844128.844132.
[13] C. A. Thekkath, T. Mann, E. K. Lee, "Frangipani: A Scalable Distributed File System", SOSP '97: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, New York, NY, USA, 1997.
[14] H. Weatherspoon, P. Eaton, B. Chun, J. Kubiatowicz, "Antiquity: exploiting a secure log for wide-area distributed storage", ACM SIGOPS Operating Systems Review, v.41 n.3, June 2007.
[15] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, C. Maltzahn, "Ceph: a scalable, high-performance distributed file system", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, November 2006.
[16] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, pp.40-49, September 2001.
[17] J. Cipar, M. D. Corner, E. D. Berger, "Contributing Storage using the Transparent File System", ACM Transactions on Storage 3, 3, October 2007.
[18] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, et al., "Bigtable: A Distributed Storage System for Structured Data", Proceedings of the 7th USENIX Symposium on OSDI, v.7, Seattle, WA, November 2006.
[19] "CS262B Advanced Topics in Computer Systems, Spring 2009". Available: http://www.eecs.berkeley.edu/~culler/summary/bigtable.html
[20] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store", Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, Washington, October 2007.
[21] Amazon DynamoDB. Available:
    http://aws.amazon.com/dynamodb/
[22] MongoDB. Available: http://www.mongodb.org/
[23] Welcome to the Riak Wiki. Available: http://wiki.basho.com/Riak.html
[24] F. Picconi, J-M. Busca, P. Sens, "Pastis: a highly-scalable multi-user peer-to-peer file system", EuroPar, pp.1173-1182, 2005.
[25] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, G. M. Voelker, "Total Recall: System Support for Automated Availability Management", NSDI, San Francisco, CA, 2004.
[26] W. J. Bolosky, J. R. Douceur, J. Howell, "The Farsite project: a retrospective", Proceedings of SIGOPS, France, 2007.
[27] A. L. Beberg, V. S. Pande, "Storage@home: Petascale Distributed Storage", Proceedings of IPDPS, Long Beach, CA, March 2007.

Contenu connexe

Tendances

1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed SystemsDaminda Herath
 
Final Project IEEE format
Final Project IEEE formatFinal Project IEEE format
Final Project IEEE formatFaizan Ahmed
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systemsnaveedchak
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed SystemsRupsee
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel systemManish Singh
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit INANDINI SHARMA
 
Distributed dbms cs712 power point slides lecture 1
Distributed dbms   cs712 power point slides lecture 1Distributed dbms   cs712 power point slides lecture 1
Distributed dbms cs712 power point slides lecture 1Aimal Syeda
 
Chapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systemsChapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systemsFrancelyno Murela
 
Chapter 1-distribute Computing
Chapter 1-distribute ComputingChapter 1-distribute Computing
Chapter 1-distribute Computingnakomuri
 
Distributed OS - An Introduction
Distributed OS - An IntroductionDistributed OS - An Introduction
Distributed OS - An IntroductionSuhit Kulkarni
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating systemudaya khanal
 
Distributed systems1
Distributed systems1Distributed systems1
Distributed systems1Sumita Das
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)Sri Prasanna
 
Distributed System
Distributed SystemDistributed System
Distributed SystemIqra khalil
 
Chapter 1 introduction
Chapter 1 introductionChapter 1 introduction
Chapter 1 introductionTamrat Amare
 
Intro to Distributed Database Management System
Intro to Distributed Database Management SystemIntro to Distributed Database Management System
Intro to Distributed Database Management SystemAli Raza
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Fazli Amin
 
Fragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A SurveyFragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A SurveyEditor IJMTER
 

Tendances (20)

1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed Systems
 
Final Project IEEE format
Final Project IEEE formatFinal Project IEEE format
Final Project IEEE format
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit I
 
Distributed dbms cs712 power point slides lecture 1
Distributed dbms   cs712 power point slides lecture 1Distributed dbms   cs712 power point slides lecture 1
Distributed dbms cs712 power point slides lecture 1
 
Chapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systemsChapter 1 -_characterization_of_distributed_systems
Chapter 1 -_characterization_of_distributed_systems
 
Chapter 1-distribute Computing
Chapter 1-distribute ComputingChapter 1-distribute Computing
Chapter 1-distribute Computing
 
istributed system
istributed systemistributed system
istributed system
 
Distributed OS - An Introduction
Distributed OS - An IntroductionDistributed OS - An Introduction
Distributed OS - An Introduction
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 
Distributed systems1
Distributed systems1Distributed systems1
Distributed systems1
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
 
Distributive operating system
Distributive operating systemDistributive operating system
Distributive operating system
 
Distributed System
Distributed SystemDistributed System
Distributed System
 
Chapter 1 introduction
Chapter 1 introductionChapter 1 introduction
Chapter 1 introduction
 
Intro to Distributed Database Management System
Intro to Distributed Database Management SystemIntro to Distributed Database Management System
Intro to Distributed Database Management System
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 
Fragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A SurveyFragmentation as a Part of Security in Distributed Database: A Survey
Fragmentation as a Part of Security in Distributed Database: A Survey
 

Similaire à A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

DISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxDISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxvinaypandey170
 
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologiesA survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologiessharefish
 
- Introduction - Distributed - System -
- Introduction - Distributed - System  -- Introduction - Distributed - System  -
- Introduction - Distributed - System -ssuser7c150a
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.pptsirajmohammed35
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)Dinesh Modak
 
NoSql And The Semantic Web
NoSql And The Semantic WebNoSql And The Semantic Web
NoSql And The Semantic WebIrina Hutanu
 
Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes SAhammedShakil
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYcseij
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsEditor IJCATR
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsEditor IJCATR
 
Efficient Cloud Caching
Efficient Cloud CachingEfficient Cloud Caching
Efficient Cloud CachingIJERA Editor
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.pptbalewayalew
 
Chap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lectureChap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lectureMuhammad Arslan
 
Data Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information SystemsData Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information Systemsijceronline
 
Lect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptxLect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptxPardonSamson
 

Similaire à A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems (20)

DISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docxDISTRIBUTED SYSTEM.docx
DISTRIBUTED SYSTEM.docx
 
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologiesA survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
 
Tr 85.4
Tr 85.4Tr 85.4
Tr 85.4
 
- Introduction - Distributed - System -
- Introduction - Distributed - System  -- Introduction - Distributed - System  -
- Introduction - Distributed - System -
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.ppt
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)
 
NoSql And The Semantic Web
NoSql And The Semantic WebNoSql And The Semantic Web
NoSql And The Semantic Web
 
Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes
 
Distributed Systems.pptx
Distributed Systems.pptxDistributed Systems.pptx
Distributed Systems.pptx
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Efficient Cloud Caching
Efficient Cloud CachingEfficient Cloud Caching
Efficient Cloud Caching
 
Chapter 1-Introduction.ppt
Chapter 1-Introduction.pptChapter 1-Introduction.ppt
Chapter 1-Introduction.ppt
 
Chap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lectureChap 01 lecture 1distributed computer lecture
Chap 01 lecture 1distributed computer lecture
 
Chapter One.ppt
Chapter One.pptChapter One.ppt
Chapter One.ppt
 
Distributed clouds — micro clouds
Distributed clouds — micro cloudsDistributed clouds — micro clouds
Distributed clouds — micro clouds
 
Data Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information SystemsData Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information Systems
 
Lect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptxLect 2 Types of Distributed Systems.pptx
Lect 2 Types of Distributed Systems.pptx
 

Plus de Maria Stylianou

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareSPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareMaria Stylianou
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksMaria Stylianou
 
Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Maria Stylianou
 
Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Maria Stylianou
 
Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Maria Stylianou
 
Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Maria Stylianou
 
Automatic Energy-based Scheduling
Automatic Energy-based SchedulingAutomatic Energy-based Scheduling
Automatic Energy-based SchedulingMaria Stylianou
 
Intelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesIntelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesMaria Stylianou
 
Instrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkInstrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkMaria Stylianou
 
Low-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersLow-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersMaria Stylianou
 
How Companies Learn Your Secrets
How Companies Learn Your SecretsHow Companies Learn Your Secrets
How Companies Learn Your SecretsMaria Stylianou
 
EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services Maria Stylianou
 
EEDC - Distributed Systems
EEDC - Distributed SystemsEEDC - Distributed Systems
EEDC - Distributed SystemsMaria Stylianou
 

Plus de Maria Stylianou (16)

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication MiddlewareSPARJA: a Distributed Social Graph Partitioning and Replication Middleware
SPARJA: a Distributed Social Graph Partitioning and Replication Middleware
 
Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible Attacks
 
Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)
 
Erlang in 10 minutes
Erlang in 10 minutesErlang in 10 minutes
Erlang in 10 minutes
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
 
Google's Dremel
Google's DremelGoogle's Dremel
Google's Dremel
 
Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee
 
Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee
 
Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...
 
Automatic Energy-based Scheduling
Automatic Energy-based SchedulingAutomatic Energy-based Scheduling
Automatic Energy-based Scheduling
 
Intelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesIntelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet Services
 
Instrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkInstrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel Benchmark
 
Low-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersLow-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic Registers
 
How Companies Learn Your Secrets
How Companies Learn Your SecretsHow Companies Learn Your Secrets
How Companies Learn Your Secrets
 
EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services
 
EEDC - Distributed Systems
EEDC - Distributed SystemsEEDC - Distributed Systems
EEDC - Distributed Systems
 

Dernier

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Dernier (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems

  • 1. A SURVEY ON LARGE-SCALE DECENTRALIZED STORAGE SYSTEMS TO BE USED BY VOLUNTEER COMPUTING SYSTEMS 1 A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer Computing Systems Umit Cavus Buyuksahin, Maria Stylianou, Nicos Demetriou, Muhammad Adnan Khan Abstract—Over the last decades, distributed systems are pro- their capacity. Due to this demand, researchers turn to unused moted for extended computations and are presented as the ideal storage resources. Globally, there are many personal computers storage space for large amounts of data. Distributed Storage whose resources are not fully used by their owners. Volunteer Systems have been moved from the centralized architecture to a more decentralized approach. This change allows such systems to computing systems aim to use these storage for enormous- be used by volunteer computing systems, where the exploitation sized computations by considering them as if they were parts of any available storage and resources is essential and greatly of a huge supercomputer. This is a powerful way to utilize needed. This survey explores the characteristics of scalable distributed resources, in order to complete large-scale tasks. decentralized storage systems that can be used by volunteer Volunteer computing systems have two main bases [7]. The computing systems and discusses the various existing systems in terms of the specified characteristics. For each surveyed system first one is the computational base, in which large computa- we give a brief description and whether the required properties tion tasks are split into smaller tasks which are assigned to are ensured. volunteer participants’ computers. The second base is called Index Terms—decentralized storage systems, volunteer com- participative base and it deploys large number of volunteer puting systems participants who offer their resources. One of the well known volunteer computing systems is SETI@home launched by BOINC projects [8]. Nowadays, I. INTRODUCTION SETI@home works with about one million computers which Storage is one of the fundamental parts of the computing provide approximately 70 TeraFLOPs processing rate [8]. [1]. Although it has lower speed than RAM, it has great Of course this resource usage can be increased when we persistence and low cost. Thus, central storage systems were look at the potential resource in the world. However this is constructed and focused on reliability, stability, and efficiency. unnecessary since the network is growing rapidly. However, nowadays computation is not limited on a central These volunteer computing systems produce huge amounts storage space, but it is executed in a global environment, of computational data that should be stored. This data may like Internet. As Internet becomes part of this computation, it be used for later processing or sharing with other scientific produces huge amounts of information that need to be gathered organizations that may contribute to science area. However, and stored. For addressing this challenge, distributed storages today’s volunteer computing systems use centralized stor- systems are introduced. In this design, data stored by hosts age systems [9] to distribute data to participants. It suffers become geographically distributed. Because of this distribu- from limitations of centralized storage systems such as fault- tion and the appearance of huge demands, new challenges tolerance, availability and scalability. 
arise, such as fault-tolerance, availability, security, robustness, In order to pass over these limitations, new storage systems survivability, scalability, anonymity. are developed which are decentralized and can be used by With the grow of Internet, distributed storage systems are volunteer computing systems efficiently. As previously men- able to scale using larger amounts of users. This growth tioned, there are many kind of decentralized storage systems. has emerge the difficulty of having one central point for However, not all of them are suitable to be used in volunteer administrating the system. Therefore, it is observed in other computing systems. In this survey we study several storage surveys that these systems are moving from the centralized systems, we discuss their characteristics and challenges and we architecture to a more decentralized approach [1]. propose the most proper one to be used in volunteer computing Meanwhile, supercomputers are situated among us exe- systems. cuting big computations which require huge storage, power The rest of the paper is organized as follows: In section 3, and computational resources, and lead to a rapid decrease of we present related work done by other researches in the field. In section 4, design issues of decentralized storage systems Umit Cavus Buyuksahin, Universitat Politecnica de Catalunya (UPC). E- that can be used in volunteer computing systems are examined mail: ucbuyuksahin@gmail.com Maria Stylianou, Universitat Politecnica de Catalunya (UPC). E-mail: by extracting characteristics. In section 5 we briefly overview marsty5@gmail.com some of the existing decentralized storage systems. Later on, Nicos Demetriou, Universitat Politecnica de Catalunya (UPC). E-mail: in section 6 we compare them regarding their characteristics nicosdem7@gmail.com Muhammad Adnan Khan, Universitat Politecnica de Catalunya (UPC). E- and benefits and propose the most suitable one to be used in mail:malikadnan78@gmail.com volunteer computing systems. Finally, in section 6 we conclude
  • 2. A SURVEY ON LARGE-SCALE DECENTRALIZED STORAGE SYSTEMS TO BE USED BY VOLUNTEER COMPUTING SYSTEMS 2 the survey with our final remarks about the systems studied. anonymity in volunteering can increase the number of par- ticipants which is highly appreciated and encouraged. What is II. R ELATED W ORKS more, anonymity can be a way to prevent the denial of access for special groups of people, which is possible when personal In this section we present the different surveys related information is shared. to the subject that we are focused on. [3] discusses the 5) Robustness: Both types of systems, storage and vol- different properties of the Peer-to-Peer based distributed file unteer computing are prone to failures, as machines may systems. It shows the various benefits of using P2P systems, crash, reboot, or change location with different network char- the design issues and properties. In addition it presents the acteristics and capabilities. In order to efficiently associate major distributed file systems comparing the advantages and decentralized storage systems with volunteer ones, the former disadvantages for each one in detail. As well, [4] provides an systems should be robust enough to handle these changes and insight into existing storage systems, giving a good overview repair themselves in the case of failures, in order to preserve of each and describes the important characteristics they should this advantage in volunteer computing systems as well. have. In [1], a variety of distributed storage systems is covered in depth, presenting their functionalities and putting the reader into the problems that these systems face and the solutions IV. D ECENTRALIZED S TORAGE S YSTEMS proposed to overcome them. A quite short but rich paper is In the following section, we present a short summary for the [2] discusses the evolving area of distributed storage systems storage systems studied, referring to the previously explained and gives a brief summary of some related systems in order properties. to provide a broader view for the subject. A. FreeHaven III. P RINCIPAL C HARACTERISTICS OF D ECENTRALIZED FreeHaven [10] firstly came with a solution about S TORAGE S YSTEMS anonymity whose implementation is not commonly handled by Several decentralized storage systems have been proposed distributed storage systems. This means that it provides peers over the last years. However, not all of them are suitable to distribute and share data anonymously by protecting peers’ for volunteer computing. Specific characteristics should be identity. The other goals of FreeHaven are: (a) Persistence for examined and we should ensure their existence in the intended determining lifetime of documents, (b) Flexibility for changing storage systems, in order to meet the requirements of volunteer systems functions, (c) Accountability for limiting damage to computing systems. Below, we analyze the most important system. ones, their specifications and effects. Since there is not a hierarchy and all nodes are on the 1) Symmetry: Symmetry is a desired characteristic as much same level, it is a pure peer-to-peer system, it is symmetric for decentralized storage systems as for volunteer computing and balanced. Despite of the fact that nodes do not have spe- systems. 
A. FreeHaven

FreeHaven [10] was among the first systems to offer a solution for anonymity, a property whose implementation is not commonly handled by distributed storage systems: it allows peers to distribute and share data anonymously by protecting the peers' identity. The other goals of FreeHaven are: (a) persistence, for determining the lifetime of documents; (b) flexibility, for changing system functions; and (c) accountability, for limiting damage to the system.

Since there is no hierarchy and all nodes are on the same level, it is a pure peer-to-peer system: symmetric and balanced. Although nodes do not have special capabilities, unlike in client-server systems, they do have special roles, such as the author who initially creates documents, the publisher who puts documents into the FreeHaven system, the reader who retrieves documents from the system, and the servers who provide storage. All these nodes have a pseudonym and know each other only by their pseudonyms; thus, locating a peer is difficult. Tracing routes is equally difficult, since FreeHaven uses onion routing to broadcast queries. The difficulty of both locating peers and tracing routes protects user identity, i.e., it supplies anonymous communication. Server nodes periodically trade parts of documents, called shares, with each other. This trading gives flexibility to the system, in the sense that servers can join and leave easily and without special treatment. For trading, nodes are chosen from a node list ordered by reputation: a successful trade increases a node's reputation, while malicious behavior decreases it [1]. In order to discourage malicious behavior and limit the damage to the system, each node notifies its buddies about share movements; this buddy mechanism supplies accountability. Moreover, FreeHaven is robust, since it can keep a document intact even when a high fraction of its shares is lost.

Because of its pursuit of anonymity, persistence, flexibility and accountability, efficiency and convenience are neglected. To supply availability it uses the trading mechanism instead of a replication mechanism, thus the system is not highly available [2]. Finally, the inefficient broadcasts used for communication make FreeHaven less efficient.
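To make the reputation-ordered trading concrete, the following minimal Python sketch is our own illustration rather than FreeHaven's code; the class, the scoring constants and the success check are all hypothetical.

```python
import random

class Node:
    """A FreeHaven-style server holding shares of documents (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.reputation = 0.0   # grows with successful trades
        self.shares = []        # document shares currently hosted

    def pick_trading_partner(self, peers):
        # Partners are chosen from a node list ordered by reputation.
        ranked = sorted(peers, key=lambda p: p.reputation, reverse=True)
        return ranked[0] if ranked else None

def trade(giver, taker, share):
    """Move one share between servers and adjust reputations accordingly."""
    succeeded = random.random() > 0.1   # stand-in for a real success check
    if succeeded:
        taker.shares.append(share)
        giver.shares.remove(share)
        giver.reputation += 1.0         # successful trades raise reputation
        taker.reputation += 1.0
        # the giver would now notify the share's buddies of the movement
    else:
        giver.reputation -= 2.0         # misbehavior is penalized more heavily
    return succeeded

n1, n2 = Node("srv1"), Node("srv2")
n1.shares.append("share-0")
trade(n1, n2, "share-0")
```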
B. FreeNet

FreeNet [11] is an adaptive, pure peer-to-peer storage system for publication and replication that preserves the anonymity of authors and readers while data is retrieved. Like FreeHaven, the first goal of FreeNet is anonymity and privacy. However, the anonymity of FreeNet does not cover the whole network; it applies only to file transactions, because FreeNet provides anonymity at the application layer instead of the transport layer. Thus, discovering the source and destination of a transfer is infeasible. The other goals of FreeNet are deniability, resistance, efficiency and decentralization.

The nodes in the peer-to-peer FreeNet network query a file that is represented by a location-independent key, obtained from hash functions for anonymity. Each node maintains a local store that is accessible for others to read and write, and a dynamic routing table that maps keys to the addresses of other peers. Whenever a node receives a request, it first checks its local store. If the data exists, it returns it; otherwise it forwards the request to the node that has the nearest key in its routing table. If the request succeeds, the intended data travels back along the same path as the request. While data is returned, each node on the way also caches this data and inserts the new key into its own routing table. This mechanism provides transparent replication and increases connectivity in the system. In order to cope efficiently with limited storage capacity, node storage is managed with an LRU (Least Recently Used) policy, meaning that data items are sorted based on the time of the most recent request: the most recently requested data sits at the end of the queue. This mechanism does not ensure long-term survivability for less popular files.

The FreeNet protocol is packet-oriented and uses self-contained messages. Each message contains a hops-to-live limit, a depth counter and a randomly generated transaction ID, which makes the corresponding file traceable by nodes. Hops-to-live is set by the sender of the message and prevents indefinite message forwarding. The depth counter, incremented at each node, is used by a replying node to set a hops-to-live value sufficient for the reply to reach its destination. These three values are used in the insert, retrieve and request operations. In order to supply anonymity, FreeNet uses probabilistic routing that does not direct communication towards specific receivers.

Since probabilistic routing is used to provide anonymity, performance and reliability are not addressed. Like FreeHaven, FreeNet sacrifices performance in order to supply anonymous communication. However, thanks to its dynamic storage and routing, the FreeNet network is highly scalable [3]. Moreover, it is robust against large failures.
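The key-closeness routing with a hops-to-live bound, and the caching along the return path, can be sketched as follows. This is our simplified illustration (integer keys, a recursive call in place of real messages), not FreeNet's protocol code, and it omits the depth counter and transaction IDs.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}     # local data store: key -> data
        self.routing = {}   # routing table: known key -> neighbouring Node

def request(node, key, hops_to_live):
    """Resolve a key FreeNet-style, caching data along the return path."""
    if key in node.store:
        return node.store[key]
    if hops_to_live == 0 or not node.routing:
        return None   # failure flows back to the original requester
    # Forward to the neighbour advertising the key nearest to the target.
    nearest = min(node.routing, key=lambda k: abs(k - key))
    data = request(node.routing[nearest], key, hops_to_live - 1)
    if data is not None:
        node.store[key] = data                     # transparent replication
        node.routing[key] = node.routing[nearest]  # learn a route for the key
    return data

a, b, c = Node("A"), Node("B"), Node("C")
a.routing = {20: b}
b.routing = {30: c}
c.store[30] = "document"
print(request(a, 30, hops_to_live=5))   # 'document'; A and B now cache key 30
```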
C. Ivy

Ivy [12] is another peer-to-peer storage system, with a file-system-like interface. There is no centralized or dedicated component, so every user is on the same level. Although many other peer-to-peer storage systems support either read or write operations for a single owner only, Ivy supports both read and write operations. However, the number of users that can use Ivy is limited; it is therefore designed to be utilized by small groups of cooperative users.

All peers are identical and are able to work either as a client or as a server. Because of this symmetric architecture, Ivy is called a pure peer-to-peer system. Each node has two main components: Chord/DHash, for reliable P2P distributed storage, and the Ivy server, for transferring data between peers. The architecture is log-based: each peer has its own log that records user information and changes to the file system, so for each NFS operation a log record is created and stored in Chord/DHash. Since log records are immutable and kept indefinitely, peers can withdraw any changes; this flexibility is one of the best properties of Ivy. All users can read any log, subject to file permission attributes.

When a file system is created, a set of logs is created and a group of peers is set up upon these logs. An entry pointing to each participant's log is placed in a view array. This array is traversed by all peers in order to create a snapshot. The logs are ordered in the array and peers use them for their records, so several users may use a log concurrently. This can cause conflicts, since Ivy permits concurrent write operations. For this purpose, Ivy uses close-to-open consistency within a group of peers: the Ivy server waits for DHash to receive the new log records before committing a modify operation, and only then is the modification announced. For each NFS operation, peers take the latest view array from DHash; they then check for concurrent view vectors that affect the same file by traversing the logs. Whenever a conflict is found, the differences are analyzed and merged. An optimistic approach is used for file modification, while a locking approach is used for file creation. Consequently, performance decreases as the number of users grows; because of this limited scalability [1], Ivy is suited to small groups of users.

Every user stores a log of their modifications and, at a specified time interval, generates a snapshot, a process which requires retrieving the logs of all participating users. Although retrieving all peers' logs causes a performance bottleneck, peers can freely change the file system regardless of other peers' state. The immutable, indefinitely stored logs can be used for withdrawing changes, but this operation is very costly. As a result, Ivy distributes its storage, but it only supports a limited write-once/read-many interface [1].
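A minimal sketch of the per-user, append-only logs that readers merge into a file-system view follows; this is our own toy model, in which plain sequence numbers stand in for Ivy's version vectors and the tie-breaking rule is hypothetical.

```python
import itertools

class Log:
    """An Ivy-style per-user log of immutable file-system records."""
    def __init__(self, owner):
        self.owner = owner
        self.records = []   # append-only; existing records never change

    def append(self, seq, path, data):
        self.records.append({"seq": seq, "owner": self.owner,
                             "path": path, "data": data})

def read_file(path, logs):
    """Scan every participant's log and return the newest version of a file.

    Concurrent writes (equal sequence numbers) are resolved deterministically
    by comparing owner names, a simplification of Ivy's conflict merging."""
    versions = [r for r in itertools.chain(*(log.records for log in logs))
                if r["path"] == path]
    if not versions:
        return None
    return max(versions, key=lambda r: (r["seq"], r["owner"]))["data"]

alice, bob = Log("alice"), Log("bob")
alice.append(1, "/notes", b"draft")
bob.append(2, "/notes", b"revised")
print(read_file("/notes", [alice, bob]))   # b'revised'
```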
D. Frangipani

Frangipani [13] is a high-performance distributed storage system utilized by a cooperative group of users. It is not a pure peer-to-peer system, since there is an administrator, but it aims to minimize the administrator's operations: Frangipani keeps administration simple even while many nodes are joining [1]. Moreover, it is designed to be used within an institution that has a secure and private network; thus, it is not very scalable. However, it provides good performance to its users, since it stripes data between servers, increasing performance with the number of active servers. Frangipani can also be configured to replicate data [1], thereby offering redundancy and resilience to failures, which is a crucial property for volunteer computing systems.

Frangipani has three main components. The first is the Petal server, which provides a virtual disk interface to distributed storage. The virtual disk looks like local storage, so it offers a transparent interface to users: the distributed storage is hidden. The second component is the distributed locking service. It enforces consistency following a multiple-readers/single-writer locking philosophy. There are two types of locks, read and write; when there are multiple changes to a file, this service serializes them using these locks to keep consistency. Since Frangipani keeps every file in a consistent state via this locking mechanism, its performance degrades considerably. The third component is the Frangipani file server module, which provides a file-system-like interface. It communicates with the other components in order to remain in a consistent state within the determined block capacity. Moreover, the Frangipani file server deploys write-ahead redo logging of metadata for recovery: when an error is detected in the file server, the logged data written in a special area of the Petal server is used for recovery. Together with the replication mechanism, this makes Frangipani more robust.

As a result, Frangipani is a distributed file system that can scale in terms of size and performance, although network capacity is a barrier to its performance because of a design decision: one of the biggest design problems of Frangipani is that it assumes a secure interconnect in order to scale, and it operates within a single institution [1]. Because of this, it suffers not only in performance but also in scalability. Besides, it assumes that all nodes in the system are trusted, and thus it cannot supply a secure system. Finally, the locking mechanism used to keep the system consistent can cause a dramatic performance drop.
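The multiple-readers/single-writer discipline enforced by the distributed locking service can be sketched in a few lines. The sketch below is a purely local, thread-based illustration of the semantics, assuming Python's standard threading primitives; it is not Frangipani's distributed lock protocol.

```python
import threading

class ReadWriteLock:
    """Multiple readers may hold the lock at once; a writer excludes all."""
    def __init__(self):
        self._readers = 0
        self._mutex = threading.Lock()
        self._no_readers = threading.Condition(self._mutex)

    def acquire_read(self):
        with self._mutex:
            self._readers += 1

    def release_read(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:
                self._no_readers.notify_all()

    def acquire_write(self):
        self._mutex.acquire()        # excludes other writers...
        while self._readers > 0:     # ...and waits out current readers
            self._no_readers.wait()

    def release_write(self):
        self._mutex.release()
```

Serializing every change to a file through such a lock is exactly what keeps the system consistent, and also what the paragraph above identifies as Frangipani's main performance cost.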
E. Ceph

Ceph [15] is a distributed file system that provides excellent performance, reliability and scalability, and that separates data from metadata to the maximum possible extent. It leverages the intelligence of Object Storage Devices (OSDs) to distribute the complexity surrounding data access, and it utilizes a highly adaptive distributed metadata cluster architecture, improving scalability and reliability.

Ceph eliminates file allocation tables and lists, replacing them with generating functions. It comprises clients, clusters of OSDs (which store all data and metadata) and metadata server (MDS) clusters (which manage the namespace: files and directories). File data is striped onto predictably named objects using a special-purpose data distribution function, CRUSH (Controlled Replication Under Scalable Hashing), which assigns objects to storage devices. A novel metadata cluster architecture distributes the responsibility for managing the file system directory hierarchy. Clients run on each host, executing application code and exposing a file system interface to applications. The client code runs entirely in user space and can be accessed either by linking to it directly or as a mounted file system. CRUSH maps data onto a sequence of objects. If one or more clients open a file for read access, an MDS grants them the capability to read and cache the file content. The Ceph synchronization model retains its simplicity by providing correct read-write and shared-write semantics between clients via synchronous I/O, while extending the application interface to relax consistency for performance-conscious distributed applications. File and directory metadata in Ceph is very small, consisting almost only of directory entries (file names) and inodes (80 bytes); in contrast with conventional file systems, no file allocation metadata is necessary. In Ceph, object names are constructed using the inode number and distributed to OSDs using CRUSH. In order to distribute large amounts of data, Ceph adopts a strategy that distributes new data randomly, migrates a random subsample of existing data to new devices, and uniformly redistributes data from removed devices. To maintain system availability and ensure data safety in a scalable fashion, RADOS (Reliable Autonomic Distributed Object Store) manages its own replication of data using a variant of primary-copy replication. By acknowledging updates only once data safety is provided, RADOS allows Ceph to realize low-latency updates for efficient application synchronization together with well-defined data safety semantics. For certain failures, such as disk errors or corrupted data, OSDs can self-report. Failures that make an OSD unreachable on the network, however, require active monitoring, which RADOS distributes by having each OSD monitor the peers with which it shares placement groups. To facilitate fast recovery, OSDs maintain a version number for each object and a log of recent changes (names and versions of updated or deleted objects) for each placement group. Each Ceph OSD manages its local object storage with EBOFS, an Extent and B-tree based Object File System.

By shedding traditional design assumptions such as allocation lists, Ceph separates data completely from metadata management, allowing the two to scale independently. RADOS leverages intelligent OSDs to manage data replication, failure detection and recovery, low-level disk allocation, scheduling, and data migration without burdening any central server. Finally, Ceph's metadata management architecture provides a single uniform directory hierarchy obeying POSIX semantics, whose performance scales as new metadata servers join the system.
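The table-free placement idea, whereby any party can compute where an object lives from its name alone, can be illustrated with plain hashing. Note that this is only a stand-in of ours: real CRUSH descends a weighted device hierarchy and handles rebalancing, which the sketch ignores.

```python
import hashlib

def place_object(inode_no, stripe_no, osds, replicas=3):
    """Deterministically map an object to OSDs without any allocation table.

    Object names are derived from the inode number and stripe index, as in
    Ceph; the hash-based placement below is a toy stand-in for CRUSH."""
    replicas = min(replicas, len(osds))
    name = f"{inode_no:x}.{stripe_no:08x}"
    picked = []
    attempt = 0
    while len(picked) < replicas:
        h = hashlib.sha256(f"{name}:{attempt}".encode()).digest()
        osd = osds[int.from_bytes(h[:4], "big") % len(osds)]
        if osd not in picked:          # replicas go to distinct devices
            picked.append(osd)
        attempt += 1
    return name, picked

# Any client computes the same placement, with no lookup table to consult.
print(place_object(0x1234, 0, ["osd0", "osd1", "osd2", "osd3", "osd4"]))
```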
F. TFS

TFS [17] provides background tasks with large amounts of unreliable storage without impacting the performance of standard file access operations; it can allow a peer-to-peer storage system to provide more storage and double its performance, and it has a notable impact on replication in peer-to-peer storage systems. The problem with contributory storage systems is that application performance degrades: as more storage is activated, file system operations quickly slow down. This is why TFS aims at transparency, that is, not burdening system performance while contributory processes are running. Another problem is that disks are often half empty, yet users are not keen to contribute their free space. TFS is a system that contributes all of the idle space while keeping a very low load on the performance of the local user's system. It stores files in the file system's free space and minimizes interference with the file system's block allocation policy. Ordinary files can overwrite the contributed files at any time. In addition, there is no impact on the bandwidth needed for replication. TFS is useful for replicated storage systems executing on stable machines with plenty of bandwidth (an environment similar to the one used in Farsite). In a stable network, TFS can offer substantially more storage than dynamic contribution schemes. A small contribution of storage has little impact on the file system's performance, and so TFS ensures the transparency of contributed data. In exchange, it sacrifices file persistence: it achieves good file system performance by minimizing the amount of work needed when writing ordinary files. It records which blocks have been overwritten by marking them as such. If an application tries to open an overwritten file, the system returns an error, the inode/directory entry for that file is deleted, and the space is marked as free. Every time a file is deleted, TFS detects this, returns an error to the peers, and the file is replicated elsewhere.

TFS leaves the allocation of local files intact, avoiding fragmentation issues; it stores files in such a way that they are completely transparent to local access. TFS consistently provides at least as much storage as alternative approaches without overloading local performance: it can provide about 40 percent more storage than the best user-space technique when the network is quite stable and enough bandwidth is available. This may raise questions concerning availability, but TFS primarily depends on the distributed system's characteristics, such as machine availability, bandwidth and the amount of storage available.

G. OceanStore

OceanStore [6] is a global storage infrastructure which automatically recovers from server and network failures, incorporates new resources easily, and adjusts to usage patterns. It combines erasure codes with a Byzantine agreement protocol for consistent update serialization, even when malicious servers are present.

OceanStore consists of individual servers, each cooperating to provide a service; such a group of servers is called a pool. Data flows freely between these pools, creating replicas of a data object anywhere and thus increasing availability. Because OceanStore is composed of untrusted servers, it utilizes redundancy and client-side cryptographic techniques to protect data. OceanStore attacks the problem of storage-level maintenance with four mechanisms: a self-organizing routing infrastructure, m-of-n data coding with repair, Byzantine update commitment, and introspective replica management. Erasure coding transforms a block of input data into fragments which are spread over many servers; only a fraction of the fragments is needed to reconstruct the original block. A replica of an object must be exactly the same as the original, despite any failures or corruption of fragments. OceanStore achieves this by naming each object and its associated fragments with the result of a secure hash function applied to the object's contents, called a globally unique identifier (GUID). A node can act as a server that stores objects, as a client that initiates requests, as a router that forwards messages, or as all of these. A unique identifier, the NodeID (independent of location and semantics), is assigned to each node, and Tapestry (a self-organizing routing and object location subsystem) uses local neighbor maps to route messages to their destination NodeID, digit by digit. When an OceanStore server inserts a replica into the system, Tapestry publishes its location by placing a pointer to the replica's location at each hop between the new replica and the object's root node. In order to locate an object, a client routes a request towards the object's root until it encounters a replica pointer, which routes the request directly to that replica.

When a node wants to join, it chooses a random NodeID and a node close to itself. By routing from this NodeID, it finds other existing nodes that share suffixes of increasing length, generates a full routing table, and all the neighbors are notified. When a node disappears, its neighbors detect the absence and use backpointers to inform the nodes relying on it. In addition, a server can be removed from OceanStore when it becomes obsolete, needs scheduled maintenance or suffers component failures; a shutdown script informing the system of the server's removal is executed, and even if this script is not used, OceanStore will detect and correct the server's absence. OceanStore's design thus provides scalable, fault-tolerant, self-maintaining distributed storage through adaptation.
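The self-verifying naming scheme is easy to illustrate: because an object's GUID is a secure hash of its contents, a client can check data fetched from an untrusted server against the name alone. The sketch below is ours and assumes SHA-256 as the hash; it says nothing about OceanStore's actual encoding.

```python
import hashlib

def guid(content: bytes) -> str:
    """Content-derived name: any party can verify data against its GUID."""
    return hashlib.sha256(content).hexdigest()

block = b"some replicated object"
name = guid(block)

# A client re-checks a block fetched from an untrusted server:
fetched = b"some replicated object"
assert guid(fetched) == name   # corruption or tampering would fail here
```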
H. Antiquity

Antiquity [14] provides storage services for file systems and backup applications. It is a wide-area distributed storage system whose design assumes that all servers will eventually fail, and it tries to preserve data integrity even under these failures. Antiquity was developed in the context of OceanStore. In its model, the client can be an end-user machine, the server in a client-server system, or a replicated service. The system identifies the client and its append-only log by a cryptographic key pair. A log is stored in chunks, and when a new chunk needs to be allocated, the administrator is consulted, who authenticates the client and selects a set of storage servers that can host the new chunk. In order to maintain data securely, with high availability and, above all, stored-data integrity, Antiquity uses a secure log which is replicated on multiple servers. Durability is thus ensured, in the sense that no data is lost and all logs can be read. In case some logs become unmodifiable due to server failures or a lack of replicas, a quorum repair protocol replaces the lost replicas and eventually restores modifiability. In addition, Antiquity uses a dynamic Byzantine fault-tolerant quorum (threshold) protocol to provide consistency among replicas. Since the data is replicated on multiple servers, it can be retrieved later even after server failures. What is more, Antiquity uses distributed hash tables to connect the storage servers and to monitor their liveness and availability; the tables store only pointers identifying the servers on which the actual data is stored.

Antiquity's design pursues integrity, incremental secure writes with random read access, durability, consistency and efficiency with low overhead. The results from a simulation showed that in almost all checks performed, a quorum of servers was reachable and in a consistent state, thus providing a high degree of availability and consistency; the quorum repair process balances availability and consistency even further.

Concerning scalability, each log uses a single administrator, but since multiple instances are allowed, the administrator role scales well and different logs can use different administrators.
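The append-only secure log at the heart of this design can be sketched as a hash chain: each record binds the hash of its predecessor, so any replica can verify that it holds an untampered prefix. Our sketch omits the signatures Antiquity derives from the client's key pair.

```python
import hashlib

class SecureLog:
    """Hash-chained, append-only log in the spirit of Antiquity (sketch)."""
    def __init__(self):
        self.records = []
        self.head = b"\x00" * 32   # hash of the latest record

    def append(self, data: bytes):
        record = self.head + data            # each record binds its predecessor
        self.head = hashlib.sha256(record).digest()
        self.records.append(record)
        return self.head

    def verify(self) -> bool:
        head = b"\x00" * 32
        for record in self.records:
            if record[:32] != head:
                return False                 # broken chain: tampering detected
            head = hashlib.sha256(record).digest()
        return head == self.head

log = SecureLog()
log.append(b"create /backup/monday")
log.append(b"write chunk 0")
assert log.verify()
```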
I. BigTable

BigTable [18] is a large-scale distributed storage system for managing structured data. It is built on top of several existing Google technologies, such as the Google File System, Chubby and Sawzall, and it is used by many of Google's online services. Its contributors have flexibility, high performance and availability as their primary goals.

Essentially, BigTable is a "sparse, distributed, persistent multi-dimensional sorted map" that indexes each (row, column, timestamp) tuple to an array of bytes [19]. Data in BigTable is maintained in tables that are partitioned into row ranges called tablets; tablets are the units of data distribution and load balancing in BigTable. BigTable consists of three major components: a library linked into every client, one master server, and many tablet servers, each managing some number of tablets. Different versions of the data are sorted by timestamp. BigTable supports single-row transactions, which can be used to perform atomic read-modify-write sequences on data stored under a single row key. Overall, BigTable is tremendously scalable, offering data availability and high performance to its users. However, it does not deal with issues like security among the nodes, or fault-tolerance.
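The data model itself fits in a few lines of Python. The toy class below is ours (a dict plus a sorted key list) and glosses over tablets, SSTables and the servers entirely; it only shows the sparse, sorted (row, column, timestamp) map.

```python
import bisect

class TinyBigtable:
    """Toy model of BigTable's data model: a sparse, sorted map from
    (row, column, timestamp) to an uninterpreted byte string."""
    def __init__(self):
        self.cells = {}   # (row, column, timestamp) -> bytes
        self.keys = []    # kept sorted so rows could be range-scanned

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, timestamp)
        if key not in self.cells:
            bisect.insort(self.keys, key)
        self.cells[key] = value

    def read(self, row, column):
        """Return the newest version of a cell (versions sorted by timestamp)."""
        versions = [k for k in self.keys if k[0] == row and k[1] == column]
        return self.cells[versions[-1]] if versions else None

t = TinyBigtable()
t.put("com.example/index", "contents:", 1, b"<html>v1</html>")
t.put("com.example/index", "contents:", 2, b"<html>v2</html>")
print(t.read("com.example/index", "contents:"))   # newest timestamp wins
```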
J. Dynamo

Dynamo is a key-value storage system that provides a mapping from keys to values. It is developed and managed by Amazon, which makes it a proprietary database [21], although its design has influenced open systems such as Cassandra. High availability and scalability are the main design goals of Dynamo. It offers incremental scalability, meaning the system can be scaled one node at a time. Moreover, there is no central administrator, and all nodes are on the same level.

Dynamo is a combination of distributed hash tables (DHTs) and databases [20]. Keys created by hashing the data are stored in a circular ring structure: when a key is stored, the nearest node in the clockwise direction is selected to hold it. Moreover, there are virtual nodes, each of which mimics a node, so that one physical node can be responsible for more than one position on the ring. This mechanism provides incremental scalability by solving the partitioning problem. Dynamo has an effective replication mechanism that increases the availability of data in the system: each data item is replicated to a specified number of successors, so each node holds replicated data of its predecessors. In addition, the system may keep more than one version of an item to increase availability; since this can cause inconsistency, vector clocks are used to determine the causal relationship between the different versions. These properties increase Dynamo's durability as well as its availability. Dynamo also targets an "always writable" property, which is the second reason for using vector clocks. When a user wants to perform a write, the coordinator responsible for the operation first sends the vector clock to the reachable nodes selected from a preference list, and the write completes according to the number of responses received; in other words, the mechanism is based on quorums. Finally, if a node gives no response, it is assumed to have failed; when it is removed from the ring, all surrounding nodes adjust to the new state.

Dynamo thus aims to solve the main problems of database management, such as scalability, availability, reliability and performance. While offering a highly available and scalable system, it keeps performance high by handling failures. However, achieving an anonymous system is not a goal of Dynamo.
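A minimal sketch of the partitioning scheme follows: consistent hashing with virtual nodes and a preference list of N distinct successors. This is our own illustration, with MD5 standing in for whatever hash Dynamo uses internally.

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    """Dynamo-style consistent-hashing ring with virtual nodes.

    Each physical node owns several points on the ring; a key is stored on
    the first node clockwise from its hash, then replicated to the next
    distinct successors (the key's preference list)."""
    def __init__(self, nodes, vnodes=8):
        self.points = sorted((h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))

    def preference_list(self, key, n=3):
        start = bisect.bisect(self.points, (h(key), ""))
        owners = []
        for _, node in self.points[start:] + self.points[:start]:
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners

ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("shopping-cart:42"))   # e.g. ['C', 'A', 'D']
```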
K. MongoDB

MongoDB [22] is a scalable, high-performance, open-source, document-oriented structured storage system. It provides document-oriented storage with full index support, auto-sharding, sophisticated replication, and compatibility with the Map/Reduce paradigm.

Instead of storing data in tables and rows, as is regularly done in relational databases, MongoDB stores data with dynamic schemas. The goal of MongoDB is to bridge the gap between key-value stores and relational databases. MongoDB has two separate constructs for multi-node topologies, which are often combined in the highest-performance systems: replica sets and sharded replica sets. Replica sets are an asynchronous cluster replication technology, and sharding is an automatic data distribution system. Increasing the number of instances in a replica set provides horizontal scalability for read performance and fault-tolerance; increasing the number of shards (each one being a replica set) distributes distinct data, providing horizontal scalability for write performance.

MongoDB has features similar to those of relational databases, like indexes and dynamic queries. It achieves availability by supporting asynchronous replication of data between servers, and it also features a backup and repair mechanism using journaling, which increases durability and robustness. Changing the data model from relational to document-oriented provides greater agility through flexible schemas and easier horizontal scalability.

L. Riak

Riak [23] is a key-value storage system inspired by Dynamo. Like Dynamo, it is distributed, highly available and scalable. It uses a map-reduce mechanism to reduce the functional limitations of the key-value model and to increase the power of querying over the data stored in Riak. Riak provides a fault-tolerant service to its users, a property that raises its level of robustness.

Since it is inspired by Amazon's Dynamo storage system, analyzed above, Riak has many similarities with it. It combines database storage with distributed hash tables (DHTs): like Dynamo, it uses consistent hashing to map keys onto its ring, so all nodes on the ring are identical. Whenever a node joins the network, it is assigned key-range partitions, and data is then replicated to make the system more available. Like Dynamo, write and read operations are based on quorums. Concurrent requests are not handled with locks, for performance reasons; instead of a lock mechanism, vector clocks are used to make the system resilient to failures and to keep it consistent. Another strong point of Riak is the use of map-reduce in querying: request messages are directed to a set of nodes instead of being propagated to all nodes.

Riak has a symmetric structure at the node level, since it has no super or master node. Moreover, it meets several design requirements of the intended decentralized storage systems, such as high availability, scalability and robustness. However, anonymity is not handled in its design and, since it is a relatively new system, it has many compatibility problems.
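Both Dynamo and Riak rely on vector clocks to decide whether two versions of an object are causally ordered or concurrent. A minimal sketch, with clocks represented as dictionaries from node id to counter and hypothetical helper names of our own:

```python
def descends(a: dict, b: dict) -> bool:
    """True if the version with clock `a` has seen everything recorded in `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def reconcile(a: dict, b: dict):
    if descends(a, b):
        return "a wins"    # a supersedes b; b can be discarded
    if descends(b, a):
        return "b wins"
    return "conflict"      # concurrent writes: both versions are kept and
                           # handed back to the client (or merged)

v1 = {"A": 2, "B": 1}            # written through coordinators A, then B
v2 = {"A": 2, "B": 1, "C": 1}    # later write through C: supersedes v1
v3 = {"A": 3, "B": 1}            # concurrent with v2
print(reconcile(v2, v1))   # 'a wins'
print(reconcile(v2, v3))   # 'conflict'
```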
M. Pastis

Pastis [24] is a completely decentralized P2P file system in which multiple users perform read and write operations. It uses Past, a highly scalable P2P storage service which provides a distributed hash table abstraction, and combines Past with Pastry, a P2P key-based routing algorithm, to route messages between large numbers of Past nodes.

For every file, Pastis keeps an inode in which the file's metadata is stored. Each inode is stored in a User Certificate Block (UCB), while file contents are stored in Content Hash Blocks (CHBs). When a user writes to a file, the version counter is incremented and saved in the corresponding inode together with the user's id. To avoid conflicts, if a second user appears and tries to write to the same file, a procedure is triggered to resolve the conflict by comparing the counters and user ids from the other replicas in the network.

The combination of Past and Pastry characterizes Pastis as a highly scalable system in terms of network size and number of concurrent clients. Good locality helps in obtaining optimized routes, while self-organization as well as fault tolerance are achieved thanks to the design. Data is replicated among the nodes, and the system is therefore characterized by high data availability. Write access control and data integrity are implemented, and Pastis can hence be considered secure, under the assumption that users trust each other.

N. TotalRecall

TotalRecall [25] is a P2P storage system that takes into high consideration an important property of storage systems: availability. The system administrator can specify an availability target and, by studying the previous behavior of the peers, the system can predict their future availability, despite the dynamically changing nature of the environment. Depending on the condition of the system, TotalRecall may use replication, erasure coding or hybrid techniques to preserve its redundancy, and it can dynamically repair itself using eager or lazy repair.

Apart from the peers, the TotalRecall system consists of the master host, the storage host and the client host. This description allows us to say that it follows a loose P2P scheme, since not everybody plays the role of a peer. Nevertheless, it is described as scalable despite the number of hosts that join and leave the system continually. Because of the replication technique, data is persistent and available, and high consistency is also observed in the system.

TotalRecall could be used for volunteer computing if lazy repair is chosen together with erasure coding; with these options, TotalRecall performs better in dynamic environments with a high probability of unavailability.

O. Farsite

Farsite [26] is a serverless, distributed storage system that runs on a set of machines and takes advantage of their unused storage and network resources. Although it provides the semantics of a central NTFS file server, it is able to scale and run on several machines using a portion of their storage. Users have access to private and public files through a location-transparent environment. Data replicas are encrypted to provide security, since the nodes themselves are not secure. Moreover, these replicas are distributed among several nodes to provide a reliable system despite the unreliability and frequent unavailability of the nodes. The file structure is hierarchical, maintained by a distributed directory service.

Atomicity and scalability are two important properties of the Farsite system; all tasks are designed as fully atomic actions in order to remain indivisible while they execute. Farsite could be used for volunteer computing, since the management operations can be distributed among the machines and security is provided by the encryption algorithm used. However, it could serve only small volunteer computing systems, since it can scale only up to a certain number of nodes.
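The version-counter conflict rule in Pastis can be sketched as follows; this is our own reading of the scheme, with the tie-breaking on user ids as a hypothetical deterministic rule.

```python
def latest_version(replicas):
    """Pick the authoritative inode among replicas, Pastis-style: the highest
    version counter wins, and ties between concurrent writers are broken by
    comparing user ids (a deterministic, if arbitrary, rule)."""
    return max(replicas, key=lambda r: (r["version"], r["user_id"]))

replicas = [
    {"version": 7, "user_id": "alice", "data": b"old"},
    {"version": 8, "user_id": "bob",   "data": b"newer"},
    {"version": 8, "user_id": "carol", "data": b"also v8"},
]
print(latest_version(replicas)["user_id"])   # 'carol': equal counters, higher id
```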
P. Storage@home

Storage@home [27] is a distributed storage infrastructure designed to store huge amounts of data across many machines which join the system as volunteers. It is based on Folding@home and made its appearance to face the problems of that earlier system. More precisely, its contributors address the problems of backing up and distributing data efficiently among the nodes, keeping in mind the limited bandwidth and the small donation of storage from each node.

Storage@home consists of the volunteers, who have an agent installed on their machines, a registration server, a metadata server, an identity server and a policy engine. The metadata server is responsible for storing information about the location of the files stored in the system and for answering queries about those files. The identity server is responsible for security and identity functionality, as well as for effectively tracking the location of IP hosts, whether they are mobile or dynamic. The registration server is responsible for linking the users' profiles from the old system, Folding@home, with the newly proposed one; this task is hard to implement, since a beneficial aspect of Storage@home is anonymity and the intentional omission of user information. The policy engine behaves as the master of the system, coordinating all of its components. It is responsible for planning where to put replicas of data in order to minimize the chances of data loss, how data can be retrieved, and how it is to be transferred to reach the node that sent a query. It also remains vigilant, performing repair operations whenever needed.

Storage@home has vital requirements that help it preserve its nature both as a storage system and as a volunteer computing system. As a storage system, it should handle failure and recovery operations effectively; as a volunteer computing system, it should manage the relocation of data stored on hosts that have disappeared. While maintaining these requirements, the authors had to face several challenges regarding volunteer recruiting and motivation, policy risk, and host relocation. With respect to recruiting volunteers and keeping them motivated, the system adopted a reward scheme that offers points to volunteers in order to motivate them and place them in a friendly, enjoyable competition. Regarding policy risk, it was quite common for Storage@home to get blocked by companies, ISPs and new policies; storing replicas in different nations, states and ISPs appeared to be a fair solution. Last but not least, host relocation was another great challenge that needed to be considered: the system had to deal with hundreds of students who changed residence (most of the time decreasing their bandwidth) and became slower and less effective. Also, switching a machine off for a long time for travel or maintenance has a cost for the system, and consequently a penalization policy was introduced to make volunteers more responsible about informing the system of any changes in their condition.
In general, the system appears to be reliable, as it manages to prevent the loss of data. It is able to work with thousands of volunteers, showing great scalability, and its functionality is preserved in the presence of churn. Internet connections appear to be the bottleneck of system performance, showing that any other possible pitfalls of the system are not significant, as they cannot outweigh the bandwidth problem.

V. DISCUSSION

All the systems described offer storage distribution following different approaches and architectures. In this section, we discuss to what extent these systems have the properties needed in volunteer computing systems. In Table I we gather all systems and characteristics together, giving a clear view of their state.

1) Symmetry: As previously mentioned, in pure peer-to-peer systems all peers are on the same level with equivalent functionality. Since no volunteer participant has priority over other participants, even though they are controlled by the system's central server, the intended distributed systems should be purely peer-to-peer.

In the world of storage systems, designers have trouble presenting systems with "independent" nodes that work without the guidance of an administrator. In the Frangipani file system, there is an administrator who arranges the states of the nodes, and nodes need permission from the administrator in order to perform a task; thus, this system does not provide a symmetric node network and is not proper for volunteer computing systems. In MongoDB there are three kinds of nodes: standard, passive and arbiter. Similarly, BigTable has master nodes and many tablet servers, and Antiquity contains the role of an administrator among the peers, responsible for the allocation of new log chunks; thus MongoDB, BigTable and Antiquity are not symmetric at the node level. Moreover, Farsite is based on a centralized scheme: some nodes have, for a period of time, authority over some files, their content, directories and user permissions. Similarly, TotalRecall consists of different types of nodes, each type having different responsibilities regarding the files; therefore, in both systems nodes cannot work freely without the permission of "master" nodes. Last but not least, in OceanStore nodes can have different roles, such as server, client, router, or all of them, so it is not symmetric either.

The rest of the systems, as can be seen in Table I, consist of equal nodes and are consequently characterized as symmetric.

2) Availability: In volunteer computing systems, participants can enter and leave the system at random times. In order for data to remain retrievable, the intended storage systems should be highly available despite the unavailability of the participants.

Most of the systems analyzed are highly available, as shown in Table I. However, the FreeHaven system presents a limited level of availability, since there is no replication mechanism, only the periodic trading that keeps data reachable. Similarly, FreeNet has limited availability because of the lack of a replication mechanism, and also because it suffers from poor long-term survivability, especially for non-popular files. The DHash component makes Ivy highly available, since DHash replicates and distributes the blocks of files; participants' logs can thus be available even when the participants themselves are not. Moreover, Frangipani has cluster member components, which are large abstract containers on a highly available block level; these cluster members make Frangipani highly available. Ceph accomplishes availability using RADOS, which manages data replication following a primary-copy replication scheme and also provides update synchronization of the data. One of OceanStore's main goals is to provide availability, as data flows freely and replicas of the data are created. Antiquity uses a secure log distributed among multiple servers, providing a high degree of availability and ensuring that all data can be accessed; if for any reason some data is lost, a repair service is available for recovery. Furthermore, the Farsite system replicates data in order to ensure availability even with the frequent unavailability of nodes. Likewise, Pastis implements a lazy replication protocol to manage replicas on different nodes. TotalRecall has the provision of availability as a main goal and suggests different ways to ensure it, such as redundancy management with dedicated mechanisms, replication, and dynamic repairs in the case of nodes leaving the system permanently.
System          Symmetry   Availability   Scalability   Anonymity   Robustness
FreeHaven       Yes        Mid            Low           High        High
FreeNet         Yes        Mid            High          Mid         High
Ivy             Yes        High           Mid           No          High
Frangipani      No         High           High          No          High
Ceph            Yes        High           High          No          High
OceanStore      No         High           High          No          High
Antiquity       No         High           High          No          High
BigTable        No         High           High          No          High
Dynamo          Yes        High           High          No          High
MongoDB         No         High           High          No          High
Riak            Yes        High           High          No          High
Pastis          Yes        High           High          No          High
TotalRecall     No         High           High          No          High
Farsite         No         High           Mid           No          High
Storage@home    Yes        High           High          High        High

TABLE I: COMPARISON OF THE DIFFERENT STORAGE SYSTEMS

3) Scalability: Scalability is an additional required property. There are three main scaling techniques: replication, for spreading copies of data; caching, for reusing the cached data; and distribution of divided computation [5]. Thus the intended decentralized storage systems should have replication or a similar mechanism.

Of the systems studied, only three do not show high results on the scalability issue. FreeHaven and Ivy do not have scalability as a primary goal and are therefore not highly scalable, while Farsite is limited to scale up to roughly 10^5 nodes, which is quite restrictive.

Unlike these systems, Frangipani is designed to be highly scalable: the Petal servers, which work cooperatively to supply virtual disks to users, are distributed in order to increase scalability. The rest of the storage systems are classified as large-scale storage systems, since they are specifically designed to offer scalability.

4) Anonymity: Participants in volunteer computing systems want to keep their identities secret from others; thus, the intended distributed systems should provide anonymity. From our research, we found that most of the systems do not support anonymity, as it was not among their main concerns.
level of failure recovery providing fault tolerance and self- Systems like FreeHaven, FreeNet offer anonymity as they maintenance mechanisms with automatic repair. Antiquity’s focus in their participants needs. They propose to keep users quorum repair recovers failures and replaces lost replicas identity, thus they increase resistance against censorship. In which makes the system quite robust. fact, for this purpose they scarify efficiency. Like them, users Storage@home provides self-repair operations for each in Ceph and Storage@home are anonymous. Moreover, in node involved. Pastis takes advantage of the fault tolerance Ceph the code runs directly from the user space and the property of the storage layer that it is based on, the Past processes RADOS and CRUSH are executed without revealing DHT . In TotalRecall, things are even easier. Since it deals any information about the identity of the client, even when data primarily with availability, it addresses this issue using repair are distributed. mechanisms which help as well for preserving robustness. Anonymity is not a design issue in Frangipani. Thus each The Farsite system was designed in that way that it handles user in the Frangipani file system are noticeable and can be Byzantine faults and therefore be more robust. detected easily. Like Frangipani, large-scale decentralized TFS is mainly a file system that works underneath storage storage systems such as Dynamo, Riak, BigTable, and systems. Its availability and anonymity are dependent on the MongoDB do not handle anonymity as a design issue. nodes state and whether the nodes by themselves can be available and anonymous. Thus, it is not included in our
discussion, nor in the comparison table.

As shown in Table I, and based on the preceding discussion, Storage@home appears to be the most suitable storage system, with a clear statement that it can be used in volunteer computing systems. It follows a model typical of volunteer computing projects: participants act as volunteers, with the ability to compete and gain points based on their contribution of storage and on their recruitment of others. All users have an agent installed on their machine, which takes action after the user's registration. Data availability is maintained because each machine stores almost half of a file, more precisely up to 40% of it.

VI. CONCLUSION

In this survey, we initially presented the various properties and characteristics that a decentralized storage system must have in order to cooperate efficiently with a volunteer computing system. The challenges that these systems face when combined are scalability, availability, symmetry, anonymity and robustness, all of which were explained in detail. We then selected some systems that we found important and relevant to our study and briefly described each one, associating it with the aforementioned characteristics. A comparison follows that explains each characteristic in depth and covers how each one matters for this merging of decentralized storage and volunteer computing systems. As shown in our discussion, the systems have different capabilities and functionalities, which make each one more appropriate for specific operations. With all the properties laid out and after further investigation, Storage@home is the most suitable, having all the properties that such a system requires.

REFERENCES

[1] M. Placek, R. Buyya, "A Taxonomy of Distributed Storage Systems", Technical Report, Grid Computing and Distributed Systems Laboratory, The University of Melbourne, Australia, July 2006.
[2] P. Yianilos, S. Sobti, "The Evolving Field of Distributed Storage", IEEE Internet Computing, v.5, pp.35-39, 2001.
[3] R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, R. Campbell, "A Survey of Peer-to-Peer Storage Techniques for Distributed File Systems", Proceedings of the International Conference on Information Technology: Coding and Computing, v.2, pp.205-213, Las Vegas, Nevada, April 2005.
[4] H. Ge, "Survey of Distributed Storage Systems", Course Survey for "Advanced Topics in Information Systems", Spring 2004.
[5] B. C. Neuman, "Scale in Distributed Systems", in Readings in Distributed Computing Systems, IEEE Computer Society Press, pp.463-489, 1994.
[6] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, September 2001.
[7] O. Nov, D. Anderson, O. Arazy, "Volunteer Computing: A Model of the Factors Determining Contribution to Community-based Scientific Research", in Proceedings of the 19th International Conference on World Wide Web, pp.741-750, NY, USA, 2010.
[8] D. P. Anderson, "BOINC: A System for Public-Resource Computing and Storage", 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA, 2004.
[9] A. Elwaer, I. Taylor, O. Rana, "Optimizing Data Distribution in Volunteer Computing Systems using Resources of Participants", Scalable Computing: Practice and Experience, Volume 12, Number 2, ISSN 1895-1767, pp.193-208, 2011.
[10] A. Oram (ed.), "Peer-to-Peer: Harnessing the Power of Disruptive Technologies", O'Reilly Media, March 15, 2001.
[11] I. Clarke, O. Sandberg, B. Wiley, T. W. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability, pp.46-66, July 2000.
[12] A. Muthitacharoen, R. Morris, T. M. Gil, B. Chen, "Ivy: A Read/Write Peer-to-Peer File System", SIGOPS Oper. Syst. Rev., Vol. 36, No. SI, pp.31-44, 2002, doi:10.1145/844128.844132.
[13] C. A. Thekkath, T. Mann, E. K. Lee, "Frangipani: A Scalable Distributed File System", SOSP '97: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, New York, NY, USA, 1997.
[14] H. Weatherspoon, P. Eaton, B. Chun, J. Kubiatowicz, "Antiquity: Exploiting a Secure Log for Wide-Area Distributed Storage", ACM SIGOPS Operating Systems Review, v.41 n.3, June 2007.
[15] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, C. Maltzahn, "Ceph: A Scalable, High-Performance Distributed File System", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, November 2006.
[16] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, J. Kubiatowicz, "Maintenance-Free Global Data Storage", IEEE Internet Computing, v.5 n.5, pp.40-49, September 2001.
[17] J. Cipar, M. D. Corner, E. D. Berger, "Contributing Storage using the Transparent File System", ACM Transactions on Storage, v.3 n.3, October 2007.
[18] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, "Bigtable: A Distributed Storage System for Structured Data", Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, v.7, Seattle, WA, November 2006.
[19] "CS262B Advanced Topics in Computer Systems, Spring 2009", Available: http://www.eecs.berkeley.edu/~culler/summary/bigtable.html
[20] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store", Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, Washington, October 2007.
[21] Amazon DynamoDB. Available: http://aws.amazon.com/dynamodb/
[22] MongoDB. Available: http://www.mongodb.org/
[23] Welcome to the Riak Wiki. Available: http://wiki.basho.com/Riak.html
[24] F. Picconi, J.-M. Busca, P. Sens, "Pastis: A Highly-Scalable Multi-User Peer-to-Peer File System", Euro-Par, pp.1173-1182, 2005.
[25] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, G. M. Voelker, "Total Recall: System Support for Automated Availability Management", NSDI, San Francisco, CA, 2004.
[26] W. J. Bolosky, J. R. Douceur, J. Howell, "The Farsite Project: A Retrospective", Proceedings of SIGOPS, France, 2007.
[27] A. L. Beberg, V. S. Pande, "Storage@home: Petascale Distributed Storage", Proceedings of IPDPS, Long Beach, CA, March 2007.