Use a Distributed File System as a Storage Tier
Fabrizio Manfred Furuholmen
Agenda

        Introduction
          Next Generation Data Center
          Distributed File Systems

        Distributed File Systems
          OpenAFS
          GlusterFS
          HDFS
          Ceph

        Case Studies

        Conclusion
Class Exam

 What do you know about DFS?

 How can you build a petabyte of
  storage?

 How can you build a centralized
  system log?

 How can you allocate space for your
  users or systems, when you have
  thousands of users/systems?

 How can you retrieve data from
  everywhere?
Introduction

Next Generation Data Center: the "FABRIC"

Key categories:
   Continuous data protection and disaster
    recovery

   File and block data migration across
    heterogeneous environments

   Server and storage virtualization

   Encryption for data in-flight and at-rest

In other words: the cloud data center
Introduction

Storage Tier in the "FABRIC"
   High Performance
   Scalability
   Simplified Management
   Security
   High Availability

Solutions
 Storage Area Network
 Network Attached Storage
 Distributed file system
Introduction

What is a Distributed File System?

"A distributed file system takes advantage of the
  interconnected nature of the network by storing
  files on more than one computer in the network
  and making them accessible to all of them."
Introduction

What do you expect from a distributed file system?

• Uniform access: a global file namespace

• Security: global authentication/authorization

• Reliability: the elimination of every single point of failure

• Availability: administrators can perform routine maintenance while the file
  server is in operation, without disrupting users' routines

• Scalability: handles terabytes of data

• Standards conformance: some IEEE POSIX file system semantics

• Performance: high throughput
Part II

Implementations

 How many DFS do you know?
OpenAFS: introduction

OpenAFS is the open source implementation of
IBM's Andrew File System.

Key ideas:
 Make clients do work whenever possible.

 Cache whenever possible.

 Exploit file usage properties. Understand them. One-third of Unix
  files are temporary.

 Minimize system-wide knowledge and change. Do not hardwire
  locations.

 Trust the fewest possible entities. Do not trust workstations.

 Batch operations into groups where possible.
OpenAFS: design

[architecture diagram]
OpenAFS: components

Cell

•A cell is a collection of file servers and
 workstations
•The directories under /afs are
 cells, forming a unique tree
•File servers contain volumes

Volumes

•Volumes are "containers", or sets of
 related files and directories
•Have a size limit
•3 types: rw, ro, backup

Mount Point Directory

•Access to a volume is provided through
 a mount point
•A mount point looks just like a static
 directory (see the sketch below)
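For a concrete picture of how these pieces fit together, here is a minimal sketch using the standard OpenAFS admin tools (vos, fs); the server, cell, and volume names are hypothetical:

    # create a read/write volume on file server fs1, partition /vicepa
    vos create fs1.example.com /vicepa home.alice

    # attach it to the global /afs tree through a mount point
    fs mkmount /afs/example.com/home/alice home.alice

    # define a read-only replica site and push a snapshot to it
    vos addsite fs2.example.com /vicepa home.alice
    vos release home.alice

The mount point makes the volume appear as an ordinary directory; clients never need to know which server actually holds the data.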
OpenAFS: performances

[Benchmark charts: write and read throughput for OpenAFS vs. OpenAFS OSD with 2 servers, plotted over file size (KB) and block size.]
OpenAFS: features

 Uniform name space: same path on all
  workstations

 Security: based on krb4/krb5, extended ACLs,
  traffic encryption (example below)

 Reliability: read-only replication, HA
  database, read/write replicas in the OSD version

 Availability: maintenance tasks without
  stopping the service

 Scalability: server aggregation

 Administration: administration delegation

 Performance: client-side disk-based persistent
  cache, high client-to-server ratio
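For example, access rights and quotas are managed per directory and per volume with the standard client tools; a minimal sketch, with a hypothetical cell and user name:

    # grant user "bob" read and lookup rights on a directory
    fs setacl /afs/example.com/projects/doc bob rl

    # check the quota of the volume containing that directory
    fs listquota /afs/example.com/projects/doc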
OpenAFS: who uses it?

Morgan Stanley IT
• Internal usage
• Storage: 450 TB (ro) + 15 TB (rw)
• Clients: 22,000

Pictage, Inc.
•   Online picture albums
•   Storage: 265 TB (planned growth to 425 TB in twelve months)
•   Volumes: 800,000
•   Files: 200,000,000

Embian
• Internet shared folders
• Storage: 500 TB
• Servers: 200 storage servers,
  300 app servers

RZH
• Internal usage, 210 TB
OpenAFS: good for ...

      Good
      •   Wide Area Network
      •   Heterogeneous systems
      •   Read operations > write operations
      •   Large number of clients/systems
      •   Direct usage by end-users
      •   Federation

              Bad
              • Locking
              • Databases
              • Unicode
              • Large files
              • Some limitations on ..
GlusterFS

"Gluster can manage data in a
  single global namespace on
  commodity hardware."

Keys:
 Lower storage cost: open source software runs on commodity
   hardware

 Scalability: linearly scales to hundreds of petabytes

 Performance: no metadata server means no bottlenecks

 High availability: data mirroring and real-time self-healing

 Virtual storage for virtual servers: simplifies storage and keeps VMs
  always on

 Simplicity: complete web-based management suite
GlusterFS: design

[architecture diagram]
GlusterFS: components

Volume

•The volume is the basic element for data
 export
•Volumes can be stacked for
 extension

Capabilities

•Specific options (features) can be
 enabled for each volume (cache,
 prefetch, etc.)
•Simple creation of custom extensions
 with the API interface

Services

•Access to a volume is provided through
 services like TCP, Unix sockets,
 InfiniBand (a matching client-side
 spec is sketched below)

Example server-side volume specification:

    volume posix1
      type storage/posix
      option directory /home/export1
    end-volume

    volume brick1
      type features/posix-locks
      option mandatory
      subvolumes posix1
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      option transport.socket.listen-port 6996
      subvolumes brick1
      option auth.addr.brick1.allow *
    end-volume
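On the client side, a matching volume file points at the server and is handed to the FUSE client at mount time; a minimal sketch, assuming the classic volfile-driven setup of Gluster 2.x and a hypothetical hostname:

    volume remote1
      type protocol/client
      option transport-type tcp
      option remote-host storage1.example.com
      option remote-subvolume brick1
    end-volume

    # mount the volume described by the spec file (run as root)
    glusterfs -f /etc/glusterfs/client.vol /mnt/gluster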
Gluster: components

[translator stack diagram]
Gluster: performance

[benchmark charts]
Gluster: characteristics

 Uniform name space: same path on all
  workstations

 Reliability: RAID-1 style replication, asynchronous
  replication for disaster recovery

 Availability: no system downtime for
  maintenance (better in the next release)

 Scalability: truly linear scalability

 Administration: self-healing, centralized logging
  and reporting, appliance version

 Performance: stripe files across dozens of
  storage bricks, automatic load balancing, per-
  volume I/O tuning
Gluster: who uses it?

 Avail TVN (USA)
400 TB for video on demand, video
storage

 Fido Film (Sweden)
visual FX and animation studio

 University of Minnesota (USA)
142 TB, supercomputing

 Partners Healthcare (USA)
336 TB, integrated health system

 Origo (Switzerland)
open source software development
and collaboration platform
Gluster: good for ...

      Good
      • Large amounts of data
      • Access with different protocols
      • Direct access from applications
        (API layer)
      • Disaster recovery (better in the
        next release)
      • SAN replacement, VM storage

             Bad
             • User-space
             • Low granularity in security settings
             • High volumes of operations on the
               same file
Implementations

Old way
   Metadata and data in the same place
   Single stream per file

New way
   Multiple streams are parallel channels
    through which data can flow
   Files are striped across a set of nodes in
    order to facilitate parallel access
   OSD: separation of file metadata
    management (MDS) from the storage of
    file data
HDFS: Hadoop

HDFS is part of the Apache
  Hadoop project, which develops
  open-source software for
  reliable, scalable, distributed
  computing.

Hadoop was inspired by Google's
  MapReduce and the Google File
  System.
HDFS: Google File System

"Design of a file system for a different environment,
  where the assumptions of a general purpose file system
  do not hold; interesting to see how new assumptions
  lead to a different type of system…"

Key ideas:
 Component failures are the norm.
 Huge files (not just the occasional file)
 Append rather than overwrite is typical
 Co-design of application and file system API (specialization);
  for example, consistency can be relaxed.
HDFS: MapReduce

 "Moving Computation is Cheaper than Moving Data"

Map
• Input is split and mapped into key-
  value pairs

    Combine
    • For efficiency, the combiner
      works directly on the map
      operation's outputs.

         Reduce
         • The files are then
           merged, sorted and reduced
           (streaming sketch below)
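A minimal word-count sketch of this pipeline using Hadoop Streaming, which lets any executable act as mapper and reducer; the input/output paths are hypothetical and the jar location varies by Hadoop version:

    # map: one word per line; the shuffle sorts the words;
    # reduce: count consecutive duplicates
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
      -input /data/books \
      -output /data/wordcount \
      -mapper 'tr -s " " "\n"' \
      -reducer 'uniq -c'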
HDFS: goals

Scalable: can reliably store and
  process petabytes.

Economical: distributes the data and
  processing across clusters of
  commonly available computers.

Efficient: can process data in parallel
  on the nodes where the data is
  located.

Reliable: automatically maintains
  multiple copies of data and
  automatically redeploys computing
  tasks on failure.
HDFS: design

[architecture diagram]
HDFS: components

Namenode

• An HDFS cluster consists of a single
  NameNode
• It is a master server that manages
  the file system namespace and
  regulates access to files by clients.

Datanodes

• DataNodes manage storage attached
  to the systems they run on
• They typically also run the map
  step of MapReduce on local data

Blocks

• A file is split into one or more blocks,
  and these blocks are stored in a set
  of DataNodes (see the command
  sketch below)
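A quick sketch of interacting with these components from the command line (paths are hypothetical): the client asks the NameNode where blocks live, then streams data to and from the DataNodes directly.

    # copy a local file into HDFS and read it back
    hadoop fs -mkdir /logs
    hadoop fs -put access.log /logs/access.log
    hadoop fs -cat /logs/access.log | head

    # show how the file was split into blocks and where replicas sit
    hadoop fsck /logs/access.log -files -blocks -locations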
HDFS: features

 Uniform name space: same path on all
  workstations

 Reliability: rw replication, re-balancing, copies
  in different locations

 Availability: hot deploy

 Scalability: server aggregation

 Administration: HOD (Hadoop on Demand)

 Performance: "grid" computation, parallel
  transfer
HDFS: who uses it?

Major players:
   Yahoo!
   A9.com
   AOL
   Booz Allen Hamilton
   EHarmony
   Facebook
   Freebase
   Fox Interactive Media
   IBM
   ImageShack
   ISI
   Joost
   Last.fm
   LinkedIn
   Metaweb
   Meebo
   Ning
   Powerset (now part of Microsoft)
   Proteus Technologies
   The New York Times
   Rackspace
   Veoh
   Twitter
   …
HDFS: good for ...

      Good
      • Task distribution (basic GRID
        infrastructure)
      • Distribution of content (high
        throughput of data access)
      • Archiving
      • Heterogeneous environments

            Bad
            • Not a general purpose file system
            • Not POSIX compliant
            • Low granularity in security settings
            • Java
Ceph

"Ceph is designed to handle workloads
in which tens of thousands of clients or
more simultaneously access the same
file or write to the same directory:
usage scenarios that bring typical
enterprise storage systems to their
knees."

Keys:
   Seamless scaling: the file system can be seamlessly expanded by simply
    adding storage nodes (OSDs). However, unlike most existing file systems, Ceph
    proactively migrates data onto new devices in order to maintain a balanced
    distribution of data.

   Strong reliability and fast recovery: all data is replicated across multiple
    OSDs. If any OSD fails, data is automatically re-replicated to other devices.

   Adaptive MDS: the Ceph metadata server (MDS) is designed to dynamically
    adapt its behavior to the current workload.
Ceph: design

[Architecture diagram: clients, a metadata cluster, and an object storage cluster (OSDs).]
Ceph: features

Dynamic Distributed Metadata

• Metadata storage
• Dynamic subtree partitioning
• Traffic control

Reliable Autonomic Distributed Object
Storage

• Data distribution
• Replication
• Data safety
• Failure detection
• Recovery and cluster updates
Ceph: features

Pseudo-random data distribution function (CRUSH)

Reliable object storage service (RADOS)

Extent and B-tree based object file system, EBOFS (today btrfs)
Ceph: features

Splay Replication
• Only after the data has been safely committed to disk is a final commit
  notification sent to the client.
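Since the client was merged into Linux 2.6.34, a Ceph file system can be mounted like any other; a minimal sketch, with a hypothetical monitor address and key:

    # kernel client: mount CephFS from a monitor at 10.0.0.1
    mount -t ceph 10.0.0.1:6789:/ /mnt/ceph -o name=admin,secret=<key>

    # or, on older kernels, via the FUSE client (cfuse, later ceph-fuse)
    cfuse -m 10.0.0.1:6789 /mnt/ceph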
Ceph: good for …

     Good
     • Scientific applications, high
       throughput of data access
     • Heavy read/write operations
     • It is the most advanced distributed
       file system

            Bad
            • Young (Linux 2.6.34)
            • Linux only
            • Complex
Others

Lustre, PVFS, MooseFS,
CloudStore (Kosmos), pNFS,
XtreemFS, Tahoe-LAFS, …

Search Wikipedia..
Part III

 Case Studies
Class Exam

 What can DFS do for you?

 How can you build a petabyte of
  storage?

 How can you build a centralized
  system log?

 How can you allocate space for your
  users or systems, when you have
  thousands of users/systems?

 How can you retrieve data from
  everywhere?
File sharing

Problem
•Share documents across a wide
 area network
•Share home folders across different
 terminal servers

Solution

•OpenAFS
•Samba

Results

•Single ID, Kerberos/LDAP
•Single file system

Usage

•800 users
•15 branch offices
•File sharing, /home dirs
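A minimal sketch of the Samba side of this setup, assuming home volumes already live under /afs and Kerberos/LDAP supply the single ID (the cell name is hypothetical):

    # smb.conf fragment: export each user's AFS home directory
    [homes]
        path = /afs/example.com/home/%u
        read only = no
        browseable = no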
Web Service

Problem

• Big storage on a little budget

Solution

• Gluster

Results

• Highly available data storage
• Low price

Usage

• 100 TB image archive
• Multimedia content for web sites
Internet Disk: myS3

Problems

•Data from everywhere
•Disaster recovery

Solution

•myS3
•Hadoop / OpenAFS

Results

•High availability
•Access through the HTTP protocol (REST
 interface)
•Disaster recovery

Usage

•User backups
•Application backend
•200 users
•6 TB
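Since access goes through a REST interface over HTTP, client interaction can be sketched as follows; the host, path, and S3-like PUT/GET semantics are assumptions, not the actual myS3 API:

    # store an object, then fetch it back (hypothetical endpoint)
    curl -X PUT --data-binary @report.pdf http://mys3.example.com/backup/report.pdf
    curl -o report.pdf http://mys3.example.com/backup/report.pdf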
Log concentrator

Problem

• Log concentration

Solution

• Hadoop cluster
• Syslog-NG

Results

• High availability
• Fast search
• "Storage without limits"

Usage

• Security audit and access control
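A hedged sketch of the plumbing, assuming syslog-ng spools remote logs to local disk and a periodic job ships them into the Hadoop cluster; hosts and paths are assumptions:

    # syslog-ng: collect remote UDP syslog into one dated file per day
    source s_net { udp(ip(0.0.0.0) port(514)); };
    destination d_spool { file("/var/spool/logs/$YEAR-$MONTH-$DAY.log"); };
    log { source(s_net); destination(d_spool); };

    # daily cron job: archive yesterday's file into HDFS for search
    hadoop fs -put /var/spool/logs/$(date -d yesterday +%F).log /logs/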
Private cloud

Problems

• Low cost VM storage
• VM self-provisioning

Solution

• GlusterFS
• OpenAFS
• Custom provisioning

Results

• Auto provisioning
• Low cost
• Flexible solution

Usage

• Development env
• Production env
Conclusion: problems

Do you have enough bandwidth?

    Failure
For 10 PB of storage, you will have an
average of 22 consumer-grade SATA drives
failing per day.

    Read/write time
Each 2 TB drive takes, best case, approximately
24,390 seconds (about 6.8 hours) to be read and
written over the network: that is 2 TB at roughly
82 MB/s.

    Data Replication
Data replication multiplies the number of disk
drives needed, plus the extra network traffic
to keep the copies in sync.
Conclusion

Environment Analysis
• There is no truly generic DFS
• Moving 800 TB between different solutions is not simple

      Dimensioning
      • Start with the right size
      • The number of servers depends on the speed needed and the number of clients
      • Plan the network for replication

              Divide the system into classes of service
              • Different disk types
              • Different computer types

                     System Management
                     • Monitoring tools
                     • System/software deployment tools
Conclusion: next step

[diagram]
Links

OpenAFS
• www.openafs.org
• www.beolink.org

Gluster
• www.gluster.org

Hadoop
• hadoop.apache.org
• Isabel Drost

Ceph
• ceph.newdream.net
• Publications
• Mailing list
I look forward to meeting you…

       XVII European AFS meeting 2010
              PILSEN - CZECH REPUBLIC
                  September 13-15

   Who should attend:
       Everyone interested in deploying a globally accessible
        file system
       Everyone interested in learning more about real
        world usage of Kerberos authentication in single
        realm and federated single sign-on environments
       Everyone who wants to share their knowledge and
        experience with other members of the AFS and
        Kerberos communities
       Everyone who wants to find out the latest
        developments affecting AFS and Kerberos

   More info: http://afs2010.civ.zcu.cz/
Thank you

manfred@zeropiu.com

Use Distributed Filesystem as a Storage Tier

  • 1. Use Distributed File system as a Storage Tier FabrizioManfred Furuholmen
  • 2. Agenda  Introduction  Next Generation Data Center  Distributed File system  Distributed File system  OpenAFS  GlusterFS  HDFS  Ceph  Case Studies  Conclusion 2 16/02/2012
  • 3. Class Exam  What do you know about DFS ?  How can you create a Petabyte storage ?  How can you make a centralized system log ?  How can you allocate space for your user or system, when you have a thousands of users/systems ?  How can you retrieve data from everywhere ? 3 16/02/2012
  • 4. Introduction Next Generation Data Center: the ―FABRIC‖ Key categories:  Continuous data protection and disaster recovery  File and block data migration across heterogeneous environments  Server and storage virtualization  Encryption for data in-flight and at-rest In other words: Cloud data center 4 16/02/2012
  • 5. Introduction Storage Tier in the ―FABRIC‖  High Performance  Scalability  Simplified Management  Security  High Availability Solutions  Storage Area Network  Network Attached Storage  Distributed file system 5 16/02/2012
  • 6. Introduction What is a Distributed File system ? “A distributed file system takes advantage of the interconnected nature of the network by storing files on more than one computer in the network and making them accessible to all of them..” 6 16/02/2012
  • 7. Introduction What do you expected from a distributed file system ? • Uniform Access: file names global support • Security: to provide a global authentication/authorization • Reliability: the elimination of each single point of failure • Availability: administrators perform routine maintenance while the file server is in operation, without disrupting the user’s routines • Scalability: Handle terabytes of data • Standard conformance: some IEEE POSIX file system semantics standard • Performance: high performance 7
  • 8. Part II Implementations How many DFS do you know ? 8
  • 9. OpenAFS: introduction is theopen sourceimplementation of AndrewFile system of IBM Key ideas:  Make clients do work whenever possible.  Cache whenever possible.  Exploit file usage properties. Understand them. One-third of Unix files are temporary.  Minimize system-wide knowledge and change. Do not hardwire locations.  Trust the fewest possible entities. Do not trust workstations.  Batch if possible to group operations. 9 16/02/2012
  • 10. OpenAFS: design 10 16/02/2012
  • 11. OpenAFS: components Cell •Cell is collection of file servers and workstation •The directories under /afs are cells, unique tree •Fileserver contains volumes Volumes •Volumes are "containers" or sets of related files and directories •Have size limit •3 type rw, ro, backup Mount Point Directory Server A •Access to a volume is provided through a mount point Server C •A mount point is just like a static directory Server A+B 11
  • 12. OpenAFS: performances OpenAFS OpenAFS OSD 2 Servers write 40000 35000 30000 35000-40000 25000 30000-35000 20000 25000-30000 20000-25000 15000 15000-20000 10000 10000-15000 16384 5000-10000 5000 1024 0-5000 0 block 64 64 256 1024 4096 16384 4 65536 262144 kb read 90000 80000 70000 80000-90000 60000 70000-80000 50000 60000-70000 50000-60000 40000 40000-50000 30000 30000-40000 20000 20000-30000 10000 10000-20000 131072 0-10000 0 16384 43 4 16 64 256 1024 2048 4096 16384 a
  • 13. OpenAFS: features  Uniform name space: same path on all workstations  Security: base to krb4/krb5, extended ACL, traffic encryption  Reliability: read-only replication, HA database, read/write replica in OSD version  Availability: maintenance tasks without stopping the service  Scalability: server aggregation  Administration: administration delegation  Performance: client side disk base persistent cache, big rate client per Server 13 16/02/2012
  • 14. openAFS: who uses it ? Morgan Stanley IT • Internal usage • Storage: 450 TB (ro)+ 15 TB (rw) • Client: 22.000 Pictage, Inc • Online picture album • Storage: 265TB ( planned growth to 425TB in twelve months) • Volumes: 800,000. • Files: 200 000 000. Embian • Internet Shared folder • Storage: 500TB • Server: 200 Storage server • 300 App server RZH •Internal usage 210TB 14
  • 15. OpenAFS: good for ... Good • Wide Area Network • Heterogeneous System • Read operation > write operation • Large number of clients/systems • Usage directly by end-users • Federation Bad • Locking • Database • Unicode • Large File • Some limitations on .. 15
  • 16. GlusterFS “Gluster can manage data in a single global namespace on commodity hardware..‖ Keys:  Lower Storage Cost—Open source software runs on commodity hardware  Scalability—Linearly scales to hundreds of Petabytes  Performance—No metadata server means no bottlenecks  High Availability—Data mirroring and real time self-healing  Virtual Storage for Virtual Servers—Simplifies storage and keeps VMs always-on  Simplicity—Complete web based management suite 16 16/02/2012
  • 17. GlusterFS: design 17 16/02/2012
  • 18. GlusterFS: components Volume volume posix1 •Volume is the basic element for data type storage/posix export option directory /home/export1 •The volumes can be stacked for end-volume extension Capabilities volume brick1 •Specific options (features) can be type features/posix-locks enabled for each volume (cache, pre option mandatory fetch, etc.) subvolumes posix1 •Simple creation for custom extensions end-volume with api interface Services volume server type protocol/server •Access to a volume is provided through option transport-type tcp services like tcp, unix socket, option transport.socket.listen-port 6996 infiniband subvolumes brick1 option auth.addr.brick1.allow * end-volume 18 16/02/2012
  • 19. Gluster: components 19 16/02/2012
  • 20. Gluster: performance 20 16/02/2012
  • 21. Gluster: carateristics  Uniform name space: same path on all workstation  Reliability: read-1 replication, asynchronous replication for disaster recovery  Availability: No system downtime for maintenance (better in the next release)  Scalability: Truly linear scalability  Administration: Self Healing, Centralized logging and reporting, Appliance version  Performance: Stripe files across dozens of storage blocks, Automatic load balancing, per volume i/o tuning 21 16/02/2012
• 22. Gluster: who uses it?
 Avail TVN (USA): 400 TB for video on demand, video storage
 Fido Film (Sweden): visual FX and animation studio
 University of Minnesota (USA): 142 TB, supercomputing
 Partners Healthcare (USA): 336 TB, integrated health system
 Origo (Switzerland): open source software development and collaboration platform
22
• 23. Gluster: good for ...
Good
• Large amounts of data
• Access with different protocols
• Direct access from applications (API layer)
• Disaster recovery (better in the next release)
• SAN replacement, VM storage
Bad
• User space
• Low granularity in security settings
• High volumes of operations on the same file
23
• 24. Implementations
Old way
 Metadata and data in the same place
 A single stream per file
New way
 Multiple streams are parallel channels through which data can flow
 Files are striped across a set of nodes in order to facilitate parallel access (a placement sketch follows)
 OSD: separation of file metadata management (MDS) from the storage of file data
24 16/02/2012
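A minimal sketch of the "new way", assuming fixed-size blocks striped round-robin across a set of nodes: any client can compute which node serves a given offset and open parallel streams. The block size and node list are illustrative; real systems (HDFS, Lustre, Ceph) each use their own placement logic.

    # Python sketch: map a byte offset to (block index, node) under round-robin striping
    BLOCK_SIZE = 64 * 1024 * 1024            # 64 MB stripe unit, a common default
    NODES = ["node1", "node2", "node3", "node4"]

    def locate(offset):
        block = offset // BLOCK_SIZE         # which stripe unit holds this byte
        node = NODES[block % len(NODES)]     # round-robin across the node set
        return block, node

    print(locate(200 * 1024 * 1024))         # -> (3, 'node4')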
• 25. HDFS: Hadoop
HDFS is part of the Apache Hadoop project, which develops open-source software for reliable, scalable, distributed computing. Hadoop was inspired by Google's MapReduce and the Google File System.
25 16/02/2012
• 26. HDFS: Google File System
"Design of a file system for a different environment, where the assumptions of a general-purpose file system do not hold—interesting to see how new assumptions lead to a different type of system…"
Key ideas:
 Component failures are the norm.
 Huge files (not just the occasional file)
 Append rather than overwrite is typical
 Co-design of application and file system API—specialization; for example, consistency can be relaxed.
26 16/02/2012
• 27. HDFS: MapReduce
"Moving Computation is Cheaper than Moving Data"
Map
• The input is split and mapped into key-value pairs
Combine
• For efficiency, the combiner works directly on the map outputs
Reduce
• The files are then merged, sorted and reduced
(a word-count sketch follows)
27
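As a concrete illustration, the classic word count fits this model. The sketch below is written for Hadoop Streaming, which runs any executables that read stdin and write tab-separated key-value pairs; the input/output paths in the comment are examples.

    # mapper.py - emit a <word, 1> pair for every word in the input split
    import sys
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word)

    # reducer.py - input arrives sorted by key; sum the counts per word
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

    # run (paths are examples):
    # hadoop jar hadoop-streaming.jar -input /logs -output /counts \
    #   -mapper mapper.py -reducer reducer.py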
• 28. HDFS: goals
Scalable: can reliably store and process petabytes.
Economical: distributes the data and processing across clusters of commonly available computers.
Efficient: can process data in parallel on the nodes where the data is located.
Reliable: automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
28
• 30. HDFS: components
Namenode
• An HDFS cluster consists of a single NameNode
• It is a master server that manages the file system namespace and regulates access to files by clients
Datanodes
• DataNodes manage the storage attached to the nodes they run on
• Map tasks of MapReduce are scheduled on the DataNodes holding the data
Blocks
• A file is split into one or more blocks, and these blocks are stored in a set of DataNodes
(a conceptual read-path sketch follows)
30
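A conceptual sketch of the read path these components imply; this is not the real HDFS client API, just the separation of control flow (NameNode) from data flow (DataNodes). The namenode/datanode objects are hypothetical stand-ins.

    # Python sketch of an HDFS-style read: metadata from the NameNode,
    # bulk data streamed directly from the DataNodes
    def read_file(namenode, path):
        data = b""
        for block in namenode.get_block_locations(path):   # metadata-only RPC
            for datanode in block.replicas:                # replicas, nearest first
                try:
                    data += datanode.read_block(block.id)  # bulk I/O bypasses the NameNode
                    break
                except IOError:
                    continue                               # fall back to the next replica
        return data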
• 31. HDFS: features
 Uniform name space: same path on all workstations
 Reliability: read/write replication, re-balancing, copies in different locations
 Availability: hot deployment
 Scalability: server aggregation
 Administration: HOD (Hadoop on Demand)
 Performance: "grid" computation, parallel transfers
31 16/02/2012
• 32. HDFS: who uses it?
Major players: Yahoo!, A9.com, AOL, Booz Allen Hamilton, eHarmony, Facebook, Freebase, Fox Interactive Media, IBM, ImageShack, ISI, Joost, Last.fm, LinkedIn, Metaweb, Meebo, Ning, Powerset (now part of Microsoft), Proteus Technologies, The New York Times, Rackspace, Veoh, Twitter …
32
• 33. HDFS: good for ...
Good
• Task distribution (basic grid infrastructure)
• Distribution of content (high throughput of data access)
• Archiving
• Heterogeneous environments
Bad
• Not a general-purpose file system
• Not POSIX compliant
• Low granularity in security settings
• Java
33
• 34. Ceph
"Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file or write to the same directory – usage scenarios that bring typical enterprise storage systems to their knees."
Keys:
 Seamless scaling — The file system can be seamlessly expanded by simply adding storage nodes (OSDs). However, unlike most existing file systems, Ceph proactively migrates data onto new devices in order to maintain a balanced distribution of data.
 Strong reliability and fast recovery — All data is replicated across multiple OSDs. If any OSD fails, data is automatically re-replicated to other devices.
 Adaptive MDS — The Ceph metadata server (MDS) is designed to dynamically adapt its behavior to the current workload.
34
• 35. Ceph: design
• Client
• Metadata Cluster (MDS)
• Object Storage Cluster (OSD)
35
  • 36. Ceph: features Dynamic Distributed Metadata • Metadata Storage • Dynamic Subtree Partitioning • Traffic Control Reliable Autonomic Distributed Object Storage • Data Distribution • Replication • Data Safety • Failure Detection • Recovery and Cluster Updates 36
• 37. Ceph: features
Pseudo-random data distribution function (CRUSH)
Reliable object storage service (RADOS)
Extent and B-tree based object file system, EBOFS (today btrfs)
(a simplified placement sketch follows)
37
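To convey the idea behind CRUSH without its full machinery (failure domains, device weights, cluster maps), here is a simplified stand-in based on rendezvous hashing: like CRUSH, it derives placement from the object name alone, so every client computes the same answer without asking a metadata server.

    import hashlib

    OSDS = ["osd0", "osd1", "osd2", "osd3"]    # illustrative device list

    def place(obj, replicas=2):
        # score every OSD for this object; the top scorers hold the replicas
        score = lambda osd: hashlib.md5((obj + "/" + osd).encode()).hexdigest()
        return sorted(OSDS, key=score, reverse=True)[:replicas]

    print(place("rbd.object.0001"))            # deterministic on every client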
• 38. Ceph: features
Splay replication
• A final commit notification is sent to the client only after the write has been safely committed to disk.
38
• 39. Ceph: good for …
Good
• Scientific applications, high throughput of data access
• Heavy read/write operations
• It is the most advanced distributed file system
Bad
• Young (in mainline Linux since 2.6.34)
• Linux only
• Complex
39
• 40. Others
Lustre, PVFS, MooseFS, CloudStore (Kosmos), pNFS, XtreemFS, Tahoe-LAFS, …
Search Wikipedia for more.
40
  • 41. Part III Case Studies 41
• 42. Class Exam
 What can DFS do for you?
 How can you create a petabyte storage?
 How can you build a centralized system log?
 How can you allocate space for your users or systems, when you have thousands of users/systems?
 How can you retrieve data from everywhere?
42 16/02/2012
• 43. File sharing
Problem
•Share documents across a wide area network
•Share home folders across different terminal servers
Solution
•OpenAFS
•Samba
Results
•Single ID, Kerberos/LDAP
•Single file system
Usage
•800 users
•15 branch offices
•File sharing, /home dirs
43
• 44. Web Service
Problem
• Big storage on a little budget
Solution
• Gluster
Results
• High-availability data storage
• Low price
Usage
• 100 TB image archive
• Multimedia content for web sites
44
• 45. Internet Disk: myS3
Problems
•Data from everywhere
•Disaster recovery
Solution
•myS3
•Hadoop / OpenAFS
Results
•High availability
•Access through the HTTP protocol (REST interface; a hypothetical client sketch follows)
•Disaster recovery
Usage
•User backups
•Application backend
•200 users
•6 TB
45
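A hedged sketch of what the REST interface enables; the endpoint, bucket layout and auth header below are hypothetical stand-ins, not myS3's actual API.

    import requests

    BASE = "https://mys3.example.com"            # hypothetical endpoint
    AUTH = {"Authorization": "Bearer <token>"}   # hypothetical auth scheme

    # store a backup object over HTTP
    with open("backup.tar.gz", "rb") as f:
        requests.put(BASE + "/bucket/backup.tar.gz", data=f, headers=AUTH)

    # retrieve it from anywhere
    r = requests.get(BASE + "/bucket/backup.tar.gz", headers=AUTH)
    r.raise_for_status()
    open("restored.tar.gz", "wb").write(r.content)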
• 46. Log concentrator
Problem
• Log concentration
Solution
• Hadoop cluster
• Syslog-NG
Results
• High availability
• Fast search
• "Storage without limits"
Usage
• Security audit and access control
(a collector-side sketch follows)
46
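A minimal collector-side sketch, assuming syslog-ng stages logs into per-host daily files which are then pushed into the Hadoop cluster; the paths and the hadoop command in the comment are examples.

    # /etc/syslog-ng/syslog-ng.conf (fragment)
    source s_net {
        udp(port(514));
        tcp(port(514));
    };
    destination d_stage {
        # one file per host per day; later loaded into HDFS,
        # e.g. with: hadoop fs -put <file> /logs/<host>/
        file("/var/spool/logs/$HOST/$YEAR$MONTH$DAY.log");
    };
    log { source(s_net); destination(d_stage); };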
• 47. Private cloud
Problems
• Low-cost VM storage
• VM self-provisioning
Solution
• GlusterFS
• OpenAFS
• Custom provisioning
Results
• Auto provisioning
• Low cost
• Flexible solution
Usage
• Development environments
• Production environments
• 48. Conclusion: problems
Do you have enough bandwidth?
 Failure: for 10 PB of storage, you will see an average of 22 consumer-grade SATA drives failing per day.
 Read/write time: each 2 TB drive takes, in the best case, approximately 24,390 seconds to be read and written over the network.
 Data replication: the re-replication load grows with the number of disk drives, plus the delta to synchronize.
(the arithmetic is sketched below)
48 16/02/2012
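The back-of-the-envelope arithmetic behind those numbers, with the assumptions made explicit; the effective network throughput is an assumption, and the daily failure count additionally depends on the annual failure rate you assume for consumer drives.

    # Python sketch of the capacity and transfer math
    capacity = 10 * 10**15          # 10 PB of raw storage
    drive = 2 * 10**12              # 2 TB per drive
    print(capacity // drive)        # -> 5000 drives before replication

    effective = 82 * 10**6          # ~82 MB/s assumed effective rate (~gigabit)
    print(drive / effective)        # -> ~24390 s, i.e. ~6.8 hours per drive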
• 49. Conclusion
Environment analysis
• There is no truly generic DFS
• It is not simple to move 800 TB between different solutions
Dimensioning
• Start with the right size
• The number of servers is related to the speed needed and the number of clients
• Plan the network for replication
Divide the system into classes of service
• Different disk types
• Different computer types
System management
• Monitoring tools
• System/software deployment tools
49
  • 50. Conclusion: next step 50 16/02/2012
• 51. Links
OpenAFS: www.openafs.org
Gluster: www.gluster.org
Hadoop: hadoop.apache.org
Ceph: ceph.newdream.n
Also: www.beolink.org • Isabel Drost et • Publication • Mailing list
51
• 52. I look forward to meeting you…
XVII European AFS Meeting 2010
PILSEN - CZECH REPUBLIC, September 13-15
Who should attend:
 Everyone interested in deploying a globally accessible file system
 Everyone interested in learning more about real-world usage of Kerberos authentication in single-realm and federated single sign-on environments
 Everyone who wants to share their knowledge and experience with other members of the AFS and Kerberos communities
 Everyone who wants to find out about the latest developments affecting AFS and Kerberos
More info: http://afs2010.civ.zcu.cz/
52 16/02/2012

Speaker notes

1. The session is composed of three main parts. In the first part we will see some definitions of DFS and the new trends in data centers, in particular what the big players are trying to do. The second part explains the architectures of four distributed file systems. At the end I will give some examples and real case studies, with the technologies explained. Last but not least, the conclusions... and dinner.
2. I want to start with some questions, to understand who you are. You will find answers to these questions in the case studies part.
3. Today you can find the big players making announcements, and sometimes shipping solutions, under the name Fabric; Cisco, for example, calls its solution Unified Fabric. But which idea is behind this name? With Fabric we have to go back to a grid idea, with many nodes, but this time we also have some other categories; in particular the fabric has the concepts of... It is probably not a new idea, but anyway, this is the future we will see advertised.
4. Of the five categories shown before, the most important one today, in my opinion, is the storage tier, because we still do not have the right approach: we have a lot of solutions that are unknown or in beta, and only a few consolidated old architectures. For the fabric, the storage tier needs to be... The last two categories are not directly connected to storage, but could be. The most used solution today is the Storage Area Network, and sometimes Network Attached Storage; the big players say the future is Fibre Channel over Ethernet, and no one talks about distributed file systems. Do you know distributed file systems?
5. So the first question is: given this explanation, could a DFS be useful for a data center? What do you think?
6. What do you expect from a DFS? What do you need for the fabric?
7. Fortunately or unfortunately, we have dozens of DFS. We can create five categories, each of them trying to solve a specific problem; that means we do not have a truly generic file system like ext3 or any other file system used on a local hard drive.
8. The first file system I will talk about is OpenAFS; the true origin of AFS is Carnegie Mellon University. The key ideas behind the design of OpenAFS are: use a persistent cache on the client side, and hide the data location. This is the opposite of NFS. AFS used Kerberos 4 and now uses Kerberos 5.
9. We have two types of services: the name databases, a collection of databases (the name probably gives you a hint), and the file servers, whose function you can also guess from the name. On the database server you have four services. One is used to search and look up data: your information is spread across many servers, so how can you find where it is? Simple: you use the Volume Location service, which gives you the server where the information is stored. Another service is the ptserver, a database that handles the mapping between IDs and user names, and the same for groups; it also contains the owner and the members of each group. The buserver is the database with the information on the last backup and other information related to the backup service. The last one is deprecated: it is a special version of Kerberos 4, and now you can use a standard Kerberos 5. That covers the database servers; on the other hand we have the file servers, which read and save the data on a specific partition. OpenAFS stores data as a set of files in a standard file system, and the blocks are handled with a map of the inodes of the partition; for this reason it is much better to use a separate partition. The last component is the client: on the client you have a kernel module and a cache manager. With a Kerberos ticket all your requests are authenticated and handled by the kernel, and the cache manager controls all the entries of the cache. OpenAFS works with RPCs and callbacks: the file server knows you have a copy of a file, and if the file changes the file server breaks the callback to the users. With this mechanism the cache is coherent rather than timer-based, and network traffic is reduced.
10. Now we see how the information is archived. Volumes are similar to logical volumes: the quota works as a quota, and you can expand it as you want, up to the size of the underlying file system. You can move volumes wherever you want, and you can replicate volumes; unfortunately the read-only copy is more a snapshot than a real-time replica. There is a specific command to handle synchronization between volumes.
11. Users can define their own groups and ACLs.
12. With the latest cache changes, you lose 5% of speed with a warm cache compared with a read from the local file system.
13. The basic idea of Gluster is to replace network attached storage with a bunch of low-cost servers, without a single point of failure, in a simple way. The goals of the project are a high level of scalability, performance and high availability... today with the idea of storage virtualization.
14. The basic idea of Gluster is to be simple: you can use it like Lego, with different bricks. You have a server side where the partition is exported and a client side; most of the work is done on the client side, because there is no metadata server, so all decisions are made by the clients using information stored on the servers. You have different kinds of interconnection between servers and clients. One of the big advantages of Gluster is the ability to re-export the file system with other protocols.
15. I have mentioned bricks; in this case a volume is also a brick. You can define incremental capabilities by extending the volume; the idea comes from the Hurd.
16. It is used as storage and as a SAN replacement; today there is more attention on the VMware world and virtual machines, with specific features for handling failover.
18. We see two implementation styles: one does not use the separation of metadata from data (a single stream: one server sends you the entire file, block by block), and one maintains the metadata on a separate infrastructure, introducing the concept of object storage.
19. Components: you have a namenode where the metadata is stored and many datanodes where the pieces of a file are copied. The client sends a request to the namenode, and the namenode sends back the list of datanodes holding the blocks.
20. The namenode is a single point of failure, so you need some high availability. The information is stored in memory, but the namenode writes a journal file on disk, so in case of a crash you can recover. Of the replicas, two are kept in the same rack and one in an external rack.
21. Basically it is very good for logs and/or analysis; it could also be used for coordination, like distributed locking. Some interesting projects: Hive, a meta-language like SQL used to query the data, and HBase, the implementation of Google's BigTable.
22. Ceph tries to solve some limitations present in the OSD model, in particular in the separation of metadata and data. The main objectives are scalability, reliability and high availability.
23. At the high level we have the usual components: a metadata server cluster and an object storage cluster. The names have changed, but...
24. On the MDS side you have dynamic subtree partitioning based on traffic control. On the storage side you have automatic replication and failure detection.