5. LinuxTag 2013 5
Me ;-)
● Teacher of mathematics & physics
● PhD in experimental physics
● Started with Linux in 1996
● Linux/UNIX trainer
● Solution engineer in HPC and CAx environment
● Head of the Linux Strategy team @Amadeus
Storage: History
● Reviewing storage task responsibilities
● Block allocation
● Space management
● Extension of SCSI standard
● Object based storage
● Metadata handling separated from data management
Object based storage
● Storage objects quite general
● Partition, file, ...
● Unique identifier
● OSD (Object based Storage Device)
● Hardware -> original trigger
● Software -> common implementation
● Main component of distributed file systems
Distributed storage: Paradigm changes
● Block -> Object
● Central -> Distributed
● Few -> Many
● Big -> Small
● Server <-> Storage
Distributed File Systems
● 'Recent' attention on distributed storage
● Cloud hype
● Big Data
● See also the Ceph talks
Distributed storage – Now what?!?
● Several implementations
● Different functions
● Support models
● Storage vendors' initiatives
● Relation to Linux distributions
Here and now ==> GlusterFS
History
● Gluster founded in 2005
● Gluster = GNU + cluster
● Acquisition by Red Hat in 2011
● Community project
● 3.2 in 2011
● 3.3 in 2012
● Commercial product: Red Hat Storage Server
The Client
● Native
● 'Speaks' the native GlusterFS protocol
● Not part of the Linux Kernel
● FUSE-based
● NFS
● Normal NFS client stack
● S3/Swift compatible
● Proxy needed
The Server
● Data
● Bricks
● Translators
● Volumes -> exported/served to the client
● Metadata
● No dedicated instance
● Distributed hashing approach
The Brick
● Servers trust each other (trusted storage pool)
● Interconnect
● TCP/IP and/or RDMA/InfiniBand
● Dedicated file systems on GlusterFS server
● XFS recommended, ext4 works too
● Extended attributes are a must
● Two main processes/daemons
● glusterd and glusterfsd
The Translator
● One per purpose
● Replication
● POSIX
● Quota
● I/O behaviour
● Chained -> brick graph
● Technically: configuration
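The chaining idea can be sketched in a few lines of Python. This is a toy model and not GlusterFS code: the class names (`Translator`, `Posix`, `Quota`, `Replicate`) and the `write` method are hypothetical, chosen only to mirror the purposes listed above. Each translator does one job and hands the call to the next one in the graph.

```python
class Translator:
    """Base class: forwards each call to the next translator(s) in the graph."""

    def __init__(self, *children):
        self.children = children

    def write(self, path, data):
        # Fan the call out to every child and collect their answers.
        return [r for child in self.children for r in child.write(path, data)]


class Posix(Translator):
    """Bottom of the graph: stores data (a dict stands in for the file system)."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.store = {}

    def write(self, path, data):
        self.store[path] = data
        return [self.name]


class Quota(Translator):
    """Rejects writes beyond a byte limit, otherwise passes the call down.

    Simplification: every write counts in full, even an overwrite.
    """

    def __init__(self, limit, child):
        super().__init__(child)
        self.limit = limit
        self.used = 0

    def write(self, path, data):
        if self.used + len(data) > self.limit:
            raise OSError("quota exceeded")
        self.used += len(data)
        return super().write(path, data)


class Replicate(Translator):
    """Sends every write to all children (uses the base-class fan-out)."""


# The "graph" is just the way the objects are nested: pure configuration.
graph = Replicate(Quota(10, Posix("brick1")), Quota(10, Posix("brick2")))
result = graph.write("/a.txt", b"hello")
```

Changing behaviour means changing how the objects are nested, which is the sense in which a translator graph is "technically: configuration".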
The Volume
● Service unit
● Layer of configuration
● distributed, replicated, striped, ...
● NFS
● Cache
● Permissions
● ....
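As an illustration of how the distributed and replicated layouts combine: in a distributed-replicated volume, consecutive bricks in the brick list form one replica set, and files are then distributed across the sets. The sketch below only models that grouping; the function name `replica_sets` and the brick names are hypothetical.

```python
def replica_sets(bricks, replica):
    """Group an ordered brick list into replica sets of size `replica`."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Hypothetical brick names in the usual server:/path notation.
bricks = ["server1:/export/b1", "server2:/export/b1",
          "server3:/export/b1", "server4:/export/b1"]
sets = replica_sets(bricks, 2)
```

With four bricks and a replica count of 2, this yields two sets of two bricks each, so the order in which bricks are listed decides which servers mirror each other.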
Metadata
● 2 kinds
● More of local file system style
● Related to distributed nature
● Some stored in backend file system
● Permissions
● Time stamps
● Distribution/replication
● Some calculated on the fly
● Brick location
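The metadata kept in the backend file system can be inspected with ordinary extended-attribute calls. The following is a minimal sketch, assuming a Linux host; the helper name `read_xattrs` is hypothetical. On a real brick the GlusterFS attributes live in the `trusted.*` namespace (for example `trusted.gfid`), which is only readable by root, so an unprivileged run may return an empty result.

```python
import os


def read_xattrs(path):
    """Return {name: value} for all extended attributes readable on `path`."""
    if not hasattr(os, "listxattr"):  # these calls only exist on Linux
        return {}
    attrs = {}
    try:
        names = os.listxattr(path)
    except OSError:
        # File system without extended-attribute support, or no permission.
        return {}
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name)
        except OSError:
            continue  # attribute vanished or is not readable for this user
    return attrs
```

Running it as root against a file below a brick directory would be one way to see which attributes GlusterFS has stored for that file.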
Elastic Hash Algorithm
● Based on file names
● Name space divided
● Full brick handled via relinking
● Stored in extended attributes
● Client needs to know topology
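The principle can be sketched as follows: hash the file name, divide the hash space into one range per brick, and pick the brick whose range contains the hash. This is an illustration under stated assumptions, not the GlusterFS implementation: `zlib.crc32` is only a stand-in for the real hash function, the even split is a simplification, and the names `assign_ranges` and `locate` are hypothetical. Relinking for full bricks is not modelled.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of a 32-bit hash space


def assign_ranges(bricks):
    """Split the hash space into one contiguous range per brick."""
    step = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick takes whatever is left, so the ranges cover everything.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        ranges.append((start, end, brick))
    return ranges


def locate(filename, ranges):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode("utf-8"))  # stand-in hash, 0 .. 2**32 - 1
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise LookupError("hash not covered by any range")


ranges = assign_ranges(["brick1", "brick2", "brick3"])
target = locate("report.pdf", ranges)
```

Because the location is computed from the name and the range table alone, no metadata server has to be asked, but every client needs the current range table, which is why the client needs to know the topology.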
Self-Healing
● On demand vs. Scheduled
● File based
● Based on extended attributes
● Split-brain
● Quorum function
● Sometimes: manual intervention
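How extended attributes can drive healing and expose a split-brain is sketched below. This is a deliberately simplified model: each replica is assumed to keep a counter of operations that the other replicas may have missed. The data layout and the function name `heal_decision` are assumptions made for illustration and do not reflect the actual on-disk attribute format.

```python
def heal_decision(pending):
    """Decide what to do with one file, given per-replica pending counters.

    pending[a][b] > 0 means: replica a recorded operations that replica b
    may not have seen, so a "blames" b.
    """
    blamed = {b for a in pending for b, n in pending[a].items() if n > 0}
    sources = sorted(r for r in pending if r not in blamed)
    if not blamed:
        return ("clean", [], [])
    if not sources:
        # Every replica is blamed by another one: no trustworthy copy is left.
        return ("split-brain", [], sorted(blamed))
    return ("heal", sources, sorted(blamed))


# Replica "a" saw two writes while "b" was unreachable.
one_sided = heal_decision({"a": {"b": 2}, "b": {"a": 0}})
# Both replicas accepted writes while disconnected from each other.
conflict = heal_decision({"a": {"b": 1}, "b": {"a": 1}})
```

In the one-sided case the copy on "a" can be used to repair "b". In the conflicting case no copy is clearly good, which is the situation that a quorum is meant to prevent and that otherwise calls for manual intervention.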
Geo replication
● Asynchronous
● Based on rsync/ssh
● Master-Slave
● If needed: cascading
● One way street
● Clocks in sync!
From files to objects
● Introduced with version 3.3
● Hard links with some hierarchy
● Re-uses the GFID (volume-wide file ID, comparable to an inode number)
● UFO
● Unified File and Object
● Combination with RESTful API
● S3 and Swift compatible
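The "hard links with some hierarchy" can be pictured as a directory tree on each brick in which every file is reachable by its GFID. The sketch below assumes the layout `.glusterfs/<first two hex digits>/<next two hex digits>/<full GFID>` under the brick root; treat that layout, the function name `gfid_link_path`, and the example values as assumptions to be checked against a real brick.

```python
import posixpath
import uuid


def gfid_link_path(brick_root, gfid):
    """Build the assumed path of the GFID hard link below a brick root."""
    g = str(uuid.UUID(str(gfid)))  # normalise to lower-case dashed form
    return posixpath.join(brick_root, ".glusterfs", g[:2], g[2:4], g)


# Hypothetical brick path and GFID, for illustration only.
link = gfid_link_path("/export/brick1", "0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0")
```

Since the link is a hard link, it points at the same inode as the file under its regular name, so the data can be addressed by ID without knowing its path.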
NAS replacement
● NFS as 1:1
● Server: GlusterFS
● Client: NFS
● NFS as such
● Server: GlusterFS
● Client: GlusterFS
Storage back-end for KVM and Co
● Stacked (indirect)
● Not smart
● Workable for main hypervisors
● Direct
● QEMU
● libvirt
● oVirt/RHEV
SAN replacement
● Not very far advanced (yet)
● New translator needed
● Development started
● Presenting GlusterFS as block device
● Additional items needed
● Locking
● ...
Takeaways
● Thin distributed file system layer
● Modular architecture
● Operationally ready
● Still some surprises
● Active development and community