GlusterFS – Architecture & Roadmap
Vijay Bellur
GlusterFS co-maintainer
http://twitter.com/vbellur
05/17/16
Agenda
● What is GlusterFS?
● Architecture
● Integration
● Use Cases
● Future Directions
● Challenges
● Q&A
What is GlusterFS?
● A general purpose scale-out distributed file system.
● Aggregates storage exports over network interconnect to provide a single unified namespace.
● Filesystem is stackable and completely in userspace.
● Layered on disk file systems that support extended attributes.
Typical GlusterFS Deployment
● Global namespace
● Scale-out storage building blocks
● Supports thousands of clients
● Access using GlusterFS native, NFS, SMB and HTTP protocols
● Linear performance scaling
GlusterFS Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
● Deployment agnostic
● Unified access
● Largely POSIX compliant
Concepts & Algorithms
GlusterFS concepts – Trusted Storage Pool
● Trusted Storage Pool (cluster) is a collection of storage servers.
● Trusted Storage Pool is formed by invitation – “probe” a new member from the cluster and not vice versa.
● Logical partition for all data and management operations.
● Membership information used for determining quorum.
● Members can be dynamically added and removed from the pool.
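The invitation rule and the quorum check can be illustrated with a toy model. This is only a sketch: the class and method names are hypothetical, it is not how glusterd is implemented, and it assumes that quorum means a strict majority of pool members is reachable.

```python
class TrustedStoragePool:
    """Toy model of pool membership; not the glusterd implementation."""

    def __init__(self, first_node):
        self.members = {first_node}

    def probe(self, from_node, new_node):
        # Invitation only: the probe must originate from an existing member.
        if from_node not in self.members:
            raise PermissionError(f"{from_node} is not in the pool and cannot probe")
        self.members.add(new_node)

    def detach(self, node):
        # Members can be removed from the pool at any time.
        self.members.discard(node)

    def has_quorum(self, reachable):
        # Assumption: quorum is a strict majority of the current members.
        alive = self.members & set(reachable)
        return len(alive) * 2 > len(self.members)


pool = TrustedStoragePool("node1")
pool.probe("node1", "node2")   # node1 invites node2
pool.probe("node2", "node3")   # any member may invite
print(sorted(pool.members))
print(pool.has_quorum({"node1", "node2"}))
```

A node outside the pool cannot add itself: calling `pool.probe("node9", "node9")` raises `PermissionError` in this model.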
GlusterFS concepts – Trusted Storage Pool
[Diagram: Node1 sends a probe to Node2; once the probe is accepted, Node 1 and Node 2 are peers in a trusted storage pool.]
GlusterFS concepts – Trusted Storage Pool
[Diagram: Node1, Node2 and Node3 form a trusted storage pool; a detach removes a node from the pool.]
GlusterFS concepts - Bricks
● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Ideally, each brick in a cluster should be of the same size
[Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively (/export1, /export2, ...).]
GlusterFS concepts - Volumes
● A volume is a logical collection of bricks.
● Volume is identified by an administrator provided name.
● Volume is a mountable entity and the volume name is provided
at the time of mounting.
– mount -t glusterfs server1:/<volname> /my/mnt/point
● Bricks from the same node can be part of different volumes
GlusterFS concepts - Volumes
[Diagram: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; bricks from the three nodes are grouped into two volumes, "music" and "Videos".]
Volume Types
➢ Type of a volume is specified at the time of volume creation
➢ Volume type determines how and where data is placed
➢ Following volume types are supported in glusterfs:
a) Distribute
b) Stripe
c) Replication
d) Distributed Replicate
e) Striped Replicate
f) Distributed Striped Replicate
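As a rough illustration of how the type is fixed at creation time, the sketch below builds (but does not run) a `gluster volume create` command line. It assumes the 3.x-era CLI form `gluster volume create NAME [stripe N] [replica N] BRICK...`, where the volume type follows from the stripe and replica counts and the number of bricks. The helper name `volume_create_argv` is hypothetical.

```python
def volume_create_argv(name, bricks, replica=1, stripe=1):
    """Return an argv list for `gluster volume create`; nothing is executed."""
    for brick in bricks:
        host, sep, path = brick.partition(":")
        if not (host and sep and path.startswith("/")):
            raise ValueError(f"brick must look like hostname:/dir, got {brick!r}")
    # Bricks are consumed in groups of stripe x replica; extra groups add distribution.
    group = replica * stripe
    if len(bricks) % group != 0:
        raise ValueError("brick count must be a multiple of stripe x replica")
    argv = ["gluster", "volume", "create", name]
    if stripe > 1:
        argv += ["stripe", str(stripe)]
    if replica > 1:
        argv += ["replica", str(replica)]
    return argv + list(bricks)


# Four bricks with replica 2: a distributed replicate volume (two replica pairs).
cmd = volume_create_argv(
    "music",
    ["node1:/export/brick1", "node2:/export/brick1",
     "node1:/export/brick2", "node2:/export/brick2"],
    replica=2,
)
print(" ".join(cmd))
```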
Distributed Volume
➢ Distributes files across various bricks of the volume.
➢ Directories are present on all bricks of the volume.
➢ Single brick failure will result in loss of data availability.
➢ Removes the need for an external metadata server.
How does a distributed volume work?
➢ Uses the Davies-Meyer hash algorithm.
➢ A 32-bit hash space is divided into N ranges for N bricks.
➢ At the time of directory creation, a hash range is assigned to that directory on each brick.
➢ During file creation or retrieval, a hash is computed on the file name. This hash value is used to locate or place the file.
➢ Different directories on the same brick end up with different hash ranges.
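The placement scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the GlusterFS code: CRC32 stands in for the Davies-Meyer hash, the ranges are equal-sized, and the per-directory rotation of brick order is an assumed way of giving the same brick different ranges in different directories. All function names are hypothetical.

```python
import zlib

HASH_SPACE = 2**32  # 32-bit hash space


def assign_ranges(bricks, directory):
    """Split the hash space into one contiguous range per brick for a directory."""
    n = len(bricks)
    # Assumption: rotate the brick order per directory so that a brick's
    # range differs from one directory to the next.
    shift = zlib.crc32(directory.encode("utf-8")) % n
    order = bricks[shift:] + bricks[:shift]
    step = HASH_SPACE // n
    layout = []
    for i, brick in enumerate(order):
        start = i * step
        end = HASH_SPACE - 1 if i == n - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def name_hash(filename):
    # Stand-in hash: CRC32, not the Davies-Meyer construction GlusterFS uses.
    return zlib.crc32(filename.encode("utf-8"))


def locate(layout, filename):
    """Return the brick whose range contains the hash of the file name."""
    h = name_hash(filename)
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise LookupError("layout does not cover the hash space")


bricks = ["node1:/export/brick1", "node2:/export/brick1", "node3:/export/brick1"]
layout = assign_ranges(bricks, "/photos")
print(locate(layout, "holiday.jpg"))
```

Because the brick is computed from the file name and the directory layout alone, any client can find a file without asking a metadata server.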
Replicated Volume
● Synchronous replication of all directory and file updates.
● Provides high availability of data when node failures occur.
● Transaction driven for ensuring consistency.
● Changelogs maintained for reconciliation.
● Any number of replicas can be configured.
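A simplified model of a transaction with changelogs might look like the following. It is a sketch, not the replication translator itself: it assumes each replica keeps a pending counter per replica, raised before the write and lowered only for replicas that completed it, so that a non-zero counter marks a copy that needs reconciliation. The class and function names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}
        self.pending = {}  # pending[other] > 0: `other` may have missed writes


def replicated_write(replicas, path, content):
    """One write transaction: pre-op mark, write everywhere, post-op clear."""
    live = [r for r in replicas if r.online]
    if not live:
        raise OSError("no replica available")
    # Pre-op: every live replica records a pending operation against all replicas.
    for r in live:
        for other in replicas:
            r.pending[other.name] = r.pending.get(other.name, 0) + 1
    # Op: the write goes synchronously to all live replicas.
    for r in live:
        r.data[path] = content
    # Post-op: clear the marks only for replicas that completed the write.
    for r in live:
        for done in live:
            r.pending[done.name] -= 1


def needs_heal(replicas):
    """Names of replicas that some peer blames for missed writes."""
    return sorted({name for r in replicas
                   for name, count in r.pending.items() if count > 0})


a, b = Replica("a"), Replica("b")
b.online = False                      # simulate a node failure
replicated_write([a, b], "/f", b"v1")
print(needs_heal([a, b]))             # the surviving replica blames the failed one
```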
How does a replicated volume work?
Distributed Replicated Volume
● Distribute files across replicated bricks
● Number of bricks must be a multiple of the replica count
● Ordering of bricks in volume definition matters
● Scaling and high availability
● Reads get load balanced.
● Most preferred model of deployment currently.
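Why brick ordering matters can be shown with a small sketch. It assumes that consecutive bricks in the volume definition are grouped into replica sets, so listing two bricks from the same node next to each other would place both copies on one machine. The function names are hypothetical.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets in the order they were listed."""
    if len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_sharing_a_host(sets):
    """Return replica sets whose copies sit on the same host."""
    bad = []
    for group in sets:
        hosts = [brick.split(":", 1)[0] for brick in group]
        if len(set(hosts)) < len(hosts):
            bad.append(group)
    return bad


good_order = ["node1:/export/brick1", "node2:/export/brick1",
              "node1:/export/brick2", "node2:/export/brick2"]
bad_order = ["node1:/export/brick1", "node1:/export/brick2",
             "node2:/export/brick1", "node2:/export/brick2"]

print(sets_sharing_a_host(replica_sets(good_order, 2)))
print(sets_sharing_a_host(replica_sets(bad_order, 2)))
```

With the second ordering, both replica sets keep their two copies on a single node, so losing that node makes the data unavailable despite replication.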
Striped Volume
● Files are striped into chunks and placed in various bricks.
● Recommended only when very large files greater than the size of the disks are present.
● Chunks are files with holes – this helps in maintaining offset consistency.
● A brick failure can result in data loss. Redundancy with replication is highly recommended (striped replicated volumes).
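The chunk placement can be sketched as a round-robin over fixed-size blocks. This is an illustration only: the 128 KiB default block size is an assumption, and `split_range` is a hypothetical helper. Each piece keeps its original file offset, which is what "files with holes" means here: the chunk file on a brick is sparse, with holes where other bricks hold the data.

```python
def split_range(offset, length, n_bricks, block_size=128 * 1024):
    """Split a byte range into (brick index, file offset, size) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        block = offset // block_size
        block_end = (block + 1) * block_size
        size = min(end, block_end) - offset
        # Round-robin: block k lives on brick k mod N, at the same offset
        # it has in the original file (the rest of the chunk file is a hole).
        pieces.append((block % n_bricks, offset, size))
        offset += size
    return pieces


# Tiny block size so the result is easy to read: 8 bytes starting at offset 2.
print(split_range(2, 8, n_bricks=2, block_size=4))
```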
Elastic Volume Management
Application transparent operations that can be performed in the storage layer.
● Addition of Bricks to a volume
● Remove brick from a volume
● Rebalance data spread within a volume
● Replace a brick in a volume
● Performance / Functionality tuning
Access Mechanisms
Gluster volumes can be accessed via the following mechanisms:
– FUSE based Native protocol
– NFSv3
– SMB
– libgfapi
– ReST/HTTP
– HDFS
FUSE based native access
NFS access
libgfapi
● Exposes APIs for accessing Gluster volumes.
● Reduces context switches.
● qemu, samba, NFS Ganesha integrated with libgfapi.
● Both sync and async interfaces available.
● Emerging bindings for various languages.
libgfapi vs. FUSE – FUSE access
libgfapi vs. FUSE – libgfapi access
ReST based access
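ReST - G4S
● Unified file and object view.
● Entity mapping between file and object building blocks: Account ↔ Volume, Container ↔ Directory, Object ↔ File.
[Diagram: a client's HTTP (ReST) request passes through the proxy to account, container and object; a file client reaches the same data through an NFS or GlusterFS mount.]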
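The entity mapping on this slide can be sketched as a path translation, assuming the mapping is account to volume, container to directory and object to file, and assuming a Swift-style URL of the form `/v1/<account>/<container>/<object>`. The function name and the mount root `/mnt/gluster-object` are hypothetical.

```python
import posixpath


def object_to_file_path(url_path, mount_root="/mnt/gluster-object"):
    """Map /v1/<account>/<container>/<object> to a path on a mounted volume."""
    parts = url_path.strip("/").split("/", 3)
    if len(parts) != 4 or parts[0] != "v1":
        raise ValueError("expected /v1/<account>/<container>/<object>")
    _, account, container, obj = parts
    # account -> volume, container -> directory, object -> file
    return posixpath.join(mount_root, account, container, obj)


print(object_to_file_path("/v1/music/albums/track01.flac"))
```

Under this mapping, an object uploaded over HTTP shows up as an ordinary file for clients that mount the volume, which is the unified file and object view.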
Hadoop access
Implementation
Translators in GlusterFS
● Building blocks for a GlusterFS process.
● Based on Translators in GNU HURD.
● Each translator is a functional unit.
● Translators can be stacked together for achieving desired functionality.
● Translators are deployment agnostic – can be loaded in either the client or server stacks.
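The stacking idea can be illustrated with a small model in which each translator either handles an operation or passes it to its child. This is a sketch of the pattern only; real translators are C shared objects, and the classes below (`Storage`, `Compress`, `ReadCache`) are hypothetical stand-ins.

```python
import zlib


class Translator:
    """A functional unit that handles an operation or passes it to its child."""

    def __init__(self, child=None):
        self.child = child

    def write(self, path, data):
        return self.child.write(path, data)

    def read(self, path):
        return self.child.read(path)


class Storage(Translator):
    """Bottom of the stack: keeps data in a dict instead of a disk file system."""

    def __init__(self):
        super().__init__()
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)

    def read(self, path):
        return self.files[path]


class Compress(Translator):
    """Compresses on the way down and decompresses on the way up."""

    def write(self, path, data):
        self.child.write(path, zlib.compress(data))
        return len(data)

    def read(self, path):
        return zlib.decompress(self.child.read(path))


class ReadCache(Translator):
    """Serves repeated reads from memory and invalidates on write."""

    def __init__(self, child):
        super().__init__(child)
        self.cache = {}

    def write(self, path, data):
        self.cache.pop(path, None)
        return self.child.write(path, data)

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.child.read(path)
        return self.cache[path]


storage = Storage()
stack = ReadCache(Compress(storage))   # the order of stacking defines behaviour
stack.write("/a.txt", b"hello " * 100)
print(len(storage.files["/a.txt"]), len(stack.read("/a.txt")))
```

Because every unit exposes the same interface, the same `Compress` object could sit in a client-side or a server-side stack without modification.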
Customizable Translator Stack
Ecosystem Integration
● Currently integrated with various ecosystems:
– OpenStack
– Samba
– Ganesha
– oVirt
– qemu
– Hadoop
– pcp
– Proxmox
– uWSGI
OpenStack Havana and GlusterFS – Current Integration
● Separate Compute and Storage Pools
● GlusterFS directly provides Swift object service
● Integration with Keystone
● GeoReplication for multi-site support
● Swift data also available via other protocols
● Supports non-OpenStack use in addition to OpenStack use
[Diagram, logical view: Glance images, Nova nodes, Swift objects. Physical view: KVM hosts reach Cinder data and Glance data over libgfapi, and the Swift API serves Swift data, all stored on a pool of storage servers.]
OpenStack and GlusterFS – Future Integration
[Diagram: Nova (KVM over libgfapi), Glance images and Swift objects (via the Swift API) share one GlusterFS storage pool that holds Cinder, Glance, Swift, Manila and Savanna data and offers file, block and object access.]
GlusterFS & oVirt
● Trusted Storage Pool and Gluster Volume management - oVirt 3.1
● FUSE based posixFS support for VM image storage - oVirt 3.1
● libgfapi based Gluster native storage domain - oVirt 3.3
● Manage converged virtualization and storage clusters in oVirt
● ReST APIs & SDK for GlusterFS management.
Use Cases - current
● Unstructured data storage
● Archival
● Disaster Recovery
● Virtual Machine Image Store
● Cloud Storage for Service Providers
● Content Cloud
● Big Data
● Semi-structured & Structured data
Future Directions
New Features in GlusterFS 3.5
● Distributed geo-replication
● File snapshots
● Compression translator
● Multi-brick Block Device volumes
● Readdir ahead translator
● Quota Scalability
Beta Features in GlusterFS 3.5
● Disperse translator for Erasure Coding
● Encryption at rest
● Support for bricks on Btrfs
● libgfapi support for NFS Ganesha (NFS v4)
Geo-replication in 3.5
● Before 3.5
➢ Merkle tree based optimal volume crawling
➢ Single driver on the master
➢ SPOF
● In 3.5
➢ Based on changelog
➢ One driver per replica set on the master
➢ No SPOF
Quota in 3.5
● Before 3.5
– Client side enforcement
– Configuration in volume files would block scalability
– GFID accesses could cause incorrect accounting
– Only hard quota supported
● In 3.5
– Server side enforcement
– Better configuration management for scalability.
– GFID to path conversion enables correct accounting.
– Both hard and soft quotas supported
Prominent Features beyond GlusterFS 3.5
● Volume snapshots
● New Style Replication
● pNFS access with NFS Ganesha
● Data tiering / HSM
● Multi master geo-replication
● Support Btrfs features
● Caching improvements
● libgfchangelog
● and more...
Challenges
● Scalability – 1024 nodes, 72 brontobytes?
● Hard links
● Rename
● Monolithic tools
● Monitoring
● Reduce Capex and Opex
Resources
Mailing lists:
gluster-users@gluster.org
gluster-devel@nongnu.org
IRC:
#gluster and #gluster-dev on freenode
Links:
http://www.gluster.org
http://hekafs.org
http://forge.gluster.org
http://www.gluster.org/community/documentation/index.php/Arch
Thank you!
Vijay Bellur
vbellur at redhat dot com

GlusterFs Architecture & Roadmap - LinuxCon EU 2013

  • 1. GlusterFS – Architecture & Roadmap Vijay Bellur GlusterFS co-maintainer http://twitter.com/vbellur
  • 2. 05/17/16 Agenda ● What is GlusterFS? ● Architecture ● Integration ● Use Cases ● Future Directions ● Challenges ● Q&A
  • 3. What is GlusterFS? ● A general purpose scale-out distributed file system. ● Aggregates storage exports over network interconnect to provide a single unified namespace. ● Filesystem is stackable and completely in userspace. ● Layered on disk file systems that support extended attributes.
  • 4. Typical GlusterFS Deployment: global namespace; scale-out storage building blocks; supports thousands of clients; access using GlusterFS native, NFS, SMB and HTTP protocols; linear performance scaling.
  • 5. GlusterFS Architecture – Foundations ● Software only, runs on commodity hardware ● No external metadata servers ● Scale-out with Elasticity ● Extensible and modular ● Deployment agnostic ● Unified access ● Largely POSIX compliant
  • 7. GlusterFS concepts – Trusted Storage Pool ● Trusted Storage Pool (cluster) is a collection of storage servers. ● Trusted Storage Pool is formed by invitation – “probe” a new member from the cluster and not vice versa. ● Logical partition for all data and management operations. ● Membership information used for determining quorum. ● Members can be dynamically added and removed from the pool.
  • 8. GlusterFS concepts – Trusted Storage Pool [Diagram: a probe between Node1 and Node2 is accepted; Node 1 and Node 2 are then peers in a trusted storage pool.]
  • 9. GlusterFS concepts – Trusted Storage Pool [Diagram: Node1, Node2 and Node3 in a trusted storage pool, with a "Detach" step.]
  • 10. GlusterFS concepts - Bricks ● A brick is the combination of a node and an export directory, e.g. hostname:/dir ● Each brick inherits limits of the underlying filesystem ● No limit on the number of bricks per node ● Ideally, each brick in a cluster should be of the same size [Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively, named /export1 through /export5.]
  • 11. GlusterFS concepts - Volumes ● A volume is a logical collection of bricks. ● Volume is identified by an administrator provided name. ● Volume is a mountable entity and the volume name is provided at the time of mounting. – mount -t glusterfs server1:/<volname> /my/mnt/point ● Bricks from the same node can be part of different volumes
  • 12. GlusterFS concepts - Volumes [Diagram: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; the bricks are grouped into two volumes, "music" and "Videos".]
  • 13. Volume Types ➢ Type of a volume is specified at the time of volume creation ➢ Volume type determines how and where data is placed ➢ Following volume types are supported in glusterfs: a) Distribute b) Stripe c) Replication d) Distributed Replicate e) Striped Replicate f) Distributed Striped Replicate
  • 14. Distributed Volume ➢ Distributes files across various bricks of the volume. ➢ Directories are present on all bricks of the volume. ➢ Single brick failure will result in loss of data availability. ➢ Removes the need for an external metadata server.
  • 15. How does a distributed volume work? ➢ Uses Davies-Meyer hash algorithm. ➢ A 32-bit hash space is divided into N ranges for N bricks ➢ At the time of directory creation, a range is assigned to each directory. ➢ During a file creation or retrieval, hash is computed on the file name. This hash value is used to locate or place the file. ➢ Different directories in the same brick end up with different hash ranges.
  • 16. How does a distributed volume work?
  • 17. How does a distributed volume work?
  • 18. How does a distributed volume work?
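The placement scheme described on slide 15 can be sketched roughly as follows. This is a minimal illustration, not GlusterFS code: `assign_ranges`, `name_hash` and `locate` are hypothetical names, the `shift` parameter is an assumed way to model different directories getting different ranges on the same brick, and CRC32 stands in for the Davies-Meyer hash purely so the example runs with the standard library.

```python
import zlib

HASH_SPACE = 2 ** 32  # the 32-bit hash space mentioned on the slide


def assign_ranges(bricks, shift=0):
    """Split [0, 2**32) into one contiguous range per brick.

    `shift` (hypothetical) rotates which brick receives which range, so
    that two directories can hold different ranges on the same brick.
    """
    n = len(bricks)
    if n == 0:
        raise ValueError("at least one brick is required")
    step = HASH_SPACE // n
    layout = []
    for i in range(n):
        start = i * step
        # The last range absorbs the remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == n - 1 else (i + 1) * step - 1
        layout.append((start, end, bricks[(i + shift) % n]))
    return layout


def name_hash(filename):
    """Stand-in hash: CRC32, NOT the Davies-Meyer hash GlusterFS uses."""
    return zlib.crc32(filename.encode("utf-8")) & 0xFFFFFFFF


def locate(layout, filename):
    """Return the brick whose range contains the hash of the file name."""
    h = name_hash(filename)
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise LookupError("layout does not cover the hash space")


bricks = ["node1:/export/brick1", "node2:/export/brick1", "node3:/export/brick1"]
layout = assign_ranges(bricks)
print(locate(layout, "song.mp3"))
```

Because the brick is derived from the file name and the directory's layout alone, any client can compute a file's location without consulting a metadata server, which is the property slide 14 points to.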
  • 19. Replicated Volume ● Synchronous replication of all directory and file updates. ● Provides high availability of data when node failures occur. ● Transaction driven for ensuring consistency. ● Changelogs maintained for reconciliation. ● Any number of replicas can be configured.
  • 20. How does a replicated volume work?
  • 21. How does a replicated volume work?
  • 22. Distributed Replicated Volume ● Distribute files across replicated bricks ● Number of bricks must be a multiple of the replica count ● Ordering of bricks in volume definition matters ● Scaling and high availability ● Reads get load balanced. ● Most preferred model of deployment currently.
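Two of the points on slide 22, the multiple-of-replica-count rule and the importance of brick ordering, can be illustrated with a short sketch. It assumes that consecutive bricks in the volume definition form one replica set; the slide does not spell this out, so treat it as an assumption. `replica_sets` and `nodes_per_set` are hypothetical helper names, not part of any GlusterFS API.

```python
def replica_sets(bricks, replica_count):
    """Group bricks into replica sets of `replica_count` consecutive bricks.

    Assumption: bricks listed next to each other in the volume definition
    end up in the same replica set.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if len(bricks) % replica_count != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


def nodes_per_set(sets):
    """Count distinct hosts in each replica set (host is the part before ':')."""
    return [len({brick.split(":", 1)[0] for brick in s}) for s in sets]


# Replicas alternate between nodes: each set spans two hosts.
good = replica_sets(
    ["node1:/export/brick1", "node2:/export/brick1",
     "node1:/export/brick2", "node2:/export/brick2"], 2)

# Same bricks, different order: both copies of a file land on one host.
bad = replica_sets(
    ["node1:/export/brick1", "node1:/export/brick2",
     "node2:/export/brick1", "node2:/export/brick2"], 2)

print(nodes_per_set(good))
print(nodes_per_set(bad))
```

Under this assumption, the second ordering would leave every replica pair on a single node, so losing that node would take both copies offline.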
  • 24. Striped Volume ● Files are striped into chunks and placed in various bricks. ● Recommended only when very large files greater than the size of the disks are present. ● Chunks are files with holes – this helps in maintaining offset consistency. ● A brick failure can result in data loss. Redundancy with replication is highly recommended (striped replicated volumes).
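One way to read the "files with holes" bullet is that each chunk is written to its brick at the same offset it has in the original file, leaving the other stripes' regions as holes. The sketch below follows that reading. The round-robin placement, the 128 KiB default block size, and the names `brick_for_offset` and `stripe_write_plan` are assumptions made for illustration, not details taken from the slides.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024  # assumed stripe block size, in bytes


def brick_for_offset(offset, stripe_count, block_size=DEFAULT_BLOCK_SIZE):
    """Return the index of the brick holding the byte at `offset`.

    Assumption: stripe blocks are assigned to bricks round-robin.
    """
    if offset < 0 or stripe_count < 1 or block_size < 1:
        raise ValueError("offset must be >= 0; counts and sizes must be >= 1")
    return (offset // block_size) % stripe_count


def stripe_write_plan(offset, length, stripe_count, block_size=DEFAULT_BLOCK_SIZE):
    """Split one write into (brick_index, file_offset, nbytes) pieces.

    `file_offset` is left unchanged: each piece goes into the brick's
    sparse file at the original offset, so unused regions remain holes.
    """
    plan = []
    pos = offset
    end = offset + length
    while pos < end:
        block_end = (pos // block_size + 1) * block_size
        nbytes = min(end, block_end) - pos
        plan.append((brick_for_offset(pos, stripe_count, block_size), pos, nbytes))
        pos += nbytes
    return plan


# A 300-byte write with a toy 128-byte block size over two bricks.
print(stripe_write_plan(0, 300, stripe_count=2, block_size=128))
```

Keeping the original offsets means a read at a given offset needs only the block arithmetic above to find the right brick, with no offset translation.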
  • 25. Elastic Volume Management: application-transparent operations that can be performed in the storage layer. ● Add bricks to a volume ● Remove a brick from a volume ● Rebalance data spread within a volume ● Replace a brick in a volume ● Performance / functionality tuning
  • 27. Access Mechanisms Gluster volumes can be accessed via the following mechanisms: – FUSE based Native protocol – NFSv3 – SMB – libgfapi – ReST/HTTP – HDFS
  • 30. libgfapi ● Exposes APIs for accessing Gluster volumes. ● Reduces context switches. ● qemu, samba, NFS Ganesha integrated with libgfapi. ● Both sync and async interfaces available. ● Emerging bindings for various languages.
  • 31. libgfapi v/s FUSE – FUSE access
  • 32. libgfapi v/s FUSE – libgfapi access
  • 34. ReST - G4S: Unified file and object view. Entity mapping between file and object building blocks. [Diagram labels: client, proxy, HTTP request (REST); account, container, object; volume, directory, file; client with NFS or GlusterFS mount.]
  • 37. Translators in GlusterFS ● Building blocks for a GlusterFS process. ● Based on Translators in GNU HURD. ● Each translator is a functional unit. ● Translators can be stacked together for achieving desired functionality. ● Translators are deployment agnostic – can be loaded in either the client or server stacks.
  • 40. Ecosystem Integration ● Currently integrated with various ecosystems: ● OpenStack ● Samba ● Ganesha ● oVirt ● qemu ● Hadoop ● pcp ● Proxmox ● uWSGI
  • 41. OpenStack Havana and GlusterFS – Current Integration ● Separate Compute and Storage Pools ● GlusterFS directly provides Swift object service ● Integration with Keystone ● GeoReplication for multi-site support ● Swift data also available via other protocols ● Supports non-OpenStack use in addition to OpenStack use [Diagram, logical and physical views: Glance images, Nova nodes, Swift objects; Cinder data, Glance data, Swift data; Swift API; KVM hosts and storage servers connected via libgfapi.]
  • 42. OpenStack and GlusterFS – Future Integration [Diagram: Glance images, Nova, Swift objects; Cinder data, Glance data, Swift data, Manila data, Savanna data; Swift API; file, block and object access; KVM hosts and storage connected via libgfapi.]
  • 43. GlusterFS & oVirt ● Trusted Storage Pool and Gluster Volume management - oVirt 3.1 ● FUSE based posixFS support for VM image storage - oVirt 3.1 ● libgfapi based Gluster native storage domain - oVirt 3.3 ● Manage converged virtualization and storage clusters in oVirt ● ReST APIs & SDK for GlusterFS management.
  • 45. Use Cases - current ● Unstructured data storage ● Archival ● Disaster Recovery ● Virtual Machine Image Store ● Cloud Storage for Service Providers ● Content Cloud ● Big Data ● Semi-structured & Structured data
  • 47. New Features in GlusterFS 3.5 ● Distributed geo-replication ● File snapshots ● Compression translator ● Multi-brick Block Device volumes ● Readdir ahead translator ● Quota Scalability
  • 48. Beta Features in GlusterFS 3.5 ● Disperse translator for Erasure Coding ● Encryption at rest ● Support for bricks on Btrfs ● libgfapi support for NFS Ganesha (NFS v4)
  • 49. Geo-replication in 3.5 ● Before 3.5 ➢ Merkle tree based optimal volume crawling ➢ Single driver on the master ➢ SPOF ● In 3.5 ➢ Based on changelog ➢ One driver per replica set on the master ➢ No SPOF
  • 50. Quota in 3.5 ● Before 3.5 – Client side enforcement – Configuration in volume files would block scalability – GFID accesses could cause incorrect accounting – Only hard quota supported ● In 3.5 – Server side enforcement – Better configuration management for scalability. – GFID to path conversion enables correct accounting. – Both hard and soft quotas supported
  • 51. Prominent Features beyond GlusterFS 3.5 ● Volume snapshots ● New Style Replication ● pNFS access with NFS Ganesha ● Data tiering / HSM ● Multi master geo-replication ● Support Btrfs features ● Caching improvements ● libgfchangelog ● and more...
  • 52. Challenges ● Scalability – 1024 nodes, 72 brontobytes? ● Hard links ● Rename ● Monolithic tools ● Monitoring ● Reduce Capex and Opex
  • 53. Resources Mailing lists: gluster-users@gluster.org gluster-devel@nongnu.org IRC: #gluster and #gluster-dev on freenode Links: http://www.gluster.org http://hekafs.org http://forge.gluster.org http://www.gluster.org/community/documentation/index.php/Arch
  • 54. Thank you! Vijay Bellur vbellur at redhat dot com