Gustavo Brand
Adrien Lèbre
École des Mines / Scalus EU Project
Nantes – France
Nov 2012
On-demand File Caching
with GlusterFS
Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012
Agenda
• This is an entry-level development talk
• Different approach using Gluster components
• Problem
• Investigation experiments
• Where Gluster fits in
• Current and Future scenarios
• Fops overview
• Gluster observations and some issues
• Examples
2
Problem statement
• Problems:
• Is it still relevant to use additional nodes to analyze
data faster?
• How are applications that access data through DFSs
at LAN/WAN levels impacted?
• Should locality be considered as a major concern to
design a DFS?
• In summary: validate the assumption about the impact
that metadata/data traffic and latency have on wide-area
scenarios when executing parallel applications.
3
Investigation through
experimentation
• Investigation method:
• We conducted experiments with:
• HDFS and Lustre
• Different placements of clients (C), data (D) and metadata (M), within LANs ("-") and across WANs ("/"):
• CD-M, C-D-M
• CD/M, C-D/M, CD-M/CD, C-D-M/C, C/D-M
• Running 3 M/R applications (grep, writer, sort)
• Fixed 8GB file size with 16, 32, 64 nodes
• Scaled the file size 8-128GB with 64 nodes
• Main metric measured: runtime (among others)
4
Invest. findings
• Exploring bigger file sizes with the best and
worst scenarios: Sort application
5
Invest. findings
• In most cases, accessing data through WAN
leads to worse performance (as expected…)
• It was better to use fewer nodes than trying
to benefit from external WAN ones
• The completion time for the local scenarios
with 16 nodes is similar to the WAN ones
using 2x or 4x more nodes
6
Current WiP - GBFS
• Use implicit striping (propagation according
to the access pattern)
• Avoiding unsolicited traffic through local
placement
• Extending it group-wide (stripe into the group)
• Using the groups to manage which data is stored
locally (WAN, LAN, local node... you name it)
• Having persistent and volatile LRU caching
per group.
7
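The per-group LRU caching idea above can be sketched as a bounded, least-recently-used map, one per group. This is a minimal Python model of the concept only; the class, names and eviction details are illustrative, not GBFS code:

```python
from collections import OrderedDict

class GroupLRUCache:
    """Toy per-group LRU cache: each group keeps a bounded cache of
    file ranges; the least recently used entry is evicted when the
    group's capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (path, offset) -> cached data

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, data):
        self.entries[key] = data
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry

# one cache per group (group names are illustrative)
caches = {"A": GroupLRUCache(2), "B": GroupLRUCache(2)}
caches["B"].put(("f", 0), b"x")
caches["B"].put(("f", 4096), b"y")
caches["B"].get(("f", 0))           # touch -> most recently used
caches["B"].put(("f", 8192), b"z")  # capacity exceeded -> evicts ("f", 4096)
```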
What about Gluster?
• Using some of its translators:
• Posix + Server translators as backend
• GBFS + Client translators at the “client” side.
• Peer-to-peer layout, with each node running a
server (+brick) and client(s).
• At GBFS, additional info about the other subvols
is provided at start time as volume “options”
(i.e. which group each one belongs to).
8
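Passing group membership as volume options might look like the client-side volfile sketch below. The `cluster/gbfs` type and the `groups` option are hypothetical illustrations of the idea; only `protocol/client`, `remote-host`, `remote-subvolume` and `subvolumes` are standard volfile vocabulary:

```
volume brick-a
    type protocol/client
    option remote-host node-a
    option remote-subvolume /mnt/lustre-A
end-volume

volume brick-b
    type protocol/client
    option remote-host node-b
    option remote-subvolume /mnt/lustre-B
end-volume

volume gbfs
    type cluster/gbfs            # hypothetical translator name
    option groups brick-a:A,brick-b:B   # group each subvol belongs to
    subvolumes brick-a brick-b
end-volume
```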
What about Gluster?
• Advantages: easy access to other
nodes through the subvolumes
(masking RPC comms)
• Easy installation over FSs/some DFSs
• xattr size limits can be problematic
• The translator approach enables good
functionality granularity.
9
Scenario - phase 1
10
• Local DFSs at each side
WAN
1 2 3 4 5 6
I/O Server 01 I/O Server 02
Group A (with an underlying DFS)
MDS
(Metadata)
All clients using /mnt/lustre-A
Group B
All clients using remote
lustre-A + local lustre-B
I/O Server 01 I/O Server 02
MDS
(Metadata)
Subvol GA
/mnt/lustre-A
Subvol GB
/mnt/lustre-B
Compute Nodes with shared storage
Compute Nodes
Root
Node
1st read from B
11
User at B
(FUSE)
GBFS
Brick @A Brick @B
Subvol 1: Brick A
Subvol 2: Brick B
Checks extended
metadata at A:
Range of bytes
available at my group?
READ from group B
Read from A
Write locally at B
Write metadata at A
2nd read from B
12
User at B
(FUSE)
GBFS
Brick @A Brick @B
Subvol 1: Brick A
Subvol 2: Brick B
Checks extended
metadata at A:
Range of bytes
available at my group?
READ from group B
Read at B
No metadata
update at A
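The two read flows above (first read: check metadata at A, fetch from A, cache at B, record the range at A; second read: serve from B with no metadata update) can be modeled in a few lines. The data structures are a toy model, not the translator's actual extended-attribute layout:

```python
# metadata at A: per file, which byte ranges are already cached at group B
meta_at_A = {}      # path -> list of (offset, size) ranges available at B
data_at_A = {"f": bytes(range(256)) * 16}   # authoritative copy at A
data_at_B = {}      # local cache at B, keyed by (path, offset)

def range_cached(path, offset, size):
    """Check the extended metadata at A: is this range in my group?"""
    return any(o <= offset and offset + size <= o + s
               for (o, s) in meta_at_A.get(path, []))

def read_from_B(path, offset, size):
    if range_cached(path, offset, size):
        # 2nd read: serve locally from B, no metadata update at A
        return data_at_B[(path, offset)]
    # 1st read: read from A, write locally at B, write metadata at A
    chunk = data_at_A[path][offset:offset + size]
    data_at_B[(path, offset)] = chunk
    meta_at_A.setdefault(path, []).append((offset, size))
    return chunk

first = read_from_B("f", 0, 128)    # miss: fetched from A, cached at B
second = read_from_B("f", 0, 128)   # hit: served from B
```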
Write
• Using same GFID, with sparse files
• Testing different algorithms/versions:
• 1st: write centralized
+ overwrite metadata on write at A
• 2nd: write locally
+ overwrite local metadata on write
+ validate local metadata on read
• 3rd: write locally
+ invalidate metadata on write
• Going from centralized to distributed.
13
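The 3rd algorithm (write locally, invalidate the metadata on write) can be sketched as follows: the written range lands at B, and any overlapping entry in the extended metadata at A is replaced by one pointing at B. Again a toy model; the real metadata format and ownership encoding are not shown in the slides:

```python
# path -> list of (offset, size, owner group) in the extended metadata at A
meta_at_A = {"f": [(0, 128, "A")]}
data_at_B = {}

def write_from_B(path, offset, data):
    data_at_B[(path, offset)] = data        # write range locally at B
    ranges = meta_at_A.setdefault(path, [])
    # invalidate metadata entries overlapping the written range...
    ranges[:] = [r for r in ranges
                 if r[0] + r[1] <= offset or offset + len(data) <= r[0]]
    # ...and update the ext. metadata range to point at B
    ranges.append((offset, len(data), "B"))

write_from_B("f", 0, b"a" * 128)
```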
Write example
14
User at B
(FUSE)
GBFS
Brick @A Brick @B
Overwrite ext. meta
at A.
Write data at A.
WRITE from
group B
User at B
(FUSE)
GBFS
Brick @A Brick @B
WRITE from
Group B
Write range at B
Write metadata at B
Update ext. meta
range pointing to B
1st simple approach
Central write
3rd approach, local write
Invalidate on write
Scenario - phase 2
15
• Local storage at each node (at B)
WAN
1 2 3 4 5 6
I/O Server 01 I/O Server 02
Group A (with an underlying DFS)
MDS
(Metadata)
All clients using /mnt/lustre-A
7 8
Group B
All clients using remote
/mnt/lustre-A + local storage
Sub volume 1
/mnt/lustre
Sub volume 2
/mnt/local8
Root
Node
Compute Nodes Compute Nodes with local storage
Bigger group shot
16
• Multiple groups
WAN
1 2 3
4 5 6
I/O Server 01 I/O Server 02
Group A (with an underlying DFS)
MDS
(Metadata)
Base cluster
All clients using local lustre-A
+ remote storages
7 8
Group B All clients using local storages
+ remote groups
Subvol GA
/mnt/lustre-A
Subvol localB8
/mnt/localB8
9 10 11
Group C
Local DFS
Subvol localB7
/mnt/localB7
Subvol localB6
/mnt/localB6
...
Subvol GA
/mnt/lustre-A
Subvol localB8
/mnt/localB8
Subvol localB7
/mnt/localB7
Subvol GC
/mnt/DFS-C
...
Subvol GC
/mnt/DFS-C
All clients using remote
lustre-A + local Bs + local DFS
Nodes addition on
demand - concept
• Having an elastic addition and capacity
17
WAN
1 2 3
6
I/O Server 01 I/O Server 02
Group A
MDS
(Metadata)
Add local
compute nodes
Group B
8 9 10
Group C
DFS X
4 5
7
12 13
Group D
Group N
DFS Y
Add local
storage nodes
Add a remote DFS to expand resources
Add a remote DFS to expand computing power
And so on…
Gluster observations
• More dev documentation would be helpful :)
• “Translator 101” and “xlator_api_2” posts are
very good
• Main resources are Jeff Darcy’s blog entries,
the gluster-devel list and IRC.
• Few comments all over the code
(as of 3.3 at least)
• Docs on the overall architecture of Gluster’s communication
protocols and client/server/brick algorithms would also help.
18
Gluster observations
• More detail on complex translators like DHT and AFR
(routines, schemes, code comments) would be
helpful as well.
• Ex: an overall view of the algorithm each one
uses to keep a unified namespace.
• Details on common development techniques can save
contributors/new developers a lot of time.
19
Common techniques
• fd_ctx and inode_ctx common usage;
• general use of local / private / *_ctx per
translator;
• striping techniques used in the current code;
• stubs/sync calls (pending at xlator_api_2 :) )
• etc...
20
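The fd_ctx / inode_ctx usage mentioned above boils down to a per-(fd, translator) private store: each translator attaches its own context value to an fd or inode without stepping on other translators. A toy Python model of the pattern (Gluster's real `fd_ctx_set`/`fd_ctx_get` are C functions with different signatures):

```python
# (fd, translator) -> private context; each translator keys its own slot
fd_ctx = {}

def fd_ctx_set(fd, xlator, value):
    """Attach a translator-private value to an fd."""
    fd_ctx[(fd, xlator)] = value

def fd_ctx_get(fd, xlator):
    """Retrieve the translator-private value, or None if unset."""
    return fd_ctx.get((fd, xlator))

# two translators store independent contexts against the same fd
fd_ctx_set(3, "gbfs", {"cached_ranges": [(0, 4096)]})
fd_ctx_set(3, "io-cache", {"pages": []})
```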
Common techniques
Example taken from the gluster-devel list
21
--> WIND / UNWIND flow:
read() {
    while (want more) {
        WIND _lookup()
    }
}
_lookup_cbk() {
    wind _open()
}
_read_lookup_cbk_open_cbk() {
    wind _read()
    add to priv/fd_ctx/local->iovec
}
_read_lookup_cbk_open_cbk_read_cbk() {
    wind _close()
    UNWIND
}

--> Syncop flow:
read() {
    // pull in other content
    while (want more) {
        syncop_lookup()
        syncop_open()
        syncop_read()
        syncop_close()
    }
    return iovec / UNWIND
}
• Thanks
Contact:
gustavo.bervian_brand@inria.fr
gbrand at gluster-devel IRC :)
22

Contenu connexe

Tendances

Tendances (20)

Developing apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapiDeveloping apps and_integrating_with_gluster_fs_-_libgfapi
Developing apps and_integrating_with_gluster_fs_-_libgfapi
 
YDAL Barcelona
YDAL BarcelonaYDAL Barcelona
YDAL Barcelona
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based...
Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based...Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based...
Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based...
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelona
 
20160401 Gluster-roadmap
20160401 Gluster-roadmap20160401 Gluster-roadmap
20160401 Gluster-roadmap
 
Gluster wireshark niels_de_vos
Gluster wireshark niels_de_vosGluster wireshark niels_de_vos
Gluster wireshark niels_de_vos
 
Smb gluster devmar2013
Smb gluster devmar2013Smb gluster devmar2013
Smb gluster devmar2013
 
Scale out backups-with_bareos_and_gluster
Scale out backups-with_bareos_and_glusterScale out backups-with_bareos_and_gluster
Scale out backups-with_bareos_and_gluster
 
Lisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-onLisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-on
 
Gluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmapGluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmap
 
Leases and-caching final
Leases and-caching finalLeases and-caching final
Leases and-caching final
 
Dedupe nmamit
Dedupe nmamitDedupe nmamit
Dedupe nmamit
 
Join the super_colony_-_feb2013
Join the super_colony_-_feb2013Join the super_colony_-_feb2013
Join the super_colony_-_feb2013
 
GlusterFs Architecture & Roadmap - LinuxCon EU 2013
GlusterFs Architecture & Roadmap - LinuxCon EU 2013GlusterFs Architecture & Roadmap - LinuxCon EU 2013
GlusterFs Architecture & Roadmap - LinuxCon EU 2013
 
Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015Gluster overview & future directions vault 2015
Gluster overview & future directions vault 2015
 
State of the_gluster_-_lceu
State of the_gluster_-_lceuState of the_gluster_-_lceu
State of the_gluster_-_lceu
 
20160130 Gluster-roadmap
20160130 Gluster-roadmap20160130 Gluster-roadmap
20160130 Gluster-roadmap
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
 
The Future of GlusterFS and Gluster.org
The Future of GlusterFS and Gluster.orgThe Future of GlusterFS and Gluster.org
The Future of GlusterFS and Gluster.org
 

En vedette

En vedette (16)

Accessing gluster ufo_-_eco_willson
Accessing gluster ufo_-_eco_willsonAccessing gluster ufo_-_eco_willson
Accessing gluster ufo_-_eco_willson
 
Qemu gluster fs
Qemu gluster fsQemu gluster fs
Qemu gluster fs
 
Gluster for sysadmins
Gluster for sysadminsGluster for sysadmins
Gluster for sysadmins
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff Darcy
 
Gluster as Block Store in Containers
Gluster as Block Store in ContainersGluster as Block Store in Containers
Gluster as Block Store in Containers
 
Lcna example-2012
Lcna example-2012Lcna example-2012
Lcna example-2012
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016
 
Gdeploy 2.0
Gdeploy 2.0Gdeploy 2.0
Gdeploy 2.0
 
Bug triage in_gluster
Bug triage in_glusterBug triage in_gluster
Bug triage in_gluster
 
Debugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vosDebugging with-wireshark-niels-de-vos
Debugging with-wireshark-niels-de-vos
 
Gsummit apis-2013
Gsummit apis-2013Gsummit apis-2013
Gsummit apis-2013
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
Integrating gluster fs,_qemu_and_ovirt-vijay_bellur-linuxcon_eu_2013
 
Introduction to Open Source
Introduction to Open SourceIntroduction to Open Source
Introduction to Open Source
 
Gluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephant
 
GlusterFS Containers
GlusterFS ContainersGlusterFS Containers
GlusterFS Containers
 

Similaire à On demand file-caching_-_gustavo_brand

vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28
vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28
vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28
CloudStack - Open Source Cloud Computing Project
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Safe Software
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Safe Software
 
Progress on adapting BlobSeer to WAN scale
Progress on adapting BlobSeer to WAN scaleProgress on adapting BlobSeer to WAN scale
Progress on adapting BlobSeer to WAN scale
Viet-Trung TRAN
 

Similaire à On demand file-caching_-_gustavo_brand (20)

GlusterFS And Big Data
GlusterFS And Big DataGlusterFS And Big Data
GlusterFS And Big Data
 
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
 
Strategies for Context Data Persistence
Strategies for Context Data PersistenceStrategies for Context Data Persistence
Strategies for Context Data Persistence
 
State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016
 
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
 
vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28
vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28
vBACD - Distributed Petabyte-Scale Cloud Storage with GlusterFS - 2/28
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Gluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmapGluster fs current_features_and_roadmap
Gluster fs current_features_and_roadmap
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Progress on adapting BlobSeer to WAN scale
Progress on adapting BlobSeer to WAN scaleProgress on adapting BlobSeer to WAN scale
Progress on adapting BlobSeer to WAN scale
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlv
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
 
Extending twitter's data platform to google cloud
Extending twitter's data platform to google cloud Extending twitter's data platform to google cloud
Extending twitter's data platform to google cloud
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
Hadoop Fundamentals
Hadoop FundamentalsHadoop Fundamentals
Hadoop Fundamentals
 
Hadoop fundamentals
Hadoop fundamentalsHadoop fundamentals
Hadoop fundamentals
 
Celi @Codemotion 2014 - Roberto Franchini GlusterFS
Celi @Codemotion 2014 - Roberto Franchini GlusterFSCeli @Codemotion 2014 - Roberto Franchini GlusterFS
Celi @Codemotion 2014 - Roberto Franchini GlusterFS
 
FIWARE Wednesday Webinars - Strategies for Context Data Persistence
FIWARE Wednesday Webinars - Strategies for Context Data PersistenceFIWARE Wednesday Webinars - Strategies for Context Data Persistence
FIWARE Wednesday Webinars - Strategies for Context Data Persistence
 

Plus de Gluster.org

nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
Gluster.org
 
Facebook’s upstream approach to GlusterFS - David Hasson
Facebook’s upstream approach to GlusterFS  - David HassonFacebook’s upstream approach to GlusterFS  - David Hasson
Facebook’s upstream approach to GlusterFS - David Hasson
Gluster.org
 

Plus de Gluster.org (20)

Automating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas SiravaraAutomating Gluster @ Facebook - Shreyas Siravara
Automating Gluster @ Facebook - Shreyas Siravara
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
 
Facebook’s upstream approach to GlusterFS - David Hasson
Facebook’s upstream approach to GlusterFS  - David HassonFacebook’s upstream approach to GlusterFS  - David Hasson
Facebook’s upstream approach to GlusterFS - David Hasson
 
Throttling Traffic at Facebook Scale
Throttling Traffic at Facebook ScaleThrottling Traffic at Facebook Scale
Throttling Traffic at Facebook Scale
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...
 
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
Up and Running with Glusto & Glusto-Tests in 5 Minutes (or less)
 
Data Reduction for Gluster with VDO
Data Reduction for Gluster with VDOData Reduction for Gluster with VDO
Data Reduction for Gluster with VDO
 
Releases: What are contributors responsible for
Releases: What are contributors responsible forReleases: What are contributors responsible for
Releases: What are contributors responsible for
 
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar RanganathanRIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan
 
Gluster and Kubernetes
Gluster and KubernetesGluster and Kubernetes
Gluster and Kubernetes
 
Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!Native Clients, more the merrier with GFProxy!
Native Clients, more the merrier with GFProxy!
 
Gluster: a SWOT Analysis
Gluster: a SWOT Analysis Gluster: a SWOT Analysis
Gluster: a SWOT Analysis
 
GlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal MadappaGlusterD-2.0: What's Happening? - Kaushal Madappa
GlusterD-2.0: What's Happening? - Kaushal Madappa
 
Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6Scalability and Performance of CNS 3.6
Scalability and Performance of CNS 3.6
 
What Makes Us Fail
What Makes Us FailWhat Makes Us Fail
What Makes Us Fail
 
Gluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and futureGluster as Native Storage for Containers - past, present and future
Gluster as Native Storage for Containers - past, present and future
 
Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2Heketi Functionality into Glusterd2
Heketi Functionality into Glusterd2
 
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...Architecture of the High Availability Solution for Ganesha and Samba with Kal...
Architecture of the High Availability Solution for Ganesha and Samba with Kal...
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 

Dernier

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

On demand file-caching_-_gustavo_brand

  • 1. Gustavo Brand Adrien Lèbre École des Mines / Scalus EU Project Nantes – France Nov 2012 On-demand File Caching with GlusterFS
  • 2. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 Agenda • This is an entry level Development talk • Different approach using Gluster components • Problem • Investigation experiments • Where Gluster fits in • Current and Future scenarios • Fops overview • Gluster observations and some issues • Examples 2
  • 3. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 Problem statement • Problems: • Is it still relevant to use additional nodes to analyze data faster? • How applications accessing data through the DFSs at LAN/WAN levels are impacted? • Should locality be considered as a major concern to design a DFS? • In summary: Validate the assumption about the impact that metadata/data traffic and latency have into wide area scenarios when executing parallel applications. 3
  • 4. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 Investigation through experimentation • Investigation method: • We conducted experiments with: • HDFS and Lustre • Different scenarios within LANs(-) and WANs(/): • CD-M, C-D-M • CD/M, C-D/M, CD-M/CD, C-D-M/C, C/D-M • Running 3 M/R applications (grep, writer, sort) • Fixed 8GB file size with 16, 32, 64 nodes • Scaled the file size 8-128GB with 64 nodes • Main reference measured: runtime, among others 4
  • 5. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 Invest. findings • Exploring bigger file sizes with the best and worst scenarios: Sort application 5
  • 6. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 Invest. findings • In most cases, accessing data through WAN leads to worse performance (as expected…) • It was better to use less nodes than trying to benefit from external WAN ones • The completion time for the local scenarios with 16 nodes is similar to the WAN ones using 2x or 4x more nodes 6
  • 7. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 Current WiP - GBFS • Use implicit striping (propagation according to the access pattern) • Avoiding unsolicited traffic through local placement • Extending it group-wide (stripe into the group) • Using the groups to manage which data is stored locally (WAN, LAN, local node... you name it) • Having a persistent and volatile LRU persistent caching per group. 7
  • 8. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 What about Gluster? • Using some of its translators: • Posix + Server translators as backend • GBFS + Client translators at the “client” side. • Peer to peer way, with each node having a server(+brick) and client(s). • At GBFS, additional info is provided about other subvols at start time as volume “options” (i.e.: groups each one is related to). 8
  • 9. Gustavo Brand Building a Group Based DFS proof of concept using Gluster / Nov 2012 What about Gluster? • Advantages: easy access to other nodes through the subvolumes (masking RPC comms) • Easy installation over FSs/some DFSs • xattr size limits can be problematic • Translators approach enables a good functionality granularity. 9
Scenario - phase 1
• Local DFSs at each side, linked over the WAN
• Group A (with an underlying DFS): compute nodes 1-6, an MDS (metadata) and I/O servers 01-02; all clients use /mnt/lustre-A
• Group B: compute nodes with shared storage, with their own MDS and I/O servers 01-02; all clients use lustre-A remotely plus the local lustre-B
• Root node subvolumes: GA (/mnt/lustre-A) and GB (/mnt/lustre-B)
1st read from B
• A user at B (through FUSE) issues a read to GBFS, which has subvol 1 (brick A) and subvol 2 (brick B)
• GBFS checks the extended metadata at A: is the requested range of bytes available in my group?
• It is not, so GBFS reads from A, writes the data locally at B, and writes the metadata at A
2nd read from B
• GBFS again checks the extended metadata at A: is the requested range of bytes available in my group?
• This time it is: the read is served at B, with no metadata update at A
Write
• Using the same GFID, with sparse files
• Testing different algorithms/versions:
  • 1st: write centralized + overwrite metadata on write at A
  • 2nd: write locally + overwrite local metadata on write + validate local metadata on read
  • 3rd: write locally + invalidate metadata on write
• Going from centralized to distributed
Write example
• 1st (simple) approach, centralized write: on a write from group B, the user at B (through FUSE and GBFS) overwrites the extended metadata at A and writes the data at A
• 3rd approach, local write with invalidation on write: on a write from group B, GBFS writes the range at B, writes the metadata at B, and updates the extended metadata range at A to point to B
Scenario - phase 2
• Local storage at each node (at B)
• Group A (with an underlying DFS): compute nodes 1-6, MDS (metadata) and I/O servers 01-02; all clients use /mnt/lustre-A
• Group B: compute nodes 7-8 with local storage; all clients use /mnt/lustre-A remotely plus their local storage
• Root node subvolumes: subvolume 1 (/mnt/lustre) and subvolume 2 (/mnt/local8)
Bigger group shot
• Multiple groups
• Group A (base cluster, with an underlying DFS): compute nodes 1-6, MDS (metadata) and I/O servers 01-02; all clients use the local lustre-A plus the remote storages
• Group B: compute nodes 7-8; all clients use their local storages plus the remote groups
• Group C (local DFS): nodes 9-11; all clients use lustre-A remotely plus the local Bs plus the local DFS
• Each node combines the corresponding subvolumes, e.g. GA (/mnt/lustre-A), localB6 (/mnt/localB6), localB7 (/mnt/localB7), localB8 (/mnt/localB8), ... and GC (/mnt/DFS-C)
Nodes addition on demand - concept
• Having an elastic addition of nodes and capacity:
  • Add local compute nodes
  • Add local storage nodes
  • Add a remote DFS to expand resources
  • Add a remote DFS to expand computing power
  • And so on... (groups A, B, C, D ... N, over DFS X, DFS Y, etc.)
Gluster observations
• More dev documentation would be helpful :)
• The "Translator 101" and "xlator_api_2" posts are very good
• The main resources are Jeff Darcy's blog entries, the gluster-devel list and the IRC channel
• Few comments throughout the code (as of 3.3 at least)
• The overall architecture of the Gluster communication protocols, the client/server/brick algorithms, etc. would also deserve documentation
Gluster observations
• For complex translators like DHT and AFR, details of the routines and schemes, plus comments, would be helpful as well
• E.g. an overall view of the algorithm each one uses to keep a unified namespace
• Detailing common development techniques can save contributors/new developers a lot of time
Common techniques
• fd_ctx and inode_ctx common usage
• General use of local / private / *_ctx per translator
• Striping techniques used in the current code
• Stubs / sync calls (pending at xlator_api_2 :) )
• etc...
Common techniques
Example taken from the gluster-devel list:

--> WIND / UNWIND flow:

    read() {
        while (want more) {
            WIND _lookup()
        }
    }
    _lookup_cbk() {
        wind _open()
    }
    _read_lookup_cbk_open_cbk() {
        wind _read()
        add to priv/fd_ctx/local->iovec
    }
    _read_lookup_cbk_open_cbk_read_cbk() {
        wind _close()
        UNWIND
    }

--> Syncop flow:

    read() {
        // pull in other content
        while (want more) {
            syncop_lookup()
            syncop_open()
            syncop_read()
            syncop_close()
        }
        return iovec / UNWIND
    }
Thanks!
Contact: gustavo.bervian_brand@inria.fr
gbrand at the gluster-devel IRC :)