Introduction to Spectrum Scale Active File Management (AFM) and Its Use Cases
Spectrum Scale Protocols - Unified File & Object Access (UFO) Feature Details
AFM + Object: Unique WAN Caching for Object Store
2. #ibmedge
Agenda
• Introduction to Spectrum Scale Active File Management (AFM)
• AFM Use Cases
• Spectrum Scale Protocol
• Unified File & Object Access (UFO) Feature Details
• AFM + Object: Unique WAN Caching for Object Store
• Deep Dive on Single Site & Multi-site Caching
• Configuration Commands with Demo
• Q & A
5. #ibmedge
AFM Overview
• Active File Management (AFM) uses a home-and-cache model in which a single home provides
  the primary storage of data, and exported data is cached in a local GPFS™ file system
• AFM is primarily suited for remote caching
• Users access files from the cache system
  – For read requests, if a file is not yet cached, AFM retrieves it from the home site
  – For write requests, writes are allowed on the cache system and are pushed back to the
    home system, depending on the caching mode (see the configuration sketch below)
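As a rough illustration of the home-and-cache relationship, the commands below sketch how a cache fileset might be linked to an NFS-exported path at home. The file system, fileset, path, and host names are placeholders and exact options can vary by release; treat this as a minimal sketch rather than the full procedure.

  # Home cluster: prepare the exported path for AFM (it must also be exported over NFS)
  mmafmconfig enable /gpfs/homefs/data

  # Cache cluster: create an AFM fileset that targets the home export, then link it
  mmcrfileset cachefs dataCache --inode-space=new \
      -p afmTarget=homenode:/gpfs/homefs/data -p afmMode=single-writer
  mmlinkfileset cachefs dataCache -J /gpfs/cachefs/dataCache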
6. #ibmedge
AFM Caching Overview
[Diagram: a home cluster and a cache cluster, each a Spectrum Scale storage array with storage nodes. At home, nodes are made NFS servers; at the cache, a few nodes are made gateway nodes, and cache filesets are associated with the NFS exports at home.]
7. #ibmedge
Global Sharing with Spectrum Scale AFM
• Expands the GPFS global namespace across geographical distances
– Caches local ‘copies’ of data distributed to one or more GPFS clusters
– Low latency ‘local’ read and write performance
– Automated namespace management
– As data is written or modified at one location, all other locations see that same data
• Efficient data transfers over wide area networks (WAN)
  – Works with unreliable, high-latency connections
• Speeds data access to collaborators and resources around the world
[Diagram: three geographically distributed GPFS clusters sharing one global namespace]
8. #ibmedge
AFM Caching Basics
• Sites – the two sides of a cache relationship
• A single home cluster
  – Presents a fileset that can be cached (exported with NFS)
  – Can be a non-GPFS cluster/nodes
• One or more cache clusters
  – Associate a local fileset with the home export
• AFM fileset
  – Independent fileset with per-inode AFM state kept in extended attributes (xattrs)
  – Data is fetched into the fileset on access (or prefetched on command)
  – Data written to the fileset is copied back to home
• Gateway node (designation; see the command sketch after this list)
  – Maintains an in-memory queue of pending operations
  – Moves data between the cache and home clusters
  – Monitors connectivity to home, switches to disconnected mode on an outage, and triggers recovery on failure
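A minimal sketch of the gateway designation and on-command prefetch mentioned above, assuming a cache file system named cachefs and an AFM fileset named dataCache (all names are placeholders; option syntax can vary by release):

  # Designate nodes on the cache cluster as AFM gateway nodes
  mmchnode --gateway -N gwnode1,gwnode2

  # Prefetch a list of files into the cache fileset ahead of application access
  mmafmctl cachefs prefetch -j dataCache --list-file /tmp/files-to-prefetch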
9. #ibmedge
Spectrum Scale AFM Use Cases
• Global Namespace
  – Provides a common namespace across a globally distributed cloud
  – Persistent, scalable cache for a remote file system
• Content Distribution
  – Central site is where data is created and maintained
  – Branch/edge sites can periodically pre-fetch or pull on demand
• Content Consolidation
  – Branch offices work on local active data
  – Master repository maintained centrally
  – Advanced functions (backup, etc.) run on the central site
• Disaster Recovery
  – Replication of data across the WAN with consistency points
  – Failover and failback support
11. #ibmedge
Enhanced Protocol Support from the 4.1.1 Release
The Challenge: How can I share my storage infrastructure across all of my legacy and new
generation applications?
The Solution:
• The new IBM Spectrum Scale protocol node allows access to data stored in a Spectrum
  Scale file system using additional access methods and protocols.
• The protocol node functions are clustered and can support transparent failover for the
  NFS and Swift protocols as well as the SMB protocol.
• Multiprotocol data access from other systems using the following protocols:
  • NFS v3 and v4
  • SMB 2 and SMB 3.0 mandatory features / CIFS for Windows support
  • OpenStack Swift and S3 API support for object storage
12. #ibmedge
Adding Protocol Support
[Diagram: users reach the protocol nodes (PN1 … PNn) over an external TCP/IP or InfiniBand network via NFS, SMB/CIFS, POSIX and OpenStack Swift, while administrators use the command line interface. The protocol nodes, management nodes (authentication services, Keystone, OpenStack Cinder) and NSD servers (NSD1 … NSDn, network shared disks) communicate over the Spectrum Scale cluster TCP/IP or InfiniBand network, backed by physical storage – flash, disk, tape – or an Elastic Storage Server.]
13. #ibmedge
IBM Spectrum Scale Benefits
Better performance
  • Eliminate hotspots with massively parallel access to files
  • Sequential I/O with ESS greater than 400 GB/s
  • Throughput advantage for parallel streaming workloads, e.g. technical computing and analytics
  • More storage. More files. Hyper scale.
Simplified management
  • Easier management with one global namespace instead of managing islands of NAS arrays,
    e.g. no need to copy data between compute clusters
  • Integrated policy-driven automation
  • Fewer storage administrators required
Lower cost
  • Optimizes storage tiers including flash, disk and tape
  • Increased efficiency and more efficient provisioning due to parallelization and striping technology
  • Remove duplicate copies of data, e.g. run analytics on one copy of data without having
    to set up a separate silo
14. #ibmedge
IBM Spectrum Scale – Protocol Integration
• Software offering – protocol support is added to GPFS
  • Can be configured on existing GPFS clusters or a new cluster
• Support for Intel and Power Systems
• RHEL 7/7.1
  – Protocol node requirement
  – Remaining GPFS nodes can have any supported environment/platform
  – Use of the installation toolkit is also limited to RHEL 7/7.1
• Adds support for the following protocols:
  • SMB
  • NFS
  • Object (HTTP REST)
• Some cluster nodes are designated as “Protocol Nodes” (a.k.a. CES nodes); see the command sketch below
  • Integrated management of the protocol services
  • Active-active clustering
  • High availability through IP failover
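A hedged sketch of designating CES (protocol) nodes and enabling services. The node names and addresses are placeholders, exact options can vary by release, and the object service additionally needs the mmobj-based configuration described in the documentation:

  # Designate nodes as CES (protocol) nodes
  mmchnode --ces-enable -N prnode1,prnode2

  # Add the export IP addresses that fail over between protocol nodes
  mmces address add --ces-ip 192.0.2.10,192.0.2.11

  # Enable the desired protocol services
  mmces service enable NFS
  mmces service enable SMB
  mmces service enable OBJ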
16. #ibmedge
Protocol Support Considerations
• Adding protocol nodes to a GPFS cluster:
  • All RHEL 7 x86 servers or all RHEL 7 Power servers
  • Not NSD servers
  • Protocol export IPs are distributed among the protocol nodes
    – Different policies for balancing and failback
• Management: GUI and CLI
• Deployment: easy, automated deployment
• Flexibility: customer choice of nodes/disks/storage options
• Scale: limits for capacity/performance based on GPFS
  • CES node limits based on the protocols enabled
    • 16 nodes, 3,000 connections/node and 20K connections/cluster for SMB
    • 32 nodes for NFS only, Object only, or NFS+Object
• Security: root access for cluster management, but sudo access is supported
• Roll your own or combine with Lab Services to meet expectations
18. #ibmedge
Spectrum Scale Object Storage
• Basic support added in the 4.1.1 release & enhanced in the 4.2 and 4.2.1 releases
• Based on OpenStack Swift (Juno release)
• REST-based data access
  • Growing number of clients due to an extremely simple protocol
  • Applications can easily save & access data from anywhere using HTTP
  • Simple set of atomic operations (see the request sketch below):
    – PUT (upload)
    – POST (update metadata)
    – GET (download)
    – DELETE
• Amazon S3 protocol support
• High availability with CES integration
• Simple and automated installation process
• Integrated authentication (Keystone) support
• Native GPFS command line interface to manage the Object service (mmobj command)
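For illustration, the atomic operations above map directly to HTTP requests against the Swift API. The endpoint, account, container, object name and token below are placeholders (a minimal sketch, not a complete Swift client flow):

  # Upload (PUT) an object
  curl -X PUT -T report.csv -H "X-Auth-Token: $TOKEN" \
      http://ces-ip:8080/v1/AUTH_acct/mycontainer/report.csv

  # Download (GET) the object
  curl -X GET -H "X-Auth-Token: $TOKEN" \
      http://ces-ip:8080/v1/AUTH_acct/mycontainer/report.csv -o report.csv

  # Update object metadata (POST)
  curl -X POST -H "X-Auth-Token: $TOKEN" -H "X-Object-Meta-Owner: analytics" \
      http://ces-ip:8080/v1/AUTH_acct/mycontainer/report.csv

  # Delete the object
  curl -X DELETE -H "X-Auth-Token: $TOKEN" \
      http://ces-ip:8080/v1/AUTH_acct/mycontainer/report.csv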
19. #ibmedge
Spectrum Scale Object Storage – Additional Features
• Unified file and object support with Hadoop connectors
• Support for Encryption
• Support for Compression
• Only Object Store with Tape support for Backup
• Object store with integrated transparent cloud tiering Support
• Multi Region support
• AD/LDAP support for authentication
• ILM support for Object
  • Movement of objects across storage tiers based on access heat
• Spectrum Scale Object with IBM DeepFlash becomes an object store on an all-flash array for newer, faster workloads
• Spectrum Scale Object with WAN caching support (AFM)
21. #ibmedge
Unified File and Object (UFO Support)
Spectrum Scale: Redefining Unified Storage
• Challenge
  The world is not converged – file, object and HDFS all coexist today –
  and it never will be completely…
• Unified Scale-out Content Repository
• File or object in. Object or file out.
• Integrated big data analytics support
• Native protocol support
• High performance that scales
• Single Management Plane
[Diagram: Spectrum Scale exposes NFS, SMB, POSIX, Swift/S3 and HDFS access over SSD, fast disk, slow disk and tape tiers]
22. #ibmedge
What is Unified File and Object Access?
• Accessing objects using file interfaces (SMB/NFS/POSIX) and accessing files using object
  interfaces (REST) helps legacy applications designed for files integrate seamlessly into
  the object world.
• It allows object data to be accessed using applications designed to process files, and it
  allows file data to be published as objects.
• Multiprotocol access to file and object data in the same namespace (with common user ID
  management) supports hosting data oceans of different data types with multiple access
  options (see the access sketch after the figure below).
• Optimizes various use cases and solution architectures, resulting in better efficiency as
  well as cost savings.
[Diagram: a clustered file system running Swift (with Swift-on-File). (1) Object access over HTTP and (2) NFS/SMB/POSIX access reach the same container/fileset; (3) data ingested as objects can be accessed as files, and (4) data ingested as files can be accessed as objects.]
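A rough illustration of the dual access shown above. The endpoint, token, file system, fileset, container and object names are placeholders, and the actual on-disk directory layout under the object fileset depends on the configured file-access storage policy, so the paths here are indicative only:

  # Ingest data as an object over the Swift interface...
  curl -X PUT -T results.csv -H "X-Auth-Token: $TOKEN" \
      http://ces-ip:8080/v1/AUTH_project/experiments/results.csv

  # ...then read the same data as a file through POSIX (or the NFS/SMB exports)
  ls -l /gpfs/fs0/obj_fileset/AUTH_project/experiments/results.csv

  # A file written under the same container directory is made visible as an object
  # by the periodic objectization service, and can then be listed over the Swift API
  cp summary.txt /gpfs/fs0/obj_fileset/AUTH_project/experiments/
  curl -H "X-Auth-Token: $TOKEN" http://ces-ip:8080/v1/AUTH_project/experiments/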
24. #ibmedge
The Need: Thin-Thick storage capacity site deployments
for Object Data
[Diagram: Sites 1–3, each with their own applications, object data and limited storage, connect to a central site with unlimited storage that hosts centralized analytics and centralized backup.]
• Geographically dispersed sites with limited storage capacity
• Independent applications running at each site, accessing/generating object data
• Centralized home for consolidated object data – with the ability to grow storage capacity
• Centralized backup for all sites via the central location
• Ability to run analytics for all sites at the central location
25. #ibmedge
Use Case Requirements
• There is an object store site that is close to the end application but has limited
  storage capacity.
• To cater to the large storage capacity requirement, another object store is set up at a
  geographically remote site with unlimited or expandable storage capacity, acting as a
  central archive.
• The relationship between these two object stores needs to be set up in a way that lets
  applications access all object data from the site closer to them for faster access, even
  though that site has limited storage capacity.
• The central site should have the ability to do in-place analytics of the data.
• The central site should have the ability to back up the data.
• If the cache goes down, the application should be able to fail over to the central site.
26. #ibmedge
The Solution: Unique WAN caching for Object Store -
available only with Spectrum Scale
[Diagram: applications at Site 1 (limited storage) access object data through a Spectrum Scale cluster with protocol nodes (object enabled); a Spectrum Scale AFM (IW) relationship, with cache eviction enabled on Site 1, links it to the central site (unlimited storage), also a Spectrum Scale cluster with protocol nodes (object enabled). At the central site, object data can be accessed as files via the unified file and object feature for centralized analytics, and data can be centrally backed up to tape.]

Spectrum Scale feature – requirement addressed:
• AFM with Spectrum Scale Object – allows the object store to have a thin cache with eviction enabled and a thick home
• AFM in IW mode – allows fail-over and fail-back from the cache site to home, useful during a disaster
• Unified File and Object Access with the HDFS connector – allows centralized, in-place analytics of data at the home site
• Tape integration – centralized backup
27. #ibmedge
Thin Object Store Cache – Thick Object Store Archive
[Diagram: existing services at Site #1 ingest (about 11 TB/day) and access objects through the Swift API against a fileset on the Spectrum Scale cache (Cache#1) in Region 1; an AFM independent-writer relationship replicates XX TB of data every day to a fileset on the Spectrum Scale home (Home#1) archive in Region 2, with Swift API failover/failback between the two sites.]
• Cache site in Region 1 with limited storage; home site in Region 2 with maximum storage per data center
• Object data is archived from the cache site in Region 1 to the home site in Region 2 using AFM-IW
• On a cache failure, the application fails over to the home site for object access and fails back when the cache comes up
• Limited storage on the cache site is addressed by using eviction along with AFM
• Key features used in the solution: Spectrum Scale Object, AFM (IW) with eviction
• Available and documented in 4.2.1
28. #ibmedge
Multiple-Site Deployment
[Diagram: Region #1 and Region #2 each run existing services that ingest and access objects over the Swift API against their own Spectrum Scale cache cluster; the central region (Region 3) hosts a separate home cluster for Region 1 and for Region 2, with Swift API failover/failback between each regional cache and its home.]
One can include multiple sites, where each site has its own home cluster at the central region, replicating the single-site setup shown on the previous slide.
29. #ibmedge
Configuration Steps
• Detailed configuration steps are available in the 4.2.1 Knowledge Center:
  Using AFM with Spectrum Scale Object
  http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1ins_usingafmwithobject.htm
• A high-level sketch of the flow is shown below.
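A very rough outline of the cache-home relationship described in the Knowledge Center topic above. File system, fileset and host names are placeholders, the object-specific steps are omitted, and the documented procedure should be followed for a real deployment (a minimal sketch under those assumptions):

  # Home site: prepare the object data path for AFM (it must also be exported over NFS)
  mmafmconfig enable /gpfs/homefs/objdata

  # Cache site: create an independent-writer AFM fileset that targets the home export,
  # with automatic eviction enabled to cope with the limited cache capacity
  mmcrfileset cachefs objcache --inode-space=new \
      -p afmTarget=homenode:/gpfs/homefs/objdata \
      -p afmMode=independent-writer -p afmEnableAutoEviction=yes
  mmlinkfileset cachefs objcache -J /gpfs/cachefs/objcache

  # Cap cache usage with a fileset block quota; auto-eviction frees space as the quota is approached
  mmsetquota cachefs:objcache --block 5T:6T

  # Eviction can also be triggered manually if needed
  mmafmctl cachefs evict -j objcache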
30. #ibmedge
Conclusion
• Spectrum Scale provides a rich set of features, including:
  • AFM
  • Protocols: POSIX, SMB, NFS and Object
  • Unified File and Object Access
  • In-place analytics using built-in Hadoop connectors
• Integrating AFM with Spectrum Scale Object delivers the unique solution required for
  many multi-site deployments, wherein:
  • One can have a thin-cache object store with an auto-eviction facility closer to the
    applications or users
  • A centralized thick home object store can act as the failback object store for the
    thin cache sites
  • All the data can be analyzed in place at the home site
  • A central backup can be taken at the home site
31. #ibmedge
Spectrum Scale User Group
• The Spectrum Scale User Group is free to join and open to anyone using, interested in
  using, or integrating Spectrum Scale.
• Join the User Group activities to meet
your peers and get access to experts
from partners and IBM.
• Driven and owned by Customers
• Next meetings:
- APAC: October 14, Melbourne
- Global at SC16 : November 13 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: Robert.Oesterlin@nuance.com
32. #ibmedge
Session : How to apply Flash benefits to big data
analytics and unstructured data
NDA & Customers ONLY
• Who: IBM Elastic Storage Server Offering Management
• Alex Chen
• When: Thursday, September 22, 2016
• 1:15pm to 2:15pm
• Where: Grand Garden Arena, Lower Level, MGM, Studio 10
• Contact (if any questions)
  • cmukhya@us.ibm.com, douglasof@us.ibm.co
33. #ibmedge
Spectrum Scale Trial VM
• Download the IBM Spectrum Scale Trial VM from :
• http://www-03.ibm.com/systems/storage/spectrum/scale/trial.html
34. #ibmedge
References
Spectrum Scale 4.2.1 Knowledge Center: Using AFM with object
http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1ins_usingafmwithobject.htm
Spectrum Scale Object Store – Unified File and Object
http://www.slideshare.net/SandeepPatil154/spectrum-scaleexternalunifiedfile-object
From Archive to Insight: Debunking Myths of Analytics on Object Stores – Dean Hildebrand, Bill Owen, Simon Lorenz, Luis Pabon, Rui Zhang. Vancouver Summit, Spring 2015.
https://www.youtube.com/watch?v=brhEUptD3JQ
Deploying Swift on a File System – Bill Owen, Thiago da Silva. BrownBag at OpenStack Paris, Fall 2014.
https://www.youtube.com/watch?v=vPn2uZF4yWo
Breaking the Mold with OpenStack Swift and GlusterFS – John Dickinson, Luis Pabon. Atlanta Summit, Spring 2014.
https://www.youtube.com/watch?v=pSWdzjA8WuA
SNIA SDC 2015
http://www.snia.org/sites/default/files/SDC15_presentations/security/DeanHildebrand_Sasi__OpenStack%20SwiftOnFile.pdf
36. #ibmedge
Notices and Disclaimers Cont'd.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not
tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business
Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
37. #ibmedge
IBM Spectrum Scale Summary
• Avoid vendor lock-in with true Software Defined Storage and Open Standards
• Seamless performance & capacity scaling
• Automate data management at scale
• Enable global collaboration
Data management at scale: OpenStack and Spectrum Scale help clients manage data at scale
• Business: I need virtually unlimited storage
• Operations: I need a flexible infrastructure that supports both object- and file-based storage
• Operations: I need to minimize the time it takes to perform common storage management tasks
• Collaboration: I need to share data between people, departments and sites with low latency
What Spectrum Scale provides:
• An open & scalable cloud platform
• A single data plane that supports Cinder, Glance, Swift and Manila as well as NFS, SMB, et al.
• A fully automated, policy-based data placement and migration tool
• Sharing with a variety of WAN caching modes
Results
• Converge file- and object-based storage under one roof
• Employ enterprise features to protect data, e.g. snapshots, backup and disaster recovery
• Support native file, block and object sharing of data
[Diagram: Spectrum Scale exposes NFS, SMB, POSIX, Swift, HDFS, Cinder, Glance and Manila interfaces over SSD, fast disk, slow disk and tape tiers]