The document discusses OpenStack Swift, an open source cloud storage system. It provides an overview of Swift and how it enables applications for the web and mobile through standards-based APIs and scalability. SwiftStack's CEO Joe Arnold also discusses field observations of how Swift supports infrastructure as a service through capabilities like large storage capacities, high concurrency, and multi-tenancy.
Building Apps and Scaling Storage with OpenStack Swift
1. OpenStack APAC Conference
Building Applications with OpenStack Swift
Joe Arnold, CEO, SwiftStack Inc
August 11, 2012
@joearnold
2. Compute (“Nova”) | Storage (“Swift”) | Networking (“Quantum”)
Global community of cloud software developers & users
Apache 2 Open Source License
180+ Participating Companies - 1,000’s of developers
3. OpenStack Swift - Born in Production
2009 2010 2011
Developed in large-scale production environments
Currently 70+ developers - has doubled every 6 months
4. Swift Deployment Example: Rackspace, USA
5. Swift Deployment Example: HP Cloud, USA
6. Swift Deployment Example: Internap, USA
7. Swift Deployment Example: KT ucloud, Korea
8. Swift Deployment Example: Softlayer, USA
9. Swift Deployment Example: Haylix, Australia
10. Swift Deployment Example: eNovance, France
11. About SwiftStack
Cloud Storage System based on OpenStack Swift
Cloud storage technical leadership
Swift Core team
Project lead
Experience
Building large-scale cloud storage at Rackspace, Engine Yard, Internap, Korea Telecom
28. Swift Scales to Massive # of users
[Diagram: four Access Nodes]
Add Proxy Nodes: A hash ring is shared amongst each node in the cluster.
Add Storage Nodes: Capacity can be added by growing existing availability zones, or adding new availability zones.
29. Swift Scales to Massive # of users
Swift uses shared-nothing architecture
1. account data
2. object data
3. All data distributed via hash ring
Simple mechanisms proven at scale
1. Whole files on disk
2. Routed networking (Layer 3 networks)
3. Proven techniques with HTTP for transport
4. Proven techniques with rsync for replication
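The hash-ring idea on this slide can be sketched in a few lines of Python. This is a simplified illustration, not Swift's actual ring code: the partition power, the round-robin placement, the node names, and the function names `partition_for`, `build_assignment`, and `nodes_for` are all assumptions made for the example. The real ring also accounts for device weights and zones.

```python
import hashlib
import struct

PART_POWER = 8   # assumed for illustration: 2**8 = 256 partitions
REPLICAS = 3     # assumed replica count


def partition_for(path, part_power=PART_POWER):
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    top32 = struct.unpack(">I", digest[:4])[0]
    return top32 >> (32 - part_power)


def build_assignment(nodes, part_power=PART_POWER, replicas=REPLICAS):
    """Assign every partition to `replicas` distinct nodes (simple round-robin).

    Requires len(nodes) >= replicas so the replicas land on different nodes.
    """
    if len(nodes) < replicas:
        raise ValueError("need at least as many nodes as replicas")
    return [
        [nodes[(part + r) % len(nodes)] for r in range(replicas)]
        for part in range(2 ** part_power)
    ]


def nodes_for(path, assignment, part_power=PART_POWER):
    """Any process holding the same table computes the same answer, with no lookup service."""
    return assignment[partition_for(path, part_power)]


if __name__ == "__main__":
    table = build_assignment(["node-a", "node-b", "node-c", "node-d"])
    print(nodes_for("/AUTH_demo/photos/cat.jpg", table))
```

Because the table is plain data, every proxy node can hold a copy and route requests independently, which is what lets the proxy tier grow by simply adding nodes.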
40. Watch Everything
Lightweight
[Diagram: Swift Processes -> UDP -> StatsD -> Time-Series Backend (Ganglia / Graphite)]
42. SwiftStack Plug-Ins
User Dashboard
On-disk Encryption
Active Directory/LDAP Integration
Utilization API for Billing
Metadata Search
43. Field Observations
Web/Mobile Applications
Massive Number of Users
Infrastructure as a Service
44. Thank you!
OpenStack
APAC Conference
August 11, 2012
Joe Arnold
CEO, SwiftStack
joe@swiftstack.com
@joearnold
Editor's Notes
Hi, I’m Joe Arnold, CEO of SwiftStack. I’m here to talk about OpenStack Swift. I have a background in cloud infrastructure and open-source software. My background includes positions at Yahoo! and at Engine Yard, where we built Rails 3 and ran an early platform-as-a-service offering on top of Amazon Web Services. I was also involved with some of the first deployments of OpenStack Swift, at Internap and KT.
Swift was originally built at Rackspace and was born in a production environment. It was open sourced as one of the original OpenStack projects in 2010. OpenStack Swift is in use, in production, at many service providers.
Rackspace Cloud Files - The team lead now works for SwiftStack.
- SwiftStack has team members who were a part of this deployment.
- SwiftStack has team members who were also part of this deployment.
Softlayer has deployed Swift. They also demonstrated the flexibility of Swift by implementing things like metadata search.
Haylix - a cloud provider in Australia.
eNovance has deployed Swift.
About SwiftStack: We provide a Cloud Storage System that, at its core, is OpenStack Swift. SwiftStack provides a complete cloud storage solution so that our customers can easily roll out OpenStack Swift and have the tools they need for ongoing operational support, so that they can manage their SwiftStack cluster.
SwiftStack is the 2nd-leading contributor to Swift behind Rackspace.
- We have multiple members on the core open-source team... one of the two companies
- We have the Project Technical Lead for Swift
- We have the ability to solve the hard problems and are key influencers of the project
- What this means is that we are key drivers for the project
The SwiftStack team has been assembled from those who have been involved with many Swift deployments, including its development at Rackspace.
I’d like to do this talk simply by walking through some things that we’re seeing out in the field while working with customers. This isn’t hypothetical, though I can’t name names quite yet. What’s interesting in the infrastructure space is that demand precedes infrastructure capability. Very rarely does a successful infrastructure project anticipate demand. The demand is there: our customers have problems, and it is up to the infrastructure to solve them.
- Hadoop - born at Yahoo!
- Swift - born at Rackspace
So here are some of the observations that I’m seeing today. I’ll walk through how Swift is addressing the problems we’re seeing our customers face.
Field Observations
Web/Mobile Applications
- Applications are moving to mobile devices
- How are applications being built?
Massive Users
- How to accommodate rapid growth & a large user base
Infrastructure as a service
- In the enterprise
- Service providers
Software is delivered over the internet, rather than being run locally. Mobile devices have very much accelerated this. We’ve seen mobile adoption dramatically accelerate the need for cloud-based applications.
- This is true on the consumer side
- But it is also true in the enterprise, where employees are showing up to work with tablets or expecting to use their mobile devices to get work done
- Our customers are building applications that can be used from any device, anywhere
The center of our computing world is no longer the desktop, and cloud-based applications have helped make that a reality. This offers our customers’ users a huge benefit:
- Users’ data is accessible from more devices
- Users are not dependent on a single device
- Data can be resiliently stored on behalf of the user
Compared to desktop environments, data storage is small on mobile devices, and it’s shrinking on laptops! New laptops with SSDs actually have less storage than their predecessors! The expectation is that this data is going to live in a cloud environment. Storage for applications is remote. Data is coupled with the application. What this means is that less data is being stored locally on workstations / laptops / mobile devices. More data is being stored centrally, away from the device.
a) Speaks HTTP. Deliver content directly to devices. There isn’t a need for an application server to relay the request. The request can be served directly to the client or application.
b) Upload direct from devices. Often, with applications that ingest a lot of data, there are intermediaries to handle and process data uploads. Swift can handle direct uploads from clients or from an application.
c) Cache with HTTP. It’s just HTTP, so all the familiar tools of caching can be applied. An HTTP cache can be put in front of the cluster to cache frequently-accessed content.
d) Natively integrates with content delivery networks. The Swift API had content delivery built in. There are plug-ins to integrate with content distribution networks.
e) Clients for many languages. Integrate directly into your application.
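Points a) through c) can be illustrated with plain HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them. The host `swift.example.com`, the account `AUTH_demo`, the token value, and the helper names are placeholders made up for the example; the `/v1/<account>/<container>/<object>` path layout and the `X-Auth-Token` header follow the Swift object storage API.

```python
import urllib.parse
import urllib.request


def object_url(storage_url, container, name):
    """Join the account URL, container, and object name into one object URL."""
    return "/".join([
        storage_url.rstrip("/"),
        urllib.parse.quote(container, safe=""),
        urllib.parse.quote(name),  # keeps "/" so pseudo-folders stay readable
    ])


def build_upload_request(storage_url, container, name, body, token):
    """Build (but do not send) a PUT that stores `body` as an object."""
    req = urllib.request.Request(
        object_url(storage_url, container, name), data=body, method="PUT"
    )
    req.add_header("X-Auth-Token", token)
    return req


def build_download_request(storage_url, container, name, token, etag=None):
    """Build a GET; passing a known ETag makes it a conditional, cache-friendly GET."""
    req = urllib.request.Request(
        object_url(storage_url, container, name), method="GET"
    )
    req.add_header("X-Auth-Token", token)
    if etag is not None:
        req.add_header("If-None-Match", etag)
    return req


if __name__ == "__main__":
    up = build_upload_request(
        "https://swift.example.com/v1/AUTH_demo", "photos", "cat.jpg",
        b"...image bytes...", "placeholder-token",
    )
    print(up.get_method(), up.full_url)
```

Against a real cluster, each request would be passed to `urllib.request.urlopen`. Because the interface is ordinary HTTP, the same requests work from a phone, a browser, or a server-side application.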
Massive Users
- How to accommodate rapid growth & a large user base
Enterprise apps are moving to cloud-based apps, which means more users per app instance.
a) What I am seeing is that these software providers are servicing multiple organizations from the same platform. Now, because it is delivered as a service, many, many more users are utilizing the same instance of an application. That application has more demands on multi-tenancy and on scaling in size and number of user requests.
b) Compound this with the fact that more devices are able to leverage these applications. This results in more usage. Users are becoming heavier users because they can interact with the applications more frequently.
d) Long-tail data assets. Our customers are also trying to deal with the long-tail data problem. Now all those users are storing lots of data, but you don’t know which bit of data is going to be accessed at any given time: a photo, a document, an audio recording, a bit of personal application data. I call that the ‘long-tail data problem’.
So it becomes a challenge for all of us as infrastructure providers to build and provide infrastructure as more and more users come to use these applications. From a storage perspective, this means:
a) increasing demands to grow storage capacity
b) increasing demands to grow the number of users that can be supported
Swift can scale requests via a proxy tier & an object tier. Data is distributed throughout the cluster. Adding more nodes increases both storage capacity and request-serving capacity. For example: 11-U (1/4 rack), single 10Gbit nodes, serve 36Gbit. The cluster is not limited by the throughput of a single device.
A consequence of this is that applications don’t need to be aware of infrastructure.
- When working with filesystems, a common method to grow a user base is by sharding users into separate infrastructure components.
However:
- This pushes complexity up into the application
- Code requires changes when capacity additions are made. Growing capacity needs to be coordinated with code changes.
- By not using a filesystem interface and using HTTP as an interface for object assets, applications can be simpler and infrastructure can scale independently from software.
-- Simpler system
-- Fewer failures
Not just HTTP
- It’s been running in production, at scale, for several years at this point.
- Swift _is_ production ready.
- Your deployment of Swift is architecturally identical to public cloud storage offerings.
- Which means it’s more than just an API. It’s fundamental to how the storage system works.
Field Observations
Web/Mobile Applications
- Applications are moving to mobile devices
- How are applications being built?
Massive Users
- How to accommodate rapid growth & a large user base
Infrastructure as a service
- In the enterprise
- Service providers
Two big drivers:
Reduce costs
-- Better utilize infrastructure that is currently in use
-- More efficiently manage it as a single pool
-- Provide services that are based on standard hardware and open-source software
Improve business agility (most importantly)
-- Infrastructure services vs. infrastructure for applications
-- What we’re seeing in larger companies is that infrastructure is being decoupled from specific applications.
-- Infrastructure is being provided as a service within enterprises, so that they can consolidate their efforts to provide a consistent service for all the applications they need to support
Sell a service
- Infrastructure is being shared amongst multiple customers, who, in turn, have many, many users.
- There is a need for multi-tenant storage services
-- Internal chargeback
-- External customers
The end result of all of this:
- Data center deployments are becoming larger to support more applications, customers, and users
- More customers are utilizing the same underlying infrastructure
-- This means that there can be more efficient utilization of resources
-- But it also means that larger-scale infrastructure deployments are required.
From a storage perspective, this means:
- Support multi-tenancy, with multiple customers and applications utilizing the same underlying infrastructure.
OpenStack Swift enables a consolidation of infrastructure. It is a storage project with these features:
- Scales to support a large number of concurrent users
- Linear growth in the number of simultaneous users
- Designed with multi-tenancy from the ground up
Which makes it ideal for building infrastructure as a service.
To run a successful storage service, operational efficiencies are important. Operationalizing these services requires:
- Driving operational efficiencies
- Moving from 10’s -> 100’s -> 1000’s of nodes / administrator
- Management and operations becoming driving concerns
Everything changes at scale. Having the ability to understand how a system is performing when running at scale, with customer / user data on the line, and having the tools necessary to manage and operate the cluster are crucial.
This is what we at SwiftStack are focusing on.
- Beyond being core contributors to Swift
- SwiftStack provides a complete Cloud Storage System
SwiftStack Nodes
- Provided by multiple hardware vendors
-- Standard hardware
-- Not locked into a hardware vendor
-- Compatible with Open Compute
- Runs SwiftStack Node Software
-- Complete runtime stack for OpenStack Swift
--- We have plug-ins for various authentication systems like LDAP and Active Directory
--- We have instrumented the full runtime stack with metrics collection agents for monitoring
--- We include a stable version of OpenStack Swift that has been integrated with the runtime stack.
--- We provide forward patches.
SwiftStack Controller
- Deployment management and configuration
-- Configures and deploys new nodes
- Capacity orchestration
-- The controller identifies the capacity of the hardware and will integrate the capacity into the cluster
- Operates
-- Tunes the cluster for the workload and hardware
- Monitors
-- Monitors everything in the cluster so you can pinpoint what is going on.
...everything required to run a production cluster
Two ironclad rules of storage: data accumulates, and the size of new disks increases. Because of this, it’s more economical to grow as needed. Managing the cluster’s growth is a big part of the operational tasks. When capacity is added to Swift, the data gets redistributed throughout the cluster so each node is ‘even’. If you add 50-100% of a cluster’s capacity at once, it will grind to a halt through the rebalancing activity. So what we like to do is add capacity gradually. By controlling the rate, we ensure that data remains available. But here automation is key, so we have automated that process so that capacity can be added (or removed) gradually over time, not in big lumps.
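One way to picture "gradually" is as a schedule of small weight changes. In open-source Swift, a device's share of data is governed by its weight in the ring, so a new device can be brought in by raising its weight a step at a time, rebalancing the ring after each step (for example with `swift-ring-builder`'s `set_weight` and `rebalance` commands), and waiting for replication to settle. The helper below is a hypothetical sketch of such a schedule. It is not SwiftStack's automation, and the step size is an arbitrary assumption.

```python
def weight_steps(current, target, step):
    """Return the intermediate weights to apply, ending exactly at `target`.

    Works in both directions: raising a new device's weight, or draining
    a device that is being removed.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    steps = []
    weight = current
    while weight != target:
        if target > weight:
            weight = min(weight + step, target)
        else:
            weight = max(weight - step, target)
        steps.append(weight)
    return steps


if __name__ == "__main__":
    # Bring a new device from weight 0 up to 100 in increments of 25.
    for weight in weight_steps(0, 100, 25):
        print("set weight to", weight, "then rebalance and wait for replication")
```

Each step moves only a fraction of the data, so the cluster keeps serving requests while it absorbs the new capacity.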
First, we watch everything. Tracking server-level metrics like CPU utilization, load, memory consumption, disk usage and utilization, etc. is necessary, but not sufficient. We just added instrumentation, agreed at this summit to be merged, that provides deep instrumentation of all the inner workings of a Swift cluster. The cost of the instrumentation is extremely low: a ‘sendto’ of one UDP packet. It is a no-op if it’s turned off. If that overhead is too high, the StatsD client library can send a random sampling. This is available in the current version of Swift for everyone to use. It will provide incredibly rich and precise information on what’s happening in the cluster. How we implement this in practice is by running a StatsD server on each node. We embed it in collectd. That funnels the data back to a Graphite cluster that we use as a time-series data store for the metrics.
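The "one UDP packet" cost can be seen in a minimal StatsD-style client. This is a sketch of the general StatsD counter format (`name:value|c`, with `|@rate` appended when sampling), not the client that ships inside Swift. The class name, the metric name, and the default address are assumptions for the example.

```python
import random
import socket


def format_counter(name, value=1, sample_rate=1.0):
    """Encode a StatsD counter; the sample rate is appended only when sampling."""
    payload = f"{name}:{value}|c"
    if sample_rate < 1.0:
        payload += f"|@{sample_rate}"
    return payload.encode("ascii")


class StatsdSketch:
    """Fire-and-forget counter client: one sendto() per metric, no reply expected."""

    def __init__(self, host="127.0.0.1", port=8125, enabled=True):
        self.addr = (host, port)
        self.enabled = enabled
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, name, sample_rate=1.0):
        """Return True if a packet was sent, False if skipped."""
        if not self.enabled:
            return False  # turned off: nothing is sent
        if sample_rate < 1.0 and random.random() >= sample_rate:
            return False  # random sampling: skip most calls
        self.sock.sendto(format_counter(name, 1, sample_rate), self.addr)
        return True


if __name__ == "__main__":
    client = StatsdSketch()
    client.increment("proxy.requests")                   # always sent
    client.increment("proxy.requests", sample_rate=0.1)  # sent about 10% of the time
```

Because UDP needs no connection or acknowledgement, the instrumented process never waits on the metrics pipeline. The sample rate travels with the packet so the StatsD server can scale the count back up.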
What SwiftStack does with this data is two things. First, because we are watching everything, we can provide deep visibility into the cluster.
- It’s actually really cool, because you can zoom into a point in time and apply that view across all the metrics that are collected.
Second, because we are watching everything, we can provide alerts as well.
- It’s not enough just to provide alerts; those alerts need to be actionable as well.
- We have an alert workflow where alerts are acknowledged and archived.
SwiftStack also provides plug-ins for OpenStack Swift:
- A user dashboard
- On-disk encryption
- Active Directory and LDAP integration
- Utilization API to provide billing integration
- Metadata search index so that content in the storage cluster can be searched
When you choose a storage system for production that is going to be used by multiple applications, the decision becomes bigger. It’s a longer-term decision. OpenStack Swift is an open system. It’s the open choice:
- No lock-in from any hardware or software vendor
- You can swap out your Swift provider and not need to change your applications
- You can choose a new hardware provider and mix and match a single cluster from multiple hardware vendors, even with different capacity.