Lecture for the San Jose State masters program on cloud computing. Topic focuses on using OpenStack to deploy infrastructure clouds with commodity hardware and open source software. Covers virtualization, networking, storage, deployment and operations.
3. Introduction
• Aruba AirWave network management
• Yahoo! - Bangalore, India
• Engine Yard
• Cloudscaling
Monday, April 11, 2011
-- Engineering Manager at a startup that was acquired by Aruba Networks. Network
management; tens of thousands of devices under management
-- Bangalore, India office for Yahoo!
-- Head of engineering at Engine Yard. Ruby on Rails deployment platform
--- Early cloud player. Built own infrastructure cloud.
--- Early serious Amazon AWS user.
---- Invested in us, but they were tight-lipped about their roadmap.
--- Managed the team that built the current Ruby on Rails deployment platform
- Current: Helped found Cloudscaling
-- Founded a year ago
-- Over the summer built Korea Telecom's infrastructure cloud
-- Un-named customer: Huge object storage system. Largest of its kind outside Rackspace
(we'll get into that later)
-- Building infrastructure clouds. Focus on utility computing. Help our customers drive costs
down in facilities, hardware, software, and operations.
- Exercise: What's your experience in the cloud?
-- Used the web? :)
-- Who has IT experience? Ran a webserver, mailserver, configured a network of computers?
-- Who has programming experience?
-- Who has provisioned a machine in the cloud? Amazon, Rackspace, etc.?
4. Building cloud infrastructure for
telcos and service providers
What we do: help telcos and service providers build the most competitive clouds in the world.
We build large-scale, highly competitive public clouds for global telcos and service providers.
5. Taxonomy of Cloud Computing
SaaS - Software-as-a-Service
PaaS - Platform-as-a-Service
IaaS - Infrastructure-as-a-Service
There is a stack of services which all get referred to as 'Cloud'. What is the taxonomy that will
frame the discussion?
- SaaS: Software-as-a-Service
-- Examples: Salesforce, Gmail
- PaaS: Platform-as-a-Service
-- Examples: AWS Beanstalk, Engine Yard, Twilio
- IaaS: Infrastructure-as-a-Service
-- Examples: Amazon Web Services, GoGrid, Rackspace Cloud
- Focus of discussion will be on IaaS
- Grey lines between them
6. Virtualization -> Infrastructure Cloud
• Server consolidation drives virtualization
• Rise of VPS & Enterprise DC Management
• Amazon launches S3, EC2
• Tooling emerges around ecosystem
- Bill Gates “Blue screen of death” demo at Comdex
- VMware demoed a ‘blue screen’ safely contained inside a VM. Great for test/dev. A solid little market.
- The message didn’t click until the server consolidation market was “discovered”
- S3/EC2 Launches ~ 2006
- EC2 out of Beta in late 2008
7. Infrastructure as a Service (IaaS)
• Virtual Machine Instances
• Network Services
• Data Storage
• On Demand
• Via an API
• Pay-per-use
• Multi-tenant
What is considered infrastructure cloud computing?
- Not just virtualization of compute resources
- Must provide services:
-- Virtual Machine Instances
-- Network Services
-- Data Storage
-- On Demand
-- Via an API
-- Pay-per-use
-- Multi-tenant Platform
8. Public Cloud vs. Enterprise IT
• Major disruption in traditional IT
• IT is aggressively being automated
• Applications are not migrating to the
cloud
Public Cloud vs. Enterprise IT (or Enterprise Cloud)
Public infrastructure clouds blazed the trail. Internal IT are looking for same cost advantages.
- Major disruption in traditional IT
-- I used 'vs.' rather than '&', as infrastructure cloud is a major disruption in traditional IT
-- hundreds : 1 admin vs. thousands : 1 admin
-- A system administrator may run, say, 100 systems in traditional IT; an infrastructure cloud
provider will run thousands of systems per administrator/developer.
- IT is aggressively being automated, infrastructure cloud computing is the next evolution
- The evolution is hitting an inflection point where some companies are opting-out of
traditional system administrators altogether
- Example: running a computer-science lab with automation.
- Example: the shift in Engine Yard's operating model, from 80% systems : 20% software to
30% systems : 70% software
- Generalization: Applications are not migrating to the cloud because migration is more
expensive than operational savings
-- Greenfield applications being deployed to the cloud
--- Dramatic improvement in configuration automation / deployment tools
-- Legacy applications
--- Staying put
--- Being replaced by software-as-a-service (which are greenfield applications being
deployed in the cloud)
---- How many new Microsoft Exchange installs are happening these days?
9. Cost basis of Infrastructure Clouds
Exercise: What is the cost basis of infrastructure cloud
hardware?
Building a business case:
• Need margin for: facilities (space, power,
cooling), technical operations
• You can sell a 10 GB RAM virtual machine for
50-cents an hour ($360/month)
• You can buy 40 GB RAM machines
• The practical life of the machines is 5 years
How much can you spend on hardware/software to
build the infrastructure while leaving a reasonable
margin (70%) for the rest of the business?
-- $360/mo * 4 VMs/machine * 60 months of useful life * 30% target expense = $25,920
-- That's the target that needs to be hit for the machine and its tenancy costs in the system
(rack, network ports).
-- Software Licensing Costs: Hypervisor, infrastructure management, monitoring, operational
support systems
- AWS's cost basis
- To enter the game, cost basis must be on par with existing infrastructure clouds
-- Market price + your value add (control, security, network, physical location, unique
features, etc.)
- Private / Enterprise IT is competing with infrastructure cloud computing
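The arithmetic above is worth sanity-checking. A quick sketch, using the slide's illustrative figures (example prices, not real quotes):

```python
# Back-of-the-envelope hardware budget for one physical machine,
# using the slide's example figures (illustrative, not real quotes).
price_per_vm_month = 360       # 10 GB RAM VM at $0.50/hr ~= $360/month
vms_per_machine = 4            # 40 GB RAM machine / 10 GB RAM per VM
useful_life_months = 5 * 12    # 5-year practical life
hardware_percent = 30          # 30% of revenue for hardware; 70% margin remains

lifetime_revenue = price_per_vm_month * vms_per_machine * useful_life_months
hardware_budget = lifetime_revenue * hardware_percent // 100
print(hardware_budget)  # 25920
```

Turning the slide's one-line calculation into named variables makes it easy to re-run with different VM prices or consolidation ratios.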
10. Applications in the Cloud
Not the same services as traditional DC / hosting
• Networking different
• Storage different
• Design for Failure
• Reliance on Configuration Management
What are the architectural shifts needed to run applications in the cloud?
- Infrastructure clouds are not just the same compute resources packaged up via an API
Networking is radically different
Storage is different
-- Reddit’s recent outage was due to them using EBS (Amazon’s block storage) incorrectly.
Designed for Failure
-- Not high-availability
- Increased Importance of Monitoring
- Increased Importance of Configuration Management
11. Compute Virtualization Management
• VM Management
• Networking Management
• Storage Management
• User Management
• Multi-tenancy
• Scheduling
• Available via an API
• Machine Image Management
What are the key features for cloud building?
12. OpenStack Introduction
• OpenStack is open source Infrastructure Cloud
Software
• NASA: Compute Management
• Rackspace: Object Storage
OpenStack is a fusion of two projects.
One from Rackspace and another from NASA.
Rackspace wanted to build the next version of their cloud in the open.
13. Large Community
53 Companies: http://openstack.org/community/companies/
I represent one of the many companies that are part of the OpenStack community.
14. Make a Bigger Pie
Symbiotic ecosystem: the whole product/experience gets better as more people and companies
get involved.
These companies want to create a large ecosystem that they are a part of.
Why would Rackspace want to do this? After our launch of OpenStack Object Storage,
a team of folks at Rackspace had to go cube-to-cube to explain why it’s good
for Rackspace.
It’s a common growth pattern -- when you’re behind, open up.
Apple iPhone (at least initially), Facebook did this.
OpenStack has ecosystem potential.
There can be a lot of players who are all building stuff around this project.
So the next question is: how does OpenStack foster an ecosystem that you can be a part
of?
15. OpenStack is Open Source
• Apache 2 - permissive license
• Organizational Structure
• Blueprints
• Governance Board
• Very Open Culture
You guys know the two broad categories of open-source licenses, right?
- permissive: Apache 2, BSD
- viral (copyleft): GPL
16. OpenStack is very healthy for our industry
• Amazon’s presence is too dominant
• While Amazon is innovative itself...
• ...they have created a proprietary ecosystem
• A multi-company ecosystem will foster broader
innovation and relevant competition
Follow me for a minute and I’m going to use Amazon AWS as my big bad wolf, because
they’re the reigning leader in this race right now.
Amazon has created a de facto, proprietary ecosystem. They were first and are very, very
good.
Amazon created this ecosystem with a positive feedback loop. The more PaaS offerings run on
Amazon, the better their ecosystem becomes.
However, if Amazon decides to add a new feature to their platform, they could crush you.
It’s like if Apple comes out with their own version of Angry Birds. (Angry iBirds?)
There is a precedent for this -- Cloudera (Hadoop) vs. Amazon Elastic MapReduce.
This Amazon positive feedback loop is great if you’re Amazon.
But not everyone can use AWS.
OpenStack is the project that offers the promise of a more open ecosystem.
17. Ecosystem
OpenStack
Providers
Tools
For innovation in cloud computing to continue, we must have many cloud computing
infrastructure players to support a diverse ecosystem.
Ecosystem of tools is emerging, but we’re very early in this cycle for OpenStack.
There is opportunity for a positive feedback loop to happen.
- The better the tools, the more compelling it will be for more implementations of Swift to
come online.
- The more implementations of OpenStack, the more attractive it becomes to build great
tooling around it.
18. OpenStack part of the solution
Ecosystem
Billing Portal
Authentication
Installer Front-End
Network Ops
Hardware
Data Center
OpenStack alone doesn’t equal a whole service.
- The OpenStack code provides the kernel of functionality
- But there is much software to write, systems to integrate, hardware choices and design
decisions to make.
The components are
- Data Center Concerns
- Networking
- Hardware Choices
- Access Layer
- Installing
19. OpenStack Compute
• OpenStack Compute is the Virtualization
component of OpenStack
End Rant
A bit of a corporate love triangle is going on here...
- NASA team: mostly all subcontractors of a company called Anso Labs
- Started as a quick hack experiment.
- NASA had a need to dynamically provision compute. They tried to use Eucalyptus, but it wasn't
working out for them at the time. They challenged themselves to spend a week seeing what they
could piece together on their own. After that week, they were so pleased with their progress
that they dropped Eucalyptus and built their own.
20. OpenStack Compute Early & Evolving
• Initially designed for NASA compute needs
• Evolving to handle needs of service
providers / enterprise
Many things are what I would call ‘early’
- Account Management reflects the what NASA needed. (Much work for service provider to
integrate.)
-- IPs map to projects
- API has some elements missing
- Scheduling is simplistic (round-robin)
- Multi-tenancy security work to be done
- In general there is a lot of churn in the code base
21. OpenStack Compute API
• AWS EC2 API
• Emerging Standard
- Currently EC2 interfaces
- Will it standardize on the Rackspace API?
- Unsure how multiple, concurrent APIs will be supported
Compute API Standards:
- Standards emerge; you don't need to be declarative about it.
- They happen post-facto. Many application developers use libraries anyway (fog, jclouds, libcloud, etc.)
- What’s important here is that you will be able to stand-up a local development environment
with the same code that will be powering your infrastructure provider.
- AWS EC2 API is an emerging standard
22. OpenStack Compute: Commodity Hardware
- Image: Keith in the KT data center
- There are other drop-in solutions for infrastructure clouds from Cisco/EMC/VMware. This is
not one of them.
- It's designed around off-the-shelf, lower-price-point components
- We're using Arista for our networking layer.
- NASA uses off-the-shelf AoE (ATA over Ethernet) hardware for storage.
- I imagine that we'll use Nexenta for iSCSI blocks (when that's available)
- Compute nodes come direct from a Taiwanese manufacturer -- not Dell/IBM/HP
24. OpenStack Compute Hypervisor Support
• KVM
• Xen
• Citrix XenServer
• ~ Hyper V
• ~ Virtualbox
25. Networking
• Networking is different
• Challenges
• Fewer options
• Private networks are slow
• Benefits
• Reconfigurable with an API
• Good at flat, large networks
- Challenges:
-- Generally more latent than specially-designed networking configurations
-- Fewer options in how the network is configured
-- No hardware options (load balancers, security devices, firewalls, spam filters, deep packet
inspection tools, etc.)
-- Simulated private networks (VLANs, VPNs) are generally slow
- What they're good at:
-- Easily reconfigurable via an API with simple tools.
-- Good at very large, flat, layer-3 networks
-- This is because layer-3 IP networks scale really well.
-- Routing technology is mature. ECMP/OSPF work well.
- Implementer's perspective
-- When trying to scale to hundreds of switches, thousands of physical servers, and tens of
thousands of virtual servers, a simpler network is much easier to grow and manage. Only
layer-3 networks are designed for this scale.
-- However, customers don't always want that type of networking. They'd like to take
advantage of multicast, choose their own IPs, set up networking tiers, etc. This is especially
true of older, legacy applications which had more flexibility during initial network
design and build-out.
26. TCP / IP
• Ethernet ‘switches’
• IP ‘routes’
- Refresher on the OSI & TCP/IP network stack
-- diagram
-- Ethernet 'switches'
--- Layer 2 is simpler. Each resource has an Ethernet address (a 'MAC' address) and the switch
forwards appropriate packets to that physical device.
--- It's simple and fast, but it doesn't scale. When there are many devices, each
switch needs to know where everything is. Adding virtual machines into the mix only
compounds the problem.
--- VLANs are a tool to work around this issue by creating 'virtual' Ethernet networks. But they
too have scalability issues. (Same goes for spanning tree protocol (STP) and its variants.)
-- IP 'routes'
--- Layer 3 adds an abstraction layer on top of Ethernet.
--- An IP address provides instructions on how to route a packet to the correct network.
--- You often hear, "what's your IP address?" or "What's the IP address of that machine?" But
it's wrong to think of IP that way. What an IP address really is, is a _route_ to a particular network.
--- Thought of in this way, an IP network can be built to an arbitrary size. And we have a
working example: the Internet!
--- The disadvantage of IP is mobility. With Ethernet, switches forward packets to the right
machines wherever they are on the network. With IP, if a machine moves, its route
changes; packets will no longer reach it, and its IP must change.
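The "an IP address is a route" point can be illustrated with Python's standard ipaddress module: upstream routers only need an aggregate prefix, not an entry per host. The addresses below are made-up examples:

```python
import ipaddress

# "An IP address is a route": the core only carries the aggregate prefix,
# not one entry per host. Addresses here are illustrative.
aggregate = ipaddress.ip_network("10.1.0.0/16")     # announced to the core
rack_subnet = ipaddress.ip_network("10.1.42.0/24")  # one rack's prefix
host = ipaddress.ip_address("10.1.42.7")            # one VM

print(rack_subnet.subnet_of(aggregate))  # True -- the /16 covers the rack
print(host in rack_subnet)               # True -- the /24 covers the host
```

This is why layer-3 scales: a core router's table grows with the number of prefixes, not the number of machines.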
27. Middle Ground Emerging
• Virtualizing Layer 2 "Ethernet" networks as a
middle-ground
• L2 over L3
• OpenFlow
- Virtualizing layer-2 "Ethernet" networks as a middle ground
-- Currently, layer-2 networks on multi-tenant clouds are generally latent, implemented by
routing through a dedicated customer VM, which leads to bottlenecks.
-- More sophisticated networking tools are coming soon as cloud providers ramp up their
game with new implementations (OpenFlow, L2 over L3).
-- Notably:
--- L2 over L3: layer-2 packets are encapsulated with routable IP. Implemented on a ToR
switch (optimally) or a host, to allow arbitrary network topologies.
--- OpenFlow: think 'dynamic Ethernet switches' that are managed by a controller to actively
manage forwarding paths
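As a toy illustration of the L2-over-L3 idea, the sketch below wraps a raw Ethernet frame in a minimal routable envelope, roughly what a ToR switch or host agent does. This is a simplification: real encapsulations such as GRE or VXLAN use standardized headers, checksums, and tunnel IDs.

```python
import struct

def encapsulate(outer_src: bytes, outer_dst: bytes, l2_frame: bytes) -> bytes:
    """Toy L2-over-L3: prefix a raw Ethernet frame with a minimal 'outer'
    header (src IP, dst IP, payload length). Real protocols (GRE, VXLAN)
    use standardized headers instead."""
    return struct.pack("!4s4sH", outer_src, outer_dst, len(l2_frame)) + l2_frame

def decapsulate(packet: bytes) -> bytes:
    """Strip the toy outer header, recovering the original L2 frame."""
    _, _, length = struct.unpack("!4s4sH", packet[:10])
    return packet[10:10 + length]

frame = b"\xde\xad\xbe\xef" * 16                  # stand-in Ethernet frame
packet = encapsulate(b"\x0a\x01\x00\x01",         # outer src 10.1.0.1
                     b"\x0a\x02\x00\x01", frame)  # outer dst 10.2.0.1
assert decapsulate(packet) == frame               # the tunnel is transparent to L2
```

The point is the round-trip: the inner frame comes back untouched, so the guests see an ordinary layer-2 network while the fabric only routes IP.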
28. Case Study: Korea Telecom Networking
• Provide customer VLANs
• Schedule customers to compute racks
• Router VM per customer environment
• 10 GbE throughout
- Case Study: Korea Telecom Networking
-- Provide customer VLANs
-- Schedule customers to compute racks
-- Router VM per customer environment
-- 10 GbE throughout
- Summary:
-- Hard for existing applications to map directly
-- Dramatic improvements around the corner to overcome limits of L2/L3 networking
29. OpenStack Compute Networking
Flat Network
- Two options
-- "flat" L3
--- w/ DHCP (w/ physical VLAN limitations)
- Only configures VMs... not the underlying networking infrastructure
30. OpenStack Compute Networking
VLAN Network
- L2 w/ VLANs
-- The IP must be chosen before the VM is scheduled
-- Injects the network configuration into the guest
31. Cloud Storage
• VM-Backing Storage
• Block Storage
• Object Storage
- Image: Hitachi 2TB desktop drive we use in our Object Storage Cluster
32. Cloud Storage: VM Backing
• 'laying-down' OS Image
• A place for the running VM
• Local Disk vs SAN
• Ephemeral vs persistent
VM-Backing Storage
- The main features that are being provided here are:
-- Copying over a master/gold operating-system image and 'laying-down' (inserting
credentials, licenses, formatting partition, etc.) the image onto disk readying it for the
hypervisor to boot
-- A place for the running VM
Local Disk vs. SAN
-- Local disk: pros - cheap, fast; cons - hard to manage
-- SAN: cons - expensive, network IO-bound; pros - improved manageability, VM migration
possible
Ephemeral vs persistent
-- Ephemeral: Simpler operational model. Tooling & configuration management helps.
Opinionated about re-deployable infrastructure
-- Persistent: More management. Appealing for traditional admins.
33. Block Storage
• 'Mountable', persistent volumes
• Can be provisioned via an API
• Features such as snapshotting
• Frequently used to run databases or other
• Implemented with a combination of SAN +
object storage system
34. Open Stack Object Storage
API
Data Storage
Just to baseline: Swift is the project name for OpenStack Object Storage.
It’s a storage service that is accessed via an API.
Via the API you can create containers and PUT objects (data) into them.
***That’s about it.
It’s not block storage.
It’s not a filesystem.
It needs an ecosystem.
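That request model can be sketched concretely. Swift's convention is HTTP verbs against paths of the form /v1/&lt;account&gt;/&lt;container&gt;/&lt;object&gt;; the endpoint, account name, and token below are placeholders, not a real cluster:

```python
# The whole Swift data-plane convention in one helper: HTTP verbs on
# /v1/<account>/<container>/<object>. All names here are placeholders.
def swift_object_url(endpoint, account, container, obj):
    return "%s/v1/%s/%s/%s" % (endpoint, account, container, obj)

url = swift_object_url("https://storage.example.com", "AUTH_demo",
                       "backups", "db.tar.gz")
headers = {"X-Auth-Token": "<token from the auth service>"}

print(url)  # https://storage.example.com/v1/AUTH_demo/backups/db.tar.gz
# An actual upload is then just an HTTP PUT of the object body to that
# URL with the auth-token header, and a download is a GET.
```

A real client would first authenticate to obtain the token and the account URL; after that, every operation is plain HTTP.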
35. Cloud Storage History
s3
’06 ’07 ’08 ’09 ’10
This whole thing got started in 2006 when Amazon launched S3, Simple Storage Service.
And if everyone can rewind in their heads back to 2006 when S3 came out:
it was a strange animal. It made sense, but it was kind of a novelty.
- No SLA
- Paranoid about “outsourcing data”
But we got used to it. It started with backup tools.
When new applications were developed, application developers became really keen on using
S3
- Didn’t need to go out and buy a storage array
-- And no upfront cost
-- They didn’t need to guess at how much they were going to use
For these reasons, S3 became more and more baked into the tools that developers were
using.
- Ruby on Rails (paperclip)
- Zend PHP
- Hadoop (map-reduce)
/* At the Ruby on Rails deployment company, Engine Yard (which is where I was before
Cloudscaling).
- In 2008, we developed an AWS-based deployment platform.
- The old deployment system was on in-house hardware.
- One of the features was a clustered, posix-compliant filesystem with GFS. You could have
many virtual machines all connecting to the same volume in a relatively-sane way.
- In the move to AWS, we couldn’t provide the same type of storage system.
- But because S3 had permeated into the tools developers were using, it wasn’t an issue.
*/
36. Cloud Storage History
s3 Cloud Files
’06 ’07 ’08 ’09 ’10
In 2008, Mosso, a subsidiary of Rackspace, launched its own Object Storage system called
CloudFS, now called Rackspace Cloud Files.
37. Cloud Storage History
s3 Cloud Files Object Storage
’06 ’07 ’08 ’09 ’10
And, of course, over this summer the project was open-sourced as part of the OpenStack
project.
And that brings us to present day.
***So here we are:
- Object Storage is a big market. With two big players in the space.
- Rackspace has open-sourced their implementation
- This sets the stage for more deployments going forward
38. OpenSource Projects
CyberDuck Ruby Multi-Cloud Library
Filesystem Rackspace Library
- Cyberduck: Mac-based GUI. Needed patching. Author has pulled in changes in latest build.
- Fog: Ruby, multi-cloud library. Needed patching. Wesley Beary has pulled in our changes.
(But it still references Rackspace in its interface.)
- Cloudfuse: Implements a FUSE-based filesystem. Needed patching. Still working on getting
changes merged.
- Rackspace’s libraries: Again, some needed patching to support a Swift cluster. Very quick
response to folding in patches.
So if you’re thinking of deploying OpenStack Object Storage, there is a reasonable body of
open-source tools and client libraries.
So that brings us to getting this up and running with service providers.
/*
What’s missing from this list are cloud management services.
I would love to talk with those who are providing OpenStack support.
I know what it’s going to take: the real motivation is a large potential customer
base, and that’s going to show up when there are running, public implementations of OpenStack.
*/
43. Object Storage
We’ve been working with OpenStack Object Storage since it came out in late July,
because Object Storage deployments make a lot of sense for our clients.
Having a way to conveniently host data is a very good thing for a service provider.
1) It’s good to have data close to the resources they're using (with compute, or content
distribution, for example)
2) It’s awfully convenient to have a place for data
- whether that’s for backups
- media assets.
3) Provide in-house tools to build an application around
- anywhere you are using S3 / CloudFiles during application development
Now, if you want, you can build and run your own cluster with the
- performance characteristics that you choose
- where you choose,
- with the security model you want
44. Object Storage
100 TB Usable
$90,000
5 Object Stores
4 Proxy Servers
Here is what it looked like.
46. Zones
1 2 3 4 5
Early on in the standup process:
Swift has this concept of zones. Zones are designed to be physical segmentations of data:
- racks, power sources, fire sprinkler heads, physical buildings
- boundaries that can isolate physical failures
The standard configuration is to have data replicated across three zones.
So initially we were thinking, well, let’s just create three zones.
But the problem is, if one of the zones catastrophically fails, there isn’t another zone for the
data to move into.
We wanted to be able to tolerate a whole rack failure and still have a place for data to
land.
We chose to go with 5 zones.
If we had two total zone failures, and there was enough capacity remaining in the system, we
could keep running.
In our small deployment, one machine == one zone.
Our larger deployment will have one rack == one zone.
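The "why five zones, not three" reasoning can be sketched with a toy placement function. This is not Swift's actual ring algorithm (which uses partitions and device weights); it only illustrates the invariant that the three replicas land in distinct zones:

```python
import hashlib

def place_replicas(name, zones, replicas=3):
    """Toy placement: deterministically pick `replicas` distinct zones
    for an object. Swift's real ring is more sophisticated, but the
    invariant is the same: each replica lands in a different zone."""
    start = int(hashlib.md5(name.encode()).hexdigest(), 16) % len(zones)
    return [zones[(start + i) % len(zones)] for i in range(replicas)]

zones = [1, 2, 3, 4, 5]
chosen = place_replicas("photos/cat.jpg", zones)
assert len(set(chosen)) == 3   # three distinct zones hold this object
# With five zones and three replicas, two whole zones can fail and at
# least one replica survives -- and spare zones remain as targets for
# re-replication.
```

With only three zones, a zone loss leaves nowhere for the third replica to be rebuilt; the two spare zones are the safety margin.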
47. 7kW
The Data Center
One of the first things to note is the power density -or- space requirements of the system:
- Mechanical things tend to draw a lot of power.
- In our configuration, to utilize a full rack in a data center, we had to live in a “high-density”
neighborhood of the data center.
- Our configuration, with 10 4U object stores, ran 7 kW a rack.
- That’s 370 drives per rack.
- Be careful when powering up whole racks
- Plan accordingly
- The other option for us was to “go wide” and run 1/2 racks, where we would use more
space.
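Using the slide's numbers, the per-drive power works out roughly as follows (a sketch; actual draw varies with load and spin-up):

```python
# Sanity check on the density figures from the slide.
rack_watts = 7000
drives_per_rack = 370

watts_per_drive_slot = rack_watts / drives_per_rack
print(round(watts_per_drive_slot, 1))  # 18.9 W per drive slot, host overhead included
```

That figure includes CPUs, RAM, and fans amortized over the drives, which is why it is well above a bare drive's idle draw.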
48. Networking
Aggregation Aggregation
Access Access
10GbE 10GbE
Access Access
1GbE Switch Switch 1GbE
The Networking
We took a 2-tier approach.
It starts out with a pair of redundant aggregation switches.
A single switch would be a single point of failure.
All requests go through the “Access Layer” that connect directly to the aggregation switches
at 10GbE.
- The access layer contains proxy servers, authentication, load balancers, etc.
Each rack has a single switch that is connected via 10GbE to the aggregation layer.
- We went with a single switch since we already plan to be able to handle single-rack failures.
And it tapers down to 1GbE to an individual object store from the top-of-rack switches.
49. Object Stores
JBOD
48 GB RAM
36, 2TB Drives
No RAID
Newish Xeon
The object stores:
Beefy!
- 48 GB RAM
- 36 × 2 TB drives
- Newish Xeon
These are not just a bunch of disks!
The system has a lot of work to do. Lots of metadata to keep in memory.
***Lots of processes need to run to handle the parallelism.
Commodity, but enterprise-quality gear. Enterprise drives.
/*
SATA not SAS
*/
50. Access Layer Servers
AKA Proxy Servers
24 GB RAM
10 GbE
Newish Xeon
Access Layer
AKA “Proxy Servers”
- Xeon w/ 24 GB RAM
- 10 GbE
Our original deployment bottlenecked here.
Huge caveat here:
Different usage patterns will dramatically vary the architecture, hardware, and networking mix.
- Archive
-- Occasionally tar up a wad of data and park it there.
-- Much lower burden on the entire system.
- Trusted networks
-- Don’t need SSL, but want lots of bandwidth
- Lots of little PUTs and GETs
-- Proxy servers will need to handle the SSL load of many requests
Although I just outlined some hardware here, it’s not an exact recipe to follow.
51. Access Layer
Proxy Process
SSL
Load Balancing
Client
Access Layer
How the system is used has a huge impact on what to build.
The next uncharted territory we had to figure out was what we’re calling the ‘Access Layer’
You need to run the swift proxy-server process, which routes requests to the correct location
in the object stores (using the ring)
In addition:
- But you’re also likely to want to handle SSL termination
- And to load balance across the servers running the proxy processes
- We’ve also heard about using a commercial load balancer here as well
- HA Solution
Many options here:
- What we’re running is round-robin DNS with Nginx terminating SSL directly in front of the
proxy.
- What we’re likely to end up with is an HAProxy configuration sharing an IP using
VRRP for HA, dumping straight into the proxy processes
Being pragmatic, there are other services that need a home as well
- Authentication
- Portal
Sticking them in the access layer can make a whole lot of sense.
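The round-robin-DNS-plus-Nginx option described above might look roughly like this (a sketch; the hostname, certificate paths, and the assumption that the swift proxy listens on its conventional port 8080 are all placeholders):

```nginx
server {
    listen 443 ssl;
    server_name storage.example.com;        # each node in the DNS round-robin

    ssl_certificate     /etc/ssl/storage.example.com.crt;
    ssl_certificate_key /etc/ssl/storage.example.com.key;

    location / {
        # Hand decrypted requests straight to the local swift proxy process
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

One such block per access-layer node keeps SSL termination next to the proxy process; the HAProxy/VRRP alternative moves the shared IP and balancing into a dedicated pair instead.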
52. Lessons Learned
• Lots of RAM
• Think parallelism
• Extremely durable
So what did we learn about this configuration?
- High data rates typically require a lot of parallel accesses; there is often significant per-access
latency (tens of ms, 1000x what a SAN or local storage device might show)
- But the tradeoff is that a ton of objects can be retrieved and written at the same time. Having a
lot of RAM in aggregate helps: it makes the metadata costs for accessing lots of objects
manageable.
- We grossly undersized the proxy servers.
We did a ton of testing:
- Bitrot, bad disk, (happened to have) bad DIMM, failed node, failed zone, failed switch
This is an embarrassing story, but it’s worth telling. We even had one of the zones down for
two days early on. Nobody noticed. We noticed when we re-ran some benchmarks and
some of the peak performance numbers weren’t what they were on a previous run.
53. Running & Operating Infrastructure Clouds
When you buy a new car today, you open up the hood and there is a black plastic cover over
the engine.
To diagnose, you plug the car into a computer and there are sensors all over the place to tell
you what is wrong and how to fix it.
We need to be mindful of the fact that this is just one of many systems that they operate.
As much as possible, we’d like to hand them a whole working product that tells them how to
operate and maintain it.
For there to be a large number of swift deployments, the system needs to have handles so
that operators can maintain their system.
Here is what we’ve assembled so far.
54. Installation
Installer: Chef Server, PXE, REST API
Supporting Services: TFTP, DHCP, DNS
$ cli tool
We built an installer.
Installing:
- To operate at any reasonable scale, consistency in configuration is hugely important.
- So we automate the crap out of everything.
- This installer (which runs as a virtual appliance) takes a machine from bare metal to a fully-installed node
It is running a PXE server, so machines can ask for a new OS
Our configuration management tool of choice is Chef, so the installer is running chef-server
The net effect is that an operator can use a cli tool and punch in the MAC address & what role
the machine should be.
And this system will take it from factory fresh, to a fully-installed swift node.
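As a sketch of what that cli tool might hand to the installer: the payload schema below is an assumption for illustration, not the actual product API.

```python
import json
import re

def register_node(mac, role):
    """Validate a MAC address and build the registration the cli tool
    would send to the installer's REST API. When the machine PXE-boots,
    the installer matches its MAC and applies the Chef role.
    The schema here is an assumption, not the actual API."""
    if not re.fullmatch(r"([0-9a-f]{2}:){5}[0-9a-f]{2}", mac.lower()):
        raise ValueError(f"not a MAC address: {mac}")
    return json.dumps({"mac": mac.lower(), "role": role})

payload = register_node("00:25:90:AB:CD:EF", "swift-object")
print(payload)
```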
55. Infrastructure Services are Software Systems
• A big web application
• A software system
Automating the install is not enough.
A couple of years ago, I was going through the process of choosing a storage vendor.
- Each sales team that I met with said the same thing: “We’re a software company”
- Now, I think they said that because there is more margin in being a software company.
- But in truth, they’re right. There is a heck of a lot in the software that drives these systems.
We treat the entire system as if it were a big web application that is undergoing active
development.
Change happens. We are continuously adding features or needing to respond to operational
issues
We pull from our dev ops roots to build tooling that is capable of driving frequent change
into the system.
We must be able to perform upgrades in the field with confidence.
56. One-Button Install
Development (Laptop) → Lab (Virtual Machines) → Pre-Production (Small Environment) →
Production
The first thing we put together was to automate the install process so that we could have a
one button install of a local development environment.
- This brought the system up from a fresh Ubuntu install on a single VM somewhere.
- Perfect to do local development on.
The next step was to model-out a simulated deployment of how we were to configure a
production environment.
- In a lab environment, we recreate the production environment with virtual machines.
- And remember that ‘virtual appliance’ installer that does the provisioning?
- We use that same tool so that we can have a self-contained, simulated build of the entire
system.
Next, we have our small-ish environment where we have one physical machine per zone.
- We can use that environment as a pre-production for changes.
57. Continuous Integration
Development (Laptop) → Lab (Virtual Machines)
Finally, to vet the software changes as they are being made, we have a continuous integration
setup with Hudson.
When code is checked-in, we can rebuild in the lab. This is the system with a bunch of VMs.
Tests can be run against that system and report any errors that crop up.
/* A full-rebuild takes us about 45 minutes. */
All these systems put together give us the confidence that the system we’ve put
together will run well and upgrade smoothly.
/*
Time-to-live is a big deal for us.
We aggressively pursue automation because it enables a repeatable deployment process at
scale.
Yes, a lot of time was spent automating installs. But we built these systems because we
ended up doing a lot of software development to integrate everything and make it run ‘as a service’.
*/
58. Operations
• Automate agent install
• NOC
• Replacement rather than repair
Finally, operators need to know when the system needs work.
We bake in monitoring agents as part of the system install.
Because we have low-cost install tools, we are favoring replacement over repair.
- For example if a boot volume goes bad, or bad DIMMs.
- Basically anything that doesn’t involve swapping out a drive.
- It’s easier to replace the component, have the system rebalance itself, and add new
equipment.
- Integrate w/ existing NOC systems
-- nobody will act upon an alert that isn’t seen
59. Billing
Utilization
- RAM Hours
- Storage Usage
Requests
Billing
This is a huge topic! We could have an entire talk on this topic alone.
Collect and store those metrics.
Utilization
- You must compute storage ‘averages’, measuring on some interval you feel comfortable with
- Decide if you are going to keep track of small objects / number of requests
Compute is usually metered in RAM-hours.
Do you bill for requests? Or penalize small files? Charge for internal access?
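A minimal sketch of that storage-average computation, assuming you sample usage once per interval:

```python
def gb_hours(samples, interval_hours=1.0):
    """Turn periodic usage samples (in GB) into billable GB-hours.

    Each sample is assumed to stand for usage over one interval;
    finer sampling intervals give a fairer average.
    """
    return sum(samples) * interval_hours

# A day of hourly samples: 100 GB for 12 hours, then 200 GB for 12 hours.
samples = [100] * 12 + [200] * 12
total = gb_hours(samples)
average_gb = total / len(samples)
print(total, average_gb)  # 3600.0 GB-hours, 150.0 GB average
```

The same shape works for RAM-hours on the compute side; only the unit changes.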
At one company I was a heavy user of Amazon S3 -- they called me up out of the blue and
told me to re-architect our application because we were creating too many ‘Buckets’ (Folders).
Apparently we were causing excessive load. He thought he was being helpful saying “you will
save $700 a month!” I appreciated the attempt to save me money, but the cost of re-
architecting the app would have dwarfed the savings over any reasonable payback period. The
excessive load was their problem. I was okay paying an extra $700 a month.
The moral of the story is to be okay with the usage you price for.
60. Pricing
Consumption Pricing Capacity Pricing
vs
Welcome to the world of consumption-pricing vs. capacity pricing.
Making a dinner reservation at the hippest restaurant in town.
They ask for your credit-card. Why? Because if you don’t show up, they still need to charge
you for the meal you didn’t eat.
They’re practicing capacity pricing. They have a fixed number of seats that they need to fill.
This same practice is the standard in our industry. When you go buy a bit of storage
equipment, you buy based on its capacity. You can’t say to a vendor... would you mind
installing 100 TB of storage, but I’ll only pay for 50... because I only plan on using 50TB on
_average_. They would laugh you out the door!
When you go to buy bandwidth, you are charged at the 95th percentile. You pay for bandwidth that
goes unused because you’re paying for the capacity to be available with some wiggle-room
for extraordinary bursts.
So service providers are having to figure out how to deal with this.
It’s a bigger deal at a smaller scale. A single customer could butt in and consume a large
fraction of a cluster.
The averages even out at a larger scale.
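The 95th-percentile computation itself is simple: sort a month of samples, discard the top 5%, and bill at the highest rate that remains. A sketch:

```python
def percentile_95(samples_mbps):
    """Bill at the rate that 95% of samples fall at or below:
    sort the samples, throw away the top 5%, and charge for the
    highest sample that remains."""
    ordered = sorted(samples_mbps)
    index = int(len(ordered) * 0.95) - 1
    return ordered[index]

# 100 five-minute samples: steady 10 Mbps with a handful of bursts.
samples = [10] * 95 + [100, 200, 300, 400, 500]
print(percentile_95(samples))  # 10 -- the bursts fall in the discarded 5%
```

This is why bursty usage can be nearly free under 95th-percentile billing, while sustained usage sets the bill.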
61. Authentication
[Slide diagram: Client ↔ Authentication Service, and Internal Authentication inside the
Infrastructure Cloud, with numbered steps 1–5 tracing the request flow; the “build this”
label marks the glue you must provide.]
- This is a multi-tenant system
- The authentication server that ships with swift is not intended to be used by a service provider.
- However, you will need to setup a few things in between the core storage cluster and your
authentication system.
- Clients (as in libraries/tools/services) expect to hit an auth server so that they can get 1) a
token that can be used with subsequent requests and 2) a URL of the storage system they
should hit.
- This “Authentication Middleware” must also be capable of taking a token presented to the storage
cluster and verifying that it’s still good.
- So here is how the process works: Client makes request to Auth server and gets a token +
URL, Client makes API calls to the storage cluster with the Token, The storage cluster asks the
Auth Server if the token is good.
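The client’s side of that flow can be sketched with plain header dictionaries. The header names follow swift’s v1 auth convention; the URL and token values here are made up:

```python
AUTH_URL = "https://auth.example.com/v1.0"  # hypothetical endpoint

def auth_headers(user, key):
    """Step 1: the client presents credentials to the auth service,
    which responds with X-Auth-Token and X-Storage-Url headers."""
    return {"X-Auth-User": user, "X-Auth-Key": key}

def storage_headers(token):
    """Later steps: every storage API call carries the token; the
    cluster then asks the auth service whether it is still good."""
    return {"X-Auth-Token": token}

print(auth_headers("account:user", "secretkey"))
print(storage_headers("AUTH_tk0123456789abcdef"))
```

The middleware you build sits behind this: it maps your provider’s identity system onto exactly these two exchanges.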
62. THANK YOU!
Joe Arnold
joe@cloudscaling.com
@joearnold