This crash course is designed to give an overview of cloud computing architecture and the open source software that can be used to deploy and manage a cloud computing environment.
Topics to be discussed in this session will include virtualization (KVM, LXC, and Xen Project), orchestration (Apache CloudStack, Eucalyptus, OpenNebula, and OpenStack), and storage (GlusterFS, Ceph, and others). The talk will also provide insight into how to deliver Platform-as-a-Service (PaaS) and what technologies can be used to complement this evolving cloud computing paradigm.
Systems administrators and IT generalists will leave the discussion with a general overview of the options at their disposal to effectively build and manage their own cloud computing environments using free and open source software and understand the capabilities and benefits of a host of technologies.
RICON 2014 - Build a Cloud Day - Crash Course Open Source Cloud Computing
1. RICON 2014: Build a Cloud Day
Crash Course in
Open Source Cloud Computing
Mark Hinkle
Senior Director, Open Source Solutions
Citrix Inc.
mark.hinkle@citrix.com
mrhinkle@gmail.com
@mrhinkle
2. By Mark R. Hinkle
@mrhinkle
mrhinkle@gmail.com
ABOUT ME
I Help Build Open Source Ecosystems
Open Source Experience
• Manage Citrix Open Source Business Office
• Apache CloudStack Committer and PMC Member
• Advisory boards Gluster and Xen Project
• Joined Citrix via Cloud.com acquisition July 2011
• Zenoss Core open source project to 100,000 users,
1.5 million downloads
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium organizer
• Author - “Windows to Linux Business Desktop
Migration” – Thomson
• NetDirector Project - Open Source Configuration
Management
RICON 2014 - Build A Cloud Day - Open Source Cloud Computing
3. Slides Available on Slideshare: http://www.slideshare.net/socializedsoftware
Creative Commons Attribution-ShareAlike 4.0 International
Share: copy and redistribute the material in any medium or format
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
4. Agenda
• The State of Cloud Computing
• Vetting Open Source Cloud Projects
• Virtualization
• Infrastructure-as-a-Service
• Platform-as-a-Service
• SDN
• Open Source for Amazon Web Services
5. The State of Cloud Computing
6. THE PUBLIC CLOUD PLAYERS
$356 Billion Market Cap
$141 Billion Market Cap
$363 Billion Market Cap
7. Public Cloud Revenue
[Bar chart: revenue (in billions) for Rackspace, Google, Azure, and Amazon; axis from 0 to 4]
Source: Company data, Evercore Group LLC, Research. Azure based on MSFT comments about a $1 billion revenue run rate in May 2013. Google based on an estimate by TBR (Technology Business Research).
8. IS IT PRACTICAL TO TRY TO DUPLICATE AMAZON, GOOGLE AND MICROSOFT?
9. Cloud Industry Shake Out
Public Cloud, Managed Cloud, SP/SI Cloud
Vendors
Advantages
Challenges
• Global Footprint
• Massive Scale
• Extreme Velocity
• Stability
• Security
• Privacy
• End to End Network
• Security & SLA
• App QoS
• SI Capabilities
• Enterprise Trust
• SMB Channel
• Higher price than Public Cloud
• Limited services capabilities
• Agility
• Stack lock-in, not best of breed
10. Enterprise Solution Providers Move to Converged Infrastructure
IDC estimates that total worldwide spending on converged infrastructure will hit $17.8 billion in 2016. Will open source hardware make those dollars go farther?
11. The Death of Net Neutrality
• Pay for play over the internet has started (e.g. Netflix peering agreement)
• What if you had to pay to gain access to your cloud provider?
• Would this change the dynamics of public cloud computing?
12. Open Source Hardware the Next Big Thing
• Frank Frankovsky, Chairman and President, Open Compute Project Foundation
• Jason Taylor, Director of Infrastructure, Facebook
• Bill Laing, Corporate Vice President of Cloud and Enterprise, Microsoft
• Jason Waxman, General Manager, High Density Computing, Data Center Group, Intel
• Mark Roenigk, Chief Operating Officer, Rackspace Hosting
• Andy Bechtolsheim, Arista Networks
• Don Duet, Managing Director, Goldman Sachs
13. Vetting Open Source Projects
14. Indicators of Open Source Health
How can you tell if a Project is Legit?
• Code Velocity
• Committers
• Committer Reputation
• User-driven or Vendor-Driven Innovation
• User Activity
• Corporate Support*
• Reputation of Foundation*
15. Open Source isn’t a Zero-Sum Game
“…the future of technological innovation is not stealing limited resources away from one another, but creating new resources, and new opportunities to create new resources, together in a rich ecosystem.”
Allison Randal
Open Source Hacker
Former OSCON Program Chair
@allisonrandal
17. Open Source Cloud Stack
[Diagram: layered cloud stack with the tool choices left as open questions]
DevOps Toolchain: Orchestration, Configuration Management, Monitoring
Platform-as-a-Service (PaaS)
Infrastructure-as-a-Service (IaaS) Orchestration
Compute, Storage, Networking (Networking-as-a-Service)
18. Virtualization
Carving up compute resources
OPEN SOURCE
• Xen Project
• Citrix XenServer
• KVM
• VirtualBox
• OpenVZ
• LXC
• libcontainer
PROPRIETARY
• VMware
• Microsoft Hyper-V
• OracleVM (Based on Xen Project)
19. Hypervisors and Containers
Differences in virtualization
Type 1 Hypervisors: VMware, Xen Project, Hyper-V
Type 2 Hypervisors: KVM, VirtualBox
Containers: LXC, libcontainer
20. The Portability Problem
Containers compared to Hardware Virtualization
• Different file formats for virtual machines (VMware uses the VMDK file format, Xen and Hyper-V use VHD, KVM uses raw or QCOW2)
• Guest images may be “processor architecture” bound
• VMware and Xen can manage SCSI devices, but KVM cannot
• KVM and Xen can use virtio drivers but not VMware
• VMware uses a proprietary agent inside the guest OS (VMware Tools) which does not work with Xen or KVM
• Yada, Yada, Yada
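The file-format point on the portability slide can be made concrete by identifying an image from its leading or trailing bytes. The following is a minimal sketch, not a replacement for a real tool: the function names are hypothetical, and it only recognizes the QCOW magic (`QFI\xfb`), the VMDK sparse-extent magic (`KDMV`), and the VHD `conectix` cookie. Raw images carry no signature, so they fall into the catch-all case.

```python
QCOW_MAGIC = b"QFI\xfb"        # QCOW/QCOW2 header magic
VMDK_SPARSE_MAGIC = b"KDMV"    # VMDK hosted sparse extent magic
VHD_COOKIE = b"conectix"       # VHD footer cookie (also copied to the start of dynamic disks)


def sniff_image_format(head: bytes, tail: bytes = b"") -> str:
    """Guess a disk image format from its first bytes and its last 512 bytes."""
    if head[:4] == QCOW_MAGIC:
        # The format version is stored big-endian right after the magic.
        version = int.from_bytes(head[4:8], "big")
        return "qcow2" if version >= 2 else "qcow"
    if head[:4] == VMDK_SPARSE_MAGIC:
        return "vmdk"
    if head[:8] == VHD_COOKIE or tail[:8] == VHD_COOKIE:
        return "vhd"
    return "raw-or-unknown"


def sniff_file(path: str) -> str:
    """Read the head and the final 512 bytes of a file and classify it."""
    with open(path, "rb") as f:
        head = f.read(512)
        f.seek(0, 2)                    # jump to end of file to learn its size
        size = f.tell()
        f.seek(max(size - 512, 0))      # a VHD footer occupies the last 512 bytes
        tail = f.read(512)
    return sniff_image_format(head, tail)
```

Moving an image between hypervisors still requires a conversion step; detection only tells you which conversion is needed.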
21. Containers
“Lightweight” Linux Virtualization
• Lets you run a Linux system within another Linux system
• A container is a group of processes on a Linux box, put together to provide an isolated environment
• From the inside, it looks like a VM
• Externally it looks like normal processes
• “chroot on steroids”
22. Docker Container Packaging
Open source Container Packaging Engine
Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, public clouds and more.
To learn more please visit: www.docker.io
23. What is Docker
System for Managing and Deploying Linux Containers
• Managed daemonized processes on Linux using libcontainer
• Create ability to re-use and manage similar applications
• Content agnostic
• Hardware agnostic
• Easy to automate
• Integrated with other tools: Chef, OpenShift, Puppet, VMware, etc.
24. Docker’s Growing Ecosystem
25. Kubernetes
Container Cluster Management – Scheduler
Greek for Shipmaster
Kubernetes builds on top of Docker to construct a clustered container scheduling service. Kubernetes enables users to ask a cluster to run a set of containers. The system will automatically pick worker nodes to run those containers on, which we think of more as "scheduling" than "orchestration".
To learn more please visit: https://github.com/GoogleCloudPlatform/kubernetes
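To illustrate what "automatically pick worker nodes" means, here is a toy scheduler sketch. It is not Kubernetes code and does not use the Kubernetes API; the `Node`, `Container`, and `schedule` names are hypothetical, and the placement rule (choose the feasible node with the most free CPU) is an assumption picked for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float
    mem_free_mb: int
    containers: List[str] = field(default_factory=list)


@dataclass
class Container:
    name: str
    cpu: float
    mem_mb: int


def schedule(containers: List[Container], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Place each container on a node with enough free CPU and memory."""
    placement: Dict[str, Optional[str]] = {}
    for c in containers:
        # Filter: only nodes that can still fit this container.
        candidates = [n for n in nodes
                      if n.cpu_free >= c.cpu and n.mem_free_mb >= c.mem_mb]
        if not candidates:
            placement[c.name] = None      # unschedulable with current capacity
            continue
        # Rank: prefer the node with the most headroom.
        best = max(candidates, key=lambda n: (n.cpu_free, n.mem_free_mb))
        best.cpu_free -= c.cpu
        best.mem_free_mb -= c.mem_mb
        best.containers.append(c.name)
        placement[c.name] = best.name
    return placement
```

The user states what to run; the filter-then-rank loop decides where. That split is the "scheduling" the slide refers to.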
26. Docker Related Projects
• Fig – Fast, isolated development environments
• Flynn – Next-generation application platform
• Panamax – Drag-and-Drop Docker Containerization
• Project Atomic – JEOS designed to run Docker containers
• SocketPlane – Docker Networking (coming soon)
• Weave – Docker Networking
• 13,000+ Docker-related repos on GitHub
27. Continuous Integration
Rebuild Applications on any Cloud and/or Virtualized Infrastructure
• Code – Application is stored in a repository (Subversion, Git)
• Build – Code is built (Jenkins)
• Test – Unit tests are automated (Jenkins)
• Deploy – Deploy code to server various ways
[Diagram: Code, Build, Test, Deploy cycle]
Thoughtworks Go – Open Source Continuous Delivery System
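The Code, Build, Test, Deploy cycle can be sketched as an ordered list of stages that stops at the first failure. This is an illustrative toy, not how Jenkins or Go are configured; every function name here is hypothetical, and the "application" is a one-line Python function standing in for a real repository checkout.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[dict], bool]]


def checkout(ctx: dict) -> bool:
    # Stand-in for pulling source from Subversion or Git.
    ctx["source"] = "def add(a, b):\n    return a + b\n"
    return True


def build(ctx: dict) -> bool:
    # "Building" here just means compiling the source to a code object.
    try:
        ctx["artifact"] = compile(ctx["source"], "<app>", "exec")
    except SyntaxError:
        return False
    return True


def run_tests(ctx: dict) -> bool:
    namespace: dict = {}
    exec(ctx["artifact"], namespace)
    return namespace["add"](2, 3) == 5


def deploy(ctx: dict) -> bool:
    ctx["deployed"] = True
    return True


PIPELINE: List[Stage] = [("code", checkout), ("build", build),
                         ("test", run_tests), ("deploy", deploy)]


def run_pipeline(stages: List[Stage], ctx: dict) -> Tuple[List[str], Optional[str]]:
    """Run stages in order; return (completed stage names, failed stage or None)."""
    completed: List[str] = []
    for name, step in stages:
        if not step(ctx):
            return completed, name      # stop: later stages never run
        completed.append(name)
    return completed, None
```

The key property is the gate: a failed test stage means the deploy stage is never reached.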
28. Cloud APIs
Everything (should) have an API in the Cloud
• Deltacloud (Ruby)
• Dasein (Java)
• jclouds (Java)
• Libcloud (Python)
• Fog (Ruby)
29. Cloud Automation Tools
One to many tools for managing large numbers of devices
Project | Description
Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. (Originally authored Func)
Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
RunDeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
Func | Func provides a two-way authenticated system for generically executing tasks, integrations with puppet and cobbler.
MCollective | The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
Scalr | Provide scaling across multiple cloud computing platforms, integrates with Chef.
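The "one to many" idea shared by these tools is fanning a single task out to many machines in parallel and collecting the results per host. A minimal sketch using only the Python standard library follows; `run_on_all` and `fake_uptime` are hypothetical names, and the task is simulated locally rather than sent over SSH or a message bus as the real tools do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_on_all(hosts: List[str], task: Callable[[str], str],
               max_workers: int = 8) -> Dict[str, str]:
    """Run the same task against every host concurrently; map host -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with hosts.
        results = pool.map(task, hosts)
        return dict(zip(hosts, results))


def fake_uptime(host: str) -> str:
    # Stand-in for a remote command; a real tool would execute this on the host.
    return f"{host}: up"
```

For example, `run_on_all(["web1", "web2", "db1"], fake_uptime)` returns one entry per host.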
30. Minimum Viable Cloud
Infrastructure-as-a-Service | IaaS | Compute Orchestration
Project | Year Started | License | Virtualization Technologies
Apache CloudStack | 2008 | Apache | (Bare Metal), XenServer, KVM, LXC, VMware, Hyper-V
Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
OpenNebula | 2005 | Apache | Xen, KVM, VMware
OpenStack | 2010 (Developed by NASA by Anso Labs previously) | Apache | VMware ESX and ESXi, Xen, XenServer, KVM, LXC, QEMU and VirtualBox
31. OpenStack
The Boy Band of the Open Source Cloud
32. OpenStack Shared Services
Span Compute, Storage and Networking
• IDENTITY SERVICE
• IMAGE SERVICE
• TELEMETRY SERVICE
• ORCHESTRATION SERVICE
33. Even More OpenStack Projects
Span Compute, Storage and Networking
• Trove: Database Service
• Ironic: Bare Metal
• Marconi: Queue Service
• Cinder: Block Storage Service
• Ceilometer: Metering/Monitoring
• Heat: Orchestration
34. Open Source Solution Providers
If you can’t do it yourself
“OpenStack is not a product. If you are building a large infrastructure, it’s more like a tool kit. It gives you a lot of technologies that do take a lot of effort to integrate.”
Chris Kemp, OpenStack Board Member and Co-Founder
CEO of Piston Computing
35. Cloud Storage
Virtualized, Distributed usually on Commodity Hardware
Project | Description
Ceph | Distributed file storage system developed by DreamHost -> InkTank -> Red Hat (block, object, file)
GlusterFS | Scale Out NAS system aggregating storage over Ethernet or Infiniband (file)
Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases. (object)
Sheepdog | Distributed storage for KVM hypervisors, distributed iSCSI
OpenStack Storage | Long-term object storage system (object)
36. What about Open Source PaaS?
37. Platform-as-a-Service (PaaS)
Abstracted Cloud-Scale Run-Time Environments
Project | Sponsors | Languages/Frameworks
CloudFoundry | VMware -> Pivotal -> CloudFoundry Foundation | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
Cloudify | Gigaspaces | [Groovy for deployment recipes]
OpenShift Origin | Red Hat | Java, Ruby, PHP, Perl and Python
Apache Stratos | WSO2 -> Apache Stratos | PHP, Tomcat, MySQL “cartridges”
38. Apache Mesos
Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers. Largely supported by Twitter, used by LinkedIn, Airbnb too.
Features
• Fault-tolerant replicated master using ZooKeeper
• Scalability to 10,000s of nodes
• Isolation between tasks with Linux Containers
• Multi-resource scheduling (memory and CPU aware)
• Java, Python and C++ APIs for developing new parallel applications
• Web UI for viewing cluster state
To learn more please visit: http://mesos.apache.org/
39. SOFTWARE DEFINED NETWORKING
Virtualization meets the network
Decoupling of the control and data planes of the network to improve efficiency. Communication from an SDN controller via a protocol to network devices both physical and virtual.
• Automation: Abstractions allow for programmable networks.
• Dynamic Networks: Network can be changed quickly via a controller.
• Security: Network offerings can match virtualization offerings for finer grained security in a highly volatile compute landscape.
• Heterogeneous Management: Single control point for various devices.
40. SDN Overview
[Diagram: three-layer SDN architecture]
Application Layer: Business Applications
Control Layer: SDN Control Software, Network Services (reached from the application layer through APIs)
Infrastructure Layer: Network Devices (reached from the control layer through the Control Data Plane Interface, e.g. OpenFlow)
41. Benefits of SDN
Network Virtualization is the final frontier of Software Defined Datacenter
• Dynamically update networks
• Automate network functionality
• “Program” security into the network
• Centrally apply policies to network and services
• Optimize networks
42. OpenFlow
Virtualization meets the network
OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
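The split between a controller that decides and devices that forward can be shown with a toy model. This sketch does not implement the OpenFlow wire protocol or any real controller's API; `FlowRule`, `Switch`, and `Controller` are hypothetical classes, and the match fields and action strings are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowRule:
    match: Dict[str, object]   # header fields that must all equal the packet's
    action: str                # e.g. "output:2" or "drop"
    priority: int = 0


@dataclass
class Switch:
    """Data plane: forwards purely by looking up its flow table."""
    flow_table: List[FlowRule] = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        self.flow_table.append(rule)
        # Highest priority first, so the most specific policy wins.
        self.flow_table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet: Dict[str, object]) -> str:
        for rule in self.flow_table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"   # table miss: ask the control plane


class Controller:
    """Control plane: one place that programs every switch."""

    def __init__(self, switches: List[Switch]):
        self.switches = switches

    def push_policy(self, rule: FlowRule) -> None:
        for switch in self.switches:
            switch.install(rule)
```

Pushing a higher-priority "drop" rule from the controller changes behavior on every switch at once, which is the "program security into the network" benefit listed earlier.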
43. Open Source SDN
Software Defined Network Controllers and more
Project | Description
Floodlight | The Floodlight Open SDN Controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. See more at: http://www.projectfloodlight.org/floodlight/
Indigo | Indigo is an open source project aimed at enabling support for OpenFlow on physical and hypervisor switches. Big Switch has helped numerous companies OpenFlow enable their equipment, and we provide firmware for a number of popular switches. Indigo is the basis of Switch Light by Big Switch Networks. See more at: http://www.projectfloodlight.org/indigo/
Lincx | LINCX is a pure OpenFlow software switch written in Erlang. It runs within a separate domain under Xen hypervisor using LING (erlangonxen.org).
Nox | NOX is the original OpenFlow controller, and facilitates development of fast C++ controllers on Linux.
Open Daylight | Linux Foundation Collaborative Project based on Cisco One Controller and plugins from numerous vendors in development. E.g. IBM DOVE
Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
44. Open vSwitch
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
To learn more please visit our website: http://openvswitch.org/
45. Open Source Cloud Stack
[Diagram: layered cloud stack with example open source tools]
DevOps Toolchain: Orchestration (Ansible/SaltStack/Scalr*), Configuration Management (CFEngine/Chef/Puppet), Monitoring (logstash, graphite)
Platform-as-a-Service (CloudFoundry, OpenShift, Gigaspaces)
Docker, Mesos, Kubernetes
Infrastructure-as-a-Service | IaaS | Orchestration (OpenStack, Apache CloudStack, Eucalyptus)
Compute (Containers, KVM, Xen)
Storage (Ceph, Gluster)
Networking (OpenDaylight, Contrail)
46. NetFlix Open Source AWS Toolbag
Tools developed by a super Amazon Web Services Power User
Asgard, Astyanax, Edda, Eureka, Priam, Simian Army
http://netflix.github.com
47. Contact Me
Happy to Chat about Open Source, Cloud or Pittsburgh Sports
Professional: mark.hinkle@citrix.com
Personal: mrhinkle@gmail.com
Phone: 919.228.8049
Professional: http://open.citrix.com
Personal: http://www.socializedsoftware.com
Twitter: @mrhinkle
48. APPENDIX A
Additional Links to related stuff
49. ADDITIONAL LINKS
• Devops Toolchains Group
• Software Defined Networking: The New Norm for Networks (Whitepaper)
• DevOps Wikipedia Page
• NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe
• Open Cloud Initiative
• NIST Cloud Computing Platform
• Open Virtualization Format Specs
• Clouderati Twitter Account
• Planet DevOps
• Nicira Whitepaper – It’s Time to Virtualize the Network
• Why Open vSwitch FAQ
• Stanford Seminar - Software-Defined Networking at the Crossroads
50. ADDITIONAL LINKS (CONT’D)
• SDN, NFV, and open source: The Operator’s View
• Puppet Labs: Build a Toolbox for Continuous Delivery
51. APPENDIX B
Stuff I’d have liked to talk about but didn’t have time
52. 60 SECOND CLOUD DEFINITION
Just because Software Marketing Guys Think it’s the Internet
5 CHARACTERISTICS OF CLOUD
1. On-Demand Self-Service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
User Cloud a.k.a. SOFTWARE-AS-A-SERVICE
Developer Cloud a.k.a. PLATFORM-AS-A-SERVICE
Systems Cloud a.k.a. INFRASTRUCTURE-AS-A-SERVICE
53. SCALE-UP SCALE OUT
Elasticity and the cloud
Vertical Scaling (Scale-Up): Allocate additional resources to VMs, requires a reboot, no need for distributed app logic, single point of OS failure
Horizontal Scaling (Scale-Out): Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache Hadoop)
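The "logic to work in distributed fashion" that scale-out needs can be as small as a dispatcher that spreads requests over interchangeable backends. The sketch below is a hypothetical round-robin pool, loosely in the spirit of what a load balancer such as HA-Proxy does; it is not HA-Proxy configuration or code, and `RoundRobinPool` is a made-up name.

```python
from itertools import cycle
from typing import List


class RoundRobinPool:
    """Spread requests evenly across a growable set of backends."""

    def __init__(self, backends: List[str]):
        self.backends = list(backends)
        self._rotation = cycle(self.backends)

    def add_backend(self, backend: str) -> None:
        # Scaling out: add another node and restart the rotation to include it.
        self.backends.append(backend)
        self._rotation = cycle(self.backends)

    def route(self, request: object) -> str:
        # The request content is ignored; any backend can serve any request.
        return next(self._rotation)
```

Capacity grows by calling `add_backend` with no reboot of existing nodes, in contrast to resizing a single VM.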
54. SOURCING CLOUD APPLIANCES
Packaging Engines for VMs
Tool/Project | What you can do with them
Bitnami | BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
Boxgrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers
Oz | Command-line tool that has the ability to create images for common Linux distributions to run on KVM
SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
55. PACKER MULTIPLATFORM VM CREATION
Open source Automation for VMs
Packer is easy to use and automates the creation of any type of machine image. It embraces modern configuration management by encouraging you to use automated scripts to install and configure the software within your Packer-made images.
To learn more please visit: www.packer.io
56. CONFIGURATION MANAGEMENT TOOLS
Tools with features for configuring cloud infrastructure
Project | Year Started | Language | License | Client/Server
CFEngine | 1993 | C | Apache | Yes
Chef | 2009 | Ruby | Apache | Chef Solo: No; Chef Server: Yes
Puppet | 2004 | Ruby | GPL | Yes & standalone
Salt | 2011 | Python | Apache | Yes
Hitchhiker’s Guide to the Open Cloud by @mrhinkle
57. CLOUD MONITORING TOOLS
Tools with features for monitoring cloud infrastructure
Project | Type of Monitoring | Collection Methods
Cacti / RRDTool | Performance | SNMP, syslog
Graphite | Performance | Agent
Nagios | Availability | SNMP, TCP, ICMP, IPMI, syslog
Sensu | Availability | Agent
Zabbix | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI
58. CLOUD PROVISIONING TOOLS
Packaging Engines for VMs
Project | Installation Targets
Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
Cobbler | Distributed virtual infrastructure using koan (kickstart over a network to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
Crowbar | (Bare metal provisioning)
JuJu | Public Clouds - Amazon Web Services, HP Cloud, Private OpenStack clouds, Bare Metal via MAAS.
Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
59. BIG DATA
60. NOSQL DATABASES
Horizontally scalable unstructured data retrieval
Name | Type | Description
Apache Cassandra | Wide Column Store/Families | API: many, Query Method: MapReduce, Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
HBase | Wide Column Store/Families | API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable.
MongoDB | Document Store | API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++
Redis | Key Value / Tuple Store | API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronous disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.
Riak | Key Value / Tuple Store | API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters, Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
61. MAP REDUCE
Algorithm for Parallelized Data Set Processing
[Diagram: the Master Node splits the Problem Data across Worker Nodes 1, 2, and 3 (Map), then the workers' results are combined into the Solution Data (Reduce)]
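The Map and Reduce steps can be sketched with a word count, the customary first example. This is a single-process illustration using threads as stand-in "worker nodes", not Hadoop code; the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Worker: emit a (word, 1) pair for every word in its chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all emitted values by key so each key is reduced once."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the grouped values into one result per key."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(chunks: List[str], workers: int = 3) -> Dict[str, int]:
    # The "master" hands one chunk to each worker, then merges the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))
```

Because each chunk is mapped independently, adding workers (or machines, in a real cluster) raises throughput without changing the map or reduce functions.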
62. APACHE HADOOP
Apache Project for Parallelized Data Set Processing
Overview
• Handles large amounts of data
• Stores data in native format
• Delivers linear scalability at low cost
• Resilient in case of infrastructure failures
• Transparent application scalability
63. APACHE HADOOP ECOSYSTEM
Hadoop: Hadoop Common
• HDFS: Distributes & replicates data across machines
• MapReduce: Distributes & monitors tasks. Parallel programming; handles large data blocks.
Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured
Non-Relational DB
• HBase: Column-oriented schema-less distributed DB modeled after Google’s BigTable. Random real time read/write.
Scripting
• Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts.
Machine Learning
• Mahout: Machine learning libraries for recommendations, clustering, classifications and item sets.
Chukwa, ZooKeeper
Editor's Notes
Netflix is now paying Time Warner Cable for direct access and faster streams — Tech News and Analysis - https://gigaom.com/2014/08/19/netflix-is-now-paying-time-warner-cable-for-direct-access-and-faster-streams/
Dashboard of Performance
Openhub has a good graphical representation of code velocity and listing of developers – www.openhub.com
Bitergia
Bitergia does a number of dashboards.
Top choices for Cloud Computing are Xen and KVM.
OpenVZ, container virtualization for Linux, is an interesting option as it has a very minimal overhead to scale application space similar to containers like BSD Jails. Advantage is that memory allocation is soft and unutilized memory can be used by other applications.
Type 1 Hypervisor – VMware, Xen Project, XenServer, Hyper-V
Type 1 (or native, bare metal) hypervisors run directly on the host's hardware to control the hardware and to manage guest operating systems. A guest operating-system thus runs on another level above the hypervisor.
Type 2 Hypervisor – VM
Type 2 (or hosted) hypervisors run within a conventional operating-system environment. With the hypervisor layer as a distinct second software level, guest operating-systems run at the third level above the hardware. VMware Workstation and VirtualBox exemplify Type 2 hypervisors.
Image portability across hypervisors: https://www.ibm.com/developerworks/community/blogs/9e696bfa-94af-4f5a-ab50-c955cca76fd0/entry/image_portability_across_hypervisors1?lang=en
What is Docker: http://www.docker.com/whatisdocker/
Common use cases for Docker include:
Automating the packaging and deployment of applications
Creation of lightweight, private PAAS environments
Automated testing and continuous integration/deployment
Deploying and scaling web apps, databases and backend services
Martin Fowler - Continuous Integration
http://www.martinfowler.com/articles/continuousIntegration.html
Types of Tasks Accomplished by an API
Provisioning (creating, re-creating, moving, or deleting components e.g. virtual machines, vlans)
Configuration (assigning or changing attributes of the architecture such as security and network settings)
Cloud Providers
Dasein -
jclouds – Java API abstraction
Libcloud – started by CloudKick (now Rackspace) to abstract clouds, Apache incubator project
Deltacloud – started by Red Hat to abstract clouds, Apache incubator project
Fog - provider and abstraction level API across compute and storage, written in Ruby
Ansible
Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. Ansible is also used to roll out and manage clusters of machines and ISV software, such as Basho's flagship key-value store Riak.
Capistrano
Capistrano is a developer tool for deploying web applications. It is typically installed on a workstation and used to deploy code from your source code management (SCM) system to one or more servers.
Capistrano recently added classes capabilities that match cobbler.
RunDeck
RunDeck is cross-platform open source software that helps you automate ad-hoc and routine procedures in data center or cloud environments. RunDeck allows you to run tasks on any number of nodes from a web-based or command-line interface. RunDeck also includes other features that make it easy to scale up your scripting efforts including: access control, workflow building, scheduling, logging, and integration with external sources for node and option data.
Func
Func allows for running commands on remote systems in a secure way, like SSH, but offers several improvements.
Func allows you to manage an arbitrary group of machines all at once.
Func automatically distributes certificates to all "slave" machines. There's almost nothing to configure.
Func comes with a command line for sending remote commands and gathering data.
There are lots of modules already provided for common tasks.
Anyone can write their own modules using the simple Python module API.
Everything that can be done with the command line can be done with the Python client API. The hack potential is unlimited.
You'll never have to use "expect" or other ugly hacks to automate your workflow.
It's really simple under the covers. Func works over XMLRPC and SSL.
Since func uses certmaster, any program can use func certificates, latch on to them, and take advantage of secure master-to-slave communication.
There are no databases or crazy stuff to install and configure. Again, certificate distribution is automatic too.
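To make the "works over XMLRPC" point concrete, here is a minimal sketch of what an XML-RPC call looks like on the wire, using only Python's standard library. It does not use Func itself; the method name `command.run` and the argument `uptime` are illustrative assumptions, and the SSL transport and certificate handling that Func adds are left out.

```python
import xmlrpc.client

# Hypothetical call: the method name and arguments are illustrative only.
# dumps() builds the XML body a client would POST to the remote daemon.
request_xml = xmlrpc.client.dumps(("uptime",), methodname="command.run")

# A server decodes the same payload back into arguments and a method name.
params, method = xmlrpc.client.loads(request_xml)

if __name__ == "__main__":
    print(request_xml)
    print(method, params)
```

Because the payload is plain XML over HTTP(S), any language with an XML-RPC library can act as a client.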
Mcollective
The Marionette Collective AKA mcollective is a framework to build server orchestration or parallel job execution systems.
Mcollective is used as a means of programmatic execution of Systems Administration actions on clusters of servers.
MCollective uses modern tools like publish/subscribe middleware and modern philosophies like real-time discovery of network resources using metadata rather than hostnames, delivering a very scalable and very fast parallel execution environment.
Scalr
Scalr is a pretty darn good open source cloud management tool. It provides both an automation framework (do Foo when Bar) and a web interface (where is this volume mounted) for managing infrastructure on the cloud, like EC2.
FEATURES
* Integrated into Opscode Chef, for configuration management.
* Pre-automated software, such as nginx, mysql, redis, mongo, and rabbitmq
* Blazing fast UI
* Multi-cloud
* More at http://scalr.net/features/
ROADMAP
* http://wiki.scalr.net/Roadmap
OpenStack Shared Services - https://www.openstack.org/software/openstack-shared-services/
Identity Service
OpenStack Identity provides a central directory of users mapped to the OpenStack services they can access. It acts as a common authentication system across the cloud operating system and can integrate with existing backend directory services like LDAP. It supports multiple forms of authentication including standard username and password credentials, token-based systems and AWS-style logins.
Image Service
The OpenStack Image Service provides discovery, registration and delivery services for disk and server images. The ability to copy or snapshot a server image and immediately store it away is a powerful capability of the OpenStack cloud operating system. Stored images can be used as a template to get new servers up and running quickly and more consistently if you are provisioning multiple servers than installing a server operating system and individually configuring additional services. It can also be used to store and catalog an unlimited number of backups.
Telemetry Service
The OpenStack Telemetry service aggregates usage and performance data across the services deployed in an OpenStack cloud. This powerful capability provides visibility and insight into the usage of the cloud across dozens of data points and allows cloud operators to view metrics globally or by individual deployed resources.
Orchestration Service
OpenStack Orchestration is a template-driven engine that allows application developers to describe and automate the deployment of infrastructure. The flexible template language can specify compute, storage and networking configurations as well as detailed post-deployment activity to automate the full provisioning of infrastructure as well as services and applications. Through integration with the Telemetry service, the Orchestration engine can also perform auto-scaling of certain infrastructure elements.
Debate: How Many Open Source Platforms Are Enough?
http://www.enterprisetech.com/2014/06/23/debate-many-open-source-platforms-enough/
OpenStack came in for the most criticism for issues such as cost of deployment and maintenance. “OpenStack is not a product,” Kemp responded. “If you are building a large infrastructure, it’s more like a tool kit. It gives you a lot of technologies that do take a lot of effort to integrate.” The tool kit is used to create a product, Kemp stressed.
OpenStack Vendors
Canonical Ubuntu OpenStack - http://www.ubuntu.com/cloud/tools/openstack
CloudScaling – Elastic Cloud Infrastructure - http://www.cloudscaling.com/
Elastic Cloud Infrastructure – built on OpenStack – enables any IT group to deploy cloud services comparable to the capabilities of the world’s largest and most successful public clouds. Cloudscaling solutions allow your organization to rapidly scale resources, achieve new levels of agility and improve market responsiveness. All with full control and governance in the privacy of your on-premise data center.
HP Cloud OS - http://www8.hp.com/us/en/business-solutions/solution.html?compURI=1421776#.UzoD3K1dVDo
Based on OpenStack technology, HP Cloud OS provides the foundation for the HP Cloud common architecture across private, public, and hybrid cloud delivery.
Piston Cloud Computing - http://www.pistoncloud.com/openstack-cloud-software/
Piston OpenStack is a software product that uses advanced systems intelligence to orchestrate an entire private cloud environment using commodity hardware. Starting with an extremely lightweight custom Linux OS called Iocane Micro-OS™, and using an advanced high-availability system called Moxie Runtime Environment™, Piston keeps your cloud running no matter what – through hardware failure, operator error, upgrades, and power outages.
Red Hat Distribution of OpenStack - http://openstack.redhat.com/Main_Page
RDO is a community of people using and deploying OpenStack on Red Hat Enterprise Linux, Fedora and distributions derived from these (such as CentOS, Scientific Linux and others). We have documentation to help get started, forums where you can connect with other users, and community-supported packages of the most up-to-date OpenStack releases available for download.
Rackspace Private Cloud powered by OpenStack - http://www.rackspace.com/cloud/private/
Software Defined Networking (SDN) is an emerging network architecture where network control is decoupled from forwarding and is directly programmable. This migration of control, formerly tightly bound in individual network devices, into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity.
This figure depicts a logical view of the SDN architecture. Network intelligence is (logically) centralized in software-based SDN controllers, which maintain a global view of the network. As a result, the network appears to the applications and policy engines as a single, logical switch. With SDN, enterprises and carriers gain vendor-independent control over the entire network from a single logical point, which greatly simplifies the network design and operation. SDN also greatly simplifies the network devices themselves, since they no longer need to understand and process thousands of protocol standards but merely accept instructions from the SDN controllers.
Open Flow
OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points – and provides a standardized hook to allow researchers to run experiments, without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.
In a classical router or switch, the fast packet forwarding (data path) and the high level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages, such as packet-received, send-packet-out, modify-forwarding-table, and get-stats.
The data path of an OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match, and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet it has never seen before, for which it has no matching flow entries, it sends this packet to the controller. The controller then makes a decision on how to handle this packet. It can drop the packet, or it can add a flow entry directing the switch on how to forward similar packets in the future.
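The flow-table behaviour described above can be sketched in a few lines of Python. This is a toy model, not the OpenFlow wire protocol: the class names, the `toy_controller` policy, and the action strings such as `output:2` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    match: dict   # packet fields that must match, e.g. {"dst": "10.0.0.2"}
    action: str   # e.g. "output:2" or "drop"


@dataclass
class Switch:
    flow_table: list = field(default_factory=list)

    def handle(self, packet, controller):
        # Data path: the first entry whose match fields all equal the
        # packet's fields decides what happens to the packet.
        for entry in self.flow_table:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.action
        # Table miss: hand the packet to the controller.
        entry = controller(packet)
        if entry is None:
            return "drop"
        # The controller answered with a new flow entry; install it so
        # similar packets are forwarded without asking again.
        self.flow_table.append(entry)
        return entry.action


def toy_controller(packet):
    """Hypothetical policy: forward known destinations, drop the rest."""
    ports = {"10.0.0.2": "output:2", "10.0.0.3": "output:3"}
    action = ports.get(packet["dst"])
    if action is None:
        return None
    return FlowEntry(match={"dst": packet["dst"]}, action=action)
```

In this sketch the first packet to `10.0.0.2` reaches the controller, while later packets to the same destination are handled entirely by the switch's flow table.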
OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based). It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.
Floodlight - http://www.projectfloodlight.org/floodlight/
- OpenFlow – works with physical and virtual switches that speak the OpenFlow protocol
- Apache-licensed – lets you use Floodlight for almost any purpose
- Open community – Floodlight is developed by an open community of developers. We welcome code contributions from active participants and we’ll openly share information on project status, roadmap, bugs, etc.
- Easy to Use – Floodlight is drop dead simple to build and run. Read through the Documentation.
- Tested and Supported – Floodlight is the core of a commercial controller product from Big Switch Networks and is actively tested and improved by a community of professional developers.
Indigo - http://www.projectfloodlight.org/indigo/
Indigo is an open source project aimed at enabling support for OpenFlow on physical and hypervisor switches. Big Switch has helped numerous companies OpenFlow enable their equipment, and we provide firmware for a number of popular switches.
Indigo is the basis of Switch Light by Big Switch Networks.
Lincx - https://github.com/FlowForwarding/lincx
LINCX is a pure OpenFlow software switch written in Erlang. It runs within a separate domain under Xen hypervisor using LING (erlangonxen.org).
LINCX is a new faster version of LINC-Switch.
Open Daylight – http://www.opendaylight.com
The adoption of new technologies and pursuit of programmable networks has the potential to significantly improve levels of functionality, flexibility and adaptability of mainstream datacenter architectures. To leverage this abstraction to its fullest requires the network to adapt and evolve to a Software-Defined architecture. One of the architectural elements required to achieve this goal is a Software-Defined-Networking (SDN) platform that enables network control and programmability.
Open vSwitch
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple physical servers similar to VMware's vNetwork distributed vswitch or Cisco's Nexus 1000V.
Why Open vSwitch - http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=WHY-OVS;hb=HEAD
Hypervisors need the ability to bridge traffic between VMs and with the outside world. On Linux-based hypervisors, this used to mean using the built-in L2 switch (the Linux bridge), which is fast and reliable. So, it is reasonable to ask why Open vSwitch is used.
The answer is that Open vSwitch is targeted at multi-server virtualization deployments, a landscape for which the previous stack is not well suited. These environments are often characterized by highly dynamic end-points, the maintenance of logical abstractions, and (sometimes) integration with or offloading to special purpose switching hardware.
Netflix AWS Toolbag – http://netflix.github.com
Over 25 projects developed by Netflix to manage their cloud deployments.
Asgard
Asgard is a web-based tool for managing cloud-based applications and infrastructure.
Astyanax
Astyanax is a high level Java client for Apache Cassandra. Apache Cassandra is a highly available column-oriented database.
Edda
Edda is a Service to track changes in your cloud deployments.
Eureka
Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers.
At Netflix, Eureka is used for the following purposes, apart from playing a critical part in mid-tier load balancing:
- For aiding Netflix Asgard, an open source service which makes cloud deployments easier, in:
  - Fast rollback of versions in case of problems, avoiding the re-launch of 100's of instances, which could take a long time.
  - Rolling pushes, avoiding propagation of a new version to all instances in case of problems.
- For our Cassandra deployments, to take instances out of traffic for maintenance.
- For our memcached caching services, to identify the list of nodes in the ring.
Priam
Priam is a process/tool that runs alongside Apache Cassandra to automate the following:
- Backup and recovery (Complete and incremental)
- Token management
- Seed discovery
- Configuration
- Support for the AWS environment
Simian Army
The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.
Private cloud
The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.
Public cloud
The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud
The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
Salt - https://github.com/saltstack/salt
Cacti
Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.
RRDTool
RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.
Graphite
Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through graphite's web interfaces.
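As a minimal sketch of the "send it to carbon" step, the snippet below formats a sample in Carbon's plaintext protocol (one `path value timestamp` line per sample) and writes it to a TCP socket. The metric path and the `localhost` host are assumptions for illustration; port 2003 is Carbon's default plaintext listener, but a given installation may be configured differently.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one sample as 'path value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send one sample to a Carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(carbon_line(path, value, timestamp).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; prints the line instead of sending it.
    print(carbon_line("servers.web01.load", 0.42), end="")
```

Calling `send_metric("servers.web01.load", 0.42)` against a running Carbon daemon would make the series available in Graphite's web interface.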
These tools are all appropriate for Linux guest operating systems; Windows operating system provisioning is not well addressed in OSS.
Axemblr Provisionr
Provisionr solves the problem of cloud portability by hiding completely the APIs and only focusing on building a cluster that matches the same set of assumptions on all clouds, assumptions like: a specific OS, pre-installed packages and binaries, sane dns settings, ssh & vpn access etc. - think a solid foundation for configuration.
As a secondary goal Provisionr will also provide primitives for building automatic or semi-automatic workflows for configuring and monitoring services, workflows that assume that all the machines share a common set of characteristics as described above.
Cobbler
Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many associated Linux tasks so you do not have to hop between lots of various commands and applications when rolling out new systems, and, in some cases, changing existing ones.
With a simple series of commands, network installs can be configured for PXE, reinstallations, media-based net-installs, and virtualized installs (supporting Xen, qemu, KVM, and some variants of VMware). Cobbler uses a helper program called 'koan' (which interacts with Cobbler) for reinstallation and virtualization support.
Crowbar
Bare metal provisioning framework developed by Dell using Opscode Chef, originally built to deploy OpenStack.
Juju
Metal as a Service (MAAS)
MAAS offers a nice UI to provision your Ubuntu servers. Each physical server (“node”) will be commissioned automatically on first boot. During the commissioning process administrators are able to configure hardware settings manually before an automated smoke test and burn-in test are done. Once commissioned, a node can be deployed on demand by name, or allocated to a queue for dynamic allocation to services being deployed on this MAAS.
Salt Cloud
Salt Cloud is a tool for provisioning salted minions across various cloud providers. Currently supported providers are:
- Amazon EC2
- GoGrid
- HP Cloud (using OpenStack)
- Joyent
- Linode
- OpenStack
- Rackspace (using OpenStack)
The salt-cloud command can be used to query configured providers, create VMs on them, deploy salt-minion on those VMs and destroy them when no longer needed.
Salt Cloud requires Salt to be installed, but does not require any Salt daemons to be running. However, if used in a salted environment, it is best to run Salt Cloud on the salt-master, so that it can properly lay down salt keys when it deploys machines, and then properly remove them later. If Salt Cloud is run in this manner, minions will automatically be approved by the master; no need to manually authenticate them later.
Deprecated
Spacewalk
Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view the update level of any given system across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
NoSQL
In computing, NoSQL (commonly interpreted as "not only SQL") is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation.
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key–value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
Apache Cassandra
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Cassandra's ColumnFamily data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching.
Cassandra is in use at Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Digg, CloudKick, Ooyala, and more companies that have large, active data sets. The largest known Cassandra cluster has over 300 TB of data in over 400 machines.
Hypertable
Hypertable is based on a design developed by Google (i.e., it is a BigTable clone) to meet their scalability requirements and solves the scale problem better than any of the other NoSQL solutions out there.
MongoDB
MongoDB (from "humongous") is a cross-platform document-oriented database system.
Redis
Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
Riak
Riak is known for its ability to distribute data across nodes using consistent hashing in a simple key/value scheme in namespaces called buckets.
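The idea of consistent hashing can be sketched as a ring of hashed positions, where each bucket/key pair is stored on the first node found clockwise from its own hash. This is a generic illustration, not Riak's implementation: the `HashRing` class, the node names, and the choice of 64 points per node are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    # SHA-1 gives a 160-bit position on the ring.
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Several points per node spread its share evenly around the ring.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, bucket, key):
        # Walk clockwise from the key's position to the next node point,
        # wrapping around at the end of the ring.
        pos = _hash(f"{bucket}/{key}")
        idx = bisect.bisect_left(self._ring, (pos, ""))
        return self._ring[idx % len(self._ring)][1]
```

The useful property is that removing a node only relocates the keys that lived on that node; every other key keeps its placement, which is what lets a cluster grow or shrink without reshuffling all data.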
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
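The student example above can be written as a single-process sketch that mirrors the Map, shuffle, and Reduce steps. It only illustrates the programming model; a real framework would run these phases in parallel across a cluster. The function names and sample data are assumptions.

```python
from collections import defaultdict


def map_phase(students):
    # Map(): emit a (key, value) pair per record; the key is the first name.
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, 1


def shuffle(pairs):
    # Group values by key: one "queue" per first name.
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues


def reduce_phase(queues):
    # Reduce(): summarize each queue; here, count its entries.
    return {name: sum(values) for name, values in queues.items()}


def name_frequencies(students):
    return reduce_phase(shuffle(map_phase(students)))


if __name__ == "__main__":
    print(name_frequencies(["Ada Lovelace", "Alan Turing", "Ada Yonath"]))
```

Because each queue is reduced independently, the reduce step can be spread over many machines, which is where the model gets its scalability.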