SlideShare une entreprise Scribd logo
1  sur  32
Scalable Object Storage with
Apache CloudStack and Apache
Hadoop
April 30 2013
Chiradeep Vittal
@chiradeep
Agenda
• What is CloudStack
• Object Storage for IAAS
• Current Architecture and Limitations
• Requirements for Object Storage
• Object Storage integrations in CloudStack
• HDFS for Object Storage
• Future directions
• History
• Incubating in the Apache
Software Foundation since
April 2012
• Open Source since May
2010
• In production since 2009
– Turnkey platform for delivering
IaaS clouds
– Full featured GUI, end-user API
and admin API
Apache CloudStack
Build your cloud the way the
world’s most successful
clouds are built
How did Amazon build its cloud?
Commodity
Servers
Commodity
Storage
Networking
Open Source Xen Hypervisor
Amazon Orchestration Software
AWS API (EC2, S3, …)
Amazon eCommerce Platform
How can YOU build a cloud?
Servers StorageNetworking
Open Source Xen Hypervisor
Amazon Orchestration Software
AWS API (EC2, S3, …)
Amazon eCommerce Platform
Hypervisor (Xen/KVM/VMW/)
CloudStack Orchestration Software
Optional Portal
CloudStack or AWS API
Secondary Storage
Image
L3/L2 core
DC Edge
End users
Pod Pod Pod Pod
Zone Architecture
Pod
Access Sw
MySQL
CloudStack
Admin/User API
Primary Storage
NFS/ISCSI/FC
Hypervisor (Xen
/VMWare/KVM)
VM
VM
Snapshot
Snapshot
Image
Disk Disk
VM
Cloud-Style Workloads
• Low cost
– Standardized, cookie cutter infrastructure
– Highly automated and efficient
• Application owns availability
– At scale everything breaks
– Focus on MTTR instead of MTBF
Secondary Storage
Image
L3/L2 core
DC Edge
Pod Pod Pod Pod
At scale…everything breaks
Pod
Access Sw
Primary Storage
NFS/ISCSI/FC
Hypervisor (Xen
/VMWare/KVM)
VM
VM
Snapshot
Snapshot
Image
Disk Disk
VM
Region “West”
Zone “West-Alpha”
Zone “West-Beta”
Zone “West-Gamma”
Zone “West-Delta”
Low Latency Backbone
(e.g., SONET ring)
Regions and zones
Region “East”
Region “South”
Internet
Geographic
separation
Region “West”
Low Latency
Secondary Storage in CloudStack 4.0
• NFS server default
– can be mounted by hypervisor
– Easy to obtain, set up and operate
• Problems with NFS:
– Scale: max limits of file systems
• Solution: CloudStack can manage multiple NFS stores (+
complexity)
– Performance
• N hypervisors : 1 storage CPU / 1 network link
– Wide area suitability for cross-region storage
• Chatty protocol
– Lack of replication
Object Storage Technology
Region “West”
Zone “West-Alpha”
Zone “West-Beta”
Zone “West-Gamma”
Zone “West-Delta”
Object Storage in a region
• Replication
• Audit
• Repair
• Maintenance
Region “West”
Object Storage enables reliability
Object Storage Technology
Region “West”
Object Storage also enables other
applications
Object Store
API Servers
• DropBox
• Static Content
• Archival
Object Storage characteristics
• Highly reliable and durable
– 99.9 % availability for AWS S3
– 99.999999999 % durability
• Massive scale
– 1.3 trillion objects stored across 7 AWS regions [Nov 2012 figures]
– Throughput: 830,000 requests per second
• Immutable objects
– Objects cannot be modified, only deleted
• Simple API
– PUT/POST objects, GET objects, DELETE objects
– No seek / no mutation / no POSIX API
• Flat namespace
– Everything stored in buckets.
– Bucket names are unique
– Buckets can only contain objects, not other buckets
• Cheap and getting cheaper
CloudStack S3 API Server
Object Storage Technology
S3
API Servers
MySQL
CloudStack S3 API Server
• Understands AWS S3 REST-style and SOAP API
• Pluggable backend
– Backend storage needs to map simple calls to their
API
• E.g., createContainer, saveObject, loadObject
– Default backend is a POSIX filesystem
– Backend with Caringo Object Store (commercial
vendor) available
– HDFS backend also available
• MySQL storage
– Bucket -> object mapping
– ACLs, bucket policies
Object Store Integration into
CloudStack
• For images and snapshots
• Replacement for NFS secondary storage
Or
Augmentation for NFS secondary storage
• Integrations available with
– Riak CS
– Openstack Swift
• New in 4.2 (upcoming):
– Framework for integrating storage providers
What do we want to build ?
• Open source, ASL licensed object storage
• Scales to at least 1 billion objects
• Reliability and durability on par with S3
• S3 API (or similar, e.g., Google Storage)
• Tooling around maintenance and
operation, specific to object storage
The following slides are a design
discussion
Architecture of Scalable Object
Storage
API Servers
Auth Servers
Object Servers Replicators/Auditors
Object
Lookup
Servers
Why HDFS
• ASF Project (Apache Hadoop)
• Immutable objects, replication
• Reliability, scale and performance
– 200 million objects in 1 cluster [Facebook]
– 100 PB in 1 cluster [Facebook]
• Simple operation
– Just add data nodes
HDFS-based Object Storage
S3 API Servers
S3 Auth Servers
Data nodes
Namenode
pair
HDFS API
BUT
• Name Node Scalability
– 150 bytes RAM / block
– GC issues
• Name Node SPOF
– Being addressed in the community✔
• Cross-zone replication
– Rack-awareness placement ✔
– What if the zones are spread a little further apart?
• Storage for object metadata
– ACLs, policies, timers
Name Node scalability
• 1 billion objects = 3 billion blocks (chunks)
– Average of 5 MB/object = 5 PB (actual), 15
PB (raw)
– 450 GB of RAM per Name Node
• 150b x 3 x 10^9
– 16 TB / node => 1000 Data nodes
• Requires Name Node federation ?
• Or an approach like HAR files
Name Node Federation
Extension: Federated NameNodes are HA pairs
Federation issues
• HA for name nodes
• Namespace shards
– Map object -> name node
• Requires another scalable key-value store
– HBase?
• Rebalancing between name nodes
Replication over lossy/slower links
A. Asynchronous replication
– Use distcp to replicate between clusters
– 6 copies vs. 3
– Master/Slave relationship
• Possibility of loss of data during failover
• Need coordination logic outside of HDFS
B. Synchronous replication
– API server writes to 2 clusters and acks only
when both writes are successful
– Availability compromised when one zone is
down
CAP Theorem
Consistency or Availability during partition
Many nuances
Storage for object metadata
A. Store it in HDFS along with the object
– Reads are expensive (e.g., to check ACL)
– Mutable data, needs layer over HDFS
B. Use another storage system (e.g. HBase)
– Name node federation also requires this.
C. Modify Name Node to store metadata
– High performance
– Not extensible
Object store on HDFS Future
• Viable for small-sized deployments
– Up to 100-200 million objects
– Datacenters close together
• Larger deployments needs development
– No effort ongoing at this time
Conclusion
• CloudStack needs object storage for
“cloud-style” workloads
• Object Storage is not easy
• HDFS comes close but not close enough
• Join the community!

Contenu connexe

Tendances

HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...Michael Stack
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling PinterestC4Media
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...Michael Stack
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Joydeep Sen Sarma
 
Unleash the Power of Redis with Amazon ElastiCache
Unleash the Power of Redis with Amazon ElastiCacheUnleash the Power of Redis with Amazon ElastiCache
Unleash the Power of Redis with Amazon ElastiCacheAmazon Web Services
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopDoiT International
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...Michael Stack
 
An overview of Amazon Athena
An overview of Amazon AthenaAn overview of Amazon Athena
An overview of Amazon AthenaJulien SIMON
 
Simple, Scalable and Highly Durable NAS in the Cloud - Amazon EFS
Simple, Scalable and Highly Durable NAS in the Cloud - Amazon EFSSimple, Scalable and Highly Durable NAS in the Cloud - Amazon EFS
Simple, Scalable and Highly Durable NAS in the Cloud - Amazon EFSAmazon Web Services
 
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Amazon Web Services
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsAmazon Web Services
 
Getting Started with EC2, S3 and EMR
Getting Started with EC2, S3 and EMRGetting Started with EC2, S3 and EMR
Getting Started with EC2, S3 and EMRArun Sirimalla
 
Oracle Databases on AWS - Getting the Best Out of RDS and EC2
Oracle Databases on AWS - Getting the Best Out of RDS and EC2Oracle Databases on AWS - Getting the Best Out of RDS and EC2
Oracle Databases on AWS - Getting the Best Out of RDS and EC2Maris Elsins
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureRyan Hennig
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupRafal Kwasny
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 

Tendances (20)

HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
HBaseConAsia2018 Track2-1: Kerberos-based Big Data Security Solution and Prac...
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling Pinterest
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
 
Unleash the Power of Redis with Amazon ElastiCache
Unleash the Power of Redis with Amazon ElastiCacheUnleash the Power of Redis with Amazon ElastiCache
Unleash the Power of Redis with Amazon ElastiCache
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On Workshop
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
 
An overview of Amazon Athena
An overview of Amazon AthenaAn overview of Amazon Athena
An overview of Amazon Athena
 
Simple, Scalable and Highly Durable NAS in the Cloud - Amazon EFS
Simple, Scalable and Highly Durable NAS in the Cloud - Amazon EFSSimple, Scalable and Highly Durable NAS in the Cloud - Amazon EFS
Simple, Scalable and Highly Durable NAS in the Cloud - Amazon EFS
 
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
 
Amazon DynamoDB and Amazon DAX
Amazon DynamoDB and Amazon DAXAmazon DynamoDB and Amazon DAX
Amazon DynamoDB and Amazon DAX
 
Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics Workloads
 
Getting Started with EC2, S3 and EMR
Getting Started with EC2, S3 and EMRGetting Started with EC2, S3 and EMR
Getting Started with EC2, S3 and EMR
 
Oracle Databases on AWS - Getting the Best Out of RDS and EC2
Oracle Databases on AWS - Getting the Best Out of RDS and EC2Oracle Databases on AWS - Getting the Best Out of RDS and EC2
Oracle Databases on AWS - Getting the Best Out of RDS and EC2
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 

Similaire à Scalable Object Storage with Apache CloudStack and Apache Hadoop

Red Hat Storage Server For AWS
Red Hat Storage Server For AWSRed Hat Storage Server For AWS
Red Hat Storage Server For AWSRed_Hat_Storage
 
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed_Hat_Storage
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Cloudian
 
New use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis RicoNew use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis RicoCeph Community
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your StartupAmazon Web Services
 
Barcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSBarcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSWong Hoi Sing Edison
 
Orchestration across multiple cloud platforms using Heat
Orchestration across multiple cloud platforms using HeatOrchestration across multiple cloud platforms using Heat
Orchestration across multiple cloud platforms using HeatCoreStack
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...PROIDEA
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStackCisco DevNet
 

Similaire à Scalable Object Storage with Apache CloudStack and Apache Hadoop (20)

Red Hat Storage Server For AWS
Red Hat Storage Server For AWSRed Hat Storage Server For AWS
Red Hat Storage Server For AWS
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
 
Create cloud service on AWS
Create cloud service on AWSCreate cloud service on AWS
Create cloud service on AWS
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
New use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis RicoNew use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis Rico
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
Big data on aws
Big data on awsBig data on aws
Big data on aws
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
Barcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSBarcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWS
 
Orchestration across multiple cloud platforms using Heat
Orchestration across multiple cloud platforms using HeatOrchestration across multiple cloud platforms using Heat
Orchestration across multiple cloud platforms using Heat
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
 
DEVNET-1106 Upcoming Services in OpenStack
DEVNET-1106	Upcoming Services in OpenStackDEVNET-1106	Upcoming Services in OpenStack
DEVNET-1106 Upcoming Services in OpenStack
 

Plus de buildacloud

The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittalbuildacloud
 
Policy Based SDN Solution for DC and Branch Office by Suresh Boddapati
Policy Based SDN Solution for DC and Branch Office by Suresh BoddapatiPolicy Based SDN Solution for DC and Branch Office by Suresh Boddapati
Policy Based SDN Solution for DC and Branch Office by Suresh Boddapatibuildacloud
 
L4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef LaribiL4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef Laribibuildacloud
 
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David NalleyJenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David Nalleybuildacloud
 
Intro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew KirchIntro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew Kirchbuildacloud
 
Guaranteeing Storage Performance by Mike Tutkowski
Guaranteeing Storage Performance by Mike TutkowskiGuaranteeing Storage Performance by Mike Tutkowski
Guaranteeing Storage Performance by Mike Tutkowskibuildacloud
 
Cloud Application Blueprints with Apache Brooklyn by Alex Henevald
Cloud Application Blueprints with Apache Brooklyn by Alex HenevaldCloud Application Blueprints with Apache Brooklyn by Alex Henevald
Cloud Application Blueprints with Apache Brooklyn by Alex Henevaldbuildacloud
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalleybuildacloud
 
Managing infrastructure with Application Policy by Mike Cohen
Managing infrastructure with Application Policy by Mike CohenManaging infrastructure with Application Policy by Mike Cohen
Managing infrastructure with Application Policy by Mike Cohenbuildacloud
 
Intro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew KirchIntro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew Kirchbuildacloud
 
Monitoring CloudStack in context with Converged Infrastructure by Mike Turnlund
Monitoring CloudStack in context with Converged Infrastructure by Mike TurnlundMonitoring CloudStack in context with Converged Infrastructure by Mike Turnlund
Monitoring CloudStack in context with Converged Infrastructure by Mike Turnlundbuildacloud
 
Rest api design by george reese
Rest api design by george reeseRest api design by george reese
Rest api design by george reesebuildacloud
 
Enterprise grade firewall and ssl termination to ac by will stevens
Enterprise grade firewall and ssl termination to ac by will stevensEnterprise grade firewall and ssl termination to ac by will stevens
Enterprise grade firewall and ssl termination to ac by will stevensbuildacloud
 
State of the cloud by reuven cohen
State of the cloud by reuven cohenState of the cloud by reuven cohen
State of the cloud by reuven cohenbuildacloud
 
Securing Your Cloud With the Xen Hypervisor by Russell Pavlicek
Securing Your Cloud With the Xen Hypervisor by Russell PavlicekSecuring Your Cloud With the Xen Hypervisor by Russell Pavlicek
Securing Your Cloud With the Xen Hypervisor by Russell Pavlicekbuildacloud
 
DevCloud - Setup and Demo on Apache CloudStack
DevCloud - Setup and Demo on Apache CloudStack DevCloud - Setup and Demo on Apache CloudStack
DevCloud - Setup and Demo on Apache CloudStack buildacloud
 
Cloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper ContrailCloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper Contrailbuildacloud
 
Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...
Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...
Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...buildacloud
 
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski buildacloud
 
CloudStack University by Sebastien Goasguen
CloudStack University by Sebastien GoasguenCloudStack University by Sebastien Goasguen
CloudStack University by Sebastien Goasguenbuildacloud
 

Plus de buildacloud (20)

The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
Policy Based SDN Solution for DC and Branch Office by Suresh Boddapati
Policy Based SDN Solution for DC and Branch Office by Suresh BoddapatiPolicy Based SDN Solution for DC and Branch Office by Suresh Boddapati
Policy Based SDN Solution for DC and Branch Office by Suresh Boddapati
 
L4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef LaribiL4-L7 services for SDN and NVF by Youcef Laribi
L4-L7 services for SDN and NVF by Youcef Laribi
 
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David NalleyJenkins, jclouds, CloudStack, and CentOS by David Nalley
Jenkins, jclouds, CloudStack, and CentOS by David Nalley
 
Intro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew KirchIntro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew Kirch
 
Guaranteeing Storage Performance by Mike Tutkowski
Guaranteeing Storage Performance by Mike TutkowskiGuaranteeing Storage Performance by Mike Tutkowski
Guaranteeing Storage Performance by Mike Tutkowski
 
Cloud Application Blueprints with Apache Brooklyn by Alex Henevald
Cloud Application Blueprints with Apache Brooklyn by Alex HenevaldCloud Application Blueprints with Apache Brooklyn by Alex Henevald
Cloud Application Blueprints with Apache Brooklyn by Alex Henevald
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalley
 
Managing infrastructure with Application Policy by Mike Cohen
Managing infrastructure with Application Policy by Mike CohenManaging infrastructure with Application Policy by Mike Cohen
Managing infrastructure with Application Policy by Mike Cohen
 
Intro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew KirchIntro to Zenoss by Andrew Kirch
Intro to Zenoss by Andrew Kirch
 
Monitoring CloudStack in context with Converged Infrastructure by Mike Turnlund
Monitoring CloudStack in context with Converged Infrastructure by Mike TurnlundMonitoring CloudStack in context with Converged Infrastructure by Mike Turnlund
Monitoring CloudStack in context with Converged Infrastructure by Mike Turnlund
 
Rest api design by george reese
Rest api design by george reeseRest api design by george reese
Rest api design by george reese
 
Enterprise grade firewall and ssl termination to ac by will stevens
Enterprise grade firewall and ssl termination to ac by will stevensEnterprise grade firewall and ssl termination to ac by will stevens
Enterprise grade firewall and ssl termination to ac by will stevens
 
State of the cloud by reuven cohen
State of the cloud by reuven cohenState of the cloud by reuven cohen
State of the cloud by reuven cohen
 
Securing Your Cloud With the Xen Hypervisor by Russell Pavlicek
Securing Your Cloud With the Xen Hypervisor by Russell PavlicekSecuring Your Cloud With the Xen Hypervisor by Russell Pavlicek
Securing Your Cloud With the Xen Hypervisor by Russell Pavlicek
 
DevCloud - Setup and Demo on Apache CloudStack
DevCloud - Setup and Demo on Apache CloudStack DevCloud - Setup and Demo on Apache CloudStack
DevCloud - Setup and Demo on Apache CloudStack
 
Cloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper ContrailCloud Network Virtualization with Juniper Contrail
Cloud Network Virtualization with Juniper Contrail
 
Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...
Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...
Ian rae panel cloud stack & cloud storage where are we at, and where do we ne...
 
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
 
CloudStack University by Sebastien Goasguen
CloudStack University by Sebastien GoasguenCloudStack University by Sebastien Goasguen
CloudStack University by Sebastien Goasguen
 

Dernier

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Dernier (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Scalable Object Storage with Apache CloudStack and Apache Hadoop

  • 1. Scalable Object Storage with Apache CloudStack and Apache Hadoop April 30 2013 Chiradeep Vittal @chiradeep
  • 2. Agenda • What is CloudStack • Object Storage for IAAS • Current Architecture and Limitations • Requirements for Object Storage • Object Storage integrations in CloudStack • HDFS for Object Storage • Future directions
  • 3. • History • Incubating in the Apache Software Foundation since April 2012 • Open Source since May 2010 • In production since 2009 – Turnkey platform for delivering IaaS clouds – Full featured GUI, end-user API and admin API Apache CloudStack Build your cloud the way the world’s most successful clouds are built
  • 4. How did Amazon build its cloud? Commodity Servers Commodity Storage Networking Open Source Xen Hypervisor Amazon Orchestration Software AWS API (EC2, S3, …) Amazon eCommerce Platform
  • 5. How can YOU build a cloud? Servers StorageNetworking Open Source Xen Hypervisor Amazon Orchestration Software AWS API (EC2, S3, …) Amazon eCommerce Platform Hypervisor (Xen/KVM/VMW/) CloudStack Orchestration Software Optional Portal CloudStack or AWS API
  • 6. Secondary Storage Image L3/L2 core DC Edge End users Pod Pod Pod Pod Zone Architecture Pod Access Sw MySQL CloudStack Admin/User API Primary Storage NFS/ISCSI/FC Hypervisor (Xen /VMWare/KVM) VM VM Snapshot Snapshot Image Disk Disk VM
  • 7. Cloud-Style Workloads • Low cost – Standardized, cookie cutter infrastructure – Highly automated and efficient • Application owns availability – At scale everything breaks – Focus on MTTR instead of MTBF
  • 8. Secondary Storage Image L3/L2 core DC Edge Pod Pod Pod Pod At scale…everything breaks Pod Access Sw Primary Storage NFS/ISCSI/FC Hypervisor (Xen /VMWare/KVM) VM VM Snapshot Snapshot Image Disk Disk VM
  • 9. Region “West” Zone “West-Alpha” Zone “West-Beta” Zone “West-Gamma” Zone “West-Delta” Low Latency Backbone (e.g., SONET ring) Regions and zones
  • 11. Secondary Storage in CloudStack 4.0 • NFS server default – can be mounted by hypervisor – Easy to obtain, set up and operate • Problems with NFS: – Scale: max limits of file systems • Solution: CloudStack can manage multiple NFS stores (+ complexity) – Performance • N hypervisors : 1 storage CPU / 1 network link – Wide area suitability for cross-region storage • Chatty protocol – Lack of replication
  • 12. Object Storage Technology Region “West” Zone “West-Alpha” Zone “West-Beta” Zone “West-Gamma” Zone “West-Delta” Object Storage in a region • Replication • Audit • Repair • Maintenance
  • 13. Region “West” Object Storage enables reliability
  • 14. Object Storage Technology Region “West” Object Storage also enables other applications Object Store API Servers • DropBox • Static Content • Archival
  • 15. Object Storage characteristics • Highly reliable and durable – 99.9 % availability for AWS S3 – 99.999999999 % durability • Massive scale – 1.3 trillion objects stored across 7 AWS regions [Nov 2012 figures] – Throughput: 830,000 requests per second • Immutable objects – Objects cannot be modified, only deleted • Simple API – PUT/POST objects, GET objects, DELETE objects – No seek / no mutation / no POSIX API • Flat namespace – Everything stored in buckets. – Bucket names are unique – Buckets can only contain objects, not other buckets • Cheap and getting cheaper
  • 16. CloudStack S3 API Server Object Storage Technology S3 API Servers MySQL
  • 17. CloudStack S3 API Server • Understands AWS S3 REST-style and SOAP API • Pluggable backend – Backend storage needs to map simple calls to their API • E.g., createContainer, saveObject, loadObject – Default backend is a POSIX filesystem – Backend with Caringo Object Store (commercial vendor) available – HDFS backend also available • MySQL storage – Bucket -> object mapping – ACLs, bucket policies
  • 18. Object Store Integration into CloudStack • For images and snapshots • Replacement for NFS secondary storage Or Augmentation for NFS secondary storage • Integrations available with – Riak CS – Openstack Swift • New in 4.2 (upcoming): – Framework for integrating storage providers
  • 19. What do we want to build ? • Open source, ASL licensed object storage • Scales to at least 1 billion objects • Reliability and durability on par with S3 • S3 API (or similar, e.g., Google Storage) • Tooling around maintenance and operation, specific to object storage
  • 20. The following slides are a design discussion
  • 21. Architecture of Scalable Object Storage API Servers Auth Servers Object Servers Replicators/Auditors Object Lookup Servers
  • 22. Why HDFS • ASF Project (Apache Hadoop) • Immutable objects, replication • Reliability, scale and performance – 200 million objects in 1 cluster [Facebook] – 100 PB in 1 cluster [Facebook] • Simple operation – Just add data nodes
  • 23. HDFS-based Object Storage S3 API Servers S3 Auth Servers Data nodes Namenode pair HDFS API
  • 24. BUT • Name Node Scalability – 150 bytes RAM / block – GC issues • Name Node SPOF – Being addressed in the community✔ • Cross-zone replication – Rack-awareness placement ✔ – What if the zones are spread a little further apart? • Storage for object metadata – ACLs, policies, timers
  • 25. Name Node scalability • 1 billion objects = 3 billion blocks (chunks) – Average of 5 MB/object = 5 PB (actual), 15 PB (raw) – 450 GB of RAM per Name Node • 150b x 3 x 10^9 – 16 TB / node => 1000 Data nodes • Requires Name Node federation ? • Or an approach like HAR files
  • 26. Name Node Federation Extension: Federated NameNodes are HA pairs
  • 27. Federation issues • HA for name nodes • Namespace shards – Map object -> name node • Requires another scalable key-value store – HBase? • Rebalancing between name nodes
  • 28. Replication over lossy/slower links A. Asynchronous replication – Use distcp to replicate between clusters – 6 copies vs. 3 – Master/Slave relationship • Possibility of loss of data during failover • Need coordination logic outside of HDFS B. Synchronous replication – API server writes to 2 clusters and acks only when both writes are successful – Availability compromised when one zone is down
  • 29. CAP Theorem Consistency or Availability during partition Many nuances
  • 30. Storage for object metadata A. Store it in HDFS along with the object – Reads are expensive (e.g., to check ACL) – Mutable data, needs layer over HDFS B. Use another storage system (e.g. HBase) – Name node federation also requires this. C. Modify Name Node to store metadata – High performance – Not extensible
  • 31. Object store on HDFS Future • Viable for small-sized deployments – Up to 100-200 million objects – Datacenters close together • Larger deployments needs development – No effort ongoing at this time
  • 32. Conclusion • CloudStack needs object storage for “cloud-style” workloads • Object Storage is not easy • HDFS comes close but not close enough • Join the community!

Notes de l'éditeur

  1. Need a better slide than this
  2. Frequently require CCNA , Vmwareceritification, EMC training, etc etc. But they chose commondity systems. And simple networking.Can also sell cheaply since they use their own commerce platform.
  3. The key here is the API on top of the infrastructure. This is the disruptive piece for the industry. Forget about CCNA, Vmware cert, now people can programmatically control their infrastructure as well as the VMs on top of it.