Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer
1. Optimizing IT Costs using Virtualization, Green and Cloud Computing
David Royer
SNIA Brasil, Chairman
Rio Info 2009
Rio de Janeiro, Brazil
2. SNIA At A Glance
Voice of the storage industry representing approximately
$50-60B in worldwide revenue for hardware and software
Founded in 1997 as a non-profit trade association
Worldwide headquarters in San Francisco USA
Global presence in A/NZ, Canada, China, EMEA, India,
Japan and South-Asia
Technology Center activities in Colorado, Beijing, Tokyo, and
Bangalore
Focus on education, conferences, specifications / standards,
software, industry alliances, best practices, plugfests, and
conformance testing for SNIA specifications
Co-owner of Storage Networking World (SNW) conference
with Computerworld/IDG Enterprise
SNIA members work in a collaborative environment and serve as global contributors
toward the advancement of standards, education, and
innovation in the storage and information management industry
4. Worldwide Disk Storage Systems and
Branded Tape Storage Segment Factory
Revenue Growth
[Chart: YoY Growth by Segment, Q1–Q4 2008; vertical axis roughly -70% to +30%. Legend: Tape – Entry Level, Midrange, High End; Int Disk – Entry, Midrange; Ext Disk – Entry, Midrange, High End]
• Entry level and midrange external DSS are the only segments showing flat/positive YoY growth in 4Q
2008. This can be attributed to: customers deferring purchase of larger, more expensive storage systems
in favor of lower-cost, more modular systems; and the emergence of technologies, such as iSCSI, that
offer enterprise-level features at a lower price point than traditional FC SAN systems
5. Storage Hardware 2009 Outlook
Tape will continue to decline as disk-based archival and back-up technologies
emerge
Internal storage is closely tied to the server market, which is expected to be
weaker in the coming quarters than the external disk market
External disk storage systems market will feel the further impact of the
economic crisis, with weakness seen in higher-end systems, specifically
mainframes and FC SAN
Healthier segments include:
iSCSI SAN – specifically in the upper entry level and midrange market
Verticals such as Healthcare, Video Surveillance, and Government
Midrange product offerings: customers are fulfilling their enterprise
storage needs with midrange products
Enterprise VTL: Will augment midrange and enterprise tape drives,
especially in tape libraries and automation
Source IDC Doc # 218274
6. Storage Software Growth – Average 7%
Data Protection, growth rate through 2013, 6.2%
Archiving Software, growth rate through 2013, 10.4%
Storage Device Management Software, growth rate through 2013, 2.8%
Storage Management Software, growth rate through 2013, 5.6%
Storage Infrastructure, growth rate through 2013, 5.9%
Storage Replication, growth rate through 2013, 7.6%
File System, growth rate through 2013, 7.1%
Source IDC Doc # 217529
7. E-Discovery Growth
Growth comes from a combination of software (storage infrastructure,
e-discovery, collaboration, ECM, data management, and security),
hardware, and servers
Storage spending growth was underpinned by data volume
and requirements to store, manage, index, archive, and preserve data
Source IDC Doc # 218259
8. Focus on a Few
Industry Storage Trends
Green IT
Cloud Computing
Virtualization
9. Abstract
Best Practices in Managing Virtualized Environments
Today, data center environments are increasingly complex with
virtualization at all layers of the IT stack, including network, server,
SAN and storage. IT professionals are often challenged in diagnosing
application performance issues, optimizing infrastructure resource
utilization, and planning for future changes. The best practices for
managing complex data center environments include cross domain
management orientation, watching the infrastructure response time
for cross-domain performance, looking for application contention and
contention-based latency in the storage layer, best fit analysis of
workloads to storage resources, and working toward infrastructure
performance SLAs. Key requirements for this new breed of
management software include agent-less discovery and SMI-S support.
10. Virtualization is Everywhere
Tremendous Benefits
Pooling of resources
Rapidly deploy new applications
Increase resource utilization
Over-subscribe resources
Lower acquisition cost and TCO
Traditional system management practices may no longer work
[Diagram: client network connecting app servers, web servers, and security; server virtualization; storage network (SAN); array virtualization]
11. What's "Real" about Virtualization?
Like the Emperor's new (virtualized) clothes –
A logical interface presenting a normalized "resource" that isn't "all there"
Built over physical and other virtual layers that do not look at all like
the presented logical resource
We will discuss two major IT virtualization initiatives
Storage Virtualization
Server Virtualization
(and the combination of the two!)
Check out SNIA Tutorial: Virtualization 1 – What, Why, Where, and How
12. Virtualization Pools Resources
[Diagram: Physical Infrastructure Model vs. Virtual Infrastructure Model. Both have a client network and SAN; the virtual model pools servers into a Server Pool and storage behind a storage network into a Storage Pool (Tier 1, Tier 2, Archive)]
13. Managing Virtualized
Environments
Managing through Virtualization is Challenging
Diagnosing Performance Problems
Optimizing Resource Utilization
Planning for Future Changes
Virtualization Feature → "New" Admin Challenge
Clients reserve and share resource capacity → Performance still degrades non-linearly with load
Dynamic infrastructure → Finding transitional bottlenecks
Increased resource utilization → Optimal resource deployment
Easy to provision new VMs → Predicting if the next VM fits
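The last challenge in the table, predicting whether the next VM fits, can be sketched as a capacity check before provisioning. This is an illustrative sketch, not any vendor's placement logic; the host names, capacity figures, and the 80% headroom threshold are all invented for the example.

```python
# Hypothetical "will the next VM fit?" check: first-fit with a headroom cap.

def first_fit_host(hosts, vm_cpu, vm_ram_gb, headroom=0.8):
    """Return the first host whose post-placement utilization stays
    under the headroom fraction, or None if the VM does not fit."""
    for name, (cpu_used, cpu_cap, ram_used, ram_cap) in hosts.items():
        cpu_ok = (cpu_used + vm_cpu) <= headroom * cpu_cap
        ram_ok = (ram_used + vm_ram_gb) <= headroom * ram_cap
        if cpu_ok and ram_ok:
            return name
    return None

hosts = {
    "esx01": (10, 16, 90, 128),   # (vCPUs used, vCPU cap, RAM used GB, RAM cap GB)
    "esx02": (4, 16, 40, 128),
}
print(first_fit_host(hosts, vm_cpu=4, vm_ram_gb=32))  # esx02: esx01 is over headroom
```

The headroom cap matters because, as the table notes, performance degrades non-linearly with load; filling a host to 100% is rarely safe.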
14. The Bottom Line…
Applications share resources
Poor performance is caused by:
Hard-to-find I/O bottlenecks and
resource contention
Mis-alignment between layers of
virtualization
Under-provisioning shared resources
Over-provisioning of shared
resources as insurance negates ROI
Inhibitors to success
Virtualized data center complexity
Lack of cross-domain management
Lack of cross-domain communication
15. Best Practices in Managing Virtualized
Environments
Solving Old Problems in a New Environment
Recommended Best Practices -
1. Cross Domain Analysis and Shared Resource Contention
2. Adopt an Application View of Performance
3. Use Automation Wisely
4. “Effective Capacity” Management
5. Model-based Optimization and Planning
16. 1. Cross Domain Analysis
Virtualization Management is “Cross-Domain” -
Create a Cross-Domain Baseline (discover and collect)
Mapping from multiple layers (app, server, storage, physical & virtual)
Aim for agent-less and “on-line”
Standards like SMI-S are essential for heterogeneous environments
Check Configuration First
Don't optimize or "plan a baseline" from a poorly configured system
Check against vendor configuration best practices
Newer technologies (thin-wide arrays, 10 GbE networks, SSDs) move performance bottlenecks elsewhere
Check out SNIA Tutorial: Solving Business-Oriented Goals with SMI-S
18. Find Shared Resource
Contention
Stepping Through a Virtual Looking Glass -
Need to Map through Virtualization Layers
Map relationships at every level
Exponential problem of server virtualization over storage virtualization
Sum up the loads from every client that shares each resource
Quantify Application Contention due to Sharing
Calculate performance impact back to each application
Root cause is mostly figuring out What’s Changed when
Capacity runs out
If Load changed, was it aberrant behavior or growth?
If Configuration changed, does it violate policy or show thrashing?
If Contention arose, who is new to the pool?
19. Application Contention
Cross Domain visibility is
naturally “foggy”
Domain specific management has
limited view
Virtualization makes it harder
Management requires
end-to-end picture
20. Cross-Domain: Navigating the Virtualized
Environment
Need a map through all the indirection – a common map helps different domain admins communicate
Long data path from application to array…
Sharing can be dynamic – maps must be too
21. 2. Adopt Application View of Performance
The Customer is Always Right –
Application Infrastructure Performance
How long does it take an I/O to complete from the application point of
view (Response Time)?
Some applications ($$$) are more loved than others
Manage to this “Service” Performance
Element utilizations are interesting,
but service performance is the goal
Look for Abnormal “Service” Behavior
Not just default rule-of-thumb thresholds on utilizations
22. Service Layer Metrics
[Chart: Response Time (sec) vs. Throughput (transactions/sec), 0–1400, for customer and resource. Marks the optimal throughput, the maximum throughput, and the throughput achievable at the Service Level Agreement response time]
23. Look for Abnormal Behavior
Check for abnormal behavior
Calculate a baseline – a statistical analysis of variance of performance over time
Compare data to the baseline (acceptable variance)
Shared resources tend to average out peaks that would show in dedicated resources
Helps justify virtualization
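The baseline-and-variance check described above can be sketched in a few lines. The sample history, the metric, and the 3-sigma band are illustrative assumptions, not values from the deck.

```python
# Minimal sketch: build a statistical baseline of a performance metric,
# then flag samples that fall outside an acceptable variance band.
from statistics import mean, stdev

def build_baseline(samples):
    """Baseline = (mean, standard deviation) of historical samples."""
    return mean(samples), stdev(samples)

def abnormal(value, baseline, k=3.0):
    """Flag values more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

history = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]  # e.g. response time in ms
base = build_baseline(history)
print(abnormal(12.3, base))  # False: within the acceptable band
print(abnormal(25.0, base))  # True: well outside, worth investigating
```

This is the difference the slide is after: a learned baseline per service, rather than a default rule-of-thumb utilization threshold.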
24. 4. “Effective Capacity”
Management
Capacity Management Isn’t Just “Enough GBs”
Storage has both space and time constraints
(server folk have it easy!)
Manage to the total “Effective Capacity”
Maximum utilization that gives good performance
Not to total actual utilization (aka “saturation”)
Build in Automation for Scalability
Virtualized environments tend to sprawl
And they can change dynamically
Check out SNIA Tutorial:
Storage Virtualization II –
Effective Use of Virtualization
26. 3. Use Automation Wisely
Build in Automation for Scalability
Virtualized environments tend to sprawl
And they can change dynamically
Almost everything can be automated
Event Monitoring
Performance collection and reporting
Analysis of Performance and Configuration
correlation of events with performance, first and second order analysis
Provisioning, Reconfiguration and Migration
Don't forget to leave an audit trail
Feedback loop: what were the effects of the change?
Check out SNIA Tutorial: Storage Virtualization II – Effective Use of Virtualization
27. 5. Model-based Optimization and Planning
Moving Towards a Real-Time Datacenter -
Constantly Increase Operational Efficiency
Most working infrastructure is sub-optimized
Dedicated resources
"If it ain't broke, don't fix it" attitudes (or capabilities)
However, when everything is shared, everyone goes down together…
Real-er Time Capacity Planning
Utilizations are related to Response Time through Queuing Theory
Need to predict performance degradation under
future application load changes
Need to predict performance improvements from possible
architectural/technology changes
Planning and tuning will go from large cyclical events to
smaller, more dynamic perturbations
28. Queuing Theory to The Rescue…
Queuing Models create Response Time curves
Based on established mathematics (Buzen, et al. – see www.cmg.org)
Useful analytically (historically) as well as predictively
For a simple example think of a check-out line at the grocery store
Complex Queuing Network Models can represent
nested and virtualized IT domains
Advanced cross-domain solutions model IT virtualization
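The grocery-store analogy maps to the simplest queuing model, M/M/1 (one server, random arrivals and service times), where mean response time is R = 1 / (mu - lam) for arrival rate lam and service rate mu. A short sketch shows the non-linear degradation as utilization approaches saturation; the rates are invented for illustration.

```python
# M/M/1 response-time curve: R = 1 / (mu - lam).
# Response time explodes as utilization rho = lam / mu approaches 1,
# the behavior behind "performance degrades non-linearly with load".
def mm1_response_time(lam, mu):
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (mu - lam)

mu = 100.0  # IOs per second the resource can service (assumed)
for lam in (50, 80, 90, 99):  # offered load, IOs per second
    rho = lam / mu
    r_ms = mm1_response_time(lam, mu) * 1000
    print(f"utilization {rho:.0%}: response time {r_ms:.0f} ms")
```

Note the shape: going from 50% to 80% utilization more than doubles response time, and the last few percent before saturation cost an order of magnitude. This is why "effective capacity" (slide 24) is defined by performance, not by raw utilization.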
29. Best Practices in
Managing Virtualized Environments
In Summary -
1. Cross Domain Analysis and Shared Resource Contention
Virtualization is about sharing across IT domains,
and that's often the problem
2. Adopt an Application View of Performance
Manage to customer service levels
3. Use Automation Wisely
Doing more with less time and fewer errors
4. “Effective Capacity” Management
Shared resources still obey the laws of physics
5. Model-based Optimization and Planning
Leverage Prediction to Improve your Future
31. Energy Cost of Data Storage
[Chart: 1999–2011. Left axis: Capacity (PBs), 0–50,000; right axis: $M, 0–3,000. Installed # of Petabytes (57% 2006-2011 CAGR); Cost to Power and Cool (19% 2006-2011 CAGR)]
IDC #212714, “The Real Costs to Power and Cool All the World's External Storage” – June 2008 Dave Reinsel
Chart used by permission of IDC
32. What Impacts Energy
Consumption for Data Storage
Storage capacity / usage efficiency
increasing data → larger capacity → more disks
redundant copies magnify capacity needs
variability in usage and utilization → inefficient allocation of space
What is valuable data? What is the retention policy?
Data transfer rate / access speed
high I/O bandwidth → higher rotational speed; striping across many drives
low access times → faster actuators; higher rotational speeds; caches
How fast and immediately must data be available? (time-to-data)
Data integrity
25% of the "digital universe" is unique, but 75% is replicas / duplicates
partly to ensure data integrity and survivability; partly wasteful
Data availability / system reliability
RAID uses extra drives, plus redundant power supplies, fans, controllers
How valuable is data? How likely are failures? How fast must data be available?
33. Potential Paths to “Green” Storage
Improve usage efficiency (must be driven by metrics / standards / guidelines)
De-duplication
Thin provisioning
Minimize energy consumption
Improved component designs – high-efficiency power
supplies, advanced & flexible drives
Variants of MAID – idle and spin-down
New technologies
Solid state storage
Alternative + hybrid system designs (opportunity to rethink)
34. Anatomy of a Storage System
System design, complexity and redundancy vary depending on applications & usage
Component designs, software features, and workload affect power consumption and efficiency
[Diagram: Switches, Appliances, Disk Arrays, PDUs (Power Distribution Units), UPSs (Uninterruptible Power Supplies), Apps Software, Power Supplies, Fans, Controllers, Hard drives]
35. Storage –
Power Supply Efficiency
1 – Redundant power supplies are standard, except in the smallest systems
2 – Significant mechanical components (fans, controllers, hard drives) require dual-output power supplies (12V, 5V)
3 – Power supplies often custom-designed for reliability
*Power supply figures (for servers) presented by EPA at ENERGY STAR Computer Server Stakeholder Meetings; July 2008
36. Idle Power versus Active Power
Idle Mode for a Storage Array
storage system is protecting data, ready to process IOs
background maintenance & optimization tasks on-going
factors: time-to-data, overhead electronics, fan, maintenance
systems are idle large fractions of the time
Active Mode for a Storage Array
storage system is carrying out IOs
background tasks continue in parallel
factors: workload (seq/random), response time, throughput
evaluate a variety of workloads, plus sustained peak power
37. HDD Capacity versus
High Performance
Capacity
focused on GB/watt at rest
1 TB SATA: 15W
4 x 250 GB FC: 64W
also tend to have better $/GB
NOTE: power use is quadratic with respect to rotational
speed
Use the slowest drives that will fit your needs
Performance
focused on seek time
1 TB SATA: 12 – 15 ms
300 GB FC: 3 – 4 ms
also designed for higher RAS* environments
* RAS = Reliability, Availability, Serviceability
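As a back-of-envelope check, the GB/watt comparison above follows directly from the slide's own figures (1 TB SATA at 15 W vs. 4 x 250 GB FC at 64 W):

```python
# Capacity-per-watt comparison using the slide's figures.
def gb_per_watt(capacity_gb, watts):
    return capacity_gb / watts

sata = gb_per_watt(1000, 15)     # 1 TB SATA drive at 15 W
fc   = gb_per_watt(4 * 250, 64)  # 4 x 250 GB FC drives at 64 W total
print(f"SATA: {sata:.1f} GB/W, FC: {fc:.1f} GB/W ({sata/fc:.1f}x better)")
```

Roughly 67 GB/W vs. 16 GB/W: for capacity-focused workloads the slower drive is about four times more energy efficient per gigabyte, which is the point of "use the slowest drives that will fit your needs".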
38. SSD vs HDD
Power Value – Significant Power Savings
15k RPM Enterprise HDD: 6.8W idle, 10.1W load
SSD: 0.5W idle, 0.9W load
Temperature: 85°F idle, 94°F under load
~38% Less Heat, ~90% Less Power
SSDs reduce energy cost to operate and cool the data center
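A quick sanity check of the "~90% less power" claim, using the wattage figures from this slide:

```python
# Verify the relative power savings of SSD vs. 15k RPM enterprise HDD.
hdd_idle, hdd_load = 6.8, 10.1   # watts (HDD)
ssd_idle, ssd_load = 0.5, 0.9    # watts (SSD)

idle_saving = 1 - ssd_idle / hdd_idle
load_saving = 1 - ssd_load / hdd_load
print(f"idle: {idle_saving:.0%} less power, load: {load_saving:.0%} less power")
```

Both figures come out at roughly 91–93%, consistent with the slide's ~90% headline.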
39. Storage Taxonomy
for Energy Measurement
Need a taxonomy (product classification) to enable fair
comparisons among similar storage products
e.g. for motor vehicles – motorcycles, cars, trucks
Similar green metrics may apply to all product categories, but
different values establish best-in-class
Unique considerations apply to special categories
e.g. amphibious cars, skid steer loaders, tanks
Clear taxonomy will simplify comparisons and aid regulatory
efforts
40. SNIA Measurement Standard - Draft
Storage taxonomy
Measurement conditions
Idle metric
Active metric(s)
Reporting results
41. 1) Storage Taxonomy (1 of 2)
Online Storage: prime storage, able to serve random as well as sequential workloads with minimal delay
Near Online Storage: intended as second-tier storage behind Online Storage; able to service random and sequential workloads, but perhaps with noticeable delay in time to first data access
Maximum Capacity Guidance Note: Maximum Capacity Guidance reflects the maximum capacity a given offering can be purchased with and/or field upgraded to. It is intended to be used as a guideline as opposed to an absolute value. There will be cases where a device may have greater or smaller capabilities, but otherwise is an appropriate match for a given classification due to other criteria, e.g. redundancy capabilities
Group 1) SoHo & Consumer
Storage which is designed primarily for home (consumer) or home / small office usage
–Often direct connected (USB, IP, etc.)
–No option for redundancy (will contain SPOFs)
Max storage devices: up to 4
Group 2) Entry, DAS, or JBOD
Storage which is dedicated to one or at most a very limited number of servers. Often will not include any integrated controller, but relies on the server host for that functionality
–Often direct connected (SATA, IP, etc.)
–May optionally offer a limited number of redundancy features
Max storage devices: more than 4 (Online); up to 4 (Near Online)
Group 3) Entry / Midrange
SAN or NAS connected storage which places a higher emphasis on value than scalability and performance. This is often referred to as 'Entry Level' storage
–Network connected (IP, SAN, etc.)
–Has options for redundancy features
Max storage devices: more than 20 (Online); more than 4 (Near Online)
Group 4) Midrange / Enterprise
SAN or NAS connected storage which delivers a balance of performance and features. Offers higher levels of management as well as scalability and reliability capabilities
–Network connected (IP, SAN, etc.)
–Has options for and often delivered with full redundancy (no SPOF)
Max storage devices: more than 100 (Online); more than 100 (Near Online)
Group 5) Enterprise / Mainframe
Storage which exhibits the large scalability and extreme robustness associated with Mainframe deployments, though not restricted to Mainframe-only deployments
–Mainframe connectivity with optional network connection (IP, SAN…)
–Always delivered with full redundancy (no SPOF)
–Often capable of non-disruptive serviceability
Max storage devices: more than 1000
See: Green Storage Power Measurement Specification for complete details
43. Desired Storage Metric –
“Productivity”
Many possible definitions – must balance simplicity against applicability
• "typical workload", with levels
• "four corners": maximum performance and maximum power, across random/sequential reads and writes
• detailed performance benchmark – results/W (SPEC, the Standard Performance Evaluation Corporation)
• The Green Grid Productivity Proxy Proposals – example: Proxy #4 – bits/kilowatt-hour
44. Complications
• Max power ≠ max performance (server power vs. storage power; SPECweb 2005 (banking) + storage)
• Significant whole-system considerations (single disk drive power profile)
"Storage Modeling for Power Estimation", Miriam Allalouf, Yuriy Arbitman, Michael Factor, Ronen I. Kat, Kalman Meth, and Dalit Naor; IBM Haifa Research Labs; manuscript; March 2009
"The Next Frontier for Power/Performance Benchmarking: Energy Efficiency of Storage Subsystems", Klaus-Dieter Lange; SPEC Benchmark Workshop 2009; January 2009
45. Need for Data Redundancy
RAID 10 – protect against multiple disk failures
DR Mirror – protect against whole-site disasters
Backups – protect against failures and unintentional
deletions/changes
Compliance archive – protect against heavy fines
Test/dev copies – protect live data from mutilation by
unbaked code
Overprovisioning – protect against application crashes when a volume
runs out of space
Snapshots – quicker and more efficient backups
46. Result of Redundancy
Power consumption is roughly linear in the number of naïve (full) copies
[Chart: stacked bars growing from 1 TB of application data to ~10x+ raw capacity as each layer is added: RAID 10 overhead, overprovisioning, snapshots, "growth" reserve, DR mirror, disk backup, compliance archive, and test/dev copies]
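The roughly-linear growth can be modeled as a sum of full-copy multipliers. The multipliers below are illustrative assumptions chosen to reproduce the ~10x headline, not measured values:

```python
# Illustrative model: each naive full copy adds a full multiple of the
# protected data, so raw footprint (and power) grows roughly linearly.
def raw_footprint(data_tb, copies):
    """Total raw TB = protected data times the sum of copy multipliers."""
    return data_tb * sum(copies.values())

copies = {
    "primary (RAID 10)": 2.0,        # mirroring doubles raw capacity
    "DR mirror": 2.0,                # remote copy, also mirrored
    "disk backup": 2.0,              # e.g. two retained full backups
    "compliance archive": 1.0,
    "test/dev copies": 2.0,          # two full clones
    "snapshots + growth reserve": 1.0,
}
print(raw_footprint(1.0, copies), "TB raw for 1 TB of application data")
```

Swap in shared snapshots, dedupe, or non-mirrored RAID (next slide) and several of these multipliers shrink well below 1.0, which is exactly the green-storage effect.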
47. Positive Effect of Green Storage Technologies
Green storage technologies use less raw capacity to store and use the same data set
Power consumption falls accordingly
[Chart: the slide-46 stack rebuilt with RAID 5/6 (RAID-DP), thin provisioning, multi-use virtual clones & backups, and dedupe & compression; each step shrinks the raw capacity needed for the same 1 TB of application data]
48. Green Storage Technologies
Enabling technologies
Storage virtualization
Storage capacity planning
Green software
Compression
Snapshots
Virtual (writeable) clones
Thin provisioning
Non-mirrored RAID
Deduplication and SIS
Resizeable volumes
49. Typical Savings
Thin provisioning
40 - 60%
From an average 30% utilization to 80% utilization
RAID 6
35%
For 14-disk RAID 6 set, compared to RAID 1/10
Deduplication
40 – 95%, depending on dataset and time interval
~ 40 – 50% average over time
Resizeable volumes
20 – 50%
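For the RAID figure, an idealized capacity model gives the flavor. It ignores spares, rebuild reserve, and real set-size constraints, so it lands somewhat higher than the slide's empirical 35%:

```python
# Idealized usable-capacity fractions for RAID levels (simplified model).
def usable_fraction(raid, disks):
    if raid == "raid10":
        return 0.5                  # mirrored: half the raw capacity is usable
    if raid == "raid6":
        return (disks - 2) / disks  # two parity disks per set
    raise ValueError(raid)

r10 = usable_fraction("raid10", 14)
r6  = usable_fraction("raid6", 14)
# Raw disks needed for the same usable capacity scale as 1 / usable_fraction:
saving = 1 - (1 / r6) / (1 / r10)
print(f"RAID 6 (14 disks) needs ~{saving:.0%} fewer raw disks than RAID 10")
```

The model says ~42%; the slide's 35% presumably reflects real-world overheads the simple parity arithmetic leaves out.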
50. Green Storage Technologies
(cont.)
Other storage technologies and power saving techniques
Capacity vs. high performance drives
ILM / HSM
MAID
SSDs
Power supply and fan efficiencies
Facilities-side technologies
Hot aisle/cold aisle
Water & natural cooling
Flywheel UPSs
51. Savings Matrix
Savings can multiply when technologies are combined
[Matrix: Compression (C), Snapshots (SS), Virtual Clones (VC), Thin Provisioning (TP), RAID (R), Deduplication (DD), Resizeable Vols (RV); checkboxes mark which combinations multiply]
52. SNIA Green Efforts
SNIA Green Storage Initiative (GSI) and SNIA Green Storage Technical
Work Group (TWG)
on-going efforts to develop data-driven green standards & metrics
power measurements at multi-vendor “unplugged” fests
alliances with other active green organizations
(The Green Grid, 80PLUS/Climate Savers, DMTF, SPEC, SPC)
collaboration with EPA on the ENERGY STAR for Storage program
Whitepapers / workshops
four tutorials at SNW; online tutorials available
(www.snia.org/education/tutorials)
white papers from GSI
54. IDC: Worldwide IT Cloud Services Spending*/**
2008: $16.2 billion – Business Applications 57%, Infrastructure Software 18%, App Dev & Deployment 11%, Server 9%, Storage 5%
2012: $42.3 billion – Business Applications 52%, Infrastructure Software 18%, App Dev & Deployment 9%, Server 8%, Storage 13% ($5.5 billion)
* by Product/Service Type, 2008 & 2012
** Includes enterprise IT spending on Business Applications, Systems Infrastructure
Software, Application Development
& Deployment Software, Servers and Storage
Source: IDC - IT Cloud Services Forecast - 2008, 2012: A Key Driver of New Growth
55. Some basic cloud storage
attributes
Pay as you go
Self service provisioning
Scalable, Elastic
Rich application interfaces
No need for consumers to directly manage their own storage
resource
By offloading the Storage Management, data
owners can focus more on the management of data
requirements ...
56. Cloud Computing Perceived Benefits
and Demand Drivers
Cloud computing's "nirvana-like" promise drives higher service level expectations among business entities and individual users, which in turn puts pressure on the enterprise data center to deliver higher service quality (at lower cost)
Business Entities – Key Benefit: Innovation
Faster, easier innovation; new business models; new products and services; faster time to market; lower IT cost; lower IT risk (brand protection); improved IT user productivity; improved client satisfaction; improved disaster recovery
IT Users – Key Benefit: Quality of Experience
Speed of access; ease of access (anywhere, anytime); ease of use; minimal software requirements on access device; no long-term commitments
IT Providers – Key Benefit: Competitiveness
Lower TCO; faster time to market; higher customer retention; service quality; resource optimization; resiliency; flexibility; efficiency; "green"; enhanced chargeback
57. What is Cloud Storage?
Cloud Storage can be contrasted with SAN/NAS storage
Both are “Storage Networking”
Provisioning may be different (some interfaces do not require this)
How you pay for it may be different
One primary difference is that essential management tasks for storage
resources are performed by the Cloud operator and not the storage user
Public Storage Clouds
Latency may be an issue for most enterprise applications
Primarily aimed at web-facing applications that already serve data over the web
Importance of SLA Management
Private Storage Clouds
Can be either web-facing or used for enterprise applications
Can be operated by internal IT departments – driving costs down and achieving
better utilizations
Importance of SLA Management
Hybrid use of public and private clouds (including existing data centers)
This is not only about capacity provisioning
Data Assurance, Security, Delivery, Migration…
Leverage Virtualized and Self*/Automated Management Environments
Also part of Virtual Data Centers
58. Some Examples of Cloud Interfaces
De facto and proprietary interfaces
Amazon S3 (http://aws.amazon.com/s3) “As simple as possible, but no
simpler”
GoGrid (http://wiki.gogrid.com/wiki/index.php/Cloud_Storage)
Some offer standard data path APIs, but allocation and provisioning
are behind “storefronts” or proprietary APIs
SAMBA, RSYNC, SCP – “standard” open source
Microsoft Azure Interface
De jure APIs
WebDAV (http://www.ietf.org/rfc/rfc2518.txt)
iSCSI (http://www.ietf.org/rfc/rfc3720.txt)
NFS (http://www.ietf.org/rfc/rfc3530.txt)
FTP (http://www.ietf.org/rfc/rfc959.txt)
But very few of these interfaces support the use of
metadata on individual data elements
59. Cloud Storage:
Use Cases and Requirements
Store my file and give me back a URL (i.e. Amazon S3)
Best Effort Quality of Service?
Provision a filesystem and mount it (i.e. WebDAV)
Quality of Service specification via provisioning interface
Give me Filesystems/LUNs for my Cloud Computing
NAS box in the cloud…
Store my backup files until I need them back
Maybe offer me a local cache as well
Archive my files in the Cloud for Preservation/Compliance
Maybe offer me eDiscovery services, “tape in the mail” retrieval
Store all my files, allowing me to set the Data Requirements, let me cache
and distribute geographically
Policy driven Data Services based on Data System Metadata markings
60. Types of APIs
Besides the “Data Path” APIs (previous slide), there are other interfaces
that Cloud Storage may require
E.g. Storage Provisioning
For certain types of data storage interfaces (block, file) from the cloud
you will need to provision/allocate storage before you can use it
This provisioning can be done via a UI or an API
Existing standards can be leveraged (e.g. SNIA SMI-S)
E.g. Storage Metering
Since the cloud storage paradigm is “pay as you go”, you need to know
what your bill will be at the end of the billing cycle
What operations affect my bill?
UI typical, but an API standard would enable interoperability and better
automation
Telecom Industry Practice – every transaction has a “Call
Detail Record” that is aggregated for billing
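The telecom pattern above can be sketched as one record per operation plus an aggregation step. The operation names and rates are invented for illustration; real cloud billing schemas differ:

```python
# Sketch of CDR-style metering: emit one record per storage operation,
# then aggregate the records into a bill at the end of the cycle.
from collections import defaultdict

RATES = {"PUT": 0.00001, "GET": 0.000001, "GB_MONTH": 0.15}  # $ per unit (assumed)

def aggregate(records):
    """records: iterable of (operation, units) tuples, one per transaction."""
    totals = defaultdict(float)
    for op, units in records:
        totals[op] += units
    return {op: units * RATES[op] for op, units in totals.items()}

cdrs = [("PUT", 1), ("PUT", 1), ("GET", 1), ("GB_MONTH", 50.0)]
bill = aggregate(cdrs)
print(f"total: ${sum(bill.values()):.2f}")
```

Because every transaction leaves a record, the same data answers both billing questions ("what will my bill be?") and the interoperability question ("which operations affect my bill?") that a standard metering API would expose.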
61. Some Example
Data Storage Interfaces
Block Interfaces
SCSI, ATA, IDE
Local File Interfaces
POSIX, NTFS
Network File Interfaces
NFS, CIFS, SMB2,
Appletalk, Novell, AFS
Object Based
OSD, XAM
Database
JDBC, ODBC
Not all of these make sense for the Cloud
62. Cloud API to the
Resource Domain Model
Cloud interfaces with all 3
domains (Information,
Data, Storage)
Integration of services with
different type of Clouds
(Compute, Applications...)
Federation of Clouds
Cloud Exchange,
Cloudbursting…
Data Movement
Migration, Delivery,
Regulations
63. XAM API: an example
Data Storage Interface
XAM is the first interface to standardize system metadata for retention of data
XAM implements the basic capability to read and write data (through XStreams)
XAM has the ability to locate any XSet with a query or by supplying the XUID
XAM allows metadata to be added to the data and keeps both in an XSet object
XAM uses and produces system metadata for each XSet – for example Access and Commit times (storage system metadata) – but it also uniquely specifies Data System Metadata for Retention Data Services
XAM user metadata is uninterpretable by the system, but is stored with the other data and is available for use in queries
Given this we can see that XAM is a data storage interface that is used by both Storage and Data Services (functions)
64. Standards for Cloud Storage
Service access interfaces
Storage service interfaces
Service management
Virtual image management
Provisioning
QoS
Performance management
Chargeback accounting
Data protection
Storage security
Storage infrastructure management interfaces (SMI-S)
[Diagram: Cloud Service User → SOA Application → Middleware → Virtualized Infrastructure (Server / Storage / Network / Compute)]
65. SNIA Cloud Technical Work Group
www.snia.org/cloud
Engaging the industry
http://groups.google.com/group/snia-cloud
Alliances
Education & Whitepapers
Use Cases & Taxonomy
Interface Specification
And coming soon to Brazil! Cloud Storage Brasil
http://groups.google.com/group/snia-cloud-br?hl=pt-br