1. What Is Object Storage, When Should You Use It and What to Look For When Purchasing It
Sponsored by EMC
Guest Speakers: George Crump, Founder and Lead Analyst, Storage Switzerland
George Hamilton, Sr. Product Marketing Manager, EMC Atmos and Centera
Moderator: (Don Keefe)
Is It Time To Consider Object Storage?
What Is Object Storage?
When Should You Use It?
What to Look For When Purchasing It
(Don Keefe): Hello, and welcome to today's SearchCloudApplications.com presentation, "What Is Object Storage? When Should You Use It and What to Look For When Purchasing It?"
2. Agenda
• Storage Swiss Background
• The State of Unstructured Data
• What Is Object Storage?
• When Should You Use It?
• What to Look For When Purchasing It
My name is (Don Keefe), and I am going to be the moderator for today's presentation. Today's presentation is being brought to you by EMC. Before we begin, please note that the slides will be pushed to your screen automatically, and all the audio will be streamed to you through your computer.
If you have any questions today for our speakers, you can enter them by clicking on the questions tab on the left side of your screen, and clicking on the submit question button; your questions will be addressed at the end of today's presentation. With that said, it is now my pleasure to introduce our speakers for today. Joining us is George Crump; George is founder and lead analyst at Storage Switzerland, an analyst firm focused on the storage marketplace.
We are also joined today by George Hamilton. George is a senior product marketing manager in EMC's advanced storage division, responsible for EMC's object storage platforms, EMC Atmos and Centera. Now let me hand things over to George Crump to begin his presentation. Please go ahead, George.
3. Background
• Analyst firm covering storage, cloud and virtualization markets
• Knowledge of these markets is gained through product
testing, real world implementations and interactions with users
and suppliers
• The results of this research are found in the articles, briefing
reports, case studies and lab reports on our web site
www.storage-switzerland.com
George Crump: Thanks. Welcome, everybody, and thank you for tuning in today. My role is to take you guys through sort of the basics of object storage: what is it, when should you use it, and then what to look for when purchasing it. So from an agenda standpoint, I will just kind of give some quick background on who we are, what the state of unstructured data is, what is object storage, when should you use it and again, what to look for.
So just some background on us: we are an analyst firm, and we focus on the storage, cloud and virtualization marketplaces. Those markets cover everything from data protection to cloud storage infrastructures to cloud applications, all of the various things in those realms. We gain knowledge of these markets through product testing, real world implementations, and interactions with users and suppliers, and you can find a lot of that research on our Web site in the form of articles, briefings, case studies, et cetera. So let's talk about the state of unstructured data.
4. The State Of Unstructured Data
• Unstructured Data Growth is the
key problem being faced by both
cloud providers and large
enterprises.
• The Petabyte is the new Terabyte
• Double Digit PB Storage
Infrastructures are common
• Billions of files under management
You know, data is kind of interesting. I've been involved in storage for many years now, and in the early '90s, mid '90s, we were all very focused on online database backup and how fast we could get databases backed up. Everybody was sort of beating their chest about one-terabyte-an-hour backups and things like that, and that was big back then; I mean, that was obviously an important aspect of the data center. What is interesting is how things have changed over the last probably four to five years: most of the conversations that we have with users and providers and all of those sorts of people really focus now on unstructured data. And initially, again, five years ago, a lot of this was kind of office productivity files, things like that; but it quickly grew into all types of rich media, which includes videos and images and audio.
You know, we deal with some providers that, for example, store images, and it may cost you nothing to store the image, but then the use of that image will cost you something; or they take those images and package them into holiday cards or cases for cell phones, you know, they screen the image on the back and things like that. So the concept has changed. The other real big impact is that the quality of the image or the audio file or the video file, or even the office productivity document, has gotten significantly better.
And as a result, the capacity that each of these files consumes has grown dramatically. So if you look at the slide there, the data growth is sort of the key problem being faced by certainly cloud providers and then large enterprises. And the other big thing is the petabyte is sort of the new terabyte, if you will; especially when we're talking with cloud providers and large enterprises, dozens of petabytes, and several now in the hundred-petabyte range, is not uncommon.
And so when you're looking at data protection and making sure that the data stays viable for an extended period of time and things like that, the world changes when you're dealing with 50, 60, 70 petabytes of information. So that becomes a big problem. And so these double-digit-petabyte storage infrastructures are becoming very, very commonplace, and it is something that we deal with all the time nowadays. And again, hearkening back to my early backup days, you know, when we were trying to design a backup infrastructure, we would look for file servers that had millions of files, and that would always kind of choke the whole backup process.
Millions of files is nothing nowadays. We constantly run into sites with billions and billions of files, and so it is a challenge that we need to keep after, and it is really driving some fundamental change in how we store information. And what we've seen come out of this is an increased interest in object storage and the way that works versus the way a standard file system works. So let me kind of talk about a standard file system quickly. I am assuming most people on the phone know what that is, but it is hierarchical in nature and it is really designed for humans, right?
What is Object Storage?
Standard File System
We tend to think in folders and groups and things like that, and so it is really designed by sort of the same process that creates a database, right? I mean, you think in terms of records and fields within a record and things like that. Humans like things to be organized and in the right place, and if it's not in the right place, you know, it's a problem. The change is, though, that from a computer standpoint, a computer doesn't necessarily need all this organization, and frankly, the organization sort of gets in its way.
And so there are maybe some differences there that we need to take advantage of. So in the standard file system, like I said, the little dots represent files, but we're talking about in the billions here, and I don't feel like drawing a billion dots, if you don't mind. But the challenge to this kind of standard file system approach is, like I said, that it is really ideal for a human. You can kind of navigate your way down to the right folder and things like that. So it helps; you know, the example that I like to use is when you drop off your dry cleaning. If it was a self-service dry cleaner, you'd want your clothes hung by your last name.
So my name is Crump, and so I would want them in the C's, so that I could at least go to the C's and maybe even the Cr's, so that I could get to my clothes, right? And that is how humans need to operate. I don't want to have to go from, you know, ticket number one to ticket number 5,000 and check each one until I get to the right one, right? So that is really where this sort of file system architecture comes into place. But the challenge is that as we get into the billions-of-objects category, the file system approach doesn't scale and really starts to incur bottlenecks, right?
7. What is Object Storage?
Standard File System
• Challenges To Traditional
File Systems
๏ File system approach is ideal for
humans
๏ BUT with billions of objects the
file system approach does not
scale and incurs bottlenecks
๏ also limitations in use of
metadata
And finally, it really develops a limitation in the use of metadata, right? Metadata is data about data, or in the case of an object storage system, it is data about an object. And I might want to keep more in an object's metadata than what I do today in a typical file system, right? So if you think about the real basics in a typical file system, part of the metadata is that location, that file path setup; part of the metadata is the create date and the modified date, maybe the archive bit. And that is really about it.
And in object storage, I might want a much more robust capability, so that I can control different things than I was able to do before. And so we will talk about some of those advantages. So let's talk about what object storage is. It is essentially the opposite. You can still kind of think of an object, and I think it is simpler for most people to think of an object as a file; it is sort of the lowest common denominator, if you will. But it is not, you know, scattered across the file system in blocks or things like that. The object is the sum of all parts, so to speak.
8. What is Object Storage?
Object File System
๏ Object storage is an emerging alternative to file-based systems; ideal for
storing large volumes of unstructured data
๏ Object storage decouples data from its physical location through the use
of object IDs
๏ flat and infinite namespace makes object storage scalable
๏ Provides a foundation for other data longevity techniques
And it is ideal for storing large volumes of unstructured data. Object storage decouples data from its physical location through the use of object IDs. So if I could just go back to my dry cleaner example: what really happens when probably all of us go to the dry cleaner is we give the guy a ticket, it has a number, and they move the little conveyor belt thing to exactly that number. They don't even know your last name, more or less; they just take that number and give you the clothes associated with that numbered slot, right? Very, very efficient. They can manage thousands of customers in a very, very small shop.
So think of the same thing with object storage. Your object, or your file, essentially gets a number, like my dry cleaner gives me a number, and when you go to retrieve that object, you just identify it by this number. So that is kind of the simplest way I can think of to describe how that works. And so this flat and sort of infinite namespace really makes object storage very scalable. I don't have to worry about developing these incredibly complex paths to files and things like that. And it really provides a strong foundation for other data longevity techniques.
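The ticket analogy can be sketched in a few lines. This toy Python model is purely hypothetical (the class and method names are illustrative, not any vendor's API): the flat namespace is a single ID-to-object lookup with no directory hierarchy, and, because the ID here is a content hash in the style of CAS systems, it also doubles as an integrity check:

```python
import hashlib

class FlatObjectStore:
    """Toy flat-namespace object store (a hypothetical sketch, not any
    vendor's API): objects are addressed by an opaque ID, not a path."""

    def __init__(self):
        self._objects = {}  # object_id -> bytes; no folders, no hierarchy

    def put(self, data: bytes) -> str:
        # Derive the ID from the content itself (content addressing, in
        # the style of CAS systems); the caller keeps this "ticket".
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        # Retrieval is one flat lookup -- no directory traversal.
        return self._objects[object_id]

    def verify(self, object_id: str) -> bool:
        # Because the ID is a content hash, integrity can be re-checked
        # years later: does the object still look the way it should?
        return hashlib.sha256(self._objects[object_id]).hexdigest() == object_id

store = FlatObjectStore()
ticket = store.put(b"my dry cleaning")
assert store.get(ticket) == b"my dry cleaning"
assert store.verify(ticket)
```

Note that many real systems assign arbitrary IDs rather than hashing the content; content addressing is shown here only because it makes the "does the object still look right?" check fall out for free.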
And so I will tie in metadata here. Again, as I talked about in the file system metadata discussion, where we were thinking about the modified date and the create date and things like that, here we will have those, but we will also have, OK, when I first upload this object I want to maintain six copies on six different continents, because it's the latest video from, you know, insert your favorite rock star here. And then when that becomes less popular, in six months to a year, maybe I only want to keep three copies on three continents to service the demand and then to maintain some level of redundancy.
And then maybe in three years it doesn't really matter at all, and so I only want to keep two copies, again one for redundancy. So that is an example of some of the things that you can do with enhanced metadata. There are also things that you can do as far as compliance, making sure that you have a chain of custody developed and things like that. So there are a lot of things that you can do around metadata once you are in a more object-oriented type of environment. And you can use that to support the longevity techniques: one of the things that you can do is compare the state of an object five years from now and make sure that it still looks like it was supposed to.
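The fan-out-then-pull-back policy described here can be sketched as a simple age-based rule. The thresholds are hypothetical, taken from the spoken example; a real system would express a rule like this in the object's metadata policy rather than in application code:

```python
def replicas_required(age_days: int) -> int:
    """Hypothetical age-based placement policy from the rock-star-video
    example; real systems attach rules like this to object metadata."""
    if age_days <= 180:          # new release: peak demand
        return 6                 # six copies on six continents
    if age_days <= 3 * 365:      # tapering demand after six months to a year
        return 3                 # three copies on three continents
    return 2                     # long tail: keep one spare for redundancy

assert replicas_required(30) == 6
assert replicas_required(400) == 3
assert replicas_required(2000) == 2
```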
And then compare that object to the other objects and make sure that there hasn't been any, you know, data melting or data degradation somewhere else in the environment. So you can always make sure that the object that you uploaded stays exactly the way that it should be. So let's talk
about what we would use object storage for. There are a lot of different use cases. You know, a lot of people think of cloud providers, and clearly cloud providers are top candidates for this. It is really anything where there is a very, very high file count environment or, you know, to use the correct term, object count.
Again, we're typically talking in the billions. It is also environments where either long term data retention or data verification are required, or where some sort of multi-geography movement of data is required; and not for the kind of classic DR replication type of requirement, but more for the access type of thing. So then again, going back to the video example I used earlier, where to lower latency you might make sure it is available on multiple continents and things like that.
10. Using Object Storage
• Use cases for object storage
๏ High File Count Environments
๏ Environments where long term data retention
and data verification are required
• How to talk to object storage
๏ API Set
๏ Gateway
๏ Built-in alternate protocol support
So those are the kind of key environments that look to take advantage of these. One of the challenges that people face is, OK, how do we talk to object storage? Again, in the provider market, it is generally worth it to leverage an API set and optimize their application, whatever it is, to talk directly to object storage. That takes some time, and so people have built alternate ways to get there. A very common example is a gateway; so if you use any of the cloud file sharing solutions that either synchronize or do something through the internet, you are using a small form of a gateway.
And so you don't necessarily know that you are running on cloud storage; it does all the interfacing for you. And then finally, we are seeing an increase in what we'll call alternate protocol support, where the object storage system can essentially front-end itself as some sort of a mount, whether it be a file system mount or a block storage protocol mount. So those are typically the three ways to get to object storage, and probably the most common is gateways. But generally in the provider space, we see a pretty quick move to the API set, to be able to have direct control over it.
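The API-set route usually boils down to plain HTTP requests. A minimal sketch, assuming a hypothetical REST endpoint (real APIs such as Atmos REST or S3-compatible stores differ in paths, headers and authentication/signing): writing an object is a PUT against a flat name, and reading it back is a GET:

```python
from urllib.request import Request

# Hypothetical endpoint; real APIs (Atmos REST, S3-compatible stores)
# differ in paths, headers, and request signing.
BASE = "https://accesspoint.example.com/objects"

def build_put(object_name: str, data: bytes) -> Request:
    # Writing an object is a single HTTP PUT: no mounts, no LUNs,
    # no client-side file system provisioning.
    return Request(f"{BASE}/{object_name}", data=data, method="PUT",
                   headers={"Content-Type": "application/octet-stream"})

def build_get(object_name: str) -> Request:
    # Reading it back is a GET against the same flat name.
    return Request(f"{BASE}/{object_name}", method="GET")

req = build_put("video-1234", b"...")
assert req.get_method() == "PUT"
assert req.full_url.endswith("/objects/video-1234")
```

The requests are only constructed here, not sent; in practice you would pass them to `urllib.request.urlopen` (or use an HTTP client library) with whatever authentication the target system requires.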
11. What to look for in object storage
• Beware of Do It Yourself Solutions
o Most IT departments don’t have the time or
resources to build their own
• Look At Ingest Rates And Geo
Scalability
o Performance does matter
• Cloud storage needs more than object
file systems
o Intelligent dispersal, automation, back
office integration
So let's talk a little bit about what to look for in object storage. My number one recommendation here is: beware of do-it-yourself solutions. You know, clearly they exist; you can go get an open storage or open software type of solution, go buy your own hardware, stitch it all together, and, you know, typically an example is drawn to some of the larger providers that already exist today as people that do it themselves. What we find is, most providers frankly just don't have the time or the resources to really do that and then maintain it over time.
You are generally better off with a pre-built solution that is specifically designed for this market and doesn't require you to spend a lot of time doing it yourself. So again, I am not saying that these open solutions are necessarily bad; it is just that they're probably not practical for the large majority of people who are looking for this type of solution. The other thing that I would recommend is that you look at ingest rates and what we call geo scalability. A lot of people kind of hear the term cloud storage and they think, oh well, it's all about dollars per gigabyte and performance really doesn't matter.
Well, it does matter; the ability to get data into it at a good rate is very, very important. And also the ability to do what we call geo scalability: the ability to have multiple object storage systems throughout the country or the world, and then have data automatically go to those based on policy or whatever makes the most sense in your environment. So, you know, scaling is a critical issue, not only just from a raw capacity standpoint, but also from a performance standpoint, because the closer that you can automatically put data to a potential user, the better their overall experience is going to be.
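On the read side, geo scalability can be sketched as choosing the copy closest to the requester. The sites, regions and latency figures below are made up purely for illustration:

```python
# Hypothetical sites and latency figures, purely for illustration.
SITE_LATENCY_MS = {
    "new-york": {"us": 20, "eu": 90},
    "london":   {"us": 95, "eu": 15},
}

def nearest_site(requester_region: str) -> str:
    # Serve reads from the site "closest" to the requester; here,
    # measured latency stands in for geographic distance.
    return min(SITE_LATENCY_MS,
               key=lambda site: SITE_LATENCY_MS[site][requester_region])

assert nearest_site("eu") == "london"
assert nearest_site("us") == "new-york"
```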
And then finally, you want to look for more than just the file system itself in a cloud storage device. I mean, object storage in and of itself is important, but then it's what the vendor has done in addition to just providing an object file system. I don't want to minimize the effort in creating just the file system itself, but clearly there is more to it. So a couple of key things that I like to recommend: first, what we call intelligent dispersal. Again, that goes back to that geo scaling; it also ties to the example I used earlier where, you know, a new video comes out, or a new top ten song, or a new movie, and you want to put it in more pods initially because it's popular, and then be able to pull it back to a few pods to save money, right?
So that intelligent dispersal gives you that capability, and again, we think that's very important. Also automation: the key metric for most providers is how many full-time heads they require per terabyte or petabyte, right? So obviously, the fewer people that are required to manage the storage, the more profitable whatever the venture is; and frankly, one of the biggest challenges is just finding enough skilled storage people to operate it. So sometimes it is not even so much a money issue as it is a skills shortage issue. So having one person that can manage, you know, multiple petabytes, if not hundreds of petabytes, of information becomes a key requirement.
And then finally, you know, the integration with back office type functions, whether that be from a billing or a customer service perspective, things like that. So being able to tie into the more traditional back office functions becomes very, very important. So those are really the key things that we look for in object storage systems, the things that, you know, go beyond just the object part itself. So from my part, I'm going to stay on for questions, of course, but I want to thank you for tuning in; there is my contact info.
13. Thank you!
George Crump, Chief Steward, Storage Switzerland
http://www.storage-switzerland.com
gcrump@storage-switzerland.com
Storage Swiss on Twitter:
http://twitter.com/storageswiss
Storage Swiss on YouTube:
http://www.youtube.com/user/storageswiss
George, first of all, awesome first name, of course; and, you know, I want to kind of hand it over to you to talk about how your Atmos products tie into what we are talking about here.
George Hamilton: Oh, that's great, George. Thank you very much. That was a great overview of object storage, and I especially liked your comment that the petabyte is the new terabyte. We find kind of the same thing; it's so true. And that is really kind of the new normal for more and more companies moving forward, especially over the next decade. And they are all going to have to adapt and begin to operate at petabyte scale. As George mentioned, one of the first problems is, of course, that the sheer amount of unstructured content is growing tremendously.
14. EMC ATMOS
Object-based Cloud Storage
You see a lot of big numbers associated with that, and it will grow 50x over the next 10 years; but, you know, more so than that, there is this application shift going on at the same time, and that is what's driving the content. You know, EMC pioneered object storage almost 11 years ago with the introduction of EMC Centera. And it really gained its foothold and became such a strong product because of its ability to very efficiently archive unstructured content.
So when people had to keep a ton of e-mails, and keep them for a huge amount of time, it was just more efficient to do that in an object based storage platform, because it uses a flat addressing scheme, not a hierarchical addressing scheme; so you could scale it very simply and always access the content as long as you have the object ID. So for years, object storage really cut its teeth as an archiving platform and has been wildly successful at that. But as we've seen over the last several years, object storage is also the preferred architecture for cloud use cases.
15. What’s Driving Unstructured
Data Growth?
Unstructured data: 50X growth over 10 years; stored over longer periods of time
Application shift: Web and mobile are the de facto standard delivery and consumption models
Instant access: Demand instant access from device of choice, from any location
And so in addition to all of this unstructured data, there is this big shift in applications. We have more Web and mobile devices being used by everybody, and that is really almost the standard delivery and consumption model for content today. We are in the post-PC era; there is no preparing for it, no thinking about it, that is the reality of today. And what's driving a lot of unstructured data growth is the fact that users want to use all of these different types of applications on Web and mobile devices, and they want instant access to their content, wherever they are, whatever network connection they have and whatever device they're on.
17. Web Apps Driving Unstructured Data
Growth
According to IDC, 80% of apps today are browser-based
Register your car
Pay your bills
File your taxes
So looking at this picture, what do these three things have in common? Whether it is new cars, a bank, or the federal government, you don't think of them in terms of new and cool stuff, right? But they actually all are transforming their applications and the experience of what they do. Even myself: I renewed my license this past summer; I went online, very easily got onto their Web site, and renewed my license with a few mouse clicks. I didn't have to go to the (RMV) and stand in line for two hours to do that.
And who among us doesn't pay our bills online anymore, or do other financial transactions on a mobile device or a tablet? We can all do these things. And you know what? I just did my taxes; I took advantage of the Monday holiday and got my taxes done, and I did that all online within a couple of hours; really, really easy, and I never had to talk to a human being to do it. So these companies, which you don't think of as dot-com, cutting-edge companies, have all transformed the way people interact with, access and share the information in these apps through the use of Web and mobile type applications.
18. Mobile Devices Primary Access And
Delivery Vehicle
Smartphones outsold PCs in 2011 – Rise in BYOD
Go grocery shopping
Access your stuff
Log expense reports
Diagnose your health
And the use of these applications is driving a lot more content creation and, consequently, the need to store it. So again, as I mentioned, this is the post-PC era now; it is not on the horizon, we are in it. You know, how many devices do people in the audience have? I bet everybody out there has multiple devices; in fact, the average worker now has more than three on average. I think the average went from 2.8 a couple of years ago to 3.2 now. So most of us now have more than three devices.
And I can vouch for that, because I do have a work laptop, and I have a tablet, and I have a smart phone. Most people can probably say the same thing. So it completely changes how you interact with applications and do your actual job. From a mobile device I can order my groceries; I can do my expense reports on a mobile app delivered by my company; I can access all of my stuff. I actually don't use iCloud, but I use other cloud services. And I can access all of my stuff, do things, and share files with people very easily from whatever device I happen to be using at that time.
It really ends up being my primary device whenever I am outside of the office now, whereas a couple of years ago that was probably my laptop. And in fact, a recent survey said that about 95 percent of companies now support bring-your-own-device in some fashion. And along with that comes a sea of content that users want access to, again, wherever they are and on whatever device they have. Whether it's a sales meeting presentation, or whether it's a nurse that needs an image or a medical record, the back end storage needs to be optimized for this new consumption and delivery model.
Why File Systems Don't Work For Next Gen Applications
File systems have major disadvantages:
Locking (whole file and byte-range) is complex
Distributed (WAN) access is complex
Single-site HA clustering is complex
Geographic HA clustering is extremely complex
File system replication is extremely complex
File system security is complex
Folder/file access control, inheritance
Reliance on complex, session-based authentication
So file systems, as George kind of alluded to, are built for a very different purpose than these next generation Web and mobile and cloud types of applications. And it is not to pick on file systems unfairly; they have a very critical place in IT. But they are optimized to deliver and store structured content that is generally very tightly tied to an application and delivered over a local area network. So they are really built to perform in that context. But what makes file systems work so well in that context actually makes them a poor architectural fit when you get into native cloud and Web and mobile applications.
Because there is simply a lot of built-in complexity with file systems. And, you know, this helps with data protection and integrity, and with working with relational database management systems and their associated applications. In a primary storage system serving kind of performance-sensitive applications, you know, these complexities can be manageable. But when you operate at scale, the limitations become very apparent very quickly. When you want to scale beyond a single site, replication becomes very complex. High availability clustering, whether you're in a single-site or a multi-site environment, also gets very complex.
Security, file access, et cetera: they all contribute to the complexity of file systems. But importantly for developers, they need to factor all of this into their code, and this makes coding take longer. They have to spend cycles on mundane coding tasks rather than on application functionality, and they have to depend on storage admins quite a bit throughout that whole process. But really, for today's environment, object storage is a much better fit. As George alluded to, developers can write to an API, and that's where object and cloud storage really excels.
Top 3 Reasons Web and Mobile Apps work
better with Cloud Storage
1. Location Transparency -- use one storage system & access point
across many global apps
2. Self-managing storage – NO LUNS, never recode when systems
change
3. REST (HTTP-based) APIs – simplify and speed development
app app app
https://accesspoint.yourcompany.com
EMC Atmos Cloud Storage
So here are the three reasons that, you know, we've identified why Web and mobile applications work better with object-based cloud storage. Number one is location transparency. You use one storage system, one access point, across however many global applications you have. The application simply doesn't care where the data is located; it doesn't have that hierarchical addressing scheme or those tight relationships. The analogy that we always use in object storage is that it is like valet parking your car.
21. EMC Atmos Cloud Storage
A Platform For Next Generation Applications & Cloud
Services
File tiering/ Medical imaging/ Custom apps Package apps Storage-as-a-service
archiving/backup VNA (web/mobile)
ATMOS SDK, HTTP/S (REST), S3, CAS, NFS, CIFS
https://accesspoint.yourcompany.com
New York U.K.
You know that when you go into the restaurant, you have your ticket; you don't know where your car is, you don't care where your car is, they might even move it a couple of times while you are having your meal; but at the end of the night, you hand in your ticket and you get your car back. And object storage works exactly the same way. So you don't have to worry about the location of the data. The application doesn't care; it has the object ID, and it can get to the object. Second, it is self-managing storage. Developers don't have to have a storage administrator in the IT department provision storage for them.
There are no LUNs, there are no RAID configurations, there are no replication schemes to take into account when developing. And with that location transparency, you don't have to recode when the systems change; you've written to the API, and the underlying infrastructure doesn't matter to the application developer. That is really what cloud is all about: I want a utility; and having self-managing storage without all the complexities really helps developers achieve that. And lastly, of course, is using Web-based Web services APIs, using REST.
This simplifies application development and it speeds application development, which reduces risk to the organization. If I have applications that have less code, that makes testing easier, it makes QA easier, it speeds up development, and that reduces my overall risk, and I can get application projects out the door faster. So let me drill down a little bit on EMC Atmos cloud storage. Essentially, EMC built on the foundation of Centera, which is object based storage, but added cloud capabilities: the ability to geographically disperse content over multiple locations, Web services access and cloud-like capabilities.
So Atmos presents itself as a single global system with one global namespace. And that global namespace is accessible by multiple access methods. So developers can use the Atmos software development kit and Web services standards, as well as traditional file access protocols, to simply access this big storage pool: a distributed object pool over multiple locations, but presented as one logical system. And the primary use case we've seen people start with is maybe tiering and archiving to the cloud; but then once they've done that, and they see that it drives down the cost of storing content long term while also making it available, there is also a tremendous value in building custom Web and mobile applications; it makes that easier.
Store, Archive And Access Distributed Unstructured Data At Scale
• Single storage cloud
• Limitless scale
• Multi-tenancy
• Metadata-driven policies
• Storage-as-a-service
• Multiple access methods and APIs
• Instant access from any device
Packaged applications using those protocols can also get access to storage very simply, and you are able to deliver storage as a service; a number of service providers offer storage as a service with EMC Atmos as the back-end storage platform. So, as we say, EMC Atmos gives you the ability to store, archive and access distributed, unstructured data at scale. I will go through these in a bit more detail, but essentially it presents itself as a single storage cloud, it scales very easily to largely limitless scale, it features built-in multi-tenancy, and, very importantly, it has metadata-driven policy management, so that you can optimize the placement, retention and disposition of objects within the system and do that in an automated way according to policy.
Unique Attributes Of Object And Cloud Storage
Reduced Complexity
• Objects can live anywhere (location transparency) and are not tied to a specific underlying file server or file system
• Flat, universal namespace is "application-friendly" and allows global access to stored content from anywhere the distributed application runs
• REST (HTTP-based) APIs promote rapid application development
• Applications can easily associate custom metadata with stored objects – no need for a separate, synchronized database
• Easy to restrict access by placing files in secure sandboxes (multi-tenancy)
• Policy-driven management controls automated file distribution and access, and provides data resiliency and high availability
Increased Scalability
• Self-managing storage makes it easy to grow capacity or add new sites
• No need to provision LUNs or create file systems, mount points or shares
• No need to modify an application that's running out of space
• Near-limitless scale – just add more hardware when needed
• Automatic load balancing as new objects are stored
• Scales elastically – apps simply create and delete files as needed
You can deliver storage as a service: if you are a service provider, or an enterprise that wants to offer a self-service storage model to your organization, you can do that. And it offers multiple access methods and APIs, and gives you the ability to get instant access to storage from any device. So what are really the unique attributes of object and cloud storage? Again, as I mentioned, there's the location transparency: it's not tied to any specific underlying file server or file system. You have the flat universal namespace, which is very friendly to applications, so it allows that kind of global access to all of the content from anywhere distributed applications may run. REST-based APIs promote very easy application development; applications can very easily associate metadata with their objects, so there is no need to build a separate abstraction layer for metadata and then manage that separately. Objects and their metadata are stored together in the system. And, as I mentioned, underlying all of this is self-managing storage. It makes it very easy to grow capacity and to add new sites; you don't have to provision LUNs or create additional file systems, mount points or shares.
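Storing metadata together with the object can be as simple as extra request headers. As a hedged sketch in the style of Atmos-like REST APIs, user metadata travels in a single header of name=value pairs; the header name `x-emc-meta` and the comma-separated encoding below are simplified assumptions, and values containing commas would need escaping that this sketch omits:

```python
# Illustrative encoding of custom object metadata into one HTTP header,
# so no separate, synchronized metadata database is needed.

def encode_meta(meta: dict) -> str:
    """Pack a metadata dict into a single comma-separated header value."""
    return ",".join(f"{k}={v}" for k, v in meta.items())

def decode_meta(header: str) -> dict:
    """Unpack the header value back into a dict (inverse of encode_meta)."""
    return dict(pair.split("=", 1) for pair in header.split(",")) if header else {}

# Metadata rides along with the PUT that stores the object itself:
headers = {"x-emc-meta": encode_meta({"modality": "MRI", "status": "UserPaid"})}
print(headers["x-emc-meta"])
```

Because the metadata is stored with the object, the same keys can later drive policies (placement, retention) without any external lookup.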
Then there is no need to modify an application that may be running out of space. And of course, I mentioned the scalability: it really is limitless scale, you just add new hardware when needed. It is a node-based architecture; you simply add more nodes or more locations and it self-configures, so it's very easy to scale. It automatically load-balances as new objects are stored in the system, and you can scale up and down in an elastic way. So as an IT department, I can now operate as a service provider by giving self-service access to different business units, and I can actually manage and monitor the storage resources they use.
A Single Storage Cloud
Unlimited Applications, Services And Users
• Distributed object store
– Best fit for unstructured data
– Uses object IDs with metadata in blob storage
– No RAID groups, LUNs or file systems
• Global namespace (e.g., https://accesspoint.yourcompany.com)
– Common view independent of location
– Abstracts storage from the application
– No need to recode apps – ever!
• Multi-site active/active
– Distributes objects and access across all sites
– No dedicated replication or backup required
– Instant access to data
So, as I mentioned, Atmos acts as a single storage cloud. At its root, it is a distributed object store: a very large object storage system that can be distributed across multiple locations, but that presents itself as a single global system, managed through one pane of glass and accessed through a single global namespace. That is what makes it such a fit for unstructured data; and again, there are no RAID groups or LUNs, none of the traditional file system mechanisms, involved in it. It features that single global namespace, so you have a common view independent of location.
25. Limitless Scale
Eliminate Storage Sprawl And Downtime
• Scale-out architecture
– Node-based for instant scale
– Flex out to public clouds
– Performance scales linearly with capacity
• Operationally efficient
– Rebalance nodes to optimize performance
– Redundancy and multiple access points
– Flexible configurations to set SLAs
• Self-configures, self-heals
– Recognizes new capacity, sites, applications, and tenants instantly
– Automated self-healing
• One management view
– Web-based
– Aggregated alerting
Non-disruptively add apps, users, or capacity.
It really abstracts storage from the applications, and that is the key in fields like health care, where interoperability is such a huge thing. Being able to abstract all of your image data from all of these different picture archiving and communication systems means I can swap out a PACS, a picture archiving and communication system, for a new application and I don't have to migrate the data. I have basically abstracted that data from the applications, so I've got a single global archive that can serve multiple applications and content types. So I don't have to recode apps and I don't have to migrate my data. And it is a multi-site active architecture. As I said, in the traditional file system world you mostly have an active/passive architecture: a primary data center and a secondary data center that acts as a failover in case of major issues at the primary. Atmos works fundamentally differently in that it is multi-site active. Every site can actively serve content, and it distributes objects across all of the sites, so there is no dedicated replication and no backup required; you always have instant access to the data, even surviving a site outage.
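From the application's side, the consequence of multi-site active/active can be sketched as a client that simply tries the next access point: since every site can serve every object, a site outage becomes just a retry. The site names and the `get` callback below are hypothetical stand-ins for real endpoints and a real HTTP fetch:

```python
# Sketch: failover-aware reads against an active/active multi-site store.

def fetch_object(object_id, sites, get):
    """Try each active site in turn until one answers.

    In an active/active design there is no 'passive' site to promote;
    every site can serve the object, so losing one just means moving on.
    """
    last_err = None
    for site in sites:
        try:
            return get(site, object_id)
        except ConnectionError as err:
            last_err = err
    raise last_err

# Simulate one site being down out of two active sites:
def fake_get(site, object_id):
    if site == "chicago":
        raise ConnectionError("site outage")
    return f"{object_id}@{site}"

print(fetch_object("obj-42", ["chicago", "suburb-dc"], fake_get))
```

In practice a load balancer would order `sites` by proximity to the user, so the nearest healthy site serves the content.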
That architecture also lends itself very well to scaling. It's a scale-out, node-based architecture, so, as I said, you can simply add new sites and new capacity into the infrastructure on demand and it will self-configure and incorporate that into the environment. Performance scales linearly with capacity, which makes it very efficient; you can rebalance nodes to optimize performance, you have redundancy built in, and there are multiple access points. You have flexible configuration to set SLAs. And again, as I mentioned, it self-configures and self-heals, all of this with one management view.
Multi-Tenancy
Share Resources Across Tenants, Apps, And Users
• Maximize storage utilization
– Securely isolate and share across departments, apps, users
– Eliminate 'over-provisioning'
• Simplify management
– Set policies, SLAs and access across tenants
– Aggregate view of resources and utilization
• Improve IT agility
– Provide instant access to storage
– Empower tenants with self-service access
(Diagram: Departments A, B ... n mapped to Tenants A, B ... n, each with its own Atmos policy, across Site 1 and Site 2.)
So even if you have multiple locations, you have aggregated alerting for all of it, it is all Web-based, and you can monitor the whole environment from a single pane of glass. And even when I am adding applications or users, it is not disruptive to the system whatsoever; it automatically self-configures when you add that capacity. And of course, nothing is really a true cloud unless you can support multi-tenancy. Atmos gives you the ability to share resources across different tenants, whether that is applications, users or locations, so you can really maximize your utilization of storage while securely isolating the different tenants on the system. This eliminates a lot of over-provisioning, because tenants can simply subscribe to the amount of capacity they need, and if they don't need it any more they can simply release it back into your storage pool. It simplifies management because it is self-service, and you can also set different policies and service levels across different tenants. If you are a service provider, for example, you could offer different storage service levels for different classes of tenants. If you are an enterprise IT department, it enables you to act as a service provider, improving your agility by being able to provide instant self-service access to storage.
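The subscribe-and-release model that eliminates over-provisioning can be sketched as a toy shared capacity pool. The class and its quota arithmetic below are purely illustrative, not an Atmos interface:

```python
# Toy model: tenants subscribe to capacity from one shared pool and
# release it back, instead of each being over-provisioned up front.

class TenantPool:
    def __init__(self, total_gb: int):
        self.free = total_gb      # unallocated capacity in the shared pool
        self.used = {}            # per-tenant subscribed capacity

    def subscribe(self, tenant: str, gb: int) -> None:
        """Draw capacity from the shared pool for one tenant."""
        if gb > self.free:
            raise ValueError("pool exhausted")
        self.free -= gb
        self.used[tenant] = self.used.get(tenant, 0) + gb

    def release(self, tenant: str, gb: int) -> None:
        """Return unneeded capacity to the pool for other tenants to use."""
        gb = min(gb, self.used.get(tenant, 0))
        self.used[tenant] -= gb
        self.free += gb

pool = TenantPool(1000)
pool.subscribe("dept-a", 300)
pool.release("dept-a", 100)
print(pool.free)  # 800
```

A real system would add secure isolation, metering and chargeback on top of this basic accounting, as described in the slide above.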
27. Metadata-Driven Policies
Automate Data Lifecycle Management
• Automate data lifecycle management
– Placement, retention, disposition and expiration
– Use age and trend to drive to tier 2 storage
• Improve storage efficiency
– Set number, type and location of replicas
– Synchronous and asynchronous options
– GeoParity erasure coding 65% more efficient
• Customize at object, tenant and system levels
(Diagram: Policy A, "Status=UserPaid": multiple copies across multiple sites. Policy B, "Modality=MRI": multiple copies retained 5 years, then archive.)
So another key feature of EMC Atmos is that it includes metadata-driven policies, so you can automate the placement and retention of all of the content in the system. You are basically automating your information lifecycle management. You can set policies for placement, retention, disposition and expiration; you can use age and trend to drive content to a different tier of storage. As we always say, it is like disk to disk to somewhere else, based on a policy that I set. I'll have something on local storage for, say, 30 to 60 days, then I want to move it to my private cloud archive on premises for the next year or two, and then offload it to a third-party service provider for maybe another seven years according to some compliance mandate. You can set all of that by policy, and if you are working with an Atmos-powered service provider, you can have a hybrid model, where you store locally for a certain amount of time and then, automatically by policy, push to an internal cloud archive and then to an external third-party cloud archive on Atmos, and manage that whole process internally; again, as one global system, even when you are working with a third-party cloud provider. You can customize all these settings at the object, tenant and system levels.
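The "disk to disk to somewhere else" lifecycle just described can be sketched as an age-based policy table. The thresholds and tier names below mirror the 60-day / two-year / seven-year example from the talk but are illustrative, not Atmos policy syntax:

```python
from datetime import date, timedelta

# Hypothetical lifecycle policy: local for ~60 days, private cloud archive
# for the next two years, then a third-party provider for seven more years.
POLICY = [
    (timedelta(days=60), "local"),
    (timedelta(days=60 + 2 * 365), "private-cloud-archive"),
    (timedelta(days=60 + 2 * 365 + 7 * 365), "service-provider-archive"),
]

def tier_for(stored_on: date, today: date) -> str:
    """Return the tier an object belongs on, given its age and the policy."""
    age = today - stored_on
    for limit, tier in POLICY:
        if age <= limit:
            return tier
    return "expired"  # past all retention windows: eligible for disposition

print(tier_for(date(2012, 1, 1), date(2012, 2, 1)))  # local
```

In Atmos the equivalent rules would be evaluated automatically against each object's metadata and age, with the system moving content between tiers on its own.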
28. Manage And Deliver Storage-As-A-Service
Transform From Cost To Value Center
• Manage basic storage services
– Set pre-defined policies and SLAs
– Quota management
– Granular metering, historical trending
– Chargeback and billing support
• Develop your storage service
– Atmos Cloud Delivery Platform: open-source storage-as-a-service portal, instantly
– Monetize your solution with web services
• Transform IT
– Automate basic IT storage tasks
– Enable self-service access
40+ service providers deliver their storage service on Atmos
So the example we show here is something like an MRI, where I can say: I want this many copies stored in this many locations for this amount of time, and even automate the movement of that content to different tiers of storage. Given the amount of data out there that people have to manage, it is crucially important to automate this; there is simply not going to be enough IT staff hired over the next decade to keep up with the amount of data and do this manually. So with all of that, Atmos allows you, whether you are a service provider or an IT department, to manage and deliver storage as a service.
And for IT there is a lot of transparency that comes with this, so you can prove your value to the business: you can manage your storage services very easily and very transparently, you can granularly meter what people are using, and you can do chargeback. If you are a service provider you can integrate that chargeback with billing applications, and, as I said, you can actually deliver a storage service. So an enterprise IT department can transform IT from a cost center into more of a value center, and a service provider can get out the door with a cloud storage service very quickly. And I even have to correct this slide, because it is now over 50 service providers that deliver their storage service on EMC Atmos. What is also key to EMC Atmos is that it is API-driven storage, and we have a software development kit that provides a lot of different language bindings and code; we have a whole community building up around this now. We support multiple access methods: as I said, Web services standards like REST, and traditional access methods such as CIFS and NFS. So Atmos, at the end of the day, can serve as a location-independent and application-agnostic archive.
Application Access Methods
Atmos SDK Provides Language Bindings, Code And Sample Apps
• Access methods
– Web services: REST/SOAP
– Traditional access: CIFS, NFS, CAS
• APIs
– Atmos REST API
– Native S3 API
• Broad range of language bindings
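Because one object pool sits behind several APIs, the same stored object can be addressed in more than one way. A small sketch of the two URL shapes, namespace-style REST versus S3-style bucket/key; the host, bucket and path are hypothetical, and the real conventions come from the SDK documentation:

```python
# Two illustrative ways of addressing the same object in one storage pool.

def rest_url(host: str, path: str) -> str:
    """Namespace-style REST address: the object is named by a path."""
    return f"https://{host}/rest/namespace/{path}"

def s3_url(host: str, bucket: str, key: str) -> str:
    """S3-style virtual-hosted address: the object is named by bucket + key."""
    return f"https://{bucket}.{host}/{key}"

print(rest_url("accesspoint.yourcompany.com", "images/scan001.dcm"))
print(s3_url("accesspoint.yourcompany.com", "images", "scan001.dcm"))
```

The point is that the application chooses whichever addressing style its tooling supports, while the underlying distributed object store stays the same.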
Multiple applications can use Atmos as their archiving target, whether it is a traditional application using CIFS or NFS, or a Web-based application using REST; Atmos can be the archiving target for it. And with that comes very easy, instant user access: you can get basic HTTP access to storage from any device. We have a product called GeoDrive which allows you to create, almost, a cloud drive on a desktop and push content up into EMC Atmos very, very simply, or create a Linux mount point; and there are some browser plug-ins.
30. Instant User Access
Basic http access
–Upload and share files w/expiration
–Anonymous URLs to share files
GeoDrive Windows and Linux
–Drag, drop, stub, backup, recover, share
–Atmos Windows cloud drive – e.g. G:
–Atmos Linux mount point
Browser plug-ins
–HTML5, AtmosFox, AtmosChrome
So all of this makes it very, very easy for mobile devices and client endpoints to get access to the cloud storage. And GeoDrive is a free product, a separate add-on, but free, for users to be able to do that. So let's take a look at some of the deployment options. Atmos, at the end of the day, is basically hardware and software that EMC sells for enterprises and service providers to build a private or a public cloud service. So an enterprise can build a private cloud storage service, but then it can tap into one of those over 50 Atmos-powered cloud service providers and have a hybrid cloud model, where it stores certain content privately but then federates out to public clouds.
31. Deployment Options
There is also some additional add-on software. I mentioned Atmos GeoDrive here at the bottom, but we also have the Atmos Cloud Delivery Platform, which allows either a service provider or an enterprise to build a storage-as-a-service offering and deploy it right out of the box. It is a total turnkey solution. Many service providers will customize and build their own self-service portals and management frameworks, but Atmos provides a way for service providers to get out the door very quickly with a cloud storage service and have all of that right away.
32. Atmos Add-On Software
• Atmos Cloud Delivery Platform
– A turnkey solution to deliver and manage storage-as-a-service
– Offer self-service access and storage management
– Manage and meter utilization and bandwidth per user
– Integrate chargeback and billing
• Atmos GeoDrive
– Instant Windows and Linux access to any Atmos storage cloud
– Creates a virtual drive (Windows) or mount point (Linux)
– Enables users to upload/download/share files
And we say it's crate to credit card within 90 days. There is some customization allowed with that, too: service providers may use only certain functionality of the Cloud Delivery Platform, or they may build a totally custom solution, or they may go right out of the box and put their own branding on the Atmos Cloud Delivery Platform. And the value of a cloud storage service is really in the applications on top of it; so Atmos, through the power of the developer community it has built and its partner program, already has a lot of applications and solutions pre-integrated to work directly with EMC Atmos.
33. Private Or Public Storage Clouds
• Build and manage a private storage cloud
• Use one of the 40+ Atmos-powered public cloud services around the globe
(Diagram: custom apps federating across Atmos clouds.)
And these fall into a couple of broad categories. At the top, of course, is being able to write your own custom applications using these new development frameworks. A big use case is archiving, backup and tiering content to the cloud, with Atmos basically acting as an archive tier. But there are also file sync and share applications, with EMC Syncplicity and Oxygen Cloud, as well as medical image archiving, which can be taken even a step further by providing mobile access to archived content.
Let me introduce some examples that showcase what Atmos has done in the real world. You are probably familiar with Vistaprint; they are a company local here in Massachusetts that does a lot of custom marketing collateral and business cards. The problem they were having a few years ago was explosive growth in the amount of content they had to store, with applications and users globally distributed. They really needed to get their costs low while presenting a very high level of service.
But they also needed to provide different levels of service by customer type, and that was something they weren't really able to do. You couldn't tier the service for one company that might order business cards once every two years versus a customer that was interacting with their applications on a daily basis, printing different collateral and doing things on a more regular basis. They couldn't offer a tier of service and better monetize the service. So what they did to resolve their storage problem and gain the ability to tier the service was to implement EMC Atmos across three different locations, which allowed them to start sharing their resources and distributing their objects across those multiple sites.
It also helped them accelerate the development of their Web-based application, because it is all browser-based, and then to build SLAs and tier their service according to the user type, which gave them more of a competitive edge. So being able to automate the service levels by customer type was a big thing for them, but they also simply saved on storage costs: over $1.4 million a year in storage savings. Using an object-based platform they could distribute across multiple locations, which is much more efficient. And because they were able to distribute that content across multiple systems without all of the replication that goes along with trying to do that with traditional file systems, they were able to save about $300,000 a year in bandwidth costs.
Atmos Integrated Solutions
• Custom applications
• Archive to the cloud (SourceOne)
• Cloud backup and recovery (NetWorker)
• Content management and collaboration (DCTM)
• File tiering (CTA, DiskXtender)
• Medical imaging archive
• Mobile and file sync and share
• Server gateway
• WAN optimization and fast file transfer
So not only were they more efficient on the storage side, but they are also using their network more efficiently. If there is any other information that people need, I'd recommend they go to the Atmos page at EMC.com, where there are also multiple Webcasts they can view. I would also head to AtmosOnline.com, where you can not only access the Atmos blog but also interact with a developer sandbox that we have available there; you can actually interact with the EMC Atmos storage platform, work with developer tools, configure the storage space and do some development work.
Atmos: Custom/Traditional Applications – Vistaprint
Situation
• Efficiently scale and manage 100%+ digital media growth
• Applications and users globally distributed
• Competitive pressures to retain low cost/high service
• Provide different levels of service by customer type
Solution
• EMC Atmos cloud storage across 3 distributed locations
• Efficiently shared resources and distributed objects across sites
• Accelerated development with Atmos Web Services and REST API
• Automated SLAs by user type for competitive edge
Business Benefits
• Automated service levels by consumer type
• $1.4M / year storage savings
• $300,000 / year bandwidth savings
36. EMC Atmos Resources & Tools
• Datasheets, white papers, videos: www.emc.com/atmos
• Webcasts: www.brighttalk.com/channel/7397
• Stay up to date: Twitter (www.twitter.com/emcatmos), blog (www.atmosonline.com)
We have offered that for people to get familiar with the platform. And again, access to the developer community is available on AtmosOnline.com as well, where you'll get access to all the different language bindings, code examples and developer forums. And with that, we can open it up for Q&A.
37. (Don Keefe): We are now going to move on to the Q&A portion of today's Webcast. If you have any questions for our speakers today, you can enter them by clicking on the questions tab on the left-hand side of your screen, and then click on the submit question button. We'll answer as many questions as time allows. OK, we did have a couple of questions come in during the Webcast. The first one: this person would like to know what exactly a global namespace is.
George Hamilton: What exactly is a global namespace? Well, think of the world of file systems: very simply, a namespace is an abstraction of a file system resource, perhaps a file server. In the old way of doing things, multiple file servers would each have their own namespace, and so developers would write to a particular namespace; they would need to know the location of the data, to find out what server it is on, so they could write to that particular namespace.
A global namespace virtualizes all of the different file system resources underneath it, so again, it gets rid of the location dependency. The single global namespace abstracts every resource underneath it; the developer only needs to write to that single global namespace and doesn't care about the location of the data underneath. That is really the simplest way to say it.
38. (Don Keefe): OK, and with the next question, this person would like to know if you can describe active/active architectures in a little more detail.
George Hamilton: Sure, and I'll use a good example, too. We have a customer, the University of Illinois Health Science System in Chicago, with EMC Atmos running at two locations. If they were in an active/passive environment, their data center in Chicago would be their primary data center, and their data center located about 20 miles outside of Chicago would be their secondary data center.
They would have to implement a replication and backup scheme that replicates everything to the second data center, and they would typically use snapshots and other technology to make sure both data centers could serve content in case of a failure. But if you do have some sort of outage that requires a failover, there is usually a process you have to go through, and it can take a couple of hours before you can actually fail over to the other data center in an active/passive environment, depending on how you're configured. With active/active, what we do is distribute the objects across every node and every location in the infrastructure.
So there isn't a node or a location that is in passive mode; they are all active. Using load balancing, you can actually take requests from end users and serve the content from whichever data center is closest to the end user. So you are getting more usable capacity that way, because both sites can serve content, but they can also survive a site failure. We use GeoParity to stripe data across both locations in the infrastructure, or across more than two data centers.
And regardless, even if you suffer a site failure, you can still serve content. It is just a more efficient use of storage, because you don't have as much redundant storage and you don't have as much overhead as you get when you are trying to replicate a traditional file system. So it is a much more efficient use of storage, and in the case of the University of Illinois, they told us that they have been able to take down their primary data center in the middle of the day to do maintenance; they'll take down a system in the middle of the day and the end users don't even notice.
They said that is a huge help to them, because they don't have to come in on nights and weekends to do the routine maintenance; they can do it during the middle of the business day.
(Don Keefe): OK, and you mentioned geo-parity, and that ties into the next question this person had. They said: I've heard that Atmos has a unique data protection scheme called GeoParity; how is that different from RAID?
George Hamilton: Right, good question. RAID at its core is parity, so if you think of the difference: if RAID is parity, what is GeoParity? Well, it's parity with geographic distribution. If you are working with RAID, you are striping data across multiple disks within an array, and in order to get additional redundancy you have to replicate, such as in an active/passive fashion. And you are not only replicating the data, you are replicating parity and any other content that is going to help you rebuild data in case you lose disks.
That's how it is set up. As you scale that, though, you are not only adding disk capacity for the actual storage of data, you are also growing the mechanisms by which you recover that data; so you keep having to add more overhead as you try to scale out a RAID-based system. But when you have a single global system in an active/active architecture using GeoParity, you can do that same striping of objects across every node and location within the infrastructure, and you're not increasing overhead as you scale; it stays the same.
So it is much more efficient; that is the basic difference: a single location versus more than one location. GeoParity does what RAID does at a disk level, but over a whole distributed infrastructure spanning multiple locations.
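The intuition behind parity-based protection can be shown with the simplest possible erasure code: one XOR parity fragment over k data fragments, so any single lost fragment (standing in for a disk, a node or a whole site) is recoverable from the survivors. GeoParity itself uses a more sophisticated erasure code spread across sites; this sketch only illustrates the basic k+1 idea:

```python
from functools import reduce

def split_with_parity(data: bytes, k: int) -> list:
    """Split data into k equal fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    frags = [data[i * frag_len:(i + 1) * frag_len].ljust(frag_len, b"\0")
             for i in range(k)]
    # Parity byte = XOR of the corresponding byte in every data fragment.
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags + [parity]

def recover(fragments: list, lost_index: int) -> bytes:
    """Rebuild one lost fragment by XOR-ing all surviving fragments."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

frags = split_with_parity(b"patient-scan", 3)
assert recover(frags, 1) == frags[1]  # lose any one fragment, rebuild it
```

The key property the speaker describes follows from this shape: protection overhead is the parity fragments, a fixed fraction of the stripe, rather than a full replicated copy that grows with every site you add.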
(Don Keefe): OK, and we do have time for one more question today. You had mentioned the REST API and S3; do you have an SDK for these?
George Hamilton: Yes, as I mentioned, if you go to AtmosOnline.com you can access our developer network. You can get actual access to a sandbox of Atmos to work with, as well as all of the software development tools, the developer forums and the language bindings; all of that is available online at www.atmosonline.com.
(Don Keefe): OK. As I mentioned, we are out of time, but I would like to thank today's speakers, George Crump and George Hamilton, for taking the time to join us today. I would also like to thank today's sponsor, EMC, for making this event possible. And as always, I would like to thank you, the audience, for taking the time to join us today. This is (Don Keefe); have a great day.
END