50 SHADES OF GREY IN SOFTWARE-DEFINED STORAGE:
CHOOSING THE STORAGE INFRASTRUCTURE THAT BEST FITS
YOUR BUSINESS NEEDS
BY
JON TOIGO
CHAIRMAN
THE DATA MANAGEMENT INSTITUTE
SUMMARY
Software-Defined Storage (SDS) has become a meme in industry and trade press discussions of
storage technology lately, though the term itself lacks rigorous technical definition. Essentially,
SDS is touted as a model for building storage that will work better with virtualized workloads
running under server hypervisor technology than do “legacy” NAS and SAN infrastructure.
Regardless of the veracity of these claims, the business-savvy IT planner should base his or her
choice of storage infrastructure not on trendy memes, but on traditional selection criteria: cost,
availability, and simplicity.
© COPYRIGHT 2015 BY THE DATA MANAGEMENT INSTITUTE, LLC. ALL RIGHTS RESERVED.
INTRODUCTION
IT planners have been hearing a lot about software-defined storage (SDS) lately -- and how it is
supposed to replace all of the “legacy storage” that companies have deployed for the past
decade or so, including Storage Area Networks (SANs) and Network Attached Storage (NAS)
appliances. Unfortunately, many of the arguments advanced by evangelists for adopting SDS
are only partially grounded in fact or have nothing whatsoever to do with improvements in
storage performance, allocation or utilization efficiency.
The purpose of this paper is to quickly review the arguments for SDS, to provide the proper
context for analysis of the technologies, and to help guide the business-savvy IT manager or
planner to basic common-sense criteria for evaluating the various products available to
implement the SDS model if it is deemed suitable. In the final analysis, the kind of storage
infrastructure that is appropriate for a data center, branch office or small business equipment
room should be determined by what the application workload requires, first and foremost.
WHY SDS?
Software-defined storage is touted by evangelists as the fix for what ails traditional or legacy
storage – as the next step in storage evolution. It is unclear what that means beyond the
obvious marketing appeal of flowery rhetoric. If there is any sort of evolution in storage, it has
been the movement away from isolated “islands” of storage that created obstacles to accessing
and sharing stored data, and toward storage infrastructure models that enabled greater data
sharing.
In the 1960s and ’70s, the predominant forms of storage were internal storage (storage devices
mounted inside the server frame itself) and direct-attached storage (a crate of disk drives
connected to the outside of the server cabinet via an external extension of the server
mainboard bus). From those implementation models, storage technology expanded in two
directions: one supporting the storage and sharing of file-based data across a network; the
other supporting the storage and sharing of block data from database and transaction-oriented
systems. Block data storage “evolved” from island architectures (internal and directly attached
storage arrays that were accessed only through the server itself, making stored data difficult to
access, share or scale efficiently) to shareable arrays – that is, arrays with multiple ports for
connecting many server hosts. Scaling these systems, however, became problematic, so
engineers figured out ways to attach storage arrays into a switched fabric (a sort of rudimentary
network) using a serial SCSI protocol such as Fibre Channel or SCSI over IP (iSCSI). The ultimate
goal was to enable the storage infrastructure to be shared by all applications by creating a true
“storage area network” or SAN.
However, the holy grail of an open and heterogeneous SAN never appeared in the market.
Vendors preferred to keep their storage equipment sales profitable by adding functionality to
the proprietary controllers installed in their external arrays, and never fully participated in any
sort of scheme to enable different equipment from different vendors to be managed in concert.
Absent common management, the fabric SAN was incredibly pricey to own because of all the
proprietary value-add software on the proprietary controllers, and to operate because of the
lack of a universal management scheme and the need to embed expensive storage experts in IT
staff.
The above situation already existed when server virtualization took hold in the early 2000s. This
made legacy storage an easy target for server hypervisor vendors to blame for application
performance issues that arose following the consolidation of application workloads on fewer
server platforms.
Truth be told, server virtualization made whatever inefficiencies already existed in SANs and
NAS worse. Consolidating applications onto fewer servers dramatically
changed the amount of I/O emanating from a single server, requiring on average the addition of
7 to 16 I/O connections per server. Hypervisor computing also changed the traffic patterns on
networks and fabric interconnects and switching systems.
Hypervisor vendors assured customers that many good things would accrue to this model. For
one, virtual machines could be replicated from host to host and joined into highly available
clusters. Virtual machines could also move around server to server to distribute load more
efficiently and to provide high availability.
However, consolidating many virtual machines onto a single server host, then migrating
workloads at will from one physical server to another, created challenges in maintaining
connections between applications and their stored data. When an application moved, an
administrator often had to intervene to provide it with a new route to its storage shares
from its new server perch. Moreover, this model introduced a new
problem that resulted from combining the I/O from multiple workloads in the same server –
something that the industry has started to refer to as the “I/O blender effect.” In operation,
every hosted application simply writes its data into a shared I/O path, where the separate
streams interleave into effectively random I/O. Without special technology to intervene, this
randomized I/O will eventually clog memory and disk storage devices and slow applications to a crawl.
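The blender effect is easy to demonstrate in a toy model. The sketch below is illustrative only; the stream sizes, the uniform interleaving policy, and the helper names are assumptions of the model, not measurements from any hypervisor. It interleaves several per-VM sequential write streams into one shared path and measures how much sequentiality survives:

```python
import random

def blended_stream(vm_streams, rng):
    """Randomly interleave per-VM block-address streams into one shared
    I/O path, roughly as a consolidated hypervisor host would."""
    streams = [list(s) for s in vm_streams]
    blended = []
    while any(streams):
        stream = rng.choice([s for s in streams if s])  # next VM to issue I/O
        blended.append(stream.pop(0))
    return blended

def sequential_fraction(addresses):
    """Fraction of requests that target the block right after the previous one."""
    hits = sum(1 for a, b in zip(addresses, addresses[1:]) if b == a + 1)
    return hits / (len(addresses) - 1)

# Four VMs, each writing a perfectly sequential range of blocks.
vms = [list(range(i * 10_000, i * 10_000 + 500)) for i in range(4)]
mix = blended_stream(vms, random.Random(42))

print(sequential_fraction(vms[0]))  # 1.0: each VM's own stream is perfectly sequential
print(sequential_fraction(mix))     # typically around 0.25 here: the blend looks
                                    # nearly random to the disk
```

Each stream is perfectly sequential on its own, yet the blend that actually reaches the storage device is dominated by seeks, which is precisely why rotating media and even write-coalescing caches struggle under consolidated workloads.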
With application performance issues becoming the number one complaint of hypervisor
computing customers, hypervisor vendors pushed several explanations that blamed “legacy
storage” inefficiencies and began arguing for “ripping and replacing” older storage
infrastructure with something new, which they came to call software-defined storage.
HYPERVISOR VENDOR ARGUMENTS OFTEN FLAWED, BUT SDS IS VIABLE
NONETHELESS
A survey of the literature shows that hypervisor vendors have advanced five reasons to
de-provision legacy storage. Some are valid arguments, while others are based on half-truths or
downright falsehoods.
1. “Slow application performance due to ‘legacy SAN (and NAS) infrastructure’” – This is
often a falsehood. You can see whether the storage I/O path is causing application
delays rather simply by checking storage I/O queue depths. In many cases, queue
depths are not significant (meaning that there is no data waiting in line to be written to
disk). When this queue depth reading is combined with a processor cycle or processor
activity measurement showing high rates of processor cycling, a more logical conclusion
is that a logjam exists in “raw I/O” – the communications path between CPU and
memory where application processing occurs. If that is the case, the application
slowdown is a function of either application code or hypervisor code, not storage at all.
2. “High cost of storage (OPEX & CAPEX) due to proprietary storage” – Legacy storage was
and is without a doubt expensive. Hardware vendors argue that value-add software
functionality added to array controllers is a differentiator of their products and
represents substantial R&D investment that must be recouped. Unfortunately, value-add often adds
configuration complexity and expensive administrative staff skills requirements, and
sometimes interferes with unified management of the kit as part of infrastructure.
3. “Poor utilization efficiency and management difficulty due to lack of common
management” – Per #2, proprietary value-add storage is difficult to manage in common,
and this has long been a reason to argue for infrastructure that might deliver better
management.
4. “Lack of agility due to legacy infrastructure” – Allocating and de-allocating storage
resources and services is much more difficult with proprietary and complicated value-
add storage. However, this has nothing whatsoever to do with application performance.
5. “DAS storage outperforms SAN” – This is not true and makes no sense, since both NAS
and SAN are also direct-attached storage configurations. NAS is storage directly attached
to a thin server that provides network-based access to data. A SAN is direct-attached
storage with a physical or fabric-layer switch that makes and breaks server/storage
connections at high speed, giving the illusion of network attachment.
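The queue-depth check described in point 1 can be approximated on a Linux host from /proc/diskstats, whose twelfth field reports the number of I/Os currently in progress for each block device. A minimal sketch follows; the helper name is mine, and a single reading is shown only for brevity where sampling over time would be sounder:

```python
def inflight_ios(diskstats_text, device):
    """Return the 'I/Os currently in progress' counter (field 12 of
    /proc/diskstats) for the named block device, or None if absent."""
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) >= 12 and parts[2] == device:
            return int(parts[11])
    return None

# Usage on a live Linux host:
# with open("/proc/diskstats") as f:
#     depth = inflight_ios(f.read(), "sda")
# A depth that stays near zero while CPU cycling is high suggests the
# bottleneck is not in the storage path at all.
```

If the counter stays near zero while processor activity is high, the logjam is in raw I/O between CPU and memory, which is the paper's point: the slowdown is application or hypervisor code, not storage.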
Valid or not, the arguments of hypervisor vendors regarding the need to abandon legacy storage
have resonated. In its place, vendors are recommending something called software-defined
storage – which is, once again, storage that is physically, directly attached to virtual server hosts.
The only difference between this and direct-attached storage in the past is that value-add
storage services (software) are not located on the controller of the external array, but are
instead instantiated in a software layer in the server, usually as part of the hypervisor vendor’s
software stack.
Vendors can’t agree, of course, on a single definition for software-defined storage. Instead, it is
simply a panacea – a cure-all for everything that is wrong with “legacy storage” for use in a
software-defined data center. The key difference is where the intelligence is located: is it on
the array controller or is it somewhere in the server hypervisor software stack?
Actually, there are two sets of discriminators to consider when looking at the many SDS
products from hypervisor vendors and independent software developers that have appeared in
the market of late.
Some SDS solutions are hypervisor-dedicated or hypervisor-dependent. Certainly, this is the
case with VMware’s Virtual SAN, which only works with the company’s proprietary hypervisor.
To a lesser extent, Microsoft’s Clustered Storage Spaces is proprietary to Microsoft, though they
do claim to be able to share their storage with VMware (you just convert your VMware
workload into Microsoft VHD format and import it into Hyper-V so you can share the Microsoft
SDS infrastructure!).
At the other end of the spectrum are the many third-party SDS offerings, including StorMagic’s
SvSAN technology, that are hypervisor-agnostic. Vendors of these solutions are earnestly trying
to make their software-defined storage infrastructure work with multiple hypervisors and, in
some cases, with non-virtualized workloads as well. Some implement a common storage
environment into which the data from different hypervisor workloads can be written, while
others enable common management of storage infrastructures created with their software but
dedicated to specific hypervisor workloads and their data. Either way, they are more robust
than hypervisor-dedicated solutions and offer a better fit for companies that are deploying
more than one hypervisor or that may do so in the future.
The other differentiator to consider is hardware dependency versus hardware agnosticism.
Despite the ideology of software-defined (“use any hardware you want”), the truth is that some
software-defined solutions have fairly rigid requirements when it comes to hardware
components and topology. Some of the latest “hyper-converged infrastructure appliances,” for
example, are just another kind of proprietary storage array. Logically, these products would be
placed on the hardware-dependent side of the SDS solution spectrum. Hypervisor-dependent
SDS products like VMware’s EVO:RAIL are also hardware-dependent, and even VMware’s
Virtual SAN carries with it some fairly exacting and expensive hardware and topology
requirements, including a minimum of three storage nodes.
In the middle of this spectrum are a number of third-party virtual SAN SDS solutions which,
unlike their hypervisor-software cousins, do not require special configurations or limit what kind
of disk or what kind of flash devices you can use. StorMagic, for example, provides a solution
that can embrace a wide range of hardware components.
On the hardware-agnostic end of the spectrum are storage virtualization software products. To
SDS purists, storage virtualization is not software-defined storage, because SDS seeks to
decouple storage capacity from value-add storage services, functionality and performance;
capacity virtualization is not part of their SDS model.
In general, if you are considering an SDS solution, your storage infrastructure requirements are
usually best served by pursuing hardware- and hypervisor-agnostic SDS solutions. That decision
is more likely to enable you to customize your storage gear selections to your needs and budget
and to consolidate the management of different storage infrastructures supporting different
workloads.
If you are operating a large shop, such as a managed hosting environment or a large enterprise
data center, moving to a Virtual SAN may actually be a step backward on the evolutionary chart.
A real SAN, assuming that one is ever delivered to market, will deliver all of the capabilities and
cost metrics that you need for such environments.
On the other hand, if yours is a small to medium sized business or if you are tasked to provide
storage infrastructure for remote or branch office operations, big centralized storage may not be
your best or most strategic infrastructure choice. You may be attracted to VMware’s Virtual SAN
or Microsoft’s all-in-one Clustered Storage Spaces SDS solution as a template for building your
storage.
That is, until you find out what it will cost.
The fact is that VMware’s Virtual SAN only works with VMware vSphere workloads and requires
you to limit your hardware to a specific list of supported equipment. The same is true of
Microsoft’s Clustered Storage Spaces model, which is heavily dependent on SAS storage. Such
equipment requirements may not be a deal breaker for firms that want a simple single vendor
solution, but they do limit options. Planners may not be able to take advantage of newer
storage technologies or leverage existing investments in storage equipment that they have
today.
More to the point, the cost of implementing a hypervisor dependent storage stack can be
prohibitive. A recent lab evaluation published in a popular computing publication placed the
cost of a basic Virtual SAN implementation, which requires three nodes minimum, at $11K to
$14K in software licenses per node and about that much again for the hardware that is approved
by VMware for use in their infrastructure. The prices for Microsoft are significantly lower, but
there is still a minimum three-node requirement and a preference for PCIe flash and SAS disk drives
rather than SATA.
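The floor cost of such a cluster is simple to work out from the paper’s own figures. The helper below is purely illustrative; the per-node dollar amounts come from the lab evaluation cited above, and the assumption that hardware roughly matches license cost is the paper’s, not a quoted price list:

```python
def min_cluster_cost(nodes, sw_per_node, hw_per_node):
    """Rough floor cost of a hypervisor-dependent SDS cluster:
    every node needs both a software license and approved hardware."""
    return nodes * (sw_per_node + hw_per_node)

# Three nodes minimum, $11K-$14K in licenses per node, and about the
# same again for vendor-approved hardware:
low = min_cluster_cost(3, 11_000, 11_000)    # 66000
high = min_cluster_cost(3, 14_000, 14_000)   # 84000
print(f"${low:,} to ${high:,}")              # $66,000 to $84,000
```

In other words, the entry price for a hypervisor-dependent virtual SAN lands somewhere between $66,000 and $84,000 before a single application byte is stored.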
One significant benefit of using a third party Software Defined Storage solution is that it can
insulate the planner from hardware lock-ins and usually does not require a three-node
minimum hardware cluster. SDS can be implemented initially as a much less expensive
two-node cluster.
For the smaller shop, or the ROBO environment, the independent software vendor’s SDS
solution might be just the thing to test the value of software defined storage to application
performance.
CONCLUSION
The choice of software-defined storage, whether from your hypervisor vendor or from an
independent developer like StorMagic, should be a function of its fit with deployed (and
likely-to-be-deployed) applications and their requirements, with strategic goals for
infrastructure and data management, and with budget, staff skills and other practical boundary
conditions. It is a good idea to determine what your needs are before you seek out an SDS solution.
It is worth noting that, in the absence of uniform standards or guarantees of interoperability
between different software stacks, IT planners may be looking at a need to deploy and manage
different SDS solutions to meet different requirements. A product aimed at large enterprise
shops and hosting environments may not be well suited to deployment in a ROBO environment,
for example.
It is a good idea to try the SDS solution before you buy it. StorMagic offers a 60-day trial version
of its product, StorMagic SvSAN, that you can access from the company website
(http://www.stormagic.com/60-day-free-trial/).