A Huginn Consulting Technical White Paper
The End of Appliances
Creating Compute Intensive Storage Solutions for
Integrated IT Infrastructures
Improving IT efficiencies, reducing costs and simplifying management are common goals in enterprise
data centers. IT service providers play a key role in helping businesses achieve these goals. In IT
Infrastructure as a Service (IaaS) environments, reliability and uptime are critical: the impact of a single
failure can be devastating to the multitude of businesses using the hosting service.
As such, our goal was to create a fully integrated IT infrastructure and, through testing, isolate and
quantify the performance gains of adding memory and compute power to the server and storage
systems. A team of engineers and architects from Huginn Consulting and Genesis Hosting, an IaaS
provider, built and tested a prototype of a truly integrated IT infrastructure environment. This Proof of
Concept (POC) demonstrated the steps for building and deploying an infrastructure for a genuinely
responsive, service-oriented IT organization in 90 days or less. Complete with servers, networking, and
storage, the model also integrated three diverse and essential IT control elements: self-provisioning,
self-service and automated management.
Our POC shows how to create efficient, high performance compute and storage intensive solutions,
including data de-duplication and compression, from the generic storage, server and network resources
of a virtualized and integrated IT infrastructure. These solutions fit comfortably into the service
provider’s business model, and can easily be integrated into the existing management framework.
“IT infrastructure is moving quickly toward becoming delivered through a service
model. Machines are becoming virtual, running in secure data centers on large,
partitionable machines. Self-provisioned virtual IT resources are the key to success for
a service model, which requires all aspects of the physical hardware to be abstracted
or partitioned with permissions and given only to a tenant of the service.”
Eric Miller, CEO, Genesis Hosting
To begin, we defined the drivers behind the prototype POC to include three critical aspects for IT service
providers:
1. The business model for IaaS providers must be centered on a self-service, self-provisioning
model.
2. To effectively manage and share these solutions across multiple customers and applications in a
self-service hosting environment, the hosting infrastructure needs to be fully virtualized at all
levels.
3. Self-service in a shared service hosting environment requires a new management stack.
We tested two aspects of the integrated infrastructure model. First, we tested the requirements for the
management stack in an IaaS service provider infrastructure through the design of a prototype self-
service management facility. The results of this study can be found in the Huginn Consulting report
titled, “Enabling Self-Service and Self-Provisioning in an IT Infrastructure.”
The second test analyzed the creation of compute and storage intensive solutions in this infrastructure.
These solutions must be available for self-provisioning within the tenant’s VDC (virtualized data center),
which we will explore further in this study. The POC was tested in a service provider environment at
Genesis Hosting using an infrastructure model identical to Genesis’ deployed infrastructure.
Findings from this POC validated the proposition that computing resources can be added to storage to
create much more cost-effective storage systems and solutions in IaaS environments:
1. A two-thirds reduction in storage consumption can be realized from intensifying the
compute resources dedicated to storage management and operation.
2. Adding compute resources to storage arrays can improve the resilience of storage
infrastructure by supporting full performance with RAID-6 protection.
3. The time required for array recovery and maintenance operations, including RAID rebuilds and
LUN expansion, can be reduced by approximately 80%.
The Service Hosting Business
Profitable and efficient service hosting relies on large-scale infrastructures, a high degree of utilization of
the resources, and a completely integrated infrastructure based on virtualization of servers and storage.
The service provider builds and manages the shared infrastructure. A self-service UI and a virtualization
client enable the tenant to provision and manage its leased virtual data center, which is a collection of
VMs (virtual machines) with storage and resources running licensed applications or solutions. The
resources are leased on a time basis.
The scale, utilization, efficiency and simplified management of this infrastructure enable the service
provider to deliver IT services at a cost lower than that of running these services in house.1
Computational power, in the form of CPU cores and RAM, is becoming abundant at lower price
points. In a fully virtualized IT infrastructure that supports the self-service model, these
resources can be applied easily, even in storage intensive solutions. Data de-duplication and
compression have long been used in secondary storage applications, backup, archiving and more. These
technologies are now available in software products, including open source software. The availability of
compute power, and faster and cheaper RAM makes these technologies applicable for primary storage
applications with surprising affordability and performance.
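To make the mechanism concrete, the following Python sketch shows how in-line de-duplication and compression interact: incoming data is split into fixed-size blocks, duplicates are detected by content hash, and only unique blocks are compressed before being stored. This is a toy model of ours for illustration only; the block size, hash choice and store layout are assumptions, not details of NexentaStor or any other product named in this paper.

```python
import hashlib
import zlib

class CompactingStore:
    """Toy in-line compacting store: de-duplicates fixed-size blocks by
    content hash, then compresses each unique block before storing it."""

    BLOCK_SIZE = 4096

    def __init__(self):
        self.blocks = {}    # content hash -> compressed unique block
        self.logical = 0    # bytes written by clients
        self.physical = 0   # compressed bytes actually stored

    def write(self, data: bytes) -> list:
        refs = []
        for i in range(0, len(data), self.BLOCK_SIZE):
            block = data[i:i + self.BLOCK_SIZE]
            self.logical += len(block)
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:        # de-duplication step
                packed = zlib.compress(block)    # compression step
                self.blocks[digest] = packed
                self.physical += len(packed)
            refs.append(digest)
        return refs

    def read(self, refs: list) -> bytes:
        # Expansion on read: decompress each referenced unique block.
        return b"".join(zlib.decompress(self.blocks[d]) for d in refs)

store = CompactingStore()
payload = b"All work and no play makes Jack a dull boy. " * 2000
refs = store.write(payload)
assert store.read(refs) == payload
ratio = store.physical / store.logical   # far below 1.0 for redundant data
```

Both operations cost CPU and RAM on the data path, which is why the write-performance results later in this paper depend so strongly on the resources given to the compacting VM.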
Compute Intensive Storage Solutions for Self-Service
The service provider’s business model relies on being able to support customers with a single, virtualized
and integrated infrastructure. The customers’ needs are met by creating and running the customer’s
VDC in the infrastructure. All applications, solutions and services are implemented as VMs with
provisioned servers, storage and other resources from the infrastructure’s resource pool.
Compute intensive storage solutions must follow the same model in order to meet the service providers’
requirements. Solutions created by first provisioning a VM with generic server, network and storage
resources, then installing the software into this VM, are ideally suited for the IaaS business model. These
new solutions can be added simply by integrating the new software into the licensing and billing system,
making these available for self-service by the tenant, and adding generic storage and server capacity to
the existing resource pools.
Solutions that rely on adding hardware appliances built for a specific purpose to the infrastructure are
much more difficult to integrate. Purpose built appliances also lead to less than optimal scaling because
they often have interfaces not optimized for this type of deployment. New products must be added as
separate resource pools, which results in a management framework that gets progressively more
complex. Resource efficiency is also compromised as it becomes more difficult to limit idle resources.

1. For a more detailed overview of the service provider business model and the required infrastructure, please refer to the
companion report: “Enabling Self-Service and Self-Provisioning in an IT Infrastructure.”
Protection, Protection, Protection
Service hosting and virtualization implies an increased concentration of tenants and users supported by
a single physical infrastructure. As a result, the effects of failures and data losses are amplified. A single
failure can be catastrophic for thousands or even tens of thousands of users or numerous organizations.
Service hosting companies are thus at greater risk from failures and must provide much stronger
protection of customers’ data. As a consequence, all data must be carefully protected against double
disk failures, as well as other storage-related failure modes. This requires increasing storage system
computational power. Genesis Hosting exclusively deploys storage arrays configured with RAID-6 to
protect its customers against double disk failures.
Infrastructure for Self-Service Solutions
Effective service hosting with a self-service model implies virtualization at all levels of the infrastructure:
storage, networking and servers. This is the only way that shared physical compute and storage
resources can be integrated into one infrastructure; this approach enables the most scalable, flexible
platform for IT solutions. This infrastructure can then be partitioned into the logical application entities
or solutions that customers can provision and manage for themselves. Self-service enables service
hosting organizations to build a single scalable and integrated infrastructure of servers, storage arrays,
network equipment and more—all of which can be managed from a single console. Customers are able,
through service portals and management clients, to provision, implement and manage their own VDCs.
The entire premise of this POC revolves around the integrated multi-layered infrastructure enabled
through virtualization. The focus of this report is to document two aspects of the POC:
1. The requirements for storage to be deployed in a self-service infrastructure.
2. The creation of more efficient storage solutions within the framework of the self-service
infrastructure.
Designing a Proof of Concept
The hypothesis that intensifying the compute resources in storage solutions, both in VM-based
solutions and inside the underlying storage, would yield significant benefits was shaped by the
following questions:
1. How can compute resources be used to improve the performance, efficiency and user
experience of storage solutions? What types of compute intensive storage solutions can be
created?

Figure 1: POC Physical Configuration
2. Which architecture is required by an IT infrastructure that supports these solutions and the self-
service hosting model?
Structure and Organization
The prototype infrastructure for the POC was constructed at Genesis Hosting’s facilities. The
architecture was chosen to mirror Genesis’ production infrastructure where Genesis’ customers are
provisioning and building their VDCs and running their applications. In fact, the POC team operated as a
typical Genesis customer. The prototype was configured as a VDC where the team members provisioned
resources and built the VMs used for the compute and storage intensive services tests in this POC.
The Hardware Components
• NEC Express5800/A1080a (GX) server: The server is configured with four compute modules,
each with two Intel “Westmere” processors and 128 GB of RAM.
• The new NEC M100 and the previous generation NEC D4 storage arrays. Both were configured
with 7.2K RPM SATA disks. Performance of the two arrays was used as a measure for the benefit
of increasing the compute and storage intensity factors.
o All tests were run on RAID-6 configured LUNs.
o Two disk configurations were used: 6 disks, 12 disks.
• The Qlogic 8/4 GBit FC switch connected servers and storage.
• NEC 1Gbit Ethernet ProgrammableFlow (PF) switch provided connectivity for system
management.
The Software Stack
• All software was run on VMs in VMware vSphere 4.1 environments on the NEC GX server.
• The Blackball Search-In Software indexing engine and the Microsoft Exchange JetStress load
generator were run on VMs with Windows Server 2008 R2.
• The NexentaStor (version 3.1.1) software was installed as a virtual appliance in vSphere.
The POC Prototype—Storage for Service Hosting
The compute and storage intensive prototype includes NEC’s M100 storage array, data compaction
solutions, a file system indexing solution, and vCenter for cloning VMs and VM templates.
Each of the two controllers for the M100 includes the new high performance Jasper Forest processor
from Intel and 8GB of RAM. This is considerably more compute capacity than typical arrays; this
configuration of resources is required for deployments in a shared service hosting infrastructure as it
provides maximum protection while maintaining full service levels to the users.
The data compaction solutions (Figure 2)
were used to test the effectiveness and the
performance of in-line data de-duplication
and compression in a solution stack. The
solution is based on the NexentaStor virtual
appliance. Data writes and reads to and
from the LUN exported by NexentaStor are
compacted or expanded in real-time.
NexentaStor uses the array for storing the compacted data.
The compaction and expansion
performance and efficiency, as well as the
overall solution performance, were tested by
changing the configuration of the VMs and
the underlying storage configuration. Two
storage configurations were used: 6 and 12 disks.
Testing the Solutions
The prototype workloads were designed to test the performance and efficiency of two compute
intensive storage solutions. The first solution tested was the new NEC M100 storage array with
considerably more compute performance compared to previous generation products. The second was a
set of compute intensive storage solutions created in the virtualized POC infrastructure by using the
NexentaStor software. The tests were run using storage allocated directly from the storage array, and
with the storage allocated from the de-duplicated or compressed LUNs exported by NexentaStor.
VM cloning was used to test bulk read and write performance of the storage solutions. vSphere was
used to clone the VM with the file system to and from the compacting LUN exported by NexentaStor.
The respective source or destination was a 12 disk SAS data store on the NEC D4 storage array, which
delivered much higher I/O rates than NexentaStor or SATA.

Figure 2: Logical Data Compacting Configuration
Two tests were performed. First, the 62.2 GB VM with the file system used in the indexing test was
cloned. Second, another 9.2 GB VM template with only the Windows Server 2008 R2 OS was cloned to
see the differential efficiency and performance after the first OS instance had been written.
File System Indexing
The Blackball indexing engine generates a very small index, only ~2% of the indexed data. The file
indexing solution was therefore used to test the file read performance of the compacted LUNs.
The indexing engine performed indexing of the file system in its VM. The file system consisted of a total
of 53 GB of file data, including text, email, music, images, a document archive and more.
Email Performance and Resiliency
The Exchange 2010 JetStress load testing tool was used to measure the performance of a storage
subsystem for a synthetic Exchange email workload. Since JetStress generates its own data, it was not
used to test the performance of compacted storage.
JetStress determines sustained performance in Microsoft Exchange IOPS (input/output operations per second), i.e.,
the total number of Exchange reads and writes to the storage subsystem. The number includes message
and log file I/O, and uses a fixed ratio between reads and writes.
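Given that description, the headline metric is simple arithmetic over the run. The sketch below uses hypothetical counter values of our own (a real JetStress run reports the actual totals) to show how the total Exchange IOPS figure and the fixed read/write ratio fall out:

```python
# Hypothetical per-run counters; a real JetStress run reports these totals.
db_reads = 180_000     # Exchange database read operations
db_writes = 90_000     # Exchange database write operations
log_writes = 90_000    # transaction log write operations
duration_s = 3_600     # a 1-hour run, as in this POC

# Total Exchange IOPS: all reads and writes to the storage subsystem.
total_iops = (db_reads + db_writes + log_writes) / duration_s

# JetStress holds the read/write mix fixed for the duration of the run.
read_write_ratio = db_reads / (db_writes + log_writes)

print(total_iops, read_write_ratio)   # 100.0 1.0
```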
The following outcomes related to compute and storage intensive solutions were generated in this POC.
We investigated the bulk write and read performance and the efficiency of the data compaction
solutions. The file indexing VM, including its file system data, and a barebones VM template were
cloned to the compacting LUN exported by NexentaStor.
Three configurations of the NexentaStor VM were used in testing:
• 4 cores, 8 GB RAM, 6 disk LUN
• 8 cores, 32 GB RAM, 6 disk LUN
• 8 cores, 32 GB RAM, 12 disk LUN
Table 1: Write Performance and Compacting Ratios for Compacting LUN
Cores RAM (GB) Disk Set Compaction Write Time (s) Compaction Ratio Relative Write Time Write Rate (MB/s)
4 8 6 disks None 613 1.00 1.00 101
4 8 6 disks Dedupe 5362 0.68 8.75 12
4 8 6 disks Compress 1398 0.86 2.28 44
4 8 6 disks D+C 3635 0.60 5.93 17
8 32 6 disks None 420 1.00 1.00 148
8 32 6 disks Dedupe 1720 0.68 4.09 36
8 32 6 disks Compress 464 0.86 1.10 134
8 32 6 disks D+C 1196 0.60 2.85 52
8 32 12 disks None 421 1.00 1.00 147
8 32 12 disks Dedupe 1274 0.68 3.03 49
8 32 12 disks Compress 469 0.86 1.11 133
8 32 12 disks D+C 1144 0.60 2.72 54
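The derived columns of Table 1 can be cross-checked from the raw write times. This sketch reproduces the 4-core/8 GB/6-disk rows, assuming the write rate is the 62.2 GB logical data set divided by the write time (decimal units, 1 GB = 1000 MB):

```python
VM_SIZE_MB = 62.2 * 1000   # the 62.2 GB VM written to the compacting LUN
write_times_s = {"None": 613, "Dedupe": 5362, "Compress": 1398, "D+C": 3635}
baseline_s = write_times_s["None"]

for compaction, seconds in write_times_s.items():
    relative_time = seconds / baseline_s    # Table 1 "Relative Write Time"
    rate_mb_s = VM_SIZE_MB / seconds        # Table 1 "Write Rate (MB/s)"
    print(f"{compaction}: {relative_time:.2f}, {rate_mb_s:.0f} MB/s")
# e.g. Dedupe: 8.75, 12 MB/s -- matching the table's 8.75 and 12
```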
We also tested the read performance of the compacting LUNs. The test was performed by cloning the
VM from the compacting LUNs to a SAS LUN on the NEC D4 storage array.
See the File Indexing section for the relative file system read performance.
Table 2: Read Performance from a Compacting LUN
Cores RAM (GB) Disk Set Compaction Type Read Rate (MB/s)
8 32 12 disks None 90
8 32 12 disks Dedupe 74
8 32 12 disks Compress 86
8 32 12 disks Compress+dedupe 63
Table 3 shows the results for cloning a second VM to the same compacting LUN. The first row is for the
same 62.2 GB VM as in the initial test; the second row is for a VM template with only the Windows
Server 2008 R2 guest OS installed. The size of the VM template is 9.2 GB.
Table 3: Incremental VM Cloning Performance
Cores RAM (GB) Disk Set VM Size (GB) Compaction Write Time (s) Compaction Ratio Relative Write Time Write Rate (MB/s)
8 32 12 disks 62.2 Dedupe 2010 0.03 4.78 21
8 32 12 disks 9.2 Dedupe 250 0.03 4.00 37
Results can be summarized as follows:
• For the initial writing of the VM with data to the compacting LUN, the resulting compacted data
set was 86% (compression), 68% (dedupe) and 60% (compress + dedupe) of the original size.
These compaction ratios were independent of VM resources and storage subsystem.
• Both compression and de-duplication benefit significantly from an increased number of cores
and amount of RAM. The relative performance degradation is reduced by a factor of 2 when
increasing VM resources from 4 to 8 cores, and from 8 to 32 GB of RAM.
• Write speed to a compressed LUN is 90% of that to a non-compacted LUN.
• Write speed to a de-duplicated LUN is 33% of that to a non-compacted LUN.
• Write speed to a compressed and de-duplicated LUN is 36% of that to a non-compacted LUN.
• Read performance for compacted LUNs is much closer to non-compacted read performance.
The performance reduction is only 18%, 5% and 30% for de-duplicated, compressed and
compress + dedupe LUNs, respectively.
• The incremental cloning of both the full VM and the VM template was compacted to only 3.2%
of original size, from 62.2 GB to 2 GB and from 9.2 GB to 0.3 GB, respectively.
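The incremental-cloning figure in the last bullet is easy to verify from the reported sizes:

```python
# (original GB, physically stored GB) after incremental cloning to the
# de-duplicated LUN, as reported in this POC
full_vm = (62.2, 2.0)    # full file-indexing VM
template = (9.2, 0.3)    # barebones Windows Server 2008 R2 template

for original_gb, stored_gb in (full_vm, template):
    print(f"{stored_gb / original_gb:.1%}")   # 3.2% and 3.3%
```

The near-total compaction is expected: almost every block of the second clone duplicates blocks already written by the first.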
Table 4 presents the performance for file indexing of a file system stored in un-compacted RAID-6
storage, de-duplicated storage, and compressed storage. All tests ran through the compacting storage
stack; the NexentaStor compacting VM was configured with 8 cores and 32 GB RAM, while the
configuration of the indexing VM was varied.
Table 4: Indexing Performance
Cores RAM (GB) Disk Set Compaction Type Elapsed Time (m:ss) Improvement vs. Baseline Improvement vs. Un-compacted
2 4 6 disks None 4:39 0% 0%
2 4 6 disks De-duped 4:30 3.3% 3%
4 16 6 disks None 4:39 0% 0%
4 16 6 disks De-duped 4:30 3.3% 3%
4 16 12 disks None 4:00 14% 0%
4 16 12 disks De-duped 4:34 1.8% (14.2%)
4 16 12 disks Compressed 5:04 (8.9%) (26.7%)
4 16 12 disks Compress+Dedupe (1) 3:52 16% 3.3%
8 64 12 disks None (1) 4:15 8.6% 0%
8 64 12 disks De-duped 3:50 17.6% 9.8%
8 64 12 disks Compressed 4:14 8.9% 0.4%
8 64 12 disks Compress+Dedupe 4:28 3.9% (5.1%)
Results can be summarized as follows:
• Indexing on de-duplicated storage is marginally faster than on non-compacting storage. This is
most likely due to caching in the NexentaStor engine.3
• Indexing on compressed storage is roughly 20% slower than on non-compacting storage.4
• Increasing the number of cores and amount of RAM improves indexing performance between
5-20%. We believe the increased RAM is the most significant factor.
• Doubling the number of disks from 6 to 12 improves non-compacted performance by 17%.
3. There are some inconsistencies in the measured results. Further or repeated tests are required to understand the source of
these inconsistencies.
4. The performance measurements give a clear indication of the general read performance of compacting storage.
Data Compaction Summary
When the VM that performs the compaction or expansion has sufficient compute and memory resources,
we observed interesting results. A de-duplicated LUN shows significant degradation in write
performance, but equal or better read performance when compared to a non-compacted LUN.
De-duplication of an initial large data set typically reduces data to 2/3 of the original size. A LUN
containing a large number of largely identical data sets, such as the VM system disks in a virtualized
infrastructure, will see much higher compaction ratios. A compressed LUN shows only a small read and
write performance degradation; data is compacted to 86% of original size. Combining de-duplication
and compression yields the highest data compaction ratio, reducing data to 60% of the original size,
but both read and write performance are significantly reduced. This reduction can be mitigated by
adding more CPU power or RAM.
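The combined ratio is close to what one would predict if compression and de-duplication acted independently, which is a useful sanity check on the measurements:

```python
compress_ratio = 0.86    # data compacted to 86% of original (Table 1)
dedupe_ratio = 0.68      # data compacted to 68% of original (Table 1)
measured_combined = 0.60

# If the two techniques removed disjoint redundancy, the ratios would
# simply multiply.
predicted_combined = compress_ratio * dedupe_ratio
print(f"{predicted_combined:.2f} vs. measured {measured_combined:.2f}")
```

The measured 0.60 sits slightly above the independent prediction of about 0.58, consistent with the two techniques removing partially overlapping redundancy.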
Data de-duplication is better suited for read intensive workloads, e.g., document archives or storage for
the system disk when the system (OS and application code) and the application data are separated into
separate (virtual) disks. Compression is better suited for write intensive workloads. The data compaction
ratio is smaller than for de-duplication.
These findings show that for the right applications, computing resources can be applied with great
benefit and can have a dramatic impact on the data footprint and the system performance, as well as
the overall economies of IT operations. While this solution benefits significantly from increasing the
number of cores and amount of RAM, compression performed well with 4 cores and 8 GB of RAM.
Email Performance
JetStress was used to test the sustained performance of the array. In the prototype, the JetStress load
ran for 1 hour. Table 5 presents the sustained performance observed when using JetStress to generate
the simulated Exchange email workload.
Table 5: Simulated Exchange Back-end Throughput
Cores RAM (GB) Disk Configuration Compaction Exchange IOPS Improvement
4 8 6 disks (RAID-6) None 160 0%
8 32 6 disks (RAID-6) None 166 4%
8 32 12 disks (RAID-6) None 279 68%
Results can be summarized as follows:
The Exchange workload is I/O bound. There were only minor improvements between the small (4 cores,
8 GB RAM) and the large (8 cores, 32 GB RAM) VM configurations. The performance increased by a
factor of 1.68 when going from 6 to 12 disks in the array for un-compacted storage.
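That disk-count scaling can be read straight off Table 5:

```python
iops_6_disks = 166    # 8 cores, 32 GB RAM, 6-disk RAID-6 (Table 5)
iops_12_disks = 279   # 8 cores, 32 GB RAM, 12-disk RAID-6 (Table 5)

scaling = iops_12_disks / iops_6_disks
print(f"{scaling:.2f}")   # 1.68: well short of the 2.00 of perfect scaling
```

The sub-linear scaling reinforces the observation that, for this I/O-bound workload, spindle count rather than VM compute resources is the limiting factor.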
The Storage Array
We ran all tests on the NEC M100 array and on the previous generation D4 storage arrays. The following
results indicate that the M100 is a good candidate for service hosting environments.
• The M100 delivered high performance when all disk groups were configured for RAID-6.
• The prototype ran two high load performance tests (JetStress + data de-duplication)
concurrently on the M100. There was no observed degradation on either test when compared
to running these tests independently. The M100, even when configured for RAID-6, has ample
compute power to support the JetStress and the data compaction tests in parallel without any
measurable performance effect on either test.
• The extra compute power in the M100 improves the performance of recovery operations, such
as LUN rebuilds, and maintenance operations, including disk group expansion. The staging and
configuring of the POC system indicate that the M100 is up to 5X faster than the D4, and that
these operations have less impact on running applications.
Compute Intensive Solutions for Self-Provisioning
Our POC shows how to create efficient, high performance compute and storage intensive solutions,
including data de-duplication and compression, from the generic storage, server and network resources
of a virtualized and integrated IT infrastructure. These solutions fit into the service provider’s business
model. They can be integrated easily into the existing management framework and included in the list
of solutions, software, etc. that is available for self-provisioning by the tenant.
The above tests show how a service provider can create a set of new storage solutions with different
benefits from simply using available software products with the server, storage and networking
resources already in use. These solutions are made available to the IaaS customers as new types of
storage pools in the existing self-service framework. The solutions are created simply by creating a new
VM and installing software, or in this case installing a pre-created “virtual appliance,” provisioning the
required compute and RAM resources, then making the compacting LUNs available as new data stores
that the tenant can allocate as storage for their VMs.
For an IaaS service provider with a self-service based service offering, it is critical that the solutions can
be built from generic resources of existing resource pools. This assures simple integration into the
infrastructure and the self-service model, and assures a high level of resource utilization. All of this is
essential when running a service provider operation.
IT service providers rely on the self-service model in order to remain efficient, competitive and
profitable. The most cost-effective solutions are virtualized solutions that can be supported directly by
the service provider’s virtualized and integrated infrastructure. No purpose-built hardware appliances
are required. Created with VMs and generic resources like CPU cores, RAM, and storage, these solutions
can be made available easily to tenants in a self-service service hosting environment.
The falling price of computing resources, including CPU cores and RAM, makes it cost-effective to
use compute power to create more effective storage solutions. These resources are integrated into the
arrays to provide better performance and resiliency. In addition, as environments take on more users
and customers, the increased number of users or applications supported by a single array requires a
higher degree of protection against failures, and shorter recovery times with less client impact when
carrying out recovery or maintenance operations. All of these data protection and disaster recovery
imperatives require more compute power.
One can also apply compute resources to create compute-intensive storage solutions in the
infrastructure. Software for solutions, including data de-duplication and compression, are becoming
readily available. The resulting solutions provide the same efficiencies as purpose-built appliances, yet
these solutions fit comfortably in the service provider business model.
This POC was created to test virtualized solutions for data compacting. The measured data compaction
ratios for a large write into a LUN are 0.86, 0.68 and 0.60 for compression, de-duplication and
de-duplication plus compression, respectively. The data compaction ratios were independent of VM
resources. Both de-duplication and compression require significant compute resources and a good
amount of RAM to give acceptable read and write speed.
The POC demonstrates and documents how adding compute resources to storage arrays increases their
value in a service provider environment, and how compute resources can be used in the virtualized
infrastructure to create efficient data compacting storage solutions. More specifically, we have shown
that these solutions can be created, provisioned and utilized in a virtualized infrastructure designed for
self-service and self-provisioning. In the companion white paper, “Enabling Self-Service and Self-
Provisioning in an IT Infrastructure,” we have also outlined a new management stack that is required in a
service provider infrastructure in order to support the self-service and self-provisioning model, including
creating and provisioning the solutions described here.
We believe that implementation of this prototype will enable forward-thinking IT architects and
managers to reap the full benefit of virtualization and to operate far more efficiently and cost
effectively. It will also enable IT executives to re-organize and reshape their operations as corporate
service hosting providers. Under this model, the IT organization’s primary mission evolves to building
and managing IT infrastructures on behalf of business units, which are then charged-back via a self-
service model. This is by far the most effective way to organize corporate IT.
The mention of any vendor’s name or specific product in this white paper does not imply any
endorsement of the vendor or product.
The products used in the proof of concept were selected based on consultation with the customer.
Other products can be incorporated in future efforts based on circumstances or goals.
Huginn Consulting was commissioned by NEC Corporation to build and evaluate the proof of concept
outcomes and to write this technical white paper.
The Huginn team has more than 50 years of combined experience in product development, engineering
and business management in the field of IT. Huginn Consulting provides IT consulting services including
building and testing proofs of concept, technical concept evaluation, specification development,
requirements analysis, and prototype creation in areas that include storage and data management.