Avere Systems optimizes seismic data processing workflows by flexibly scaling performance and reducing costs. Its solution improves throughput by 50% while halving the storage footprint using flash storage and auto-tiering, and it simplifies workflows by eliminating unnecessary data copies between specialty storage silos in favor of a unified storage system. The result is faster time to results, lower costs, and easier management compared to existing solutions from NetApp, EMC Isilon, Panasas, and Lustre/DDN.
Optimizing the Upstreaming Workflow: Flexibly Scale Storage for Seismic Processing Demands
1. Optimizing the Upstream Workflow
Flexibly Scaling Performance to Meet Seismic Processing Demands
AVERE SYSTEMS, INC
5000 McKnight Road, Suite 404
Pittsburgh, PA 15237
(412) 635-7170
averesystems.com
2. Seismic Processing Use Case – World’s Largest Exploration & Production Company
Proprietary & Confidential

Challenges
– Lower performance
– Larger footprint
– 600TB of unnecessary SATA disks
– No Flash/SSD
– Multiple manual copies between Panasas and NetApp required
– Proprietary client code complicates compute farm management

Avere Benefits
– 50% higher throughput
– 50% smaller footprint
– Lower cost:
  – 33% lower CAPEX ($/op)
  – 50% lower OPEX (space, power, cooling)
– Auto-tiering of hot data blocks to/from the Avere cluster
– Smooth transitions between workflow stages

(Diagram. Before: Compute Farm backed by a NetApp FAS 3270 and a Panasas cluster, with manual copies between them. After: Compute Farm backed by an Avere FXT 4200 cluster with 16TB of Flash/SSD and 22GB/s throughput, auto-tiering to/from the NetApp FAS 3270.)
3. Upstream Workflow – Technology Needs

Five workflow stages, with the technology needs listed under each:

Seismic Acquisition
– Cost-effective tape replacement
– Dense multi-PB file system, large flat files
– Data protection, replication

Seismic Processing
– High IOPS, multi-threaded IO
– Many data types
– Portable 100TB+ “scratch areas”
– Scalable to 10s of thousands of CPUs

Seismic Interpretation
– High B/W, single-thread IO
– Multi-threaded IO
– Workstation interactivity

Reservoir Simulation
– Read, write, metadata performance
– Small files & relational DBs
– Multi-TB “scratch areas”
– Small CPU clusters

Reservoir Engineering
– General file system workload
– Flat files & relational DBs
– De-dupe & compression
4. How We Got Here
• Seismic processing challenges
– Provide 10s of GB/sec of throughput
– Cost-effectively store 100s of TBs or even PBs of data
– Specialty, proprietary storage silos complicate the workflow

(Diagram: the tension between “Need High Performance” and “Need Cheap Capacity.”)
5. What We Are Doing About It
• Scale performance
– Tiering places active data on fast media (e.g. RAM, Flash)
– Linearly scale performance through clustering
• Reduce cost
– Support existing NAS environments
– Store everything on near-line disks (7.2k SATA/SAS)
• Simplify workflow
– Seamless transitions between upstream workflow stages
– Avoid storage silos and inefficient data copy steps
6. How We Do It – Scale Performance
• Auto-tiering active data to RAM and Flash
• Automatic replication and striping of hot blocks on multiple FXT nodes
• Deliver 2+ GB/sec per FXT node
• Scale to 50 FXT nodes → 100+ GB/sec

(Diagram: Compute Farm backed by an Avere FXT 4200 cluster with 16TB of Flash/SSD and 22GB/s throughput, auto-tiering to/from a NetApp FAS 3270.)
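The auto-tiering idea in the bullets above can be illustrated with a minimal sketch. This is not Avere's actual algorithm (the slides do not describe it); it is a generic least-recently-used promotion/demotion cache, with hypothetical names, showing how a small fast tier in front of a large slow tier absorbs most reads when the workload has a hot set:

```python
from collections import OrderedDict

class TierCache:
    """Illustrative sketch only (not Avere's algorithm): serve hot
    blocks from a small fast tier (RAM/Flash), fall back to a large
    slow tier (near-line disk), and evict least-recently-used blocks
    from the fast tier when it fills."""

    def __init__(self, fast_capacity, slow_tier):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # block_id -> data, LRU order
        self.slow = slow_tier       # block_id -> data (core filer)
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.fast:            # hot: serve from fast tier
            self.fast.move_to_end(block_id)
            self.hits += 1
            return self.fast[block_id]
        self.misses += 1                     # cold: promote from slow tier
        data = self.slow[block_id]
        self.fast[block_id] = data
        if len(self.fast) > self.fast_capacity:
            evicted, value = self.fast.popitem(last=False)
            self.slow[evicted] = value       # demote coldest block
        return data

# Skewed workload: most reads hit a small hot set, as in a seismic run.
core = {b: f"block-{b}" for b in range(1000)}
cache = TierCache(fast_capacity=100, slow_tier=core)
for i in range(10_000):
    cache.read(i % 100 if i % 10 else i % 1000)
print(cache.hits, cache.misses)
```

With a hot set that fits in the fast tier, hits far outnumber misses, which is the effect the clustering bullets scale up: more FXT nodes mean more fast-tier capacity and more aggregate bandwidth.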
7. How We Do It – Save Cost
• Support existing NAS environments
– Avoid costly upgrade
– Support heterogeneous NAS vendors
• Store primary data on near-line disks (e.g. 7.2k SATA)
– CAPEX savings (avoid using lots of 15k disks)
– OPEX savings (due to reduced space, power, and cooling)
• Reduce cost by 50% or more
– Proven in customer environments and benchmark testing
– See next slide for example…
8. Comparing 1,000,000 IOPS Solutions*

EMC Isilon: $10.7 / IOPS
NetApp: $5.1 / IOPS
Avere: $2.3 / IOPS

| System | Throughput (IOPS) | Latency/ORT (ms) | List Price | $/IOPS | Disk Quantity | Rack Units | Cabinets | Product Config |
|---|---|---|---|---|---|---|---|---|
| Avere FXT 3800 | 1,592,334 | 1.24 | $3,637,500 | $2.3 | 549 | 76 | 1.8 | 32-node cluster, cloud storage config |
| NetApp FAS 6240 | 1,512,784 | 1.53 | $7,666,000 | $5.1 | 1728 | 436 | 12 | 24-node cluster |
| EMC Isilon S200 | 1,112,705 | 2.54 | $11,903,540 | $10.7 | 3360 | 288 | 7 | 140-node cluster |

*Comparing the top SPEC SFS results for a single NFS file system/namespace (as of 02 Apr 2013). See www.spec.org/sfs2008 for more information.
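The $/IOPS column in the comparison above follows directly from the list price and throughput figures on the slide, as this small check shows:

```python
# Recompute the $/IOPS column from the slide's list price and
# SPEC SFS throughput figures: price divided by ops/sec.
systems = {
    "Avere FXT 3800":   (3_637_500,  1_592_334),
    "NetApp FAS 6240":  (7_666_000,  1_512_784),
    "EMC Isilon S200":  (11_903_540, 1_112_705),
}
for name, (list_price, iops) in systems.items():
    print(f"{name}: ${list_price / iops:.1f} / IOPS")
# Avere FXT 3800: $2.3 / IOPS
# NetApp FAS 6240: $5.1 / IOPS
# EMC Isilon S200: $10.7 / IOPS
```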
9. Upstream Workflow – Challenges

Challenges across the workflow stages (Seismic Acquisition, Seismic Processing, Seismic Interpretation, Reservoir Simulation, Reservoir Engineering):
– Expense and risk due to multiple infrastructure silos
– Expensive, specialty storage required for high-IO steps
– Management complexity and data downtime due to copying data
– Longer time to final results

(Diagram: data is copied at each stage transition, back and forth between NetApp or EMC Isilon silos and Panasas or Lustre/DDN silos.)
10. Avere Optimizes Upstream Workflow

Avere Benefits across the workflow stages (Seismic Acquisition, Seismic Processing, Seismic Interpretation, Reservoir Simulation, Reservoir Engineering):
– Integrated and unified workflow
– Faster time and lower risk to final results
– Improved application performance/spend
– Better enable remote access & WAN efficiency

(Diagram: a single Avere Edge Filer cluster serves all stages, backed by a 3rd-Party Core Filer.)
14. Thank You!
AVERE SYSTEMS, INC
5000 McKnight Road, Suite 404
Pittsburgh, PA 15237
(412) 635-7170
averesystems.com
Editor's notes
Our initial success within upstream came in seismic processing environments. In this example, the customer had chosen Panasas for their HPC needs but had NetApp for interpretation. They were buying duplicate capacity and had to manage both arrays independently. What we found here was that the way Avere's Edge filer tiers data lent itself very well to parallel IO requests, so much so that we were able to help this customer realize ~50% more throughput, up from 15 GB/sec to ~22 GB/sec, in half the footprint, using only 20U of rack space as opposed to 40U with the Panasas.

The 50% more throughput is based on getting 22 GB/s from Avere vs. 15 GB/s from Panasas. For CAPEX I used a $/op comparison, assuming the cost of 10 nodes of Avere to be roughly equal to 10 nodes of Panasas (~$1M for both at list). Avere is $1M / 22 GB/s = $45/MBps; Panasas is $1M / 15 GB/s = $67/MBps, so we are 33% less. We are 20U total and they are 40U total, which is where the 50% lower OPEX (space, power, cooling) comes in.
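The CAPEX arithmetic in the editor's note above can be reproduced directly (the ~$1M list price for both 10-node systems is the note's own assumption, not a published figure):

```python
# Cost per MB/s of throughput, using the note's assumed ~$1M list
# price for both systems and the measured throughput numbers.
avere_cost_per_mbps   = 1_000_000 / (22 * 1000)   # 22 GB/s -> 22,000 MB/s
panasas_cost_per_mbps = 1_000_000 / (15 * 1000)   # 15 GB/s -> 15,000 MB/s
savings = 1 - avere_cost_per_mbps / panasas_cost_per_mbps

print(f"Avere:   ${avere_cost_per_mbps:.0f}/MBps")    # ~$45/MBps
print(f"Panasas: ${panasas_cost_per_mbps:.0f}/MBps")  # ~$67/MBps
print(f"Savings: {savings:.0%}")                      # ~32%, which the note rounds to 33%
```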
Any issue with 3 challenges and 2 hard things? Could dangle anvil over his head.
Avere has delivered performance acceleration and scaling to many different customers in many different industries and applications, including VMware, Oracle, rendering, transcoding, software build, chip design and verification, seismic processing, financial simulations, genomic sequencing, and more. This is a place in the presentation where you may want to insert a customer case study from our library that is relevant to your audience. In the standard presentation we used the SPEC SFS benchmark since it is the most relevant workload across the broad range of customers we sell to.

In the world of file systems there is a well-known benchmark called SPEC SFS that is used to compare the performance of NAS systems. All the NAS vendors use the benchmark and post their results on the website shown at the bottom of this slide. SPEC does a great job of providing a detailed, apples-to-apples comparison of NAS products running in a “typical” enterprise-class NAS environment.

This slide compares the three top performance results on the SPEC site. Avere is the current record holder with almost 1.6 million ops/sec achieved on a 44-node FXT cluster. Note that this is not a max cluster from Avere; we used just enough nodes to achieve the top spot. Today we can go to 50 nodes per cluster and will go beyond this in the future. In second place is NetApp with a max 24-node cluster-mode system. In third place is EMC/Isilon with a max 140-node S-Series cluster.

While achieving the highest performance was an important point for Avere, our primary point was the efficiency of our solution. Just look at the sizes of the systems. We are faster than NetApp and Isilon in just a fraction of the space: 2.5 racks and 6 feet wide for us, 14 feet wide for Isilon, 24 feet wide for NetApp. If you scan across the orange row you can see the details of our performance advantage and our higher efficiency.
Avere has the highest performance, the lowest latency, the lowest cost, and uses the least space and power.
This slide shows four scenarios where Avere can be used to implement a cloud infrastructure. Some enterprises may use just one scenario while others may use multiple.

The slide starts with the scenario where Avere is used to accelerate data access at remote offices. Avere has many customers doing this type of thing. One example is a company that is headquartered in San Jose but has software engineers in offices all over the world. In their Boulder office they replaced their NetApp storage with an Avere cluster. The software engineers use the Avere cluster to access their homedirs, run their software builds, etc. Meanwhile, their data is actually stored, managed, and protected back in the San Jose data center. The Avere cluster automatically caches the active data at the Boulder site and guarantees the data is pushed back to the San Jose site for long-term data retention. While this “data center to remote office” may be the most common cloud scenario today, there are other needs across the enterprise.

(click) A second example is when there are two data centers that the enterprise wants to run as a single “mega” data center. These exist for disaster recovery reasons, due to acquisitions, due to partnerships, or due to growing out of space in the primary data center, which requires leasing space at a co-lo facility. In these cases, the goal is to run the two data centers as one large data center where each individual data center can borrow resources from the other. Avere can help make this happen. An example of this is Pixar and Disney. Pixar is in Northern California and Disney is in Southern California. Both are Avere customers, and we allow them to run these two data centers as a single “mega” data center. For example, if Pixar is working on a movie with a near-term release date and they need to crank up the volume on the renders, rather than going out and buying new render nodes, they can borrow them from Disney.
They can do this because they have placed Avere nodes by Disney’s render farm. When Pixar fires up some renders on the Disney farm, the Avere nodes are automatically populated with the data for the renders. Hence, the renders see the WAN latency once on the first read, but subsequent reads are very low latency. Until recently, when studios like this would borrow render nodes they would literally pull them out of the rack, put them in a truck, and drive them to the other facility, because the render nodes need to be near the data. With Avere, these studios can quickly and easily move the data so it is near the render farm.

(click) The next example is cloud computing. This comes in two forms, public and private. Most of the demand we have seen to date is for private compute cloud, but technically both are similar. In the private cloud case, a customer moves their compute infrastructure somewhere else because it’s cheaper or they are out of space in their data center. Avere makes this possible since an Avere cluster can be co-located with the compute gear to hold the data that is actively being processed. We have a number of customers doing this today. In Las Vegas, there is a co-lo facility called the SuperNAP. This facility was originally built by ENRON, the failed energy company. You may recall that ENRON was getting into all sorts of strange businesses, and one of them was “bandwidth trading,” so they built a giant co-lo facility in Las Vegas with tons of bandwidth coming in and out. After ENRON failed, a company called Switch Communications bought the facility and turned it into an outsource co-lo facility. We have a number of customers who are using the SuperNAP. Digital Domain is one of these customers, using the SuperNAP because: 1) it’s cheaper than using their own California-based data centers, and 2) it’s centrally located between the three sites (LA, SF, Vancouver) that use the compute resources at the SuperNAP.
Today, when D2 fires up their renders (whether from LA, SF, or Vancouver), they fire them up on the Las Vegas farm and the Avere nodes there are automatically populated with the data needed for the renders. Another nice element of this architecture is security: D2 likes that they store their data in their own data centers.

(click) The fourth example is cloud storage. Today, cloud storage is used mainly for backup and not primary data due to the latency of the WAN. However, Avere makes it possible to use cloud storage for primary apps. Whether the users of the cloud storage are in the primary data center, a remote site, or a cloud computing facility, an Avere cluster can hold the active data locally and hide the latency of the cloud.

(click) Over time, much of the storage in data centers today will move to the cloud. At that point a solution like Avere is critical to avoid the latency of the WAN/public cloud.