This is the full presentation I did with Ranajit Nevatia from Panzura (@ranajitN) at Cloud Computing Expo NY in June 2012.
It introduces and explains the concepts of Structured and Unstructured data and why Object Storage will prevail when it comes to the latter.
Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012
1. Cloud Storage Made Seamless
Marc Villemade, Technology Evangelist, Scality
Ranajit Nevatia, VP, Marketing, Panzura
Slide 1
2. There are two types of data
(roughly)
• Structured
• We (sort of) know how to manage this
• Unstructured
• This is the new beast we have issues with
Slide 2
3. How to define Structured Data?
• Structured data is a set of organized pieces of data
• Relational databases are a perfect example
• Atomic pieces are, on their own, meaningless
Slide 3
4. What about Unstructured Data?
• Unstructured data is made of self-contained pieces of data
• Self-descriptive
• Meaningful in and of itself
• Typically has metadata attached to it
• Email, videos, presentations, spreadsheets, satellite images…
• An easy way to think about it: anything that can be stored in one file is unstructured data
Slide 4
5. Some numbers…
• In 2012, humanity will generate 2.7 ZB of data (1)
• It is estimated that we permanently store ~1 ZB of it (~40%) (2)
• 80% of it is unstructured (1)
• 500 quadrillion files (500,000 million million files)
• From next year on, it will grow by 50% year over year (1)
• It will double every 2 years over the next 10 years
Kind of unfathomable, ain’t it?
(1) IDC numbers; (2) University of Southern California (2007)
Slide 5
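The two growth figures on this slide can be checked against each other with a little arithmetic. The sketch below assumes a constant 50% annual growth rate applied to the 2.7 ZB figure for 2012; the function names are illustrative and the projection is a back-of-the-envelope estimate, not an IDC forecast.

```python
import math

BASE_YEAR = 2012
BASE_ZB = 2.7          # data generated in 2012, per the slide
ANNUAL_GROWTH = 0.50   # 50% year over year, per the slide


def projected_zb(year, base_zb=BASE_ZB, growth=ANNUAL_GROWTH):
    """Data generated in `year`, assuming constant compound growth."""
    return base_zb * (1 + growth) ** (year - BASE_YEAR)


def doubling_time_years(growth=ANNUAL_GROWTH):
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth)


if __name__ == "__main__":
    for year in range(BASE_YEAR, BASE_YEAR + 5):
        print(f"{year}: {projected_zb(year):.1f} ZB")
    print(f"Doubling time: {doubling_time_years():.2f} years")
```

At 50% a year the volume multiplies by 2.25 every two years, so "doubles every 2 years" is a slightly conservative rounding of the same trend.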
6. Humans like organized things
Well, some of them at least…
• Structured storage systems have been used for Unstructured Data
• Organized in file systems, hierarchies, directories
• Easier for us
• And then new data creation patterns emerged in the early 2000s
• The model doesn’t fit anymore
• And here’s why
Slide 6
7. Typical SAN / NAS issues at Scale
• Technology refresh and migration necessary to benefit from larger disks
• Scheduled maintenance windows are a nuisance
• Limits on the number of files
• Volume management is complex
• Serial architecture compromises performance
• RAID is less efficient for large drives
• FC networks are expensive & point-to-point
• Cost is prohibitive for large capacity
Slide 7
8. Humans like organized things
Well, some of them at least…
• Structured data storage systems are used for Unstructured Data
• Organized in file systems, hierarchies, directories
• Easier for us
• And then new data creation patterns emerged in the early 2000s
• The model doesn’t fit anymore
• SANs and NASes were not made to handle this
Slide 8
9. So what’s the solution?
• We believe it’s Object Storage
• Yahoo!, Amazon, Google… were the pioneers
• Main Characteristics
• Flat Namespace
• Infinite Scalability
• Elasticity
• Cost-efficiency
• Data availability and durability
Slide 9
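To make "flat namespace" concrete, here is a minimal in-memory sketch of the idea: every object is addressed by a single opaque key and carries its own metadata, with no directories or hierarchy. The class `FlatObjectStore` and its methods are hypothetical names for illustration only; this is not the API of any particular object store.

```python
import hashlib
import uuid


class FlatObjectStore:
    """Toy in-memory object store: one flat key space, no directories."""

    def __init__(self):
        self._objects = {}  # key -> (data bytes, metadata dict)

    def put(self, data, metadata=None):
        """Store an object with its metadata and return its opaque key."""
        key = uuid.uuid4().hex
        meta = dict(metadata or {})
        meta["size"] = len(data)
        meta["sha256"] = hashlib.sha256(data).hexdigest()
        self._objects[key] = (data, meta)
        return key

    def get(self, key):
        """Return the object's bytes."""
        return self._objects[key][0]

    def head(self, key):
        """Return a copy of the object's metadata without its data."""
        return dict(self._objects[key][1])

    def delete(self, key):
        del self._objects[key]


if __name__ == "__main__":
    store = FlatObjectStore()
    key = store.put(b"quarterly results", {"content-type": "text/plain"})
    print(key, store.head(key))
```

Because the key says nothing about location, the store is free to place, move, or replicate the object wherever it likes, which is what makes the other characteristics on the slide easier to achieve.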
10. Scality’s Storage Vision
Their DC   | Their DC  | YOUR DC
Their App. | YOUR App. | YOUR App.
YOUR Data  | YOUR Data | YOUR Data
Slide 10
11. What is the Secret Sauce?
Scality has developed distributed (scale-out), object-based storage software that turns x86 servers into petabyte-scale storage for unstructured data (files).
(Scality is NOT designed for VMs, VDI, or relational databases)
• Distributed System
• Distributed metadata
• No Single point of failure
• Self healing
• Organic upgrades
Slide 11
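The slide does not say how objects are placed across servers. One common technique for distributing data without a central lookup server is consistent hashing, sketched below as a generic illustration. The `HashRing` class, the node names, and the replica count of 3 are assumptions made for this example and should not be read as Scality's implementation.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring: any client can compute where a key lives."""

    def __init__(self, nodes=(), replicas=3):
        self.replicas = replicas
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = _hash(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node):
        pos = _hash(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def nodes_for(self, key):
        """Return the nodes that should hold `key`, walking clockwise."""
        if not self._positions:
            return []
        count = min(self.replicas, len(self._positions))
        start = bisect.bisect_right(self._positions, _hash(key))
        return [
            self._owner[self._positions[(start + i) % len(self._positions)]]
            for i in range(count)
        ]


if __name__ == "__main__":
    ring = HashRing([f"node-{i}" for i in range(5)])
    before = ring.nodes_for("video-0001")
    ring.remove_node(before[0])  # simulate a node failure
    after = ring.nodes_for("video-0001")
    print(before, "->", after)
```

When a node disappears, the surviving holders of a key stay in its placement list and only one new node is added, so a repair process would only need to copy the missing replica rather than reshuffle everything.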
12. What’s unique about Scality RING
• Performance
• ESG Lab report: we’re 10x faster than any other object store
• Hardware-agnostic
• Software Vendor
• Mixed hardware (disks, nodes)
• Erasure-Coding with No penalty on read
• With only 60% overhead
• Tiering
• Policy driven
• Automatic, Transparent
Slide 12
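To put the 60% erasure-coding overhead in context, the sketch below compares raw capacity needs against plain replication. The slide does not say which coding scheme produces 60%; 5 data fragments plus 3 parity fragments is one hypothetical layout that does, and 3-copy replication is used here only as a familiar point of comparison.

```python
from fractions import Fraction


def erasure_overhead(data_fragments, parity_fragments):
    """Extra storage as a fraction of the original data size."""
    return Fraction(parity_fragments, data_fragments)


def replication_overhead(copies):
    """Extra storage when keeping `copies` full copies of the data."""
    return Fraction(copies - 1, 1)


def raw_capacity_needed(usable_tb, overhead):
    """Raw capacity required to offer `usable_tb` of usable space."""
    return usable_tb * (1 + overhead)


if __name__ == "__main__":
    usable_tb = 1000  # 1 PB of usable data
    ec = erasure_overhead(5, 3)    # hypothetical 5+3 layout: 60% overhead
    rep = replication_overhead(3)  # 3 full copies: 200% overhead
    print(f"Erasure coding 5+3: {float(ec):.0%} overhead, "
          f"{float(raw_capacity_needed(usable_tb, ec)):.0f} TB raw")
    print(f"3-copy replication: {float(rep):.0%} overhead, "
          f"{float(raw_capacity_needed(usable_tb, rep)):.0f} TB raw")
```

Under these assumptions, 1 PB of usable data needs 1.6 PB of raw disk with erasure coding versus 3 PB with three full copies.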