3. RED HAT GLUSTER STORAGE
• Software-defined, distributed, scale-out and resilient file store
• Cost-efficient and performant at scale
• Easy to consume, deploy, scale and manage in public, private and hybrid cloud architectures
• Offers mature NFS, SMB and HDFS interfaces for enterprise applications
Nimble file storage for petabyte-scale workloads
TARGET USE CASES
Analytics
• Machine analytics with Splunk
• Big data analytics
Enterprise File Sharing
• Media streaming
• Active archives
Enterprise Virtualization
Rich Media and Archival
5. Tick Data
What is a Tick?
A “tick” is the minimum upward or downward movement in the price of a security; more generally, the term refers to any change in price from one trade to the next.
An “uptick” refers to a trade that occurred at a higher price than the previous transaction, and a “downtick” refers to a trade that occurred at a lower price than the previous transaction. A “zero tick” refers to a trade that occurred at the same price as the previous transaction.
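The three definitions above can be made concrete with a minimal Python sketch that labels each trade by comparing its price with the previous trade's price. The function name and label strings are illustrative assumptions, not part of any trading library.

```python
from typing import List

def classify_ticks(prices: List[float]) -> List[str]:
    """Label each trade relative to the previous trade's price.

    The first trade has no predecessor, so it is labelled "n/a".
    """
    labels = ["n/a"] if prices else []
    for prev, curr in zip(prices, prices[1:]):
        if curr > prev:
            labels.append("uptick")
        elif curr < prev:
            labels.append("downtick")
        else:
            labels.append("zero tick")  # same price as the previous trade
    return labels

print(classify_ticks([100.00, 100.01, 100.01, 100.00]))
# ['n/a', 'uptick', 'zero tick', 'downtick']
```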
What is Tick Data?
Tick data is time series data containing the price, volume and many other dimensions (bid/ask prices, bid/ask sizes, quote time, trade time, exchange information) for each recorded event, such as a trade or a quote update.
Tick Data and Storage
The higher the resolution of the tick data collected, the larger the dataset and hence the storage capacity required.
Customer Challenge
Standardize on an economical, scale-out, distributed, shared file store as an alternative to the existing, expensive EMC Isilon storage system used for historical tick data.
Financial compliance and regulations require banks, brokerages, and trading institutions to store each
transaction tick. The historical tick data is also used to create richer and more accurate risk models, to
perform backtesting and to run trading simulations for quantitative analytics.
A scalable, cost-efficient solution is imperative since storage capacity is a direct function of the number of data
dimensions and the resolution of tick data collected.
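Why capacity tracks both dimensions and resolution can be shown with a back-of-the-envelope estimate: raw bytes = bytes per tick × ticks per day × trading days. The sketch below is a hypothetical illustration: the record layout, the 10 million ticks per day and the 252 trading days are assumptions, and it ignores compression, columnar storage and replication overhead.

```python
import struct

# Hypothetical per-tick layout (assumption): trade time (int64 ns),
# bid, ask, last price (3 x float64), bid size, ask size, trade size
# (3 x int64) and a 4-byte exchange code. "<" means no padding.
TICK_FORMAT = "<q3d3q4s"

def bytes_per_tick(fmt: str = TICK_FORMAT) -> int:
    """Size of one packed record; every added dimension makes it larger."""
    return struct.calcsize(fmt)

def storage_bytes(ticks_per_day: int, trading_days: int,
                  fmt: str = TICK_FORMAT) -> int:
    """Raw, uncompressed, unreplicated capacity for the collected ticks."""
    return bytes_per_tick(fmt) * ticks_per_day * trading_days

one_year = storage_bytes(ticks_per_day=10_000_000, trading_days=252)
print(bytes_per_tick(), one_year)  # 60 151200000000
```

Doubling the resolution (twice as many ticks per day) doubles the result, and adding a field to `TICK_FORMAT` increases it proportionally across every stored tick.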
6. High-level Tick Data Workflow
[Figure: Data Feeds 1 to N → Market Data Servers (aggregation of feed handlers) → KDB+ in-memory TickDB (real-time), with a tick log file; intraday and end-of-day (EOD) data → Historical Tick Database. News and social media feeds are shown as additional sources.]
EOD data is stored as a distinct Historical Database Partitioned Format “hdpf” file for that day. This file is typically written as a large sequential stream of blocks.
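The EOD write pattern described above (one file per trading day, written as a large sequential stream of blocks) can be sketched in Python. This is a hypothetical illustration of the I/O pattern only: the directory layout, file name, record format and 1 MiB block size are assumptions, and it does not reproduce kdb+'s own on-disk format.

```python
import os
import struct
import tempfile
from datetime import date
from typing import Iterable, Tuple

BLOCK_SIZE = 1 << 20            # assumption: flush in blocks of roughly 1 MiB
RECORD = struct.Struct("<qdq")  # hypothetical record: time_ns, price, size

def write_eod_partition(root: str, day: date,
                        ticks: Iterable[Tuple[int, float, int]]) -> str:
    """Write one day's ticks to <root>/<YYYY-MM-DD>/trades.bin.

    Records are buffered in memory and written out in large sequential
    blocks rather than one small write per tick.
    """
    part_dir = os.path.join(root, day.isoformat())
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, "trades.bin")
    buf = bytearray()
    with open(path, "wb") as f:
        for tick in ticks:
            buf += RECORD.pack(*tick)
            if len(buf) >= BLOCK_SIZE:
                f.write(buf)  # one large sequential write
                buf.clear()
        if buf:
            f.write(buf)      # final partial block
    return path

with tempfile.TemporaryDirectory() as root:
    day_ticks = [(i, 100.0 + i * 0.01, 100) for i in range(1000)]
    p = write_eod_partition(root, date(2016, 2, 16), day_ticks)
    print(os.path.getsize(p))  # 24000
```

Large sequential writes like these are the workload a scale-out file store has to absorb at end of day.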
7. Use Case: Historical Tick Database on Red Hat Gluster Storage
[Figure: the tick data workflow from slide 6, with the Historical Tick Database hosted on a scale-out GlusterFS cluster of RHGS nodes]
Scale-Out Architecture
Mathematical, algorithmic, low-frequency calculations:
• Risk modeling, P&L reporting, mark-to-market, stress tests, Monte Carlo (MC) simulations
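The slide places the Historical Tick Database on a scale-out GlusterFS cluster. A toy Python model of a distributed-replicated volume shows how such a cluster spreads files. One rule is real: with `gluster volume create VOL replica N brick1 brick2 ...`, each consecutive group of N bricks forms a replica set. Everything else is an assumption: the host names, the brick paths, the 10-node count read off the diagram, replica 2 taken from the "2x replication" configuration on the TCO slide, and the CRC-32 placement, which stands in for GlusterFS's own distribute (DHT) hashing.

```python
import zlib
from typing import List

def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Group bricks into replica sets in the order they were listed."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]

def place(filename: str, sets: List[List[str]]) -> List[str]:
    """Choose the replica set that stores a file by hashing its name.

    Illustration only: GlusterFS uses its own hash and per-directory
    layout ranges, not CRC-32 modulo the number of sets.
    """
    return sets[zlib.crc32(filename.encode()) % len(sets)]

# Hypothetical hosts and brick paths for a 10-node cluster.
bricks = [f"node{i}:/rhgs/brick1/tickdb" for i in range(1, 11)]
sets = replica_sets(bricks, replica=2)
print(len(sets))  # 5
print(place("2016.02.16/trades", sets))  # both bricks holding this file
```

Adding nodes in multiples of the replica count adds new replica sets, which is what lets capacity and throughput grow with the cluster.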
8. Historical Tick Database on Red Hat Gluster Storage – Disaster Recovery Implementation
[Figure: the tick data workflow at Site A, with the Historical Tick Database on a GlusterFS cluster of RHGS nodes, replicated asynchronously with geo-replication to a second GlusterFS cluster of RHGS nodes at Site B]
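The disaster recovery design relies on asynchronous geo-replication from Site A to Site B. A toy Python model illustrates the trade-off this implies: writes are acknowledged at the primary site without waiting on the WAN, but changes not yet shipped would be lost if Site A failed. This is a conceptual sketch under that assumption only. The real GlusterFS feature detects changes through a changelog and transfers data over SSH, which this model does not attempt to reproduce, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple

class AsyncReplicatedStore:
    """Toy model of asynchronous primary-to-secondary replication."""

    def __init__(self) -> None:
        self.primary: Dict[str, bytes] = {}    # Site A
        self.secondary: Dict[str, bytes] = {}  # Site B
        self.changelog: Deque[Tuple[str, bytes]] = deque()

    def write(self, path: str, data: bytes) -> None:
        """Acknowledge locally and record the change for later shipping."""
        self.primary[path] = data
        self.changelog.append((path, data))

    def sync(self) -> int:
        """Replay pending changes on the secondary; return the count shipped."""
        shipped = 0
        while self.changelog:
            path, data = self.changelog.popleft()
            self.secondary[path] = data
            shipped += 1
        return shipped

    def at_risk(self) -> int:
        """Changes that would be lost if Site A failed before the next sync."""
        return len(self.changelog)

store = AsyncReplicatedStore()
store.write("2016.02.16/trades", b"...")
store.write("2016.02.17/trades", b"...")
print(store.at_risk())                        # 2
store.sync()
print(store.at_risk(), len(store.secondary))  # 0 2
```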
9. Hadoop Analytics on a Historical Tick Database
[Figure: the tick data workflow, with the Historical Tick Database on a GlusterFS cluster of RHGS nodes]
Hadoop YARN
• Batch (MapReduce)
• Script (Pig)
• SQL (Hive)
• Online (HBase)
• Real-time (Storm)
• In-memory (Spark)
• Risk analysis
• Trading risk analysis
• Ticker plant data
• Archival/retention
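Batch (MapReduce) analytics over historical ticks follow a map, shuffle and reduce pattern. The sketch below shows that pattern in plain Python for a volume-weighted average price (VWAP) per symbol. VWAP is chosen only as an example computation and is not taken from the slide. The code does not use Hadoop's APIs, and the `(symbol, price, size)` record shape is an assumption.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

Tick = Tuple[str, float, int]  # hypothetical record: (symbol, price, size)

def map_tick(tick: Tick) -> List[Tuple[str, Tuple[float, int]]]:
    """Map: emit (symbol, (notional, volume)) for one trade."""
    symbol, price, size = tick
    return [(symbol, (price * size, size))]

def reduce_symbol(values: List[Tuple[float, int]]) -> float:
    """Reduce: VWAP = sum(price * size) / sum(size) for one symbol."""
    notional = sum(n for n, _ in values)
    volume = sum(v for _, v in values)
    return notional / volume

def vwap_mapreduce(ticks: Iterable[Tick]) -> Dict[str, float]:
    mapped = chain.from_iterable(map_tick(t) for t in ticks)
    # Shuffle: group intermediate values by key, as the framework would.
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for symbol, value in mapped:
        groups[symbol].append(value)
    return {symbol: reduce_symbol(vals) for symbol, vals in groups.items()}

ticks = [("ABC", 10.0, 100), ("ABC", 11.0, 300), ("XYZ", 50.0, 10)]
print(vwap_mapreduce(ticks))  # {'ABC': 10.75, 'XYZ': 50.0}
```

On a real cluster, the map and reduce functions run in parallel across many workers, and the framework performs the shuffle between them.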
11. Comparing Throughput and Costs at Scale
[Chart: Storage performance scalability: read/write throughput (MBps) vs. number of storage nodes]
[Chart: Storage costs scalability: total storage costs ($) vs. number of storage nodes]
Both charts compare software-defined scale-out storage with traditional enterprise NAS storage.
12. TCO COMPARISON
3-Year TCO (incl. support)
Throughput Configuration
• HDD-only media
• Higher CPU-to-media ratio than archive configurations
• 2x replication with RHGS
• 8:3 erasure coding with EMC Isilon
[Chart: 3-year TCO for throughput-optimized solutions with 1 PB usable capacity. Small: EMC Isilon X-210 vs. 12LFF (12 large-form-factor drive bay) servers. Medium: EMC Isilon X-410 vs. 36LFF servers.]
Pricing sources: EMC Isilon: Gartner Competitive Profiles (as of 2/16/16); Supermicro: Thinkmate (as of 1/13/16)
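The two data-protection schemes in the throughput configuration need different amounts of raw disk for the same 1 PB of usable capacity. A short calculation shows the raw-capacity side of the comparison. It assumes "8:3" means 8 data fragments plus 3 parity fragments. It ignores file system overhead, spare capacity and pricing, so it does not reproduce the TCO figures above.

```python
def raw_for_replication(usable_tb: float, copies: int) -> float:
    """N-way replication stores every byte `copies` times."""
    return usable_tb * copies

def raw_for_erasure_coding(usable_tb: float, data: int, parity: int) -> float:
    """k+m erasure coding stores k + m fragments for every k data fragments."""
    return usable_tb * (data + parity) / data

usable = 1000.0  # 1 PB usable, expressed in TB (decimal units)
print(raw_for_replication(usable, 2))        # 2000.0
print(raw_for_erasure_coding(usable, 8, 3))  # 1375.0
```

Under these assumptions, 2x replication needs about 45% more raw capacity than 8:3 erasure coding (2000 TB vs. 1375 TB). Any TCO advantage therefore has to come from lower per-terabyte hardware and software costs, not from storage efficiency.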
13. HEAD-TO-HEAD FEATURE COMPARISON
• Standard vs. add-on features
• Total cost of ownership
• Hardware configuration choices
• Fault tolerance
• Complex pricing
14. RED HAT GLUSTER STORAGE
Data Services
• Erasure coding
• Tiering
• Bit-rot detection
• Snapshots
• Quotas
• Geo-replication
Protocols
• Active/Active NFSv4
• SMB3 (protocol negotiation, in-flight encryption, server-side copy)
Management
• Device management
• Dashboard
• Monitoring
• Command and control
Security & Data Integrity
• SSL-based in-flight encryption
• At-rest encryption using dm-crypt
• SELinux enforcing mode
• Self-healing
Half the price for comparable features and greater flexibility