1. Getting the most out of multi-year and multi-source trading history
Glenn Wright, EMEA Systems Architect DDN
June 2014
2. Agenda
Uh? Who is DDN?
The Evolution of Data Handling in Market Systems
The Big Analytics Crunch
What’s hot, what’s not… It’s Parallel Performance, stupid!
3. DDN | The “Big” In Big Data
800%: PayPal accelerates stream processing and fraud analytics by 8x with DDN, saving $100Ms.
1TB/s: The world’s fastest file system, powering the US’s fastest supercomputer, is built on DDN.
Tier 1: A Tier 1 CDN accelerates the world’s video traffic using DDN technology to exceed customer SLAs.
4. DDN | The Technology Behind The World’s Leading Data-Driven Organizations
HPC & Big Data Analysis
Cloud & Web Infrastructure
Professional Media
Security
5. Big Data & Cloud Infrastructure: DDN’s Award-Winning Product Portfolio
Analytics Reference Architectures

EXAScaler™ (Petascale Lustre® Storage): 10Ks of clients, 1TB/s+, HSM, Linux HPC clients, NFS & CIFS [2014]
GRIDScaler™ (Enterprise Scale-Out File Storage): ~10K clients, 1TB/s+, HSM, Linux/Windows HPC clients, NFS & CIFS

Storage Fusion Architecture™ Core Storage Platforms
SFA™12KX: 48GB/s, 1.7M IOPS, 1,680 drives in 2 racks, optional embedded computing
SFA7700: 12.5GB/s, 450K IOPS, 60 drives in 4U, 228 drives in 12U
SATA, SAS & SSD flexible drive configuration; SFX™ automated flash caching

WOS® 3.0 (Geo-Replicated Cloud Storage): 32 trillion unique objects, 256 million objects/second, self-healing cloud, parallel Boolean search, cloud foundation, big data platform
WOS7000: 60 drives in 4U, self-contained servers

Management: DirectMon™
Cloud tiering
Infinite Memory Engine™ [Tech Preview]: distributed file system buffer cache; adaptive transparent flash cache; SFX API gives users control [pre-staging, alignment, by-pass]
6. Evolution of Market Systems
[Chart: world exchange trading activity by year, 1990-2011; series: TOTAL, Americas, Asia-Pacific; y-axis 0 to 160,000,000]
SOURCE: World Federation of Exchanges 2011 Annual Report and Statistics
[Storage timeline overlay: DASD → F DASD → Scale-out NAS → Parallel File System]
7. UNDERLYING ISSUE: Gaping Performance Bottlenecks
• Moore’s Law has outstripped improvements in disk drive technology by two orders of magnitude over the last decade
• Analytics moved to HPC clusters
• Today’s servers are hopelessly unbalanced between the CPU’s need for data and the HDD’s ability to keep up
[Chart: HDD vs. CPU relative performance improvement, 2005-2014; divergence reaching 20,000x (labels: 1gb, 16gb)]
8. Welcome to the Big Analytics Crunch
• 500TB to >2PB of historical data for one TZ
• Distributed cache: online model reads data at 100s of GB/s of IO (tick DB application such as kdb+)
• 3D “cube” of in-memory distributed data, online, real-time
• 100s of services/servers working together in memory: low-latency analytics with the simplicity of persistent file system semantics
• Burst-buffer, low-latency operation is going mainstream in FSI
► Real-time back-testing
► Real-time intra-day risk positioning
9. Why DDN & Why Parallel?
In Production
Many systems deployed worldwide at global investment banks and hedge funds
Performance and Consolidation
A back-test that completes in a few seconds is much closer to the trade event
Mix online history and real-time trade analytics
Consolidate in-memory databases against one copy of the data
At Scale
Flash alone does NOT scale at capacity
A single namespace for both history and real-time data
10. Limitless Scale-up and Scale-out with kdb+…
[Diagram: sixteen kdb+ instances, KDB+ (1) through KDB+ (16), on a compute fabric, backed by a Lustre file system: a primary and replica MDS sharing an MDT, plus four OSSes (OSS1-OSS4), all hosted on two DDN SFA7700 arrays]
11. What we changed:
export SLAVECOUNT=160 # number of kdb+ client tasks
export CLIENTCOUNT=10 # number of processes per kdb+ server

q script query:
\l beforeeach.q
R1S:rrdextras flip`k`v!(" S*";",")0:`:rrd.csv
/ year-hibid
outp t:"YRHIBID";
fn:{[f;s;d] flip`date`sym`a!flip raze(f each s)peach d};
NRS:.tasks.rxsg[H;`$t;1;(fn[hb];apickAs[R1S;`Symbol];reverse ALLDATES2011)];
\l aftereach.q
symbols:
glenn$ head rrd.csv
1,Symbol,LKQQ
1,Symbol,LHDE
1,Symbol,LNJO
1,Symbol,LLTR
1,Symbol,LRFC
1,Symbol,LQGA
1,Symbol,LTNQ
1,Symbol,LSAG
1,Symbol,LQIA
1,Symbol,LKSJ
… x850 symbols vs 84
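The heart of the change is the fan-out in fn: a per-date worker mapped over the date list with peach, so each slave scans one partition. A minimal sketch of the same pattern (the quote table and bid column are hypothetical, not from the deck), assuming q is started with secondary threads (q -s 8) and a date-partitioned hdb is loaded:

f:{[s;d] 0!select hibid:max bid by sym from quote where date=d, sym in s} / per-date query; 0! unkeys so results can be razed
r:raze f[`LKQQ`LHDE] peach reverse date / fan one date per worker, raze into a single table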
12. What we changed (2):
glenn$ more hostport.txt
127.0.0.1:5000
127.0.0.1:5001
127.0.0.1:5002
127.0.0.1:5003
127.0.0.1:5004
127.0.0.1:5005
127.0.0.1:5006
127.0.0.1:5007
127.0.0.1:5008
127.0.0.1:5009
# replace $QEXEC initdb.k -g 1 -p $((baseport+i)) </dev/null &>log$((baseport+i)).log&
for i in `seq 20000 20009`
do
  for j in `seq 0 15`
  do
    # the echo logs the intended command; the actual slave hosts are gp-2-*
    echo ssh server-$j "cd $HOME;QHOME=/home/glenn/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
    ssh gp-2-$j "cd $HOME;QHOME=/home/mpiuser/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
    # wait until the slave is listening on its port before moving on
    while ! nc -z "gp-2-$j" $i; do sleep 0.1; done
  done
done
# get ready ??
echo `date -u` $SLAVECOUNT slave tasks started
# then start the servers aimed at the slaves
baseport=5000
for ((i=0; i<$CLIENTCOUNT; i++));
do
  $QEXEC initdb.k -g 1 -s -$SLAVECOUNT -p $((baseport+i)) </dev/null &>log$((baseport+i)).log&
  while ! nc -z localhost $((baseport+i)); do sleep 0.1; done
done
# check that everything can start up: $QEXEC startdb.q -s -$SLAVECOUNT -q
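initdb.k itself is not shown in the deck; a minimal sketch of what such a worker init script typically does (an assumption, not the real file) is simply to map the shared hdb from the parallel file system, here using the /mnt/onefilesystem mount shown on slide 14:

/ initdb.k sketch (assumed contents; the actual script is not shown)
\l /mnt/onefilesystem/hdb / map the shared date-partitioned database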
13. What we changed (3):
startdb.q …
…
/ check all servers are there
/{hopen(x;500)}each("I"$getenv`BASEPORT)+til"I"$getenv`SLAVECOUNT;
{hopen(x;2500)}each hsym`$read0`:slavehostport.txt;
\l initdb.k
{hopen(x;500)}each 5000+til"I"$getenv`CLIENTCOUNT;
cat slavehostport.txt:
192.168.3.51:20000
192.168.3.51:20001
192.168.3.51:20002
192.168.3.51:20003
192.168.3.51:20004
192.168.3.51:20005
192.168.3.51:20006
192.168.3.51:20007
192.168.3.51:20008
192.168.3.51:20009
192.168.3.52:20000
192.168.3.52:20001
192.168.3.52:20002
…. 160 times
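Because the servers are started with a negative slave count (-s -$SLAVECOUNT), peach distributes work over process handles rather than local threads, and those handles come from .z.pd. A sketch of the standard wiring, assuming the handle list is built from the same slavehostport.txt (the deck does not show this step):

/ give peach its worker handles: a `u# int vector of open connections
.z.pd:`u#hopen each hsym`$read0`:slavehostport.txt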
14. © 2013 DataDirect Networks, Inc.
ddn.com
Slave (1) slave (2) Slave (3) Slave n
Lustre/DDN Service
/mnt/onefilesystem
Q clients:
Slave x10
Slave x10
Slave x10 Slave x10
Up to 1TB/sec… “n” way server striping or by date/sym
15. Results of Scaling the Service…
[Chart: latency reduction, in seconds per query, single thread vs. Lustre; scale 0 to 250. Lower is better.]
The parallel FS solution shows a near-linear scalability model for one instance running over many nodes, as measured from kdb+. Latency here is the time to wait for a kdb+ query over 245GB of data. To put this in context, the nodes were equipped with only 64GB of memory.
16. Some of the Many Benefits of kdb+ on a Parallel FS
1. A significant decrease in operational latency per kdb+ query, especially for queries that search through large amounts of historical market data, achieved by balancing content across multiple file system servers.
2. Parallelization of kdb+ query “threads” in a single shared namespace, allowing a user to treat any data workload independently of other data workloads. The “query from hell” on a production system is now OK.
3. Simultaneous read/write operations on a single namespace, for the entire database and for any number of kdb+ clients (e.g. end-of-day data consolidations into an hdb instance).
4. Sharing of data amongst different independent hdb/rdb instances. Many instances of kdb+ can view the same data, so strategies for shared and private data segments may be consolidated onto the same space, avoiding the need for kdb+ admins to physically copy data around the network or between disks.
5. kdb+ content can be “striped” across all FS servers, or allocated round-robin against each server. Striping lets some files attain maximal I/O rates for a single kdb+ “object” (see the sketch after this list).
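On point 5: Lustre striping is set per file or directory with the standard lfs tool; a hedged sketch, invoked here from q via system (the path is illustrative, not from the deck):

/ stripe a new hdb partition across all OSTs (-c -1) so one large splayed
/ column can be served by every OSS at once; -c 1 would instead place each
/ file on a single OST, round-robining files across servers
system"lfs setstripe -c -1 /mnt/onefilesystem/hdb/2011.06.30"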
Editor's notes
Don’t create data copies in local flash/NVRAM
Higher cost: capacity, admin, power, space, software licensing
No sharing, higher data risk, long time to consolidate and checkpoint
Consolidate to a single system that delivers:
Linear scaling performance
A single point of admin
Higher density