Caserta Concepts' implementation team presented a solution that performs big data analytics on active trade data in real time. They walked through the core components: Storm for real-time ingest, Cassandra as the NoSQL database, and others. For more information on future events, please check out http://www.casertaconcepts.com/.
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cassandra
1. Big Data Warehousing Meetup
December 10, 2013
Real-time Trade Data Monitoring
with Storm & Cassandra
2. Agenda
7:00 Networking: grab a slice of pizza and a drink
7:15 Welcome & Intro: Joe Caserta, President, Caserta Concepts; Author, The Data Warehouse ETL Toolkit (about the Meetup and about Caserta Concepts)
7:30 Cassandra: Elliott Cordo, Chief Architect, Caserta Concepts
8:00 Storm: Noel Vega, Consultant, Caserta Concepts; Consultant, Dimension Data, LLC
8:30-9:00 Q&A / More Networking
3. About the BDW Meetup
• Big Data is a complex, rapidly changing landscape
• We want to share our stories and hear about yours
• A great networking opportunity for like-minded data nerds
• Opportunities to collaborate on exciting projects
• Founded by Caserta Concepts, a Big Data Analytics, DW & BI consulting firm
• Next BDW Meetup: January 20
4. About Caserta Concepts
Focused Expertise
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Strategic Data Ecosystems
Industries Served
• Financial Services
• Healthcare / Insurance
• Retail / eCommerce
• Digital Media / Marketing
• K-12 / Higher Education
Founded in 2001
• President: Joe Caserta, industry thought leader, consultant, educator and co-author, The Data Warehouse ETL Toolkit (Wiley, 2004)
5. Caserta Concepts
Listed as one of the 20 Most Promising Data Analytics Consulting Companies
CIOReview looked at hundreds of data analytics consulting companies and shortlisted the ones at the forefront of tackling real analytics challenges. A distinguished panel comprising CEOs, CIOs, VCs, industry analysts and the editorial board of CIOReview selected the final 20.
6. Expertise & Offerings
• Strategic Roadmap / Assessment / Consulting / Implementation
• Big Data Analytics
• Data Warehousing / ETL / Data Integration
• BI / Visualization / Analytics
8. We are hiring
Does this word cloud excite you?
Speak with us about our open positions: jobs@casertaconcepts.com
9. Why talk about Storm & Cassandra?
[Diagram: Traditional BI vs. Big Data BI]
• Traditional BI: source systems (ERP, Finance, Legacy) feed ETL into a traditional EDW that serves ad-hoc/canned reporting
• Big Data BI: Storm ingests data into a big data cluster (nodes N1-N5) with a NoSQL database on the Hadoop Distributed File System (HDFS); MapReduce, Pig/Hive and Mahout drive data analytics and data science
• A horizontally scalable environment, optimized for analytics
10. What is Storm
• Distributed event processor
• Real-time data ingestion and dissemination
• In-stream ETL
• Reliably processes unbounded streams of data
• Storm is fast: clocked at over a million tuples per second per node
• Scalable and fault-tolerant; guarantees your data will be processed
• The preferred technology for real-time big data processing by organizations worldwide (partial list at https://github.com/nathanmarz/storm/wiki/Powered-By)
• Incubator proposal: http://wiki.apache.org/incubator/StormProposal
11. Components of Storm
• Spout – Collects data from upstream feeds and submits it for processing
• Tuple – A collection of data that is passed within Storm
• Bolt – Processes tuples (transformations)
• Stream – Identifies outputs from Spouts/Bolts
• Storm usually outputs to a NoSQL database
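A minimal sketch of how these components wire together, using the pre-Apache backtype.storm API that was current at the time of this talk. FixMessageSpout is a hypothetical stand-in for a real feed, assumed to emit tuples with a "ticker" field; it is not code from the presentation.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import java.util.Map;

public class TradeTopology {

    // Bolt: consumes tuples from the spout's stream and transforms them.
    public static class UppercaseTickerBolt extends BaseRichBolt {
        private OutputCollector collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        public void execute(Tuple tuple) {
            // Tuple: a named collection of values flowing through the topology.
            String ticker = tuple.getStringByField("ticker");
            collector.emit(tuple, new Values(ticker.toUpperCase()));
            collector.ack(tuple); // acking is what enables Storm's processing guarantee
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Stream: the named output schema of this bolt.
            declarer.declare(new Fields("ticker"));
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout: pulls data from an upstream feed (hypothetical class).
        builder.setSpout("fix-spout", new FixMessageSpout(), 2);
        // fieldsGrouping routes all tuples with the same ticker to the same bolt task.
        builder.setBolt("ticker-bolt", new UppercaseTickerBolt(), 4)
               .fieldsGrouping("fix-spout", new Fields("ticker"));
        new LocalCluster().submitTopology("trade-demo", new Config(), builder.createTopology());
    }
}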
12. Why NoSQL?
• Performance: relational databases have a lot of features, and overhead, that we don't need in many cases (although we will miss some)
• Scalability: most relational databases scale vertically, which limits how large they can get; federation and sharding are an awkward, manual process
• Agile: handles sparse data and data with a lot of variation
• Most NoSQL databases scale horizontally on commodity hardware
13. What is Cassandra?
• Column families are the equivalent of a table in an RDBMS
• The primary unit of storage is a column; columns are stored contiguously
• Skinny rows: most like a relational database, except columns are optional and not stored if omitted
• Wide rows: rows can be billions of columns wide; used for time series, relationships, and secondary indexes (both shapes are sketched below)
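As an illustration (not from the deck), both shapes can be declared in CQL through the DataStax Java driver; the contact point, keyspace and table names are assumptions:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RowShapes {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");

        // Skinny row: looks like a relational table, but columns are
        // optional and simply absent on disk when not written.
        session.execute("CREATE TABLE trades_skinny ("
                + "order_id text PRIMARY KEY, ticker text, client text)");

        // Wide row: client is the row (partition) key, date_id the column
        // key; one client's row can grow very wide, sorted by date_id.
        session.execute("CREATE TABLE trades_wide ("
                + "client text, date_id int, trade_count int, "
                + "PRIMARY KEY (client, date_id))");

        cluster.close();
    }
}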
14. REAL TIME TRADE DATA MONITORING
Elliott Cordo
Chief Architect, Caserta Concepts
15. The Use Case
• Trade data (orders and executions)
• High volume of incoming data: 500 thousand records per second, 12 billion messages per day
• Data must be aggregated and monitored in real time (end-to-end latency measured in 100s of ms)
• Both raw messages and analytics are stored and persisted to a database
16. The Data
• Primarily FIX messages: Financial Information eXchange
• Established in the early '90s as a standard for trade data communication; widely used throughout the industry
• Essentially a delimited record of variable attribute-value pairs
• Looks something like this (parsed in the sketch below):
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 |
11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 |
44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 |
10=128 |
• A single trade can consist of 1000s of such messages, although a typical trade has about a dozen
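A minimal, illustrative parser for such a message (assuming the pipe-delimited display form above; real FIX uses the SOH \u0001 character as the delimiter):

import java.util.LinkedHashMap;
import java.util.Map;

public class FixParser {
    // Splits a pipe-delimited FIX string into tag -> value pairs.
    public static Map<String, String> parse(String msg) {
        Map<String, String> tags = new LinkedHashMap<String, String>();
        for (String field : msg.split("\\|")) {
            String[] kv = field.trim().split("=", 2);
            if (kv.length == 2) {
                tags.put(kv[0], kv[1]); // e.g. "55" -> "MSFT" (ticker)
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String msg = "8=FIX.4.2 | 35=8 | 55=MSFT | 54=1 | 38=15";
        System.out.println(parse(msg).get("55")); // prints MSFT
    }
}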
17. Additional Requirements
• Linearly scalable
• Highly available: no single point of failure, quick recovery
• Quick time to benefit
• Processing guarantees: NO DATA IS LOST!
18. Some Sample Analytic Use Cases
• Sum(notional volume) by ticker: daily, hourly, by minute
• Average trade latency (execution TS - order TS)
• Wash sales (a sell within x seconds of the last buy) for the same client/ticker; see the sketch below
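A sketch of that last use case (class name, window and the in-memory state are illustrative; the talk keeps such state in Redis, as shown later, so it survives task restarts):

import java.util.HashMap;
import java.util.Map;

public class WashSaleDetector {
    private final long windowMillis;
    private final Map<String, Long> lastBuyTs = new HashMap<String, Long>();

    public WashSaleDetector(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // side: "BUY" or "SELL"; returns true if a sell follows a buy for the
    // same client/ticker within the window.
    public boolean onTrade(String client, String ticker, String side, long ts) {
        String key = client + "|" + ticker;
        if ("BUY".equals(side)) {
            lastBuyTs.put(key, ts);
            return false;
        }
        Long buyTs = lastBuyTs.get(key);
        return buyTs != null && ts - buyTs <= windowMillis;
    }

    public static void main(String[] args) {
        WashSaleDetector d = new WashSaleDetector(5000); // x = 5 seconds
        d.onTrade("ClientA", "MSFT", "BUY", 1000);
        System.out.println(d.onTrade("ClientA", "MSFT", "SELL", 4000)); // true
    }
}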
19. How has this system traditionally been handled?
• Typically by manually partitioning the application: a number of independent systems and databases behind a message queue, "dividing" the problem
[Diagram: Message Queue feeding Use Case 1 / Partition A into Database A; Use Case 1 / Partition B into Database B; Use Case 2 / All Partitions into Database C]
Main issues:
• Growth requires changing these systems to accept the new partitioning scheme: development!
• A lot of different applications replicating a complex architecture, with tons of boilerplate code
• Performing analysis across the partitioning schemes is very difficult
20. Need to Establish a Platform-as-a-Service Architecture
[Diagram: sensor/trade data flows into a Redis queue, through the Storm cluster, and out as atomic data and aggregates feeding d3.js analytics, event monitors and low-latency analytics]
• A Redis queue is used for ingestion (sketched below)
• Storm is used for real-time ETL and outputs the atomic data and derived data needed for analytics
• Redis is also used as a reference-data lookup cache and for state
• Real-time analytics are produced from the aggregated data
• Higher-latency ad-hoc analytics are done in Hadoop using Pig and Hive
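A hedged sketch of the Redis ingestion queue using the Jedis client (host and queue key are assumptions): a producer LPUSHes raw FIX messages, and a consumer, such as a Storm spout, BRPOPs them.

import redis.clients.jedis.Jedis;
import java.util.List;

public class RedisQueueDemo {
    public static void main(String[] args) {
        Jedis producer = new Jedis("localhost");
        producer.lpush("fix-ingest-queue", "8=FIX.4.2 | 35=8 | 55=MSFT | 54=1");

        Jedis consumer = new Jedis("localhost");
        // Blocks for up to 1 second; returns [key, value], or null/empty on timeout.
        List<String> popped = consumer.brpop(1, "fix-ingest-queue");
        if (popped != null && popped.size() == 2) {
            System.out.println("dequeued: " + popped.get(1));
        }
        producer.close();
        consumer.close();
    }
}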
21. Deeper Dive: Cassandra as an Analytic Database
• Based on a blend of Dynamo and BigTable
• Distributed, masterless
• Super-fast writes: can ingest lots of data!
• Very fast reads
Why did we choose it:
• Data throughput requirements
• High availability
• Simple expansion
• Interesting data models for time-series data (more on this later)
22. Design Practices
• Cassandra does not support aggregation or joins; the data model must be tuned to usage
• Denormalize your data (flatten your primary dimensional attributes into your fact); see the sketch below
• Storing the same data redundantly is OK; it might sound weird, but we have been doing this all along in the traditional world, modeling our data to make analytic queries simple!
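A small illustration of that advice (keyspace and table are hypothetical): client and ticker attributes are flattened straight into the fact row, so queries need no joins.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DenormalizedFact {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        // Dimensional attributes (client_sector, ticker_name) ride along
        // redundantly in every row; storage is cheap, joins are not.
        session.execute("CREATE TABLE trades_by_client_day ("
                + "client text, client_sector text, date_id int, "
                + "ticker text, ticker_name text, notional float, "
                + "PRIMARY KEY (client, date_id, ticker))");
        cluster.close();
    }
}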
23. Wide rows are our friends
• Cassandra composite columns are powerful for analytic models
• They facilitate multi-dimensional analysis
• A wide-row table may have N rows and a variable number of columns (up to millions of columns)

         20130101  20130102  20130103  20130104  20130105  20130106  ...
ClientA     10003      9493     43143     45553     54553     34343  ...
ClientB     45453     34313     54543     23233      4233     34423  ...
ClientC      3323     35313     43123     54543     43433      4343  ...

• And now with CQL3 we have "unpacked" wide rows into named columns: easy to work with!
24. More about wide rows!
• The left-most column is the ROW KEY
• It is the mechanism by which the row is distributed across the Cassandra cluster
• Care must be taken to prevent hot spots: dates, for example, are generally not good candidates, because all load will go to a given set of servers on a particular day!
• Data can be filtered using equality and "IN" clauses
(same wide-row table as the previous slide, keyed by client)

Create table Client_Daily_Summary (
Client text,
Date_ID int,
Trade_Count int,
Primary key (Client, Date_ID))

• The top row is the COLUMN KEY
• There can be a variable number of columns
• It is acceptable to have millions or even billions of columns in a table
• Column keys are sorted and can accept a range query (greater than / less than); see the driver sketch below
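For illustration, the range-query pattern above executed through the DataStax Java driver against Client_Daily_Summary (the contact point and keyspace are assumptions):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class RangeQueryDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        // Equality on the row key (Client); range on the sorted column key (Date_ID).
        for (Row row : session.execute(
                "SELECT Date_ID, Trade_Count FROM Client_Daily_Summary "
              + "WHERE Client = 'ClientA' AND Date_ID >= 20130101 AND Date_ID <= 20130103")) {
            System.out.println(row.getInt("Date_ID") + " -> " + row.getInt("Trade_Count"));
        }
        cluster.close();
    }
}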
25. Traditional Cassandra Analytic Model
If we wanted to track trade counts by day and by hour, we could stream our ETL to two (or more) summary fact tables.

Daily (row key = Client, column key = Date_ID):

         20130101  20130102  20130103  20130104  20130105  20130106
ClientA     10003      9493     43143     45553     54553     34343
ClientB     45453     34313     54543     23233      4233     34423
ClientC      3323     35313     43123     54543     43433      4343

Sample analytic query: give me daily trade counts for ClientA between Jan 1 and Jan 3:

Select Date_ID, Trade_Count from Client_Daily_Summary
where Client='ClientA' and Date_ID >= 20130101 and Date_ID <= 20130103

Hourly (row key = Client|Date, column key = Hour):

                   0900   1000   1100   1200   1300   1400
ClientA|20131101   1000    949   4314   4555   5455   3434
ClientA|20131102   4545   3431   5454   2323    423   3442
ClientB|20131101    332   3531   4312   5454   4343    434

Sample analytic query: give me hourly trade counts for ClientA for Jan 1 between 9 and 11 AM:

Select Hour, Trade_Count from Client_Hourly_Summary
where Client_Date='ClientA|20131101' and Hour >= 900 and Hour <= 1100
26. But there are other methods too
• Assuming some level of client-side aggregation (and additive measures), we could also further unpack and leverage column keys using CQL3. A slightly different use case:

Create table Client_Ticker_Summary (
Client text,
Date_ID int,
Ticker text,
Trade_Count int,
Notional_Volume float,
Primary Key (Client, Date_ID, Ticker))

The first column in the PK definition is the Row Key, aka the Partition Key.

Look at all this flexible SQL goodness:

select * from Client_Ticker_Summary
where Client in ('ClientA','ClientB')

select * from Client_Ticker_Summary
where Client in ('ClientA','ClientB') and Date_ID >= 20130101 and Date_ID <= 20130103

select * from Client_Ticker_Summary
where Client = 'ClientA' and Date_ID >= 20130101 and Date_ID <= 20130103

select * from Client_Ticker_Summary
where Client = 'ClientA' and Date_ID = 20130101 and Ticker in ('APPL','GE','PG')

ALSO possible, but not recommended:

select * from Client_Ticker_Summary
where Date_ID > 20120101 allow filtering;

select * from Client_Ticker_Summary
where Date_ID = 20120101 and ticker in ('APPL','GE') allow filtering;
27. Storing the Atomic data
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 | 11=ATOMNOCCC9990900 |
20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING |
59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=128 |
• We must land all atomic data, for:
• Persistence
• Future replay (new metrics, corrections)
• Drill-down capabilities / auditability
• The sparse nature of the FIX data fits the Cassandra data model very well
• We store only the tags actually present in the data, saving space; a few approaches, depending on usage pattern (the map variant is exercised in the sketch below):

Create table Trades_Skinny (
OrderID text,
Date_ID int,
Ticker text,
Client text,
...many more columns,
Primary key (OrderID))
Create index ix_Date_ID on Trades_Skinny (Date_ID)

Create table Trades_Wide (
Order_ID text,
Tag text,
Value text,
Primary key (Order_ID, Tag))

Create table Trades_Map (
OrderID text,
Date_ID int,
Ticker text,
Client text,
Tags map<text, text>,
Primary key (OrderID))
Create index ix_Date_ID on Trades_Map (Date_ID)
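An illustrative write path for the Trades_Map variant through the DataStax Java driver, storing only the tags present in the message (contact point, keyspace and values are made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.HashMap;
import java.util.Map;

public class StoreAtomicTrade {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        PreparedStatement ps = session.prepare(
                "INSERT INTO Trades_Map (OrderID, Date_ID, Ticker, Client, Tags) "
              + "VALUES (?, ?, ?, ?, ?)");
        // Only the FIX tags actually present in this message are stored.
        Map<String, String> tags = new HashMap<String, String>();
        tags.put("55", "MSFT"); // ticker
        tags.put("38", "15");   // order quantity
        session.execute(ps.bind("ATOMNOCCC9990900", 20071123, "MSFT", "PHLX", tags));
        cluster.close();
    }
}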
28. Big data solutions usually employ multiple DB types
Some considerations:
• Size-type requirements:
• Volume: a disk-space requirement
• Velocity: a message-rate requirement
• Data-structure & query-pattern complexity: simple K/V pairs -vs- relational -vs- ...
• C.A.P. theorem alignment: which two does your use case benefit from?
• Value-add features (a few are sketched below):
• API: interface (e.g. HTTP REST -vs- client classes) and power (e.g. mget, incrementBy)
• Replication and/or H/A support (B.C./D.R.)
• Support for data-processing patterns (e.g. Riak has Map/Reduce; Redis sorted sets give Top-N)
• Transaction support (Redis: MULTI, command list, EXEC)
• and so on
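A quick Jedis sketch of a few of those value-add features (key names are invented): an atomic incrementBy-style counter, a multi-key mget, and a MULTI/EXEC transaction that also feeds a sorted set for Top-N queries.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;
import java.util.List;

public class RedisFeaturesDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost");
        // Atomic counter: the "incrementBy"-style power command.
        jedis.incrBy("trades:MSFT:20130101", 15);
        // Batched multi-key read: mget.
        List<String> counts = jedis.mget("trades:MSFT:20130101", "trades:GE:20130101");
        System.out.println(counts);
        // Transaction support: MULTI, a command list, then EXEC.
        Transaction tx = jedis.multi();
        tx.incrBy("trades:MSFT:20130101", 1);
        tx.zadd("topTickers", 16, "MSFT"); // sorted set enables Top-N
        tx.exec();
        jedis.close();
    }
}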
31. Practical Deep Dive: Continuity-of-Service across Storm Failures
An approach to making topologies more resilient to task failure.
• Tasks in Storm are the units that do the actual work
• Tasks can individually fail due to:
• Resource starvation (OOM, CPU)
• Unhandled exceptions
• Timeouts (such as waiting for I/O)
• and so on
• Tasks also fail because parent Executors, Workers or Supervisors fail
• Nimbus will spawn a replacement task, but in the context of C.o.S., is that enough?
• Answer: No. But maybe we can work around that.
My "storm-user" Google group question: http://bit.ly/1bsBooT
32. Storyboard: Continuity-of-Service
ACME Check Deposit Corp (H.Q.): each clerk handles one client letter range.
• Clerk 1: Step 1: deposit client [A-I] checks; Step 2: update checkbook balance
• Clerk 2: Step 1: deposit client [J-R] checks; Step 2: update checkbook balance
• Clerk 3: Step 1: deposit client [S-Z] checks; Step 2: update checkbook balance
Blue (the [A-I] clerk):
• Deposits a check for an [A-I] client, and is given a deposit receipt for it (Step 1)
• Before he is able to journal the receipt to the check register, he quits (Step 2 never happens)
1) ACME H.Q. notices that [A-I] checks aren't being processed. Should the workload be redistributed? No! (exception policy)
2) Policy consequence: there is no difference before & after the event, so context has to be remembered:
• The new hire's role is check depositor for ACME (not a plumber for sub-company FOOBAR)
• Their specific ACME role is to deposit checks for clients [A-I]
• The role did have state: there is an aggregate check register, and an incomplete transaction
33. Storyboard: Continuity-of-Service
Why this example? It has the operational requirements of real-world use cases:
• Distributed model (where processors are autonomous): suitable for Big Data
• Specific failure/recovery requirements:
• Incomplete transactions are completed
• Aggregated state is remembered
• Behavior persistence: same behavior before & after an exception event (stickiness)
34. Modeling this use-case story in Storm
Blue (in Storm: an acmeBolt task):
• Deposits a batch of checks for clients [A-I] and is given a deposit receipt for them (Step 1)
• Before he is able to journal the receipt to the check register, he quits (Step 2)
1) ACME H.Q. notices that [A-I] checks aren't being processed. Should the workload be redistributed? No! (by policy)
2) Policy consequence: there is no difference before & after the event, so context has to be remembered:
• The role is check depositor for ACME, not a plumber for sister-company FOO (in Storm: the acmeBolt)
• The specific ACME role is to deposit checks for clients [A-I] (in Storm: an acmeBolt task, fields-grouped)
• The role did have state: an aggregate check register and an incomplete transaction (in Storm: Java objects in the JVM associated with the acmeBolt task)
38. Lab behavior observations show Storm does remember...
http://bit.ly/1bsBooT

componentID = context.getThisComponentId();
// Defined in the topology class, e.g. "bolt01".
// Each component owns a list of task pointers, taskPntr1..taskPntrN,
// at indexes 0..N-1.

taskID = context.getThisTaskId();
// An integer in [1, N], where N is the number of tasks, topology-wide.

taskIndex = context.getThisTaskIndex();
// An integer in [0, N-1], where N is the number of tasks, component-wide.

fqid = componentID + ".0" + Integer.toString(taskIndex);
// e.g. bolt02.05; spout01.03; bolt01.00
40. Lab tests show Storm does remember, but what's missing?
http://bit.ly/1bsBooT
In lab tests we observed the following behaviors in Storm:
• It preserves the FQID (e.g. bolt01.02) before & after task failures: IDENTITY PERSISTENCE!
• Tasks with a given FQID will receive the same grouping of data throughout the life of a topology (analogy: the new hire will be an ACME check depositor for clients [A-I])
And yet, something is still missing:
• While Storm can replay unprocessed tuples that timed out during the fail/restart period, it can't regenerate in-memory (in-JVM) aggregated state
What to do?
41. REDIS to the rescue :: Continuity-of-Service
Since we observed the following behaviors in Storm:
• It preserves the FQID (e.g. bolt01.02) before & after task failures: IDENTITY PERSISTENCE!
• Tasks with a given FQID will receive the same grouping of data throughout the life of a topology
...we can use the FQID as a stable Redis key prefix and recover a task's state when it restarts.
42. REDIS to the rescue :: Continuity-of-Service

// ===============================
// prepare() method
// ===============================
// The FQID is maintained across task fail/restarts
// (i.e. for the lifetime of the topology).
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    [ ... snip ... ]
    this.componentID = context.getThisComponentId(); // e.g. bolt01; spout03
    this.taskIndex = context.getThisTaskIndex();     // [0, N-1]; N = number of component tasks
    this.fqid = this.componentID + ".0" + Integer.toString(this.taskIndex); // bolt01.04; spout03.00
    this.redisKeyPrefix = this.fqid; // use your unique Fully Qualified ID as a Redis key prefix
    // Establish a connection to Redis [not shown], and recover lost data structures, if any.
    this.hashMap = this.jedisClient.hgetAll(this.redisKeyPrefix + "-myMap"); // e.g. bolt01.01-myMap
}

// ===============================
// execute() method
// ===============================
// Tuple grouping/partitioning is maintained across task
// fail/restarts (i.e. for the lifetime of the topology).
public void execute(Tuple inTuple) {
    [ ... snip ... ]
    String customer = inTuple.getString(0);
    String prev = this.hashMap.get(customer); // recovered, as necessary, in prepare()
    double balance = (prev == null ? 0.0 : Double.parseDouble(prev))
                   + Double.parseDouble(inTuple.getString(1));
    this.hashMap.put(customer, Double.toString(balance));
    this.jedisClient.hset(this.redisKeyPrefix + "-myMap", customer, Double.toString(balance));
}
43. Summary :: Storm / Redis and Continuity-of-Service
[Diagram: a Redis master (host:6379) with a local read-only slave backs the whole pipeline. Queue keys dataSourceQueue01 and dataSourceQueue02 feed spout tasks spout01.00 - spout01.05; fields grouping within a stream (based on field 1 of the tuple) routes tuples to bolt tasks bolt01.00 - bolt01.02 and bolt02.00 - bolt02.02; the key spout01.tupleAckHash maps tuple GUIDs (GUID1 ... GUID-n) to in-flight tuples (tuple1 ... tuple-n) for replay; per-task keys such as bolt01.02-dataStruct1 ... bolt02.00-dataStructN hold each task's recoverable state. Note the taskIndex -vs- taskID distinction.]
Useful Redis structures:
• Strings (byte arrays)
• Lists (two-way queues, as linked lists)
• Sets
• Hashes
• Sorted sets (hashes with sorted values)
• Serialize/deserialize objects as JSON
• Other in-memory solutions exist, e.g. MemSQL