Caserta Concepts' implementation team presented a solution that performs big data analytics on active trade data in real time. They walked through the core components: Storm for real-time ingest, Cassandra as the NoSQL database, and others. For more information on future events, please check out http://www.casertaconcepts.com/.
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cassandra
1. Big Data Warehousing Meetup
December 10, 2013
Real-time Trade Data Monitoring
with Storm & Cassandra
2. Agenda
7:00 Networking: grab a slice of pizza and a drink
7:15 Welcome & Intro: Joe Caserta, President, Caserta Concepts; Author, The Data Warehouse ETL Toolkit (about the Meetup and about Caserta Concepts)
7:30 Cassandra: Elliott Cordo, Chief Architect, Caserta Concepts
8:00 Storm: Noel Vega, Consultant, Caserta Concepts; Consultant, Dimension Data, LLC
8:30-9:00 Q&A / More Networking
3. About the BDW Meetup
• Big Data is a complex, rapidly changing landscape
• We want to share our stories and hear about yours
• A great networking opportunity for like-minded data nerds
• Opportunities to collaborate on exciting projects
• Founded by Caserta Concepts, a Big Data Analytics, DW & BI consulting firm
• Next BDW Meetup: January 20
4. About Caserta Concepts
Focused Expertise
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Strategic Data Ecosystems
Industries Served
• Financial Services
• Healthcare / Insurance
• Retail / eCommerce
• Digital Media / Marketing
• K-12 / Higher Education
Founded in 2001
• President: Joe Caserta, industry thought leader, consultant, educator and co-author, The Data Warehouse ETL Toolkit (Wiley, 2004)
5. Caserta Concepts
Listed as one of the 20 Most Promising Data Analytics Consulting Companies
CIOReview looked at hundreds of data analytics consulting companies and shortlisted the ones at the forefront of tackling real analytics challenges. A distinguished panel comprising CEOs, CIOs, VCs, industry analysts and the editorial board of CIOReview selected the final 20.
6. Expertise & Offerings
• Strategic Roadmap / Assessment / Consulting / Implementation
• Big Data Analytics
• Data Warehousing / ETL / Data Integration
• BI / Visualization / Analytics
8. We are hiring
Does this word cloud excite you?
Speak with us about our open positions: jobs@casertaconcepts.com
9. Why talk about Storm & Cassandra?
[Diagram: Traditional BI vs. Big Data BI]
• Traditional BI: source systems (ERP, Finance, Legacy) feed ETL into a traditional EDW that serves ad-hoc/canned reporting
• Big Data BI: Storm ingests data into a big data cluster (nodes N1-N5) with a NoSQL database on the Hadoop Distributed File System (HDFS); MapReduce, Pig/Hive and Mahout drive data analytics and data science
• A horizontally scalable environment, optimized for analytics
10. What is Storm
• Distributed event processor
• Real-time data ingestion and dissemination
• In-stream ETL
• Reliably processes unbounded streams of data
• Storm is fast: clocked at over a million tuples per second per node
• Scalable and fault-tolerant; guarantees your data will be processed
• The preferred technology for real-time big data processing by organizations worldwide (partial list at https://github.com/nathanmarz/storm/wiki/Powered-By)
• Incubator proposal: http://wiki.apache.org/incubator/StormProposal
11. Components of Storm
• Spout – Collects data from upstream feeds and submits it for processing
• Tuple – A collection of data that is passed within Storm
• Bolt – Processes tuples (transformations)
• Stream – Identifies outputs from Spouts/Bolts
• Storm usually outputs to a NoSQL database
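A minimal sketch of how these components wire together, using the pre-Apache backtype.storm API that was current at the time of this talk. FixMessageSpout is a hypothetical stand-in for a real feed, assumed to emit tuples with a "ticker" field; it is not code from the presentation.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import java.util.Map;

public class TradeTopology {

    // Bolt: consumes tuples from the spout's stream and transforms them.
    public static class UppercaseTickerBolt extends BaseRichBolt {
        private OutputCollector collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        public void execute(Tuple tuple) {
            // Tuple: a named collection of values flowing through the topology.
            String ticker = tuple.getStringByField("ticker");
            collector.emit(tuple, new Values(ticker.toUpperCase()));
            collector.ack(tuple); // acking is what enables Storm's processing guarantee
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Stream: the named output schema of this bolt.
            declarer.declare(new Fields("ticker"));
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout: pulls data from an upstream feed (hypothetical class).
        builder.setSpout("fix-spout", new FixMessageSpout(), 2);
        // fieldsGrouping routes all tuples with the same ticker to the same bolt task.
        builder.setBolt("ticker-bolt", new UppercaseTickerBolt(), 4)
               .fieldsGrouping("fix-spout", new Fields("ticker"));
        new LocalCluster().submitTopology("trade-demo", new Config(), builder.createTopology());
    }
}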
12. Why NoSQL?
• Performance: relational databases have a lot of features, and overhead, that we don't need in many cases (although we will miss some)
• Scalability: most relational databases scale vertically, which limits how large they can get; federation and sharding are an awkward, manual process
• Agile: handles sparse data and data with a lot of variation
• Most NoSQL databases scale horizontally on commodity hardware
13. What is Cassandra?
• Column families are the equivalent of a table in an RDBMS
• The primary unit of storage is a column; columns are stored contiguously
• Skinny rows: most like a relational database, except columns are optional and not stored if omitted
• Wide rows: rows can be billions of columns wide; used for time series, relationships, and secondary indexes (both shapes are sketched below)
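As an illustration (not from the deck), both shapes can be declared in CQL through the DataStax Java driver; the contact point, keyspace and table names are assumptions:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RowShapes {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");

        // Skinny row: looks like a relational table, but columns are
        // optional and simply absent on disk when not written.
        session.execute("CREATE TABLE trades_skinny ("
                + "order_id text PRIMARY KEY, ticker text, client text)");

        // Wide row: client is the row (partition) key, date_id the column
        // key; one client's row can grow very wide, sorted by date_id.
        session.execute("CREATE TABLE trades_wide ("
                + "client text, date_id int, trade_count int, "
                + "PRIMARY KEY (client, date_id))");

        cluster.close();
    }
}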
14. REAL TIME TRADE DATA MONITORING
Elliott Cordo
Chief Architect, Caserta Concepts
15. The Use Case
• Trade data (orders and executions)
• High volume of incoming data: 500 thousand records per second, 12 billion messages per day
• Data must be aggregated and monitored in real time (end-to-end latency measured in 100s of ms)
• Both raw messages and analytics are stored and persisted to a database
16. The Data
• Primarily FIX messages: Financial Information eXchange
• Established in the early '90s as a standard for trade data communication; widely used throughout the industry
• Essentially a delimited record of variable attribute-value pairs
• Looks something like this (parsed in the sketch below):
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 |
11=ATOMNOCCC9990900 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 |
44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 |
10=128 |
• A single trade can consist of 1000s of such messages, although a typical trade has about a dozen
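A minimal, illustrative parser for such a message (assuming the pipe-delimited display form above; real FIX uses the SOH \u0001 character as the delimiter):

import java.util.LinkedHashMap;
import java.util.Map;

public class FixParser {
    // Splits a pipe-delimited FIX string into tag -> value pairs.
    public static Map<String, String> parse(String msg) {
        Map<String, String> tags = new LinkedHashMap<String, String>();
        for (String field : msg.split("\\|")) {
            String[] kv = field.trim().split("=", 2);
            if (kv.length == 2) {
                tags.put(kv[0], kv[1]); // e.g. "55" -> "MSFT" (ticker)
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String msg = "8=FIX.4.2 | 35=8 | 55=MSFT | 54=1 | 38=15";
        System.out.println(parse(msg).get("55")); // prints MSFT
    }
}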
17. Additional Requirements
• Linearly scalable
• Highly available: no single point of failure, quick recovery
• Quick time to benefit
• Processing guarantees: NO DATA IS LOST!
18. Some Sample Analytic Use Cases
• Sum(notional volume) by ticker: daily, hourly, by minute
• Average trade latency (execution TS - order TS)
• Wash sales (a sell within x seconds of the last buy) for the same client/ticker; see the sketch below
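A sketch of that last use case (class name, window and the in-memory state are illustrative; the talk keeps such state in Redis, as shown later, so it survives task restarts):

import java.util.HashMap;
import java.util.Map;

public class WashSaleDetector {
    private final long windowMillis;
    private final Map<String, Long> lastBuyTs = new HashMap<String, Long>();

    public WashSaleDetector(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // side: "BUY" or "SELL"; returns true if a sell follows a buy for the
    // same client/ticker within the window.
    public boolean onTrade(String client, String ticker, String side, long ts) {
        String key = client + "|" + ticker;
        if ("BUY".equals(side)) {
            lastBuyTs.put(key, ts);
            return false;
        }
        Long buyTs = lastBuyTs.get(key);
        return buyTs != null && ts - buyTs <= windowMillis;
    }

    public static void main(String[] args) {
        WashSaleDetector d = new WashSaleDetector(5000); // x = 5 seconds
        d.onTrade("ClientA", "MSFT", "BUY", 1000);
        System.out.println(d.onTrade("ClientA", "MSFT", "SELL", 4000)); // true
    }
}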
19. How has this system traditionally been handled?
• Typically by manually partitioning the application: a number of independent systems and databases behind a message queue, "dividing" the problem
[Diagram: Message Queue feeding Use Case 1 / Partition A into Database A; Use Case 1 / Partition B into Database B; Use Case 2 / All Partitions into Database C]
Main issues:
• Growth requires changing these systems to accept the new partitioning scheme: development!
• A lot of different applications replicating a complex architecture, with tons of boilerplate code
• Performing analysis across the partitioning schemes is very difficult
20. Need to Establish a Platform-as-a-Service Architecture
[Diagram: sensor/trade data flows into a Redis queue, through the Storm cluster, and out as atomic data and aggregates feeding d3.js analytics, event monitors and low-latency analytics]
• A Redis queue is used for ingestion (sketched below)
• Storm is used for real-time ETL and outputs the atomic data and derived data needed for analytics
• Redis is also used as a reference-data lookup cache and for state
• Real-time analytics are produced from the aggregated data
• Higher-latency ad-hoc analytics are done in Hadoop using Pig and Hive
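A hedged sketch of the Redis ingestion queue using the Jedis client (host and queue key are assumptions): a producer LPUSHes raw FIX messages, and a consumer, such as a Storm spout, BRPOPs them.

import redis.clients.jedis.Jedis;
import java.util.List;

public class RedisQueueDemo {
    public static void main(String[] args) {
        Jedis producer = new Jedis("localhost");
        producer.lpush("fix-ingest-queue", "8=FIX.4.2 | 35=8 | 55=MSFT | 54=1");

        Jedis consumer = new Jedis("localhost");
        // Blocks for up to 1 second; returns [key, value], or null/empty on timeout.
        List<String> popped = consumer.brpop(1, "fix-ingest-queue");
        if (popped != null && popped.size() == 2) {
            System.out.println("dequeued: " + popped.get(1));
        }
        producer.close();
        consumer.close();
    }
}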
21. Deeper Dive: Cassandra as an Analytic Database
• Based on a blend of Dynamo and BigTable
• Distributed, masterless
• Super-fast writes: can ingest lots of data!
• Very fast reads
Why did we choose it:
• Data throughput requirements
• High availability
• Simple expansion
• Interesting data models for time-series data (more on this later)
22. Design Practices
• Cassandra does not support aggregation or joins; the data model must be tuned to usage
• Denormalize your data (flatten your primary dimensional attributes into your fact); see the sketch below
• Storing the same data redundantly is OK; it might sound weird, but we have been doing this all along in the traditional world, modeling our data to make analytic queries simple!
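A small illustration of that advice (keyspace and table are hypothetical): client and ticker attributes are flattened straight into the fact row, so queries need no joins.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DenormalizedFact {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        // Dimensional attributes (client_sector, ticker_name) ride along
        // redundantly in every row; storage is cheap, joins are not.
        session.execute("CREATE TABLE trades_by_client_day ("
                + "client text, client_sector text, date_id int, "
                + "ticker text, ticker_name text, notional float, "
                + "PRIMARY KEY (client, date_id, ticker))");
        cluster.close();
    }
}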
23. Wide rows are our friends
• Cassandra composite columns are powerful for analytic models
• They facilitate multi-dimensional analysis
• A wide-row table may have N rows and a variable number of columns (up to millions of columns)

         20130101  20130102  20130103  20130104  20130105  20130106  ...
ClientA     10003      9493     43143     45553     54553     34343  ...
ClientB     45453     34313     54543     23233      4233     34423  ...
ClientC      3323     35313     43123     54543     43433      4343  ...

• And now with CQL3 we have "unpacked" wide rows into named columns: easy to work with!
24. More about wide rows!
• The left-most column is the ROW KEY
• It is the mechanism by which the row is distributed across the Cassandra cluster
• Care must be taken to prevent hot spots: dates, for example, are generally not good candidates, because all load will go to a given set of servers on a particular day!
• Data can be filtered using equality and "IN" clauses
(same wide-row table as the previous slide, keyed by client)

Create table Client_Daily_Summary (
Client text,
Date_ID int,
Trade_Count int,
Primary key (Client, Date_ID))

• The top row is the COLUMN KEY
• There can be a variable number of columns
• It is acceptable to have millions or even billions of columns in a table
• Column keys are sorted and can accept a range query (greater than / less than); see the driver sketch below
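For illustration, the range-query pattern above executed through the DataStax Java driver against Client_Daily_Summary (the contact point and keyspace are assumptions):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class RangeQueryDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        // Equality on the row key (Client); range on the sorted column key (Date_ID).
        for (Row row : session.execute(
                "SELECT Date_ID, Trade_Count FROM Client_Daily_Summary "
              + "WHERE Client = 'ClientA' AND Date_ID >= 20130101 AND Date_ID <= 20130103")) {
            System.out.println(row.getInt("Date_ID") + " -> " + row.getInt("Trade_Count"));
        }
        cluster.close();
    }
}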
25. Traditional Cassandra Analytic Model
If we wanted to track trade counts by day and by hour, we could stream our ETL to two (or more) summary fact tables.

Daily (row key = Client, column key = Date_ID):

         20130101  20130102  20130103  20130104  20130105  20130106
ClientA     10003      9493     43143     45553     54553     34343
ClientB     45453     34313     54543     23233      4233     34423
ClientC      3323     35313     43123     54543     43433      4343

Sample analytic query: give me daily trade counts for ClientA between Jan 1 and Jan 3:

Select Date_ID, Trade_Count from Client_Daily_Summary
where Client='ClientA' and Date_ID >= 20130101 and Date_ID <= 20130103

Hourly (row key = Client|Date, column key = Hour):

                   0900   1000   1100   1200   1300   1400
ClientA|20131101   1000    949   4314   4555   5455   3434
ClientA|20131102   4545   3431   5454   2323    423   3442
ClientB|20131101    332   3531   4312   5454   4343    434

Sample analytic query: give me hourly trade counts for ClientA for Jan 1 between 9 and 11 AM:

Select Hour, Trade_Count from Client_Hourly_Summary
where Client_Date='ClientA|20131101' and Hour >= 900 and Hour <= 1100
26. But there are other methods too
• Assuming some level of client-side aggregation (and additive measures), we could also further unpack and leverage column keys using CQL3. A slightly different use case:

Create table Client_Ticker_Summary (
Client text,
Date_ID int,
Ticker text,
Trade_Count int,
Notional_Volume float,
Primary Key (Client, Date_ID, Ticker))

The first column in the PK definition is the Row Key, aka the Partition Key.

Look at all this flexible SQL goodness:

select * from Client_Ticker_Summary
where Client in ('ClientA','ClientB')

select * from Client_Ticker_Summary
where Client in ('ClientA','ClientB') and Date_ID >= 20130101 and Date_ID <= 20130103

select * from Client_Ticker_Summary
where Client = 'ClientA' and Date_ID >= 20130101 and Date_ID <= 20130103

select * from Client_Ticker_Summary
where Client = 'ClientA' and Date_ID = 20130101 and Ticker in ('APPL','GE','PG')

ALSO possible, but not recommended:

select * from Client_Ticker_Summary
where Date_ID > 20120101 allow filtering;

select * from Client_Ticker_Summary
where Date_ID = 20120101 and ticker in ('APPL','GE') allow filtering;
27. Storing the Atomic data
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 52=20071123-05:30:00.000 | 11=ATOMNOCCC9990900 |
20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING |
59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=128 |
• We must land all atomic data, for:
• Persistence
• Future replay (new metrics, corrections)
• Drill-down capabilities / auditability
• The sparse nature of the FIX data fits the Cassandra data model very well
• We store only the tags actually present in the data, saving space; a few approaches, depending on usage pattern (the map variant is exercised in the sketch below):

Create table Trades_Skinny (
OrderID text,
Date_ID int,
Ticker text,
Client text,
...many more columns,
Primary key (OrderID))
Create index ix_Date_ID on Trades_Skinny (Date_ID)

Create table Trades_Wide (
Order_ID text,
Tag text,
Value text,
Primary key (Order_ID, Tag))

Create table Trades_Map (
OrderID text,
Date_ID int,
Ticker text,
Client text,
Tags map<text, text>,
Primary key (OrderID))
Create index ix_Date_ID on Trades_Map (Date_ID)
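An illustrative write path for the Trades_Map variant through the DataStax Java driver, storing only the tags present in the message (contact point, keyspace and values are made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.HashMap;
import java.util.Map;

public class StoreAtomicTrade {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        PreparedStatement ps = session.prepare(
                "INSERT INTO Trades_Map (OrderID, Date_ID, Ticker, Client, Tags) "
              + "VALUES (?, ?, ?, ?, ?)");
        // Only the FIX tags actually present in this message are stored.
        Map<String, String> tags = new HashMap<String, String>();
        tags.put("55", "MSFT"); // ticker
        tags.put("38", "15");   // order quantity
        session.execute(ps.bind("ATOMNOCCC9990900", 20071123, "MSFT", "PHLX", tags));
        cluster.close();
    }
}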
28. Big data solutions usually employ multiple DB types
Some considerations:
• Size-type requirements:
• Volume: a disk-space requirement
• Velocity: a message-rate requirement
• Data-structure & query-pattern complexity: simple K/V pairs -vs- relational -vs- ...
• C.A.P. theorem alignment: which two does your use case benefit from?
• Value-add features (a few are sketched below):
• API: interface (e.g. HTTP REST -vs- client classes) and power (e.g. mget, incrementBy)
• Replication and/or H/A support (B.C./D.R.)
• Support for data-processing patterns (e.g. Riak has Map/Reduce; Redis sorted sets give Top-N)
• Transaction support (Redis: MULTI, command list, EXEC)
• and so on
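A quick Jedis sketch of a few of those value-add features (key names are invented): an atomic incrementBy-style counter, a multi-key mget, and a MULTI/EXEC transaction that also feeds a sorted set for Top-N queries.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;
import java.util.List;

public class RedisFeaturesDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost");
        // Atomic counter: the "incrementBy"-style power command.
        jedis.incrBy("trades:MSFT:20130101", 15);
        // Batched multi-key read: mget.
        List<String> counts = jedis.mget("trades:MSFT:20130101", "trades:GE:20130101");
        System.out.println(counts);
        // Transaction support: MULTI, a command list, then EXEC.
        Transaction tx = jedis.multi();
        tx.incrBy("trades:MSFT:20130101", 1);
        tx.zadd("topTickers", 16, "MSFT"); // sorted set enables Top-N
        tx.exec();
        jedis.close();
    }
}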
31. Practical Deep Dive: Continuity-of-Service across Storm Failures
An approach to making topologies more resilient to task failure.
• Tasks in Storm are the units that do the actual work
• Tasks can individually fail due to:
• Resource starvation (OOM, CPU)
• Unhandled exceptions
• Timeouts (such as waiting for I/O)
• and so on
• Tasks also fail because parent Executors, Workers or Supervisors fail
• Nimbus will spawn a replacement task, but in the context of C.o.S., is that enough?
• Answer: No. But maybe we can work around that.
My "storm-user" Google group question: http://bit.ly/1bsBooT
32. Storyboard: Continuity-of-Service
ACME Check Deposit Corp (H.Q.): each clerk handles one client letter range.
• Clerk 1: Step 1: deposit client [A-I] checks; Step 2: update checkbook balance
• Clerk 2: Step 1: deposit client [J-R] checks; Step 2: update checkbook balance
• Clerk 3: Step 1: deposit client [S-Z] checks; Step 2: update checkbook balance
Blue (the [A-I] clerk):
• Deposits a check for an [A-I] client, and is given a deposit receipt for it (Step 1)
• Before he is able to journal the receipt to the check register, he quits (Step 2 never happens)
1) ACME H.Q. notices that [A-I] checks aren't being processed. Should the workload be redistributed? No! (exception policy)
2) Policy consequence: there is no difference before & after the event, so context has to be remembered:
• The new hire's role is check depositor for ACME (not a plumber for sub-company FOOBAR)
• Their specific ACME role is to deposit checks for clients [A-I]
• The role did have state: there is an aggregate check register, and an incomplete transaction
33. Storyboard: Continuity-of-Service
Why this example? It has the operational requirements of real-world use cases:
• Distributed model (where processors are autonomous): suitable for Big Data
• Specific failure/recovery requirements:
• Incomplete transactions are completed
• Aggregated state is remembered
• Behavior persistence: same behavior before & after an exception event (stickiness)
34. Modeling this use-case story in Storm
Blue (in Storm: an acmeBolt task):
• Deposits a batch of checks for clients [A-I] and is given a deposit receipt for them (Step 1)
• Before he is able to journal the receipt to the check register, he quits (Step 2)
1) ACME H.Q. notices that [A-I] checks aren't being processed. Should the workload be redistributed? No! (by policy)
2) Policy consequence: there is no difference before & after the event, so context has to be remembered:
• The role is check depositor for ACME, not a plumber for sister-company FOO (in Storm: the acmeBolt)
• The specific ACME role is to deposit checks for clients [A-I] (in Storm: an acmeBolt task, fields-grouped)
• The role did have state: an aggregate check register and an incomplete transaction (in Storm: Java objects in the JVM associated with the acmeBolt task)
38. Lab behavior observations show Storm does remember...
http://bit.ly/1bsBooT

componentID = context.getThisComponentId();
// Defined in the topology class, e.g. "bolt01".
// Each component owns a list of task pointers, taskPntr1..taskPntrN,
// at indexes 0..N-1.

taskID = context.getThisTaskId();
// An integer in [1, N], where N is the number of tasks, topology-wide.

taskIndex = context.getThisTaskIndex();
// An integer in [0, N-1], where N is the number of tasks, component-wide.

fqid = componentID + ".0" + Integer.toString(taskIndex);
// e.g. bolt02.05; spout01.03; bolt01.00
40. Lab tests show Storm does remember, but what's missing?
http://bit.ly/1bsBooT
In lab tests we observed the following behaviors in Storm:
• It preserves the FQID (e.g. bolt01.02) before & after task failures: IDENTITY PERSISTENCE!
• Tasks with a given FQID will receive the same grouping of data throughout the life of a topology (analogy: the new hire will be an ACME check depositor for clients [A-I])
And yet, something is still missing:
• While Storm can replay unprocessed tuples that timed out during the fail/restart period, it can't regenerate in-memory (in-JVM) aggregated state
What to do?
41. REDIS to the rescue :: Continuity-of-Service
Since we observed the following behaviors in Storm:
• It preserves the FQID (e.g. bolt01.02) before & after task failures: IDENTITY PERSISTENCE!
• Tasks with a given FQID will receive the same grouping of data throughout the life of a topology
...we can use the FQID as a stable Redis key prefix and recover a task's state when it restarts.
42. REDIS to the rescue :: Continuity-of-Service

// ===============================
// prepare() method
// ===============================
// The FQID is maintained across task fail/restarts
// (i.e. for the lifetime of the topology).
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    [ ... snip ... ]
    this.componentID = context.getThisComponentId(); // e.g. bolt01; spout03
    this.taskIndex = context.getThisTaskIndex();     // [0, N-1]; N = number of component tasks
    this.fqid = this.componentID + ".0" + Integer.toString(this.taskIndex); // bolt01.04; spout03.00
    this.redisKeyPrefix = this.fqid; // use your unique Fully Qualified ID as a Redis key prefix
    // Establish a connection to Redis [not shown], and recover lost data structures, if any.
    this.hashMap = this.jedisClient.hgetAll(this.redisKeyPrefix + "-myMap"); // e.g. bolt01.01-myMap
}

// ===============================
// execute() method
// ===============================
// Tuple grouping/partitioning is maintained across task
// fail/restarts (i.e. for the lifetime of the topology).
public void execute(Tuple inTuple) {
    [ ... snip ... ]
    String customer = inTuple.getString(0);
    String prev = this.hashMap.get(customer); // recovered, as necessary, in prepare()
    double balance = (prev == null ? 0.0 : Double.parseDouble(prev))
                   + Double.parseDouble(inTuple.getString(1));
    this.hashMap.put(customer, Double.toString(balance));
    this.jedisClient.hset(this.redisKeyPrefix + "-myMap", customer, Double.toString(balance));
}
43. Summary :: Storm / Redis and Continuity-of-Service
[Diagram: a Redis master (host:6379) with a local read-only slave backs the whole pipeline. Queue keys dataSourceQueue01 and dataSourceQueue02 feed spout tasks spout01.00 - spout01.05; fields grouping within a stream (based on field 1 of the tuple) routes tuples to bolt tasks bolt01.00 - bolt01.02 and bolt02.00 - bolt02.02; the key spout01.tupleAckHash maps tuple GUIDs (GUID1 ... GUID-n) to in-flight tuples (tuple1 ... tuple-n) for replay; per-task keys such as bolt01.02-dataStruct1 ... bolt02.00-dataStructN hold each task's recoverable state. Note the taskIndex -vs- taskID distinction.]
Useful Redis structures:
• Strings (byte arrays)
• Lists (two-way queues, as linked lists)
• Sets
• Hashes
• Sorted sets (hashes with sorted values)
• Serialize/deserialize objects as JSON
• Other in-memory solutions exist, e.g. MemSQL