Big Data researchers have recently raised an important point: a single machine with a state-of-the-art storage engine (RAM, SSD, HDD, etc.) and multiple cores can outperform a distributed system, of which Hadoop and MapReduce are the most popular examples. This has started a race for single-machine processors. The term massively multicore refers to an architecture with a high number of cores, ideally in the hundreds. In such environments, it is crucial to have an optimal engine for dynamic packing of heterogeneous jobs. This paper builds on recent technologies such as lockfree parallelization, streaming algorithms, and hotspot distributions, and introduces new methods and algorithms that make such a package feasible for massively multicore processors.
2. .
Why the All-In-One Package?
• we need a new Big Data processor
• HPC, ManyCore, etc. are often incorrectly used in the Big Data context
• ManyCore is expected to replace MultiCore 12, but it is not good for irregular jobs
◦ InfiniBand and other ManyCore devices expect highly regular jobs and data structures
◦ in this paper, Massively Multicore is different from ManyCore
• existing Big Data processors (Hadoop/MapReduce 01) fall short
◦ no support for multicore and no use of its advantages 03
◦ bottleneck is at 60Mbps 02
◦ the key-value datatype is inefficient; this paper replaces it with data streaming
12 R.Brightwell+0 "Workshop on Managed Many-Core Systems" 1st Workshop on Managed Many-Core Systems (2008)
01 "Apache Hadoop" http://hadoop.apache.org/ (2015)
03 A.Rowstron+4 "Nobody ever got fired for using Hadoop on a cluster" 1st Hot Topics in Cloud Data Proc. (2012)
02 K.Shvachko "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)
M.Zhanikeev -- maratishe@gmail.com -- allinone: Massively Multicore, Heterogeneous Jobs, and Data Streaming -- http://bit.do/150805 2/26
3. .
The Packet Traffic Story
4. .
Traffic and Big Data Similarities
• volume: 10G+ bits per second
• variety=heterogeneity: new capture engines require/use variable header depth, DPI in some cases
• variety=heterogeneity (2): various concurrent processing jobs with different targets and output datatypes
◦ example: M2M pattern detection, heavy hitters, superspreaders
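As an illustration of one job type named above, heavy hitters, here is a minimal sketch using the Misra-Gries summary. The slides do not say which algorithm is used, so the choice of Misra-Gries, the function name `misra_gries`, and the toy packet list are all assumptions made for illustration.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k-1 counters.

    Any item that occurs more than len(stream)/k times is guaranteed to
    survive in the returned dict (counts are lower bounds, not exact).
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical source addresses; "a" sends more than a third of all packets.
packets = ["a", "b", "a", "c", "a", "d", "a", "b", "a"]
candidates = misra_gries(packets, k=3)
```

The memory footprint is fixed by `k` regardless of traffic volume, which is what makes this kind of job suitable for a 10G+ stream.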
5. .
Multicore Traffic Processor
[Figure: multicore packet capture. Labels: Meter, Gateway (to infrastructure proper), Mirroring, PF_RING (… other PF_RINGs), CPU Cores, Time, Probing Job A, Probing Job B, Probing Job C, Lifespan, Shared Memory, … more CPU cores (same ring, different cores)]
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
6. .
Lockfree Shared Memory
• PF_RING is a faster capture driver for raw packets 07
• key 1: a Lockfree Shared Memory design
• key 2: Double-Linked List (DLL) for sharing pointers across processes (zero copy) 13
• key 3: spreading the load via stale check
• key 4: no locks, but light non-locking polling on both sides
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
7. .
The Lockfree Design
• locks and MPI both impose major overhead, up to 70% of time
• lockfree 07: no locking; use the DLL to push stale items to the tail, and regularly pop the stale tail
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
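The stale-tail bookkeeping described on this slide can be sketched as follows. This is a single-threaded illustration of the list logic only: it does not demonstrate the lock-free shared memory behavior of the cited design, and the class name `StaleTailList`, the timestamps, and the `max_age` threshold are assumptions.

```python
class Node:
    def __init__(self, key, stamp):
        self.key, self.stamp = key, stamp
        self.prev = self.next = None


class StaleTailList:
    """DLL ordered by freshness: head is freshest, tail is stalest."""

    def __init__(self):
        self.head = self.tail = None
        self.index = {}  # key -> node, for O(1) lookup

    def _unlink(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def _push_head(self, node):
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def touch(self, key, now):
        """Insert or refresh an item; untouched items drift to the tail."""
        node = self.index.get(key)
        if node is None:
            node = self.index[key] = Node(key, now)
        else:
            node.stamp = now
            self._unlink(node)
        self._push_head(node)

    def pop_stale(self, now, max_age):
        """Pop items from the tail while they are older than max_age."""
        popped = []
        while self.tail and now - self.tail.stamp > max_age:
            node = self.tail
            self._unlink(node)
            del self.index[node.key]
            popped.append(node.key)
        return popped


flows = StaleTailList()
flows.touch("flow-1", now=0)
flows.touch("flow-2", now=1)
flows.touch("flow-1", now=5)   # flow-1 is refreshed, flow-2 drifts to the tail
stale = flows.pop_stale(now=6, max_age=3)
```

Because stale items collect at the tail, the periodic cleanup only ever inspects the tail and stops at the first fresh item.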
9. .
Multicore for Big Data
10. .
Multicore of Big Data
• Standard HPC: regular structures and jobs; network and storage bottlenecks are not considered
• Big Data: moves in the opposite direction and needs to take care of all the bottlenecks first
[Figure: processing path. Labels: Network (NW), Bulk Storage (BS), Shared Memory (SM), Core, Output, Small Data; scope labels: Big Data Processing; HPC, Simulators, Modeling]
11. .
Smart Multicore for Big Data
• help (1): circuits for bulk network transfer 09
• help (2): only one process uses bulk storage for buffering and distribution
• contention/congestion on RAM cannot be easily avoided, so this overhead has to be minimized
[Figure: processing path. Labels: Bulk Storage (BS), Network (NW), RAM-based Shared Memory (sSM), Parallel accesses, Ability to isolate, Core, Output, Small Data]
09 myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, CRC (2015)
12. .
The Big Data Replay Method
13. .
Traditional Hadoop
[Figure: traditional Hadoop deployment. Labels: Hadoop Space containing Name Node / Name Server(s), many Storage Nodes (shards) holding file A, file B, file C, …, a Manager, and many Hadoop Jobs / MapReduce jobs (your code); Client Machine containing Hadoop Client and Your Code; You; arrows: Start, Use, Deploy, Find, Read/parse]
• jobs travel over the network and run on shards
• Name Server is a major bottleneck and SPOF
• the client machine is outside of the Hadoop space, which is why Hadoop installations are not easily opened to the public
01 "Apache Hadoop" http://hadoop.apache.org/ (2015)
14. .
Proposed: Big Data Replay
[Figure: proposed Big Data Replay deployment. Labels: many Storage Nodes (shards) with Time-Aware Sub-Store(s); Manager; Replay Node with Multicore Replay; Client Machine containing Client and Your Sketcher; You; arrows: Start, Use, Schedule]
• dumb storage, bulk transfer to the Replay Node for replay
• jobs are scheduled by clients, which is easy to expose as an API
• biggest feature: full access to a massively multicore processor
• ... many other features
15. .
Simple Big Data Replay
• note: traditional MapReduce jobs are not time-aware!
[Figure: simple replay. Labels: Replay Manager, Now (replay), Cursor, Time Direction, Time-Aligned Big Data, Read/prepare, Shared Memory, Core 1 … Core X, each core running One Sketch with its own Start and End]
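The replay idea on this slide, a single cursor moving forward over time-aligned data while each sketch reads only between its own start and end, can be illustrated with a minimal single-process sketch. The names `SketchJob` and `replay`, the half-open time window, and the running-sum stand-in for a real sketch are assumptions; the multicore and shared memory aspects are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SketchJob:
    name: str
    start: int          # replay time at which the job starts reading
    end: int            # replay time at which the job stops (exclusive)
    total: int = 0      # toy "sketch": a running sum of the values seen

    def update(self, value):
        self.total += value


def replay(records, jobs):
    """Move one cursor over time-ordered records, feeding every active job."""
    for timestamp, value in records:          # the cursor only moves forward
        for job in jobs:
            if job.start <= timestamp < job.end:
                job.update(value)
    return {job.name: job.total for job in jobs}


records = [(t, 1) for t in range(10)]         # one item per time unit
jobs = [SketchJob("early", 0, 4), SketchJob("late", 3, 10)]
results = replay(records, jobs)
```

The data is read once, in time order, no matter how many jobs are active; this is the sense in which the jobs are time-aware.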
16. .
Big Data Replay + Hetero. + Massive
[Figure: one replay batch. Labels: Time, Now (buffer head), Buffer tail, Job positions (pos), Manager, Controller (Kill, Report, Manage in realtime), One Replay Batch; Replay at a scale: several buffers, each with its own Jobs; callouts 1 and 2]
• matching jobs are packed in batches
• heterogeneity is managed by:
1. monitoring the buffer and
2. repacking on the fly
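The two steps above can be sketched in a few lines. Everything here is a hypothetical illustration: the slides do not define what makes jobs "matching", so the sketch assumes jobs match when their per-item cost is similar, and the helper names `pack`, `lagging`, and `repack` are invented for this example.

```python
def pack(costs, n_batches):
    """Group jobs with similar per-item cost: sort by cost, cut into slices."""
    order = sorted(costs, key=costs.get)
    size = -(-len(order) // n_batches)        # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


def lagging(positions, head, buffer_size):
    """Step 1, monitoring: jobs whose position has fallen behind the tail."""
    return [job for job, pos in positions.items() if head - pos > buffer_size]


def repack(batches, job, src, dst):
    """Step 2, repacking: move one job from batch src to batch dst."""
    batches[src].remove(job)
    batches[dst].append(job)


costs = {"j1": 1.0, "j2": 5.0, "j3": 1.2, "j4": 4.8}   # assumed per-item costs
batches = pack(costs, n_batches=2)

# Batch 0 is replaying with its buffer head at item 100 and a 10-item buffer.
late = lagging({"j1": 98, "j3": 80}, head=100, buffer_size=10)
for job in late:
    repack(batches, job, src=0, dst=1)
```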
17. .
Data Streaming as Big Data Jobs
• jobs based on data streaming 04 are much better: (1) statistically rigorous, (2) accountable, (3) richer/free datatype, (4) ...
• since data streaming targets are based on information theory 05, performance bounds can be estimated statistically
04 S.Muthukrishnan "Data Streams: Algorithms and Applications" Theoretical Computer Science (2005)
05 myself+0 "Methods and Algorithms for Fast Hashing in Data Streaming" Cryptography, CRC (2014)
10 M.Sung+4 "Scalable and Efficient Data Streaming Algorithms..." ICDE Workshop (2006)
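As one concrete case of a streaming job with a statistical bound, here is a minimal Count-Min sketch. It is a stand-in chosen for illustration, not necessarily the method used in this work. For a stream of N items, the estimate never undercounts and, with probability at least 1 - delta, overcounts by at most epsilon * N. The `blake2b`-based row hash is an assumption made for reproducibility and is not a fast hash.

```python
import hashlib
import math


class CountMin:
    def __init__(self, epsilon, delta):
        self.width = math.ceil(math.e / epsilon)       # counters per row
        self.depth = math.ceil(math.log(1 / delta))    # number of rows
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _cell(self, row, item):
        # One salted hash per row (assumption: stands in for a fast hash family).
        digest = hashlib.blake2b(item.encode(), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._cell(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.rows[row][self._cell(row, item)]
                   for row in range(self.depth))


cm = CountMin(epsilon=0.01, delta=0.01)
cm.add("x", 50)
for i in range(100):
    cm.add(f"k{i}")
estimate_x = cm.estimate("x")
```

The size of the structure depends only on `epsilon` and `delta`, so the accuracy of a job can be fixed before the replay starts.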
19. .
Analysis Setup
• 8 cores, each core runs one batch
• 500 concurrent jobs with random starting times; per-item overhead is defined by the hotspot distribution
• two models of batch management: drop and grow
20. .
Analysis: Drop and Grow Models
[Figure: one replay batch. Labels: Time, Now (buffer head), Buffer tail, Job positions (pos), Manager, Controller (Kill, Report, Manage in realtime), One Replay Batch; Replay at a scale: several buffers, each with its own Jobs; callouts 1 and 2]
• drop model: assume a fixed batch size; each lagging job is dropped
◦ ideally, repacked into another batch
• grow model: allow for lagging jobs by expanding the buffer
◦ ... expand = keep more and more of the DLL tail
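A toy simulation can make the difference between the two models concrete. This is a sketch under loud assumptions, not the analysis behind the slides: it models a single batch, the buffer head advances one item per tick, and a "hot" tick (probability `p_hot`) stalls a job while a cold tick lets it catch up by up to two items. The function name `simulate` and all parameter values are hypothetical.

```python
import random


def simulate(model, n_jobs=50, ticks=1000, buffer_size=5, p_hot=0.3, seed=1):
    """Return (dropped jobs, largest buffer needed) for one batch."""
    rng = random.Random(seed)
    pos = [0] * n_jobs                     # per-job read position
    alive = [True] * n_jobs
    dropped, max_buffer = 0, buffer_size
    for head in range(1, ticks + 1):       # head advances one item per tick
        for j in range(n_jobs):
            if not alive[j]:
                continue
            if rng.random() >= p_hot:      # cold tick: catch up by up to 2 items
                pos[j] = min(head, pos[j] + 2)
            lag = head - pos[j]
            if lag > max_buffer:
                if model == "drop":        # fixed buffer: the lagging job is dropped
                    alive[j] = False
                    dropped += 1
                else:                      # grow: keep more of the DLL tail
                    max_buffer = lag
    return dropped, max_buffer


drop_result = simulate("drop")
grow_result = simulate("grow")
```

In the drop model the buffer never exceeds its configured size and the cost shows up as dropped jobs; in the grow model no job is dropped and the cost shows up as a larger buffer.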
22. .
Analysis: Result Visualization
[Figure: scatter plot of results. Axes: Number of dropped jobs (0 to 90) and Average batch span (s) (2.8 to 30.8); series: Drop model, Grow model; point labels: 300/5, 350/5, 350/1, 250/1, 250/10, 450/1, 450/5, 300/1, 400/1, 300/10]
• grow model: needs batches 2 to 3 times larger to avoid drops
• drop model: between 5% and 10% of jobs are dropped, depending on the hotspot distribution
• note: jobs were not repacked this time, but repacking should help reduce the number of drops
23. .
That’s all, thank you ...
24. .
The Time-Aware Big Data Datatype
• time-aware Big Data sits in the mid-range between the two extremes: key-value stores and traditional Hadoop shards
[Figure: spectrum of datatypes. Labels: KV Store; TABID, Time-Aware Big Data (this demo); HDFS + Lucene Index; Hadoop (HDFS) and MapReduce]
25. .
DLL: The Double-Linked List
• 4-way DLL with sideways linking is often used when collisions are non-negligible
[Figure: items linked four ways. Each Item has prev and next links along the main list, and sideprev and sidenext links to sideways items]
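A minimal sketch of a 4-way DLL follows, assuming the sideways links chain together items whose keys collide in the same hash bucket (the slide does not spell this out). The class names `Item` and `FourWayDLL`, the bucket count, and the insert-at-head policy are hypothetical; duplicate keys and removal are not handled.

```python
class Item:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None          # main list order
        self.sideprev = self.sidenext = None  # collision chain within one bucket


class FourWayDLL:
    def __init__(self, n_buckets=4):
        self.buckets = [None] * n_buckets     # head of each side chain
        self.head = None                      # most recently inserted item

    def _slot(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        item = Item(key, value)
        # Link into the main list at the head.
        item.next = self.head
        if self.head:
            self.head.prev = item
        self.head = item
        # Link sideways at the front of the bucket's collision chain.
        slot = self._slot(key)
        item.sidenext = self.buckets[slot]
        if self.buckets[slot]:
            self.buckets[slot].sideprev = item
        self.buckets[slot] = item
        return item

    def find(self, key):
        item = self.buckets[self._slot(key)]
        while item and item.key != key:
            item = item.sidenext              # walk sideways on collision
        return item


table = FourWayDLL(n_buckets=4)
table.insert(1, "a")
table.insert(5, "b")    # 5 % 4 == 1 % 4, so keys 1 and 5 share a bucket
table.insert(2, "c")
```

Lookups walk only the sideways chain of one bucket, while the prev/next links keep a global order that can be used for stale-tail cleanup.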
26. .
Data Streaming + Bloom + Fast Hashing
• practical data streaming is a complex technology that depends on:
1. efficient Bloom filters
2. fast hashing
[Figure: dependency layers. Labels: Data Streaming (alongside Other Uses); Bloom Filter (alongside Other uses); Fast Hashing (alongside Other Types of Hashing)]
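To show how the two dependencies fit together, here is a minimal Bloom filter sketch. It uses the standard sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, and derives the k positions by double hashing. The use of `blake2b` is an assumption made for a self-contained example; it stands in for the fast hashing the slide refers to and is not itself a fast hash.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, n_items, fp_rate):
        # Standard sizing for n expected items and target false-positive rate p.
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: k positions from the two halves of one digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


seen = BloomFilter(n_items=1000, fp_rate=0.01)
for i in range(1000):
    seen.add(f"flow-{i}")
```

Items that were added are always reported as present; items that were not added are reported as present only with roughly the configured false-positive rate.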