The All-In-One Package for Massively Multicore, Heterogeneous Jobs with Hotspots, and Data Streaming

2015/08/05
Marat Zhanikeev
maratishe@gmail.com
SWOPP＠Beppu
PDF: http://bit.do/150805
maratishe.github.io

.
Why the All-In-One Package?
• we need a new Big Data processor
• HPC, ManyCore, etc. are often used incorrectly in the Big Data context
• ManyCore is expected to replace MultiCore 12 -- but it is not good for irregular jobs
◦ InfiniBand and other ManyCore devices expect highly regular jobs and data structures
◦ in this paper, Massively Multicore is different from ManyCore
• existing Big Data processors -- Hadoop/MapReduce 01 -- have serious shortcomings
◦ no support for multicore, and no way to exploit its advantages 03
◦ the bottleneck is at 60Mbps 02
◦ the key-value datatype is inefficient; this paper replaces it with data streaming
12 R.Brightwell+0 "Workshop on Managed Many-Core Systems" 1st Workshop on Managed Many-Core Systems (2008)
01 "Apache Hadoop" http://hadoop.apache.org/ (2015)
03 A.Rowstron+4 "Nobody ever got fired for using Hadoop on a cluster" 1st Hot Topics in Cloud Data Proc. (2012)
02 K.Shvachko "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)
M.Zhanikeev -- maratishe@gmail.com -- allinone: Massively Multicore, Heterogeneous Jobs, and Data Streaming -- http://bit.do/150805 2/26
...
2/26

.
The Packet Traffic Story
...
3/26

.
Traffic -and- BigData Similarities
• volume: 10G+ bits per second
• variety=heterogeneity: new capture engines require/use variable header depth -- DPI in some cases
• variety=heterogeneity (2): various concurrent processing jobs, with different targets and output datatypes
◦ example: M2M pattern detection, heavy hitters, superspreaders
...
4/26

.
Multicore Traffic Processor
[Figure: multicore traffic processor. Labels: Gateway, Mirroring, Meter, "To infrastructure proper", PF_RING (and other PF_RINGs), Shared Memory, CPU Cores (same ring, different cores), Probing Jobs A, B, and C with their lifespans along Time]
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
...
5/26

.
Lockfree Shared Memory
• PF_RING is a faster capture driver for raw packets 07
• key 1: a Lockfree Shared Memory design
• key 2: Double-Linked List (DLL) for sharing pointers across processes (zero copy) 13
• key 3: spreading the load via a stale check
• key 4: no locks, only light non-locking polling on both sides
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
...
6/26

.
The Lockfree Design
• locks and MPI both impose major overhead -- up to 70% of time
• lockfree 07: no locking; use the DLL to push stale items to the tail, and regularly pop the stale tail
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
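A minimal sketch of the DLL bookkeeping described above, assuming each item carries a timestamp: fresh items are pushed to the head, so stale items drift to the tail and can be popped in bulk. The names Node, StaleTailList, touch, and pop_stale are hypothetical, and this single-process Python code does not reproduce the shared memory or the lock-free cross-process polling of 07.

```python
class Node:
    """One shared item; 'stamp' is the last time it was refreshed (assumed field)."""
    def __init__(self, key, stamp):
        self.key = key
        self.stamp = stamp
        self.prev = None
        self.next = None


class StaleTailList:
    """Doubly linked list: fresh items at the head, stale items at the tail."""
    def __init__(self):
        self.head = None
        self.tail = None

    def push_head(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def unlink(self, node):
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def touch(self, node, now):
        """Refresh an item: update its stamp and move it back to the head."""
        node.stamp = now
        self.unlink(node)
        self.push_head(node)

    def pop_stale(self, now, max_age):
        """Pop items from the tail while they are older than max_age."""
        popped = []
        while self.tail is not None and now - self.tail.stamp > max_age:
            node = self.tail
            self.unlink(node)
            popped.append(node.key)
        return popped
```

Because only the tail is inspected, the periodic cleanup never scans the whole list; its cost is proportional to the number of stale items actually removed.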
...
7/26

.
Lockfree <> MPI Connection
...
8/26

.
Multicore for Big Data
...
9/26

.
Multicore of Big Data
• standard HPC: regular structures and jobs; network and storage bottlenecks are not considered
• bigdata: moves in the opposite direction, and needs to take care of all the bottlenecks first
[Figure: labels Network (NW), Bulk Storage (BS), Shared Memory (SM), Core, Output (Small Data); regions labeled "Big Data Processing" and "HPC, Simulators, Modeling"]
...
10/26

.
Smart Multicore for Big Data
• help (1): circuits for bulk network transfer 09
• help (2): only one process uses bulk storage for buffering and distribution
• contention/congestion on RAM cannot be easily avoided -- this overhead has to be minimized
[Figure: labels Bulk Storage (BS), Network (NW), RAM-based Shared Memory (sSM), Core, Output (Small Data); annotations: parallel accesses, ability to isolate]
09 myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, CRC (2015)
...
11/26

.
The Big Data Replay Method
...
12/26

.
Traditional Hadoop
[Figure: traditional Hadoop. Labels: Client Machine (You, Your Code, Hadoop Client), Name Server(s), Name Node, Manager, Hadoop Space with many Storage Nodes (shards) holding files A, B, C, ..., each running a Hadoop/MapReduce job (your code); actions: Start, Use, Deploy, Find, Read/parse]
• jobs travel over the network and run on shards
• the Name Server is a major bottleneck and a single point of failure (SPOF)
• the client machine is outside of the Hadoop space -- this is why Hadoop installations are not easily opened to the public
01 "Apache Hadoop" http://hadoop.apache.org/ (2015)
...
13/26

.
Proposed: Big Data Replay
[Figure: proposed Big Data Replay. Labels: Client Machine (You, Your Sketcher, Client), Manager, many Storage Nodes (shards) with Time-Aware Sub-Store(s), Replay Node with Multicore Replay; actions: Start, Use, Schedule]
• dumb storage, bulk transfer to the Replay Node for replay
• jobs are scheduled by clients -- easy to expose as an API
• biggest feature: full access to a massively multicore processor
• ... many other features
...
14/26

.
Simple Big Data Replay
• note: traditional MapReduce jobs are not time-aware!
[Figure: simple replay. Labels: Replay Manager, Time-Aligned Big Data, Cursor at Now (replay), Time Direction, Read/prepare, Shared Memory, Core 1 ... Core X, each running one sketch from Start to End]
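A minimal sketch of the replay idea on this slide, assuming records are (timestamp, item) pairs and each job is a sketch that only cares about its own Start-End window. SketchJob and replay are hypothetical names; the per-core parallelism and the shared memory are left out, so every job simply sees the same cursor in one loop.

```python
from dataclasses import dataclass


@dataclass
class SketchJob:
    """One sketch with its own time window (hypothetical structure)."""
    name: str
    start: float
    end: float
    count: int = 0

    def consume(self, stamp, item):
        # A real sketch would update a streaming summary here;
        # this placeholder only counts the items inside its window.
        if self.start <= stamp < self.end:
            self.count += 1


def replay(records, jobs):
    """Move one cursor forward in time and offer each record to every job."""
    for stamp, item in sorted(records, key=lambda r: r[0]):
        for job in jobs:
            job.consume(stamp, item)
    return {job.name: job.count for job in jobs}
```

For example, replaying four records stamped 0 to 3 against windows [0, 2) and [1, 4) gives the first job two items and the second job three, from a single pass over the data.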
...
15/26

.
Big Data Replay + Hetero. + Massive
[Figure: one replay batch. Labels: Manager, Controller, Jobs, One Buffer per batch, Now (buffer head), Buffer tail, job positions (pos) along Time; actions: (1) Replay at scale, (2) Report, Kill, Manage in realtime]
• matching jobs are packed in batches
• heterogeneity is managed by:
1. monitoring the buffer, and
2. repacking on the fly
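The two management steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's scheduler: "matching" is taken here to mean similar per-item cost, and a job counts as lagging when its position falls behind the buffer tail. The names pack_batches and lagging_jobs are hypothetical.

```python
def pack_batches(job_costs, n_batches):
    """Sort jobs by per-item cost and cut the order into contiguous groups.

    job_costs maps a job name to its (assumed known) per-item cost.
    """
    ordered = sorted(job_costs, key=job_costs.get)
    if not ordered:
        return []
    size = -(-len(ordered) // n_batches)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def lagging_jobs(positions, head, buffer_size):
    """Return the jobs whose position has fallen behind the buffer tail.

    These are the candidates for repacking into another batch.
    """
    tail = head - buffer_size
    return sorted(name for name, pos in positions.items() if pos < tail)
```

A controller could call lagging_jobs on every report and feed the result back into pack_batches, which is one possible reading of "repacking on the fly".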
...
16/26

.
Data Streaming as Big Data Jobs
• jobs based on data streaming 04 are much better: (1) statistically rigorous, (2) accountable, (3) richer/free datatype, (4) ...
• since data streaming targets are based on information theory 05, performance bounds can be estimated statistically
04 S.Muthukrishnan "Data Streams: Algorithms and Applications" Theoretical Computer Science (2005)
05 myself+0 "Methods and Algorithms for Fast Hashing in Data Streaming" Cryptography, CRC (2014)
10 M.Sung+4 "Scalable and Efficient Data Streaming Algorithms..." ICDE Workshop (2006)
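As an illustration of a data streaming job with a statistical bound, here is a minimal Count-Min sketch, a well-known streaming summary that fits the heavy-hitters example mentioned earlier. It is a swapped-in example, not necessarily the sketch used in this work; the class name, the width and depth defaults, and the use of BLAKE2 as the hash are assumptions.

```python
import hashlib


class CountMin:
    """Count-Min sketch: fixed-size table of counters, one hash per row."""
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # A different salt per row gives independent-looking hash functions.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and never undercounts.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )
```

The estimate never falls below the true count, and with width w and depth d the overcount stays within (e/w) times the total stream count with probability at least 1 - e^(-d), which is the kind of bound the second bullet refers to.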
...
17/26

.
Analysis
...
18/26

.
Analysis Setup
• 8 cores, each core is one batch
• 500 concurrent jobs with random starting times; per-item overhead is defined by the hotspot distribution
• two models of batch management: drop and grow
...
19/26

.
Analysis: Drop and Grow Models
[Figure: one replay batch with Manager, Controller, jobs, and a buffer between Now (buffer head) and the buffer tail]
• drop model: assume a fixed batch size; each lagging job is dropped
◦ ideally, repacked into another batch
• grow model: allow for lagging jobs by expanding the buffer
◦ ... expand = keep more and more of the DLL tail
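A toy simulation of the two models, under assumptions that are not from the slides: time advances one item per step, each job has a fixed per-item cost expressed in item inter-arrival times (1.0 means the job keeps up), and lag is the distance between the buffer head and the job position. The function name and parameters are hypothetical, and the numbers it produces are not the results reported in this analysis.

```python
def simulate_batch(costs, n_items, buffer_size, model):
    """Return (dropped_jobs, buffer_size_needed) for one batch.

    model is "drop" (fixed buffer, lagging jobs are dropped) or
    "grow" (buffer expands to cover the worst lag).
    """
    pos = [0.0] * len(costs)
    alive = [True] * len(costs)
    dropped = 0
    needed = buffer_size
    for head in range(1, n_items + 1):
        for j, cost in enumerate(costs):
            if not alive[j]:
                continue
            # A job advances 1/cost items per step but cannot pass the head.
            pos[j] = min(head, pos[j] + 1.0 / cost)
            lag = head - pos[j]
            if lag > needed:
                if model == "drop":
                    alive[j] = False
                    dropped += 1
                else:
                    needed = lag
    return dropped, needed
```

With two jobs that keep up and one job that is twice as slow, the drop model loses the slow job once its lag exceeds the buffer, while the grow model keeps it at the price of a buffer that expands with the length of the replay.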
...
20/26

.
Analysis: Hotspots
[Figure: Pop/Hot/Flash hotspot distributions (increasing line thickness). X axis: ordered list, 0 to 350; Y axis: CPU load, overhead, etc., 0 to 0.5. Legend: an(350), am(5), av(2)]
...
21/26

.
Analysis: Result Visualization
[Figure: average batch span (s), 2.8 to 30.8, versus number of dropped jobs, 0 to 90, for the drop model and the grow model; points labeled 250/1, 250/10, 300/1, 300/5, 300/10, 350/1, 350/5, 400/1, 450/1, 450/5]
• grow model: needs batches 2 to 3 times larger to avoid drops
• drop model: between 5% and 10% of drops, depending on the hotspot distribution
• note: jobs were not repacked this time, but repacking should help reduce the number of drops
...
22/26

.
That’s all, thank you ...
...
23/26

.
The Time-Aware Big Data Datatype
• time-aware bigdata is in the mid-range between the two extremes -- key-value and traditional Hadoop shards
[Figure: labels KV Store; Hadoop (HDFS) and MapReduce; TABID, Time-Aware Big Data (this demo); HDFS + Lucene Index]
...
24/26

.
DLL: The Double-Linked List
• 4-way DLL with sideways linking is often used when collisions are non-negligible
[Figure: items linked to each other by prev/next pointers and, sideways, by sideprev/sidenext pointers]
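One possible reading of the 4-way DLL, sketched under assumptions: prev/next form the main chain (here, insertion order with the newest item at the head), while sideprev/sidenext chain together the items that collide in the same hash slot. The class names, the integer keys, and the modulo hash are hypothetical choices made to keep the example short.

```python
class Item:
    """Node with four links: two on the main chain, two on the side chain."""
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None
        self.sideprev = None
        self.sidenext = None


class SideLinkedTable:
    def __init__(self, n_slots):
        self.n_slots = n_slots
        self.slots = [None] * n_slots  # first item of each side chain
        self.head = None               # newest item on the main chain

    def insert(self, key):
        item = Item(key)
        # Main chain: push to the head.
        item.next = self.head
        if self.head is not None:
            self.head.prev = item
        self.head = item
        # Side chain: link in front of whatever already occupies the slot.
        slot = key % self.n_slots
        first = self.slots[slot]
        item.sidenext = first
        if first is not None:
            first.sideprev = item
        self.slots[slot] = item
        return item

    def find(self, key):
        """Walk only the side chain of the key's slot."""
        item = self.slots[key % self.n_slots]
        while item is not None and item.key != key:
            item = item.sidenext
        return item

    def main_chain(self):
        item = self.head
        while item is not None:
            yield item.key
            item = item.next
```

A lookup touches only the colliding items in one slot, while the main chain still gives a single ordered view of all items, for example for popping the tail.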
...
25/26

.
Data Streaming + Bloom + Fast Hashing
• practical data streaming is a complex technology that depends on:
1. efficient Bloom filters
2. fast hashing
[Figure: labels Data Streaming, Bloom Filter (and its other uses), Fast Hashing (and other types of hashing)]
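To make the dependency on Bloom filters and hashing concrete, here is a minimal Bloom filter sketch. The sizes, the number of hash functions, and the use of salted BLAKE2 digests are assumptions chosen for readability; a production data streaming engine would use a much faster hash, which is the point of the second item in the list above.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key):
        # One salted digest per hash function.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key)
        )
```

Usage is the same as for a set: after bf.add("flow-1"), the test "flow-1" in bf is always true, while an unseen key is reported as present only with a small probability that depends on n_bits, n_hashes, and the number of inserted keys.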
...
26/26
