The All-In-One Package for Massively Multicore, Heterogeneous Jobs with Hotspots, and Data Streaming
2015/08/05
Marat Zhanikeev
maratishe@gmail.com
SWOPP@Beppu
PDF: http://bit.do/150805
maratishe.github.io
.
Why the All-In-One Package?
• we need a new Big Data processor
• HPC, ManyCore, etc. are often used incorrectly in the Big Data context
• ManyCore is expected to replace MultiCore 12 -- but it is not good for irregular jobs
◦ InfiniBand and other ManyCore devices expect highly regular jobs and data structures
◦ in this paper, Massively Multicore is different from ManyCore
• existing Big Data processors -- Hadoop/MapReduce 01 -- fall short
◦ no support for multicore and no use of its advantages 03
◦ the bottleneck is at 60Mbps 02
◦ the key-value datatype is inefficient; this paper replaces it with data streaming
12 R.Brightwell+0 "Workshop on Managed Many-Core Systems" 1st Workshop on Managed Many-Core Systems (2008)
01 "Apache Hadoop" http://hadoop.apache.org/ (2015)
03 A.Rowstron+4 "Nobody ever got fired for using Hadoop on a cluster" 1st Hot Topics in Cloud Data Proc. (2012)
02 K.Shvachko "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)
M.Zhanikeev -- maratishe@gmail.com -- allinone: Massively Multicore, Heterogeneous Jobs, and Data Streaming -- http://bit.do/150805 2/26
.
The Packet Traffic Story
.
Traffic -and- BigData Similarities
• volume: 10+ Gbit/s
• variety=heterogeneity: new capture engines require/use variable header depth -- DPI in some cases
• variety=heterogeneity (2): various concurrent processing jobs with different targets and output datatypes
◦ example: M2M pattern detection, heavy hitters, superspreaders
.
Multicore Traffic Processor
[Figure: a Gateway mirrors traffic to the Meter on its way to the infrastructure proper; in the Meter, PF_RING (and other PF_RINGs) feed Probing Jobs A, B, and C, which run on CPU cores over Shared Memory (same ring, different cores); axes: CPU cores vs. time, with job lifespan marked]
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
.
Lockfree Shared Memory
• PF_RING is a faster capture driver for raw packets 07
• key 1: a lockfree shared memory design
• key 2: Double-Linked List (DLL) for sharing pointers across processes (zero copy) 13
• key 3: spreading the load via stale check
• key 4: no locks, only light non-locking polling on both sides
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
.
The Lockfree Design
• locks or MPI both impose major overhead -- up to 70% of time
• lockfree 07: no locking; use the DLL to push stale items to the tail and regularly pop the stale tail
07 myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
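The stale-tail discipline above can be sketched in a few lines of Python. This is only a single-process illustration of the list handling, not the shared-memory design from 07: the names StaleTailList, touch, and pop_stale are hypothetical, and the cross-process, lock-free part is not attempted here.

```python
from typing import Dict, List, Optional


class Node:
    """One list item; prev/next are the two DLL pointers."""
    __slots__ = ("key", "prev", "next")

    def __init__(self, key: str) -> None:
        self.key = key
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class StaleTailList:
    """Fresh items sit at the head, so stale items drift to the tail."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None
        self.index: Dict[str, Node] = {}

    def _unlink(self, node: Node) -> None:
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def _push_head(self, node: Node) -> None:
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def touch(self, key: str) -> None:
        """Insert a new item or refresh an existing one (move it to the head)."""
        node = self.index.get(key)
        if node is None:
            node = Node(key)
            self.index[key] = node
        else:
            self._unlink(node)
        self._push_head(node)

    def pop_stale(self, keep: int) -> List[str]:
        """Pop items from the tail until at most `keep` items remain."""
        popped: List[str] = []
        while len(self.index) > keep and self.tail is not None:
            node = self.tail
            self._unlink(node)
            del self.index[node.key]
            popped.append(node.key)
        return popped

    def keys(self) -> List[str]:
        """Keys from head (freshest) to tail (stalest)."""
        out, node = [], self.head
        while node is not None:
            out.append(node.key)
            node = node.next
        return out
```

Touching "a", "b", "c" and then "a" again leaves "b" at the tail, so a periodic pop_stale(keep=2) removes only "b".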
.
Lockfree <> MPI Connection
.
Multicore for Big Data
.
Multicore of Big Data
• standard HPC: regular structures and jobs; network and storage bottlenecks are not considered
• bigdata: moving in the opposite direction, needs to take care of all the bottlenecks first
[Figure: components Network (NW), Bulk Storage (BS), Shared Memory (SM), Core, and Output; regions labeled "Big Data Processing", "HPC, Simulators, Modeling", and "Small Data"]
.
Smart Multicore for Big Data
• help (1): circuits for bulk network transfer 09
• help (2): only one process uses bulk storage for buffering and distribution
• contention/congestion on RAM cannot be easily avoided -- this overhead has to be minimized
[Figure: components Bulk Storage (BS), Network (NW), RAM-based Shared Memory (sSM), Core, and Output, with "Small Data" marked; annotations: "Parallel accesses" and "Ability to isolate"]
09 myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, CRC (2015)
.
The Big Data Replay Method
.
Traditional Hadoop
[Figure: the Hadoop Space contains the Name Node / Name Server(s), a Manager, many Storage Nodes (shards) holding file A, file B, file C, ..., and many Hadoop/MapReduce jobs (your code); outside it sits the Client Machine with the Hadoop Client and your code; arrows labeled Start, Use, Deploy, Find, and Read/parse]
• jobs travel over the network and run on shards
• Name Server is a major bottleneck and SPOF
• the client machine is outside of the Hadoop space -- this is why Hadoop installations are not easily opened to the public
01 "Apache Hadoop" http://hadoop.apache.org/ (2015)
.
Proposed: Big Data Replay
[Figure: many Storage Nodes (shards) with Time-Aware Sub-Store(s) and a Manager; a Replay Node running Multicore Replay; a Client Machine with the Client and your Sketcher; arrows labeled Start, Use, and Schedule]
• dumb storage, bulk transfer to the Replay Node for replay
• jobs are scheduled by clients -- easy to expose as an API
• biggest feature: full access to a massively multicore processor
• ... many other features
.
Simple Big Data Replay
• note: traditional MapReduce jobs are not time-aware!
[Figure: the Replay Manager reads/prepares Time-Aligned Big Data into Shared Memory and moves a cursor, Now (replay), in the time direction; each core (Core 1 ... Core X) runs one sketch with its own Start and End]
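A minimal sketch of the replay loop, assuming time-stamped records and jobs (sketches) that each declare their own start and end. SketchJob, replay, and count_items are names made up for this illustration; the loop runs on a single core, so the per-core distribution and the shared memory from the slide are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[float, str]  # (timestamp, payload); an assumed record shape


@dataclass
class SketchJob:
    """One time-aware job: it only sees items inside [start, end)."""
    name: str
    start: float
    end: float
    update: Callable[[dict, str], None]
    state: dict = field(default_factory=dict)


def count_items(state: dict, payload: str) -> None:
    """Example per-item update: count occurrences of each payload."""
    state[payload] = state.get(payload, 0) + 1


def replay(records: Iterable[Record], jobs: List[SketchJob]) -> Dict[str, dict]:
    """Move the cursor along the time axis and feed every active job."""
    for ts, payload in sorted(records, key=lambda r: r[0]):
        for job in jobs:
            if job.start <= ts < job.end:
                job.update(job.state, payload)
    return {job.name: job.state for job in jobs}


records = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (3.0, "c"), (4.0, "a")]
jobs = [
    SketchJob("early", 0.0, 3.0, count_items),
    SketchJob("late", 2.0, 5.0, count_items),
]
result = replay(records, jobs)
```

Both jobs are served by one pass over the data, which is the point of replay: the data is read once and every job whose window covers the cursor gets the item.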
.
Big Data Replay + Hetero. +Massive
[Figure: replay at scale with many replay batches, each being one buffer plus its jobs; along the time axis each job holds a position (pos) between Now (buffer head) and the buffer tail; labels: Manager, Controller, Manage in realtime, Report, Kill]
• matching jobs are packed in batches
• heterogeneity is managed by:
1. monitoring the buffer and
2. repacking on the fly
.
Data Streaming as Big Data Jobs
• jobs based on data streaming 04 are much better: (1) statistically rigorous, (2) accountable, (3) richer/free datatype, (4) ...
• since data streaming targets are based on information theory 05, performance bounds can be estimated statistically
04 S.Muthukrishnan "Data Streams: Algorithms and Applications" Theoretical Computer Science (2005)
05 myself+0 "Methods and Algorithms for Fast Hashing in Data Streaming" Cryptography, CRC (2014)
10 M.Sung+4 "Scalable and Efficient Data Streaming Algorithms..." ICDE Workshop (2006)
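To make "data streaming job" concrete, here is a small Count-Min sketch, a well-known streaming structure for frequency estimates such as heavy hitters. The slides do not name a specific algorithm, so Count-Min is swapped in purely as a representative; the width, depth, and BLAKE2b hashing below are assumptions for illustration and are not the fast hashing methods of 05.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _slot(self, row: int, item: str) -> int:
        # One independent hash per row, obtained by salting with the row index.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._slot(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(
            self.rows[row][self._slot(row, item)] for row in range(self.depth)
        )


stream = ["10.0.0.1"] * 50 + ["10.0.0.2"] * 5 + ["10.0.0.3"] * 2
cms = CountMinSketch()
for source in stream:
    cms.add(source)
```

The memory footprint stays at width x depth counters regardless of how many distinct items pass through, and the overestimate is bounded in terms of the stream length and the table width, which is what makes the performance of such jobs predictable.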
.
Analysis
.
Analysis Setup
• 8 cores, each core runs one batch
• 500 concurrent jobs with random starting times; per-item overhead is defined by the hotspot distribution
• two models of batch management: drop and grow
.
Analysis: Drop and Grow Models
[Figure: same diagram as on the "Big Data Replay + Hetero. +Massive" slide: replay batches, each being one buffer plus its jobs, with job positions (pos) between Now (buffer head) and the buffer tail; labels: Manager, Controller, Manage in realtime, Report, Kill]
• drop model: assume a fixed batch size; each lagging job is dropped
◦ ideally, repacked into another batch
• grow model: allow for lagging jobs by expanding the buffer
◦ ... expand = keep more and more of the DLL tail
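The two models can be compared with a toy simulation of a single batch. This is a sketch under stated assumptions, not the simulator used for the analysis: run_batch is a hypothetical name, the buffer head advances one item per tick, each job consumes a fixed number of items per tick, and lag is the distance between the head and the job's position.

```python
from typing import List, Tuple


def run_batch(
    speeds: List[float], ticks: int, buffer_size: float, model: str
) -> Tuple[List[int], float]:
    """Replay `ticks` items through one batch.

    Returns (ids of dropped jobs, final buffer size).
    model="drop": a job lagging by more than the buffer size is dropped.
    model="grow": the buffer expands to cover the largest lag instead.
    """
    pos = {job: 0.0 for job in range(len(speeds))}
    dropped: List[int] = []
    size = buffer_size
    for head in range(1, ticks + 1):
        for job in list(pos):
            # A job cannot run ahead of the buffer head.
            pos[job] = min(float(head), pos[job] + speeds[job])
            lag = head - pos[job]
            if lag > size:
                if model == "drop":
                    dropped.append(job)
                    del pos[job]
                else:
                    size = lag  # keep more of the tail
    return dropped, size


# Two jobs keep up with the head; the third runs at half speed (assumed numbers).
speeds = [1.0, 1.0, 0.5]
drop_result = run_batch(speeds, ticks=20, buffer_size=4, model="drop")
grow_result = run_batch(speeds, ticks=20, buffer_size=4, model="grow")
```

With these numbers the drop model loses the slow job once its lag exceeds 4 items, while the grow model keeps it at the cost of a buffer that ends at 10 items, which mirrors the trade-off between dropped jobs and batch span.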
.
Analysis: Hotspots
[Figure: Pop/Hot/Flash hotspot distributions (increasing line thickness); x-axis: ordered list, 0 to 350; y-axis: CPU load, overhead, etc., 0 to 0.5; legend: an(350), am(5), av(2)]
.
Analysis: Result Visualization
[Figure: drop model vs. grow model; x-axis: number of dropped jobs, 0 to 90; y-axis: average batch span (s), 2.8 to 30.8; point labels: 300/5, 350/5, 350/1, 250/1, 250/10, 450/1, 450/5, 300/1, 400/1, 300/10]
• grow model: takes 2 to 3 times larger batches to avoid drops
• drop model: between 5% and 10% of drops, depending on the hotspot distribution
• note: jobs were not repacked this time, but repacking will help reduce the number of drops
.
That’s all, thank you ...
.
The Time-Aware Big Data Datatype
• time-aware bigdata is in mid-range between the two extremes -- key-value and traditional Hadoop shards
[Figure: datatypes placed side by side: KV Store; Hadoop (HDFS) and MapReduce; TABID, Time-Aware Big Data (this demo); HDFS + Lucene Index]
.
DLL: The Double-Linked List
• 4-way DLL with sideways linking is often used when collisions are non-negligible
[Figure: Items linked by prev/next pointers along the main list and by sideprev/sidenext pointers sideways]
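A 4-way item can be sketched as a node with two pairs of pointers. The pointer names prev, next, sideprev, and sidenext come from the slide; everything else is an assumption for illustration, in particular the idea that colliding items hang sideways off the item that occupies the main list, and the helper names link_after, link_side, and walk.

```python
from typing import List, Optional


class Item:
    """List item with main-list links and sideways links."""
    __slots__ = ("key", "prev", "next", "sideprev", "sidenext")

    def __init__(self, key: str) -> None:
        self.key = key
        self.prev: Optional["Item"] = None
        self.next: Optional["Item"] = None
        self.sideprev: Optional["Item"] = None
        self.sidenext: Optional["Item"] = None


def link_after(anchor: Item, new: Item) -> None:
    """Insert `new` right after `anchor` in the main list."""
    new.prev, new.next = anchor, anchor.next
    if anchor.next is not None:
        anchor.next.prev = new
    anchor.next = new


def link_side(anchor: Item, new: Item) -> None:
    """Insert `new` right after `anchor` in the sideways chain."""
    new.sideprev, new.sidenext = anchor, anchor.sidenext
    if anchor.sidenext is not None:
        anchor.sidenext.sideprev = new
    anchor.sidenext = new


def walk(head: Item) -> List[List[str]]:
    """Return one list per main-list item: its key, then its sideways chain."""
    out: List[List[str]] = []
    node: Optional[Item] = head
    while node is not None:
        chain = [node.key]
        side = node.sidenext
        while side is not None:
            chain.append(side.key)
            side = side.sidenext
        out.append(chain)
        node = node.next
    return out


a, b, c, d = Item("a"), Item("b"), Item("c"), Item("d")
link_after(a, b)   # main list: a -> b
link_side(a, c)    # c collides with a
link_side(c, d)    # d collides as well: a => c => d
layout = walk(a)
```

Keeping collisions on a separate pair of pointers means the main list, and with it the stale-tail ordering, is not disturbed when a collision is added or removed.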
.
Data Streaming + Bloom + Fast Hashing
• practical data streaming is a complex technology that depends on:
1. efficient Bloom filters
2. fast hashing
[Figure: diagram relating Data Streaming, Bloom Filter, and Fast Hashing, with "Other Uses" and "Other Types of Hashing" marked for each outside their common part]
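As an illustration of how the two ingredients fit together, here is a minimal Bloom filter whose bit positions all come from a single hash computation (double hashing). The parameters and the use of SHA-256 are assumptions made for a self-contained example; SHA-256 is chosen for availability, not speed, and the author's fast hashing methods are not reproduced here.

```python
import hashlib
from typing import List


class BloomFilter:
    """Set membership with no false negatives and tunable false positives."""

    def __init__(self, bits: int = 1024, hashes: int = 3) -> None:
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive all positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.array[p // 8] & (1 << (p % 8)) for p in self._positions(item)
        )


seen = BloomFilter()
seen.add("flow-1")
seen.add("flow-2")
```

A streaming job can use such a filter to answer "has this item been seen before?" in constant memory; an item that was added is always reported as present, while an item that was never added may occasionally be reported as present too.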

Ultrasound Relative Positioning for IoT Devices in Dense Wireless SpacesUltrasound Relative Positioning for IoT Devices in Dense Wireless Spaces
Ultrasound Relative Positioning for IoT Devices in Dense Wireless Spaces
 
Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...
Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...
Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Tra...
 
What if We Atomize Student Data and Apps and Put Them on Docker Containers?
What if We Atomize Student Data and Apps and Put Them on Docker Containers?What if We Atomize Student Data and Apps and Put Them on Docker Containers?
What if We Atomize Student Data and Apps and Put Them on Docker Containers?
 
Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...
Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...
Large-Scale Crowdsourcing by Vehicular Data Packets in a Sparse Roadside Infr...
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
Taking the Step from Software to Product Development \\ when teaching PBL at ...
Taking the Step from Software to Product Development \\ when teaching PBL at ...Taking the Step from Software to Product Development \\ when teaching PBL at ...
Taking the Step from Software to Product Development \\ when teaching PBL at ...
 
Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...
Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...
Design and Implementation of a 3-Party Cloud-Backed Handshake for Secure Grou...
 
The Switchboard Optimization Problem and Heuristics for Cut-Through Networking
The Switchboard Optimization Problem and Heuristics for Cut-Through NetworkingThe Switchboard Optimization Problem and Heuristics for Cut-Through Networking
The Switchboard Optimization Problem and Heuristics for Cut-Through Networking
 
The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...
The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...
The Switchboard Traffic Engineering Problem for Mixed Contention/Cut-Through ...
 
Bulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless Spaces
Bulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless SpacesBulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless Spaces
Bulk-n-Pick Method for One-to-Many Data Transfer in Dense Wireless Spaces
 
Fog Cloud Caching at Network Edge via Local Hardware Awareness Spaces
Fog Cloud Caching at Network Edge via Local Hardware Awareness SpacesFog Cloud Caching at Network Edge via Local Hardware Awareness Spaces
Fog Cloud Caching at Network Edge via Local Hardware Awareness Spaces
 
On a Hybrid Packets-and-Circuits Switching Logic
On a Hybrid Packets-and-Circuits Switching LogicOn a Hybrid Packets-and-Circuits Switching Logic
On a Hybrid Packets-and-Circuits Switching Logic
 
Image-Related Uses for Roadside Infrastructure \\ based on Wireless Beacons
Image-Related Uses for Roadside Infrastructure \\ based on Wireless BeaconsImage-Related Uses for Roadside Infrastructure \\ based on Wireless Beacons
Image-Related Uses for Roadside Infrastructure \\ based on Wireless Beacons
 
Complexity Resolution Control for Context Based on Metromaps
Complexity Resolution Control for Context Based on MetromapsComplexity Resolution Control for Context Based on Metromaps
Complexity Resolution Control for Context Based on Metromaps
 
The Declarative-Coordinated Model for Self-Optimization of Service Networks
The Declarative-Coordinated Model for Self-Optimization of Service NetworksThe Declarative-Coordinated Model for Self-Optimization of Service Networks
The Declarative-Coordinated Model for Self-Optimization of Service Networks
 
3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds
3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds
3-Way Scripts as a Practical Platform for Secure Distributed Code in Clouds
 
3-Way Scripts as a Base Unit for Flexible Scale-Out Code
3-Way Scripts as a Base Unit for Flexible Scale-Out Code3-Way Scripts as a Base Unit for Flexible Scale-Out Code
3-Way Scripts as a Base Unit for Flexible Scale-Out Code
 
Towards Social Robotics on Smartphones with Simple XYZV Sensor Feedback
Towards Social Robotics on Smartphones with Simple XYZV Sensor FeedbackTowards Social Robotics on Smartphones with Simple XYZV Sensor Feedback
Towards Social Robotics on Smartphones with Simple XYZV Sensor Feedback
 
Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...
Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...
Back to Rings but not Tokens: Physical and Logical Designs for Distributed Fi...
 

Dernier

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Dernier (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

The All-In-One Package for Massively Multicore, Heterogeneous Jobs with Hotspots, and Data Streaming

  • 1. The All-In-One Package for Massively Multicore, Heterogeneous Jobs with Hotspots, and Data Streaming
    2015/08/05
    Marat Zhanikeev, maratishe@gmail.com
    SWOPP@Beppu
    PDF: http://bit.do/150805
    maratishe.github.io
  • 2. Why the All-In-One Package?
    • we need a new Big Data processor
    • HPC, ManyCore, etc. are often incorrectly used in the Big Data context
    • ManyCore is expected to replace MultiCore [12], but it is not good for irregular jobs
      ◦ InfiniBand and other ManyCore devices expect highly regular jobs and data structures
      ◦ in this paper, Massively Multicore is different from ManyCore
    • existing Big Data processors such as Hadoop/MapReduce [01] are bad
      ◦ they neither support multicore nor take advantage of it [03]
      ◦ bottleneck is at 60Mbps [02]
      ◦ the key-value datatype is inefficient; this paper replaces it with data streaming
    [12] R.Brightwell+0 "Workshop on Managed Many-Core Systems" 1st Workshop on Managed Many-Core Systems (2008)
    [01] "Apache Hadoop" http://hadoop.apache.org/ (2015)
    [03] A.Rowstron+4 "Nobody ever got fired for using Hadoop on a cluster" 1st Hot Topics in Cloud Data Proc. (2012)
    [02] K.Shvachko "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)
  • 3. The Packet Traffic Story
  • 4. Traffic -and- BigData Similarities
    • volume: 10G+ bits per second
    • variety=heterogeneity: new capture engines require/use variable header depth -- DPI in some cases
    • variety=heterogeneity (2): various concurrent processing jobs, different targets and output datatypes
      ◦ example: M2M pattern detection, heavy hitters, superspreaders
  • 5. Multicore Traffic Processor
    [Figure labels: Gateway, Mirroring, Meter, To infrastructure proper; PF_RING, other PF_RINGs; CPU Cores, Probing Job A, Probing Job B, Probing Job C, more CPU cores (same ring, different cores); Shared Memory; Time, Lifespan]
    [07] myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
  • 6. Lockfree Shared Memory
    • PF_RING is a faster capture driver for raw packets [07]
    • key 1: a Lockfree Shared Memory design
    • key 2: Double-Linked List (DLL) for sharing pointers across processes (zero copy) [13]
    • key 3: spreading the load via stale check
    • key 4: no locks, but light non-locking polling on both sides
    [07] myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
  • 7. The Lockfree Design
    • locks or MPI, both impose major overhead -- up to 70% of time
    • lockfree [07]: no locking, use DLL to push stale items to the tail -- regularly pop the stale tail
    [07] myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
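The stale-tail idea on this slide can be illustrated with a small single-process sketch. It shows only the list discipline (fresh items move to the head, stale items drift to the tail and are popped regularly); it does not implement lock-free sharing across processes. The class names, the `max_age` parameter, and the explicit `now` argument are assumptions made for the illustration and do not come from the slides or from [07].

```python
class Node:
    """One shared item; prev/next are the two links of the DLL."""
    __slots__ = ("key", "value", "touched", "prev", "next")

    def __init__(self, key, value, now):
        self.key, self.value, self.touched = key, value, now
        self.prev = self.next = None


class StaleTailList:
    """Doubly linked list kept in most-recently-touched order (head = freshest)."""

    def __init__(self, max_age):
        self.max_age = max_age      # hypothetical staleness threshold
        self.head = self.tail = None
        self.index = {}             # key -> Node, so a touch is O(1)

    def _unlink(self, node):
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def _push_head(self, node):
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def touch(self, key, value, now):
        """Insert or refresh an item; untouched items drift toward the tail."""
        node = self.index.get(key)
        if node is None:
            node = self.index[key] = Node(key, value, now)
        else:
            node.value, node.touched = value, now
            self._unlink(node)
        self._push_head(node)

    def pop_stale(self, now):
        """Pop items from the tail while they are older than max_age."""
        popped = []
        while self.tail is not None and now - self.tail.touched > self.max_age:
            node = self.tail
            self._unlink(node)
            del self.index[node.key]
            popped.append(node.key)
        return popped


flows = StaleTailList(max_age=5)
flows.touch("a", 1, now=0)
flows.touch("b", 2, now=1)
flows.touch("a", 3, now=4)          # "a" is refreshed, "b" becomes the tail
stale = flows.pop_stale(now=7)      # only "b" is older than max_age
```

Because the list is always ordered by last touch, the stale check only ever inspects the tail, which is what keeps the periodic cleanup cheap.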
  • 8. Lockfree <> MPI Connection
  • 9. Multicore for Big Data
  • 10. Multicore of Big Data
    • standard HPC: regular structures and jobs; network and storage bottlenecks are not considered
    • bigdata: moves in the opposite direction and needs to take care of all the bottlenecks first
    [Figure labels: Network (NW), Bulk Storage (BS), Shared Memory (SM), Core, Output; Big Data Processing; HPC, Simulators, Modeling; Small Data]
  • 11. Smart Multicore for Big Data
    • help (1): circuits for bulk network transfer [09]
    • help (2): only one process uses bulk storage for buffering and distribution
    • contention/congestion on RAM cannot be easily avoided -- this overhead has to be minimized
    [Figure labels: Bulk Storage (BS), Network (NW), RAM-based Shared Memory (sSM), Parallel accesses, Ability to isolate, Core, Output, Small Data]
    [09] myself+0 "Circuit Emulation for Big Data Transfers in Clouds" Networking for Big Data, CRC (2015)
  • 12. The Big Data Replay Method
  • 13. Traditional Hadoop
    [Figure labels: Name Node, Name Server(s); Storage Node (shard) with file A, file B, file C, ...; Hadoop Space, Manager, Hadoop Job (your code), MapReduce job (your code), many; Client Machine, Hadoop Client, Your Code, You; actions: Start, Use, Deploy, Find, Read/parse]
    • jobs travel over the network and run on shards
    • Name Server is a major bottleneck and a single point of failure (SPOF)
    • the client machine is outside of the Hadoop space -- this is why Hadoop installations are not easily opened to the public
    [01] "Apache Hadoop" http://hadoop.apache.org/ (2015)
  • 14. Proposed: Big Data Replay
    [Figure labels: Storage Node (shard), Time-Aware Sub-Store(s), Manager, many; Client Machine, Client, Your Sketcher, You; actions: Start, Use, Schedule; Multicore Replay, Replay Node]
    • dumb storage, bulk transfer to the Replay Node for replay
    • jobs are scheduled by clients -- easy to expose through an API
    • biggest feature: full access to a massively multicore processor
    • ... many other features
  • 15. Simple Big Data Replay
    • note: traditional MapReduce jobs are not time-aware!
    [Figure labels: Replay Manager, Now (replay), Time-Aligned Big Data, Cursor, Time Direction, Read/prepare, Shared Memory; Core 1 ... Core X; One Sketch (Start, End)]
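A minimal sketch of the replay idea on this slide, under simplifying assumptions: the data is a list of `(timestamp, item)` records already sorted by time, the cursor is a plain loop, and each sketch only counts the items that fall inside its own start/end window. The `Sketch` class and the `replay` function are hypothetical names for illustration; the sketch runs in a single process and does not model cores or shared memory.

```python
from dataclasses import dataclass


@dataclass
class Sketch:
    """One time-aware job: it only sees items inside [start, end)."""
    name: str
    start: float
    end: float
    count: int = 0

    def update(self, item):
        # A real sketch would update a streaming summary; here we just count.
        self.count += 1


def replay(records, sketches):
    """Move a cursor over time-ordered records and feed every active sketch."""
    for timestamp, item in records:          # the cursor follows the time direction
        for sketch in sketches:
            if sketch.start <= timestamp < sketch.end:
                sketch.update(item)
    return {sketch.name: sketch.count for sketch in sketches}


records = [(0, "p0"), (1, "p1"), (2, "p2"), (3, "p3")]
result = replay(records, [Sketch("A", 0, 2), Sketch("B", 1, 4)])
```

One pass over the data serves both sketches, which is the point of replaying time-aligned data once for many concurrent jobs.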
  • 16. Big Data Replay + Hetero. + Massive
    [Figure labels: Time, Now (buffer head), Buffer tail, Manager, Controller, Job, pos, Kill, Report, Manage in realtime, One Replay Batch, One Buffer, Jobs, Replay at a scale]
    • matching jobs are packed in batches
    • heterogeneity is managed by:
      1. monitoring the buffer
      2. repacking on the fly
  • 17. Data Streaming as Big Data Jobs
    • jobs based on data streaming [04] are much better: (1) statistically rigid, (2) accountable, (3) richer/free datatype, (4) ...
    • since data streaming targets are based on information theory [05], performance bounds can be estimated statistically
    [04] S.Muthukrishnan "Data Streams: Algorithms and Applications" Theoretical Computer Science (2005)
    [05] myself+0 "Methods and Algorithms for Fast Hashing in Data Streaming" Cryptography, CRC (2014)
    [10] M.Sung+4 "Scalable and Efficient Data Streaming Algorithms..." ICDE Workshop (2006)
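As one example of a data streaming job with a statistical guarantee, the sketch below implements a Count-Min sketch, a well-known streaming summary for frequency estimation. It is a swapped-in textbook technique and is not taken from the slides; the class name, the default `width` and `depth`, and the use of `hashlib.blake2b` as the hash are assumptions for illustration.

```python
import hashlib


class CountMin:
    """Count-Min sketch: fixed-size counters, estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # One hash per row, derived by prefixing the row number to the item.
        digest = hashlib.blake2b(f"{row}:{item}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(self.rows[row][self._column(row, item)] for row in range(self.depth))


stream = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 2 + ["10.0.0.3"]
cm = CountMin()
for source in stream:
    cm.add(source)
heavy = cm.estimate("10.0.0.1")     # at least 5, and at most len(stream)
```

Memory use is fixed by `width` and `depth` regardless of stream length, and the overcount shrinks as `width` grows, which is the kind of statistically estimable bound the slide refers to.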
  • 18. Analysis
  • 19. Analysis Setup
    • 8 cores, each core is one batch
    • 500 concurrent jobs, random starting times; per-item overhead is defined by the hotspot distribution
    • two models of batch management: drop and grow
  • 20. Analysis: Drop and Grow Models
    [Figure: replay batch diagram, same labels as on slide 16]
    • drop model: assume a fixed batch size; each lagging job is dropped
      ◦ ideally, repacked into another batch
    • grow model: allow for lagging jobs by expanding the buffer
      ◦ ... expand = keep more and more of the DLL tail
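The difference between the two models can be shown with a toy, deterministic simulation. This is not the analysis from the talk: the buffer head advances one item per time step, each job has a hypothetical fixed per-item cost (in the talk, overhead follows the hotspot distribution), and a job is considered lagging when its distance from the head exceeds the buffer size. The function name and all parameters are assumptions.

```python
def run_batch(job_costs, buffer_size, steps, model="drop"):
    """Return (dropped job ids, final buffer size) for one replay batch.

    job_costs[i] is the number of time steps job i needs per item (>= 1).
    """
    alive = dict(enumerate(job_costs))       # job id -> per-item cost
    dropped = []
    for t in range(1, steps + 1):
        head = t                             # one new item per time step
        for job, cost in list(alive.items()):
            pos = int(t / cost)              # items this job has finished so far
            lag = head - pos
            if lag > buffer_size:
                if model == "drop":
                    dropped.append(job)      # fixed buffer: lagging job is dropped
                    del alive[job]
                else:
                    buffer_size = lag        # grow: keep more of the tail instead
    return dropped, buffer_size


costs = [1.0, 1.0, 2.0]                      # the third job is twice as slow
drop_result = run_batch(costs, buffer_size=3, steps=10, model="drop")
grow_result = run_batch(costs, buffer_size=3, steps=10, model="grow")
```

In this toy run the drop model loses the slow job once its lag exceeds 3 items, while the grow model keeps every job at the price of a larger buffer.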
  • 21. Analysis: Hotspots
    [Figure: Pop/Hot/Flash distributions (increasing thickness); axes: ordered list (0 to 350) and CPU load, overhead, etc. (0 to 0.5); labels: an(350), am(5), av(2)]
  • 22. Analysis: Result Visualization
    [Figure: drop model and grow model; axes: number of dropped jobs (0 to 90) and average batch span in seconds (2.8 to 30.8); point labels: 300/5, 350/5, 350/1, 250/1, 250/10, 450/1, 450/5, 300/1, 400/1, 300/10]
    • grow model: needs batches between 2 and 3 times larger to avoid drops
    • drop model: drops of between 5% and 10%, depending on the hotspot distribution
    • note: jobs were not repacked this time, but repacking will help reduce the number of drops
  • 23. That’s all, thank you ...
  • 24. The Time-Aware Big Data Datatype
    • time-aware bigdata is in the mid-range between the two extremes -- key-value and traditional Hadoop shards
    [Figure labels: KV Store; Hadoop (HDFS) and MapReduce; TABID: Time-Aware Big Data (this demo); HDFS + Lucene Index]
  • 25. DLL: The Double-Linked List
    • a 4-way DLL with sideways linking is often used when collisions are non-negligible
    [Figure labels: Item; links: prev, next, sideprev, sidenext]
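One possible reading of the 4-way DLL is sketched below: every item sits on the main list through `prev`/`next`, and items whose keys hash to the same slot are additionally chained through `sideprev`/`sidenext`. The four link names come from the slide; the interpretation of "collisions" as hash-slot collisions, the slot table, and the class and method names are assumptions.

```python
class Item:
    """An item with four links: prev/next (main list), sideprev/sidenext (collisions)."""
    __slots__ = ("key", "value", "prev", "next", "sideprev", "sidenext")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None
        self.sideprev = self.sidenext = None


class FourWayList:
    def __init__(self, slots=8):
        self.slots = [None] * slots      # slot -> first item of its side chain
        self.head = None                 # head of the main list

    def _slot(self, key):
        return hash(key) % len(self.slots)

    def find(self, key):
        item = self.slots[self._slot(key)]
        while item is not None and item.key != key:
            item = item.sidenext         # walk sideways through colliding items
        return item

    def insert(self, key, value):
        item = self.find(key)
        if item is not None:
            item.value = value
            return item
        item = Item(key, value)
        item.next = self.head            # push onto the main list
        if self.head is not None:
            self.head.prev = item
        self.head = item
        slot = self._slot(key)           # push onto the side chain of its slot
        first = self.slots[slot]
        item.sidenext = first
        if first is not None:
            first.sideprev = item
        self.slots[slot] = item
        return item

    def remove(self, key):
        item = self.find(key)
        if item is None:
            return False
        if item.prev is not None:
            item.prev.next = item.next
        else:
            self.head = item.next
        if item.next is not None:
            item.next.prev = item.prev
        if item.sideprev is not None:
            item.sideprev.sidenext = item.sidenext
        else:
            self.slots[self._slot(key)] = item.sidenext
        if item.sidenext is not None:
            item.sidenext.sideprev = item.sideprev
        return True


table = FourWayList(slots=4)
for key in (1, 5, 9, 2):                 # 1, 5 and 9 share slot 1
    table.insert(key, key * 10)
table.remove(5)                          # unlinks from both the main and side chains
```

Removal touches only the neighbours in each direction, so a colliding item can be dropped in constant time once it has been found.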
  • 26. Data Streaming + Bloom + Fast Hashing
    • practical data streaming is a complex technology that depends on:
      1. efficient Bloom filters
      2. fast hashing
    [Figure labels: Data Streaming, Bloom Filter, Fast Hashing, Other Types of Hashing, Other Uses]
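To make the dependency on Bloom filters and hashing concrete, here is a minimal Bloom filter sketch. It is a generic textbook construction, not the author's implementation: `hashlib.blake2b` stands in for whatever fast hash a real system would use, and the class name, the `bits` and `hashes` defaults, and the double-hashing scheme for deriving bit positions are all assumptions.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from one digest (double hashing).
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.array[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter()
for source in ("10.0.0.1", "10.0.0.2"):
    seen.add(source)
known = "10.0.0.1" in seen               # always True for added items
```

Each `add` and each lookup costs one hash computation plus a few bit operations, which is why the speed of the hash function dominates the per-item cost in a streaming job.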