Performance and scalability for machine learning.
Arnaud Rachez (arnaud.rachez@gmail.com)
November 2nd, 2015
Outline
• Performance (7mn)
• Parallelism (7mn)
• Scalability (10mn)
Numbers everyone should know
(2015 update)
Source: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
[Figure: two bar charts of throughput (GB/s). First chart: L1, L2, L3 and RAM, axis 0 to 800. Second chart: RAM, SSD and Network, axis 0 to 30. Labels: ~800MB, ~1.25GB, ~30GB.]
source: http://forums.aida64.com/topic/2864-i7-5775c-l4-cache-performance/
source: http://www.macrumors.com/2015/05/21/15-inch-retina-macbook-pro-2gbps-throughput/
Outline
• Performance (5-7mn)
• Parallelism (5-7mn)
• Scalability (7-10mn)
Optimising SGD
• Linear regression (like) stochastic gradient descent with d=5 features and n=1,000,000 examples.
• Using Python (1), Numba (2), Numpy (3) and Cython (4) (https://gist.github.com/zermelozf/3cd06c8b0ce28f4eeacd)
• Also compared it to pure C++ code (https://gist.github.com/zermelozf/4df67d14f72f04b4338a)
[Code panels (1) to (4) were shown as images in the original slides.]
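For reference, here is a minimal pure-Python sketch of the kind of loop being optimised: stochastic gradient descent for least-squares linear regression. It is not the code from the linked gist. The function name `sgd_linear_regression`, the step size, the number of epochs and the synthetic data are assumptions made for illustration, with a much smaller n than the 1,000,000 examples on the slide.

```python
import random


def sgd_linear_regression(X, y, step=0.05, epochs=5, seed=0):
    """Plain-Python SGD for least squares: minimise 0.5 * (w.x - y)^2 per example."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in random order
        for i in order:
            xi = X[i]
            # prediction error on a single example
            err = sum(w[j] * xi[j] for j in range(d)) - y[i]
            # gradient step: w <- w - step * err * x_i
            for j in range(d):
                w[j] -= step * err * xi[j]
    return w


# Synthetic, noise-free data with d=5 features (hypothetical values).
data_rng = random.Random(42)
true_w = [1.0, -2.0, 0.5, 3.0, -1.5]
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(2000)]
y = [sum(wj * xj for wj, xj in zip(true_w, xi)) for xi in X]

w = sgd_linear_regression(X, y)
max_err = max(abs(a - b) for a, b in zip(w, true_w))
print("max abs error:", max_err)
```

The two nested loops over examples and features are exactly the part that an interpreter executes slowly, which is why compiling them (Numba, Cython, C++) pays off.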
Runtime optimisation
[Figure: Optimisation strategies (d=5 & n=1,000,000). Bar chart of time (ms), log scale from 1 to 10000, for Python, Numpy, Cython, Numba and c++.]
Source: https://gist.github.com/zermelozf/3cd06c8b0ce28f4eeacd
[Chart annotations: Memory views, Memory views, pointers, pointers?, Memory views?]
Runtime optimisation
[Figure: Cache optimisation (d=5 & n=1,000,000). Bar chart of time (ms), axis 0 to 160, for Numba, c++ and cython, comparing random and linear access. Bars are annotated "Cache miss" and "Cache hit".]
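The random versus linear comparison can be reproduced in spirit with a small timing harness. This is a sketch under assumptions: `sum_rows`, the sizes and the data are made up for illustration, and in pure Python the interpreter overhead hides much of the cache effect, so the gap is expected to be smaller than in the compiled versions measured on the slide.

```python
import random
import time


def sum_rows(X, order):
    """Sum every entry of X, visiting the rows in the given order."""
    total = 0.0
    for i in order:
        for v in X[i]:
            total += v
    return total


n, d = 100_000, 5
X = [[float(i + j) for j in range(d)] for i in range(n)]

linear_order = list(range(n))   # rows visited in allocation order
random_order = linear_order[:]  # same rows, shuffled
random.Random(0).shuffle(random_order)

for name, order in [("linear", linear_order), ("random", random_order)]:
    start = time.perf_counter()
    total = sum_rows(X, order)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {elapsed_ms:.1f} ms (sum={total:.0f})")
```

Both runs do the same arithmetic on the same rows; only the visiting order changes, so any timing difference comes from memory access.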
(d>>1) Gensim word2vec case study
• Elman style RNN trained with SGD: 15,079×200 matrix on a 1M word corpus.
• Baseline written by Tomas Mikolov in optimised C.
• Rewritten by Radim Řehůřek in Python.
• Optimised by Radim Řehůřek using Cython, BLAS…
Source: http://rare-technologies.com/word2vec-in-python-part-two-optimizing/
[Figure: bar chart of words/sec (x1000), axis 0 to 120, for Original C, Numpy, Cython, Cython + BLAS, and Cython + BLAS + sigmoid table.]
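The "sigmoid table" in the last bar refers to replacing calls to exp() with a precomputed lookup table. A minimal sketch of the idea follows; the constants `TABLE_SIZE` and `MAX_X` and the function name `fast_sigmoid` are illustrative assumptions, not values taken from the gensim source.

```python
import math

TABLE_SIZE = 1000  # number of precomputed entries (illustrative)
MAX_X = 6.0        # inputs outside [-MAX_X, MAX_X] are clipped (illustrative)

# Precompute sigmoid(x) on a regular grid over [-MAX_X, MAX_X).
SIGMOID_TABLE = [
    1.0 / (1.0 + math.exp(-((i / TABLE_SIZE) * 2.0 - 1.0) * MAX_X))
    for i in range(TABLE_SIZE)
]


def fast_sigmoid(x):
    """Approximate sigmoid(x) with a table lookup instead of calling exp()."""
    if x >= MAX_X:
        return 1.0
    if x <= -MAX_X:
        return 0.0
    index = int((x + MAX_X) * (TABLE_SIZE / (2.0 * MAX_X)))
    # guard against floating-point rounding at the upper edge of the range
    return SIGMOID_TABLE[min(index, TABLE_SIZE - 1)]


for x in (-3.0, 0.5, 2.0):
    exact = 1.0 / (1.0 + math.exp(-x))
    print(x, fast_sigmoid(x), exact)
```

The lookup trades a small, bounded approximation error for the cost of an exponential in the innermost loop.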
[Chart annotations: pointers (×3)]
What’s this BLAS magic?
Source: https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec_inner.pyx
• vectorized y = alpha*x
• replaced 3 lines of code!
• translated into a 3x speedup over Cython alone!
• please read http://rare-technologies.com/word2vec-in-python-part-two-optimizing/
**On my MacBook Pro, SciPy automatically links against Apple’s vecLib, which contains an excellent BLAS. Similarly, Intel’s MKL, AMD’s ACML, Sun’s SunPerf or the automatically tuned ATLAS are all good choices.
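To make the vectorized operation concrete, here is the kind of explicit loop that a single BLAS call replaces. This sketch assumes the slide's "y = alpha*x" is shorthand for the BLAS axpy update y ← alpha*x + y; the function name `axpy_loop` is hypothetical. SciPy exposes the corresponding routine as `scipy.linalg.blas.saxpy`, which is not called here so that the example runs with the standard library only.

```python
def axpy_loop(alpha, x, y):
    """In-place update y <- alpha * x + y, written as an element-wise loop."""
    for i in range(len(x)):
        y[i] += alpha * x[i]
    return y


x = [1.0, 1.0, 1.0]
y = [1.0, 2.0, 3.0]
axpy_loop(2.0, x, y)
print(y)
```

A tuned BLAS performs the same update with vector instructions and no per-element interpreter or bounds-checking overhead.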
Outline
• Performance (5-7mn)
• Parallelism (5-7mn)
• Scalability (7-10mn)
Hardware trends: CPU
[Figure: clock speed (MHz, axis 0 to 4000) and number of cores (axis 0 to 4), 1970 to 2015.]
Source: http://www.gotw.ca/publications/concurrency-ddj.htm
(d>>1) Gensim word2vec continued
• Elman style RNN trained with SGD: 15,079×200 matrix on a 1M word corpus.
• Baseline written by Tomas Mikolov in optimised C.
• Rewritten by Radim Řehůřek in Python.
• Optimised by Radim Řehůřek using Cython, BLAS…
• … and parallelised with threads!
Source: http://rare-technologies.com/parallelizing-word2vec-in-python/
[Figure: bar chart of words/sec (x1000), axis 0 to 400, for 1, 2, 3 and 4 threads, comparing Original C with Cython + BLAS + sigmoid table. Annotation: 2.85x speedup.]
(d>>1) Hogwild! on SAG
• Fabian’s experimentation with Julia (lang).
• Running SAG in parallel, without a lock.
• Very nice speed up!!!
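The lock-free idea can be sketched in a few lines. Note the substitutions: this uses plain SGD rather than SAG, and Python threads rather than Julia, so it only illustrates the pattern of several workers updating shared weights with no lock. With CPython's global interpreter lock it will not show the speed-up reported on the slide. The function name `hogwild_sgd` and all parameter values are assumptions.

```python
import random
import threading


def hogwild_sgd(X, y, n_threads=4, steps_per_thread=5000, step=0.05):
    """Run SGD from several threads on one shared weight list, with no lock."""
    d = len(X[0])
    w = [0.0] * d  # shared state, deliberately unprotected

    def worker(seed):
        rng = random.Random(seed)
        for _ in range(steps_per_thread):
            i = rng.randrange(len(X))
            xi = X[i]
            err = sum(w[j] * xi[j] for j in range(d)) - y[i]
            for j in range(d):
                # racy read-modify-write: another thread may change w[j] in between
                w[j] -= step * err * xi[j]

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


# Synthetic, noise-free data (hypothetical values).
data_rng = random.Random(1)
true_w = [2.0, -1.0, 0.5]
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1000)]
y = [sum(wj * xj for wj, xj in zip(true_w, xi)) for xi in X]

w = hogwild_sgd(X, y)
print("max abs error:", max(abs(a - b) for a, b in zip(w, true_w)))
```

Occasional lost or stale updates are tolerated by design; the bet is that they cost less than the synchronisation they avoid.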
Data does not fit in memory…
Stream data from disk…
… but you cannot read in parallel…
Producer/Consumer pattern
[Animation: thread 1 (producer) reads chunk 1, chunk 2, chunk 3, … from disk one after another and queues them as job 1, job 2, job 3, …; two consumer threads pick jobs from the queue and mark them done while the producer keeps reading.]
Et cetera…
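The pattern above can be sketched with the standard library's `queue.Queue`. This is a minimal illustration, not the implementation used by any of the tools in the talk: the function name `count_words`, the in-memory `chunks` list standing in for sequential disk reads, and the queue size are all assumptions.

```python
import queue
import threading
from collections import Counter


def count_words(chunks, n_consumers=2, max_pending=4):
    """One producer feeds chunks into a bounded queue; consumers count words."""
    jobs = queue.Queue(maxsize=max_pending)  # bounded: producer waits if consumers lag
    partial_counts = [Counter() for _ in range(n_consumers)]
    stop = object()  # sentinel telling a consumer there is no more work

    def producer():
        for chunk in chunks:  # stands in for sequential reads from disk
            jobs.put(chunk)
        for _ in range(n_consumers):
            jobs.put(stop)

    def consumer(k):
        while True:
            chunk = jobs.get()
            if chunk is stop:
                break
            partial_counts[k].update(chunk.split())

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer, args=(k,)) for k in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = Counter()
    for c in partial_counts:
        total.update(c)  # merge the per-consumer counts
    return total


chunks = ["a b a", "b c", "a c c", "d"]
counts = count_words(chunks)
print(counts)
```

Each consumer keeps its own counter so that no lock is needed on the hot path; the results are merged once at the end.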
How many consumers?
It depends…
• Gensim (R. Řehůřek)
• Saw the impact up to 4 consumers earlier
• Vowpal Wabbit (J. Langford)
• Claims no gain with more than 1 consumer!
• 2’10’’ on my MacBook Pro for ~10GB and 50MM lines (Criteo’s advertising dataset).
• CNNs pre-processing (S. Dieleman)
• Big impact with ?? (several) consumers!
• Useful for data augmentation/preprocessing
Word count java benchmark
[Figure: 5.3GB (~105MM lines) word count. Bar chart, axis 0 to 220, against the number of consumers (1 to 6).]
source: https://gist.github.com/nicomak/1d6561e6f71d936d3178
• MacBook Pro 15’’ 2014
• `sudo purge`
Outline
• Performance (5-7mn)
• Parallelism (5-7mn)
• Scalability (7-10mn)
Hardware trends: HDD
[Figure: disk capacity (GB, axis 0 to 600) and time to read the full disk (sec, axis 0 to 4,000), 1979 to 2011.]
Source: https://tylermuth.wordpress.com/2011/11/02
Distributed computing
Scalability - A perspective on Big data
• Strong scaling: if you throw twice as many machines at the task, you solve it in half the time. Usually relevant when the task is CPU bound.
• Weak scaling: if the dataset is twice as big, throw twice as many machines at it to solve the task in constant time. Memory bound tasks… usually.
Most “big data” problems are I/O bound. Hard to solve the task in an acceptable time independently of the size of the data (weak scaling).
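The two notions of scaling can be turned into efficiency numbers, where 1.0 means ideal scaling. A small sketch, using the usual definitions; the function names and the timings are hypothetical and are not measurements from the talk.

```python
def strong_scaling_efficiency(t_one, t_n, n_machines):
    """Fixed problem size: the ideal time on n machines is t_one / n."""
    return t_one / (n_machines * t_n)


def weak_scaling_efficiency(t_one, t_n):
    """Problem size grows with the machine count: the ideal time stays at t_one."""
    return t_one / t_n


# Hypothetical timings in seconds.
print(strong_scaling_efficiency(t_one=100.0, t_n=50.0, n_machines=2))  # ideal halving
print(strong_scaling_efficiency(t_one=100.0, t_n=40.0, n_machines=2))  # above 1: super linear
print(weak_scaling_efficiency(t_one=100.0, t_n=125.0))                 # below 1: overhead grows
```

A strong-scaling efficiency above 1.0 is the "super linear" case, which usually signals that the baseline was saturated rather than that the cluster is unusually fast.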
Bring computation to data
Map-Reduce: Statistical query model
[Equation shown as an image in the original slide, with two annotations:]
• f, the map function, is sent to every machine
• the sum corresponds to a reduce operation
• D. Caragea et al., A Framework for Learning from Distributed Data Using Sufficient Statistics and Its Application to Learning Decision Trees. Int. J. Hybrid Intell. Syst. 2004
• Chu et al., Map-Reduce for Machine Learning on Multicore. NIPS’06.
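The formula on this slide did not survive extraction. Assuming it has the usual summation form, a sum over examples of some function f, the idea can be sketched with a least-squares gradient as f: each machine maps f over its own shard, and only the small partial sums are reduced. The function names, the toy data and the two-shard split are assumptions made for illustration.

```python
from functools import reduce


def partial_gradient(w, shard):
    """Map step: sum of per-example least-squares gradients over one shard."""
    d = len(w)
    grad = [0.0] * d
    for x, y in shard:
        err = sum(w[j] * x[j] for j in range(d)) - y
        for j in range(d):
            grad[j] += err * x[j]
    return grad


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial results."""
    return [ai + bi for ai, bi in zip(a, b)]


def distributed_gradient(w, shards):
    # On a cluster, each shard would live on a different machine and only the
    # partial gradients (d numbers each) would travel over the network.
    partials = map(lambda shard: partial_gradient(w, shard), shards)
    return reduce(add_vectors, partials)


data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 4.0)]
shards = [data[:2], data[2:]]  # two hypothetical "machines"
w = [0.0, 0.0]
grad = distributed_gradient(w, shards)
print(grad)
```

Because the gradient is a plain sum over examples, splitting the data changes where the work happens but not the result.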
Spark on Criteo’s data
• Logistic regression trained with minibatch SGD
• 10GB of data (50MM lines). Caveat: Quite small for a benchmark
• Super linear strong scalability. Not theoretically possible => small dataset + few instances saturate.
[Figure: time (sec, axis 0 to 1300) and number of cores (axis 0 to 40) against the number of AWS nodes (4, 6, 8, 10).]
Manual setup of the cluster
was a bit painful…
Software stack for big data
• Cluster manager: Local, Standalone, YARN, MESOS
• Storage layer: HDFS, Tachyon, Cassandra, HBase, Others…
• Execution layer: Spark (memory-optimised execution engine), Flink (Apache incubated execution engine), Hadoop MR 2
• Libraries: MLlib, GraphX, Streaming, SQL/Dataframe (Spark); FlinkML, Gelly (graph), TableAPI, Batch (Flink)
Software stack MESOS vs YARN
• Standalone mode is fastest…
• … but resources are requested for the entire job.
Cluster management frameworks
• Concurrent access (multiuser)
• Hyperparameter tuning (multijob)
Mesos
• Frameworks receive offers
• Easy install on AWS, GCE
• Lots of compatible frameworks: Spark, MPI, Cassandra, HDFS…
• Mesosphere’s DCOS is really, really easy to use.
YARN
• Frameworks make offers
• Configuration hell (can be made easier with puppet/ansible recipes)
• Several compatible frameworks: Spark, Flink, HDFS…
Infrastructure stack
• AWS = AWeSome
• Basic instance with spot price
10 r2.2xlarge instances (350GB mem. & 40 cores) for 0.85$/hour
• Graphical Network Designer
Infrastructure stack
[Figure: AWS architecture diagram, annotated step by step:]
• VPC
• Subnets public/private
• Security rules
• Bootstrap config for master/slaves
• Network entry point
Source: https://aws.amazon.com/architecture/
- Questions -
“Thank you”
What’s coming in the next few years ?
BONUS

Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Labs
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiDatabricks
 
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNetFrom Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNetEric Haibin Lin
 
Resource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native EnvironmentsResource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native EnvironmentsSharma Podila
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceScyllaDB
 

Similaire à Performance and scalability for machine learning (20)

Serial-War
Serial-WarSerial-War
Serial-War
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
DSL Construction with Ruby - ThoughtWorks Masterclass Series 2009
DSL Construction with Ruby - ThoughtWorks Masterclass Series 2009DSL Construction with Ruby - ThoughtWorks Masterclass Series 2009
DSL Construction with Ruby - ThoughtWorks Masterclass Series 2009
 
DotNet 2019 | Javier Cantón - Writing high performance code in NetCore 3.0
DotNet 2019 | Javier Cantón - Writing high performance code in NetCore 3.0DotNet 2019 | Javier Cantón - Writing high performance code in NetCore 3.0
DotNet 2019 | Javier Cantón - Writing high performance code in NetCore 3.0
 
Writing high performance code in NetCore 3.0
Writing high performance code in NetCore 3.0Writing high performance code in NetCore 3.0
Writing high performance code in NetCore 3.0
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer Agarwal
 
Node.js - Advanced Basics
Node.js - Advanced BasicsNode.js - Advanced Basics
Node.js - Advanced Basics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Efficient use of NodeJS
Efficient use of NodeJSEfficient use of NodeJS
Efficient use of NodeJS
 
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
Высокопроизводительный инференс глубоких сетей на GPU с помощью TensorRT / Ма...
 
Node.js security - JS Day Italy 2018
Node.js security - JS Day Italy 2018Node.js security - JS Day Italy 2018
Node.js security - JS Day Italy 2018
 
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNetFrom Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet
 
Resource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native EnvironmentsResource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native Environments
 
Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...
 
Venkat ns2
Venkat ns2Venkat ns2
Venkat ns2
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
 

Dernier

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 

Dernier (20)

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Performance and scalability for machine learning

  • 1. Performance and scalability for machine learning. Arnaud Rachez (arnaud.rachez@gmail.com). November 2nd, 2015
  • 2. Outline • Performance (7 min) • Parallelism (7 min) • Scalability (10 min)
  • 3. Numbers everyone should know (2015 update). Source: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html [Two bar charts of throughput (GB/s): L1, L2, L3 and RAM on a 0 to 800 scale; RAM, SSD and network on a 0 to 30 scale. Chart labels: ~800MB, ~1.25GB, ~30GB.] Sources: http://forums.aida64.com/topic/2864-i7-5775c-l4-cache-performance/ and http://www.macrumors.com/2015/05/21/15-inch-retina-macbook-pro-2gbps-throughput/
  • 4. Outline • Performance (5-7 min) • Parallelism (5-7 min) • Scalability (7-10 min)
  • 5. Optimising SGD • Linear-regression-like stochastic gradient descent with d=5 features and n=1,000,000 examples. • Using Python (1), Numba (2), Numpy (3) and Cython (4) (https://gist.github.com/zermelozf/3cd06c8b0ce28f4eeacd) • Also compared to pure C++ code (https://gist.github.com/zermelozf/4df67d14f72f04b4338a) [Code screenshots (1) to (4) not reproduced.]
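To make the benchmark above concrete, here is a minimal pure-Python sketch of the kind of SGD loop being optimised. It is not the code from the linked gists: the function names, the learning rate, the synthetic noise-free data and the much smaller n are all assumptions chosen so that it runs quickly.

```python
import random


def sgd_linear_regression(X, y, lr=0.01, epochs=1):
    """Plain-Python SGD for least-squares linear regression (no intercept)."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Prediction error on a single example.
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            # Gradient step on the squared error of that example.
            for j in range(d):
                w[j] -= lr * err * xi[j]
    return w


def make_data(n, true_w, seed=0):
    """Synthetic, noise-free data: Gaussian features, y = <true_w, x>."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(n)]
    y = [sum(wj * xj for wj, xj in zip(true_w, xi)) for xi in X]
    return X, y


if __name__ == "__main__":
    true_w = [1.0, -2.0, 0.5, 3.0, -1.5]  # d = 5, as on the slide
    X, y = make_data(20_000, true_w)       # far fewer than n = 1,000,000
    print(sgd_linear_regression(X, y))
```

The inner loops over `d` and over the examples are exactly what interpreted Python executes slowly, which is why moving them into Numba, Cython or C++ pays off.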
  • 7-13. Runtime optimisation: optimisation strategies (d=5 & n=1,000,000). [Bar chart: time (ms), log scale from 1 to 10,000, for Python, Numpy, Cython, Numba and C++. Annotations added on the last build step: memory views, memory views, pointers, pointers?, memory views?] Source: https://gist.github.com/zermelozf/3cd06c8b0ce28f4eeacd
  • 15-18. Runtime optimisation: cache optimisation (d=5 & n=1,000,000). [Bar chart: time (ms), 0 to 160, for Numba, C++ and Cython, comparing random and linear access order. Build steps add three "cache miss" labels, then three "cache hit" labels.]
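The random-versus-linear comparison can be sketched as follows. This is only an illustration of the two access orders, with hypothetical function names; the slide's timings came from compiled code (Numba, C++, Cython), and in pure Python the interpreter overhead is likely to hide most of the cache effect, so no timings are claimed here.

```python
import random
import time
from array import array


def sum_in_order(data, order):
    """Sum data[i] for i in order; only the visiting order differs between runs."""
    total = 0.0
    for i in order:
        total += data[i]
    return total


def compare_orders(n=1_000_000, seed=0):
    """Time one pass over the same buffer in linear and in shuffled order."""
    data = array("d", (float(i) for i in range(n)))  # contiguous buffer of doubles
    linear = list(range(n))
    shuffled = linear[:]
    random.Random(seed).shuffle(shuffled)
    timings = {}
    for name, order in (("linear", linear), ("random", shuffled)):
        start = time.perf_counter()
        sum_in_order(data, order)
        timings[name] = time.perf_counter() - start
    return timings
```

Calling `compare_orders()` returns a dict with one wall-clock duration per access order.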
  • 19-26. (d>>1) Gensim word2vec case study • Elman style RNN trained with SGD: 15,079×200 matrix on a 1M word corpus. • Baseline written by Tomas Mikolov in optimised C. • Rewritten by Radim Řehůřek in Python. • Optimised by Radim Řehůřek using Cython, BLAS… Source: http://rare-technologies.com/word2vec-in-python-part-two-optimizing/ [Bar chart: words/sec (×1000), 0 to 120, for Original C, Numpy, Cython, Cython + BLAS, and Cython + BLAS + sigmoid table. The last build step adds three "pointers" annotations.]
  • 27. What’s this BLAS magic? Source: https://github.com/piskvorky/gensim/blob/develop/gensim/models/word2vec_inner.pyx • vectorized y = alpha*x • replaced 3 lines of code! • translated into a 3x speedup over Cython alone! • please read http://rare-technologies.com/word2vec-in-python-part-two-optimizing/ ** On my MacBook Pro, SciPy automatically links against Apple’s vecLib, which contains an excellent BLAS. Similarly, Intel’s MKL, AMD’s ACML, Sun’s SunPerf or the automatically tuned ATLAS are all good choices.
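For readers who have not met BLAS level-1 routines: `axpy` computes y ← alpha*x + y and `dot` computes an inner product, each in a single call over a whole vector. The linked gensim source appears to call the single-precision variants of these routines. The pure-Python functions below are not BLAS; they are hypothetical reference implementations that only spell out what such calls compute.

```python
def dot(x, y):
    """Reference semantics of BLAS `dot`: the inner product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Reference semantics of BLAS `axpy`: y <- alpha * x + y, updated in place."""
    for i, xi in enumerate(x):
        y[i] += alpha * xi
    return y
```

A tuned BLAS performs the same arithmetic with vectorised, cache-aware native code, which is where a speedup over a hand-written loop would come from.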
  • 28. Outline • Performance (5-7 min) • Parallelism (5-7 min) • Scalability (7-10 min)
  • 29. Hardware trends: CPU. [Chart: clock speed (MHz, 0 to 4000) and number of cores (0 to 4) by year, 1970 to 2015.] Source: http://www.gotw.ca/publications/concurrency-ddj.htm
  • 30-37. (d>>1) Gensim word2vec continued • Elman style RNN trained with SGD: 15,079×200 matrix on a 1M word corpus. • Baseline written by Tomas Mikolov in optimised C. • Rewritten by Radim Řehůřek in Python. • Optimised by Radim Řehůřek using Cython, BLAS… • … and parallelised with threads! Source: http://rare-technologies.com/parallelizing-word2vec-in-python/ [Bar chart: words/sec (×1000), 0 to 400, with 1, 2, 3 and 4 threads, for Original C and for Cython + BLAS + sigmoid table. Final annotation: 2.85x speedup.]
  • 38-39. (d>>1) Hogwild! on SAG • Fabian’s experimentation with Julia (lang). • Running SAG in parallel, without a lock. • Very nice speed up!!!
  • 40-49. Data does not fit in memory… Stream data from disk… … but you cannot read in parallel… Producer/Consumer pattern. [Animation: thread 1 (producer) reads chunk 1 to chunk 6 from disk and queues them as jobs; two consumer threads pick up job 1, job 2, … and each finished job is marked done.] Et cetera…
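The producer/consumer pattern shown in the animation can be sketched with the Python standard library. This is a minimal illustration under assumptions, not the implementation used by any tool named in the talk: `process_stream`, `work`, `n_consumers` and `max_pending` are hypothetical names, and error handling is left out (an exception inside `work` would stall the pipeline).

```python
import queue
import threading


def process_stream(chunks, work, n_consumers=2, max_pending=4):
    """Run work(chunk) on every chunk: one producer thread, several consumers."""
    jobs = queue.Queue(maxsize=max_pending)  # bounded, so the reader cannot run far ahead
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel telling a consumer to exit

    def producer():
        for chunk in chunks:  # sequential read: a single reader
            jobs.put(chunk)
        for _ in range(n_consumers):
            jobs.put(stop)

    def consumer():
        while True:
            chunk = jobs.get()
            if chunk is stop:
                break
            out = work(chunk)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # completion order, not input order
```

In CPython, threads only help when `work` releases the global interpreter lock (I/O, or native code such as Cython compiled with `nogil`); for pure-Python work, processes would be the usual substitute.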
  • 50. How many consumers? It depends… • Gensim (R. Řehůřek) • Saw the impact up to 4 consumers earlier • Vowpal Wabbit (J. Langford) • Claims no gain with more than 1 consumer! • 2 min 10 s on my MacBook Pro for ~10GB and 50 million lines (Criteo’s advertising dataset). • CNNs pre-processing (S. Dieleman) • Big impact with ?? (several) consumers! • Useful for data augmentation/preprocessing
  • 51. Word count Java benchmark: 5.3GB (~105 million lines). [Bar chart: y-axis 0 to 220, against number of consumers, 1 to 6.] Source: https://gist.github.com/nicomak/1d6561e6f71d936d3178 • MacBook Pro 15’’ 2014 • `sudo purge` (flushes the macOS disk cache)
  • 52. Outline • Performance (5-7 min) • Parallelism (5-7 min) • Scalability (7-10 min)
  • 53. Hardware trends: HDD. [Chart: capacity (GB, 0 to 600) and time to read the full disk (sec, 0 to 4,000) by year, 1979 to 2011.] Source: https://tylermuth.wordpress.com/2011/11/02
  • 54. Distributed computing 19 Scalability - A perspective on Big data
  • 55. Distributed computing 19 Scalability - A perspective on Big data
  • 56. Distributed computing 19 • Strong scaling: if you throw twice as many machines at the task, you solve it in half the time.
 Usually relevant when the task is CPU bound. Scalability - A perspective on Big data
  • 57. Distributed computing 19 • Strong scaling: if you throw twice as many machines at the task, you solve it in half the time.
 Usually relevant when the task is CPU bound. • Weak scaling: if the dataset is twice as big, throw twice as many machines at it to solve the task in constant time.
 Memory bound tasks… usually. Scalability - A perspective on Big data
  • 58. Distributed computing. Scalability: a perspective on big data.
 • Strong scaling: if you throw twice as many machines at the task, you solve it in half the time. Usually relevant when the task is CPU bound.
 • Weak scaling: if the dataset is twice as big, throw twice as many machines at it to solve the task in constant time. Usually relevant when the task is memory bound.
 • Most “big data” problems are I/O bound. It is hard to solve the task in an acceptable time independently of the size of the data (weak scaling).
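 The two definitions above translate into two simple efficiency ratios. The sketch below is illustrative only: the function names and the timings used in the example are hypothetical and are not measurements from the talk.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed dataset: measured speedup divided by the increase in machine count.

    1.0 means "twice the machines, half the time"; above 1.0 is super-linear.
    """
    speedup = t_base / t_n
    return speedup / (n / n_base)


def weak_scaling_efficiency(t_base, t_n):
    """Dataset grows with the machine count: the ideal is constant time.

    1.0 means the larger run took exactly as long as the baseline run.
    """
    return t_base / t_n


if __name__ == "__main__":
    # Hypothetical timings in seconds.
    print(strong_scaling_efficiency(t_base=100.0, n_base=1, t_n=50.0, n=2))
    print(weak_scaling_efficiency(t_base=100.0, t_n=125.0))
```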
  • 63. Bring computation to data. Map-Reduce: statistical query model. f, the map function, is sent to every machine; the sum corresponds to a reduce operation.
 • D. Caragea et al., A Framework for Learning from Distributed Data Using Sufficient Statistics and Its Application to Learning Decision Trees. Int. J. Hybrid Intell. Syst., 2004.
 • Chu et al., Map-Reduce for Machine Learning on Multicore. NIPS’06.
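 As an illustration of the statistical query idea, the sketch below computes a gradient as a sum of per-example terms: the map function is applied where each partition lives, and a reduce step adds the partial sums. The choice of squared loss for linear regression and all function names are assumptions made for the example; the partitions are plain Python lists standing in for data held on separate machines.

```python
from functools import reduce


def grad_term(w, x, y):
    """f(x, y): gradient of 0.5 * (w.x - y)^2 with respect to w, for one example."""
    err = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [err * xj for xj in x]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def map_partition(w, partition):
    """Map step: runs next to the data and returns the local sum of f."""
    terms = (grad_term(w, x, y) for x, y in partition)
    return reduce(add, terms, [0.0] * len(w))


def distributed_gradient(w, partitions):
    """Reduce step: add the partial sums; only small vectors move between machines."""
    return reduce(add, (map_partition(w, p) for p in partitions))


if __name__ == "__main__":
    data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
    w = [0.0, 0.0]
    # The result does not depend on how the examples are split.
    print(distributed_gradient(w, [data[:2], data[2:]]))
    print(distributed_gradient(w, [data]))
```

 Because addition is associative and commutative, the result is the same however the examples are partitioned, which is what makes the sum a natural reduce operation.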
  • 65. Spark on Criteo’s data
 • Logistic regression trained with minibatch SGD.
 • 10GB of data (50MM lines). Caveat: quite small for a benchmark.
 • Super-linear strong scalability. That should not be theoretically possible; the likely cause is that the dataset is small and the configurations with few instances saturate.
 • Manual setup of the cluster was a bit painful…
 [Figure: time (sec., 0 to 1300) and number of cores (0 to 40) against number of AWS nodes (4, 6, 8, 10).]
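 For readers unfamiliar with the training procedure named on this slide, here is a single-machine sketch of logistic regression trained with minibatch SGD. It is not the Spark job from the benchmark: the function names, learning rate, batch size and toy dataset are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, data):
    """Mean logistic loss of weights w over (features, label) pairs, labels in {0, 1}."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def minibatch_sgd(data, n_features, batch_size=10, lr=0.5, epochs=20, seed=0):
    """Train logistic regression by averaging the gradient over small batches."""
    rng = random.Random(seed)
    data = list(data)
    w = [0.0] * n_features
    for _ in range(epochs):
        rng.shuffle(data)  # new batch composition at every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [0.0] * n_features
            for x, y in batch:
                err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
                for j, xj in enumerate(x):
                    grad[j] += err * xj
            w = [wj - lr * gj / len(batch) for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy data: a bias feature plus one feature v; the label is 1 when v > 0.
    toy = [([1.0, v / 10.0], 1 if v > 0 else 0) for v in range(-20, 21) if v != 0]
    w = minibatch_sgd(toy, n_features=2)
    print(log_loss([0.0, 0.0], toy), log_loss(w, toy))
```

 In a distributed setting the inner loop over a batch is the part that can be spread across machines, since the batch gradient is a sum of per-example terms.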
  • 71. Software stack for big data
 • Cluster manager: Local, Standalone, YARN, MESOS.
 • Storage layer: HDFS, Tachyon, Cassandra, HBase, others…
 • Execution layer: Spark (memory-optimised execution engine), Flink (Apache incubated execution engine), Hadoop MR 2.
 • Libraries: MLlib, GraphX, Streaming, SQL/DataFrame (Spark); FlinkML, Gelly (graph), TableAPI, Batch (Flink).
  • 80. Software stack: MESOS vs YARN
 • Standalone mode is fastest… but resources are requested for the entire job.
 • Cluster management frameworks: concurrent access (multiuser), hyperparameter tuning (multijob).
 • Mesos: frameworks receive offers. Easy install on AWS, GCE. Lots of compatible frameworks: Spark, MPI, Cassandra, HDFS… Mesosphere’s DCOS is really, really easy to use.
 • YARN: frameworks make offers. Configuration hell (can be made easier with puppet/ansible recipes). Several compatible frameworks: Spark, Flink, HDFS…
  • 83. Infrastructure stack • AWS = AWeSome • Basic instance with spot price • Graphical Network Designer • 10 r2.2xlarge instances (350GB mem. & 40 cores) for 0.85$/hour
  • 90. Infrastructure stack: VPC; public/private subnets; security rules; bootstrap config for master/slaves; network entry point.
  • 93. BONUS: What’s coming in the next few years?