10. Goal: Extend ML to the Big Data Setting

Challenge: ML not developed with scalability in mind
✦ Does not naturally scale / leverage distributed computing

Our approach: Divide-and-conquer
✦ Apply existing base algorithms to subsets of data and combine
✓ Build upon existing suites of ML algorithms
✓ Preserve favorable algorithm properties
✓ Naturally leverage distributed computing

E.g.,
✦ Matrix factorization (DFC) [MTJ, NIPS11; TMMFJ, ICCV13]
✦ Assessing estimator quality (BLB) [KTSJ, ICML12; KTSJ, JRSS13; KTASJ, KDD13]
✦ Genomic variant calling [BTTJPYS13, submitted; CTZFJP13, submitted]

(Slide graphic: Venn diagram of Machine Learning, Big Data, and Distributed Computing)
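To make the divide-and-conquer recipe concrete, here is a minimal sketch in Python (illustrative only; the arguments `base_algorithm` and `combine` are hypothetical stand-ins, not code from the talk): divide the data into subsets, run an existing base algorithm on each subset in parallel, and combine the sub-solutions.

```python
# Minimal divide-and-conquer sketch (illustrative; `base_algorithm`
# and `combine` are hypothetical stand-ins).
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def divide_and_conquer(data, base_algorithm, combine, t=4):
    """Apply an existing base algorithm to t subsets of the data,
    then merge the sub-solutions into a single estimate."""
    subsets = np.array_split(data, t)                 # D step
    with ProcessPoolExecutor() as pool:               # F step, in parallel
        sub_solutions = list(pool.map(base_algorithm, subsets))
    return combine(sub_solutions)                     # C step

if __name__ == "__main__":
    data = np.random.randn(10_000)
    # Toy instance: base algorithm = sample mean; combine = mean of means.
    print(divide_and_conquer(data, np.mean, np.mean))
```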
16. Matrix Completion

Goal: Recover a matrix from a subset of its entries

Can we do this at scale?
✦ Netflix: 30M users, 100K+ videos
✦ Facebook: 1B users
✦ Pandora: 70M active users, 1M songs
✦ Amazon: millions of users and products
✦ ...
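For background (the standard formulation, not shown on the slide): given entries $M_{ij}$ observed on an index set $\Omega$, matrix completion seeks the lowest-rank matrix consistent with them; the convex relaxation replaces rank with the nuclear norm, which is the "nuclear norm heuristic" used as a base algorithm later in the talk.

```latex
\min_{L}\ \operatorname{rank}(L)
  \quad \text{s.t.} \quad L_{ij} = M_{ij}\ \ \forall (i,j) \in \Omega
\qquad \leadsto \qquad
\min_{L}\ \|L\|_{*}
  \quad \text{s.t.} \quad L_{ij} = M_{ij}\ \ \forall (i,j) \in \Omega
```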
20. Reducing Degrees of Freedom

✦ Problem: Impossible without additional information
  ✦ mn degrees of freedom
✦ Solution: Assume a small # of factors determines preference
  ✦ O(m + n) degrees of freedom
  ✦ Linear storage costs

(Slide graphic: m × n matrix written as the product of an m × r and an r × n factor: 'low-rank')
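A quick numeric illustration of the O(m + n) claim (a sketch using the Netflix-scale numbers from the previous slide; not from the deck):

```python
# Storage for a rank-r factorization L = A @ B of an m x n matrix:
# mn numbers dense vs. r(m + n) in factored form.
import numpy as np

m, n, r = 30_000_000, 100_000, 10      # e.g., Netflix-scale, rank 10
dense = m * n                          # mn degrees of freedom
factored = r * (m + n)                 # O(m + n) for fixed rank r
print(f"dense: {dense:.2e}  factored: {factored:.2e}  "
      f"savings: {dense / factored:,.0f}x")

# Tiny sanity check: a rank-2 product really has rank <= 2.
A, B = np.random.randn(5, 2), np.random.randn(2, 4)
assert np.linalg.matrix_rank(A @ B) <= 2
```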
25. Bad Information Spread

✦ Problem: Other ratings don't inform us about the missing rating
✦ Solution: Assume incoherence with the standard basis [Candès and Recht, 2009]

(Slide graphic: matrix illustrating a bad spread of information)
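Since the slide only names the assumption, here is the standard coherence definition from Candès and Recht (2009), paraphrased: for an $r$-dimensional column (or row) space $U$ of an $n \times m$ matrix, with $P_U$ the orthogonal projection onto $U$ and $e_i$ the standard basis vectors,

```latex
\mu(U) \;=\; \frac{n}{r}\,\max_{1 \le i \le n} \|P_U e_i\|_2^2,
\qquad 1 \le \mu(U) \le \frac{n}{r}.
```

Incoherence means $\mu(U)$ is small, i.e., no single row or column carries a disproportionate share of the information, so uniformly sampled entries spread information well.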
34. Divide-Factor-Combine (DFC) [MTJ, NIPS11]

✦ D step: Divide input matrix into submatrices
✦ F step: Factor in parallel using a base MC algorithm
✦ C step: Combine submatrix estimates

Advantages:
✦ Submatrix factorization is much cheaper and easily parallelized
✦ Minimal communication between parallel jobs
✦ Retains comparable recovery guarantees (with proper choice of division / combination strategies)
44. DFC-Proj

✦ D step: Randomly partition observed entries into t submatrices
✦ F step: Complete the submatrices in parallel
  ✦ Reduced cost: expect a t-fold speedup per iteration
  ✦ Parallel computation: pay the cost of one cheaper MC
✦ C step: Project onto a single low-dimensional column space
  ✦ Roughly, share information across sub-solutions
  ✦ Minimal cost: linear in n, quadratic in the rank of the sub-solutions
✦ Ensemble: Project onto the column space of each sub-solution and average
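A minimal sketch of the three DFC-Proj steps, under stated assumptions: `base_complete` is a hypothetical stand-in for a real base MC solver (a fill-and-truncate placeholder rather than the nuclear norm heuristic), columns are partitioned rather than arbitrary entry sets, and the C step projects the sub-solutions onto the column space of the first one, as the slide describes.

```python
# DFC-Proj sketch: Divide columns, Factor each submatrix with a base
# MC algorithm, Combine by projecting onto one sub-solution's column
# space. `base_complete` below is a hypothetical placeholder solver.
import numpy as np

def base_complete(M, mask, r):
    """Placeholder base MC algorithm: fill missing entries with column
    means, then truncate to rank r via the SVD."""
    counts = np.maximum(mask.sum(axis=0, keepdims=True), 1)
    col_means = np.where(mask, M, 0.0).sum(axis=0, keepdims=True) / counts
    filled = np.where(mask, M, col_means)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def dfc_proj(M, mask, t=4, r=10):
    m, n = M.shape
    # D step: randomly partition the columns into t submatrices.
    blocks = np.array_split(np.random.permutation(n), t)
    # F step: complete each submatrix (each call is independently
    # parallelizable; done serially here for brevity).
    subs = [base_complete(M[:, b], mask[:, b], r) for b in blocks]
    # C step: project all sub-solutions onto the column space of the
    # first one. Applying U1 (m x r) keeps the cost linear in n and
    # quadratic in the sub-solution rank, matching the slide.
    U1 = np.linalg.svd(subs[0], full_matrices=False)[0][:, :r]
    L_hat = np.empty((m, n))
    for b, S in zip(blocks, subs):
        L_hat[:, b] = U1 @ (U1.T @ S)
    # (DFC-Proj-Ens would repeat this projection for every
    # sub-solution's column space and average the results.)
    return L_hat
```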
47. Does It Work? Yes, with high probability.

Theorem: Assume:
✦ $L_0$ is low-rank and incoherent,
✦ $\tilde{\Omega}(r(n + m))$ entries are sampled uniformly at random,
✦ the nuclear norm heuristic is the base algorithm.

Then $\hat{L} = L_0$ with (slightly less) high probability.

✦ Noisy setting: $(2 + \epsilon)$-approximation of the original bound
✦ Can divide into an increasing number of subproblems ($t \to \infty$) when the number of observed entries is $\tilde{\omega}(r^2(n + m))$
60. Video Surveillance

✦ Goal: Separate foreground from background
  ✦ Store video as a matrix
  ✦ Low-rank = background
  ✦ Outliers = movement

(Slide graphics: an original frame and its decompositions by the Nuclear Norm heuristic (342.5s), DFC-5% (24.2s), and DFC-0.5% (5.2s))
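A toy sketch of the low-rank-plus-outliers split, with frames stacked as the columns of a matrix. The alternating rank-truncation / hard-thresholding loop below is a simple stand-in, not the nuclear norm solver timed on the slide.

```python
# Toy background/foreground split: each video frame is a column. We
# alternate a rank-r truncation (static background) with hard
# thresholding of the residual (moving foreground / outliers).
import numpy as np

def separate(video, r=1, thresh=0.25, iters=10):
    """video: (pixels, frames) matrix. Returns (background, foreground)."""
    M = video.astype(float)
    S = np.zeros_like(M)                          # sparse foreground
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]           # rank-r background
        resid = M - L
        S = np.where(np.abs(resid) > thresh, resid, 0.0)  # outliers
    return L, S

# Synthetic demo: a static gradient background plus a moving bright dot.
frames, pixels = 50, 100
bg = np.tile(np.linspace(0, 1, pixels)[:, None], (1, frames))
fg = np.zeros_like(bg)
for t in range(frames):
    fg[(2 * t) % pixels, t] = 1.0                 # "movement"
L, S = separate(bg + fg)
print("max background error:", np.abs(L - bg).max())
```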
75. Motivation: Face Images — Subspace Segmentation

(Slide graphic: image matrix = low-rank + 'noise')

✦ Model images of five people via five low-dimensional subspaces
✦ Recover subspaces → cluster images
76. Motivation: Face Images — Subspace Segmentation

✦ The nuclear norm heuristic provably recovers the subspaces
✦ Guarantees are preserved with DFC [TMMFJ, ICCV13]
77. Motivation: Face Images — Subspace Segmentation

✦ Toy experiment: Identify images corresponding to the same person (10 people, 640 images)
✦ DFC results: Linear speedup, state-of-the-art accuracy
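A hedged toy of the recover-and-cluster idea: sample points from two random low-dimensional subspaces, then cluster from an angle-based affinity. Spectral clustering on |cosine| similarity is a crude stand-in for the nuclear-norm-based recovery the slide refers to, and the example assumes scikit-learn is available.

```python
# Subspace segmentation toy: points from two random low-dimensional
# subspaces are clustered via an affinity matrix. High |cosine|
# similarity mostly occurs within a subspace.
import numpy as np
from sklearn.cluster import SpectralClustering  # assumes scikit-learn

rng = np.random.default_rng(0)
d, r, n = 50, 3, 100          # ambient dim, subspace dim, points/subspace
bases = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(2)]
X = np.vstack([(B @ rng.standard_normal((r, n))).T for B in bases])
X += 0.01 * rng.standard_normal(X.shape)          # 'noise'

Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
affinity = np.abs(Xn @ Xn.T)                      # angle-based affinity
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
# The two halves of X should land in different clusters.
print(labels[:n].mean(), labels[n:].mean())
```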
83. Video Event Detection

✦ Input: Videos, some of which are associated with events
✦ Goal: Predict events for unlabeled videos
✦ Idea:
  ✦ Featurize each video
  ✦ Learn video clusters via the nuclear norm heuristic
  ✦ Given labeled nodes and cluster structure, make predictions

Can do this at scale with DFC!
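A hedged sketch of the cluster-then-predict idea on synthetic data: plain k-means stands in for the nuclear-norm clustering on the slide, and each unlabeled video inherits the majority event label of its cluster.

```python
# Cluster-then-predict sketch (synthetic; k-means is a stand-in for
# the nuclear-norm clustering used in the talk).
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "videos": 3 event types, 200 videos, 32-dim features,
# with only 20% of the videos carrying an event label.
true_event = rng.integers(0, 3, size=200)
centers = 5 * rng.standard_normal((3, 32))
features = centers[true_event] + rng.standard_normal((200, 32))
labels = np.where(rng.random(200) < 0.2, true_event, -1)

def kmeans(X, k, iters=25):
    """Plain k-means; stands in for the clustering step."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                C[j] = X[assign == j].mean(0)
    return assign

assign = kmeans(features, k=3)
pred = labels.copy()
for j in range(3):
    labeled = labels[(assign == j) & (labels >= 0)]
    if labeled.size:  # majority label of the cluster's labeled videos
        pred[(assign == j) & (pred < 0)] = Counter(labeled).most_common(1)[0][0]

mask = labels < 0
print("accuracy on unlabeled:", (pred[mask] == true_event[mask]).mean())
```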
84. DFC Summary

✦ DFC: a distributed framework for matrix factorization
  ✦ Similar recovery guarantees
  ✦ Significant speedups
✦ DFC applied to 3 classes of problems:
  ✦ Matrix completion
  ✦ Robust matrix factorization
  ✦ Subspace recovery
✦ Extend DFC to other MF methods, e.g., ALS, SGD?
89. Big Data and Distributed Computing are valuable resources, but...

✦ Challenge 1: ML not developed with scalability in mind
  → Divide-and-Conquer (e.g., DFC)
✦ Challenge 2: ML not developed with ease-of-use in mind
  → MLbase (www.mlbase.org)