Cmu 2011 09.pptx
- 1. MapR, Implications for Integration
  CMU – September 2011
- 2. Outline
  • MapR system overview
  • Map-reduce review
  • MapR architecture
  • Performance results
  • Map-reduce on MapR
  • Architectural implications
  • Search indexing / deployment
  • EM algorithm for machine learning
  • … and more …
- 3. Map-Reduce
  [diagram: input splits feed mappers, whose output is shuffled to reducers that write the final output; only the labels Input, Shuffle, and Output are recoverable in this transcript]
- 4. Bottlenecks and Issues
  • Read-only files
  • Many copies in I/O path
  • Shuffle based on HTTP
    - Can’t use new technologies
    - Eats file descriptors
  • Spills go to local file space
    - Bad for skewed distribution of sizes
- 5. MapR Areas of Development
  • Map-Reduce
  • Storage Services
  • Ecosystem
  • HBase
  • Management
- 6. MapR Improvements
  • Faster file system
    - Fewer copies
    - Multiple NICs
    - No file descriptor or page-buffer competition
  • Faster map-reduce
    - Uses distributed file system
    - Direct RPC to receiver
    - Very wide merges
- 7. MapR Innovations
  • Volumes
    - Distributed management
    - Data placement
  • Read/write random-access file system
    - Allows distributed meta-data
    - Improved scaling
    - Enables NFS access
  • Application-level NIC bonding
  • Transactionally correct snapshots and mirrors
- 8. MapR's Containers
  • Each container contains
    - Directories & files
    - Data blocks
  • Replicated on servers
  • No need to manage directly
  Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks.
  Containers are 16-32 GB segments of disk, placed on nodes.
- 9. MapR's Containers
  • Each container has a replication chain
  • Updates are transactional
  • Failures are handled by rearranging replication
- 10. Container locations and replication
  [diagram: CLDB above nodes N1, N2, N3, with containers mapped to node pairs such as (N1, N2), (N3, N2), (N1, N3)]
  The container location database (CLDB) keeps track of the nodes hosting each container and the replication chain order. (A toy sketch of this bookkeeping follows.)
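A minimal Python sketch of the CLDB bookkeeping described above (purely illustrative; the container ids, node names, and helpers are assumptions, not MapR code):

    # The CLDB is essentially a map from container id to an ordered
    # replication chain of nodes; the first node heads the chain.
    cldb = {
        1: ["N1", "N2"],
        2: ["N3", "N2"],
        3: ["N1", "N3"],
    }

    def locate(container_id):
        """Look up the replication chain for a container."""
        return cldb[container_id]

    def handle_node_failure(dead_node):
        """Rearrange replication: drop the dead node from every chain;
        re-replicating to a new node would then restore the count."""
        for chain in cldb.values():
            if dead_node in chain:
                chain.remove(dead_node)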
- 11. MapR Scaling
  Containers represent 16-32 GB of data
  • Each can hold up to 1 billion files and directories
  • 100M containers = ~2 exabytes (a very large cluster)
  250 bytes of DRAM to cache a container
  • 25 GB to cache all containers for a 2 EB cluster
    - But not necessary, can page to disk
  • Typical large 10 PB cluster needs 2 GB
  Container reports are 100x-1000x smaller than HDFS block reports
  • Serve 100x more data nodes
  • Increase container size to 64 GB to serve a 4 EB cluster
  • Map/reduce not affected
  (The arithmetic behind the 2 EB and 25 GB figures is checked below.)
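A back-of-the-envelope check of those figures in Python, assuming an average container size of 20 GB (midway in the 16-32 GB range):

    containers = 100e6                 # 100M containers
    avg_container_gb = 20              # assumed average size
    # Total data: 100e6 * 20 GB = 2e9 GB = 2 EB
    print(containers * avg_container_gb / 1e9, "EB")
    # Cache: 250 bytes per container -> 2.5e10 bytes = 25 GB of DRAM
    print(containers * 250 / 1e9, "GB")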
- 12. MapR's Streaming Performance
  [bar charts: MB per sec for read and write, MapR vs. Hadoop, on 11 x 7200 rpm SATA and 11 x 15K rpm SAS; higher is better]
  Tests: (i) 16 streams x 120 GB, (ii) 2000 streams x 1 GB
- 13. Terasort on MapR
  [bar charts: elapsed time (mins) for 1.0 TB and 3.5 TB sorts, MapR vs. Hadoop; lower is better]
  10+1 nodes: 8 core, 24 GB DRAM, 11 x 1 TB SATA 7200 rpm
- 14. HBase on MapR
  [bar chart: records per second for YCSB random read with 1 billion 1K records, Zipfian and uniform key distributions, MapR vs. Apache; higher is better]
  10+1 node cluster: 8 core, 24 GB DRAM, 11 x 1 TB 7200 rpm
- 15. Small Files (Apache Hadoop, 10 nodes)
  [chart: create rate (files/sec) vs. number of files (millions), out-of-box vs. tuned]
  Op: create file, write 100 bytes, close
  Notes: NN not replicated; NN uses 20 GB DRAM; DN uses 2 GB DRAM
- 16. MUCH faster for some operations
  [chart: create rate vs. number of files (millions), same 10 nodes as the previous slide]
- 17. What MapR is not
  • Volumes != federation
    - MapR supports > 10,000 volumes, all with independent placement and defaults
    - Volumes support snapshots and mirroring
  • NFS != FUSE
    - Checksum and compress at gateway
    - IP fail-over
    - Read/write/update semantics at full speed
  • MapR != maprfs
- 19. Alternative NFS mounting models
  • Export to the world
    - NFS gateway runs on selected gateway hosts
  • Local server
    - NFS gateway runs on local host
    - Enables local compression and checksumming
  • Export to self
    - NFS gateway runs on all data nodes, mounted from localhost
- 20. Export to the world
  [diagram: several NFS Servers in the cluster serving an external NFS Client]
- 21. Local server
  [diagram: the client host runs the application and its own NFS Server, which talks to the cluster nodes]
- 22. Universal export to self
  [diagram: a cluster node runs a task and an NFS Server, mounted from localhost]
- 23. Nodes are identical
  [diagram: every cluster node runs a task and its own NFS Server, each mounted from localhost]
- 24. Application architecture
  • High performance map-reduce is nice
  • But algorithmic flexibility is even nicer
- 25. Sharded text indexing
  [diagram: input documents → map → reducer (local disk) → search engine (local disk), with clustered index storage in between]
  • Assign documents to shards
  • Index text to local disk and then copy the index to the distributed file store
  • Copy to local disk typically required before the index can be loaded
- 26. Sharded text indexing
  • Mapper assigns document to shard
    - Shard is usually hash of document id
  • Reducer indexes all documents for a shard
    - Indexes created on local disk
    - On success, copy index to DFS
    - On failure, delete local files
  • Must avoid directory collisions
    - Can’t use shard id!
  • Must manage and reclaim local disk space
  (A sketch of this scheme follows.)
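A minimal Python sketch of the scheme above (illustrative only; the shard count, paths, and toy "index" format are assumptions, not MapR's or Hadoop's actual code):

    import hashlib, os, shutil, tempfile
    from collections import defaultdict

    N_SHARDS = 16                          # assumed shard count
    DFS_ROOT = "/mapr/my.cluster/index"    # assumed NFS-visible DFS path

    def shard_of(doc_id):
        # Mapper side: shard is a hash of the document id.
        return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % N_SHARDS

    def index_shard(shard, docs):
        # Reducer side: build a toy inverted index on local disk,
        # then copy it to the DFS.
        postings = defaultdict(set)
        for doc_id, text in docs:
            for word in text.split():
                postings[word].add(doc_id)
        # Unique local work dir (task-attempt style), never named by the
        # shard id, so a retried attempt cannot collide with a failed one.
        work = tempfile.mkdtemp(prefix="index-attempt-")
        with open(os.path.join(work, "postings.txt"), "w") as f:
            for word in sorted(postings):
                f.write(word + "\t" + ",".join(sorted(postings[word])) + "\n")
        shutil.move(work, os.path.join(DFS_ROOT, "shard-%05d" % shard))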
- 27. Conventional data flow
  [diagram: input documents → map → reducer (local disk) → clustered index storage → search engine (local disk)]
  • Failure of a reducer causes garbage to accumulate on the local disk
  • Failure of the search engine requires another download of the index from clustered storage
- 28. Simplified NFS data flows
  [diagram: input documents → map → reducer → clustered index storage → search engine]
  • Index to task work directory via NFS
  • Failure of a reducer is cleaned up by the map-reduce framework
  • Search engine reads the mirrored index directly
- 29. Simplified NFS data flows
  [diagram: input documents → map → reducer → mirrors → search engines]
  • Mirroring allows exact placement of index data
  • Arbitrary levels of replication also possible
- 30. How about another one?
- 31. K-means
  • Classic E-M based algorithm
  • Given cluster centroids:
    - Assign each data point to nearest centroid
    - Accumulate new centroids
    - Rinse, lather, repeat
  (One iteration is sketched below.)
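A minimal NumPy sketch of one E-M iteration (illustrative; handling of empty clusters is omitted):

    import numpy as np

    def kmeans_step(points, centroids):
        # E-step: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # M-step: each new centroid is the mean of its assigned points.
        return np.array([points[assignment == k].mean(axis=0)
                         for k in range(len(centroids))])

Rinse, lather, repeat: call kmeans_step until the centroids stop moving.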
- 32. K-means, the movie
  [diagram: input → assign to nearest centroid → aggregate new centroids → centroids, looping back to the assignment step]
- 34. Parallel Stochastic Gradient Descent
  [diagram: input splits → train sub-model (one per split) → average models → model]
- 35. Variational Dirichlet Assignment
  [diagram: input splits → gather sufficient statistics → update model → model]
- 36. Old tricks, new dogs
  • Mapper
    - Assign point to cluster
    - Emit cluster id, (1, point)
  • Combiner and reducer
    - Sum counts, weighted sum of points
    - Emit cluster id, (n, sum/n)
  • Output to HDFS
  [annotations: centroids are read from HDFS to local disk by the distributed cache, then read from local disk; output is written by map-reduce]
  (The mapper/combiner/reducer logic is sketched below.)
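A Python sketch of that mapper/combiner/reducer logic (illustrative; a real Hadoop job would express this in Java against the map-reduce API):

    import numpy as np

    def mapper(point, centroids):
        # Assign the point to the nearest cluster; emit (cluster id, (1, point)).
        cid = int(np.linalg.norm(centroids - point, axis=1).argmin())
        yield cid, (1, np.asarray(point))

    def combine_or_reduce(cid, values):
        # Sum counts and the weighted sum of points; emit (cluster id, (n, sum/n)).
        # The same function serves as combiner and reducer because
        # (n, mean) pairs recombine as n * mean.
        n = sum(count for count, _ in values)
        total = sum(count * mean for count, mean in values)
        yield cid, (n, total / n)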
- 37. Old tricks, new dogs
  • Mapper
    - Assign point to cluster
    - Emit cluster id, (1, point)
  • Combiner and reducer
    - Sum counts, weighted sum of points
    - Emit cluster id, (n, sum/n)
  • Output to MapR FS
  [annotations: centroids are read directly via NFS; output is written by map-reduce]
- 38. Poor man's Pregel
  • Mapper
  • Lines in bold can use conventional I/O via NFS

    while not done:
        read and accumulate input models
        for each input:
            accumulate model
        write model
        synchronize
        reset input format
    emit summary

  (A hedged sketch of this loop over an NFS mount follows.)
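A hedged Python sketch of that loop against an NFS-mounted cluster directory (the paths, task identity, JSON model format, and file-count barrier are all assumptions for illustration):

    import glob, json, os, time

    SHARED = "/mapr/my.cluster/pregel"    # assumed NFS mount of the cluster FS
    TASKS = 4                             # assumed number of cooperating tasks
    ME = os.environ.get("TASK_ID", "0")   # hypothetical task identity

    def step(iteration, update):
        # Read and accumulate the models written in the previous iteration.
        acc, n = None, 0
        for path in sorted(glob.glob(f"{SHARED}/iter-{iteration-1}/model-*.json")):
            with open(path) as f:
                m = json.load(f)
            acc = m if acc is None else [a + b for a, b in zip(acc, m)]
            n += 1
        model = [a / n for a in acc] if acc else []
        model = update(model)             # accumulate model from this task's input
        # Write the model where every peer can read it.
        out = f"{SHARED}/iter-{iteration}"
        os.makedirs(out, exist_ok=True)
        with open(f"{out}/model-{ME}.json", "w") as f:
            json.dump(model, f)
        # Synchronize: a crude barrier that waits for all peers' files.
        while len(glob.glob(f"{out}/model-*.json")) < TASKS:
            time.sleep(1)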
- 39. Click modeling architecture
  [diagram: input → feature extraction and down-sampling (map-reduce) → data join with side-data → sequential SGD learning (now via NFS)]
- 40. Click modeling architecture
  [diagram: the same pipeline, but with several sequential SGD learners running side by side; map-reduce cooperates with NFS]
- 42. Hybrid model flow
  [diagram: feature extraction and down-sampling (map-reduce) → SVD (PageRank) (spectral) (map-reduce) → down-stream modeling → deployed model; one stage is marked "??"]
- 44. Hybrid model flow
  [diagram: feature extraction and down-sampling (map-reduce) → SVD (PageRank) (spectral) (sequential) → down-stream modeling → deployed model]
- 45. And visualization…
- 46. Trivial visualization interface
  • Map-reduce output is visible via NFS
  • Legacy visualization just works

    $ R
    > x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
    > plot(error ~ t, x)
    > q(save='n')
- 47. Conclusions
  • We used to know all this
  • Tab completion used to work
  • 5 years of work-arounds have clouded our memories
  • We just have to remember the future