1. Streaming Exa-scale Data over 100Gbps Networks
Mehmet Balman
Scientific Data Management Group
Computational Research Division
Lawrence Berkeley National Laboratory
3. Applications’ Perspective
• Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications’ perspective.
• Data distribution for climate science
• How can scientific data movement and analysis between geographically disparate supercomputing facilities benefit from high-bandwidth networks?
4. Climate Data over 100Gbps
• Data volume in climate applications is increasing exponentially.
• An important challenge in managing ever-increasing data sizes in climate science is the large variance in file sizes.
• Climate simulation data consists of a mix of relatively small and large files, with an irregular file size distribution in each dataset.
• Many small files
5. Keep the data channel full
[Diagram: FTP: request a file, send file, request a file, send file. RPC: request data, send data.]
• Concurrent transfers
• Parallel streams
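A rough way to see why per-file request/response exchanges leave the channel idle when there are many small files is sketched below. This is a simplified model, not a measurement from the talk: the function names, the 50 ms round-trip time, the 1 MB file size, and the assumption that concurrent transfers overlap their idle periods perfectly are all illustrative.

```python
def transfer_time(file_sizes_bytes, bandwidth_bps, rtt_s, concurrency=1):
    """Estimate the time to move a set of files over one link.

    Simplified model (an assumption, not taken from the slides): every file
    costs one round trip of idle time before its bytes flow, and `concurrency`
    independent transfers overlap those idle periods perfectly while sharing
    the same link bandwidth.
    """
    wire_time = sum(file_sizes_bytes) * 8 / bandwidth_bps
    idle_time = len(file_sizes_bytes) * rtt_s / concurrency
    return wire_time + idle_time


def utilization(file_sizes_bytes, bandwidth_bps, rtt_s, concurrency=1):
    """Fraction of the total time in which the link actually carries data."""
    wire_time = sum(file_sizes_bytes) * 8 / bandwidth_bps
    total = transfer_time(file_sizes_bytes, bandwidth_bps, rtt_s, concurrency)
    return wire_time / total


if __name__ == "__main__":
    small_files = [10**6] * 10_000   # hypothetical dataset: 10,000 files of 1 MB
    link_bps = 10e9                  # one 10Gbps NIC
    rtt = 0.05                       # hypothetical 50 ms round trip
    for level in (1, 4, 32):
        share = utilization(small_files, link_bps, rtt, concurrency=level)
        print(f"concurrency {level:>2}: link busy {share:.1%} of the time")
```

In this model, concurrency hides the per-file round trips but does not remove them, which is the motivation for keeping the data channel full by other means.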
9. Advantages of MemzNet
• Decoupling I/O and network operations
  • front-end (I/O processing)
  • back-end (networking layer)
• Not limited by the characteristics of the file sizes
  On-the-fly tar approach: bundling and sending many files together
• Dynamic data channel management
  Can increase/decrease the parallelism level both in the network communication and in I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).
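The bundling and decoupling ideas above can be sketched as follows. This is a minimal illustration, not MemzNet's actual implementation or wire format: the block size, the record header layout, and the names `pack_blocks`, `unpack_blocks`, and `run_pipeline` are all hypothetical. A front-end thread packs files of any size into fixed-capacity blocks, and a back-end loop sends whatever blocks are ready, so the network side never sees file boundaries.

```python
import queue
import struct
import threading

BLOCK_SIZE = 4 * 1024 * 1024     # hypothetical block size; the slides give none
HEADER = struct.Struct("!HQI")   # hypothetical record header: name length, file offset, payload length


def pack_blocks(files, block_size=BLOCK_SIZE):
    """Front-end: bundle (name, bytes) pairs into blocks of at most block_size bytes."""
    block = bytearray()
    for name, data in files:
        encoded = name.encode("utf-8")
        offset = 0
        while True:
            room = block_size - len(block) - HEADER.size - len(encoded)
            if room <= 0:
                if not block:
                    raise ValueError("block_size too small for a record header")
                yield bytes(block)          # block is full: hand it over, start a new one
                block = bytearray()
                continue
            chunk = data[offset:offset + room]
            block += HEADER.pack(len(encoded), offset, len(chunk)) + encoded + chunk
            offset += len(chunk)
            if offset >= len(data):
                break
    if block:
        yield bytes(block)


def unpack_blocks(blocks):
    """Receiver: rebuild {name: bytes}; assumes blocks arrive in the order sent."""
    out = {}
    for block in blocks:
        pos = 0
        while pos < len(block):
            name_len, offset, length = HEADER.unpack_from(block, pos)
            pos += HEADER.size
            name = block[pos:pos + name_len].decode("utf-8")
            pos += name_len
            buf = out.setdefault(name, bytearray())
            buf[offset:offset + length] = block[pos:pos + length]
            pos += length
    return {name: bytes(buf) for name, buf in out.items()}


def run_pipeline(files, send, block_size=BLOCK_SIZE, depth=8):
    """Decouple I/O (front-end thread) from sending (back-end loop) via a bounded queue."""
    ready = queue.Queue(maxsize=depth)

    def front_end():
        try:
            for block in pack_blocks(files, block_size):
                ready.put(block)
        finally:
            ready.put(None)                 # sentinel: no more blocks

    worker = threading.Thread(target=front_end)
    worker.start()
    while (block := ready.get()) is not None:
        send(block)                         # stands in for the networking layer
    worker.join()
```

Because both sides only exchange blocks through the queue, the number of front-end readers or back-end senders could in principle be changed while the transfer is running; this sketch uses one of each to stay short.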
10. ANI 100Gbps testbed
[Figure: ANI Middleware Testbed, updated December 11, 2011. NERSC and ANL sites joined by the ANI 100G Network through an ANI 100G Router at each site. Used for the SC11 100Gbps demo.]
Devices labeled at NERSC: nersc-diskpt-1, nersc-diskpt-2, nersc-diskpt-3, nersc-app, nersc-C2940 switch, nersc-asw1, Site Router (nersc-mr2)
Devices labeled at ANL: anl-mempt-1, anl-mempt-2, anl-mempt-3, anl-app, anl-C2940 switch, anl-asw1, ANL Site Router
Links shown: 100G, 4x10GE (MM), 10GE (MM), 10G to ESnet, 1 GE; host interfaces eth0 and eth2-5
anl-mempt-1 NICs: 2: 2x10G Myricom
anl-mempt-2 NICs: 2: 2x10G Myricom
anl-mempt-3 NICs: 1: 2x10G Myricom; 1: 2x10G Mellanox
nersc-diskpt-1 NICs: 2: 2x10G Myricom; 1: 4x10G HotLava
nersc-diskpt-2 NICs: 1: 2x10G Myricom; 1: 2x10G Chelsio; 1: 6x10G HotLava
nersc-diskpt-3 NICs: 1: 2x10G Myricom; 1: 2x10G Mellanox; 1: 6x10G HotLava
Note: ANI 100G routers and 100G wave available till summer 2012; testbed resources after that subject to funding availability.
11. Disadvantage of many TCP Streams
(a) Total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single NIC with different numbers of concurrent transfers. Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs were used to saturate the 100Gbps pipe in the ANI Testbed. 10 data movement jobs, each corresponding to a NIC pair at source and destination, were started simultaneously. Each peak represents a different test; 1, 2, 4, 8, 16, 32, and 64 concurrent streams per job were initiated for 5-minute intervals (e.g., when the concurrency level is 4, there are 40 streams in total).
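The stream counts in the caption follow from simple arithmetic, sketched below. The job count and concurrency levels come from the caption; the helper names and the even-split fair-share figure are illustrative assumptions, since real TCP streams do not divide bandwidth perfectly evenly.

```python
NIC_PAIRS = 10                             # one data movement job per 10Gbps NIC pair
CONCURRENCY_LEVELS = [1, 2, 4, 8, 16, 32, 64]
PIPE_GBPS = 100                            # capacity of the ANI testbed link


def total_streams(concurrency, jobs=NIC_PAIRS):
    """Total TCP streams when every job opens `concurrency` streams."""
    return concurrency * jobs


def fair_share_gbps(concurrency, jobs=NIC_PAIRS, pipe_gbps=PIPE_GBPS):
    """Idealized per-stream share if the pipe were split evenly (an assumption)."""
    return pipe_gbps / total_streams(concurrency, jobs)


if __name__ == "__main__":
    for level in CONCURRENCY_LEVELS:
        print(f"{level:>2} per job: {total_streams(level):>3} streams, "
              f"{fair_share_gbps(level):.3f} Gbps each at best")
```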
12. Effects of many streams
ANI testbed 100Gbps (10x10G NICs, three hosts): interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; TCP buffer size is 50MB.
13. MemzNet’s Performance
TCP buffer size is set to 50MB
[Figure labels: MemzNet, GridFTP, SC11 demo, ANI Testbed]
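Why the TCP buffer size matters can be sketched with the standard window-per-round-trip bound: a single stream can deliver at most one buffer of data per round-trip time. The 50MB buffer comes from the slides; the round-trip times below are hypothetical (the slides do not state the testbed RTT), and 50MB is taken as 50 * 10**6 bytes for simplicity.

```python
def max_throughput_gbps(window_bytes, rtt_s):
    """Upper bound for one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e9


def bdp_bytes(bandwidth_gbps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_gbps * 1e9 * rtt_s / 8


if __name__ == "__main__":
    buffer_bytes = 50 * 10**6              # 50MB TCP buffer, as on the slide
    for rtt in (0.01, 0.05, 0.1):          # hypothetical round-trip times in seconds
        bound = max_throughput_gbps(buffer_bytes, rtt)
        needed = bdp_bytes(10, rtt) / 1e6
        print(f"RTT {rtt * 1000:.0f} ms: one stream <= {bound:.1f} Gbps; "
              f"a 10Gbps NIC needs {needed:.1f} MB in flight")
```

Under these assumed round-trip times, a 50MB buffer is in the same range as the bandwidth-delay product of a single 10Gbps NIC, so a few well-tuned streams per NIC can be enough to fill it.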
15. Acknowledgements
Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, Brian L. Tierney, Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos Chaniotakis, John Christman, Chin Guok, Chris Tracy, Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck, Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler, Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark Howison, Aaron Thomas, John Dugan, Gopal Vaswani